Commits · a780ffaac29e9d38db75ba9ba7f74617a2e59ba4 · Roger Ferrer / llvm-epi

Mar 23, 2017
- [AMDGPU] Emit kernel debug properties as code object metadata · a780ffaa
  Konstantin Zhuravlyov authored Mar 22, 2017
```
Differential Revision: https://reviews.llvm.org/D30969

llvm-svn: 298558
```
Mar 22, 2017

[AMDGPU] Emit kernel code properties as code object metadata · ca0e7f64

Konstantin Zhuravlyov authored Mar 22, 2017

  - These are not required for low level runtime

Differential Revision: https://reviews.llvm.org/D29949

llvm-svn: 298556

[AMDGPU] Restructure code object metadata creation · 7498cd61

Konstantin Zhuravlyov authored Mar 22, 2017

  - Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes

Differential Revision: https://reviews.llvm.org/D29948

llvm-svn: 298552

Mar 21, 2017

AMDGPU: Rename SI_RETURN · 5b20fbb7

Matt Arsenault authored Mar 21, 2017

This is used for a specific type of return to a shader part's
epilog code. Rename to avoid confusion with a true
call's return.

llvm-svn: 298452

SplitKit: Fix subreg copy related problems · 8445cbd1

Matthias Braun authored Mar 21, 2017

Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases, leading to crashes
  when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
  block.

This fixes http://llvm.org/PR32353

llvm-svn: 298448

AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel · 3dbeefa9

Matt Arsenault authored Mar 21, 2017

Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
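
As a rough illustration of the substitution quoted in the commit message above, the same rewrite can be expressed in Python. This is only a sketch: the helper name `mark_kernels` and the sample IR string are made up for this example, and the actual conversion was done with the perl one-liner over the test directories.

```python
import re


def mark_kernels(ir_text: str) -> str:
    """Rewrite every 'define void' into 'define amdgpu_kernel void'.

    Mirrors the perl expression s/define void/define amdgpu_kernel void/
    from the commit message (applied here to a string, not to files).
    """
    return re.sub(r"define void", "define amdgpu_kernel void", ir_text)


# Hypothetical test function, written only to exercise the helper.
sample = "define void @foo(i32 addrspace(1)* %out) {\n  ret void\n}\n"
converted = mark_kernels(sample)
print(converted)
```

Running the helper a second time leaves the text unchanged, because a rewritten line no longer contains the literal `define void`.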

Let llvm.objectsize be conservative with null pointers · 56c7e88c

George Burgess IV authored Mar 21, 2017

This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
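
The effect of the new parameter can be pictured with a small Python model. This is a sketch under assumptions, not the LLVM implementation: the function name, the argument names and the use of `None` for "size not known" are invented here, and the model assumes that without the new flag a null pointer is reported as having 0 bytes available.

```python
from typing import Optional

# i64 -1: the "unknown" answer when the caller asks for a maximum size.
UNKNOWN_MAX = 2**64 - 1


def objectsize(known_size: Optional[int], is_null: bool,
               want_min: bool, null_is_unknown: bool) -> int:
    """Toy model of an objectsize query.

    known_size      -- size in bytes, or None if it cannot be determined
    is_null         -- the pointer operand is a null pointer
    want_min        -- ask for a lower bound (0 when unknown) instead of
                       an upper bound (-1 when unknown)
    null_is_unknown -- the new parameter: treat null as "unknown size"
    """
    if is_null:
        if not null_is_unknown:
            return 0          # assumed old behaviour: null has 0 bytes
        known_size = None     # conservative: nothing is known about null
    if known_size is None:
        return 0 if want_min else UNKNOWN_MAX
    return known_size


print(objectsize(None, True, False, False))  # null, flag off
print(objectsize(None, True, False, True))   # null, flag on
```

With the flag on, a null pointer gives the same answer as any other pointer whose size cannot be determined.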

AMDGPU: Buffer descriptor changes for GFX9 · 5c7a61d2

Marek Olsak authored Mar 21, 2017

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31158

llvm-svn: 298397

AMDGPU: Always use VGPR indexing on GFX9 · e22fdb9c

Marek Olsak authored Mar 21, 2017

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31157

llvm-svn: 298396

AMDGPU: Fix asserting on 0 dmask for image intrinsics · f8fb605a
Matt Arsenault authored Mar 21, 2017
```
Fold these to undef during lowering so users get eliminated.

llvm-svn: 298387
```
AMDGPU: Convert image intrinsic uses in tests · 964a8485
Matt Arsenault authored Mar 21, 2017
```
llvm-svn: 298386
```
DAG: Fold bitcast/extract_vector_elt of undef to undef · dce313c3
Matt Arsenault authored Mar 21, 2017
```
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
```
[AMDGPU] Iterative scheduling infrastructure + minimal register scheduler · fd4c410f
Valery Pykhtin authored Mar 21, 2017
```
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
```

[AMDGPU] SDWA peephole optimization pass. · f60ad58d

Sam Kolton authored Mar 21, 2017

Summary:
First iteration of SDWA peephole.

This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
'''
V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
3. Iterate over all potential instructions and check if they can be converted into SDWA.
4. Convert instructions to SDWA.
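
The four steps above can be sketched on a toy instruction list. Everything in this Python sketch is an assumption made for illustration: the `Inst` class, the `SDWA_FORM` table and the rule that registers are only used inside the block are not taken from the pass itself, and only the two 16-bit shift patterns from the example are recognised.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    opcode: str
    dst: str
    srcs: list                               # register names or immediates
    sel: dict = field(default_factory=dict)  # SDWA selects, if converted


# Hypothetical table of opcodes that have an SDWA form.
SDWA_FORM = {"V_ADD_I32_e32": "V_ADD_I32_sdwa"}


def use_count(block, reg):
    return sum(inst.srcs.count(reg) for inst in block)


def sdwa_peephole(block):
    # Step 1: match 16-bit shifts to source / destination SDWA operands.
    src_ops, dst_ops = {}, {}
    for inst in block:
        if inst.srcs and inst.srcs[0] == 16:
            if inst.opcode == "V_LSHRREV_B32_e32":
                src_ops[inst.dst] = inst.srcs[1]   # dst is WORD_1 of srcs[1]
            elif inst.opcode == "V_LSHLREV_B32_e32":
                dst_ops[inst.srcs[1]] = inst       # result goes to WORD_1

    # Steps 2-4: find users of those operands, check them, convert them.
    result, removed = [], set()
    for inst in block:
        if inst.opcode in SDWA_FORM:
            new = Inst(SDWA_FORM[inst.opcode], inst.dst, list(inst.srcs),
                       {"dst_sel": "DWORD", "src0_sel": "DWORD",
                        "src1_sel": "DWORD"})
            changed = False
            for i, reg in enumerate(new.srcs):
                if reg in src_ops:
                    new.srcs[i] = src_ops[reg]
                    new.sel[f"src{i}_sel"] = "WORD_1"
                    changed = True
            shift = dst_ops.get(inst.dst)
            # Only fold the destination if the shift is the sole user.
            if shift is not None and use_count(block, inst.dst) == 1:
                new.dst = shift.dst
                new.sel["dst_sel"] = "WORD_1"
                new.sel["dst_unused"] = "UNUSED_PAD"
                removed.add(id(shift))
                changed = True
            if changed:
                inst = new
        result.append(inst)

    # Drop folded destination shifts, then source shifts left without users.
    result = [i for i in result if id(i) not in removed]
    return [i for i in result
            if not (i.opcode == "V_LSHRREV_B32_e32" and i.dst in src_ops
                    and use_count(result, i.dst) == 0)]


example = [
    Inst("V_LSHRREV_B32_e32", "vreg0", [16, "vreg1"]),
    Inst("V_ADD_I32_e32", "vreg2", ["vreg0", "vreg3"]),
    Inst("V_LSHLREV_B32_e32", "vreg4", [16, "vreg2"]),
]
converted = sdwa_peephole(example)
for inst in converted:
    print(inst.opcode, inst.dst, inst.srcs, inst.sel)
```

On the three-instruction example from the summary, the sketch leaves a single `V_ADD_I32_sdwa` that writes `vreg4` and reads `vreg1` and `vreg3`. A real pass also has to check liveness outside the block and operand legality, which this toy version ignores.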

This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
1. Make this pass work on a whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
2. Introduce more SDWA patterns
3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365

Mar 20, 2017
- [AMDGPU] Run always inliner early in opt · 2534bc07
  Konstantin Zhuravlyov authored Mar 20, 2017
```
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
```
- Revert "[AMDGPU] Run always inliner early in opt" · 8a67eb14
  Konstantin Zhuravlyov authored Mar 20, 2017
```
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
```
Mar 19, 2017

[GlobalISel] Don't select trivially dead instructions. · 931904d7

Ahmed Bougacha authored Mar 19, 2017

Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
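
The rule described in this commit message can be sketched as a small Python model. The `MachineInst` class, the `%vreg` naming convention for virtual registers and the bottom-up walk are all assumptions made for this sketch. It is not the GlobalISel code.

```python
from dataclasses import dataclass, field


@dataclass
class MachineInst:
    opcode: str
    defs: list
    uses: list = field(default_factory=list)
    has_side_effects: bool = False


def is_virtual(reg: str) -> bool:
    # Naming convention assumed for this sketch only.
    return reg.startswith("%vreg")


def is_trivially_dead(inst, use_counts) -> bool:
    """Dead: no side effects, and every def is an unused virtual register."""
    if inst.has_side_effects:
        return False
    return all(is_virtual(r) and use_counts.get(r, 0) == 0 for r in inst.defs)


def select_block(insts):
    """Return the instructions that would still be selected."""
    use_counts = {}
    for inst in insts:
        for reg in inst.uses:
            use_counts[reg] = use_counts.get(reg, 0) + 1
    selected = []
    # Walk bottom-up so that skipping a dead user can make its operands dead.
    for inst in reversed(insts):
        if is_trivially_dead(inst, use_counts):
            for reg in inst.uses:
                use_counts[reg] -= 1
            continue
        selected.append(inst)
    selected.reverse()
    return selected


block = [
    MachineInst("G_CONSTANT", ["%vreg0"]),
    MachineInst("G_ADD", ["%vreg1"], ["%vreg0", "%vreg2"]),  # result unused
    MachineInst("G_STORE", [], ["%vreg2", "%vreg3"], has_side_effects=True),
    MachineInst("COPY", ["%w0"], ["%vreg2"]),                # physical def
]
print([inst.opcode for inst in select_block(block)])
```

In this example the unused `G_ADD` is skipped, which in turn leaves the `G_CONSTANT` without users, so neither is selected. The store and the copy into a physical register are kept.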

Mar 18, 2017

[AMDGPU] Add address space based alias analysis pass · 8e45acfc

Stanislav Mekhanoshin authored Mar 17, 2017

This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172

Mar 17, 2017

AMDGPU: Fix handling of constant phi input loop conditions · e70d5dcf

Matt Arsenault authored Mar 17, 2017

If the loop condition was an i1 phi with a constantexpr input, this
would add a loop intrinsic fed by a phi dependent on a call to
if.break in the same block. Insert the call in the loop header.

llvm-svn: 298121

SplitKit: Correctly implement partial subregister copies · f0b68d3f

Matthias Braun authored Mar 17, 2017

- This fixes a bug where subregisters incompatible with the vreg's register
  class were used.
- Implement the case where multiple copies are necessary to cover a
  given lanemask.

Differential Revision: https://reviews.llvm.org/D30438

llvm-svn: 298025
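
The second point, needing several copies to cover one lane mask, can be illustrated with a greedy cover in Python. This is only a sketch of the idea: the function name, the subregister table and the greedy choice are assumptions, and the selection logic in SplitKit and TargetRegisterInfo may differ.

```python
def cover_lanes(lane_mask: int, subreg_lanes: dict) -> list:
    """Pick subregister indexes whose lanes together equal lane_mask.

    subreg_lanes maps a subregister index name to the lanes it covers.
    Each step takes the candidate covering the most remaining lanes
    without touching any lane outside the request.
    """
    remaining = lane_mask
    chosen = []
    while remaining:
        best = None
        for name, lanes in subreg_lanes.items():
            if lanes == 0 or lanes & ~remaining:
                continue  # empty, or would copy lanes we were not asked for
            if best is None or bin(lanes).count("1") > bin(subreg_lanes[best]).count("1"):
                best = name
        if best is None:
            raise ValueError("lanes cannot be covered by the given indexes")
        chosen.append(best)
        remaining &= ~subreg_lanes[best]
    return chosen


# Hypothetical 4-lane register with one 2-lane subregister index.
SUBREGS = {
    "sub0": 0b0001,
    "sub1": 0b0010,
    "sub2": 0b0100,
    "sub3": 0b1000,
    "sub0_sub1": 0b0011,
}
copies = cover_lanes(0b0111, SUBREGS)
print(copies)
```

For the mask `0b0111` no single index matches, so the sketch emits one copy for `sub0_sub1` and a second one for `sub2`.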

Mar 16, 2017

[AMDGPU] Run always inliner early in opt · f8050797

Stanislav Mekhanoshin authored Mar 16, 2017

We can mark functions to always inline early in the opt. Since we do not have
call support, this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958

AMDGPU: Allow sinking of addressing modes for atomic_inc/dec · 7dc01c96
Matt Arsenault authored Mar 15, 2017
```
llvm-svn: 297913
```

Mar 15, 2017

CodeGenPrepare: Sink addressing modes for atomics · 02d915be
Matt Arsenault authored Mar 15, 2017
```
llvm-svn: 297903
```

AMDGPU: Fix unnecessary ands when packing f16 vectors · 86e02ce2

Matt Arsenault authored Mar 15, 2017

computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.

llvm-svn: 297873
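
Why known-zero high bits make the `and` unnecessary can be shown with a few lines of Python. The helper name and the 32-bit width are assumptions for this sketch. It models only the bit arithmetic, not computeKnownBits itself.

```python
def and_is_redundant(known_zero: int, and_mask: int, width: int = 32) -> bool:
    """True if 'x & and_mask' cannot change x, given x's known-zero bits."""
    all_ones = (1 << width) - 1
    cleared_by_and = ~and_mask & all_ones
    # Redundant when every bit the mask would clear is already known zero.
    return (cleared_by_and & ~known_zero) == 0


# A node whose high 16 bits are known to be 0, masked with 0xffff.
print(and_is_redundant(known_zero=0xFFFF0000, and_mask=0x0000FFFF))
# Nothing known about the high bits: the mask has to stay.
print(and_is_redundant(known_zero=0x00000000, and_mask=0x0000FFFF))
```

This is the reason a target node with guaranteed-zero high bits lets the masking disappear when two f16 values are packed into one register.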

Mar 14, 2017

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 54e22f33

Nirav Dave authored Mar 14, 2017

    Recommitting with compile time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through a
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner,
    as it separates non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations which appear
    to require more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward load-store values through tokenfactors containing
         {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
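
Item 7 in the list of minor changes (build_vector of extract_vector_elt to extract_subvector) can be pictured with a small Python check. The tuple representation and the function name are assumptions for this sketch, and the real peephole has further conditions (for example on types) that are not modelled here.

```python
def fold_build_vector(operands):
    """Try to replace a build_vector by one extract_subvector.

    operands is a list of (source_vector, element_index) pairs, each
    standing for an extract_vector_elt feeding the build_vector.
    Returns ("extract_subvector", source, start_index) if all elements
    come from the same source at consecutive indexes, else None.
    """
    if not operands:
        return None
    source, start = operands[0]
    for offset, (src, index) in enumerate(operands):
        if src != source or index != start + offset:
            return None
    return ("extract_subvector", source, start)


# Elements 2 and 3 of the same vector: foldable.
print(fold_build_vector([("v", 2), ("v", 3)]))
# Elements from two different vectors: left alone.
print(fold_build_vector([("v", 2), ("w", 3)]))
```

The check only answers whether the fold is possible ("if possible" in the commit message). It says nothing about whether the resulting subvector type is legal for a target.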

Mar 13, 2017
- AMDGPU: Treat 0 as private null pointer in addrspacecast lowering · 971c85eb
  Matt Arsenault authored Mar 13, 2017
```
llvm-svn: 297658
```
Mar 11, 2017

AMDGPU: Remove packf16 intrinsic · dd905b0e
Matt Arsenault authored Mar 11, 2017
```
llvm-svn: 297557
```

AMDGPU: Keep track of modifiers when converting v_mac to v_mad · 3cb9ff88

Matt Arsenault authored Mar 11, 2017

Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.

This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.

Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 297556
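
The bug class described in this commit message, a modifier bit dropped while rewriting one instruction into another, can be shown with a toy model in Python. The `VOP3` class, its fields and both helper functions are hypothetical. They only illustrate why the clamp and omod fields have to be carried over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VOP3:
    opcode: str
    dst: str
    src0: str
    src1: str
    src2: str
    clamp: bool = False
    omod: int = 0


def mac_to_mad_lossy(mac: VOP3) -> VOP3:
    # Models the bug: modifiers fall back to their defaults.
    return VOP3("v_mad_f32", mac.dst, mac.src0, mac.src1, mac.src2)


def mac_to_mad(mac: VOP3) -> VOP3:
    # Models the fix: clamp and omod are copied to the new instruction.
    return VOP3("v_mad_f32", mac.dst, mac.src0, mac.src1, mac.src2,
                clamp=mac.clamp, omod=mac.omod)


mac = VOP3("v_mac_f32", "v0", "v1", "v2", "v0", clamp=True)
print(mac_to_mad_lossy(mac).clamp)  # clamp bit lost
print(mac_to_mad(mac).clamp)        # clamp bit kept
```

Because the clamp changes the numeric result, losing it is a correctness problem and not just a missed optimization, which matches the rendering issue mentioned above.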

[AMDGPU] Remove getBidirectionalReasonRank · 79da2a76

Stanislav Mekhanoshin authored Mar 11, 2017

This method inverts the Reason field of a scheduling candidate.
It compares RegCritical and RegExcess correctly, but
everything else is broken. In fact it can prefer a weaker reason
such as Weak over RegCritical because Weak > -RegCritical.

The CandReason enum is properly sorted, so just remove artificial
ranking.

Differential Revision: https://reviews.llvm.org/D30557

llvm-svn: 297536
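
The comparison problem named in this commit message can be reproduced with a small Python model. The enum below is an illustrative subset with assumed numeric values (smaller meaning stronger), and the "higher rank wins" rule is inferred from the `Weak > -RegCritical` remark. Neither is copied from the scheduler source.

```python
from enum import IntEnum


class CandReason(IntEnum):
    # Illustrative subset; the values are assumptions, smaller = stronger.
    REG_EXCESS = 3
    REG_CRITICAL = 4
    STALL = 5
    WEAK = 7


def bidirectional_rank(reason: CandReason) -> int:
    """Model of the removed ranking: negate the two register reasons."""
    if reason in (CandReason.REG_EXCESS, CandReason.REG_CRITICAL):
        return -int(reason)
    return int(reason)


def prefers_with_rank(a: CandReason, b: CandReason) -> bool:
    # Assumed rule for the removed code: the higher rank wins.
    return bidirectional_rank(a) > bidirectional_rank(b)


def prefers_with_enum_order(a: CandReason, b: CandReason) -> bool:
    # After the change: rely on the enum order alone.
    return a < b


# The one comparison the ranking got right.
print(prefers_with_rank(CandReason.REG_EXCESS, CandReason.REG_CRITICAL))
# The broken case: Weak (7) beats RegCritical (-4).
print(prefers_with_rank(CandReason.WEAK, CandReason.REG_CRITICAL))
# Plain enum order prefers RegCritical over Weak.
print(prefers_with_enum_order(CandReason.REG_CRITICAL, CandReason.WEAK))
```

Mixing negated and non-negated values puts two scales into one comparison, which is why dropping the artificial ranking is enough once the enum itself is ordered.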

Mar 09, 2017
- DAG: Check no signed zeros instead of unsafe math attribute · 9a3fd875
  Matt Arsenault authored Mar 09, 2017
```
llvm-svn: 297354
```
Mar 08, 2017

AMDGPU: Don't wait at end of block with a trivial successor · 52d1b62a

Matt Arsenault authored Mar 08, 2017

If there is only one successor, and that successor has only
one predecessor, the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.

llvm-svn: 297251
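
The condition stated in this commit message is easy to express over a control-flow graph. The Python below is a sketch with an assumed representation (a dictionary from block name to successor list) and made-up block names. It is not the wait-insertion pass.

```python
def build_predecessors(successors: dict) -> dict:
    """Invert a block -> successors map into block -> predecessors."""
    preds = {block: [] for block in successors}
    for block, succs in successors.items():
        for succ in succs:
            preds.setdefault(succ, []).append(block)
    return preds


def can_delay_wait(block: str, successors: dict) -> bool:
    """True if the end-of-block wait can move into the single successor."""
    succs = successors.get(block, [])
    if len(succs) != 1:
        return False
    preds = build_predecessors(successors)
    return len(preds.get(succs[0], [])) == 1


# Hypothetical CFG: 'flow' is a fallthrough block inserted between
# 'body' and 'exit', and 'exit' is also reached directly from 'entry'.
cfg = {
    "entry": ["body", "exit"],
    "body": ["flow"],
    "flow": ["exit"],
    "exit": [],
}
print(can_delay_wait("body", cfg))  # 'flow' has only 'body' as predecessor
print(can_delay_wait("flow", cfg))  # 'exit' has two predecessors
```

In the example the wait at the end of `body` can be delayed into `flow`, but the one at the end of `flow` cannot, since `exit` is also entered from `entry`.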

AMDGPU: Constant fold rcp node · d8ed207a

Matt Arsenault authored Mar 08, 2017

When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.

llvm-svn: 297248

AMDGPU/SI: Do not insert EndCf in an unreachable block · 6b49fa4c
Changpeng Fang authored Mar 07, 2017
```
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D22025

llvm-svn: 297243
```

Mar 07, 2017

Revert "AMDGPU: Set MCAsmInfo::PointerSize" · e8aaab8a

Konstantin Zhuravlyov authored Mar 07, 2017

It breaks line tables because the patch is not complete. Working on a complete one at the moment.

This reverts commit r294031.

llvm-svn: 297118

Mar 06, 2017

AMDGPU/R600: Fix ALU clause markers use detection · 3ea17044

Jan Vesely authored Mar 06, 2017

Also exit early on kill instead of redefinition.

Differential Revision: https://reviews.llvm.org/D30230

llvm-svn: 297060

Mar 03, 2017

[SDAG] Revert r296476 (and r296486, r296668, r296690). · ce52b807

Chandler Carruth authored Mar 03, 2017

This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862

Mar 02, 2017

Revert "AMDGPU: Re-do update for branch-relaxation test" · 02d86b80

Tobias Grosser authored Mar 02, 2017

This commit also relied on r296812, which I just reverted. We should probably
apply it again after r296812 has been discussed and reapplied in some
variant.

llvm-svn: 296820

LiveRegMatrix: Fix some subreg interference checks · dbcf9e2e

Matthias Braun authored Mar 02, 2017

Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the appropriate subregister range,
resulting in unnecessarily conservative results.

llvm-svn: 296722

Mar 01, 2017

[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine · e1b2d314

Artur Pilipenko authored Mar 01, 2017

Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651

AMDGPU: Re-do update for branch-relaxation test · 103af900

Matt Arsenault authored Mar 01, 2017

Modify the test so that it is still testing something
closer to what it was intended to originally.

I think the original intent was to test the situation where
there was a branch on execz and then an unconditional branch
required relaxing. With the change in r296539,
there was no longer an execz branch.

Change the test so that there is now an execz branch inserted.
There is no longer an unconditional branch after the execz branch,
so this might need to be tricked in some other way to keep that
there.

llvm-svn: 296574
