Commits · 89653dfd2afa0708ad3f093371aa27b57a366d5c · Lorenzo Albano / LLVM bpEVL

Mar 30, 2017

[AMDGPU] Add GlobalOpt parameter to Always Inliner pass · 89653dfd

Stanislav Mekhanoshin authored Mar 30, 2017

If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108

89653dfd

Mar 27, 2017

[AMDGPU] Get address space mapping by target triple environment · 1a14bfa0

Yaxun Liu authored Mar 27, 2017

Since we introduced the target triple environments amdgiz and amdgizcl, the
address space values are no longer enums. We have to decide the values from the
target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory space and are equivalent
to compile-time constants.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of use. Or it can be added as a member of a pass and created at
the beginning of the run* function.
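
A minimal Python sketch of this idea, not the actual LLVM code: triple-independent address spaces are class-level constants, while triple-dependent ones are filled in when the object is created. The helper `get_amdgpu_as`, the field selection, and every numeric value are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class AMDGPUAS:
    # Triple-independent values: shared class constants, no per-instance storage.
    # The numbers are placeholders, not taken from the commit.
    GLOBAL_ADDRESS: ClassVar[int] = 1
    LOCAL_ADDRESS: ClassVar[int] = 3

    # Triple-dependent values: decided when the struct is created.
    PRIVATE_ADDRESS: int
    FLAT_ADDRESS: int


def get_amdgpu_as(triple: str) -> AMDGPUAS:
    """Build the mapping from the environment component of a target triple."""
    parts = triple.split("-")
    env = parts[3] if len(parts) > 3 else ""
    if env in ("amdgiz", "amdgizcl"):
        return AMDGPUAS(PRIVATE_ADDRESS=5, FLAT_ADDRESS=0)  # hypothetical values
    return AMDGPUAS(PRIVATE_ADDRESS=0, FLAT_ADDRESS=4)  # hypothetical values
```

Because the object holds only two integers, building it at each point of use is cheap, which is the trade-off the message describes.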

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846

1a14bfa0

Mar 24, 2017

AMDGPU: Unify divergent function exits. · b8f8dbc2

Matt Arsenault authored Mar 24, 2017

StructurizeCFG can't handle cases with multiple
returns, which create regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that skips
exit nodes with uniform branch sources and unifies
only the remaining exit nodes.

llvm-svn: 298729

b8f8dbc2

Mar 21, 2017

[AMDGPU] SDWA peephole optimization pass. · f60ad58d

Sam Kolton authored Mar 21, 2017

Summary:
First iteration of SDWA peephole.

This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
'''
V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. An SDWA pattern matches a machine instruction to either a source or a destination SDWA operand. E.g. '''V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
2. Iterate over the SDWA operands found and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
3. Iterate over all potential instructions and check if they can be converted into SDWA.
4. Convert instructions to SDWA.
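
The four steps above can be sketched as a toy Python model applied to the example from this message. This is not the pass itself: `Inst`, `SDWA_FORMS`, and `sdwa_peephole` are hypothetical names, only shifts by 16 are recognised as patterns, and the sketch assumes every matched register has a single use so the folded shifts can be dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    opcode: str
    dst: str
    srcs: list
    sel: dict = field(default_factory=dict)  # SDWA selectors, empty for non-SDWA


# Hypothetical table of opcodes that have an SDWA form.
SDWA_FORMS = {"V_ADD_I32_e32": "V_ADD_I32_sdwa"}


def sdwa_peephole(block):
    """Toy version of the four steps; mutates and returns the surviving insts."""
    # Step 1: match shifts by 16 into source or destination SDWA operands.
    src_ops, dst_ops = {}, {}
    for inst in block:
        if inst.opcode == "V_LSHRREV_B32_e32" and inst.srcs[0] == 16:
            src_ops[inst.dst] = (inst.srcs[1], inst)
        elif inst.opcode == "V_LSHLREV_B32_e32" and inst.srcs[0] == 16:
            dst_ops[inst.srcs[1]] = (inst.dst, inst)

    dead = set()
    for inst in block:
        # Steps 2 and 3: a candidate touches a matched register and has an
        # SDWA form.
        if inst.opcode not in SDWA_FORMS:
            continue
        uses = [s for s in inst.srcs if s in src_ops]
        if not uses and inst.dst not in dst_ops:
            continue
        # Step 4: rewrite operands, record selectors, drop the folded shifts.
        for i, src in enumerate(inst.srcs):
            if src in src_ops:
                inst.srcs[i], shift = src_ops[src]
                inst.sel[f"src{i}_sel"] = "WORD_1"
                dead.add(id(shift))
            else:
                inst.sel[f"src{i}_sel"] = "DWORD"
        if inst.dst in dst_ops:
            inst.dst, shift = dst_ops[inst.dst]
            inst.sel.update(dst_sel="WORD_1", dst_unused="UNUSED_PAD")
            dead.add(id(shift))
        inst.opcode = SDWA_FORMS[inst.opcode]
    return [inst for inst in block if id(inst) not in dead]
```

Feeding it the three-instruction sequence from the example leaves a single `V_ADD_I32_sdwa` that writes `%vreg4` and reads `%vreg1` and `%vreg3`.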

This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing has been done).
There are several ways this pass can be improved:
1. Make this pass work on the whole function, not only on a basic block. As far as I can see, this can be done right now without changes to the pass.
2. Introduce more SDWA patterns.
3. Introduce mnemonics to limit when SDWA patterns should apply.

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365

f60ad58d

Mar 18, 2017

[AMDGPU] Add address space based alias analysis pass · 8e45acfc

Stanislav Mekhanoshin authored Mar 17, 2017

This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172

8e45acfc

Feb 18, 2017
- AMDGPU: Merge initial gfx9 support · e823d92f
  Matt Arsenault authored Feb 18, 2017
```
llvm-svn: 295554
```
  e823d92f
Feb 09, 2017
- AMDGPU: Add pass to expand memcpy/memmove/memset · 0699ef39
  Matt Arsenault authored Feb 09, 2017
```
llvm-svn: 294635
```
  0699ef39
Jan 27, 2017

[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass · f6c1feb8

Stanislav Mekhanoshin authored Jan 27, 2017

With the adjustPassManager interface it is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300

f6c1feb8

Jan 24, 2017

[AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f

Stanislav Mekhanoshin authored Jan 24, 2017

Regalloc creates COPY instructions which do not formally use VALU.
That results in v_mov instructions being displaced after exec mask modification.
One pass that does this is SIOptimizeExecMasking, but potentially it can be
done by other passes too.

This patch adds a pass immediately after regalloc to add an implicit exec
use operand to all VGPR copy instructions.
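
As a rough illustration only, the fix can be modelled in Python over instructions represented as dictionaries. The dictionary layout and the convention that VGPR names start with `v` are assumptions made for this sketch, not details of the real pass.

```python
def fix_vgpr_copies(instructions):
    """Add an implicit use of exec to every COPY that writes a VGPR."""
    for inst in instructions:
        # Assumed naming convention: VGPRs are "v0", "v1", ...; SGPRs "s0", ...
        is_vgpr_copy = inst["opcode"] == "COPY" and inst["dst"].startswith("v")
        if is_vgpr_copy and "exec" not in inst["implicit_uses"]:
            inst["implicit_uses"].append("exec")
    return instructions
```

Once the dependency on exec is explicit, a later pass that reorders instructions around exec modifications can see that the copy must stay on its side of the change.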

Differential Revision: https://reviews.llvm.org/D28874

llvm-svn: 292956

22a56f2f

Dec 08, 2016

[AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2

Stanislav Mekhanoshin authored Dec 08, 2016

Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates a semantic problem because we cannot tell which
version of OpenCL the composite module conforms to.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and on disk.
It can go up to several MB when linked against our OpenCL library. Lastly, such
long lists make dumped IR harder to read.

The pass unifies metadata after linking.
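
A minimal Python sketch of the deduplication part, assuming named metadata is modelled as a dictionary from record name to a list of hashable operand tuples. The message does not say how conflicting opencl.ocl.version values are resolved, so this sketch only drops exact repeats and leaves distinct values in place.

```python
def unify_metadata(named_md):
    """Drop repeated operands of each named record, keeping first-seen order."""
    unified = {}
    for name, operands in named_md.items():
        seen = set()
        kept = []
        for op in operands:
            if op not in seen:
                seen.add(op)
                kept.append(op)
        unified[name] = kept
    return unified
```

For example, three identical `(2, 0)` entries under `opencl.ocl.version` collapse to one.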

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092

50ea93a2

Oct 10, 2016

Move the global variables representing each Target behind accessor function · f42454b9

Mehdi Amini authored Oct 09, 2016

This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702

f42454b9

Oct 03, 2016
- [AMDGPU] Pass optimization level to SelectionDAGISel · 60a83737
  Konstantin Zhuravlyov authored Oct 03, 2016
```
llvm-svn: 283133
```
  60a83737
Sep 29, 2016

AMDGPU: Partially fix control flow at -O0 · e6740754

Matt Arsenault authored Sep 29, 2016

Fixes to allow spilling all registers at the end of the block
to work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667

e6740754

Aug 22, 2016

AMDGPU: Split SILowerControlFlow into two pieces · 78fc9daf

Matt Arsenault authored Aug 22, 2016

Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464

78fc9daf

Aug 11, 2016
- AMDGPU: Prune includes · 2ffe8fd2
  Matt Arsenault authored Aug 11, 2016
```
llvm-svn: 278391
```
  2ffe8fd2
Jul 20, 2016

AMDGPU: Change fdiv lowering based on !fpmath metadata · a1fe17c9

Matt Arsenault authored Jul 19, 2016

If 2.5 ulp is acceptable, denormals are not required, and the
division isn't a reciprocal (which will already be handled), replace
it with a faster fdiv.
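
The three conditions can be written as a small predicate. This Python sketch is illustrative only: the function name and parameters are hypothetical, and treating "reciprocal" as a numerator of exactly 1.0 is an assumption.

```python
def use_fast_fdiv(fpmath_ulp, denormals_required, numerator):
    """Return True when the faster, less precise fdiv lowering may be used."""
    if fpmath_ulp is None or fpmath_ulp < 2.5:
        return False  # no !fpmath metadata, or the accuracy bound is too strict
    if denormals_required:
        return False
    if numerator == 1.0:
        return False  # assumed reciprocal form, handled by a separate path
    return True
```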

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051

a1fe17c9

Jul 14, 2016
- AMDGPU/R600: Delete/rename intrinsics no longer used by mesa · ca7f5701
  Matt Arsenault authored Jul 14, 2016
```
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
```
  ca7f5701
Jun 24, 2016

AMDGPU: Add stub custom CodeGenPrepare pass · 86de486d

Matt Arsenault authored Jun 24, 2016

This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657

86de486d

Jun 10, 2016
- AMDGPU: Properly initialize SIShrinkInstructions · c3a01ec9
  Matt Arsenault authored Jun 09, 2016
```
llvm-svn: 272336
```
  c3a01ec9
May 31, 2016
- AMDGPU: Remove unused address space · ec30eb50
  Matt Arsenault authored May 31, 2016
```
Also return a single StringRef instead of building a string.

llvm-svn: 271296
```
  ec30eb50
May 13, 2016

AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2) · 81f1b300

Jan Vesely authored May 13, 2016

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19785

llvm-svn: 269473

81f1b300

May 10, 2016
- [AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops · a7919321
  Konstantin Zhuravlyov authored May 10, 2016
```
Differential Revision: http://reviews.llvm.org/D20117

llvm-svn: 269098
```
  a7919321
Apr 14, 2016

AMDGPU: Remove SIFixSGPRLiveRanges pass · 723b73b4

Nicolai Haehnle authored Apr 14, 2016

Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345

723b73b4

Apr 06, 2016

AMDGPU: Add a shader calling convention · df3a20cd

Nicolai Haehnle authored Apr 06, 2016

This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589

df3a20cd

Mar 21, 2016

AMDGPU: Add SIWholeQuadMode pass · 213e87f2

Nicolai Haehnle authored Mar 21, 2016

Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982

213e87f2

Mar 11, 2016

AMDGPU: R600 code splitting cleanup · 6b6a2c37

Matt Arsenault authored Mar 11, 2016

Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204

6b6a2c37

Mar 03, 2016

AMDGPU: Insert two S_NOP instructions for every high level source statement. · cc7067a6

Tom Stellard authored Mar 03, 2016

Patch by: Konstantin Zhuravlyov

Summary: Tools such as a debugger need to pause execution based on user input (e.g. a breakpoint). To make this possible, two S_NOP instructions are inserted for each high-level source statement: one before the first ISA instruction of the statement, and one after its last ISA instruction. The debugger may then replace S_NOP instructions with S_TRAP instructions based on user input.
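
A small Python sketch of the insertion rule, assuming each instruction is a `(source_line, opcode)` pair and that the instructions of one statement are contiguous. Both the representation and the function name are hypothetical.

```python
def insert_debugger_nops(instructions):
    """Wrap each run of same-line instructions in a leading and trailing S_NOP."""
    out = []
    prev_line = None
    for line, opcode in instructions:
        if line != prev_line:
            if prev_line is not None:
                out.append((prev_line, "S_NOP"))  # after last inst of old statement
            out.append((line, "S_NOP"))  # before first inst of new statement
            prev_line = line
        out.append((line, opcode))
    if prev_line is not None:
        out.append((prev_line, "S_NOP"))  # close the final statement
    return out
```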

Reviewers: tstellarAMD, arsenm

Subscribers: echristo, dblaikie, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17454

llvm-svn: 262579

cc7067a6

Feb 13, 2016

AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions · bc4497b1

Tom Stellard authored Feb 12, 2016

Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765

bc4497b1

Feb 12, 2016
- AMDGPU: Initialize SILowerControlFlow · 55d49cfe
  Matt Arsenault authored Feb 12, 2016
```
llvm-svn: 260645
```
  55d49cfe
Feb 05, 2016

AMDGPU/SI: Correctly initialize SIInsertWaits pass · 6e1967ef

Tom Stellard authored Feb 05, 2016

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894

6e1967ef

Jan 30, 2016

AMDGPU: Fix emitting invalid workitem intrinsics for HSA · e0132464

Matt Arsenault authored Jan 30, 2016

The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA were incorrectly selected to read from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297

e0132464

Jan 20, 2016

Correctly initialize SIAnnotateControlFlow · 77a17772

Tom Stellard authored Jan 20, 2016

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16304

llvm-svn: 258319

77a17772

Jan 13, 2016

Fix struct/class mismatch for MachineSchedContext · 81efb6b4
Hans Wennborg authored Jan 13, 2016
```
llvm-svn: 257648
```
81efb6b4

AMDGPU/SI: Add SI Machine Scheduler · 02c32915

Nicolai Haehnle authored Jan 13, 2016

Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609

02c32915

Dec 15, 2015

AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions · a6f24c65

Tom Stellard authored Dec 15, 2015

Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass.

This new solution is simpler and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that our legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672

a6f24c65

Dec 10, 2015

AMDGPU/SI: Emit constant arrays in the .text section · c93fc11f

Tom Stellard authored Dec 10, 2015

Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204

c93fc11f

Nov 30, 2015

AMDGPU: Remove SIPrepareScratchRegs · 0e3d3893

Matt Arsenault authored Nov 30, 2015

It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329

0e3d3893

Nov 06, 2015

AMDGPU: Add pass to detect used kernel features · 3931948b

Matt Arsenault authored Nov 06, 2015

Use kernel attributes to mark kernels that use certain features
which require user SGPRs to support. We need to know this
before instruction selection begins because it impacts
the kernel calling convention lowering.

For now this only detects the workitem intrinsics.

llvm-svn: 252323

3931948b

Nov 03, 2015
- AMDGPU: Initialize SIFixSGPRCopies so -print-after works · 782c03bb
  Matt Arsenault authored Nov 03, 2015
```
llvm-svn: 251995
```
  782c03bb
Aug 08, 2015

AMDGPU: Add pass to lower OpenCL image and sampler arguments. · fd25395c

Tom Stellard authored Aug 07, 2015

The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.

Patch by: Zoltan Gilian

llvm-svn: 244372

fd25395c