Commits · 89653dfd2afa0708ad3f093371aa27b57a366d5c · Lorenzo Albano / LLVM bpEVL

Mar 30, 2017

[AMDGPU] Add GlobalOpt parameter to Always Inliner pass · 89653dfd

Stanislav Mekhanoshin authored Mar 30, 2017

If set to false, it does not remove global aliases. With this parameter
set to false, it should be safe to run the pass before linking.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108

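To make the switch concrete, here is a toy model in C++. It is not the LLVM pass: ToyModule and runAlwaysInlineSketch are hypothetical, and the sketch assumes that uses of an alias are redirected to its aliasee in either mode, with only the erasure of the aliases controlled by GlobalOpt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy module, not llvm::Module.
struct ToyModule {
  std::map<std::string, std::string> Aliases;  // alias name -> aliasee name
  std::vector<std::string> Calls;              // names of called symbols
};

// Sketch (assumed behavior): calls through an alias are redirected to the
// aliasee in both modes; the aliases themselves are erased only when
// GlobalOpt is true.
void runAlwaysInlineSketch(ToyModule &M, bool GlobalOpt) {
  for (std::string &Callee : M.Calls) {
    auto It = M.Aliases.find(Callee);
    if (It != M.Aliases.end())
      Callee = It->second;
  }
  if (GlobalOpt)
    M.Aliases.clear();  // safe only once no other module can need them
}
```

Keeping the aliases is what makes a pre-link run plausible: a module linked in later may still refer to a symbol by its alias name.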

Mar 28, 2017

[AMDGPU] Split -amdgpu-early-inline-all option · 9053f22e

Stanislav Mekhanoshin authored Mar 28, 2017

Previously it was covered by the internalization option. It turns out we
cannot run the internalizer in the FE because it breaks separate compilation
tests. Thus the early inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935

Mar 27, 2017

[AMDGPU] Get address space mapping by target triple environment · 1a14bfa0

Yaxun Liu authored Mar 27, 2017

Since we introduced the target triple environments amdgiz and amdgizcl, the
address space values are no longer enums. We have to decide the values by
target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory and are equivalent
to compile-time constants.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage, or it can be added as a member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846

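A minimal sketch of the AMDGPUAS idea follows. The names TripleEnv, AddrSpaces and getAddrSpaces are hypothetical, a three-value enum stands in for llvm::Triple, and the specific numbers are illustrative assumptions; only the rule that the generic (flat) address space is 0 under amdgiz and amdgizcl comes from these commits.

```cpp
#include <cassert>

// Hypothetical stand-in for the environment component of the target triple.
enum class TripleEnv { Default, AMDGIZ, AMDGIZCL };

// Sketch of an AMDGPUAS-like struct. Triple-independent values are static
// const members: they take no storage per object and act as compile-time
// constants. Triple-dependent values are ordinary members.
struct AddrSpaces {
  static const unsigned GLOBAL_ADDRESS = 1;  // assumed value
  static const unsigned LOCAL_ADDRESS = 3;   // assumed value
  unsigned FLAT_ADDRESS;                     // the generic address space
  unsigned PRIVATE_ADDRESS;
};

// Cheap enough to build on the fly, or once at the start of a run* function.
AddrSpaces getAddrSpaces(TripleEnv Env) {
  AddrSpaces AS;
  if (Env == TripleEnv::AMDGIZ || Env == TripleEnv::AMDGIZCL) {
    AS.FLAT_ADDRESS = 0;     // generic address space is 0 in amdgiz/amdgizcl
    AS.PRIVATE_ADDRESS = 5;  // assumed value
  } else {
    AS.FLAT_ADDRESS = 4;     // assumed value
    AS.PRIVATE_ADDRESS = 0;  // assumed value
  }
  return AS;
}

// Only the two triple-dependent fields occupy space in each object.
static_assert(sizeof(AddrSpaces) == 2 * sizeof(unsigned),
              "static members add no per-object storage");
```

Because LOCAL_ADDRESS and GLOBAL_ADDRESS are static members, `AddrSpaces::LOCAL_ADDRESS` works without any object, while FLAT_ADDRESS and PRIVATE_ADDRESS need an object built for a specific environment.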

Mar 25, 2017

[AMDGPU] Switch data layout by triple environment amdgiz · 14834c3e

Yaxun Liu authored Mar 25, 2017

Switch the data layout by the target triple environments amdgiz and amdgizcl, which indicate the use of an address space mapping in which the generic address space is 0.

amdgiz is for non-OpenCL environments where the generic address space is 0.

amdgizcl is for OpenCL environments where the generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758

Mar 24, 2017

AMDGPU: Unify divergent function exits. · b8f8dbc2

Matt Arsenault authored Mar 24, 2017

StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that unifies
exit nodes but skips exit nodes with uniform
branch sources.

llvm-svn: 298729

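The unification step can be illustrated on a toy CFG. Block and unifyDivergentExits are hypothetical; the sketch assumes divergence information is already known for each return block and ignores how the real pass rewrites return values and terminators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy CFG block: a block without successors is a return block.
// ReachedDivergently marks return blocks with at least one predecessor whose
// branch condition is divergent (differs between lanes).
struct Block {
  std::string Name;
  std::vector<int> Succs;
  bool ReachedDivergently = false;
};

// Sketch: route all divergently reached return blocks into one new
// "UnifiedReturnBlock", leaving uniformly reached returns alone. Returns the
// index of the new block, or -1 if there was nothing to unify.
int unifyDivergentExits(std::vector<Block> &F) {
  std::vector<int> DivergentReturns;
  for (int I = 0; I < static_cast<int>(F.size()); ++I)
    if (F[I].Succs.empty() && F[I].ReachedDivergently)
      DivergentReturns.push_back(I);
  if (DivergentReturns.size() < 2)
    return -1;  // at most one divergent exit: nothing to merge
  int Unified = static_cast<int>(F.size());
  F.push_back(Block{"UnifiedReturnBlock", {}, false});
  for (int I : DivergentReturns)
    F[I].Succs.push_back(Unified);  // the old return now branches to it
  return Unified;
}
```

Leaving uniformly reached returns untouched mirrors the "skip uniform branch sources" rule: only divergent exits can produce the multi-exit regions StructurizeCFG cannot handle.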

[AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline · a27b2cac

Stanislav Mekhanoshin authored Mar 24, 2017

Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

llvm-svn: 298721

Mar 21, 2017

[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler · fd4c410f

Valery Pykhtin authored Mar 21, 2017

Differential Revision: https://reviews.llvm.org/D31046

llvm-svn: 298368

[ADMGPU] SDWA peephole optimization pass. · f60ad58d

Sam Kolton authored Mar 21, 2017

Summary:
First iteration of SDWA peephole.

This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
```
V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
```
Into:
```
V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
```

Pass structure:
1. Iterate over machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction to either a source or a destination SDWA operand. E.g. `V_LSHRREV_B32_e32 %vreg0, 16, %vreg1` is matched to the source SDWA operand `%vreg1 src_sel:WORD_1`.
2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use `%vreg0`.
3. Iterate over all potential instructions and check if they can be converted into SDWA.
4. Convert instructions to SDWA.
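
The steps above can be sketched on a toy instruction list. Inst, SdwaAdd and combineToSdwa are hypothetical names. The sketch recognizes only the shift-by-16 patterns and the add from the example, and omits checks a real pass would need, such as making sure %vreg0 and %vreg2 have no other uses.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy instruction: Dst = Op(A, B). For the "REV" shifts, A is
// the shift amount and B the shifted register, as in V_LSHRREV_B32_e32.
struct Inst {
  std::string Op;
  int Dst, A, B;
};

// A converted instruction: V_ADD_I32_sdwa Dst, Src0, Src1 plus selectors.
struct SdwaAdd {
  int Dst, Src0, Src1;
  std::string DstSel, DstUnused, Src0Sel, Src1Sel;
};

std::optional<SdwaAdd> combineToSdwa(const std::vector<Inst> &BB) {
  // Step 1: match SDWA patterns to source / destination operands.
  std::map<int, int> SrcWord1;  // %def -> %src, since %def = %src >> 16
  std::map<int, int> DstWord1;  // %val -> %def, since %def = %val << 16
  for (const Inst &I : BB) {
    if (I.Op == "V_LSHRREV_B32_e32" && I.A == 16) SrcWord1[I.Dst] = I.B;
    if (I.Op == "V_LSHLREV_B32_e32" && I.A == 16) DstWord1[I.B] = I.Dst;
  }
  // Steps 2 and 3: find an add that uses a matched source operand and whose
  // result feeds a matched destination operand.
  for (const Inst &I : BB) {
    if (I.Op != "V_ADD_I32_e32") continue;
    auto S = SrcWord1.find(I.A);
    auto D = DstWord1.find(I.Dst);
    if (S == SrcWord1.end() || D == DstWord1.end()) continue;
    // Step 4: build the SDWA form.
    return SdwaAdd{D->second, S->second, I.B,
                   "WORD_1", "UNUSED_PAD", "WORD_1", "DWORD"};
  }
  return std::nullopt;
}
```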

This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing has been done).
There are several ways this pass can be improved:
1. Make this pass work on the whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
2. Introduce more SDWA patterns
3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365

Mar 20, 2017

[AMDGPU] Run always inliner early in opt · 2534bc07

Konstantin Zhuravlyov authored Mar 20, 2017

Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281

Revert "[AMDGPU] Run always inliner early in opt" · 8a67eb14

Konstantin Zhuravlyov authored Mar 20, 2017

This reverts commit r297958 because it breaks the device-libs build.

llvm-svn: 298239

Mar 18, 2017

[AMDGPU] Add address space based alias analysis pass · 8e45acfc

Stanislav Mekhanoshin authored Mar 17, 2017

This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172

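The central rule of address-space-based alias analysis can be sketched as a pairwise query. The address space numbering and the choice of which pairs stay MayAlias are simplifying assumptions for illustration, not the table used by the pass; aliasByAddrSpace is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical address space numbering for this sketch only.
enum AddrSpace : unsigned { Flat = 0, Global = 1, Constant = 2, Local = 3, Private = 5 };

enum class AliasResult { NoAlias, MayAlias };

// Sketch: pointers into disjoint address spaces cannot refer to the same
// memory. A flat (generic) pointer may point into any space, and global and
// constant memory are assumed to overlap, so those pairs stay MayAlias.
AliasResult aliasByAddrSpace(AddrSpace A, AddrSpace B) {
  if (A == B || A == Flat || B == Flat)
    return AliasResult::MayAlias;
  bool GlobalOrConstA = A == Global || A == Constant;
  bool GlobalOrConstB = B == Global || B == Constant;
  if (GlobalOrConstA && GlobalOrConstB)
    return AliasResult::MayAlias;
  return AliasResult::NoAlias;
}
```

The analysis only ever answers "no alias" or defers; when it says MayAlias, other alias analyses in the chain can still refine the answer.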

Mar 17, 2017

Only unswitch loops with uniform conditions · ee2dd785

Stanislav Mekhanoshin authored Mar 17, 2017

Loop unswitching can be extremely harmful for a SIMT target. If the
hoisted condition is not uniform, a SIMT machine will execute both
clones of the loop sequentially. Therefore LoopUnswitch checks whether the
condition is non-divergent.

Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
that is not needed for non-SIMT targets, a new option is added to avoid
unneeded analysis initialization. The method getAnalysisUsage is called
when TargetTransformInfo is not yet available, so we cannot use it there.
For that reason, a new field DivergentTarget is added to PassManagerBuilder
to control the behavior, and a target sets this field.

Differential Revision: https://reviews.llvm.org/D30796

llvm-svn: 298104

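A minimal sketch of the guard described above, assuming a callback stands in for the DivergenceAnalysis query; mayUnswitch is a hypothetical name. The callback is only evaluated when DivergentTarget is set, which mirrors why the expensive analysis need not be requested on other targets.

```cpp
#include <cassert>
#include <functional>

// Sketch of the unswitching guard. DivergentTarget mirrors the new
// PassManagerBuilder field; IsConditionDivergent is a hypothetical callback
// standing in for a DivergenceAnalysis query.
bool mayUnswitch(bool DivergentTarget,
                 const std::function<bool()> &IsConditionDivergent) {
  if (!DivergentTarget)
    return true;  // non-SIMT target: no divergence concern, query skipped
  // A divergent condition would make a SIMT machine execute both loop
  // clones one after the other, so only uniform conditions qualify.
  return !IsConditionDivergent();
}
```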

Mar 16, 2017

[AMDGPU] Run always inliner early in opt · f8050797

Stanislav Mekhanoshin authored Mar 16, 2017

We can mark functions to always inline early in opt. Since we do not have
call support, this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958

Feb 18, 2017

AMDGPU: Merge initial gfx9 support · e823d92f

Matt Arsenault authored Feb 18, 2017

llvm-svn: 295554

Feb 15, 2017

[AMDGPU] Revert failed scheduling · 582a5237

Stanislav Mekhanoshin authored Feb 15, 2017

This patch reverts a region's scheduling to its original, untouched state
if the new schedule has decreased occupancy.

In addition, it switches to using the TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. Since we now stay with the best schedule, we no longer
need to tolerate worsened scheduling.

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206

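The revert logic amounts to "save, reschedule, compare, restore", sketched below. Region, scheduleWithRevert and the occupancy callback are hypothetical; a real scheduler would derive occupancy from register pressure through the target callback rather than receive it as a function.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical region: instruction ids in schedule order.
using Region = std::vector<int>;

// Sketch: reschedule a region, and if the new order lowers occupancy
// (waves that fit per SIMD), restore the untouched original order.
void scheduleWithRevert(
    Region &R, const std::function<void(Region &)> &Schedule,
    const std::function<unsigned(const Region &)> &Occupancy) {
  const Region Original = R;
  const unsigned Before = Occupancy(R);
  Schedule(R);
  if (Occupancy(R) < Before)
    R = Original;  // decreased occupancy: revert the failed schedule
}
```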

Feb 09, 2017

AMDGPU: Add pass to expand memcpy/memmove/memset · 0699ef39

Matt Arsenault authored Feb 09, 2017

llvm-svn: 294635

Feb 08, 2017

AMDGPU: Enable InferAddressSpaces · 417e0072

Matt Arsenault authored Feb 08, 2017

llvm-svn: 294408

Jan 30, 2017

Re-commit AMDGPU/GlobalISel: Add support for simple shaders · ca16621b

Tom Stellard authored Jan 30, 2017

Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551

[AMDGPU] Internalize non-kernel symbols · a3b72798

Stanislav Mekhanoshin authored Jan 30, 2017

Since we have neither call support nor late linking, we can produce code
only for used symbols. This saves compilation time, the size of the final
executable, and the size of any intermediate dumps.

Run the Internalize pass early in the opt pipeline, followed by the global
DCE pass. To enable it, the RT can pass the -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549

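A toy sketch of the internalize-then-DCE sequence, assuming a symbol table where each symbol lists the symbols it references. Symbol, SymbolTable and internalizeAndDCE are hypothetical, and kernels are treated as the only roots that stay externally visible.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy symbol table: each symbol lists the symbols it references.
struct Symbol {
  bool IsKernel = false;
  bool Internal = false;
  std::vector<std::string> Refs;
};
using SymbolTable = std::map<std::string, Symbol>;

// Sketch of "internalize, then global DCE": every non-kernel symbol becomes
// internal, then internal symbols not reachable from a kernel are deleted.
void internalizeAndDCE(SymbolTable &M) {
  std::vector<std::string> Work;
  for (auto &[Name, S] : M) {
    S.Internal = !S.IsKernel;
    if (S.IsKernel) Work.push_back(Name);
  }
  std::set<std::string> Live(Work.begin(), Work.end());
  while (!Work.empty()) {
    std::string Cur = Work.back();
    Work.pop_back();
    for (const std::string &R : M[Cur].Refs)
      if (M.count(R) && Live.insert(R).second)
        Work.push_back(R);
  }
  for (auto It = M.begin(); It != M.end();) {
    if (It->second.Internal && !Live.count(It->first))
      It = M.erase(It);  // unreachable internal symbol: dead
    else
      ++It;
  }
}
```

Internalizing first is what makes the DCE step effective: an externally visible symbol must be kept even when nothing in the module uses it.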

AMDGPU: Run AMDGPUCodeGenPrepare after inlining · 0c329384

Matt Arsenault authored Jan 30, 2017

With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525

Revert "AMDGPU/GlobalISel: Add support for simple shaders" · 7a19d56f

Tom Stellard authored Jan 30, 2017

This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509

AMDGPU/GlobalISel: Add support for simple shaders · e48f60ae

Tom Stellard authored Jan 30, 2017

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293503

Jan 27, 2017

[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass · f6c1feb8

Stanislav Mekhanoshin authored Jan 27, 2017

With the adjustPassManager interface, it is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300

Jan 26, 2017

Replace addEarlyAsPossiblePasses callback with adjustPassManager · 81598117

Stanislav Mekhanoshin authored Jan 26, 2017

This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces the addEarlyAsPossiblePasses target
callback. In particular, it can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189

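The shape of the hook can be shown with a toy builder. ToyBuilder, ExtensionPoint, PassList and adjustPassManagerSketch are hypothetical stand-ins rather than LLVM's classes, and the pass names are placeholders. The point is that the target receives the builder itself and can register callbacks at any extension point before the pipeline is populated.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy stand-ins, not LLVM's PassManagerBuilder API.
enum class ExtensionPoint { EarlyAsPossible, ModuleOptimizerEarly };
using PassList = std::vector<std::string>;
using Extension = std::function<void(PassList &)>;

struct ToyBuilder {
  std::map<ExtensionPoint, std::vector<Extension>> Extensions;

  void addExtension(ExtensionPoint EP, Extension Fn) {
    Extensions[EP].push_back(std::move(Fn));
  }

  // Build a placeholder pipeline, running registered callbacks at each point.
  PassList populate() {
    PassList PM;
    for (auto &Fn : Extensions[ExtensionPoint::EarlyAsPossible]) Fn(PM);
    PM.push_back("inliner");
    for (auto &Fn : Extensions[ExtensionPoint::ModuleOptimizerEarly]) Fn(PM);
    PM.push_back("gvn");
    return PM;
  }
};

// Target hook sketch: it receives the builder before population and may use
// any extension point, not only the "early as possible" one.
void adjustPassManagerSketch(ToyBuilder &B) {
  B.addExtension(ExtensionPoint::EarlyAsPossible,
                 [](PassList &PM) { PM.push_back("custom-function-pass"); });
  B.addExtension(ExtensionPoint::ModuleOptimizerEarly,
                 [](PassList &PM) { PM.push_back("custom-module-pass"); });
}
```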

Jan 25, 2017

AMDGPU: Implement early ifcvt target hooks. · 9f5e0ef0

Matt Arsenault authored Jan 25, 2017

Leave early ifcvt disabled for now since there are some
shader-db regressions.

This gives some immediate improvements, but it could be better.
The pass's cost check is based on critical path length for
out-of-order CPUs, which is not what we want, so it misses
many cases we would like to convert.

llvm-svn: 293016

Jan 24, 2017

[AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f

Stanislav Mekhanoshin authored Jan 24, 2017

Regalloc creates COPY instructions which do not formally use the VALU.
As a result, the v_mov instructions they become can be displaced past an
exec mask modification. One pass that does this is SIOptimizeExecMasking,
but other passes could potentially do it too.

This patch adds a pass immediately after regalloc that adds an implicit exec
use operand to all VGPR copy instructions.

Differential Revision: https://reviews.llvm.org/D28874

llvm-svn: 292956

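A toy sketch of the fix, assuming a simplified instruction record with a list of implicit uses. MI, readsExec and addExecUseToVGPRCopies are hypothetical; readsExec stands for the check a reordering pass would make before moving an instruction across an exec-mask write.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy machine instruction: opcode, destination register and
// implicit register uses. VGPR names start with 'v', SGPR names with 's'.
struct MI {
  std::string Opcode;
  std::string DstReg;
  std::vector<std::string> ImplicitUses;
};

// An instruction that reads exec must not be moved across an exec write.
bool readsExec(const MI &I) {
  return std::find(I.ImplicitUses.begin(), I.ImplicitUses.end(), "exec") !=
         I.ImplicitUses.end();
}

// Sketch of the post-regalloc fix: every COPY into a VGPR gets an implicit
// use of exec, so later passes see the dependency. Returns how many changed.
unsigned addExecUseToVGPRCopies(std::vector<MI> &MBB) {
  unsigned Changed = 0;
  for (MI &I : MBB) {
    bool IsVGPRCopy =
        I.Opcode == "COPY" && !I.DstReg.empty() && I.DstReg[0] == 'v';
    if (IsVGPRCopy && !readsExec(I)) {
      I.ImplicitUses.push_back("exec");
      ++Changed;
    }
  }
  return Changed;
}
```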

Dec 12, 2016

[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You... · 6a9226d9

Eugene Zelenko authored Dec 12, 2016

[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 289475

Dec 08, 2016

[AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2

Stanislav Mekhanoshin authored Dec 08, 2016

Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates a semantic problem because we cannot tell which
version of OpenCL the composite module conforms to.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and on disk.
It can reach several MB when linked against our OpenCL library. Lastly, such
long lists make dumped IR harder to read.

The pass unifies metadata after linking.

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092

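A sketch of the unification on plain strings, assuming each metadata operand can be compared by value. dedupe and unifyVersion are hypothetical helpers; real metadata operands are MDNodes rather than strings, and the policy of leaving conflicting versions untouched is this sketch's own choice.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Sketch: drop repeated operands of a named metadata list (e.g. llvm.ident
// after linking), keeping first occurrences in their original order.
std::vector<std::string> dedupe(const std::vector<std::string> &Ops) {
  std::set<std::string> Seen;
  std::vector<std::string> Out;
  for (const std::string &Op : Ops)
    if (Seen.insert(Op).second) Out.push_back(Op);
  return Out;
}

// For a version record such as opencl.ocl.version, a single answer exists
// only when every linked module agrees. Returns true if the list collapsed
// to exactly one value.
bool unifyVersion(std::vector<std::string> &Ops) {
  std::vector<std::string> U = dedupe(Ops);
  if (U.size() != 1) return false;  // empty or conflicting: leave as is
  Ops = U;
  return true;
}
```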

[AMDGPU] Scalarization of global uniform loads. · 18009560

Alexander Timofeev authored Dec 08, 2016

Summary:
LC can currently select a scalar load for a uniform memory access
based on the read-only memory address space only. This restriction
originated from the fact that in HW prior to VI, the vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along any path from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076

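The clobber condition can be illustrated with a plain backward CFG walk. This is a deliberate substitution: the commit relies on MemoryDependenceAnalysis, whereas CFG and notClobberedFromEntry here are hypothetical, each block carries a single "may clobber" flag, and every block is assumed reachable from the entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG: Preds[B] lists the predecessors of block B, and
// Clobbers[B] says whether B contains a store that may write the loaded
// location. Block 0 is the function entry.
struct CFG {
  std::vector<std::vector<int>> Preds;
  std::vector<bool> Clobbers;
};

// Sketch: a uniform load in block L may become a scalar load only if no
// may-clobbering store lies on any path from the entry to L. Walking
// predecessors backwards from L visits every block on such a path (given
// that all blocks are reachable from the entry); via a back edge this also
// covers stores later in L itself. Stores before the load inside L are
// reported by ClobberBeforeLoadInL.
bool notClobberedFromEntry(const CFG &G, int L, bool ClobberBeforeLoadInL) {
  if (ClobberBeforeLoadInL) return false;
  std::vector<bool> Visited(G.Preds.size(), false);
  std::vector<int> Work(G.Preds[L].begin(), G.Preds[L].end());
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (Visited[B]) continue;
    Visited[B] = true;
    if (G.Clobbers[B]) return false;
    for (int P : G.Preds[B]) Work.push_back(P);
  }
  return true;
}
```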

Dec 06, 2016

AMDGPU: Don't required structured CFG · ad55ee58

Matt Arsenault authored Dec 06, 2016

The structured CFG is just an aid to inserting exec
mask modification instructions; once that is done,
we don't really need it anymore. We also
do not analyze blocks with terminators that
modify exec, so this should only impact
true branches.

llvm-svn: 288744

Nov 28, 2016

MachineScheduler: Export function to construct "default" scheduler. · 115efcd3

Matthias Braun authored Nov 28, 2016

This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.

This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also avoids the back-and-forth through the
TargetInstrInfo::enableClusterLoads()/enableClusterStores() callbacks.

Differential Revision: https://reviews.llvm.org/D26986

llvm-svn: 288057

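A toy model of the new split of responsibilities. ToyScheduler, ToyDAG and both factory functions are hypothetical stand-ins, not the MachineScheduler API: the default factory adds no clustering mutations, and a target that wants them adds them itself.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy stand-ins for a scheduling DAG and its mutations.
struct ToyDAG { std::vector<std::string> Applied; };
using Mutation = std::function<void(ToyDAG &)>;

struct ToyScheduler {
  std::vector<Mutation> Mutations;
  void addMutation(Mutation M) { Mutations.push_back(std::move(M)); }
  // Apply every registered mutation to the DAG before scheduling.
  void postprocessDAG(ToyDAG &D) {
    for (auto &M : Mutations) M(D);
  }
};

// "Default" scheduler: no load/store clustering or macro-fusion mutations.
std::unique_ptr<ToyScheduler> createGenericSchedLiveSketch() {
  return std::make_unique<ToyScheduler>();
}

// A target that wants clustering opts in on top of the default scheduler.
std::unique_ptr<ToyScheduler> createTargetScheduler() {
  auto S = createGenericSchedLiveSketch();
  S->addMutation([](ToyDAG &D) { D.Applied.push_back("load-cluster"); });
  S->addMutation([](ToyDAG &D) { D.Applied.push_back("store-cluster"); });
  return S;
}
```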

Nov 17, 2016

Revert "AMDGPU: Enable ConstrainCopy DAG mutation" · 0a1a7b6b

Konstantin Zhuravlyov authored Nov 17, 2016

This reverts commit r287146.

This breaks a few conformance tests.

llvm-svn: 287233

Nov 16, 2016

AMDGPU: Enable ConstrainCopy DAG mutation · 3b36bb1d

Matt Arsenault authored Nov 16, 2016

This fixes a probably unintended divergence from the default
scheduler behavior.

llvm-svn: 287146

Nov 15, 2016

AMDGPU: Enable store clustering · d4bb5e48

Matt Arsenault authored Nov 15, 2016

Also respect the TII hook for these, as the generic code does,
in case we later want a flag to disable this.

llvm-svn: 287021

Oct 10, 2016

Move the global variables representing each Target behind accessor function · f42454b9

Mehdi Amini authored Oct 09, 2016

This avoids the "static initialization order fiasco".

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702

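The accessor pattern is plain C++ and can be shown on its own. The Target struct below is a toy, and the accessor name getTheGCNTarget is used for illustration in the getThe...Target() style; this is a sketch of the pattern, not LLVM's registration code.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for llvm::Target; the real class holds registration data.
struct Target {
  std::string Name;
};

// Before (sketch): a namespace-scope global such as `Target TheGCNTarget;`
// is constructed during static initialization, and a static initializer in
// another translation unit may run first and touch it before construction.
//
// After: the object is a function-local static, constructed on the first
// call to the accessor, so every caller (including other static
// initializers) sees a constructed object. Since C++11 this first-call
// initialization is also thread-safe.
Target &getTheGCNTarget() {
  static Target TheGCNTarget;
  return TheGCNTarget;
}

// Another static initializer can now use the accessor safely.
static const bool GCNNamed = [] {
  getTheGCNTarget().Name = "amdgcn";
  return true;
}();
```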

Oct 06, 2016

BranchRelaxation: Support expanding unconditional branches · 6bc43d86

Matt Arsenault authored Oct 06, 2016

AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
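
The range check that decides whether relaxation is needed might look like the sketch below. The encoding is an assumption for illustration: a signed 16-bit branch field counted in 4-byte words and measured from the instruction after the branch. isBranchOffsetInRangeSketch, BranchKind and relaxUnconditional are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sketch, assuming a signed Bits-wide branch field that counts 4-byte words
// relative to the instruction following the branch.
bool isBranchOffsetInRangeSketch(int64_t ByteOffset, unsigned Bits = 16) {
  int64_t Words = ByteOffset / 4 - 1;  // relative to the next instruction
  int64_t Min = -(int64_t(1) << (Bits - 1));
  int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
  return Words >= Min && Words <= Max;
}

// An out-of-range unconditional branch cannot just be widened; as described
// above, it is expanded in a new block that jumps indirectly.
enum class BranchKind { Direct, IndirectViaNewBlock };

BranchKind relaxUnconditional(int64_t ByteOffset) {
  return isBranchOffsetInRangeSketch(ByteOffset)
             ? BranchKind::Direct
             : BranchKind::IndirectViaNewBlock;
}
```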

Oct 03, 2016

[AMDGPU] Pass optimization level to SelectionDAGISel · 60a83737

Konstantin Zhuravlyov authored Oct 03, 2016

llvm-svn: 283133

Sep 30, 2016

[AMDGPU] Do not run scalar optimization passes at "-O0" · 4658e5f7

Konstantin Zhuravlyov authored Sep 30, 2016

Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873

Sep 29, 2016

AMDGPU: Partially fix control flow at -O0 · e6740754

Matt Arsenault authored Sep 29, 2016

Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667

Sep 10, 2016

AMDGPU: Run LoadStoreVectorizer pass by default · 0efdd06b

Matt Arsenault authored Sep 09, 2016

llvm-svn: 281112