Commits · 582a5237f95a3852cead5208f28a84b4cab0efb2 · Roger Ferrer / llvm-epi

Feb 15, 2017

[AMDGPU] Revert failed scheduling · 582a5237

Stanislav Mekhanoshin authored Feb 15, 2017

This patch reverts region's scheduling to the original untouched state
in case if we have have decreased occupancy.

In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206

582a5237

Feb 09, 2017
- AMDGPU: Add pass to expand memcpy/memmove/memset · 0699ef39
  Matt Arsenault authored Feb 09, 2017
```
llvm-svn: 294635
```
  0699ef39
Feb 08, 2017
- AMDGPU: Enable InferAddressSpaces · 417e0072
  Matt Arsenault authored Feb 08, 2017
```
llvm-svn: 294408
```
  417e0072
Jan 30, 2017

Re-commit AMDGPU/GlobalISel: Add support for simple shaders · ca16621b

Tom Stellard authored Jan 30, 2017

Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551

ca16621b

[AMDGPU] Internalize non-kernel symbols · a3b72798

Stanislav Mekhanoshin authored Jan 30, 2017

Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549

a3b72798

AMDGPU: Run AMDGPUCodeGenPrepare after inlining · 0c329384

Matt Arsenault authored Jan 30, 2017

With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525

0c329384

Revert "AMDGPU/GlobalISel: Add support for simple shaders" · 7a19d56f
Tom Stellard authored Jan 30, 2017
```
This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509
```
7a19d56f

AMDGPU/GlobalISel: Add support for simple shaders · e48f60ae

Tom Stellard authored Jan 30, 2017

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293503

e48f60ae

Jan 27, 2017

[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass · f6c1feb8

Stanislav Mekhanoshin authored Jan 27, 2017

With the adjustPassManager interface that is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300

f6c1feb8

Jan 26, 2017

Replace addEarlyAsPossiblePasses callback with adjustPassManager · 81598117

Stanislav Mekhanoshin authored Jan 26, 2017

This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189

81598117

Jan 25, 2017

AMDGPU: Implement early ifcvt target hooks. · 9f5e0ef0

Matt Arsenault authored Jan 25, 2017

Leave early ifcvt disabled for now since there are some
shader-db regressions.

This causes some immediate improvements, but could be better.
The cost checking that the pass does is based on critical path
length for out of order CPUs which we do not want so it skips out
on many cases we want.

llvm-svn: 293016

9f5e0ef0

Jan 24, 2017

[AMDGPU] Add VGPR copies post regalloc fix pass · 22a56f2f

Stanislav Mekhanoshin authored Jan 24, 2017

Regalloc creates COPY instructions which do not formally use VALU.
That results in v_mov instructions displaced after exec mask modification.
One pass which do it is SIOptimizeExecMasking, but potentially it can be
done by other passes too.

This patch adds a pass immediately after regalloc to add implicit exec
use operand to all VGPR copy instructions.

Differential Revision: https://reviews.llvm.org/D28874

llvm-svn: 292956

22a56f2f

Dec 12, 2016

[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You... · 6a9226d9

Eugene Zelenko authored Dec 12, 2016

[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 289475

6a9226d9

Dec 08, 2016

[AMDGPU] Add amdgpu-unify-metadata pass · 50ea93a2

Stanislav Mekhanoshin authored Dec 08, 2016

Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates semantic problem because we cannot tell which
version of OpenCL the composite module conforms.

Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and stored on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.

The pass unifies metadata after linking.

Differential Revision: https://reviews.llvm.org/D25381

llvm-svn: 289092

50ea93a2

[AMDGPU] Scalarization of global uniform loads. · 18009560

Alexander Timofeev authored Dec 08, 2016

Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

llvm-svn: 289076

18009560

Dec 06, 2016

AMDGPU: Don't required structured CFG · ad55ee58

Matt Arsenault authored Dec 06, 2016

The structured CFG is just an aid to inserting exec
mask modification instructions, once that is done
we don't really need it anymore. We also
do not analyze blocks with terminators that
modify exec, so this should only be impacting
true branches.

llvm-svn: 288744

ad55ee58

Nov 28, 2016

MachineScheduler: Export function to construct "default" scheduler. · 115efcd3

Matthias Braun authored Nov 28, 2016

This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.

This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().

Differential Revision: https://reviews.llvm.org/D26986

llvm-svn: 288057

115efcd3

Nov 17, 2016
- Revert "AMDGPU: Enable ConstrainCopy DAG mutation" · 0a1a7b6b
  Konstantin Zhuravlyov authored Nov 17, 2016
```
This reverts commit r287146.

This breaks few conformance tests.

llvm-svn: 287233
```
  0a1a7b6b
Nov 16, 2016

AMDGPU: Enable ConstrainCopy DAG mutation · 3b36bb1d

Matt Arsenault authored Nov 16, 2016

This fixes a probably unintended divergence from the default
scheduler behavior.

llvm-svn: 287146

3b36bb1d

Nov 15, 2016

AMDGPU: Enable store clustering · d4bb5e48

Matt Arsenault authored Nov 15, 2016

Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

llvm-svn: 287021

d4bb5e48

Oct 10, 2016

Move the global variables representing each Target behind accessor function · f42454b9

Mehdi Amini authored Oct 09, 2016

This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702

f42454b9

Oct 06, 2016
- BranchRelaxation: Support expanding unconditional branches · 6bc43d86
  Matt Arsenault authored Oct 06, 2016
```
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
```
  6bc43d86
Oct 03, 2016
- [AMDGPU] Pass optimization level to SelectionDAGISel · 60a83737
  Konstantin Zhuravlyov authored Oct 03, 2016
```
llvm-svn: 283133
```
  60a83737
Sep 30, 2016
- [AMDGPU] Do not run scalar optimization passes at "-O0" · 4658e5f7
  Konstantin Zhuravlyov authored Sep 30, 2016
```
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
```
  4658e5f7
Sep 29, 2016

AMDGPU: Partially fix control flow at -O0 · e6740754

Matt Arsenault authored Sep 29, 2016

Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667

e6740754

Sep 10, 2016
- AMDGPU: Run LoadStoreVectorizer pass by default · 0efdd06b
  Matt Arsenault authored Sep 09, 2016
```
llvm-svn: 281112
```
  0efdd06b
Aug 29, 2016

AMDGPU/SI: Implement a custom MachineSchedStrategy · 0d23ebe8

Tom Stellard authored Aug 29, 2016

Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995

0d23ebe8

AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler · c2ff0eb6

Tom Stellard authored Aug 29, 2016

Summary:
The SILoadStoreOptimizer can now look ahead more then one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

llvm-svn: 279991

c2ff0eb6

Aug 22, 2016

AMDGPU: Split SILowerControlFlow into two pieces · 78fc9daf

Matt Arsenault authored Aug 22, 2016

Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464

78fc9daf

Aug 17, 2016

[PM] Port the always inliner to the new pass manager in a much more · 67fc52f0

Chandler Carruth authored Aug 17, 2016

minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896

67fc52f0

[AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass · e0b87181
Konstantin Zhuravlyov authored Aug 16, 2016
```
Differential Revision: https://reviews.llvm.org/D23556

llvm-svn: 278863
```
e0b87181

Aug 11, 2016
- AMDGPU: Prune includes · 2ffe8fd2
  Matt Arsenault authored Aug 11, 2016
```
llvm-svn: 278391
```
  2ffe8fd2
Jul 27, 2016

[GlobalISel] Introduce an instruction selector. · 6756a2c9

Ahmed Bougacha authored Jul 27, 2016

And implement it for AArch64, supporting x/w ADD/OR.

Differential Revision: https://reviews.llvm.org/D22373

llvm-svn: 276875

6756a2c9

Jul 22, 2016

GlobalISel: implement legalization pass, with just one transformation. · 33b07d67

Tim Northover authored Jul 22, 2016

This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.

llvm-svn: 276461

33b07d67

Jul 20, 2016

AMDGPU: Change fdiv lowering based on !fpmath metadata · a1fe17c9

Matt Arsenault authored Jul 19, 2016

If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051

a1fe17c9

Jul 14, 2016
- AMDGPU/R600: Delete/rename intrinsics no longer used by mesa · ca7f5701
  Matt Arsenault authored Jul 14, 2016
```
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
```
  ca7f5701
Jul 13, 2016

AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL · 418beb76

Tom Stellard authored Jul 13, 2016

Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268

418beb76

Jul 01, 2016
- AMDGPU: Add option to run the load/store vectorizer · 908b9e26
  Matt Arsenault authored Jul 01, 2016
```
llvm-svn: 274329
```
  908b9e26
Jun 28, 2016
- AMDGPU: Fix global isel crashes · eb9025d6
  Matt Arsenault authored Jun 28, 2016
```
llvm-svn: 274039
```
  eb9025d6
- AMDGPU: Fix typo · 254a6450
  Matt Arsenault authored Jun 28, 2016
```
llvm-svn: 274034
```
  254a6450