Commits · 422fc5603ab5a93a814d9652201e4582f18f8136 · Lorenzo Albano / LLVM bpEVL

Aug 12, 2021

[llvm][Inline] Refactor out InlineOrder · 422fc560

Liqiang Tao authored Aug 12, 2021

Move InlineOrder to a separate file.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D107831

422fc560

Aug 11, 2021

[Attributor][NFC] Try to make the windows build bots happy · fc32a5c8

Johannes Doerfert authored Aug 11, 2021

Failed for some reason, potentially because of the inner type
declaration in combination with the `using`. This might help.

Failure:
https://lab.llvm.org/buildbot/#/builders/127/builds/15432

fc32a5c8

[Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly · e7e3585c

Johannes Doerfert authored Aug 10, 2021

PHI nodes are not pass-through but change their value, so we have to
account for that to avoid missing stores.

Follow up for D107798 to fix PR51249 for good.

Differential Revision: https://reviews.llvm.org/D107808

e7e3585c

[Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249) · 96da6dd6

Johannes Doerfert authored Aug 09, 2021

AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurse endlessly. We now allow recursion for non-PHI uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.
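
The idea of deduplicating only PHI uses can be sketched on a toy use graph. This is a minimal illustration, not the Attributor code: `Node` and `countUseVisits` are hypothetical names, and the sketch assumes every cycle in the graph passes through a PHI node, which is what makes the walk terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy IR: each node lists the indices of the nodes that use it.
struct Node {
  bool IsPhi;
  std::vector<std::size_t> Users;
};

// Walks all transitive uses of Root and returns how often each node was
// visited. Assumes every cycle contains at least one PHI node.
std::vector<int> countUseVisits(const std::vector<Node> &Nodes,
                                std::size_t Root) {
  assert(Root < Nodes.size() && "root must be a valid node index");
  std::vector<int> Visits(Nodes.size(), 0);
  std::unordered_set<std::size_t> VisitedPhis;
  std::vector<std::size_t> Worklist(Nodes[Root].Users.begin(),
                                    Nodes[Root].Users.end());
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    // Only PHIs are deduplicated. Non-PHI uses may be revisited, e.g. once
    // per distinct path (and thus per distinct offset) that reaches them.
    if (Nodes[N].IsPhi && !VisitedPhis.insert(N).second)
      continue;
    ++Visits[N];
    for (std::size_t U : Nodes[N].Users)
      Worklist.push_back(U);
  }
  return Visits;
}
```

In a graph where a non-PHI node is reachable both directly and through a loop PHI, the non-PHI node is counted more than once while the PHI is expanded exactly once.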

This replaces the first attempt D107579.

Differential Revision: https://reviews.llvm.org/D107798

96da6dd6

[OpenMP][FIX] Disabled optimizations have to be made known · e0c5d83a

Johannes Doerfert authored Aug 10, 2021

To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the user's
request. The non-SPMDzation and non-CustomStateMachine flags only
prevented the final transformation but allowed value simplification
to go ahead.

Differential Revision: https://reviews.llvm.org/D107862

e0c5d83a

Aug 06, 2021

[FuncSpec] Return changed if function is changed by tryToReplaceWithConstant · 0fd03feb

Chuanqi Xu authored Aug 06, 2021

The function may get changed by RunSCCPSolver before specialization. In other
words, the pass may change the function without any specialization happening.
Add a test and a comment to reveal this.
Previously the pass could report no change even though the function had been
changed by RunSCCPSolver before the specialization, which looks like a
potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622

0fd03feb

[NFC] [FuncSpec] Remove unused variables in isArgumentInteresting · 62fc3e0a

Chuanqi Xu authored Aug 06, 2021

62fc3e0a

[FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish) · cc3f40bb

Chuanqi Xu authored Aug 06, 2021

The function specialization cost of a function does not change during the
traversal of that function's arguments, so we can hoist the computation out
of the traversal. I observed roughly a 1% compile-time improvement on
spec2017, although that measurement may not be precise. This should be NFC.
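
As a rough illustration of the hoisting described above, the following toy sketch compares a per-argument recomputation with a hoisted one. All names (`ToySpecFunction`, `specializationCost`, the bonus comparison) are hypothetical and only mimic the shape of the change, not the FuncSpec code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a function with per-instruction costs and a
// made-up "bonus" per argument.
struct ToySpecFunction {
  std::vector<int> InstCosts;
  std::vector<int> ArgBonuses;
};

static int CostCalls = 0; // counts how often the cost is computed

// Depends only on the function, not on the argument being inspected.
int specializationCost(const ToySpecFunction &F) {
  ++CostCalls;
  int Cost = 0;
  for (int C : F.InstCosts)
    Cost += C;
  assert(Cost >= 0 && "toy costs are expected to be non-negative");
  return Cost;
}

// Before: the invariant cost is recomputed for every argument.
std::vector<bool> interestingArgsNaive(const ToySpecFunction &F) {
  std::vector<bool> Result;
  for (int Bonus : F.ArgBonuses)
    Result.push_back(Bonus > specializationCost(F));
  return Result;
}

// After: the cost is computed once, outside the argument traversal.
std::vector<bool> interestingArgsHoisted(const ToySpecFunction &F) {
  const int Cost = specializationCost(F);
  std::vector<bool> Result;
  for (int Bonus : F.ArgBonuses)
    Result.push_back(Bonus > Cost);
  return Result;
}
```

Both variants return the same answer; only the number of cost computations differs (one per argument versus one per function).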

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621

cc3f40bb

[NFC] [FuncSpec] Update the Todo list for recursive functions · 82ca845b

Chuanqi Xu authored Aug 06, 2021

Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.

82ca845b

Aug 04, 2021

[OpenMPOpt] Expand SPMDization with guarding for target parallel regions · 29a3e3dd

Giorgis Georgakoudis authored Aug 03, 2021

This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892

29a3e3dd

[FuncSpec] Support specialising recursive functions · 30fbb069

Sjoerd Meijer authored Aug 03, 2021

This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        int next = *arg + 1;  // constant stack value, passed by address
        recursiveFunc(&next);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426

30fbb069

[GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc · 2d9759c7

Shimin Cui authored Aug 03, 2021

Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may no longer be true (as seen in the test case), and the code to handle this case is missing. This patch fixes that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397

2d9759c7

Aug 03, 2021

ThinLTO: Fix inline assembly references to static functions with CFI · 7ce1c4da

Sami Tolvanen authored Aug 03, 2021

Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8 with -msvc targets fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058

7ce1c4da

[GlobalOpt] Fix the assert for stored once non-pointer to global address · 7ce98cf5

Shimin Cui authored Aug 02, 2021

This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302

7ce98cf5

Aug 01, 2021

[GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc · 732b0555

Shimin Cui authored Jul 31, 2021

I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.
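
The recursive-to-iterative conversion mentioned above follows a common worklist pattern. The sketch below is a toy version under stated assumptions: `Kind`, `ToyValue`, and `onlyUsedLocally` are hypothetical, and the rule "loads and stores are fine, GEPs are looked through, calls escape" is a simplification chosen for illustration rather than the actual GlobalOpt logic.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

enum class Kind { Load, GEP, Store, Call };

// Hypothetical toy value: its kind and the indices of its users.
struct ToyValue {
  Kind K;
  std::vector<std::size_t> Users;
};

// Iterative replacement for a recursive "check all transitive users" walk.
// GEP users are pushed onto the worklist instead of being recursed into.
bool onlyUsedLocally(const std::vector<ToyValue> &Vals, std::size_t Root) {
  assert(Root < Vals.size() && "root must be a valid value index");
  std::vector<std::size_t> Worklist{Root};
  std::unordered_set<std::size_t> Seen{Root};
  while (!Worklist.empty()) {
    std::size_t V = Worklist.back();
    Worklist.pop_back();
    for (std::size_t U : Vals[V].Users) {
      switch (Vals[U].K) {
      case Kind::Load:
      case Kind::Store:
        break; // terminal uses, nothing to follow
      case Kind::GEP:
        if (Seen.insert(U).second) // look through each GEP once
          Worklist.push_back(U);
        break;
      case Kind::Call:
        return false; // the value escapes
      }
    }
  }
  return true;
}
```

Compared with recursion, the explicit worklist keeps stack depth constant and makes it easy to add further cases such as the GEP handling.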

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589

732b0555

Jul 30, 2021

[OpenMP] Adding flags for disabling the following optimizations:... · cd0dd8ec

Joseph Huber authored Jul 29, 2021

[OpenMP] Adding flags for disabling the following optimizations: Deglobalization, SPMDization, State machine rewrites, Folding

This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802

cd0dd8ec

Jul 29, 2021

[Attributor] Change function internalization to not replace uses in internalized callers · adbaa39d

Joseph Huber authored Jul 27, 2021

The current implementation of function internalization creates a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of each other. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.
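
The two-phase scheme can be sketched on a toy call graph. This is an illustration only: `CallSite`, `internalizeCalls`, and the `.internalized` suffix are hypothetical, and the sketch assumes that "caller is not already internalized" means the caller is not one of the original functions that received a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy call site: caller and callee identified by name.
struct CallSite {
  std::string Caller;
  std::string Callee;
};

std::vector<CallSite>
internalizeCalls(std::vector<CallSite> Calls,
                 const std::set<std::string> &ToInternalize) {
  // Phase 1: create all copies first, cloning the call sites of each body.
  std::map<std::string, std::string> CopyOf;
  for (const std::string &F : ToInternalize) {
    std::string Copy = F + ".internalized";
    assert(ToInternalize.count(Copy) == 0 && "copy name must be fresh");
    CopyOf[F] = Copy;
  }
  const std::size_t NumOriginal = Calls.size();
  for (std::size_t I = 0; I < NumOriginal; ++I) {
    auto It = CopyOf.find(Calls[I].Caller);
    if (It != CopyOf.end()) {
      CallSite Clone{It->second, Calls[I].Callee};
      Calls.push_back(Clone);
    }
  }
  // Phase 2: redirect uses, but leave calls made by the original (external)
  // versions untouched so they stay independent of the copies.
  for (CallSite &CS : Calls) {
    auto It = CopyOf.find(CS.Callee);
    if (It != CopyOf.end() && CopyOf.count(CS.Caller) == 0)
      CS.Callee = It->second;
  }
  return Calls;
}
```

With calls `main -> f` and `f -> g` and both `f` and `g` internalized, the external `f` keeps calling the external `g`, while `main` and the copy of `f` call the internalized versions.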

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931

adbaa39d

Jul 28, 2021

[OpenMP] Folding threadLimit and numThreads when single value in kernels · 5ab6aedd

Jose M Monsalve Diaz authored Jul 27, 2021

The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033

5ab6aedd

Reapply "[Attributor] Disable simplification AAs if a callback is present" · 3dca8396

Johannes Doerfert authored Jul 27, 2021

This reapplies commit cbb709e2 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.
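
The pitfall with `operator[]` is easy to reproduce with standard containers. In the sketch below, `std::map::find` stands in for the lookup method mentioned above; `CallbackMap`, `registerCallback`, and the two probe functions are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Callback = std::function<int()>;
using CallbackMap = std::map<std::string, Callback>;

void registerCallback(CallbackMap &M, const std::string &Pos, Callback CB) {
  assert(CB && "only non-empty callbacks are registered");
  M[Pos] = CB;
}

// Problematic probe: operator[] default-constructs and inserts an empty
// callback for Pos as a side effect of merely asking.
bool hasCallbackInserting(CallbackMap &M, const std::string &Pos) {
  return static_cast<bool>(M[Pos]);
}

// Non-mutating probe: find never inserts, so the map stays untouched.
bool hasCallback(const CallbackMap &M, const std::string &Pos) {
  auto It = M.find(Pos);
  return It != M.end() && static_cast<bool>(It->second);
}
```

Both probes answer "no" for an unknown position, but only the `find`-based one leaves the map empty afterwards; code that later checks for the presence of a key would otherwise see a spurious entry.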

This reverts commit aa27430a.

3dca8396

Revert "[Attributor] Disable simplification AAs if a callback is present" · aa27430a

Johannes Doerfert authored Jul 27, 2021

This reverts commit cbb709e2 as it
breaks the tests, which was not supposed to happen. Investigating now.

aa27430a

[Attributor] Verify `checkForAllUses` return value properly · fd520e75

Johannes Doerfert authored Jul 27, 2021

Also do not emit more than one remark after Heap2Stack failed.

fd520e75

[Attributor] Disable simplification AAs if a callback is present · cbb709e2

Johannes Doerfert authored Jul 27, 2021

AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for an IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.

cbb709e2

Jul 27, 2021

Add jump-threading optimization for deterministic finite automata · 02077da7

Alexey Zhikhartsev authored Jul 27, 2021

The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```
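
At the source level, the effect resembles the difference between a switch-in-a-loop state machine and a hand-written goto version. The following is a hand-made illustration with a hypothetical two-state DFA that counts occurrences of "ab"; it is not output of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Switch-in-loop form: every iteration goes back through the switch.
int countAbSwitch(const std::string &In) {
  int State = 0, Count = 0;
  for (char C : In) {
    switch (State) {
    case 0: // start state
      State = (C == 'a') ? 1 : 0;
      break;
    case 1: // previous character was 'a'
      if (C == 'b') {
        ++Count;
        State = 0;
      } else {
        State = (C == 'a') ? 1 : 0;
      }
      break;
    default:
      assert(false && "unknown DFA state");
    }
  }
  return Count;
}

// Threaded form: each state branches directly to its successor state.
int countAbThreaded(const std::string &In) {
  std::size_t I = 0;
  int Count = 0;
  char C = 0;
S0: // start state
  if (I == In.size())
    return Count;
  C = In[I++];
  if (C == 'a')
    goto S1;
  goto S0;
S1: // previous character was 'a'
  if (I == In.size())
    return Count;
  C = In[I++];
  if (C == 'b') {
    ++Count;
    goto S0;
  }
  if (C == 'a')
    goto S1;
  goto S0;
}
```

Both functions compute the same result; the second avoids re-dispatching on the state variable because the next state is known at each transition.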

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205

02077da7

[OpenMP] Try to simplify all loads in device code · 70b75f62

Johannes Doerfert authored Jul 26, 2021

Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.

At the same time, we no longer compute AAExecutionDomain for non-device
code since there is no point.

Differential Revision: https://reviews.llvm.org/D106845

70b75f62

[Attributor][FIX] Copy all members in the assignment operator · c55e1882

Johannes Doerfert authored Jul 27, 2021

Also improve debug output slightly.

c55e1882

[Attributor] Utilize the InstSimplify interface to simplify instructions · d4bfce55

Johannes Doerfert authored Jul 15, 2021

When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.

Depends on D106189

Differential Revision: https://reviews.llvm.org/D106190

d4bfce55

[CSSPGO] Tweak ICP threshold in top-down inliner · f0d41b58

wlei authored Jul 22, 2021

This change slightly relaxes the current ICP threshold in the top-down inliner; specifically, it always allows one ICP. It shows some perf improvements on SPEC and our internal benchmarks. It also renames the previous flag. We can also try to turn off PGO ICP in the future.

Reviewed By: wenlei, hoy, wmi

Differential Revision: https://reviews.llvm.org/D106588

f0d41b58

[Local] Do not introduce a new `llvm.trap` before `unreachable` · 25a3130d

Johannes Doerfert authored Jul 19, 2021

This is the second attempt to remove the `llvm.trap` insertion after
https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
reverted the first one. It is not clear what the exact issue was back
then and it might already be gone by now, it has been >5 years after
all.

Replaces D106299.

Differential Revision: https://reviews.llvm.org/D106308

25a3130d

[Attributor] Delete dead stores · 41bd26df

Johannes Doerfert authored Jul 16, 2021

D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimistic state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we pretended it was dead
instead of only removable, we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188

41bd26df

[Attributor] Introduce getPotentialCopiesOfStoredValue and use it · adddd3db

Johannes Doerfert authored Jul 11, 2021

This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be taught
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185

adddd3db

[AbstractAttributor] Fold __kmpc_parallel_level if possible · e97e0a4f

Shilei Tian authored Jul 26, 2021

Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration:
based on the current `deviceRTLs`, it returns values such as 0, 1, 2, rather than
values such as 0, 129, 130 that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154

e97e0a4f

[OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass · be2b5696

Johannes Doerfert authored Jul 20, 2021

While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345

be2b5696

[Attributor][FIX] Do not return CHANGED unconditionally · e6f3e648

Johannes Doerfert authored Jul 26, 2021

This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
over again. Unfortunately it turned out to be hard to reproduce the
behavior in a reasonable way.

e6f3e648

[Attributor][FIX] Track change status for AAIsDead properly · 8befd05a

Johannes Doerfert authored Jul 26, 2021

If we add a new live edge we need to indicate a change or otherwise the
new live block is not shown to users. Similarly, new known dead ends and
a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
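
Why the change status matters can be seen in a toy fixpoint driver. The sketch below is not the Attributor: `ToyLiveness` and `runToFixpoint` are hypothetical, and the update rule is a simplified liveness propagation. It reports CHANGED exactly when a new live edge or live block was recorded, which is what lets the driver both keep iterating while state grows and stop once nothing changes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

enum class ChangeStatus { UNCHANGED, CHANGED };

// Hypothetical toy liveness state over a CFG given as successor lists.
struct ToyLiveness {
  std::vector<std::vector<int>> Succs;
  std::set<int> LiveBlocks;
  std::set<std::pair<int, int>> LiveEdges;

  // One update step: successors of currently live blocks become live.
  ChangeStatus update() {
    ChangeStatus Status = ChangeStatus::UNCHANGED;
    const std::set<int> Snapshot = LiveBlocks;
    for (int B : Snapshot) {
      assert(B >= 0 && static_cast<std::size_t>(B) < Succs.size());
      for (int S : Succs[static_cast<std::size_t>(B)]) {
        if (LiveEdges.insert({B, S}).second)
          Status = ChangeStatus::CHANGED; // new live edge
        if (LiveBlocks.insert(S).second)
          Status = ChangeStatus::CHANGED; // new live block
      }
    }
    return Status;
  }
};

// Re-runs update until it reports UNCHANGED; returns the iteration count.
int runToFixpoint(ToyLiveness &L, int MaxIters) {
  int Iters = 0;
  while (Iters < MaxIters) {
    ++Iters;
    if (L.update() == ChangeStatus::UNCHANGED)
      break;
  }
  return Iters;
}
```

If `update` returned UNCHANGED despite adding a live edge, the driver would stop early and miss live blocks; if it returned CHANGED unconditionally, the loop would only end at the iteration limit.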

8befd05a

Jul 26, 2021

[OpenMP][NFC] Remove unnecessary capture in RAII struct · e757a3b0

Joseph Huber authored Jul 26, 2021

Summary:
There was an unnecessary variable assigned to the information cache when we
only need it in the constructor to extract the function declaration.

e757a3b0

Jul 25, 2021

[OpenMP] Introduce RAII to protect certain RTL calls from DCE · 58725c12

Joseph Huber authored Jul 23, 2021

This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.
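
The RAII pattern described here can be sketched independently of LLVM. In this toy version, `ToyRTLFunction`, `canBeRemoved`, and `ExternalLinkageGuard` are hypothetical names, and the dead-code rule is a deliberately simplified assumption.

```cpp
#include <cassert>

enum class Linkage { Internal, External };

// Hypothetical stand-in for a runtime-library function in a module.
struct ToyRTLFunction {
  Linkage Link = Linkage::Internal;
  int NumLiveCalls = 0;
};

// Toy dead-code rule: only internal functions without live calls may go.
bool canBeRemoved(const ToyRTLFunction &F) {
  return F.Link == Linkage::Internal && F.NumLiveCalls == 0;
}

// Temporarily gives the function external linkage and restores the
// original linkage when the guard goes out of scope.
class ExternalLinkageGuard {
public:
  explicit ExternalLinkageGuard(ToyRTLFunction &Fn) : F(Fn), Saved(Fn.Link) {
    F.Link = Linkage::External;
  }
  ~ExternalLinkageGuard() {
    assert(F.Link == Linkage::External && "linkage changed while guarded");
    F.Link = Saved;
  }
  ExternalLinkageGuard(const ExternalLinkageGuard &) = delete;
  ExternalLinkageGuard &operator=(const ExternalLinkageGuard &) = delete;

private:
  ToyRTLFunction &F;
  Linkage Saved;
};
```

While a guard is alive, the toy removal check fails for the protected function; once the guard is destroyed, the original linkage and the original answer are back.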

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707

58725c12

[Attributes] Clean up handling of UB implying attributes (NFC) · 087a8eea

Nikita Popov authored Jul 25, 2021

Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.

087a8eea

Jul 24, 2021

[AMDGPU] Deduce attributes with the Attributor · 96709823

Kuter Dinel authored Jun 27, 2021

This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997

96709823

[Attributor][FIX] checkForAllInstructions, correctly handle declarations · 0cd964ff

Kuter Dinel authored Jul 22, 2021

checkForAllInstructions was not handling declarations correctly.
It should have been returning false when called on a declaration.

The patch also fixes a test case for AAFunctionReachability so that it
passes after the changes to checkForAllInstructions.

Differential Revision: https://reviews.llvm.org/D106625

0cd964ff

Jul 23, 2021

[AbstractAttributor] Refine logic to indicate pessimistic fixed point when... · ae69f468

Shilei Tian authored Jul 23, 2021

[AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode`

Since we are using assumed information now, the logic should be refined to avoid
an unnecessary assertion.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106630

ae69f468