- Aug 12, 2021
-
-
Liqiang Tao authored
Move InlineOrder to a separate file. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D107831
-
- Aug 11, 2021
-
-
Johannes Doerfert authored
Failed for some reason, potentially because of the inner type declaration in combination with the `using`. This might help. Failure: https://lab.llvm.org/buildbot/#/builders/127/builds/15432
-
Johannes Doerfert authored
PHI nodes are not pass-through; they change their value, and we have to account for that to avoid missing stores. Follow-up to D107798 to fix PR51249 for good. Differential Revision: https://reviews.llvm.org/D107808
-
Johannes Doerfert authored
AAPointerInfoFloating needs to visit all uses, and some of them multiple times if we go through PHI nodes. Attributor::checkForAllUses keeps a visited set so we don't recurse endlessly. We now allow recursion for non-PHI uses so we track all pointer offsets via PHI nodes properly without endless recursion. This replaces the first attempt, D107579. Differential Revision: https://reviews.llvm.org/D107798
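One way to read the traversal described above is sketched here on a toy graph. `Node`, `collectOffsets`, and the graph layout are hypothetical and are not the Attributor API; the sketch also assumes that every cycle passes through a PHI node, as it does in SSA form. Only PHI nodes are deduplicated, so a non-PHI user can be reached several times with different offsets.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy value graph (hypothetical; not LLVM IR).
struct Node {
  bool IsPhi;             // PHI-like node that may close a cycle
  int OffsetDelta;        // offset this node adds to the incoming pointer
  std::vector<int> Users; // indices of the nodes that use this node
};

// Collect every (node, offset) pair reachable from Root.
// Assumption: every cycle in G contains at least one PHI node.
std::set<std::pair<int, int>> collectOffsets(const std::vector<Node> &G,
                                             int Root) {
  std::set<std::pair<int, int>> Seen;
  std::set<int> VisitedPhis;
  std::vector<std::pair<int, int>> Worklist;
  Worklist.push_back(std::make_pair(Root, 0));
  while (!Worklist.empty()) {
    const std::pair<int, int> Item = Worklist.back();
    Worklist.pop_back();
    const Node &N = G[Item.first];
    // Only PHIs are deduplicated; this is what breaks cycles.
    if (N.IsPhi && !VisitedPhis.insert(Item.first).second)
      continue;
    const int NewOffset = Item.second + N.OffsetDelta;
    Seen.insert(std::make_pair(Item.first, NewOffset));
    for (int U : N.Users)
      Worklist.push_back(std::make_pair(U, NewOffset));
  }
  return Seen;
}
```

With a visited set applied to every node, a user reached through two different offsets would be recorded only once; restricting the set to PHIs keeps both offsets while still terminating on loops.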
-
Johannes Doerfert authored
To avoid simplification with wrong constants we need to make sure we know that we won't perform specific optimizations based on the user's request. The non-SPMDization and non-CustomStateMachine flags only prevented the final transformation but allowed value simplification to go ahead. Differential Revision: https://reviews.llvm.org/D107862
-
- Aug 06, 2021
-
-
Chuanqi Xu authored
The function may get changed before specialization by RunSCCPSolver. In other words, the pass may change the function without any specialization happening. Add a test and a comment to reveal this. The pass may also report no change even though the function was changed by RunSCCPSolver before the specialization, which looks like a potential bug. Test Plan: check-all Reviewed By: https://reviews.llvm.org/D107622 Differential Revision: https://reviews.llvm.org/D107622
-
Chuanqi Xu authored
-
Chuanqi Xu authored
Noticed that the function specialization cost of a function does not change while traversing the function's arguments, so we can hoist the computation out of the traversal. I observed roughly a 1% compile-time improvement on spec2017, though that measurement may not be precise. This should be NFC. Reviewed By: Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D107621
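The hoisting described above can be illustrated with a minimal sketch. `Func`, `specializationCost`, and the bonus comparison are hypothetical stand-ins, not the pass's real cost model; the only point is that a per-function cost is computed once instead of once per argument.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a function with per-argument bonuses.
struct Func {
  int NumInsts;
  std::vector<int> ArgBonus;
};

int CostCalls = 0; // counts how often the "expensive" cost is computed

// Depends only on the function, not on the argument being looked at.
int specializationCost(const Func &F) {
  ++CostCalls;
  return F.NumInsts * 2;
}

// Before: the cost is recomputed for every argument.
int countProfitableArgsNaive(const Func &F) {
  int N = 0;
  for (int Bonus : F.ArgBonus)
    if (Bonus > specializationCost(F))
      ++N;
  return N;
}

// After: the loop-invariant cost is hoisted out of the traversal.
int countProfitableArgsHoisted(const Func &F) {
  const int Cost = specializationCost(F);
  int N = 0;
  for (int Bonus : F.ArgBonus)
    if (Bonus > Cost)
      ++N;
  return N;
}
```

Both variants return the same count, which is what makes the change NFC; only the number of cost computations differs.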
-
Chuanqi Xu authored
Now the recursive functions may get specialized many times when `func-specialization-max-iters` increases. See discussion in https://reviews.llvm.org/D106426 for details.
-
- Aug 04, 2021
-
-
Giorgis Georgakoudis authored
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106892
-
Sjoerd Meijer authored
This adds support for specialising recursive functions. For example: int Global = 1; void recursiveFunc(int *arg) { if (*arg < 4) { print(*arg); recursiveFunc(*arg + 1); } } void main() { recursiveFunc(&Global); } After 3 iterations of function specialisation, followed by inlining of the specialised versions of recursiveFunc, the main function looks like this: void main() { print(1); print(2); print(3); } To support this, the following has been added: - Update the solver and state of the new specialised functions, - An optimisation to propagate constant stack values after each iteration of function specialisation, which is necessary for the next iteration to recognise the constant values and trigger. Specialising recursive functions is (at the moment) controlled by option -func-specialization-max-iters and is opt-in for compile-time reasons. I.e., the default is -func-specialization-max-iters=1, but for the example above we would need to use -func-specialization-max-iters=3. Future work is to see if we can increase the default, or improve the cost-model/heuristics to control compile-times. Differential Revision: https://reviews.llvm.org/D106426
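A compilable variant of the example above, as a sketch: it assumes the incremented value is stored to a local and passed by address (in line with the "constant stack values" wording), and it replaces `print` with a hypothetical recorder so the two versions can be compared.

```cpp
#include <cassert>
#include <vector>

std::vector<int> Printed; // hypothetical stand-in for the output of print()

void print(int V) { Printed.push_back(V); }

int Global = 1;

void recursiveFunc(int *Arg) {
  if (*Arg < 4) {
    print(*Arg);
    int Next = *Arg + 1; // becomes a constant stack value once *Arg is known
    recursiveFunc(&Next);
  }
}

// What main() is expected to look like after three rounds of specialisation
// followed by inlining, written out by hand.
void specialisedMain() {
  print(1);
  print(2);
  print(3);
}
```

Calling `recursiveFunc(&Global)` and `specialisedMain()` should record the same sequence, which is the equivalence the transformation relies on.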
-
Shimin Cui authored
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may no longer be true (as seen in the test case), and the code to handle it is missing. This patch fixes that. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107397
-
- Aug 03, 2021
-
-
Sami Tolvanen authored
Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Relands 700d07f8 with -msvc targets fixed. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058
-
Shimin Cui authored
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107302
-
- Aug 01, 2021
-
-
Shimin Cui authored
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106589
-
- Jul 30, 2021
-
-
Joseph Huber authored
[OpenMP] Adding flags for disabling the following optimizations: deglobalization, SPMDization, state machine rewrites, and folding. This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following: (1) openmp-opt-disable-deglobalization: Defaults to false; adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared. (2) openmp-opt-disable-spmdization: Defaults to false; adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode. (3) openmp-opt-disable-folding: Defaults to false; adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall. (4) openmp-opt-disable-state-machine-rewrite: Defaults to false; adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106802
-
- Jul 29, 2021
-
-
Joseph Huber authored
The current implementation of function internalization creates a copy of each function and replaces every use. This has the downside that the external versions of the functions will call into the internalized versions of the functions. This prevents them from being fully independent of each other. This patch replaces the current internalization scheme with a method that creates all the copies of the functions intended to be internalized first and then replaces the uses as long as their caller is not already internalized. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106931
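The two-phase scheme described above can be sketched on a toy call graph. The `CallGraph` type, the `.internalized` suffix, and `internalizeAll` are hypothetical; the sketch reads "caller is not already internalized" as "caller is not one of the original functions that received an internalized copy".

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// caller name -> callee names (one entry per call site); hypothetical model
using CallGraph = std::map<std::string, std::vector<std::string>>;

CallGraph internalizeAll(CallGraph G, const std::set<std::string> &ToInternalize) {
  // Phase 1: create every internalized copy before touching any use.
  for (const std::string &F : ToInternalize) {
    std::vector<std::string> Body = G.at(F);
    G[F + ".internalized"] = Body;
  }
  // Phase 2: redirect call sites, but leave the original external versions
  // untouched so they keep calling each other.
  for (CallGraph::iterator It = G.begin(); It != G.end(); ++It) {
    if (ToInternalize.count(It->first))
      continue;
    for (std::string &Callee : It->second)
      if (ToInternalize.count(Callee))
        Callee += ".internalized";
  }
  return G;
}
```

For a graph where `main` calls `A` and `A` calls `B`, internalizing `A` and `B` leaves the external `A` calling the external `B`, while `main` and `A.internalized` call the internalized copies.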
-
- Jul 28, 2021
-
-
Jose M Monsalve Diaz authored
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033
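The "fold only if all kernels agree" check described above might look like the following sketch. `Kernel` and `tryFoldNumThreads` are hypothetical names, and the real patch works on kernel attributes through `AAFoldRuntimeCall` rather than on a plain vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel record; field names are illustrative only.
struct Kernel {
  bool HasNumThreads; // true if the kernel carries a constant NumThreads
  int NumThreads;
};

// Returns true and writes the constant to Folded when every kernel that can
// reach the runtime call agrees on one constant value. Otherwise returns
// false and leaves Folded untouched, i.e. the call is not folded.
bool tryFoldNumThreads(const std::vector<Kernel> &ReachingKernels, int &Folded) {
  if (ReachingKernels.empty())
    return false; // nothing is known about the callers
  if (!ReachingKernels.front().HasNumThreads)
    return false;
  const int Candidate = ReachingKernels.front().NumThreads;
  for (const Kernel &K : ReachingKernels)
    if (!K.HasNumThreads || K.NumThreads != Candidate)
      return false;
  Folded = Candidate;
  return true;
}
```

A single disagreeing or unannotated kernel is enough to block the fold, which is the conservative behavior the commit message describes.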
-
Johannes Doerfert authored
This reapplies commit cbb709e2 and includes the use of the lookup method instead of operator[] to avoid accidentally setting (empty) simplification callbacks. This reverts commit aa27430a.
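The difference between `operator[]` and a non-inserting lookup can be reproduced with the standard library. This sketch uses `std::map` and a hand-written `lookup` helper as stand-ins for the map type used in the patch; `CallbackMap` and the function names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using Callback = std::function<int(int)>;
using CallbackMap = std::map<int, std::vector<Callback>>;

// Non-inserting lookup: returns a copy of the entry, or an empty vector
// when the key is absent. The map itself is never modified.
std::vector<Callback> lookup(const CallbackMap &M, int Key) {
  CallbackMap::const_iterator It = M.find(Key);
  return It == M.end() ? std::vector<Callback>() : It->second;
}

// Problematic query: operator[] default-constructs an (empty) entry.
bool hasCallbacksInserting(CallbackMap &M, int Key) { return !M[Key].empty(); }

// Safe query: the map stays unchanged.
bool hasCallbacks(const CallbackMap &M, int Key) {
  return !lookup(M, Key).empty();
}
```

Both queries answer "no callbacks" for a missing key, but only the `operator[]` version leaves an empty entry behind, so a later presence check on the key would wrongly report that callbacks were registered.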
-
Johannes Doerfert authored
This reverts commit cbb709e2 as it breaks the tests, which was not supposed to happen. Investigating now.
-
Johannes Doerfert authored
Also do not emit more than one remark after Heap2Stack failed.
-
Johannes Doerfert authored
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at the IR by default. If queried for an IR position which has a simplification callback we should either look at the callback return, or give up. We do the latter for now.
-
- Jul 27, 2021
-
-
Alexey Zhikhartsev authored
The current JumpThreading pass does not jump thread loops since it can result in irreducible control flow that harms other optimizations. This prevents switch statements inside a loop from being optimized to use unconditional branches. This code pattern occurs in the core_state_transition function of Coremark. The state machine can be implemented manually with goto statements resulting in a large runtime improvement, and this transform makes the switch implementation match the goto version in performance. This patch specifically targets switch statements inside a loop that have the opportunity to be threaded. Once it identifies an opportunity, it creates new paths that branch directly to the correct code block. For example, the left CFG could be transformed to the right CFG:
```
         sw.bb                        sw.bb
       /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
       \   |   /                /       |       \
       latch.bb             latch.2  latch.3  latch.1
       br sw.bb              /         |         \
                         sw.bb.2     sw.bb.3     sw.bb.1
                         br case2    br case3    br case1
```
Co-author: Justin Kreiner @jkreiner Co-author: Ehsan Amiri @amehsan Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99205
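The source-level pattern this commit targets, and the hand-threaded form it is compared against, can be sketched as follows. The three-state machine and both function names are hypothetical; the second function is written by hand to show the control flow the transform aims for, and it is not output of the pass.

```cpp
#include <cassert>

// Switch-in-a-loop state machine: every iteration goes back through the
// switch even though each case already knows the next state.
int runSwitch(int Steps) {
  int State = 0;
  int Acc = 0;
  for (int I = 0; I < Steps; ++I) {
    switch (State) {
    case 0: Acc += 1;   State = 1; break;
    case 1: Acc += 10;  State = 2; break;
    default: Acc += 100; State = 0; break;
    }
  }
  return Acc;
}

// Hand-threaded equivalent: each state branches directly to its successor,
// so no dispatch on State is needed.
int runThreaded(int Steps) {
  int Acc = 0;
  int I = 0;
S0:
  if (I++ >= Steps) return Acc;
  Acc += 1;
  goto S1;
S1:
  if (I++ >= Steps) return Acc;
  Acc += 10;
  goto S2;
S2:
  if (I++ >= Steps) return Acc;
  Acc += 100;
  goto S0;
}
```

Both functions compute the same result for any step count; the threaded form simply replaces the shared latch and switch dispatch with one direct branch per state.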
-
Johannes Doerfert authored
Eliminating loads/stores in the device code is worth the extra effort, especially for the new device runtime. At the same time, we no longer compute AAExecutionDomain for non-device code, since there is no point. Differential Revision: https://reviews.llvm.org/D106845
-
Johannes Doerfert authored
Also improve debug output slightly.
-
Johannes Doerfert authored
When we simplify at least one operand in the Attributor simplification we can use the InstSimplify to work on the simplified operands. This allows us to avoid duplication of the logic. Depends on D106189 Differential Revision: https://reviews.llvm.org/D106190
-
wlei authored
This change slightly relaxes the current ICP threshold in the top-down inliner; specifically, it always allows one ICP. It shows some perf improvements on SPEC and our internal benchmarks. It also renames the previous flag. We can also try to turn off PGO ICP in the future. Reviewed By: wenlei, hoy, wmi Differential Revision: https://reviews.llvm.org/D106588
-
Johannes Doerfert authored
This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308
-
Johannes Doerfert authored
D106185 allows us to determine if a store is needed easily. Using that knowledge we can start to delete dead stores. In AAIsDead we now track more state as an instruction can be dead (= the old optimistic state) or just "removable". A store instruction can be removable while being very much alive, e.g., if it stores a constant into an alloca or internal global. If we pretended it was dead instead of only removable, we would ignore it when we determine what values a load can see, so that is not what we want. Differential Revision: https://reviews.llvm.org/D106188
-
Johannes Doerfert authored
This patch introduces `getPotentialCopiesOfStoredValue` which uses AAPointerInfo to determine all "aliases" or "potential copies" of a value that is stored into memory. This operation can fail but if it succeeds it means we can visit all "uses" of a value even if it is temporarily stored in memory. There are two users for the function: 1) `Attributor::checkForAllUses` which will now ignore the value use in a store if all "potential copies" can be identified and instead be visited. This allows various AAs, including AAPointerInfo itself, to look through memory. 2) `AANoCapture` which uses a custom use tracking through the CaptureTracker interface and therefore needs to be taught explicitly. Differential Revision: https://reviews.llvm.org/D106185
-
Shilei Tian authored
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration. Based on the current `deviceRTLs`, it returns values such as 0, 1, 2 rather than values such as 0, 129, 130 that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154
-
Johannes Doerfert authored
While rewriteDeviceCodeStateMachine should probably be folded into buildCustomStateMachine, we at least need the optimization to happen. This was not reliably the case in the CGSCC pass but in the Module pass it seems to work reliably. This also ports a test to the new kernel encoding (target_init/deinit), and makes sure we cannot run the kernel in SPMD mode. Differential Revision: https://reviews.llvm.org/D106345
-
Johannes Doerfert authored
This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and over again. Unfortunately it turned out to be hard to reproduce the behavior in a reasonable way.
-
Johannes Doerfert authored
If we add a new live edge we need to indicate a change or otherwise the new live block is not shown to users. Similarly, new known dead ends and a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
-
- Jul 26, 2021
-
-
Joseph Huber authored
Summary: There was an unnecessary variable assigned to the information cache when we only need it in the constructor to extract the function declaration.
-
- Jul 25, 2021
-
-
Joseph Huber authored
This patch introduces a new RAII struct that will temporarily make an OpenMP RTL function have external linkage. This is done before the attributor is invoked to prevent it from incorrectly removing some function definitions that we will use later. For example, if we determine all calls to one function are dead, because it has internal linkage it can safely be removed. Later when we try to get an instance to that function to modify the source using `getOrCreateRuntimeFunction` we will then get an empty declaration for that function that won't be defined anywhere. This patch prevents this from occurring. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106707
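The RAII idea described above can be sketched without any LLVM types. `Function`, `Linkage`, `LinkageGuard`, and `removableWhenDead` are hypothetical names for illustration; the real struct operates on OpenMP runtime function declarations in the module.

```cpp
#include <cassert>

enum class Linkage { Internal, External };

// Hypothetical stand-in for a function in the module.
struct Function {
  Linkage L;
};

// A dead function may only be deleted if nothing outside can reference it.
bool removableWhenDead(const Function &F) { return F.L == Linkage::Internal; }

// Forces external linkage for the lifetime of the guard and restores the
// saved linkage on scope exit.
class LinkageGuard {
  Function &Fn;
  Linkage Saved;

public:
  explicit LinkageGuard(Function &F) : Fn(F), Saved(F.L) {
    Fn.L = Linkage::External;
  }
  ~LinkageGuard() { Fn.L = Saved; }
  LinkageGuard(const LinkageGuard &) = delete;
  LinkageGuard &operator=(const LinkageGuard &) = delete;
};
```

While a guard is alive, the function cannot be treated as removable even if all of its calls are dead; once the guard goes out of scope, the original linkage is back in place.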
-
Nikita Popov authored
Rather than adding methods for dropping these attributes in various places, add a function that returns an AttrBuilder with these attributes, which can then be used with existing methods for dropping attributes. This is with an eye on D104641, which also needs to drop them from returns, not just parameters. Also be more explicit about the semantics of the method in the documentation. Refer to UB rather than Undef, which is what this is actually about.
-
- Jul 24, 2021
-
-
Kuter Dinel authored
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997
-
Kuter Dinel authored
checkForAllInstructions was not handling declarations correctly. It should have been returning false when it gets called on a declaration. The patch also fixes a test case for AAFunctionReachability so that it passes after the changes to checkForAllInstructions. Differential Revision: https://reviews.llvm.org/D106625
-
- Jul 23, 2021
-
-
Shilei Tian authored
[AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode`. Since we are using assumed information now, the logic should be refined to avoid an unnecessary assertion. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106630
-