  1. Aug 12, 2021
  2. Aug 11, 2021
  3. Aug 06, 2021
  4. Aug 04, 2021
    • [OpenMPOpt] Expand SPMDization with guarding for target parallel regions · 29a3e3dd
      Giorgis Georgakoudis authored
      This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and synchronizes with the worker threads using simple barriers. For correctness, the patch aborts SPMDization for a target region if the same code also executes in a parallel region and therefore must not be guarded. This check is implemented using the ParallelLevels AA.
      
      Reviewed By: jhuber6
      
      Differential Revision: https://reviews.llvm.org/D106892
      29a3e3dd
    • [FuncSpec] Support specialising recursive functions · 30fbb069
      Sjoerd Meijer authored
      This adds support for specialising recursive functions. For example:
      
          int Global = 1;
          void recursiveFunc(int *arg) {
            if (*arg < 4) {
              print(*arg);
              int next = *arg + 1;
              recursiveFunc(&next);
            }
          }
          void main() {
            recursiveFunc(&Global);
          }
      
      After 3 iterations of function specialisation, followed by inlining of the
      specialised versions of recursiveFunc, the main function looks like this:
      
          void main() {
            print(1);
            print(2);
            print(3);
          }
      
      To support this, the following has been added:
      - Update the solver and state of the new specialised functions,
      - An optimisation to propagate constant stack values after each iteration of
        function specialisation, which is necessary for the next iteration to
        recognise the constant values and trigger.
      
      Specialising recursive functions is (at the moment) controlled by option
      -func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
      the default is -func-specialization-max-iters=1, but for the example above we
      would need to use -func-specialization-max-iters=3. Future work is to see if we
      can increase the default, or improve the cost-model/heuristics to control
      compile-times.
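      
      To make the effect concrete, here is a hand-written C++ sketch of what three
      rounds of specialisation amount to for the example above. It is not compiler
      output: the clone names (`recursiveFunc_1` and so on), the `Log` vector
      standing in for `print`, and the two `run...` helpers are assumptions made
      purely for illustration.
      
      ```cpp
      #include <vector>
      
      // Stand-in for print(): records each value so the result can be inspected.
      static std::vector<int> Log;
      static void print(int V) { Log.push_back(V); }
      
      // Generic version: the argument value is only known at run time.
      void recursiveFunc(const int *Arg) {
        if (*Arg < 4) {
          print(*Arg);
          int Next = *Arg + 1; // becomes a constant stack value once *Arg is known
          recursiveFunc(&Next);
        }
      }
      
      // Hypothetical clones, one per specialisation iteration. Each assumes a
      // known constant argument, so the load and the comparison fold away.
      void recursiveFunc_3() { print(3); /* next value is 4, and 4 < 4 is false */ }
      void recursiveFunc_2() { print(2); recursiveFunc_3(); }
      void recursiveFunc_1() { print(1); recursiveFunc_2(); }
      
      // Run the generic version starting from 1 and return what it printed.
      std::vector<int> runGeneric() {
        Log.clear();
        int Global = 1;
        recursiveFunc(&Global);
        return Log;
      }
      
      // Run the specialised chain and return what it printed.
      std::vector<int> runSpecialised() {
        Log.clear();
        recursiveFunc_1(); // after inlining: print(1); print(2); print(3);
        return Log;
      }
      ```
      
      Both helpers record the same sequence, which is the property the
      transformation has to preserve.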
      
      Differential Revision: https://reviews.llvm.org/D106426
      30fbb069
    • [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc · 2d9759c7
      Shimin Cui authored
      Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they all have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may no longer be true (as seen in the test case), and the code to handle differing load types is missing. This patch fixes that.
      
      Reviewed By: efriedma
      
      Differential Revision: https://reviews.llvm.org/D107397
      2d9759c7
  5. Aug 03, 2021
  6. Aug 01, 2021
  7. Jul 30, 2021
    • [OpenMP] Adding flags for disabling the following optimizations:... · cd0dd8ec
      Joseph Huber authored
      [OpenMP] Adding flags for disabling the following optimizations: deglobalization, SPMDization, state machine rewrites, and folding
      
      This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
       - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
       - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
       - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
       - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.
      
      Reviewed By: jhuber6
      
      Differential Revision: https://reviews.llvm.org/D106802
      cd0dd8ec
  8. Jul 29, 2021
    • [Attributor] Change function internalization to not replace uses in internalized callers · adbaa39d
      Joseph Huber authored
      The current implementation of function internalization creates a copy of each
      function and replaces every use. This has the downside that the external
      versions of the functions will call into the internalized versions of the
      functions. This prevents them from being fully independent of each other. This
      patch replaces the current internalization scheme with a method that creates
      all the copies of the functions intended to be internalized first and then
      replaces the uses as long as their caller is not already internalized.
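      
      The two-step scheme can be sketched on a toy call graph. This is a simplified
      model and not the Attributor code: the `CallGraph` type, the `internalize`
      helper, and the `.internalized` name suffix are assumptions chosen for the
      illustration.
      
      ```cpp
      #include <map>
      #include <string>
      #include <vector>
      
      // Toy call graph: function name -> names of the functions it calls.
      using CallGraph = std::map<std::string, std::vector<std::string>>;
      
      // Hypothetical helper modelling the scheme described above.
      CallGraph internalize(const CallGraph &Graph,
                            const std::vector<std::string> &ToInternalize) {
        CallGraph Result = Graph;
        std::map<std::string, std::string> CopyOf; // original -> internal copy
      
        // Step 1: create all copies first. Each copy starts out with the same
        // call sites as its original.
        for (const std::string &Name : ToInternalize) {
          CopyOf[Name] = Name + ".internalized";
          Result[CopyOf[Name]] = Graph.at(Name);
        }
      
        // Step 2: redirect call sites to the copies, except in callers that were
        // themselves internalized. Those external originals keep calling the
        // external versions.
        for (auto &Entry : Result) {
          if (CopyOf.count(Entry.first))
            continue; // caller is an internalized original: leave it alone
          for (std::string &Callee : Entry.second) {
            auto It = CopyOf.find(Callee);
            if (It != CopyOf.end())
              Callee = It->second;
          }
        }
        return Result;
      }
      ```
      
      With `main -> foo -> bar` and `foo`, `bar` internalized, `main` and
      `foo.internalized` end up calling the internal copies, while the external
      `foo` still calls the external `bar`.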
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106931
      adbaa39d
  9. Jul 28, 2021
  10. Jul 27, 2021
    • Add jump-threading optimization for deterministic finite automata · 02077da7
      Alexey Zhikhartsev authored
      The current JumpThreading pass does not jump thread loops since it can
      result in irreducible control flow that harms other optimizations. This
      prevents switch statements inside a loop from being optimized to use
      unconditional branches.
      
      This code pattern occurs in the core_state_transition function of
      Coremark. The state machine can be implemented manually with goto
      statements resulting in a large runtime improvement, and this transform
      makes the switch implementation match the goto version in performance.
      
      This patch specifically targets switch statements inside a loop that
      have the opportunity to be threaded. Once it identifies an opportunity,
      it creates new paths that branch directly to the correct code block.
      For example, the left CFG could be transformed to the right CFG:
      
      ```
                sw.bb                        sw.bb
              /   |   \                    /   |   \
         case1  case2  case3          case1  case2  case3
              \   |   /                /       |       \
              latch.bb             latch.2  latch.3  latch.1
               br sw.bb              /         |         \
                                 sw.bb.2     sw.bb.3     sw.bb.1
                                  br case2    br case3    br case1
      ```
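      
      At the source level, the difference between the two CFGs resembles the
      difference between the two functions below. This is only an illustrative
      sketch: the word-counting state machine and both function names are
      hypothetical and are not taken from Coremark or from the patch.
      
      ```cpp
      #include <cstddef>
      #include <string>
      
      // Switch-in-a-loop form: every iteration goes back through the switch, even
      // though each case already knows which state comes next.
      int countWordsSwitch(const std::string &Text) {
        enum State { Space, Word };
        State S = Space;
        int Words = 0;
        for (char C : Text) {
          switch (S) {
          case Space:
            if (C != ' ') { ++Words; S = Word; }
            break;
          case Word:
            if (C == ' ') S = Space;
            break;
          }
        }
        return Words;
      }
      
      // Hand-threaded form: each state jumps straight to its successor, which is
      // roughly what the threaded CFG achieves with unconditional branches.
      int countWordsGoto(const std::string &Text) {
        std::size_t I = 0;
        int Words = 0;
      InSpace:
        if (I == Text.size()) return Words;
        if (Text[I++] != ' ') { ++Words; goto InWord; }
        goto InSpace;
      InWord:
        if (I == Text.size()) return Words;
        if (Text[I++] == ' ') goto InSpace;
        goto InWord;
      }
      ```
      
      Both functions compute the same result; only the control flow differs.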
      
      Co-author: Justin Kreiner @jkreiner
      Co-author: Ehsan Amiri @amehsan
      
      Reviewed By: SjoerdMeijer
      
      Differential Revision: https://reviews.llvm.org/D99205
      02077da7
    • [OpenMP] Try to simplify all loads in device code · 70b75f62
      Johannes Doerfert authored
      Eliminating loads/stores in the device code is worth the extra effort,
      especially for the new device runtime.
      
      At the same time, we no longer compute AAExecutionDomain for non-device code,
      as there is no point.
      
      Differential Revision: https://reviews.llvm.org/D106845
      70b75f62
    • [Attributor][FIX] Copy all members in the assignment operator · c55e1882
      Johannes Doerfert authored
      Also improve debug output slightly.
      c55e1882
    • [Attributor] Utilize the InstSimplify interface to simplify instructions · d4bfce55
      Johannes Doerfert authored
      When we simplify at least one operand in the Attributor simplification
      we can use the InstSimplify to work on the simplified operands. This
      allows us to avoid duplication of the logic.
      
      Depends on D106189
      
      Differential Revision: https://reviews.llvm.org/D106190
      d4bfce55
    • [CSSPGO] Tweak ICP threshold in top-down inliner · f0d41b58
      wlei authored
      This change slightly relaxes the current ICP threshold in the top-down inliner; specifically, it always allows one ICP. It shows some perf improvements on SPEC and our internal benchmarks. It also renames the previous flag. We can also try to turn off PGO ICP in the future.
      
      Reviewed By: wenlei, hoy, wmi
      
      Differential Revision: https://reviews.llvm.org/D106588
      f0d41b58
    • [Local] Do not introduce a new `llvm.trap` before `unreachable` · 25a3130d
      Johannes Doerfert authored
      This is the second attempt to remove the `llvm.trap` insertion after
      https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
      reverted the first one. It is not clear what the exact issue was back
      then and it might already be gone by now; it has been >5 years after
      all.
      
      Replaces D106299.
      
      Differential Revision: https://reviews.llvm.org/D106308
      25a3130d
    • [Attributor] Delete dead stores · 41bd26df
      Johannes Doerfert authored
      D106185 allows us to determine if a store is needed easily. Using that
      knowledge we can start to delete dead stores.
      
      In AAIsDead we now track more state as an instruction can be dead (= the
      old optimistic state) or just "removable". A store instruction can be
      removable while being very much alive, e.g., if it stores a constant
      into an alloca or internal global. If we pretended it was dead
      instead of only removable, we would ignore it when we determine what
      values a load can see, so that is not what we want.
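      
      The distinction can be illustrated with a small C++ analogy. This is a sketch
      under assumptions and not Attributor code: the local `Slot` plays the role of
      an alloca, and both function names are hypothetical.
      
      ```cpp
      // Before: the store writes a constant into a stack slot that is read back.
      int beforeCleanup() {
        int Slot;    // plays the role of an alloca
        Slot = 42;   // removable store, but not dead: the load below observes it
        return Slot; // the load must still be able to see the stored 42
      }
      
      // After: the load has been replaced by the constant it could see, and only
      // then can the now-unneeded store (and the slot) be dropped.
      int afterCleanup() { return 42; }
      ```
      
      Treating the store as dead from the start would hide the 42 from the load,
      which is why the two states are tracked separately.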
      
      Differential Revision: https://reviews.llvm.org/D106188
      41bd26df
    • [Attributor] Introduce getPotentialCopiesOfStoredValue and use it · adddd3db
      Johannes Doerfert authored
      This patch introduces `getPotentialCopiesOfStoredValue` which uses
      AAPointerInfo to determine all "aliases" or "potential copies" of a
      value that is stored into memory. This operation can fail but if it
      succeeds it means we can visit all "uses" of a value even if it is
      temporarily stored in memory.
      
      There are two users for the function:
        1) `Attributor::checkForAllUses` which will now ignore the value use
           in a store if all "potential copies" can be identified and instead
           be visited. This allows various AAs, including AAPointerInfo
           itself, to look through memory.
        2) `AANoCapture` which uses a custom use tracking through the
           CaptureTracker interface and therefore needs to be taught
           explicitly.
      
      Differential Revision: https://reviews.llvm.org/D106185
      adddd3db
    • [AbstractAttributor] Fold __kmpc_parallel_level if possible · e97e0a4f
      Shilei Tian authored
      Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
      Note that `__kmpc_parallel_level` doesn't take activeness into consideration:
      based on the current `deviceRTLs`, it returns values such as 0, 1, 2, rather
      than values such as 0, 129, 130 that also indicate activeness.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106154
      e97e0a4f
    • [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass · be2b5696
      Johannes Doerfert authored
      While rewriteDeviceCodeStateMachine should probably be folded into
      buildCustomStateMachine, we at least need the optimization to happen.
      This was not reliably the case in the CGSCC pass but in the Module pass
      it seems to work reliably.
      
      This also ports a test to the new kernel encoding (target_init/deinit),
      and makes sure we cannot run the kernel in SPMD mode.
      
      Differential Revision: https://reviews.llvm.org/D106345
      be2b5696
    • [Attributor][FIX] Do not return CHANGED unconditionally · e6f3e648
      Johannes Doerfert authored
      This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
      over again. Unfortunately it turned out to be hard to reproduce the
      behavior in a reasonable way.
      e6f3e648
    • [Attributor][FIX] Track change status for AAIsDead properly · 8befd05a
      Johannes Doerfert authored
      If we add a new live edge we need to indicate a change or otherwise the
      new live block is not shown to users. Similarly, new known dead ends and
      a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
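      
      The contract behind the change status can be sketched with a toy fixpoint
      loop. This is a minimal model and not the Attributor implementation: the
      `LiveEdges` struct and the `runToFixpoint` driver are hypothetical, and only
      the `CHANGED`/`UNCHANGED` naming follows the text above.
      
      ```cpp
      #include <set>
      #include <utility>
      
      enum class ChangeStatus { UNCHANGED, CHANGED };
      
      // Toy liveness state: the set of CFG edges (from block, to block) known live.
      struct LiveEdges {
        std::set<std::pair<int, int>> Edges;
      
        // Record an edge and report CHANGED only if it was not already known.
        ChangeStatus addEdge(int From, int To) {
          bool Inserted = Edges.insert(std::make_pair(From, To)).second;
          return Inserted ? ChangeStatus::CHANGED : ChangeStatus::UNCHANGED;
        }
      };
      
      // Hypothetical driver: repeat the update until a full round reports no
      // change. Returns the number of rounds that were needed.
      int runToFixpoint(LiveEdges &State,
                        const std::set<std::pair<int, int>> &Discovered) {
        int Rounds = 0;
        ChangeStatus Status;
        do {
          Status = ChangeStatus::UNCHANGED;
          for (const auto &E : Discovered)
            if (State.addEdge(E.first, E.second) == ChangeStatus::CHANGED)
              Status = ChangeStatus::CHANGED;
          ++Rounds;
        } while (Status == ChangeStatus::CHANGED);
        return Rounds;
      }
      ```
      
      Reporting UNCHANGED after adding an edge would end the loop before dependents
      see the new information, while reporting CHANGED unconditionally would keep
      the loop from ever terminating.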
      8befd05a
  11. Jul 26, 2021
  12. Jul 25, 2021
    • [OpenMP] Introduce RAII to protect certain RTL calls from DCE · 58725c12
      Joseph Huber authored
      This patch introduces a new RAII struct that will temporarily make an OpenMP
      RTL function have external linkage. This is done before the attributor is
      invoked to prevent it from incorrectly removing some function definitions that
      we will use later. For example, if we determine all calls to one function are
      dead, because it has internal linkage it can safely be removed. Later when we
      try to get an instance to that function to modify the source using
      `getOrCreateRuntimeFunction` we will then get an empty declaration for that
      function that won't be defined anywhere. This patch prevents this from
      occurring.
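      
      The RAII idea can be sketched without any LLVM types. The following is a
      minimal model, not the struct added by the patch: `ToyFunction`, `Linkage`,
      `ExternalizationGuard`, and `looksExternalDuringPass` are hypothetical names
      used only for this illustration.
      
      ```cpp
      enum class Linkage { Internal, External };
      
      // Toy stand-in for a runtime function with a linkage kind.
      struct ToyFunction {
        Linkage Link = Linkage::Internal;
      };
      
      // Makes the function external for the guard's lifetime and restores the
      // saved linkage when the guard goes out of scope.
      class ExternalizationGuard {
      public:
        explicit ExternalizationGuard(ToyFunction &F) : Fn(F), Saved(F.Link) {
          Fn.Link = Linkage::External;
        }
        ~ExternalizationGuard() { Fn.Link = Saved; }
      
        ExternalizationGuard(const ExternalizationGuard &) = delete;
        ExternalizationGuard &operator=(const ExternalizationGuard &) = delete;
      
      private:
        ToyFunction &Fn;
        Linkage Saved;
      };
      
      // Stand-in for running a pass while the guard is alive: reports whether the
      // function looked external (and so not removable) during the run.
      bool looksExternalDuringPass(ToyFunction &F) {
        ExternalizationGuard Guard(F);
        return F.Link == Linkage::External;
      }
      ```
      
      Restoring the linkage in the destructor means every exit path from the
      guarded scope puts the original linkage back.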
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106707
      58725c12
    • [Attributes] Clean up handling of UB implying attributes (NFC) · 087a8eea
      Nikita Popov authored
      Rather than adding methods for dropping these attributes in
      various places, add a function that returns an AttrBuilder with
      these attributes, which can then be used with existing methods
      for dropping attributes. This is with an eye on D104641, which
      also needs to drop them from returns, not just parameters.
      
      Also be more explicit about the semantics of the method in the
      documentation. Refer to UB rather than Undef, which is what this
      is actually about.
      087a8eea
  13. Jul 24, 2021
  14. Jul 23, 2021