  1. Nov 10, 2020
    • [LoopFlatten] Run it earlier, just before IndVarSimplify · 2ef47910
      Sjoerd Meijer authored
      This is a prep step for widening induction variables in LoopFlatten if this is
      possible (D90640), to avoid having to perform certain overflow checks. Since
      IndVarSimplify may already widen induction variables, we want to run
      LoopFlatten just before IndVarSimplify. This is a minor reshuffle, as the two
      passes were already adjacent in the pipeline.
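
      For context, LoopFlatten turns a two-level loop nest into a single loop. A
      minimal, illustrative C++ sketch of the kind of nest it targets (hypothetical
      code, not taken from the patch); the flattened form is only valid if the
      combined index cannot overflow, which is why widening the induction variables
      first (D90640) lets the pass avoid certain overflow checks:

      ```
      // Before flattening: two induction variables, addressing A[i * M + j].
      void sum(const int *A, unsigned N, unsigned M, long long &Total) {
        for (unsigned i = 0; i < N; ++i)
          for (unsigned j = 0; j < M; ++j)
            Total += A[i * M + j];
      }

      // Conceptually after flattening: a single induction variable over N * M
      // iterations; requires that N * M (and i * M + j) cannot overflow.
      void sumFlattened(const int *A, unsigned N, unsigned M, long long &Total) {
        for (unsigned k = 0; k < N * M; ++k)
          Total += A[k];
      }
      ```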
      
      Differential Revision: https://reviews.llvm.org/D90402
    • Add loop distribution to the LTO pipeline · dd03881b
      Sanne Wouda authored
      The LoopDistribute pass is missing from the LTO pipeline, so
      -enable-loop-distribute has no effect during post-link. The pre-link
      loop distribution doesn't seem to survive the LTO pipeline either.
      
      With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
      uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
      unaffected.
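
      For context, loop distribution splits one loop into several loops over the
      same iteration space so that, for example, a vectorizable part is separated
      from a part with a loop-carried dependence. A minimal, illustrative C++
      sketch (hypothetical code, not taken from the patch or from hmmer):

      ```
      // A loop LoopDistribute can split: the two statements are independent
      // (given runtime checks that the arrays do not alias).
      void f(int *A, const int *B, int *C, const int *D, int N) {
        for (int i = 1; i < N; ++i) {
          A[i] = A[i - 1] + B[i]; // loop-carried dependence; hard to vectorize
          C[i] = B[i] * D[i];     // independent; vectorizable on its own
        }
      }

      // Conceptually after distribution: two loops, the second vectorizable.
      // for (int i = 1; i < N; ++i) A[i] = A[i - 1] + B[i];
      // for (int i = 1; i < N; ++i) C[i] = B[i] * D[i];
      ```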
      
      Differential Revision: https://reviews.llvm.org/D89896
    • [OMPIRBuilder] Start 'Create' methods with lower case. NFC. · e5dba2d7
      Michael Kruse authored
      For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
      
      This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
      
      I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
      
      Reviewed By: mehdi_amini, fghanim
      
      Differential Revision: https://reviews.llvm.org/D91109
  2. Nov 03, 2020
  3. Nov 02, 2020
  4. Oct 31, 2020
  5. Oct 30, 2020
  6. Oct 29, 2020
  7. Oct 28, 2020
  8. Oct 27, 2020
  9. Oct 24, 2020
    • [AutoFDO] Remove a broken assert in merging inlinee samples · a16cbdd6
      Hongtao Yu authored
      Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.
      
      To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. `A` calls `B` twice and `B` calls `C` once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, so the program may look like
      
      ```
      A()
      {
        B();  // with nested profile B1 and C1
        B();  // duplicated, with nested profile B1 and C1
        B();  // with nested profile B2 and C2
      }
      ```
      
      For some reason, the sample profile loader inliner then decides to inline only the first callsite in `A`, transforming `A` into
      
      ```
      A()
      {
        C();  // with nested profile C1
        B();  // duplicated, with nested profile B1 and C1
        B();  // with nested profile B2 and C2.
      }
      ```
      
      Here is what happens next:
      
      1. Failing to inline the callsite `C()` results in `C1`'s samples being returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
      2. Failing to inline the middle callsite results in `B1` being returned to `B`'s base profile, which in turn causes `C1` to be merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base now has a non-zero head sample count, which actually equals `C1`'s entry count.
      3. Failing to inline the last callsite results in `B2` being returned to `B`'s base profile. The nested `C` profile in `B`'s base now has an entry count equal to the sum of those of `C1` and `C2`, with a head count equal to that of `C1`. This is what trips the assert later on.
      4. When compiling `B` using `B`'s base profile, failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the return doesn't go through; instead, the assert fires.
      
      It's good that `C1` is returned only once, thanks to the use of a non-zero head count to ensure an inlinee profile is returned at most once. However, `C2` is never returned. Since it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be properly fixed by the upcoming CSSPGO work, where count returning is based on context sensitivity and a distribution factor for callsite probes.
      
      The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and its duplicate end up with different inlining behavior is a mystery; it has to do with imperfect counts in the profile and extra, complicated inlining that makes their hotness differ.
      
      Reviewed By: wenlei
      
      Differential Revision: https://reviews.llvm.org/D90056
    • [Inliner][NPM] Properly pass callee AAResults · ba22c403
      Arthur Eubanks authored
      Fixes noalias-calls.ll under NPM.
      
      Differential Revision: https://reviews.llvm.org/D89592
  10. Oct 23, 2020
    • [IR] add fn attr for no_stack_protector; prevent inlining on mismatch · b7926ce6
      Nick Desaulniers authored
      It's currently ambiguous in IR whether the source language explicitly did not
      want a stack protector for a given function (in C, via the function attribute
      no_stack_protector) or simply doesn't care.
      
      It's common for code that manipulates the stack via inline assembly, or that
      has to set up its own stack canary (such as the Linux kernel), to want to
      avoid stack protectors in certain functions. There, we've been bitten by
      numerous bugs where a callee with a stack protector is inlined into an
      __attribute__((__no_stack_protector__)) caller, which generally breaks the
      caller's assumption that it has no stack protector. LTO exacerbates the
      issue.
      
      While developers can avoid this by grouping all no_stack_protector functions
      into one translation unit and compiling it with -fno-stack-protector, that is
      much less ergonomic than a function attribute and still doesn't work for LTO.
      See also:
      https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
      https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
      
      Typically, when inlining a callee into a caller, the caller is upgraded in
      its level of stack protection (see adjustCallerSSPLevel()). By adding an
      explicit attribute in the IR when the function attribute is used in the
      source language, we can now identify such cases and prevent inlining:
      inlining is blocked when the callee and caller differ, i.e. when one carries
      `nossp` while the other has `ssp`, `sspstrong`, or `sspreq`.
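
      A hedged C++ sketch of the scenario (hypothetical functions, using the GNU
      attribute spelling; not code from the patch):

      ```
      // Built with -fstack-protector-strong, so it carries the `sspstrong` IR
      // attribute (and its local buffer makes it likely to get a real canary).
      static void helper() {
        char buf[64];
        __builtin_memset(buf, 0, sizeof(buf));
      }

      // Explicitly opts out, e.g. because it runs before the canary is set up;
      // with this patch it carries the `nossp` IR attribute.
      __attribute__((no_stack_protector))
      void setup_canary_and_stack() {
        helper(); // previously `helper` could be inlined here and the caller
                  // upgraded to sspstrong; the nossp/ssp* mismatch now blocks it
      }
      ```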
      
      Fixes pr/47479.
      
      Reviewed By: void
      
      Differential Revision: https://reviews.llvm.org/D87956
    • [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms · 24156364
      Caroline Concatto authored
      Use the isKnownXY comparators when one of the operands can be a scalable
      vector, and getFixedSize() in all other cases.
      
      This patch also fixes bugs around getPrimitiveSizeInBits by using
      getFixedSize() near the TypeSize comparisons.
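
      A minimal sketch of the distinction (assuming the TypeSize API of this
      period, i.e. the static isKnownXY comparators and getFixedSize(); an
      illustration, not code from the patch):

      ```
      #include "llvm/Support/TypeSize.h"
      using namespace llvm;

      // Returns true when a value of size Src is known to fit in size Dst.
      bool knownToFit(TypeSize Src, TypeSize Dst) {
        if (Src.isScalable() || Dst.isScalable())
          // With scalable vectors the exact size is a runtime multiple, so use
          // the isKnownXY comparators, which may conservatively answer "false".
          return TypeSize::isKnownLE(Src, Dst);
        // Both sizes are fixed: compare the exact bit counts.
        return Src.getFixedSize() <= Dst.getFixedSize();
      }
      ```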
      
      Differential Revision: https://reviews.llvm.org/D89703
    • [Inliner] Run always-inliner in inliner-wrapper · 0291e2c9
      Arthur Eubanks authored
      An alwaysinline function may not get inlined by the inliner wrapper due to
      the inlining order.
      
      Previously, for the following IR, the inliner would first inline @a() into @b(),
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @a()
        br label %for.cond
      }
      ```
      
      making @b() recursive and no longer able to be inlined into @a(), ending up with
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @b()
        br label %for.cond
      }
      ```
      
      Running always-inliner first makes sure that we respect alwaysinline in more cases.
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.
      
      Reviewed By: davidxl, rnk
      
      Differential Revision: https://reviews.llvm.org/D86988
  11. Oct 22, 2020
  12. Oct 21, 2020
  13. Oct 19, 2020
    • Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" · 0628bea5
      Hans Wennborg authored
      This broke Chromium's PGO build; it seems hot/cold splitting got turned on
      unintentionally. See the comment on the code review for a repro, etc.
      
      > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
      > the splitting pass to be toggled on/off. The current method of passing
      > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
      > correctly (say, with `-O0` or `-Oz`).
      >
      > To implement the -fsplit-cold-code option, an attribute is applied to
      > functions to indicate that they may be considered for splitting. This
      > removes some complexity from the old/new PM pipeline builders, and
      > behaves as expected when LTO is enabled.
      >
      > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
      > Differential Revision: https://reviews.llvm.org/D57265
      > Reviewed By: Aditya Kumar, Vedant Kumar
      > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
      
      This reverts commit 273c299d.
  14. Oct 16, 2020
    • [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting · 273c299d
      Vedant Kumar authored
      This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
      the splitting pass to be toggled on/off. The current method of passing
      `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
      correctly (say, with `-O0` or `-Oz`).
      
      To implement the -fsplit-cold-code option, an attribute is applied to
      functions to indicate that they may be considered for splitting. This
      removes some complexity from the old/new PM pipeline builders, and
      behaves as expected when LTO is enabled.
      
      Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
      Differential Revision: https://reviews.llvm.org/D57265
      Reviewed By: Aditya Kumar, Vedant Kumar
      Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
  15. Oct 14, 2020
  16. Oct 09, 2020
    • [OpenMPOpt] Merge parallel regions · 3a6bfcf2
      Giorgis Georgakoudis authored
      There are cases where generated OpenMP code consists of multiple consecutive
      OpenMP parallel regions, either because high-level programming models such as
      RAJA or Kokkos lower to OpenMP code that way, or simply because the programmer
      parallelized the code that way. This optimization merges consecutive parallel
      OpenMP regions to: (1) reduce the runtime overhead of re-activating a team of
      threads; (2) enlarge the scope for other OpenMP optimizations, e.g., runtime
      call deduplication and synchronization elimination.
      
      This implementation merges parallel regions defensively, only when they are
      within the same basic block and any in-between instructions are safe to
      execute in parallel.
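
      For illustration, the source-level pattern this targets might look like the
      following C++ (a hypothetical example, compiled with -fopenmp; not taken
      from the patch):

      ```
      void scale_and_accumulate(double *A, double *B, int N) {
        // Two back-to-back parallel regions: without merging, the runtime
        // re-activates the thread team for each one.
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
          A[i] *= 2.0;

        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
          B[i] += A[i];

        // Conceptually after merging: one parallel region containing both
        // worksharing loops, paying the team-activation cost once and widening
        // the scope for other OpenMP optimizations.
      }
      ```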
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D83635
  17. Oct 07, 2020