  1. Jan 19, 2019
  2. Jan 17, 2019
    • Vedant Kumar's avatar
      [HotColdSplit] Allow outlining with live outputs · f529b507
      Vedant Kumar authored
      Prior to r348205, extracting code regions with live output values was
      disabled because of a miscompilation (PR39433). Lift the restriction as
      PR39433 has been addressed.
      
      Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
      with a full build of iOS (with hot/cold splitting enabled).
      
      As a drive-by, remove an errant TODO.
      
      llvm-svn: 351492
      f529b507
    • Vedant Kumar's avatar
      [HotColdSplit] Consider resume instructions to be cold · b70e20db
      Vedant Kumar authored
      Resuming exception unwinding is roughly as unlikely as throwing an
      exception.
      
      Tested on LNT+externals (in particular, the C++ EH regression tests
      provide end-to-end test coverage), as well as with a full build of iOS.
      
      llvm-svn: 351491
      b70e20db
    • Vedant Kumar's avatar
      [HotColdSplit] Relax requirement that the cold sink block be extractable · 4541be06
      Vedant Kumar authored
      Relaxing this requirement creates opportunities to split code dominated
      by an EH pad.
      
      Tested on LNT+externals.
      
      llvm-svn: 351483
      4541be06
    • Vedant Kumar's avatar
      [HotColdSplit] Simplify tests by lowering their splitting thresholds · 32a014d0
      Vedant Kumar authored
      This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
      peppered throughout the test cases. They are no longer needed to force
      splitting to occur.
      
      llvm-svn: 351480
      32a014d0
    • Wei Mi's avatar
      [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink · 3bcccdfe
      Wei Mi authored
      If the sample profile has no inlining hierarchy information included, we say
      the sample profile is flattened. For a flattened profile, in the ThinLTO postlink
      phase, SampleProfileLoader's hot function inlining and profile annotation will
      do nothing, so it is better to save the effort to read in the profile and run
      the sample profile loader pass. It is helpful for reducing compile time when
      the flattened profile is huge.
      
      Differential Revision: https://reviews.llvm.org/D54819
      
      llvm-svn: 351476
      3bcccdfe
    • Teresa Johnson's avatar
      Revert "[ThinLTO] Add summary entries for index-based WPD" · 8d86f1ba
      Teresa Johnson authored
      Mistaken commit of something still under review!
      
      This reverts commit r351453.
      
      llvm-svn: 351455
      8d86f1ba
    • Teresa Johnson's avatar
      [ThinLTO] Add summary entries for index-based WPD · 4fcf3b16
      Teresa Johnson authored
      Summary:
      If LTOUnit splitting is disabled, the module summary analysis computes
      the summary information necessary to perform single implementation
      devirtualization during the thin link with the index and no IR. The
      information collected from the regular LTO IR in the current hybrid WPD
      algorithm is summarized, including:
      1) For vtable definitions, record the function pointers and their offset
      within the vtable initializer (subsumes the information collected from
      IR by tryFindVirtualCallTargets).
      2) A record for each type metadata summarizing the vtable definitions
      decorated with that metadata (subsumes the TypeIdentifierMap collected
      from IR).
      
      Also added are the necessary bitcode records, and the corresponding
      assembly support.
      
      The index-based WPD will be sent as a follow-on.
      
      Depends on D53890.
      
      Reviewers: pcc
      
      Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54815
      
      llvm-svn: 351453
      4fcf3b16
    • Vedant Kumar's avatar
      [MergeFunc] Prevent silent miscompile of vararg functions · a9906c1e
      Vedant Kumar authored
      The function merging pass miscompiles identical vararg functions. The
      forwarding thunk it emits doesn't forward the full variable-length list
      of arguments. Disable merging for vararg functions for now.
      
      I've filed llvm.org/PR40345 to track the issue.
      
      rdar://47326238
      
      llvm-svn: 351411
      a9906c1e
    • Wei Mi's avatar
      Fix a mistake in rL351392. · 79c4408a
      Wei Mi authored
      PGOInstrGen should be initialized to "" instead of false.
      
      llvm-svn: 351397
      79c4408a
    • Wei Mi's avatar
      [PGO] Make pgo related options in opt more consistent. · c876e3d4
      Wei Mi authored
      Currently we have pgo options defined in PassManagerBuilder.cpp only for
      instrument pgo, but not for sample pgo. We also have pgo options defined
      in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
      pgo. They have some inconsistency.
      
      To make the options more consistent and make writing tests easier, the
      patch lets the old pass manager share the same pgo options with the new pass
      manager in opt, and removes the options in PassManagerBuilder.cpp.
      
      Differential Revision: https://reviews.llvm.org/D56749
      
      llvm-svn: 351392
      c876e3d4
  3. Jan 16, 2019
    • Tom Stellard's avatar
      Only promote args when function attributes are compatible · 3d36e5c3
      Tom Stellard authored
      Summary:
      Check to make sure that the caller and the callee have compatible
      function attributes before promoting arguments.  This uses the same
      TargetTransformInfo queries that are used to determine if attributes
      are compatible for inlining.
      
      The goal here is to avoid breaking ABI when a called function's ABI
      depends on a target feature that is not enabled in the caller.
      
      This is a very conservative fix for PR37358.  Ideally we would have a more
      sophisticated check for ABI compatibility rather than checking if the
      attributes are compatible for inlining.
      
      Reviewers: echristo, chandlerc, eli.friedman, craig.topper
      
      Reviewed By: echristo, chandlerc
      
      Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D53554
      
      llvm-svn: 351296
      3d36e5c3
  4. Jan 15, 2019
    • David Callahan's avatar
      We can improve the performance (generally) by memo-izing the action to map a... · dee00120
      David Callahan authored
      We can improve the performance (generally) by memoizing the action to map a debug location to its function summary.
      
      Summary:
      Here are timings (as reported by "opt -time-passes") for
      sample-profile pass for some files holding hot functions from a major
      service. Average 17% reduction. Delta column is 100*(old-new)/old.
      
      ```
      Old    New    Delta
      0.0537 0.0538 -0.2%
      0.8155 0.6522 20.0%
      0.0779 0.0751  3.6%
      0.0727 0.0913 -25.6%
      0.1622 0.1302 19.7%
      0.0627 0.0594  5.3%
      0.0766 0.0744  2.9%
      0.6426 0.4387 31.7%
      0.3521 0.2776 21.2%
      0.3549 0.2721 23.3%
      0.0912 0.0904  0.9%
      0.1236 0.1059 14.3%
      0.0854 0.0866 -1.4%
      0.0757 0.0722  4.6%
      0.1293 0.1147 11.3%
      0.1354 0.1122 17.1%
      0.0767 0.0770 -0.4%
      0.1135 0.0968 14.7%
      0.0524 0.0608 -16.0%
      0.1279 0.1106 13.5%
      ==========
      3.6820 3.0520 17.1% Total
      ```
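
      The memoization idea described above can be sketched roughly as follows. This is a toy model, not the code from the patch: `FunctionSummary`, `DebugLocId`, `SummaryLookup`, and `findSlow` are hypothetical names, and the "slow" lookup is just a table probe standing in for the real walk from a debug location to its function summary.

      ```cpp
      #include <cstdint>
      #include <string>
      #include <unordered_map>
      #include <utility>

      // Hypothetical stand-in for a per-function profile summary.
      struct FunctionSummary {
        std::string Name;
        std::uint64_t TotalSamples;
      };

      // Hypothetical stand-in for a debug location; only its identity matters here.
      using DebugLocId = std::uint64_t;

      class SummaryLookup {
      public:
        // Counts how often the slow path ran, to make the caching observable.
        unsigned SlowLookups = 0;

        // Assumption: all summaries are registered before the first find().
        void addSummary(DebugLocId Loc, FunctionSummary S) {
          Table[Loc] = std::move(S);
        }

        const FunctionSummary *find(DebugLocId Loc) {
          auto It = Cache.find(Loc);
          if (It != Cache.end())
            return It->second; // Cache hit: skip the slow lookup.
          const FunctionSummary *S = findSlow(Loc);
          Cache.emplace(Loc, S); // Remember the answer, including "not found".
          return S;
        }

      private:
        const FunctionSummary *findSlow(DebugLocId Loc) {
          ++SlowLookups;
          auto It = Table.find(Loc);
          return It == Table.end() ? nullptr : &It->second;
        }

        std::unordered_map<DebugLocId, FunctionSummary> Table;
        std::unordered_map<DebugLocId, const FunctionSummary *> Cache;
      };
      ```

      Caching negative results as well means repeated queries for locations without a summary also avoid the slow path.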
      
      Reviewers: twoh, Kader, danielcdh, wmi
      
      Reviewed By: wmi
      
      Subscribers: dblaikie, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56435
      
      llvm-svn: 351211
      dee00120
  5. Jan 14, 2019
  6. Jan 12, 2019
  7. Jan 11, 2019
    • Teresa Johnson's avatar
      [LTO] Record whether LTOUnit splitting is enabled in index · 290a8398
      Teresa Johnson authored
      Summary:
      Records in the module summary index whether the bitcode was compiled
      with the option necessary to enable splitting the LTO unit
      (e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
      
      The information is passed down to the ModuleSummaryIndex builder via a
      new module flag "EnableSplitLTOUnit", which is propagated onto a flag
      on the summary index.
      
      This is then used during the LTO link to check whether all linked
      summaries were built with the same value of this flag. If not, an error
      is issued when we detect a situation requiring whole program visibility
      of the class hierarchy. This is the case when both of the following
      conditions are met:
      1) We are performing LowerTypeTests or Whole Program Devirtualization.
      2) There are type tests or type checked loads in the code.
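
      The check described above might be modeled like this. It is a simplified sketch under stated assumptions: `ModuleSummary`, `LinkCheck`, and `checkSplitLTOUnitConsistency` are hypothetical names, and each linked summary is reduced to two booleans.

      ```cpp
      #include <vector>

      // Hypothetical, reduced view of one linked module summary.
      struct ModuleSummary {
        bool EnableSplitLTOUnit; // Value of the flag recorded in the index.
        bool HasTypeTests;       // Type tests or type checked loads present.
      };

      enum class LinkCheck { Ok, Error };

      // Report an error only when the flag values are inconsistent AND whole
      // program visibility of the class hierarchy is actually required.
      LinkCheck checkSplitLTOUnitConsistency(
          const std::vector<ModuleSummary> &Summaries, bool RunsLTTOrWPD) {
        bool Mixed = false;
        bool AnyTypeTests = false;
        for (const ModuleSummary &S : Summaries) {
          if (S.EnableSplitLTOUnit != Summaries.front().EnableSplitLTOUnit)
            Mixed = true;
          AnyTypeTests = AnyTypeTests || S.HasTypeTests;
        }
        return (Mixed && RunsLTTOrWPD && AnyTypeTests) ? LinkCheck::Error
                                                       : LinkCheck::Ok;
      }
      ```

      In this model a mismatch alone is harmless; it becomes an error only once both listed conditions hold.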
      
      Note I have also changed the ThinLTOBitcodeWriter to also gate the
      module splitting on the value of this flag.
      
      Reviewers: pcc
      
      Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D53890
      
      llvm-svn: 350948
      290a8398
    • Vedant Kumar's avatar
      [MergeFunc] Erase unused duplicate functions if they are discardable · ee10ef73
      Vedant Kumar authored
      MergeFunc only deletes unused duplicate functions if they have local
      linkage, but it should be safe to relax this to any "discardable if
      unused" linkage type.
      
      Differential Revision: https://reviews.llvm.org/D56574
      
      llvm-svn: 350939
      ee10ef73
    • Vedant Kumar's avatar
      [MergeFunc] Use Instruction::getFunction as a cleanup, NFC · 08fe7e02
      Vedant Kumar authored
      llvm-svn: 350938
      08fe7e02
  8. Jan 09, 2019
    • Easwaran Raman's avatar
      Refactor synthetic profile count computation. NFC. · b45994b8
      Easwaran Raman authored
      Summary:
      Instead of using two separate callbacks to return the entry count and the
      relative block frequency, use a single callback to return callsite
      count. This would allow better supporting hybrid mode in the future as
      the count of callsite need not always be derived from entry count (as in
      sample PGO).
      
      Reviewers: davidxl
      
      Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56464
      
      llvm-svn: 350755
      b45994b8
  9. Jan 07, 2019
    • Chandler Carruth's avatar
      [CallSite removal] Migrate all Alias Analysis APIs to use the newly · 363ac683
      Chandler Carruth authored
      minted `CallBase` class instead of the `CallSite` wrapper.
      
      This moves the largest interwoven collection of APIs that traffic in
      `CallSite`s. While a handful of these could have been migrated with
      a minorly more shallow migration by converting from a `CallSite` to
      a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
      migrate together because of the complex interplay of AA APIs and the
      fact that converting from a `CallBase` to a `CallSite` isn't free in its
      current implementation.
      
      Out of tree users of these APIs can fairly reliably migrate with some
      combination of `.getInstruction()` on the `CallSite` instance and
      casting the resulting pointer. The most generic form will look like `CS`
      -> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
      is a more elegant migration. Hopefully, this migrates enough APIs for
      users to fully move from `CallSite` to the base class. All of the
      in-tree users were easily migrated in that fashion.
      
      Thanks for the review from Saleem!
      
      Differential Revision: https://reviews.llvm.org/D55641
      
      llvm-svn: 350503
      363ac683
  10. Jan 05, 2019
    • Easwaran Raman's avatar
      [Inliner] Optimize shouldBeDeferred · 366a873f
      Easwaran Raman authored
      This has some minor optimizations to shouldBeDeferred. This is not
      strictly NFC because the early exit inside the loop assumes
      TotalSecondaryCost is monotonically non-decreasing, which is not true if
      the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
      not go below 0 for the default values of the various options we use.
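
      To illustrate why the early exit depends on monotonicity, here is a minimal sketch. It assumes the loop accumulates per-caller costs and stops once the running total reaches some limit; the function name and the limit are hypothetical and not taken from the patch.

      ```cpp
      #include <vector>

      // Returns true as soon as the running total reaches Limit.
      // The early return matches the result for the full sum only if every
      // cost is non-negative, i.e. the running total never decreases.
      bool secondaryCostReaches(const std::vector<int> &Costs, int Limit) {
        int Total = 0;
        for (int Cost : Costs) {
          Total += Cost;
          if (Total >= Limit)
            return true; // Early exit: assumes later costs cannot lower Total.
        }
        return false;
      }
      ```

      With costs `{5, -3}` and a limit of 4, the function exits early and returns true even though the full sum is 2, which is the kind of divergence a negative threshold could introduce.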
      
      llvm-svn: 350456
      366a873f
  11. Jan 04, 2019
    • Teresa Johnson's avatar
      [ThinLTO] Handle chains of aliases · 853b9624
      Teresa Johnson authored
      At -O0, globalopt is not run during the compile step, and we can have a
      chain of an alias having an immediate aliasee of another alias. The
      summaries are constructed assuming aliases in a canonical form
      (flattened chains), and as a result only the base object but no
      intermediate aliases were preserved.
      
      Fix by adding a pass that canonicalizes aliases, which ensures each
      alias is a direct alias of the base object.
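
      The effect of canonicalization can be pictured with a small model. This is only a sketch: aliases are represented as a hypothetical name-to-aliasee map rather than LLVM IR, and `resolveBase` and `canonicalizeAliases` are made-up helper names.

      ```cpp
      #include <cstddef>
      #include <map>
      #include <string>

      // Toy model: each alias name maps to its immediate aliasee, which may be
      // another alias or a base object (a name that is not a key in the map).
      using AliasMap = std::map<std::string, std::string>;

      // Follow the chain from Name until reaching something that is not an alias.
      std::string resolveBase(const AliasMap &Aliases, std::string Name) {
        // Bound the walk by the map size so a malformed cyclic input terminates.
        for (std::size_t Steps = 0; Steps <= Aliases.size(); ++Steps) {
          auto It = Aliases.find(Name);
          if (It == Aliases.end())
            return Name;
          Name = It->second;
        }
        return Name;
      }

      // Rewrite every alias so that it points directly at its base object.
      AliasMap canonicalizeAliases(const AliasMap &Aliases) {
        AliasMap Result;
        for (const auto &Entry : Aliases)
          Result[Entry.first] = resolveBase(Aliases, Entry.second);
        return Result;
      }
      ```

      For a chain `a -> b -> c`, both `a` and `b` end up as direct aliases of `c`.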
      
      Reviewers: pcc, davidxl
      
      Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54507
      
      llvm-svn: 350423
      853b9624
  12. Jan 03, 2019
  13. Dec 21, 2018
  14. Dec 18, 2018
    • Michael Kruse's avatar
      [LoopVectorize] Rename pass options. NFC. · d4eb13c8
      Michael Kruse authored
      Rename:
      NoUnrolling to InterleaveOnlyWhenForced
      and
      AlwaysVectorize to !VectorizeOnlyWhenForced
      
      Contrary to what the name 'AlwaysVectorize' suggests, it does not
      unconditionally vectorize all loops, but applies a cost model to
      determine whether vectorization is profitable to all loops. Hence,
      passing false will disable the cost model, except when a loop is marked
      with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
      by @hfinkel in D55716) better matches this behavior.
      
      Similarly, 'NoUnrolling' disables the profitability cost model for
      interleaving (a term to distinguish it from unrolling by the
      LoopUnrollPass); rename it for consistency.
      
      Differential Revision: https://reviews.llvm.org/D55785
      
      llvm-svn: 349513
      d4eb13c8
    • Michael Kruse's avatar
      [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. · 3284775b
      Michael Kruse authored
      When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
      the LoopUnrollPass is not added to the (legacy) pass pipeline. This
      also means that it will not process any loop metadata such as
      llvm.loop.unroll.enable (which is generated by #pragma unroll), and
      WarnMissedTransformationsPass emits a warning that a forced
      transformation has not been applied (see
      https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
      Such explicit transformations should take precedence over disabling
      heuristics.
      
      This patch unconditionally adds LoopUnrollPass to the optimizing
      pipeline (that is, it is still not added with `-O0`), but passes a flag
      indicating whether automatic unrolling is dis-/enabled. This is the same
      approach as LoopVectorize uses.
      
      The new pass manager's pipeline builder has no option to disable
      unrolling, hence the problem does not apply.
      
      Differential Revision: https://reviews.llvm.org/D55716
      
      llvm-svn: 349509
      3284775b
    • Dylan McKay's avatar
      [IPO][AVR] Create new Functions in the default address space specified in the data layout · f920da00
      Dylan McKay authored
      This modifies the IPO pass so that it respects any explicit function
      address space specified in the data layout.
      
      In targets with nonzero program address spaces, all functions should, by
      default, be placed into the default program address space.
      
      This is required for Harvard architectures like AVR. Without this, the
      functions will be marked as residing in data space, and thus not be
      callable.
      
      This has no effect on any in-tree official backends, as none use an
      explicit program address space in their data layouts.
      
      Patch by Tim Neumann.
      
      llvm-svn: 349469
      f920da00
  15. Dec 13, 2018
    • Wei Mi's avatar
      [SampleFDO] handle ProfileSampleAccurate when initializing function entry count · 66c6c5ab
      Wei Mi authored
      ProfileSampleAccurate is used to indicate that the profile exactly matches the
      code to be optimized.
      
      Previously ProfileSampleAccurate was handled in ProfileSummaryInfo::isColdCallSite
      and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
      entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
      ProfileSampleAccurate in multiple places.
      
      Differential Revision: https://reviews.llvm.org/D55660
      
      llvm-svn: 349088
      66c6c5ab
    • Easwaran Raman's avatar
      [ThinLTO] Compute synthetic function entry count · 5a7056fa
      Easwaran Raman authored
      Summary:
      This patch computes the synthetic function entry count on the whole
      program callgraph (based on module summary) and writes the entry counts
      to the summary. After function importing, this count gets attached to
      the IR as metadata. Since it adds a new field to the summary, this bumps
      up the version.
      
      Reviewers: tejohnson
      
      Subscribers: mehdi_amini, inglorion, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D43521
      
      llvm-svn: 349076
      5a7056fa
  16. Dec 12, 2018
    • Michael Kruse's avatar
      [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. · 72448525
      Michael Kruse authored
      When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,
      
          #pragma clang loop unroll_and_jam(enable)
          #pragma clang loop distribute(enable)
      
      is the same as
      
          #pragma clang loop distribute(enable)
          #pragma clang loop unroll_and_jam(enable)
      
      and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
      
      This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
      
          !0 = !{!0, !1, !2}
          !1 = !{!"llvm.loop.unroll_and_jam.enable"}
          !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
          !3 = !{!"llvm.loop.distribute.enable"}
      
      defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
      
      Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
      
      For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible for the responsible pass to emit such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
      
      Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
      
      To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
      
      With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
      
      Reviewed By: hfinkel, dmgreen
      
      Differential Revision: https://reviews.llvm.org/D49281
      Differential Revision: https://reviews.llvm.org/D55288
      
      llvm-svn: 348944
      72448525
  17. Dec 11, 2018
    • Vedant Kumar's avatar
      [HotColdSplitting] Disable outlining landingpad instructions (PR39917) · b3a7cae0
      Vedant Kumar authored
      It's currently not safe to outline landingpad instructions (see
      llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
      previous landingpad instructions in a function alters the lowering of
      subsequent landingpads by renumbering type info ID's. Outlining a
      landingpad therefore breaks exception handling & unwinding.
      
      llvm-svn: 348870
      b3a7cae0
    • David Stenberg's avatar
      [DeadArgElim] Fixes for dbg.values using dead arg/return values · 2474ce58
      David Stenberg authored
      Summary:
      When eliminating a dead argument or return value in a function with
      local linkage, all uses, including in dbg.value intrinsics, would be
      replaced with null constants. This would mean that, for example for an
      integer argument, the debug info would incorrectly express that the
      value is 0. Instead, replace all uses with undef to indicate that the
      argument/return value is optimized out.
      
      Also, make sure that metadata uses of return values are rewritten even
      if there are no non-metadata uses of the value.
      
      As a bit of historical curiosity, the code that emitted null constants
      was introduced in the initial check-in of the pass in 2003, before
      'undef' values even existed in LLVM.
      
      This fixes PR23260.
      
      Reviewers: dblaikie, aprantl, vsk, djtodoro
      
      Reviewed By: aprantl
      
      Subscribers: llvm-commits
      
      Tags: #debug-info
      
      Differential Revision: https://reviews.llvm.org/D55513
      
      llvm-svn: 348837
      2474ce58
  18. Dec 07, 2018
    • Vedant Kumar's avatar
      [HotColdSplitting] Refine definition of unlikelyExecuted · 03f9f15b
      Vedant Kumar authored
      The splitting pass uses its 'unlikelyExecuted' predicate to statically
      decide which blocks are cold.
      
      - Do not treat noreturn calls as if they are cold unless they are actually
        marked cold. This is motivated by functions like exit() and longjmp(), which
        are not beneficial to outline.
      
      - Do not treat inline asm as an outlining barrier. In practice asm("") is
        frequently used to inhibit basic block merging; enabling outlining in this case
        results in substantial memory savings.
      
      - Treat invokes of cold functions as cold.
      
      As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
      no longer needed. The pass can identify & outline blocks dominated by EH pads,
      so there's no need to special-case __cxa_begin_catch etc.
      
      Differential Revision: https://reviews.llvm.org/D54244
      
      llvm-svn: 348640
      03f9f15b
    • Vedant Kumar's avatar
      [HotColdSplitting] Outline more than once per function · 03aaa3e2
      Vedant Kumar authored
      Algorithm: Identify maximal cold regions and put them in a worklist. If
      a candidate region overlaps with another, discard it. While the worklist
      is non-empty, remove a single-entry sub-region from the worklist and attempt
      to outline it. By the non-overlap property, this should not invalidate
      parts of the domtree pertaining to other outlining regions.
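
      The worklist scheme above can be sketched on a toy representation. This is not the pass's implementation: regions are modeled as plain sets of block ids, and `buildWorklist`, `outlineAll`, and the `TryOutline` callback are hypothetical names used only for illustration.

      ```cpp
      #include <algorithm>
      #include <functional>
      #include <set>
      #include <utility>
      #include <vector>

      using BlockId = int;
      using Region = std::set<BlockId>; // Toy model: a region is a set of blocks.

      // Two regions overlap if they share at least one block.
      bool overlaps(const Region &A, const Region &B) {
        return std::any_of(A.begin(), A.end(),
                           [&](BlockId Id) { return B.count(Id) != 0; });
      }

      // Keep candidates in order, discarding any that overlaps one already kept.
      std::vector<Region> buildWorklist(const std::vector<Region> &Candidates) {
        std::vector<Region> Worklist;
        for (const Region &C : Candidates) {
          bool Clash =
              std::any_of(Worklist.begin(), Worklist.end(),
                          [&](const Region &W) { return overlaps(C, W); });
          if (!Clash)
            Worklist.push_back(C);
        }
        return Worklist;
      }

      // Drain the worklist, attempting to outline each region exactly once.
      // Returns how many attempts succeeded.
      unsigned outlineAll(std::vector<Region> Worklist,
                          const std::function<bool(const Region &)> &TryOutline) {
        unsigned NumOutlined = 0;
        while (!Worklist.empty()) {
          Region R = std::move(Worklist.back());
          Worklist.pop_back();
          if (TryOutline(R))
            ++NumOutlined;
        }
        return NumOutlined;
      }
      ```

      Because the kept regions are pairwise disjoint, an outlining attempt on one region never touches blocks that belong to another entry still in the worklist.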
      
      Testing: LNT results on X86 are clean. With test-suite + externals, llvm
      outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
      483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
      outlines over 100 times in some functions (mostly EH paths). There was
      not a significant performance impact pre vs. post-patch.
      
      Differential Revision: https://reviews.llvm.org/D53887
      
      llvm-svn: 348639
      03aaa3e2
  19. Dec 05, 2018