  1. Aug 03, 2018
  2. Aug 01, 2018
    • Johannes Doerfert's avatar
      [NFC][FunctionAttrs] Remove duplication in old/new PM pipeline · bed4babc
      Johannes Doerfert authored
      This patch just extracts code into a separate function to remove some
      duplication between the old and new pass manager pipelines. Due to the
      different CGSCC iterators used, not all code duplication was eliminated.
      
      llvm-svn: 338585
      bed4babc
    • David Bolvansky's avatar
      Revert "Enrich inline messages", tests fail · fbbb83c7
      David Bolvansky authored
      llvm-svn: 338496
      fbbb83c7
    • David Bolvansky's avatar
      Enrich inline messages · 7f36cd9d
      David Bolvansky authored
      Summary:
      This patch improves the Inliner to provide causes/reasons for negative inline decisions.
      1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
      2. Several functions that used to return the inlining results as booleans are changed to return InlineResult, which carries the cause for a negative decision.
      3. Changed remark printing and debug output messages to provide the additional messages and related inline cost.
      4. Adjusted tests for the changed printing.
      
      Patch by: yrouban (Yevgeny Rouban)
      
      
      Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
      
      Reviewed By: tejohnson, xbolva00
      
      Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D49412
      
      llvm-svn: 338494
      7f36cd9d
  3. Jul 31, 2018
    • David Bolvansky's avatar
      Revert Enrich inline messages · ab79414f
      David Bolvansky authored
      llvm-svn: 338389
      ab79414f
    • David Bolvansky's avatar
      Enrich inline messages · b562dbab
      David Bolvansky authored
      Summary:
      This patch improves the Inliner to provide causes/reasons for negative inline decisions.
      1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
      2. Several functions that used to return the inlining results as booleans are changed to return InlineResult, which carries the cause for a negative decision.
      3. Changed remark printing and debug output messages to provide the additional messages and related inline cost.
      4. Adjusted tests for the changed printing.
      
      Patch by: yrouban (Yevgeny Rouban)
      
      
      Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
      
      Reviewed By: tejohnson, xbolva00
      
      Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D49412
      
      llvm-svn: 338387
      b562dbab
  4. Jul 30, 2018
  5. Jul 28, 2018
    • David Green's avatar
      [GlobalOpt] Test array indices inside structs for out-of-bounds accesses · fc4b0fe0
      David Green authored
      We can now, from clang, turn arrays of
        static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
      into structs of the form
        @g_data = internal global <{ [8 x i16], [8 x i16] }> ...
      
      GlobalOpt will incorrectly SROA it, not realising that the access to the first
      element may overflow into the second. This fixes it by checking geps more
      thoroughly.
      
      I believe this makes the globalsra-partial.ll test case invalid as the %i value
      could be out of bounds. I've re-purposed it as a negative test for this case.
      
      Differential Revision: https://reviews.llvm.org/D49816
      
      llvm-svn: 338192
      fc4b0fe0
  6. Jul 25, 2018
  7. Jul 23, 2018
  8. Jul 20, 2018
  9. Jul 19, 2018
  10. Jul 16, 2018
    • Teresa Johnson's avatar
      Restore "[ThinLTO] Ensure we always select the same function copy to import" · d68935c5
      Teresa Johnson authored
      This reverts commit r337081, therefore restoring r337050 (and fix in
      r337059), with test fix for bot failure described after the original
      description below.
      
      In order to always import the same copy of a linkonce function,
      even when encountering it with different thresholds (a higher one then a
      lower one), keep track of the summary we decided to import.
      This ensures that the backend only gets a single definition to import
      for each GUID, so that it doesn't need to choose one.
      
      Move the largest threshold the GUID was considered for import into the
      current module out of the ImportMap (which is part of a larger map
      maintained across the whole index), and into a new map just maintained
      for the current module we are computing imports for. This saves some
      memory since we no longer have the thresholds maintained across the
      whole index (and throughout the in-process backends when doing a normal
      non-distributed ThinLTO build), at the cost of some additional
      information being maintained for each invocation of ComputeImportForModule
      (the selected summary pointer for each import).
      
      There is an additional map lookup for each callee being considered for
      importing, however, this was able to subsume a map lookup in the
      Worklist iteration that invokes computeImportForFunction. We also are
      able to avoid calling selectCallee if we already failed to import at the
      same or higher threshold.
      
      I compared the run time and peak memory for the SPEC2006 471.omnetpp
      benchmark (running in-process ThinLTO backends), as well as for a large
      internal benchmark with a distributed ThinLTO build (so just looking at
      the thin link time/memory). Across a number of runs with and without
      this change there was no significant change in the time and memory.
      
      (I tried a few other variations of the change but they also didn't
      improve time or peak memory).
      
      The new commit removes a test that no longer makes sense
      (Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
      reverse-iteration bot. The test depends on the order of processing the
      summary call edges, and actually depended on the old problematic
      behavior of selecting more than one summary for a given GUID when
      encountered with different thresholds. There was no guarantee even
      before that we would eventually pick the linkonce copy with the hottest
      call edges, it just happened to work with the test and the old code, and
      there was no guarantee that we would end up importing the selected
      version of the copy that had the hottest call edges (since the backend
      would effectively import only one of the selected copies).
      
      Reviewers: davidxl
      
      Subscribers: mehdi_amini, inglorion, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48670
      
      llvm-svn: 337184
      d68935c5
  11. Jul 14, 2018
  12. Jul 13, 2018
    • Teresa Johnson's avatar
      [ThinLTO] Ensure we always select the same function copy to import · d94c0594
      Teresa Johnson authored
      In order to always import the same copy of a linkonce function,
      even when encountering it with different thresholds (a higher one then a
      lower one), keep track of the summary we decided to import.
      This ensures that the backend only gets a single definition to import
      for each GUID, so that it doesn't need to choose one.
      
      Move the largest threshold the GUID was considered for import into the
      current module out of the ImportMap (which is part of a larger map
      maintained across the whole index), and into a new map just maintained
      for the current module we are computing imports for. This saves some
      memory since we no longer have the thresholds maintained across the
      whole index (and throughout the in-process backends when doing a normal
      non-distributed ThinLTO build), at the cost of some additional
      information being maintained for each invocation of ComputeImportForModule
      (the selected summary pointer for each import).
      
      There is an additional map lookup for each callee being considered for
      importing, however, this was able to subsume a map lookup in the
      Worklist iteration that invokes computeImportForFunction. We also are
      able to avoid calling selectCallee if we already failed to import at the
      same or higher threshold.
      
      I compared the run time and peak memory for the SPEC2006 471.omnetpp
      benchmark (running in-process ThinLTO backends), as well as for a large
      internal benchmark with a distributed ThinLTO build (so just looking at
      the thin link time/memory). Across a number of runs with and without
      this change there was no significant change in the time and memory.
      
      (I tried a few other variations of the change but they also didn't
      improve time or peak memory).
      
      Reviewers: davidxl
      
      Subscribers: mehdi_amini, inglorion, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48670
      
      llvm-svn: 337050
      d94c0594
    • Vlad Tsyrklevich's avatar
      [LowerTypeTests] Limit when icall jumptable entries are emitted · cd155936
      Vlad Tsyrklevich authored
      Summary:
      Currently LowerTypeTests emits jumptable entries for all live external
      and address-taken functions; however, we could limit the number of
      functions that we emit entries for significantly.
      
      For Cross-DSO CFI, we continue to emit jumptable entries for all
      exported definitions.  In the non-Cross-DSO CFI case, we only need to
      emit jumptable entries for live functions that are address-taken in live
      functions. This ignores exported functions and functions that are only
      address taken in dead functions. This change uses ThinLTO summary data
      (now emitted for all modules during ThinLTO builds) to determine
      address-taken and liveness info.
      
      The logic for emitting jumptable entries is more conservative in the
      regular LTO case because we don't have summary data in the case of
      monolithic LTO builds; however, once summaries are emitted for all LTO
      builds we can unify the Thin/monolithic LTO logic to only use summaries
      to determine the liveness of address taking functions.
      
      This change is a partial fix for PR37474. It reduces the build size for
      nacl_helper by ~2-3%; the reduction is due to nacl_helper compiling in
      lots of unused code, and unused functions that are address taken in dead
      functions no longer being considered live due to emitted jumptable
      references. The reduction for chromium is ~0.1-0.2%.
      
      Reviewers: pcc, eugenis, javed.absar
      
      Reviewed By: pcc
      
      Subscribers: aheejin, dexonsmith, dschuff, mehdi_amini, eraman, steven_wu, llvm-commits, kcc
      
      Differential Revision: https://reviews.llvm.org/D47652
      
      llvm-svn: 337038
      cd155936
  13. Jul 10, 2018
    • Teresa Johnson's avatar
      [ThinLTO] Use std::map to get deterministic imports files · c0320ef4
      Teresa Johnson authored
      Summary:
      I noticed that the .imports files emitted for distributed ThinLTO
      backends do not have consistent ordering. This is because StringMap
      iteration order is not guaranteed to be deterministic. Since we already
      have a std::map with this information, used when emitting the individual
      index files (ModuleToSummariesForIndex), use it for the imports files as
      well.
      
      This issue is likely causing some unnecessary rebuilds of the ThinLTO
      backends in our distributed build system as the imports files are inputs
      to those backends.
      
      Reviewers: pcc, steven_wu, mehdi_amini
      
      Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48783
      
      llvm-svn: 336721
      c0320ef4
    • Manoj Gupta's avatar
      llvm: Add support for "-fno-delete-null-pointer-checks" · 77eeac3d
      Manoj Gupta authored
      Summary:
      Support for this option is needed for building Linux kernel.
      This is a very frequently requested feature by kernel developers.
      
      More details : https://lkml.org/lkml/2018/4/4/601
      
      GCC option description for -fdelete-null-pointer-checks:
      This assumes that programs cannot safely dereference null pointers,
      and that no code or data element resides at address zero.
      
      -fno-delete-null-pointer-checks is the inverse of this, implying that
      null pointer dereferencing is not undefined.
      
      This feature is implemented in this CL as the function attribute
      "null-pointer-is-valid"="true" in LLVM IR (under review at D47894).
      The CL updates several passes that assumed null pointer dereferencing is
      undefined to not optimize when the "null-pointer-is-valid"="true"
      attribute is present.
      
      Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
      
      Reviewed By: efriedma, george.burgess.iv
      
      Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D47895
      
      llvm-svn: 336613
      77eeac3d
  14. Jul 09, 2018
    • Xin Tong's avatar
      [CVP] Handle calls with void return value. No need to create CVPLattice state for it. · b467233d
      Xin Tong authored
      Summary:
      Tests: 10
      Metric: compile_time
      
      Program                                         unpatch-result  patch-result diff
      
      Bullet/bullet                                  32.39           30.54        -5.7%
      SPASS/SPASS                                    18.14           17.25        -4.9%
      mafft/pairlocalalign                           12.10           11.64        -3.8%
      ClamAV/clamscan                                19.21           19.63         2.2%
      7zip/7zip-benchmark                            49.55           48.85        -1.4%
      kimwitu++/kc                                   15.68           15.87         1.2%
      lencod/lencod                                  21.13           21.34         1.0%
      consumer-typeset/consumer-typeset              13.65           13.62        -0.2%
      tramp3d-v4/tramp3d-v4                          29.88           29.92         0.1%
      sqlite3/sqlite3                                18.48           18.46        -0.1%
             unpatch-result  patch-result       diff
      count  10.000000       10.000000     10.000000
      mean   23.022000       22.712400    -0.011671
      std    11.362831       11.094183     0.027338
      min    12.104000       11.640000    -0.057298
      25%    16.299000       16.214000    -0.032282
      50%    18.844000       19.048000    -0.001350
      75%    27.689000       27.774000     0.007752
      max    49.552000       48.852000     0.021861
      
      I also tested only this pass by concatenating all the code from the
      llvm/lib/Analysis/ folder and doing clang -g followed by opt. I get close
      to 20% speedup for the pass. I expect a majority of the gain comes from
      skipping the dbg intrinsics.
      
      Before patch (opt -time-passes -called-value-propagation):
      ============
      ===-------------------------------------------------------------------------===
       ... Pass execution timing report ...
      ===-------------------------------------------------------------------------===
       Total Execution Time: 3.8303 seconds (3.8279 wall clock)
      
        ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
        2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode Writer
        0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called Value Propagation
        0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module Verifier
        3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total
      
      After patch (opt -time-passes -called-value-propagation):
      ============
      ===-------------------------------------------------------------------------===
       ... Pass execution timing report ...
      ===-------------------------------------------------------------------------===
       Total Execution Time: 3.6605 seconds (3.6579 wall clock)
      
        ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
        2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode Writer
        0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called Value Propagation
        0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module Verifier
        3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total
      
      Reviewers: davide, mssimpso
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D49078
      
      llvm-svn: 336551
      b467233d
  15. Jul 01, 2018
    • David Green's avatar
      [UnrollAndJam] New Unroll and Jam pass · 963401d2
      David Green authored
      This is a simple implementation of the unroll-and-jam classical loop
      optimisation.
      
      The basic idea is that we take an outer loop of the form:
      
        for i..
          ForeBlocks(i)
          for j..
            SubLoopBlocks(i, j)
          AftBlocks(i)
      
      Instead of doing normal inner or outer unrolling, we unroll as follows:
      
        for i... i+=2
          ForeBlocks(i)
          ForeBlocks(i+1)
          for j..
            SubLoopBlocks(i, j)
            SubLoopBlocks(i+1, j)
          AftBlocks(i)
          AftBlocks(i+1)
        Remainder Loop
      
      So we have unrolled the outer loop, then jammed the two inner loops into
      one. This can lead to a simpler inner loop if memory accesses can be shared
      between the now jammed loops.
      
      To do this we have to prove that this is all safe, both for the memory
      accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
      AftBlocks(i) and SubLoopBlocks(i, j).
      
      Differential Revision: https://reviews.llvm.org/D41953
      
      llvm-svn: 336062
      963401d2
  16. Jun 30, 2018
  17. Jun 28, 2018
  18. Jun 27, 2018
  19. Jun 25, 2018
    • Wei Mi's avatar
      [SampleFDO] Add an option to turn on/off warning about samples unused. · e5551274
      Wei Mi authored
      If a function has samples to use but cannot use them because of missing
      debug information, currently a warning is issued to report the missed
      opportunity.
      
      This warning assumes the binary generating the profile and the binary
      using the profile are similar enough. That is not always the case.
      Sometimes even if the binaries are not quite similar, we may still get
      some benefit by using sampleFDO. In those cases, we may still want to
      apply sampleFDO but not want to see a lot of such warnings pop up.
      
      The patch adds an option to turn the warning on or off.
      
      Differential Revision: https://reviews.llvm.org/D48510
      
      llvm-svn: 335484
      e5551274
  20. Jun 22, 2018
    • Tobias Edler von Koch's avatar
      Re-land "[LTO] Enable module summary emission by default for regular LTO" · 7609cb83
      Tobias Edler von Koch authored
      Since we are now producing a summary also for regular LTO builds, we
      need to run the NameAnonGlobals pass in those cases as well (the
      summary cannot handle anonymous globals).
      
      See https://reviews.llvm.org/D34156 for details on the original change.
      
      This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
      
      llvm-svn: 335385
      7609cb83
    • Chandler Carruth's avatar
      Revert r335306 (and r335314) - the Call Graph Profile pass. · aa5f4d2e
      Chandler Carruth authored
      This is the first pass in the main pipeline to use the legacy PM's
      ability to run function analyses "on demand". Unfortunately, it turns
      out there are bugs in that somewhat-hacky approach. At the very least,
      it leaks memory and doesn't support -debug-pass=Structure. Unclear if
      there are larger issues or not, but this should get the sanitizer bots
      back to green by fixing the memory leaks.
      
      llvm-svn: 335320
      aa5f4d2e
    • Michael J. Spencer's avatar
      [Instrumentation] Add Call Graph Profile pass · fc93dd8e
      Michael J. Spencer authored
      This patch adds support for generating a call graph profile from Branch Frequency Info.
      
      The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.
      
      After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
      
      !llvm.module.flags = !{!0}
      
      !0 = !{i32 5, !"CG Profile", !1}
      !1 = !{!2, !3, !4} ; List of edges
      !2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
      !3 = !{void (i1)* @freq, void ()* @a, i64 11}
      !4 = !{void (i1)* @freq, void ()* @b, i64 20}
      
      Differential Revision: https://reviews.llvm.org/D48105
      
      llvm-svn: 335306
      fc93dd8e
  21. Jun 21, 2018
  22. Jun 12, 2018
    • Florian Hahn's avatar
      Use SmallPtrSet explicitly for SmallSets with pointer types (NFC). · a1cc8483
      Florian Hahn authored
      Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
      patch replaces such types with SmallPtrSet, because IMO it is slightly
      clearer and allows us to get rid of unnecessarily including SmallSet.h
      
      Reviewers: dblaikie, craig.topper
      
      Reviewed By: dblaikie
      
      Differential Revision: https://reviews.llvm.org/D47836
      
      llvm-svn: 334492
      a1cc8483
    • Wei Mi's avatar
      [SampleFDO] Add a new compact binary format for sample profile. · a0c0857e
      Wei Mi authored
      The name table occupies a big chunk of the size of the current binary
      format sample profile. In order to reduce its size, the patch changes the
      sample writer/reader to save/restore the MD5Hash of names in the name
      table. The sample annotation phase will also use the MD5Hash of a name to
      query samples accordingly.
      
      Experiments show the compact binary format can generally reduce the size
      of a sample profile by 2/3 compared with the binary format.
      
      Differential Revision: https://reviews.llvm.org/D47955
      
      llvm-svn: 334447
      a0c0857e
  23. Jun 07, 2018
    • Teresa Johnson's avatar
      [ThinLTO] Rename index IsAnalysis flag to HaveGVs (NFC) · 4ffc3e78
      Teresa Johnson authored
      With the upcoming patch to add summary parsing support, IsAnalysis would
      be true in contexts where we are not performing module summary analysis.
      Rename it to the more specific and appropriate HaveGVs, which is essentially
      what this flag is indicating.
      
      llvm-svn: 334140
      4ffc3e78
  24. Jun 04, 2018
  25. Jun 01, 2018
    • Vlad Tsyrklevich's avatar
      [ThinLTOBitcodeWriter] Emit summaries for regular LTO modules · 6867ab7c
      Vlad Tsyrklevich authored
      Summary:
      Emit summaries for bitcode modules that are only destined for the
      regular LTO portion of the build so they can participate in
      summary-based dead stripping.
      
      This change reduces the size of a nacl_helper build with cfi-icall
      enabled by 7%, removing the majority of the overhead due to enabling
      cfi-icall. The cfi-icall size increase was caused by compiling in lots
      of unused code and cfi-icall generating jumptable references to unused
      symbols that could no longer be removed by -Wl,-gc-sections. Increasing
      the visibility of summary-based dead stripping prevented jumptable
      entries being created for unused symbols from the regular LTO portion
      of the build.
      
      Reviewers: pcc
      
      Reviewed By: pcc
      
      Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc
      
      Differential Revision: https://reviews.llvm.org/D47594
      
      llvm-svn: 333768
      6867ab7c
    • Florian Hahn's avatar
      Revert r333740: [IPSCCP] Use PredicateInfo to propagate facts from cmp. · 8a17f1f4
      Florian Hahn authored
      This is breaking the clang-with-thin-lto-ubuntu bot.
      
      llvm-svn: 333745
      8a17f1f4
    • Florian Hahn's avatar
      Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions. · f4df554f
      Florian Hahn authored
      This patch updates IPSCCP to use PredicateInfo to propagate
      facts to true branches predicated by EQ and to false branches
      predicated by NE.
      
      As a follow-up, we should be able to extend it to also propagate additional
      facts about nonnull.
      
      Reviewers: davide, mssimpso, dberlin, efriedma
      
      Reviewed By: davide, dberlin
      
      Differential Revision: https://reviews.llvm.org/D45330
      
      llvm-svn: 333740
      f4df554f