  1. Nov 17, 2016
  2. Nov 14, 2016
    • Teresa Johnson's avatar
      [ThinLTO] Only promote exported locals as marked in index · 4fef68cb
      Teresa Johnson authored
      Summary:
      We have always speculatively promoted all renamable local values
      (except const non-address taken variables) for both the exporting
      and importing module. We would then internalize them back based on
      the ThinLink results if they weren't actually exported. This is
      inefficient, and results in unnecessary renames. It also meant we
      had to check the non-renamability of a value in the summary, which
      was already checked during function importing analysis in the ThinLink.
      
      Made renameModuleForThinLTO (which does the promotion/renaming) instead
      use the index when exporting, to avoid unnecessary renames/promotions.
      For importing modules, we can simply promote all values, as any local
      we import is by definition exported and needs promotion.
      
      This required changes to the method used by the FunctionImport pass
      (only invoked from 'opt' for testing) and when invoked from llvm-link,
      since neither does a ThinLink. We simply conservatively mark all locals
      in the index as promoted, which preserves the current aggressive
      promotion behavior.
      
      I also needed to change an llvm-lto based test where we had previously
      been aggressively promoting values that weren't importable (aliasees),
      but now will not promote.
      
      Reviewers: mehdi_amini
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D26467
      
      llvm-svn: 286871
      4fef68cb
    • Teresa Johnson's avatar
      [ThinLTO] Make inline assembly handling more efficient in summary · d5033a45
      Teresa Johnson authored
      Summary:
      The change in r285513 to prevent exporting of locals used in
      inline asm added all locals in the llvm.used set to the reference
      set of functions containing inline asm. Since these locals were marked
      NoRename, this automatically prevented importing of the function.
      
      Unfortunately, this caused an explosion in the summary reference lists
      in some cases. In my particular example, it happened for a large protocol
      buffer generated C++ file, where many of the generated functions
      contained an inline asm call. It was exacerbated when doing a ThinLTO
      PGO instrumentation build, where the PGO instrumentation included
      thousands of private __profd_* values that were added to llvm.used.
      
      We really only need to include a single llvm.used local (NoRename) value
      in the reference list of a function containing inline asm to block it
      from being imported. However, it seems cleaner to add a flag to the summary
      that explicitly describes this situation, which is what this patch does.
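The flag-based approach can be modeled with a small sketch (struct and field names are illustrative, not the real summary classes): one bit on the function summary replaces thousands of reference-list entries, and the importer checks that bit.

```cpp
#include <vector>

// Illustrative model of a function summary: instead of listing every
// llvm.used local in Refs, a single bit records "contains inline asm
// that may reference internal values".
struct FunctionSummarySketch {
  bool HasInlineAsmMaybeReferencingInternal = false;
  std::vector<int> Refs; // reference list stays small
};

// The importer refuses functions whose inline asm may touch local values.
bool eligibleForImport(const FunctionSummarySketch &S) {
  return !S.HasInlineAsmMaybeReferencingInternal;
}
```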
      
      Reviewers: mehdi_amini
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D26402
      
      llvm-svn: 286840
      d5033a45
  3. Nov 13, 2016
  4. Nov 11, 2016
    • Evgeniy Stepanov's avatar
      [cfi] Fix weak functions handling. · 1fe189d7
      Evgeniy Stepanov authored
      When a function pointer is replaced with a jumptable pointer, a special
      case is needed to preserve the semantics of extern_weak functions.
      Since a jumptable entry can not be extern_weak, we emulate that
      behaviour by replacing all references to F (the extern_weak function)
      with the following expression: F != nullptr ? JumpTablePtr : nullptr.
      
      Extra special care is needed for global initializers, since most (or
      probably all) backends can not lower an initializer that includes
      this kind of constant expression. Initializers like that are replaced
      with a global constructor (i.e. a runtime initializer).
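The replacement expression the commit describes can be shown as a small runtime sketch (function names hypothetical; in the pass this is a constant expression over the symbol, not a runtime call): a null extern_weak symbol must stay null, so only non-null references are redirected to the jump table.

```cpp
using FnPtr = int (*)();

int jumpTableEntry() { return 42; } // stands in for the CFI jump table slot

// The rewrite the pass performs: every use of the extern_weak function F
// becomes this expression, so an undefined (null) weak symbol stays null.
FnPtr replaceWeakReference(FnPtr F, FnPtr JumpTablePtr) {
  return F != nullptr ? JumpTablePtr : nullptr;
}
```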
      
      llvm-svn: 286636
      1fe189d7
    • Erik Eckstein's avatar
      Make the FunctionComparator of the MergeFunctions pass a stand-alone utility. · 4d6fb72a
      Erik Eckstein authored
      This is pure refactoring. NFC.
      
      This change moves the FunctionComparator (together with the GlobalNumberState
      utility) in to a separate file so that it can be used by other passes.
      For example, the SwiftMergeFunctions pass in the Swift compiler:
      https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp
      
      Details of the change:
      
      *) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
      *) Make FunctionComparator member functions protected (instead of private)
         so that a derived comparator class can use them.
      
      The following refactoring helps to share code between the base FunctionComparator
      class and a derived class:
      
      *) Add a beginCompare() function
      *) Move some basic function property comparisons into a separate function compareSignature()
      *) Do the GEP comparison inside cmpOperations() which now has a new
         needToCmpOperands reference parameter
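The protected-member change can be illustrated with a minimal class shape (names and structure are illustrative only, not the real LLVM interfaces): a derived comparator reuses the base's signature check, then substitutes its own body comparison, which is the pattern SwiftMergeFunctions relies on.

```cpp
// Illustrative sketch: with helpers protected rather than private, a derived
// comparator can reuse the signature check and add a looser body comparison.
class FunctionComparatorSketch {
public:
  virtual ~FunctionComparatorSketch() = default;
  virtual int compare() { return compareSignature(); }

protected:
  // 0 means "equal", mirroring the cmp-style return convention.
  int compareSignature() { return 0; }
};

class SwiftStyleComparator : public FunctionComparatorSketch {
public:
  int compare() override {
    if (int Res = compareSignature()) // reuse the protected base helper
      return Res;
    return compareBodiesLoosely();
  }

private:
  int compareBodiesLoosely() { return 0; }
};
```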
      
      https://reviews.llvm.org/D25385
      
      llvm-svn: 286632
      4d6fb72a
    • Peter Collingbourne's avatar
      6de481a3
    • Evgeniy Stepanov's avatar
      [cfi] Implement cfi-icall using inline assembly. · f48ffab5
      Evgeniy Stepanov authored
      The current implementation is emitting a global constant that happens
      to evaluate to the same bytes + relocation as a jump instruction on
      X86. This does not work for PIE executables and shared libraries
      though, because we end up with a wrong relocation type. And it has no
      chance of working on ARM/AArch64, which use different relocation types
      for jump instructions (R_ARM_JUMP24) that are never generated for
      data.
      
      This change replaces the constant with module-level inline assembly
      followed by a hidden declaration of the jump table. Works fine for
      ARM/AArch64, but has some drawbacks.
      * Extra symbols are added to the static symbol table, which inflate
      the size of the unstripped binary a little. Stripped binaries are not
      affected. This happens because jump table declarations must be
      external (because their body is in the inline asm).
      * Original functions that were anonymous are now named
      <original name>.cfi, and it affects symbolization sometimes. This is
      necessary because the only user of these functions is the (inline
      asm) jump table, so they had to be added to @llvm.used, which does
      not allow unnamed functions.
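The shape of the emitted module-level asm can be sketched like this (the exact directives are illustrative, x86 mnemonics shown; this is a model of the output, not the pass's emitter): each jump table entry is an external symbol whose body is a jump to the renamed original function.

```cpp
#include <string>

// Sketch of the per-entry module-level asm this change emits: the entry is
// a globally visible symbol whose body jumps to the original function,
// which now lives under <name>.cfi.
std::string jumpTableEntryAsm(const std::string &Name) {
  return ".globl " + Name + "\n" +
         Name + ":\n" +
         "  jmp " + Name + ".cfi\n";
}
```

This is why the entries must be external (their bodies exist only in the inline asm) and why the original functions need names: the asm has to reference them.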
      
      llvm-svn: 286611
      f48ffab5
  5. Nov 10, 2016
  6. Nov 09, 2016
  7. Nov 08, 2016
  8. Nov 07, 2016
  9. Nov 04, 2016
  10. Oct 28, 2016
  11. Oct 25, 2016
  12. Oct 18, 2016
    • Rong Xu's avatar
      Conditionally eliminate library calls where the result value is not used · 1c0e9b97
      Rong Xu authored
      Summary:
      This pass shrink-wraps a condition to some library calls where the call
      result is not used. For example:
         sqrt(val);
       is transformed to
         if (val < 0)
           sqrt(val);
      Even if the result of the library call is not used, the compiler cannot
      safely delete the call, because the function can set errno on error
      conditions.
      Note that in many functions the error condition depends solely on the
      incoming parameter. In this optimization, we generate the condition that
      leads to errno being set and use it to shrink-wrap the call. Since the
      chance of hitting the error condition is low, the runtime call is
      effectively eliminated.
      
      These partially dead calls are usually results of C++ abstraction penalty
      exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
      in SPEC2006.
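The transformed form of the sqrt example can be written out directly (a hand-written model of the pass's output, assuming the platform's sqrt reports domain errors via errno): the call survives only on the error path, purely for its errno side effect.

```cpp
#include <cerrno>
#include <cmath>

// Model of the shrink-wrapped code the pass generates for an unused-result
// sqrt call: only make the runtime call when the argument is in the error
// domain, preserving errno semantics while skipping the call on the hot path.
void sqrtResultUnused(double Val) {
  if (Val < 0)                 // sqrt sets errno (EDOM) only for negatives
    (void)std::sqrt(Val);      // keep the call just for its errno side effect
}
```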
      
      Reviewers: hfinkel, mehdi_amini, davidxl
      
      Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
      
      Differential Revision: https://reviews.llvm.org/D24414
      
      llvm-svn: 284542
      1c0e9b97
  13. Oct 08, 2016
  14. Oct 05, 2016
    • David Callahan's avatar
      Modify df_iterator to support post-order actions · c1051ab2
      David Callahan authored
      Summary: This changes the state used to maintain visited information for the depth-first iterator. We now assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases this method does nothing, so this patch has no functional changes. It will, however, allow a client to distinguish back edges from cross edges in a DFS tree.
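The edge classification this hook enables can be shown with a standalone DFS model (not the real df_iterator API): a neighbor that is started but not yet completed is a back edge; one that is both started and completed is a cross (or forward) edge.

```cpp
#include <map>
#include <utility>
#include <vector>

// Standalone model of the completed(...) hook: Done[N] is set exactly where
// the iterator would invoke completed(N), after all of N's children finish.
struct DFS {
  std::map<int, std::vector<int>> Adj;
  std::map<int, bool> Started, Done;
  std::vector<std::pair<int, int>> BackEdges;

  void visit(int N) {
    Started[N] = true;
    for (int C : Adj[N]) {
      if (!Started[C])
        visit(C);                      // tree edge
      else if (!Done[C])
        BackEdges.push_back({N, C});   // in-progress ancestor => back edge
      // Started && Done => cross/forward edge
    }
    Done[N] = true;                    // the completed(N) callback point
  }
};
```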
      
      Reviewers: nadav, mehdi_amini, dberlin
      
      Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D25191
      
      llvm-svn: 283391
      c1051ab2
  15. Oct 03, 2016
  16. Oct 01, 2016
  17. Sep 30, 2016
  18. Sep 29, 2016
  19. Sep 28, 2016
  20. Sep 27, 2016
    • Adam Nemet's avatar
      [Inliner] Fold the analysis remark into the missed remark · 1142147e
      Adam Nemet authored
      There is really no reason for these to be separate.
      
      The vectorizer started this pretty bad tradition that the text of the
      missed remarks is pretty meaningless, i.e. vectorization failed.  There,
      you have to query analysis to get the full picture.
      
      I think we should just explain the reason for missing the optimization
      in the missed remark when possible.  Analysis remarks should provide
      information that the pass gathers regardless whether the optimization is
      passing or not.
      
      llvm-svn: 282542
      1142147e
    • Adam Nemet's avatar
      Output optimization remarks in YAML · a62b7e1a
      Adam Nemet authored
      (Re-committed after moving the template specialization under the yaml
      namespace.  GCC was complaining about this.)
      
      This allows various presentation of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the
      YAML generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem, since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282539
      a62b7e1a
    • Adam Nemet's avatar
      Revert "Output optimization remarks in YAML" · cc2a3fa8
      Adam Nemet authored
      This reverts commit r282499.
      
      The GCC bots are failing
      
      llvm-svn: 282503
      cc2a3fa8
    • Adam Nemet's avatar
      Output optimization remarks in YAML · 92e928c1
      Adam Nemet authored
      This allows various presentation of this data using an external tool.
      This was first recommended here[1].
      
      As an example, consider this module:
      
        1 int foo();
        2 int bar();
        3
        4 int baz() {
        5   return foo() + bar();
        6 }
      
      The inliner generates these missed-optimization remarks today (the
      hotness information is pulled from PGO):
      
        remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
        remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
      
      Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
      
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: foo
          - String:  will not be inlined into
          - Caller: baz
        ...
        --- !Missed
        Pass:            inline
        Name:            NotInlined
        DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
        Function:        baz
        Hotness:         30
        Args:
          - Callee: bar
          - String:  will not be inlined into
          - Caller: baz
        ...
      
      This is a summary of the high-level decisions:
      
      * There is a new streaming interface to emit optimization remarks.
      E.g. for the inliner remark above:
      
         ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                      DEBUG_TYPE, "NotInlined", &I)
                  << NV("Callee", Callee) << " will not be inlined into "
                  << NV("Caller", CS.getCaller()) << setIsVerbose());
      
      NV stands for named value and allows the YAML client to process a remark
      using its name (NotInlined) and the named arguments (Callee and Caller)
      without parsing the text of the message.
      
      Subsequent patches will update ORE users to use the new streaming API.
      
      * I am using YAML I/O for writing the YAML file.  YAML I/O requires you
      to specify reading and writing at once but reading is highly non-trivial
      for some of the more complex LLVM types.  Since it's not clear that we
      (ever) want to use LLVM to parse this YAML file, the code supports and
      asserts that we're writing only.
      
      On the other hand, I did verify experimentally that the class hierarchy
      starting at DiagnosticInfoOptimizationBase can be mapped back from the
      YAML generated here (see D24479).
      
      * The YAML stream is stored in the LLVM context.
      
      * In the example, we can probably further specify the IR value used,
      i.e. print "Function" rather than "Value".
      
      * As before, hotness is computed in the analysis pass instead of
      DiagnosticInfo.  This avoids the layering problem, since BFI is in
      Analysis while DiagnosticInfo is in IR.
      
      [1] https://reviews.llvm.org/D19678#419445
      
      Differential Revision: https://reviews.llvm.org/D24587
      
      llvm-svn: 282499
      92e928c1
    • Ivan Krasin's avatar
      Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation · 4ff4f21e
      Ivan Krasin authored
      Summary:
      We don't currently need this facility for CFI. Disabling individual hot methods proved
      to be a better strategy in Chrome.
      
      Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
      
      Reviewers: pcc
      
      Subscribers: kcc
      
      Differential Revision: https://reviews.llvm.org/D24948
      
      llvm-svn: 282461
      4ff4f21e
    • Peter Collingbourne's avatar
      LowerTypeTests: Remove unused variable. · 53a852b6
      Peter Collingbourne authored
      llvm-svn: 282456
      53a852b6
    • Peter Collingbourne's avatar
      LowerTypeTests: Create LowerTypeTestsModule class and move implementation... · 6ed92e3f
      Peter Collingbourne authored
      LowerTypeTests: Create LowerTypeTestsModule class and move implementation there. Related simplifications.
      
      llvm-svn: 282455
      6ed92e3f
  21. Sep 26, 2016
    • Piotr Padlewski's avatar
      [thinlto] Basic thinlto fdo heuristic · d9830eb7
      Piotr Padlewski authored
      Summary:
      This patch improves the thinlto importer
      by importing up to 3x larger functions when they are called from a hot block.
      
      I compared performance with trunk on SPEC, and there were improvements
      of about 2% on povray and 3.33% on milc. These results seem
      to be consistent and match the results Teresa got with her simple
      heuristic. Some benchmarks got slower, but I think they are just
      noisy (mcf, xalancbmk, omnetpp); I am running the benchmarks again with
      more iterations to confirm. The geomean of all benchmarks, including the
      noisy ones, was about +0.02%.
      
      I see a much better improvement on the google branch with Easwaran's patch
      for pgo callsite inlining (the inliner actually inlines those big functions).
      Overall I see a +0.5% improvement, and I get +8.65% on povray.
      So I guess we will see a much bigger change when Easwaran's patch lands
      (it depends on the new pass manager), but it is still worth putting this
      in trunk before then.
      
      Implementation detail changes:
      - Removed CallsiteCount.
      - ProfileCount got replaced by Hotness.
      - hot-import-multiplier is set to 3.0 for now;
      I didn't have time to tune it, but I see that we get most of the interesting
      functions with 3, so there is not much performance difference with a higher
      value, and binary size doesn't grow as much as with 10.0.
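The multiplier's effect on the import decision can be sketched as follows (function and constant names are illustrative, not the actual importer code): a hot call site gets a 3x larger instruction-count threshold, so bigger callees are still imported.

```cpp
// Sketch of the heuristic: hot call sites multiply the base import
// threshold by 3.0 (mirroring hot-import-multiplier=3.0), so callees up
// to 3x larger remain importable from hot blocks.
enum class Hotness { Unknown, Cold, Hot };

unsigned importThreshold(unsigned BaseThreshold, Hotness H) {
  const double HotImportMultiplier = 3.0;
  if (H == Hotness::Hot)
    return static_cast<unsigned>(BaseThreshold * HotImportMultiplier);
  return BaseThreshold;
}
```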
      
      Reviewers: eraman, mehdi_amini, tejohnson
      
      Subscribers: mehdi_amini, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D24638
      
      llvm-svn: 282437
      d9830eb7
  22. Sep 21, 2016
  23. Sep 19, 2016