  1. Oct 19, 2016
  2. Oct 18, 2016
    • Using branch probability to guide critical edge splitting. · ea62ae98
      Dehao Chen authored
      Summary:
      The original heuristic for breaking critical edges during machine sinking is relatively conservative: when only one instruction can be sunk to the critical edge, the machine sink pass is unlikely to break the edge. This leads to many speculative instructions being executed at runtime. With profile info, however, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it is always beneficial to split the edge and avoid the speculatively executed instructions. This patch uses profile information to guide critical edge splitting in the machine sink pass.
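
The trade-off described in the summary can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the heuristic the patch implements: the `EdgeProfile` struct, the function names, and the unit `extraBranchCost` are all hypothetical. It compares the expected number of wasted speculative instructions (when the edge is not split) against the expected cost of the extra block (when it is).

```cpp
#include <cassert>

// Hypothetical toy model; names and costs are illustrative assumptions.
struct EdgeProfile {
  double takenProbability; // probability that the critical edge is taken, in [0, 1]
  unsigned sinkableInstrs; // instructions that could be sunk onto the edge
};

// Expected number of speculatively executed (wasted) instructions per
// execution of the source block if the edge is NOT split: the instructions
// run every time, but are only needed when the edge is taken.
double expectedWastedInstrs(const EdgeProfile &e) {
  assert(e.takenProbability >= 0.0 && e.takenProbability <= 1.0);
  return (1.0 - e.takenProbability) * e.sinkableInstrs;
}

// Expected extra cost of the new block created by splitting; it is only
// paid when the edge is actually taken.
double expectedSplitCost(const EdgeProfile &e, double extraBranchCost) {
  return e.takenProbability * extraBranchCost;
}

// Split when the wasted speculative work is at least as large as the
// expected cost of the extra block.
bool shouldSplit(const EdgeProfile &e, double extraBranchCost = 1.0) {
  return expectedWastedInstrs(e) >= expectedSplitCost(e, extraBranchCost);
}
```

In this model, a single sinkable instruction on an edge taken 50% of the time breaks even with a unit branch cost, so the edge is split; an edge taken 90% of the time is not, unless many instructions can be sunk.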
      
      The performance impact on speccpu2006 on Intel sandybridge machines:
      
      spec/2006/fp/C++/444.namd                  25.3  +0.26%
      spec/2006/fp/C++/447.dealII               45.96  -0.10%
      spec/2006/fp/C++/450.soplex               41.97  +1.49%
      spec/2006/fp/C++/453.povray               36.83  -0.96%
      spec/2006/fp/C/433.milc                   23.81  +0.32%
      spec/2006/fp/C/470.lbm                    41.17  +0.34%
      spec/2006/fp/C/482.sphinx3                48.13  +0.69%
      spec/2006/int/C++/471.omnetpp             22.45  +3.25%
      spec/2006/int/C++/473.astar               21.35  -2.06%
      spec/2006/int/C++/483.xalancbmk           36.02  -2.39%
      spec/2006/int/C/400.perlbench              33.7  -0.17%
      spec/2006/int/C/401.bzip2                  22.9  +0.52%
      spec/2006/int/C/403.gcc                   32.42  -0.54%
      spec/2006/int/C/429.mcf                   39.59  +0.19%
      spec/2006/int/C/445.gobmk                 26.98  -0.00%
      spec/2006/int/C/456.hmmer                 24.52  -0.18%
      spec/2006/int/C/458.sjeng                 28.26  +0.02%
      spec/2006/int/C/462.libquantum            55.44  +3.74%
      spec/2006/int/C/464.h264ref               46.67  -0.39%
      
      geometric mean                                   +0.20%
      
      Manually checked 473 and 471 to verify the diff is in the noise range.
      
      Reviewers: rengolin, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D24818
      
      llvm-svn: 284541
  3. Aug 25, 2016
  4. Jul 15, 2016
    • Rename AnalyzeBranch* to analyzeBranch*. · 71c30a14
      Jacques Pienaar authored
      Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
      
      Reviewers: tstellarAMD, mcrosier
      
      Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
      
      Differential Revision: https://reviews.llvm.org/D22409
      
      llvm-svn: 275564
  5. Jul 01, 2016
  6. Jun 30, 2016
    • CodeGen: Use MachineInstr& in TargetInstrInfo, NFC · 9cfc75c2
      Duncan P. N. Exon Smith authored
      This is mostly a mechanical change to make TargetInstrInfo API take
      MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
      when the argument is expected to be a valid MachineInstr.  This is a
      general API improvement.
      
      Although it would be possible to do this one function at a time, that
      would demand a quadratic amount of churn since many of these functions
      call each other.  Instead I've done everything as a block and just
      updated what was necessary.
      
      This is mostly mechanical fixes: adding and removing `*` and `&`
      operators.  The only non-mechanical change is to split
      ARMBaseInstrInfo::getOperandLatencyImpl out from
      ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
      `MachineInstr*` which it updated to the instruction bundle leader; now,
      the latter calls the former either with the same `MachineInstr&` or the
      bundle leader.
      
      As a side effect, this removes a bunch of MachineInstr* to
      MachineBasicBlock::iterator implicit conversions, a necessary step
      toward fixing PR26753.
      
      Note: I updated WebAssembly, Lanai, and AVR (despite being
      off-by-default) since it turned out to be easy.  I couldn't run tests
      for AVR since llc doesn't link with it turned on.
      
      llvm-svn: 274189
  7. Apr 23, 2016
  8. Apr 22, 2016
  9. Apr 21, 2016
    • [MachineBasicBlock] Make the pass argument truly mandatory when · 23341a84
      Quentin Colombet authored
      splitting edges.
      
      MachineBasicBlock::SplitCriticalEdge crashes if nullptr is passed for
      the Pass argument. Disallow that by turning this argument into a
      reference.
      The alternative would have been to make the Pass a truly optional
      argument. Although that is easy to do, I was afraid that users relying
      on it would not be aware that the liveness information, dominator tree,
      and such would silently be broken.
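
The pointer-versus-reference choice can be sketched as follows. The `AnalysisUpdater` type and both function names are hypothetical stand-ins, not LLVM's API; the point is only that a reference parameter moves the "must not be null" requirement from a runtime check into the signature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the pass object that keeps analyses up to date.
struct AnalysisUpdater {
  std::size_t edgesSplit = 0;
  void noteSplitEdge() { ++edgesSplit; }
};

// Pointer form: the signature accepts nullptr, so the requirement can only
// be enforced at runtime (here with an assertion that aborts on misuse).
void splitEdgeWithPointer(AnalysisUpdater *updater) {
  assert(updater != nullptr && "updater is mandatory");
  updater->noteSplitEdge();
}

// Reference form: callers must supply an object, so the analyses are
// always updated and no null check is needed.
void splitEdgeWithReference(AnalysisUpdater &updater) {
  updater.noteSplitEdge();
}
```

With the reference form, a caller that has no pass object cannot compile the call at all, which is the behaviour the commit message argues for.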
      
      llvm-svn: 267052
    • Initial implementation of optimization bisect support. · f0f27929
      Andrew Kaylor authored
      This patch implements an optimization bisect feature, which allows optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

      The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an additional OptBisect method, but this is not yet used.
      
      The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.
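
The counting scheme behind a bisect limit can be sketched as below. This is a simplified, hypothetical model and not LLVM's OptBisect class: `BisectCounter`, `shouldSkipPass`, and the convention that a limit of -1 means "no limit" are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified model of a bisect counter.
class BisectCounter {
public:
  // In this sketch a limit of -1 means "no limit".
  explicit BisectCounter(int limit) : limit_(limit) { assert(limit >= -1); }

  // Numbers each skippable pass in execution order and reports whether
  // that pass is still within the limit.
  bool shouldRun() {
    ++lastIndex_;
    return limit_ == -1 || lastIndex_ <= limit_;
  }

  int passesSeen() const { return lastIndex_; }

private:
  int limit_;
  int lastIndex_ = 0;
};

// Combined check a skippable pass could make: skip when the function is
// marked optnone or when the bisect limit has been exceeded.
bool shouldSkipPass(BisectCounter &bisect, bool functionIsOptNone) {
  // Design choice of this sketch: always advance the counter, so the
  // numbering does not depend on which functions are optnone.
  bool allowed = bisect.shouldRun();
  return functionIsOptNone || !allowed;
}
```

Bisecting then amounts to rerunning the compilation with different limits until the first pass whose execution introduces the failure is found.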
      
      Differential Revision: http://reviews.llvm.org/D19172
      
      llvm-svn: 267022
  10. Mar 30, 2016
  11. Mar 09, 2016
  12. Feb 18, 2016
  13. Jan 20, 2016
    • [MachineSink] Don't break ImplicitNulls · 16901a3e
      Sanjoy Das authored
      Summary:
      This teaches MachineSink to not sink instructions that might break the
      implicit null check optimization that runs later.  This should not
      affect frontends that do not use implicit null checks.
      
      Reviewers: aadg, reames, hfinkel, atrick
      
      Subscribers: majnemer, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D14632
      
      llvm-svn: 258254
  14. Oct 09, 2015
  15. Sep 09, 2015
    • [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible · 7b560d40
      Chandler Carruth authored
      with the new pass manager, and no longer relying on analysis groups.
      
      This builds essentially a ground-up new AA infrastructure stack for
      LLVM. The core ideas are the same that are used throughout the new pass
      manager: type erased polymorphism and direct composition. The design is
      as follows:
      
      - FunctionAAResults is a type-erasing alias analysis results aggregation
        interface to walk a single query across a range of results from
        different alias analyses. Currently this is function-specific as we
        always assume that aliasing queries are *within* a function.
      
      - AAResultBase is a CRTP utility providing stub implementations of
        various parts of the alias analysis result concept, notably in several
        cases in terms of other more general parts of the interface. This can
        be used to implement only a narrow part of the interface rather than
        the entire interface. This isn't really ideal; this logic should be
        hoisted into FunctionAAResults as currently it will cause
        a significant amount of redundant work, but it faithfully models the
        behavior of the prior infrastructure.
      
      - All the alias analysis passes are ported to be wrapper passes for the
        legacy PM and new-style analysis passes for the new PM with a shared
        result object. In some cases (most notably CFL), this is an extremely
        naive approach that we should revisit when we can specialize for the
        new pass manager.
      
      - BasicAA has been restructured to reflect that it is much more
        fundamentally a function analysis because it uses dominator trees and
        loop info that need to be constructed for each function.
      
      All of the references to getting alias analysis results have been
      updated to use the new aggregation interface. All the preservation and
      other pass management code has been updated accordingly.
      
      The way the FunctionAAResultsWrapperPass works is to detect the
      available alias analyses when run, and add them to the results object.
      This means that we should be able to continue to respect when various
      passes are added to the pipeline, for example adding CFL or adding TBAA
      passes should just cause their results to be available and to get folded
      into this. The exception to this rule is BasicAA which really needs to
      be a function pass due to using dominator trees and loop info. As
      a consequence, the FunctionAAResultsWrapperPass directly depends on
      BasicAA and always includes it in the aggregation.
      
      This has significant implications for preserving analyses. Generally,
      most passes shouldn't bother preserving FunctionAAResultsWrapperPass
      because rebuilding the results just updates the set of known AA passes.
      The exceptions to this rule are LoopPass instances, which need to preserve
      all the function analyses that the loop pass manager will end up
      needing. This means preserving both BasicAAWrapperPass and the
      aggregating FunctionAAResultsWrapperPass.
      
      Now, when preserving an alias analysis, you do so by directly preserving
      that analysis. This is only necessary for non-immutable-pass-provided
      alias analyses though, and there are only three of interest: BasicAA,
      GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
      preserved when needed because it (like DominatorTree and LoopInfo) is
      marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
      set everywhere we previously were preserving all of AliasAnalysis, and
      I've added SCEVAA in the intersection of that with where we preserve
      SCEV itself.
      
      One significant challenge to all of this is that the CGSCC passes were
      actually using the alias analysis implementations by taking advantage of
      a pretty amazing set of loopholes in the old pass manager's analysis
      management code which allowed analysis groups to slide through in many
      cases. Moving away from analysis groups makes this problem much more
      obvious. To fix it, I've leveraged the flexibility the design of the new
      PM components provides to just directly construct the relevant alias
      analyses for the relevant functions in the IPO passes that need them.
      This is a bit hacky, but should go away with the new pass manager, and
      is already in many ways cleaner than the prior state.
      
      Another significant challenge is that various facilities of the old
      alias analysis infrastructure just don't fit any more. The most
      significant of these is the alias analysis 'counter' pass. That pass
      relied on the ability to snoop on AA queries at different points in the
      analysis group chain. Instead, I'm planning to build printing
      functionality directly into the aggregation layer. I've not included
      that in this patch merely to keep it smaller.
      
      Note that all of this needs a nearly complete rewrite of the AA
      documentation. I'm planning to do that, but I'd like to make sure the
      new design settles, and to flesh out a bit more of what it looks like in
      the new pass manager first.
      
      Differential Revision: http://reviews.llvm.org/D12080
      
      llvm-svn: 247167
  16. Aug 28, 2015
  17. Jun 16, 2015
  18. Jun 15, 2015
    • [MachineSink] Improve runtime performance. NFC. · d8673edc
      Arnaud A. de Grandmaison authored
      This patch fixes a compile-time issue that arises when MachineSink faces
      PHIs with a huge number of operands. This can happen, for example, in
      goto-table-based interpreters, where some basic blocks can have several
      such PHIs, each with several hundred operands. MachineSink was spending
      significant time rebuilding and re-sorting the list of successors of
      the current MachineBasicBlock. The computed and sorted successor list of
      the current MachineBasicBlock is now cached.
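
The caching idea can be sketched as a memoized lookup. The `Block` struct, the `SuccessorCache` class, and the choice of loop depth as the sort key are assumptions for illustration; the actual pass works on MachineBasicBlock objects and may order successors differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical block model: a block's id is its index in the vector.
struct Block {
  unsigned loopDepth;
  std::vector<std::size_t> successors;
};

class SuccessorCache {
public:
  explicit SuccessorCache(const std::vector<Block> &blocks) : blocks_(blocks) {}

  // Returns the successors of `blockId`, shallowest loop depth first.
  // The copy and sort run only on the first query for each block.
  const std::vector<std::size_t> &sortedSuccessors(std::size_t blockId) {
    assert(blockId < blocks_.size());
    auto it = cache_.find(blockId);
    if (it != cache_.end())
      return it->second;

    std::vector<std::size_t> succs = blocks_[blockId].successors;
    std::stable_sort(succs.begin(), succs.end(),
                     [this](std::size_t a, std::size_t b) {
                       return blocks_[a].loopDepth < blocks_[b].loopDepth;
                     });
    ++sortsPerformed_;
    return cache_.emplace(blockId, std::move(succs)).first->second;
  }

  unsigned sortsPerformed() const { return sortsPerformed_; }

private:
  const std::vector<Block> &blocks_; // must outlive the cache
  std::map<std::size_t, std::vector<std::size_t>> cache_;
  unsigned sortsPerformed_ = 0;
};
```

A cache like this has to be discarded whenever the CFG changes, since a stale successor list would be worse than a slow one.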
      
      llvm-svn: 239720
  19. Jun 01, 2015
  20. May 19, 2015
  21. May 16, 2015
    • MachineSink: Collect registers before clearing their killflags. · 352b89c4
      Matthias Braun authored
      Currently, whenever we sink an instruction, we call clearKillFlags on
      every use of every use operand of that instruction. This involves a lot
      of duplication and therefore a compile-time penalty.

      This patch first collects all the registers of interest and then calls
      clearKillFlags for them together at the end, so clearKillFlags runs only
      once per register and the duplication is avoided.
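
The difference between the two schemes can be sketched with a counter. `Register`, `Instr`, and `KillFlagClearer` are hypothetical stand-ins for the real machine IR types; the counter only records how often the per-register clearing walk would be triggered.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins: a register is an unsigned id, and an instruction
// is just the list of registers it uses.
using Register = unsigned;
struct Instr {
  std::vector<Register> uses;
};

// Counts how often the (expensive) per-register clearing walk is triggered.
struct KillFlagClearer {
  unsigned calls = 0;
  void clearKillFlags(Register) { ++calls; }
};

// Naive scheme: clear for every use operand of every sunk instruction.
void clearPerOperand(const std::vector<Instr> &sunk, KillFlagClearer &c) {
  for (const Instr &mi : sunk)
    for (Register r : mi.uses)
      c.clearKillFlags(r);
}

// Batched scheme: collect the registers first, then clear once per register.
void clearBatched(const std::vector<Instr> &sunk, KillFlagClearer &c) {
  std::set<Register> regsToClear;
  for (const Instr &mi : sunk)
    regsToClear.insert(mi.uses.begin(), mi.uses.end());
  for (Register r : regsToClear)
    c.clearKillFlags(r);
}
```

When several sunk instructions share operands, the batched version makes one call per distinct register instead of one per operand.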
      
      Patch by Lawrence Hu!
      
      Differential Revision: http://reviews.llvm.org/D9719
      
      llvm-svn: 237510
  22. May 08, 2015
  23. Dec 04, 2014
    • Use DomTree in MachineSink to sink over diamonds. · d06de4b9
      Patrik Hagglund authored
      According to a previous FIXME comment we now not only look at MBB
      successors, but also handle code sinking past them:
      
        x = computation
        if () {} else {}
        use x
      
      The instruction could be sunk over the whole diamond for the
      if/then/else (or loop, etc), allowing it to be sunk into other blocks
      after that.
      
      Modified the test added in r204522, since one fewer spill is now present.
      
      Minor fixes in comments.
      
      Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.
      
      llvm-svn: 223350
  24. Nov 19, 2014
  25. Oct 15, 2014
    • [MachineSink] Use the real post dominator tree · 2954280f
      Jingyue Wu authored
      Summary:
      Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
      isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
      The old heuristics caused instructions to sink unnecessarily, and might create
      register pressure.
      
      This is the second try of the fix. The first one (D4814) caused a performance
      regression due to failing to sink instructions out of loops (PR21115). This
      patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
      one regardless of whether the target block post-dominates the source.
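
One way to read the rule in this summary is as a small predicate. This is a hedged sketch of that reading, not the code in the patch: `SinkCandidate` and `isProfitableToSink` are hypothetical names, and the inputs are assumed to come from post-dominator tree and loop info queries.

```cpp
#include <cassert>

// Hypothetical summary of the facts the decision needs.
struct SinkCandidate {
  bool targetPostDominatesSource; // result of a post-dominator tree query
  unsigned sourceLoopDepth;       // loop depth of the block holding the instruction
  unsigned targetLoopDepth;       // loop depth of the candidate successor
};

bool isProfitableToSink(const SinkCandidate &c) {
  // Leaving a deeper loop for a shallower one is treated as a win
  // regardless of post-dominance.
  if (c.targetLoopDepth < c.sourceLoopDepth)
    return true;
  // Otherwise, a post-dominating target executes whenever the source does,
  // so moving the instruction there saves no work and may only lengthen
  // live ranges.
  return !c.targetPostDominatesSource;
}
```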
      
      Thanks Alexey Volkov for reporting PR21115! 
      
      Test Plan:
      Added a NVPTX codegen test to verify that our change prevents the backend from
      over-sinking. It also shows the unnecessary register pressure caused by
      over-sinking.
      
      Added an X86 test to verify we can sink instructions out of loops regardless of
      the dominance relationship. This test is reduced from Alexey's test in PR21115.
      
      Updated an affected test in X86.
      
      Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
      performance. Results are attached separately in the review thread.
      
      Reviewers: Jiangning, resistor, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
      
      Differential Revision: http://reviews.llvm.org/D5633
      
      llvm-svn: 219773
  26. Oct 14, 2014
  27. Oct 01, 2014
  28. Sep 26, 2014
    • [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo · d04f7596
      Bruno Cardoso Lopes authored
      Machine Sink uses loop depth information to select between successor BBs to
      sink machine instructions into, where BBs at smaller loop depths are
      preferable.  This patch adds support for choosing between successors by using
      profile information from BlockFrequencyInfo instead, whenever the information
      is available.
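
The selection described above can be sketched as a comparator that switches on profile availability. The `Successor` struct and `sortByPreference` are hypothetical; the sketch assumes that a lower block frequency marks the preferred (colder) sink target, with loop depth as the fallback key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical successor description.
struct Successor {
  int id;
  unsigned loopDepth;
  std::uint64_t frequency; // only meaningful when profile data is available
};

// Orders candidates so that the preferred sink target comes first.
void sortByPreference(std::vector<Successor> &succs, bool haveProfile) {
  std::stable_sort(succs.begin(), succs.end(),
                   [haveProfile](const Successor &a, const Successor &b) {
                     if (haveProfile)
                       return a.frequency < b.frequency; // colder block first
                     return a.loopDepth < b.loopDepth;   // shallower loop first
                   });
}
```

The two keys can disagree: a block at a greater loop depth may still be colder according to the profile, which is the case where profile data changes the decision.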
      
      Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
      execution speedup on average on x86-64 darwin.
      
      <rdar://problem/18021659>
      
      llvm-svn: 218472
  29. Sep 09, 2014
    • [MachineSinking] Conservatively clear kill flags after coalescing. · 57d315b7
      Patrik Hagglund authored
      This solves the problem of having a kill flag inside a loop
      with a definition of the register prior to the loop:
      
      %vreg368<def> ...
      
      Inside loop:
      
              %vreg520<def> = COPY %vreg368
              %vreg568<def,tied1> = add %vreg341<tied0>, %vreg520<kill>
      
      => was coalesced into =>
      
              %vreg568<def,tied1> = add %vreg341<tied0>, %vreg368<kill>
      
      MachineVerifier then complained:
      *** Bad machine code: Virtual register killed in block, but needed live out. ***
      
      The kill flag for %vreg368 is incorrect, and is cleared by this patch.
      
      This is similar to the clearing done at the end of
      MachineSinking::SinkInstruction().
      
      Patch provided by Jonas Paulsson.
      
      Reviewed by Quentin Colombet and Juergen Ributzka.
      
      llvm-svn: 217427
  30. Sep 04, 2014
  31. Sep 01, 2014
    • [MachineSink] Use the real post dominator tree · 5208cc5d
      Jingyue Wu authored
      Summary:
      Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
      in isPostDominatedBy, use the real MachinePostDominatorTree. The old
      heuristics caused instructions to sink unnecessarily, and might create
      register pressure.
      
      Test Plan:
      Added a NVPTX codegen test to verify that our change is in effect. It also
      shows the unnecessary register pressure caused by over-sinking. Updated
      affected tests in AArch64 and X86.
      
      Reviewers: eliben, meheff, Jiangning
      
      Reviewed By: Jiangning
      
      Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D4814
      
      llvm-svn: 216862
  32. Aug 30, 2014
    • [MachineSinking] Clear kill flag of all operands at all their uses. · 00d78221
      Juergen Ributzka authored
      When sinking an instruction it might be moved past the original last use of one
      of its operands. This last use has the kill flag set and the verifier will
      obviously complain about this.
      
      Before Machine Sinking (AArch64):
      %vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
      %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
      ...
      
      After Machine Sinking:
      %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
      ...
      %vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
      
      This fix clears all the kill flags in all instructions that use the same operands
      as the instruction that is being sunk.
      
      This fixes rdar://problem/18180996.
      
      llvm-svn: 216803
  33. Aug 12, 2014
    • [MachineSink] Improve the compile time by preserving the dominance information · 5cded89d
      Quentin Colombet authored
      as long as possible.
      
      ** Context **
      
      Each time the dominance information is modified, the dominator tree analysis
      switches to a slow query mode. After a few queries without any modification of
      the dominator tree, it performs an expensive update of its internal structure to
      provide fast queries again.
      
      ** Problem **
      
      Prior to this patch, the MachineSink pass was splitting the critical edges on
      demand while relying heavily on the dominator tree information. In some cases,
      this leads to pathological behavior where:
      - We end up in the slow query mode right after splitting an edge.
      - We update the dominance information.
      - We break the dominance information again, thus ending up in the slow query
        mode and so on.
      
      ** Proposed Solution **
      
      To mitigate this effect, this patch postpones all edge splitting to
      the end of each iteration of the main loop.
      The benefits are:
      - The dominance information is valid for the lifetime of an iteration.
      - This simplifies the code as we do not have to specially treat instructions that
        are sunk on critical edges. Indeed, the related block will be available
        through the next iteration.

      The downside is that when edge splitting is required, this incurs an additional
      iteration of the main loop compared to the previous scheme.
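
The postponement scheme can be sketched with a simplified model. `DomTreeModel` and `DeferredSplitter` are hypothetical, and the model is deliberately cruder than the behaviour described under Context: here any query on a stale tree triggers one expensive rebuild.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Edge = std::pair<int, int>; // (from block id, to block id)

// Hypothetical model of the dominator tree's validity.
struct DomTreeModel {
  bool upToDate = true;
  unsigned rebuilds = 0;
  void invalidate() { upToDate = false; }
  // In this model, a query on a stale tree forces an expensive rebuild.
  void query() {
    if (!upToDate) {
      ++rebuilds;
      upToDate = true;
    }
  }
};

struct DeferredSplitter {
  std::set<Edge> toSplit;

  // During an iteration: only record the edge, so the tree stays valid.
  void postpone(Edge e) { toSplit.insert(e); }

  // At the end of the iteration: split every recorded edge and invalidate
  // the tree once. Returns the number of edges split.
  unsigned flush(DomTreeModel &dt) {
    unsigned n = static_cast<unsigned>(toSplit.size());
    if (n != 0)
      dt.invalidate();
    toSplit.clear();
    return n;
  }
};
```

Splitting on demand alternates invalidation and queries, paying for a rebuild each time; postponing the splits batches the invalidation so that a single rebuild follows the iteration.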
      
      ** Performance **
      
      Thanks to this patch, the motivating example compiles in 6+ minutes instead of
      10+ minutes. No test case added, as the motivating example has nothing special
      about it other than being huge!
      
      I have measured only noise for both the compile time and the runtime on the llvm
      test-suite + SPECs with Os and O3.
      
      Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also
      uses the dominance information and therefore, hits this problem. A subsequent
      patch will address that.
      
      <rdar://problem/17894619>
      
      llvm-svn: 215410
  34. Aug 04, 2014
  35. Jul 29, 2014
  36. Apr 22, 2014
    • [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE · 1b9dde08
      Chandler Carruth authored
      define below all header includes in the lib/CodeGen/... tree. While the
      current modules implementation doesn't check for this kind of ODR
      violation yet, it is likely to grow support for it in the future. It
      also removes one layer of macro pollution across all the included
      headers.
      
      Other sub-trees will follow.
      
      llvm-svn: 206837
  37. Apr 14, 2014
  38. Mar 31, 2014