  1. Sep 14, 2013
    • Remove the long, long defunct IR block placement pass. · ebeac5cb
      Chandler Carruth authored
      This pass was based on the previous (essentially unused) profiling
      infrastructure and the assumption that by ordering the basic blocks at
      the IR level in a particular way, the correct layout would happen in the
      end. This sometimes worked, and mostly didn't. It also was a really
      naive implementation of the classical paper that dates from when branch
      predictors were primarily directional and when loop structure wasn't
      commonly available. It also didn't account for non-fallthrough branches
      and other machine-level details.
      
      Anyways, for all of these reasons and more, I wrote
      MachineBlockPlacement, which completely supersedes this pass. It both
      uses modern profile information infrastructure, and actually works. =]
      
      llvm-svn: 190748
  2. Sep 11, 2013
  3. Sep 10, 2013
  4. Sep 06, 2013
  5. Aug 29, 2013
    • Revert: r189565 - Add getUnrollingPreferences to TTI · 8e83820a
      Hal Finkel authored
      Revert unintentional commit (of an unreviewed change).
      
      Original commit message:
      
      Add getUnrollingPreferences to TTI
      
      Allow targets to customize the default behavior of the generic loop unrolling
      transformation. This will be used by the PowerPC backend when targeting the A2
      core (which is in-order with a deep pipeline), and using more aggressive
      defaults is important.
      
      llvm-svn: 189566
    • Add getUnrollingPreferences to TTI · 63e6c0e9
      Hal Finkel authored
      Allow targets to customize the default behavior of the generic loop unrolling
      transformation. This will be used by the PowerPC backend when targeting the A2
      core (which is in-order with a deep pipeline), and using more aggressive
      defaults is important.
      
      llvm-svn: 189565
  6. Aug 23, 2013
  7. Aug 14, 2013
    • Revert r187191, which broke opt -mem2reg on the testcases included in PR16867. · c7776f73
      Nick Lewycky authored
      However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
      when SROA started sending more things directly down the PromoteMemToReg path.
      
      In order to revert r187191, I also revert dependent revisions r187296, r187322
      and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
      of them should get added for both mem2reg and sroa when this revert gets
      unreverted.
      
      llvm-svn: 188327
  8. Aug 13, 2013
  9. Aug 11, 2013
    • Re-instate r187323 which fast-tracks promotable allocas as soon as the · d7cd7e36
      Chandler Carruth authored
      SROA-based analysis has enough information. This should work now that
      both mem2reg *and* the SSAUpdater-based AllocaPromoter have been updated
      to be able to promote the types of allocas that the SROA analysis
      detects.
      
      I've included tests for the AllocaPromoter that were only possible to
      write once we fast-tracked promotable allocas without rewriting them.
      This includes a test both for r187347 and r188145.
      
      Original commit log for r187323:
      """
      Now that mem2reg understands how to cope with a slightly wider set of uses of
      an alloca, we can pre-compute promotability while analyzing an alloca for
      splitting in SROA. That lets us short-circuit the common case of a bunch of
      trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for
      typical frontend-generated IR sequences I'm seeing. It gets the new SROA to
      within 20% of ScalarRepl for such code. My current benchmark for these numbers
      is PR15412, but it fits the general pattern of IR emitted by Clang so it should
      be widely applicable.
      """
      
      llvm-svn: 188146
    • Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with · c17283b4
      Chandler Carruth authored
      the more general set of patterns that are now handled by mem2reg and that we
      can detect quickly while doing SROA's initial analysis. Notably, this allows it
      to promote through no-op bitcast and GEP sequences. A core part of the
      SSAUpdater approach is the ability to test whether a particular instruction is
      part of the set being promoted. Testing this becomes significantly more complex
      in the world where the operand to every load and store isn't the alloca itself.
      I ended up using the approach of walking up the def-chain until we find the
      alloca. I benchmarked this against keeping a set of pointer operands and
      keeping a set of the loads and stores we care about, and this one seemed faster
      although the difference was very small.
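
The def-chain walk described above can be sketched with a toy node type. This is only an illustration under stated assumptions, not the code from the commit: `Kind`, `Node`, `findUnderlyingAlloca`, and `isPartOfPromotedSet` are hypothetical names, none of them are LLVM classes, and every bitcast or GEP in the model is assumed to be a no-op.

```cpp
#include <cassert>

// Toy stand-in for an IR value. This is NOT llvm::Value; all names here
// are hypothetical and exist only for illustration.
enum class Kind { Alloca, BitCast, GEP, Other };

struct Node {
  Kind kind;
  const Node *operand; // pointer operand for BitCast/GEP, nullptr otherwise
};

// Walk up the def-chain from the pointer operand of a load or store,
// looking through (assumed no-op) bitcasts and GEPs until an alloca is
// reached or an unrelated value stops the walk.
const Node *findUnderlyingAlloca(const Node *ptr) {
  while (ptr != nullptr) {
    switch (ptr->kind) {
    case Kind::Alloca:
      return ptr;
    case Kind::BitCast:
    case Kind::GEP:
      ptr = ptr->operand; // step one link up the chain
      break;
    case Kind::Other:
      return nullptr;
    }
  }
  return nullptr;
}

// A load or store belongs to the set being promoted when its pointer
// operand ultimately resolves to the alloca under promotion.
bool isPartOfPromotedSet(const Node *ptr, const Node *alloca) {
  assert(alloca != nullptr && alloca->kind == Kind::Alloca);
  return findUnderlyingAlloca(ptr) == alloca;
}
```

The alternatives mentioned above (a set of pointer operands, or a set of the loads and stores themselves) would trade this walk for a hash lookup plus the cost of building the set.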
      
      No test case yet because currently the rewriting always "fixes" the inputs to
      not require this. The next patch which re-enables early promotion of easy cases
      in SROA will include a test case that specifically exercises this aspect of the
      alloca promoter.
      
      llvm-svn: 188145
    • Reformat some bits of AllocaPromoter and simplify the name and type of · 45b136f4
      Chandler Carruth authored
      our visiting data structures in the AllocaPromoter/SSAUpdater path of
      SROA. Also shift the order of clears around to be more consistent.
      
      No functionality changed here, this is just a cleanup.
      
      llvm-svn: 188144
  10. Aug 10, 2013
  11. Aug 07, 2013
    • JumpThreading: Turn a select instruction into branching if it allows to thread... · 6a4976d3
      Benjamin Kramer authored
      JumpThreading: Turn a select instruction into branching if doing so allows threading one half of the select.
      
      This is a common pattern coming out of simplifycfg generating gross code.
      
      a:                                       ; preds = %entry
        %sel = select i1 %cmp1, double %add, double 0.000000e+00
        br label %b
      
      b:
        %cond5 = phi double [ %sel, %a ], [ %sub, %entry ]
        %cmp6 = fcmp oeq double %cond5, 0.000000e+00
        br i1 %cmp6, label %if.then, label %if.end
      
      becomes
      
      a:
        br i1 %cmp1, label %b, label %if.then
      
      b:
        %cond5 = phi double [ %sub, %entry ], [ %add, %a ]
        %cmp6 = fcmp oeq double %cond5, 0.000000e+00
        br i1 %cmp6, label %if.then, label %if.end
      
      This skips block b completely when %cmp1 is false.
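
The equivalence of the two IR snippets above can be checked with a small model. This is a sketch under stated assumptions: it only models control flow arriving from block a, and `Dest`, `beforeFromA`, `afterFromA`, and `checkEquivalent` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Which successor of block b (or of the threaded edge) control reaches.
enum class Dest { IfThen, IfEnd };

// Before the transform: block a computes the select, then block b
// compares the phi value against 0.0.
Dest beforeFromA(bool cmp1, double add) {
  double sel = cmp1 ? add : 0.0; // %sel = select i1 %cmp1, %add, 0.0
  double cond5 = sel;            // phi value incoming from %a
  return cond5 == 0.0 ? Dest::IfThen : Dest::IfEnd; // fcmp oeq + br
}

// After the transform: block a branches on %cmp1. The false edge is
// known to produce 0.0, so it is threaded straight to if.then.
Dest afterFromA(bool cmp1, double add) {
  if (!cmp1)
    return Dest::IfThen;         // threaded edge, block b is skipped
  double cond5 = add;            // phi value is now %add from %a
  return cond5 == 0.0 ? Dest::IfThen : Dest::IfEnd;
}

// Both versions must agree for every input.
void checkEquivalent(bool cmp1, double add) {
  assert(beforeFromA(cmp1, add) == afterFromA(cmp1, add));
}
```

Only the false arm of the select is a constant, so only that half can be threaded; the true arm still needs the comparison in block b.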
      
      llvm-svn: 187880
  12. Aug 06, 2013
  13. Jul 29, 2013
    • Teach the AllocaPromoter which is wrapped around the SSAUpdater · cd7c8cdf
      Chandler Carruth authored
      infrastructure to do promotion without a domtree the same smarts about
      looking through GEPs, bitcasts, etc., that I just taught mem2reg about.
      This way, if SROA chooses to promote an alloca which still has some
      noisy instructions this code can cope with them.
      
      I've not used as principled of an approach here for two reasons:
      1) This code doesn't really need it as we were already set up to zip
         through the instructions used by the alloca.
      2) I view the code here as more of a hack, and hopefully a temporary one.
      
      The SSAUpdater path in SROA is a real sore point for me. It doesn't make
      a lot of architectural sense for many reasons:
      - We're likely to end up needing the domtree anyways in a subsequent
        pass, so why not compute it earlier and use it?
      - In the future we'll likely end up needing the domtree for parts of the
        inliner itself.
      - If we need to we could teach the inliner to preserve the domtree. Part
        of the re-work of the pass manager will allow this to be very powerful
        even in large SCCs with many functions.
      - Ultimately, computing a domtree has gotten significantly faster since
        the original SSAUpdater-using code went into ScalarRepl. We no longer
        use domfrontiers, and much of domtree is lazily done based on queries
        rather than eagerly.
      - At this point keeping the SSAUpdater-based promotion saves a total of
        0.7% on a build of the 'opt' tool for me. That's not a lot of
        performance given the complexity!
      
      So I'm leaving this a bit ugly in the hope that eventually we just
      remove all of this nonsense.
      
      I can't even readily test this because this code isn't reachable except
      through SROA. When I re-instate the patch that fast-tracks allocas
      already suitable for promotion, I'll add a testcase there that failed
      before this change. Before that, SROA will fix any test case I give it.
      
      llvm-svn: 187347
  14. Jul 28, 2013
  15. Jul 27, 2013
  16. Jul 24, 2013
    • TRE: Move class into anonymous namespace. · 328da33d
      Benjamin Kramer authored
      While there, shrink a dangerously large SmallPtrSet.
      
      llvm-svn: 187050
    • Fix a problem I introduced in r187029 where we would over-eagerly · 58e25d39
      Chandler Carruth authored
      schedule an alloca for another iteration in SROA. This only showed up
      with a mixture of promotable and unpromotable selects and phis. Added
      a test case for this.
      
      llvm-svn: 187031
    • Fix PR16687 where we were incorrectly promoting an alloca that had · 83ea195d
      Chandler Carruth authored
      pending speculation for a phi node. The problem here is that we were
      using growth of the speculation set as an indicator of whether
      speculation would occur, and if the phi node is already in the set we
      don't see it grow. This is a symptom of the fact that this signal is
      a total hack.
      
      Unfortunately, I couldn't really come up with a non-hacky way of
      signaling that promotion remains valid *after* speculation occurs, such
      that we only speculate when all else looks good for promotion. In the
      end, I went with at least a much more explicit approach of doing the
      work of queuing inside the phi and select processing and setting
      a preposterously named flag to convey that we're in the special state of
      requiring speculating before promotion.
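
The pitfall described above, inferring pending speculation from set growth, can be reproduced in a few lines. This is a toy sketch, not the SROA code: `pendingByGrowth`, `PromotionState`, `queuePhi`, and the flag name are hypothetical, and phi nodes are modeled as plain integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Fragile signal: decide that speculation is pending by checking whether
// inserting the phi made the set grow.
bool pendingByGrowth(std::set<int> &speculatable, int phiId) {
  const std::size_t before = speculatable.size();
  speculatable.insert(phiId);
  // If phiId was already queued, the size is unchanged and this returns
  // false even though speculation for that phi is still pending.
  return speculatable.size() > before;
}

// More explicit approach: do the queuing where the phi is processed and
// record the state in a flag that does not depend on prior membership.
struct PromotionState {
  std::set<int> speculatable;
  bool needsSpeculationBeforePromotion = false; // hypothetical flag name

  void queuePhi(int phiId) {
    speculatable.insert(phiId);
    assert(speculatable.count(phiId) == 1);
    needsSpeculationBeforePromotion = true;
  }
};
```

With a phi already in the set, the growth check reports nothing pending, while the explicit flag is still set.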
      
      Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
      a testcase for this from a pretty giant, nasty assert in a big
      application. =] The testcase was excellent.
      
      llvm-svn: 187029
  17. Jul 23, 2013
  18. Jul 22, 2013
  19. Jul 20, 2013
  20. Jul 19, 2013