  1. Apr 16, 2012
    • Add a somewhat hacky heuristic to do something different from whole-loop · 8c0b41d6
      Chandler Carruth authored
      rotation. When there is a loop backedge which is an unconditional
      branch, we will end up with a branch somewhere no matter what. Try
      placing this backedge in a fallthrough position above the loop header as
      that will definitely remove at least one branch from the loop iteration,
      where whole loop rotation may not.
      
      I haven't seen any benchmarks where this is important but loop-blocks.ll
      tests for it, and so this will be covered when I flip the default.
      
      llvm-svn: 154812
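      A toy model of the branch-count argument above, in Python. None of
      this is LLVM code; branches_per_iteration and the block names H, B and
      X are hypothetical. It assumes a block with one successor costs a jump
      unless that successor is next in the layout, and a block with two
      successors costs one conditional branch if either successor falls
      through, otherwise a conditional branch plus a jump. Under that model,
      placing the unconditional latch B above the header H saves one branch
      per iteration:

```python
# Toy branch-cost model; not LLVM code. Block names are hypothetical.

def branches_per_iteration(layout, succs, path):
    """Count branch instructions executed on one trip around `path`.

    layout: blocks in final order; succs: block -> list of successors;
    path: the blocks one iteration visits, in order (wraps around).
    """
    pos = {b: i for i, b in enumerate(layout)}

    def fallthrough(b):
        i = pos[b] + 1
        return layout[i] if i < len(layout) else None

    total = 0
    for i, b in enumerate(path):
        nxt = path[(i + 1) % len(path)]
        s, ft = succs[b], fallthrough(b)
        if len(s) == 1:
            # Unconditional edge: free when it falls through, else one jump.
            total += 0 if s[0] == ft else 1
        elif ft in s:
            # One conditional branch; the other successor falls through.
            total += 1
        else:
            # Lowered as "br cond, s[0]; jmp s[1]".
            total += 1 if nxt == s[0] else 2
    return total


# H tests the exit condition; B is the latch with an unconditional backedge.
succs = {"H": ["X", "B"], "B": ["H"]}
path = ["H", "B"]
print(branches_per_iteration(["H", "B", "X"], succs, path))  # 2
print(branches_per_iteration(["B", "H", "X"], succs, path))  # 1
```

      The latch-first layout still needs one jump from the preheader into H,
      but that executes once on loop entry rather than on every iteration.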
    • Tweak the loop rotation logic to check whether the loop is naturally · 8c74c7b1
      Chandler Carruth authored
      laid out in a form with a fallthrough into the header and a fallthrough
      out of the bottom. In that case, leave the loop alone because any
      rotation will introduce unnecessary branches. If either side looks like
      it will require an explicit branch, then the rotation won't add any; do
      it to ensure the branch occurs outside of the loop (if possible) and
      maximize the benefit of the fallthrough in the bottom.
      
      llvm-svn: 154806
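      A minimal sketch of the "naturally laid out" check described above,
      assuming the loop's blocks are contiguous in the layout and that the
      header, preheader and successor map are already known. needs_rotation
      and its parameters are hypothetical, not LLVM's interface:

```python
# Toy check; not LLVM code. Assumes the loop's blocks are contiguous.

def needs_rotation(layout, header, preheader, loop_blocks, succs):
    """False if the preheader falls through into the header and the bottom
    loop block falls through to its exit; True otherwise."""
    in_loop = set(loop_blocks)
    order = [b for b in layout if b in in_loop]  # loop blocks, layout order
    top, bottom = order[0], order[-1]
    i_top, i_bottom = layout.index(top), layout.index(bottom)
    falls_into_header = (
        top == header
        and i_top > 0
        and layout[i_top - 1] == preheader
        and header in succs[preheader]
    )
    after = layout[i_bottom + 1] if i_bottom + 1 < len(layout) else None
    falls_out_of_bottom = after is not None and after in succs[bottom]
    return not (falls_into_header and falls_out_of_bottom)


# Natural layout: P -> H -> B, and B exits by falling through to X.
natural = {"P": ["H"], "H": ["B"], "B": ["H", "X"]}
print(needs_rotation(["P", "H", "B", "X"], "H", "P", ["H", "B"], natural))  # False

# Here H holds the exit, so the bottom block B needs an explicit backedge jump.
top_exit = {"P": ["H"], "H": ["B", "X"], "B": ["H"]}
print(needs_rotation(["P", "H", "B", "X"], "H", "P", ["H", "B"], top_exit))  # True
```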
    • Remove dead SD nodes after the combining pass. Fixes PR12201. · e0cf6397
      Hal Finkel authored
      llvm-svn: 154786
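      The commit message does not describe the mechanism. As a generic
      illustration of dead-node removal (a sketch, not the SelectionDAG
      implementation; remove_dead_nodes and the node names are
      hypothetical), a worklist can delete every node with no users and then
      revisit its operands, whose use counts just dropped:

```python
# Generic dead-node elimination; not SelectionDAG code.

def remove_dead_nodes(operands, root):
    """operands: node -> list of operand nodes (every node is a key).
    Returns the nodes that remain after deleting everything with no users,
    never deleting `root`."""
    uses = {n: 0 for n in operands}
    for ops in operands.values():
        for op in ops:
            uses[op] += 1
    alive = set(operands)
    worklist = [n for n in operands if uses[n] == 0 and n != root]
    while worklist:
        node = worklist.pop()
        alive.discard(node)
        for op in operands[node]:
            uses[op] -= 1  # deleting `node` drops one use of each operand
            if uses[op] == 0 and op != root:
                worklist.append(op)
    return alive


# "mul" was left behind by a combine and has no users.
operands = {"root": ["add"], "add": ["a", "b"], "a": [], "b": [],
            "mul": ["a", "c"], "c": []}
print(sorted(remove_dead_nodes(operands, "root")))  # ['a', 'add', 'b', 'root']
```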
    • Rewrite how machine block placement handles loop rotation. · ccc7e42b
      Chandler Carruth authored
      This is a complex change that resulted from a great deal of
      experimentation with several different benchmarks. The one which proved
      the most useful is included as a test case, but I don't know that it
      captures all of the relevant changes, as I didn't have specific
      regression tests for each; they were more the result of reasoning about
      what the old algorithm could possibly do wrong. I'm also failing at the
      moment to craft more targeted regression tests for these changes; if
      anyone has ideas, they would be welcome.
      
      The first big thing broken with the old algorithm is the idea that we
      can take a basic block which has a loop-exiting successor and a looping
      successor and use the looping successor as the layout top in order to
      get that particular block to be the bottom of the loop after layout.
      This happens to work in many cases, but not in all.
      
      The second big thing broken was that we didn't try to select the exit
      which fell into the nearest enclosing loop (to which we exit at all). As
      a consequence, even if the rotation worked perfectly, it would result in
      one of two bad layouts. Either the bottom of the loop would get
      fallthrough, skipping across a nearer enclosing loop and thereby making
      it discontiguous, or it would be forced to take an explicit jump over
      the nearest enclosing loop to reach its successor. The point of the
      rotation is to get fallthrough, so we need it to fall through to the
      nearest loop it can.
      
      The fix to the first issue is to actually lay out the loop from the loop
      header, and then rotate the loop such that the correct exiting edge can
      be a fallthrough edge. This is actually much easier than I anticipated
      because we can handle all the hard parts of finding a viable rotation
      before we do the layout. We just store that, and then rotate after
      layout is finished. No inner loops get split across the post-rotation
      backedge because we check for them when selecting the rotation.
      
      That fix exposed a latent problem with our exiting block selection --
      we should allow the backedge to point into the middle of some inner-loop
      chain as there is no real penalty to it, the whole point is that it
      *won't* be a fallthrough edge. This may have blocked the rotation
      entirely in some cases; I have no idea and no test case, as I've never
      seen it in practice. It was just noticed by inspection.
      
      Finally, all of these fixes, and studying the loops they produce,
      highlighted another problem: in rotating loops like this, we sometimes
      fail to align the destination of these backwards jumping edges. Fix this
      by actually walking the backwards edges rather than relying on LoopInfo.
      
      This fixes regressions on heapsort if block placement is enabled as well
      as lots of other cases where the previous logic would introduce an
      abundance of unnecessary branches into the execution.
      
      llvm-svn: 154783
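      A toy Python sketch of the new scheme: take the chain already laid out
      from the header, pick the exiting block whose exit target sits in the
      deepest (nearest) enclosing loop, skip any candidate whose rotation
      would split an inner loop across the backedge, then rotate. The
      function names and the depth and inner-loop maps are assumptions for
      illustration; the actual pass operates on its own BlockChain
      structures.

```python
# Toy sketch; not LLVM code. All names and maps here are hypothetical.

def select_rotation(chain, exits, exit_depth, inner_loop):
    """Pick the index of the block that should end up at the loop bottom.

    chain: loop blocks in layout order, starting at the header.
    exits: exiting block -> its exit target outside the loop.
    exit_depth: exit target -> loop depth (deeper = nearer enclosing loop).
    inner_loop: block -> inner-loop id; blocks not in an inner loop are absent.
    """
    best_depth, best_index = None, None
    for i, block in enumerate(chain):
        if block not in exits:
            continue
        new_top = chain[(i + 1) % len(chain)]
        loop_id = inner_loop.get(block)
        # Rotating here would split an inner loop across the backedge.
        if loop_id is not None and loop_id == inner_loop.get(new_top):
            continue
        depth = exit_depth[exits[block]]
        if best_depth is None or depth > best_depth:
            best_depth, best_index = depth, i
    return best_index


def rotate(chain, bottom_index):
    """Rotate the laid-out chain so chain[bottom_index] becomes the bottom."""
    if bottom_index is None:
        return list(chain)  # no viable rotation: keep header-first layout
    return chain[bottom_index + 1:] + chain[:bottom_index + 1]


chain = ["H", "A", "B", "C"]
exits = {"A": "X_near", "C": "X_far"}
depth = {"X_near": 1, "X_far": 0}  # X_near lies in the nearer enclosing loop
print(rotate(chain, select_rotation(chain, exits, depth, {})))
# ['B', 'C', 'H', 'A']
print(rotate(chain, select_rotation(chain, exits, depth, {"A": 1, "B": 1})))
# ['H', 'A', 'B', 'C']
```

      In the second call A and B share an inner loop, so rotating at A would
      split that loop across the backedge; the sketch falls back to C.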
  2. Apr 15, 2012
  3. Apr 14, 2012
    • misched: Added CanHandleTerminators. · 97d5b9cc
      Andrew Trick authored
      This is a special flag for targets that really want their block
      terminators in the DAG. The default scheduler cannot handle this
      correctly, so it becomes the specialized scheduler's responsibility to
      schedule terminators.
      
      llvm-svn: 154712
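      A toy illustration of what such a flag changes (hypothetical names,
      not the MachineScheduler API): by default the trailing terminators are
      kept out of the region handed to the scheduler; with the flag set they
      are included, and the scheduler must place them itself.

```python
# Toy sketch; not the MachineScheduler API. Instruction names are made up.

def scheduling_region(block, is_terminator, can_handle_terminators):
    """Split a block into (instructions given to the scheduler, fixed tail)."""
    if can_handle_terminators:
        return list(block), []
    cut = len(block)
    while cut > 0 and is_terminator(block[cut - 1]):
        cut -= 1
    return block[:cut], block[cut:]


def is_terminator(inst):
    return inst in {"br_cond", "jmp"}


block = ["load", "add", "store", "br_cond", "jmp"]
print(scheduling_region(block, is_terminator, False))
# (['load', 'add', 'store'], ['br_cond', 'jmp'])
print(scheduling_region(block, is_terminator, True))
# (['load', 'add', 'store', 'br_cond', 'jmp'], [])
```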
  4. Apr 13, 2012
    • Reduce malloc traffic in DwarfAccelTable · 330970d6
      Benjamin Kramer authored
      - Don't copy offsets into HashData, the underlying vector won't change once the table is finalized.
      - Allocate HashData and HashDataContents in a BumpPtrAllocator.
      - Allocate string map entries in the same allocator.
      - Random cleanups.
      
      llvm-svn: 154694
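      For readers unfamiliar with the allocator, a minimal bump-pointer arena
      in Python (a simplified stand-in, not llvm::BumpPtrAllocator;
      BumpArena and slab_size are hypothetical): each allocation rounds the
      current offset up to the requested alignment and advances it, starting
      a fresh slab when the current one is full, and there is no per-object
      free. Moving many small objects into one arena like this replaces many
      malloc calls with a few slab allocations.

```python
# Toy bump-pointer arena; not llvm::BumpPtrAllocator.

class BumpArena:
    def __init__(self, slab_size=4096):
        self.slab_size = slab_size
        self.slabs = [bytearray(slab_size)]
        self.offset = 0  # next free byte in the current (last) slab

    def allocate(self, size, align=8):
        """Return a writable view of `size` bytes; never freed individually."""
        if size > self.slab_size:
            raise ValueError("request larger than a slab")
        start = -(-self.offset // align) * align  # round up to alignment
        if start + size > self.slab_size:
            self.slabs.append(bytearray(self.slab_size))  # start a new slab
            start = 0
        self.offset = start + size
        return memoryview(self.slabs[-1])[start:start + size]


arena = BumpArena(slab_size=64)
a = arena.allocate(10)  # bytes 0..9 of slab 0
b = arena.allocate(10)  # offset 10 rounds up to 16
c = arena.allocate(60)  # 32 + 60 > 64, so a second slab is started
print(len(arena.slabs), arena.offset)  # 2 60
```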
  5. Apr 12, 2012
  6. Apr 11, 2012
  7. Apr 10, 2012
  8. Apr 09, 2012
  9. Apr 08, 2012
    • Silence sign-compare warning. · bb6ff087
      Benjamin Kramer authored
      llvm-svn: 154297
    • Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381
      Duncan Sands authored
      when -ffast-math is enabled, i.e. don't just always do it if the
      reciprocal can be formed exactly. There is already an IR level transform that does
      that, and it does it more carefully.
      
      llvm-svn: 154296
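      Why exactness matters, sketched in Python with IEEE doubles: 1/10 is
      not representable, so x * (1.0 / 10.0) can round differently from
      x / 10.0, while a power-of-two divisor has an exact reciprocal.
      reciprocal_is_exact and lower_fdiv_by_constant are hypothetical
      helpers, and the fast_math flag merely stands in for -ffast-math.

```python
import math


def reciprocal_is_exact(d):
    """True if 1/d is an exact normal double: d is a power of two in range."""
    m, e = math.frexp(d)  # d == m * 2**e with 0.5 <= abs(m) < 1
    return abs(m) == 0.5 and -1022 <= e <= 1023


def lower_fdiv_by_constant(x, d, fast_math):
    """Toy codegen rule: only rewrite x / d as x * (1 / d) under fast-math."""
    if fast_math:
        return x * (1.0 / d)
    return x / d


print(reciprocal_is_exact(4.0), reciprocal_is_exact(10.0))  # True False
print(lower_fdiv_by_constant(3.0, 10.0, fast_math=False))   # 0.3
print(lower_fdiv_by_constant(3.0, 10.0, fast_math=True))    # 0.30000000000000004
```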
    • Simplify code that tries to do vector extracts for shuffles when the mask... · c8e2d91a
      Craig Topper authored
      Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently.
      
      llvm-svn: 154295
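      A toy version of the range check described above, in Python. The
      names are hypothetical; it assumes a single input vector of in_width
      elements and uses -1 for undef mask elements. The start index is the
      smallest used element rounded down to a multiple of the mask width,
      and one check afterwards decides whether every used element fits in
      that window; any remaining reordering would then be a narrower shuffle
      of the extracted subvector.

```python
# Toy sketch; not LLVM code. Assumes one input vector and -1 for undef lanes.

def extract_start_index(mask, in_width):
    """Start of the aligned len(mask)-wide window of the input that holds
    every element the mask uses, or None if no such window exists."""
    out_width = len(mask)
    used = [m for m in mask if m >= 0]
    if not used:
        return 0  # all-undef mask: any window will do
    lo, hi = min(used), max(used)
    start = (lo // out_width) * out_width
    # One range check after computing start covers every case, including an
    # extract from the beginning (start == 0); no earlier min/max check needed.
    if hi - start >= out_width or start + out_width > in_width:
        return None
    return start


print(extract_start_index([4, 5, 6, 7], 8))    # 4
print(extract_start_index([-1, 5, -1, 7], 8))  # 4
print(extract_start_index([2, 3, 4, 5], 8))    # None (straddles two windows)
```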
    • Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb
      Chandler Carruth authored
      it in TargetLowering. There was already a FIXME about this location being
      odd. The interface is simplified as a consequence. This will also make
      it easier to change TLS models when compiling with PIE.
      
      llvm-svn: 154292
    • Remove an overzealous assert. The assert was trying to catch places · bed1abf9
      Chandler Carruth authored
      where a chain outside of the loop block-set ended up in the worklist for
      scheduling as part of the contiguous loop. However, asserting the first
      block in the chain is in the loop-set isn't a valid check -- we may be
      forced to drag a chain into the worklist due to one block in the chain
      being part of the loop even though the first block is *not* in the loop.
      This occurs when we have been forced to form a chain early due to
      unanalyzable branches.
      
      No test case here as I have no idea how to even begin reducing one, and
      it will be hopelessly fragile. We have to somehow end up with a loop
      header of an inner loop which is a successor of a basic block with an
      unanalyzable pair of branch instructions. Ow. Self-host triggers it so
      it is unlikely it will regress.
      
      This at least gets block placement back to passing selfhost and the test
      suite. There are still a lot of slowdowns that I don't like coming out of
      block placement, although there are now also a lot of speedups. =[ I'm
      seeing swings in both directions up to 10%. I'm going to try to find
      time to dig into this and see if we can turn this on for 3.1 as it does
      a really good job of cleaning up after some loops that degraded with the
      inliner changes.
      
      llvm-svn: 154287
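      A small Python sketch of why checking only the first block is wrong
      (hypothetical names; chains are tuples of block names). A chain enters
      the worklist if any of its blocks is in the loop, so a chain formed
      early around an unanalyzable branch can start with a block outside the
      loop:

```python
# Toy sketch; not LLVM code. Block and chain names are hypothetical.

def chains_for_loop(block_to_chain, loop_blocks):
    """Collect, once each, every chain that contains at least one loop block."""
    seen, worklist = set(), []
    for block in loop_blocks:
        chain = block_to_chain[block]
        if chain not in seen:
            seen.add(chain)
            worklist.append(chain)
    return worklist


# P ends in an unanalyzable branch pair, so P and the inner-loop header H were
# chained together early; P itself is outside the loop being laid out.
early = ("P", "H")
block_to_chain = {"P": early, "H": early, "B": ("B",)}
loop = ["H", "B"]
worklist = chains_for_loop(block_to_chain, loop)
print(worklist)  # [('P', 'H'), ('B',)]
# A check on each chain's first block, as the removed assert did, fails here:
print(all(chain[0] in loop for chain in worklist))  # False
```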
    • Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908
      Chandler Carruth authored
      debugging.
      
      llvm-svn: 154286
    • Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2
      Craig Topper authored
      Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. The same was already done for avx1.
      
      llvm-svn: 154272
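      A Python model of the semantics being mapped (hypothetical helpers on
      lists of elements): vinserti128 replaces the low or high 128-bit half
      of a 256-bit vector, selected by bit 0 of the immediate, which is an
      INSERT_SUBVECTOR at element index (imm8 & 1) times the number of
      elements per half.

```python
def insert_subvector(vec, sub, idx):
    """Return vec with vec[idx:idx + len(sub)] replaced by sub."""
    if idx % len(sub) != 0 or idx + len(sub) > len(vec):
        raise ValueError("subvector index must be an aligned, in-range index")
    return vec[:idx] + sub + vec[idx + len(sub):]


def vinserti128(src1, src2, imm8):
    """src1 is a 256-bit vector and src2 a 128-bit vector, as element lists."""
    half = len(src1) // 2
    if len(src2) != half:
        raise ValueError("src2 must be half the width of src1")
    return insert_subvector(src1, src2, (imm8 & 1) * half)


# Four 64-bit lanes in the 256-bit vector, two in the 128-bit one.
print(vinserti128([0, 1, 2, 3], [8, 9], 0))  # [8, 9, 2, 3]
print(vinserti128([0, 1, 2, 3], [8, 9], 1))  # [0, 1, 8, 9]
```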
  10. Apr 07, 2012