Skip to content
  1. Dec 02, 2011
  2. Dec 01, 2011
  3. Nov 29, 2011
  4. Nov 28, 2011
  5. Nov 27, 2011
    • Chandler Carruth's avatar
      Prevent rotating the blocks of a loop (and thus getting a backedge to be · 4f567207
      Chandler Carruth authored
      fallthrough) in cases where we might fail to rotate an exit to an outer
      loop onto the end of the loop chain.
      
      Having *some* rotation, but not performing this rotation, is the primary
      fix of thep performance regression with -enable-block-placement for
      Olden/em3d (a whopping 30% regression). Still working on reducing the
      test case that actually exercises this and the new rotation strategy out
      of this code, but I want to check if this regresses other test cases
      first as that may indicate it isn't the correct fix.
      
      llvm-svn: 145195
      4f567207
    • Chandler Carruth's avatar
      Take two on rotating the block ordering of loops. My previous attempt · 03adbd46
      Chandler Carruth authored
      was centered around the premise of laying out a loop in a chain, and
      then rotating that chain. This is good for preserving contiguous layout,
      but bad for actually making sane rotations. In order to keep it safe,
      I had to essentially make it impossible to rotate deeply nested loops.
      The information needed to correctly reason about a deeply nested loop is
      actually available -- *before* we layout the loop. We know the inner
      loops are already fused into chains, etc. We lose information the moment
      we actually lay out the loop.
      
      The solution was the other alternative for this algorithm I discussed
      with Benjamin and some others: rather than rotating the loop
      after-the-fact, try to pick a profitable starting block for the loop's
      layout, and then use our existing layout logic. I was worried about the
      complexity of this "pick" step, but it turns out such complexity is
      needed to handle all the important cases I keep teasing out of benchmarks.
      
      This is, I'm afraid, a bit of a work-in-progress. It is still
      misbehaving on some likely important cases I'm investigating in Olden.
      It also isn't really tested. I'm going to try to craft some interesting
      nested-loop test cases, but it's likely to be extremely time consuming
      and I don't want to go there until I'm sure I'm testing the correct
      behavior. Sadly I can't come up with a way of getting simple, fine
      grained test cases for this logic. We need complex loop structures to
      even trigger much of it.
      
      llvm-svn: 145183
      03adbd46
    • Chandler Carruth's avatar
      Fix an impressive type-o / spell-o Duncan noticed. · 9e466841
      Chandler Carruth authored
      llvm-svn: 145181
      9e466841
    • Chandler Carruth's avatar
      Rework a bit of the implementation of loop block rotation to not rely so · a0545809
      Chandler Carruth authored
      heavily on AnalyzeBranch. That routine doesn't behave as we want given
      that rotation occurs mid-way through re-ordering the function. Instead
      merely check that there are not unanalyzable branching constructs
      present, and then reason about the CFG via successor lists. This
      actually simplifies my mental model for all of this as well.
      
      The concrete result is that we now will rotate more loop chains. I've
      added a test case from Olden highlighting the effect. There is still
      a bit more to do here though in order to regain all of the performance
      in Olden.
      
      llvm-svn: 145179
      a0545809
    • Chandler Carruth's avatar
      Introduce a loop block rotation optimization to the new block placement · 9ffb97e6
      Chandler Carruth authored
      pass. This is designed to achieve one of the important optimizations
      that the old code placement pass did, but more simply.
      
      This is a somewhat rough and *very* conservative version of the
      transform. We could get a lot fancier here if there are profitable cases
      to do so. In particular, this only looks for a single pattern, it
      insists that the loop backedge being rotated away is the last backedge
      in the chain, and it doesn't provide any means of doing better in-loop
      placement due to the rotation. However, it appears that it will handle
      the important loops I am finding in the LLVM test suite.
      
      llvm-svn: 145158
      9ffb97e6
    • Benjamin Kramer's avatar
      Move code into anonymous namespaces. · 7ba71be3
      Benjamin Kramer authored
      llvm-svn: 145154
      7ba71be3
  6. Nov 24, 2011
    • Chandler Carruth's avatar
      Fix a silly use-after-free issue. A much earlier version of this code · 7adee1a0
      Chandler Carruth authored
      need lots of fanciness around retaining a reference to a Chain's slot in
      the BlockToChain map, but that's all gone now. We can just go directly
      to allocating the new chain (which will update the mapping for us) and
      using it.
      
      Somewhat gross mechanically generated test case replicates the issue
      Duncan spotted when actually testing this out.
      
      llvm-svn: 145120
      7adee1a0
    • Chandler Carruth's avatar
      When adding blocks to the list of those which no longer have any CFG · d394bafd
      Chandler Carruth authored
      conflicts, we should only be adding the first block of the chain to the
      list, lest we try to merge into the middle of that chain. Most of the
      places we were doing this we already happened to be looking at the first
      block, but there is no reason to assume that, and in some cases it was
      clearly wrong.
      
      I've added a couple of tests here. One already worked, but I like having
      an explicit test for it. The other is reduced from a test case Duncan
      reduced for me and used to crash. Now it is handled correctly.
      
      llvm-svn: 145119
      d394bafd
  7. Nov 23, 2011
  8. Nov 22, 2011
    • Chandler Carruth's avatar
      Fix a devilish miscompile exposed by block placement. The · ee54feb6
      Chandler Carruth authored
      updateTerminator code didn't correctly handle EH terminators in one very
      specific case. AnalyzeBranch would find no terminator instruction, and
      so the fallback in updateTerminator is to assume fallthrough. This is
      correct, but the destination of the fallthrough was assumed to be the
      first successor.
      
      This is *almost always* true, but in certain cases the loop
      transformations will cause the landing pad to be the first successor!
      Instead of this brittle logic, actually look through the successors for
      a non-landing-pad accessor, and to assert if more than one is found.
      
      This will hopefully fix some (if not all) of the self host miscompiles
      with block placement. Thanks to Benjamin Kramer for reporting, Nick
      Lewycky for an initial stab at a reduction, and Duncan for endless
      advice on EH (which I know nothing about) as well as reviewing the
      actual fix.
      
      llvm-svn: 145062
      ee54feb6
    • Chandler Carruth's avatar
      Fix an obvious omission in the SelectionDAGBuilder where we were · e2530dc8
      Chandler Carruth authored
      dropping weights on the floor for invokes. This was impeding my writing
      further test cases for invoke when interacting with probabilities and
      block placement.
      
      No test case as there doesn't appear to be a way to test this stuff. =/
      Suggestions for a test case of course welcome. I hope to be able to add
      test cases that indirectly cover this eventually by adding probabilities
      to the exceptional edge and reordering blocks as a result.
      
      llvm-svn: 145060
      e2530dc8
    • Rafael Espindola's avatar
      If a register is both an early clobber and part of a tied use, handle the use · 2021f382
      Rafael Espindola authored
      before the clobber so that we copy the value if needed.
      
      Fixes pr11415.
      
      llvm-svn: 145056
      2021f382
  9. Nov 20, 2011
    • Chandler Carruth's avatar
      The logic for breaking the CFG in the presence of hot successors didn't · 18dfac38
      Chandler Carruth authored
      properly account for the *global* probability of the edge being taken.
      This manifested as a very large number of unconditional branches to
      blocks being merged against the CFG even though they weren't
      particularly hot within the CFG.
      
      The fix is to check whether the edge being merged is both locally hot
      relative to other successors for the source block, and globally hot
      compared to other (unmerged) predecessors of the destination block.
      
      This introduces a new crasher on GCC single-source, but it's currently
      behind a flag, and Ben has offered to work on the reduction. =]
      
      llvm-svn: 145010
      18dfac38
  10. Nov 19, 2011
    • Chandler Carruth's avatar
      Move the handling of unanalyzable branches out of the loop-driven chain · f3dc9eff
      Chandler Carruth authored
      formation phase and into the initial walk of the basic blocks. We
      essentially pre-merge all blocks where unanalyzable fallthrough exists,
      as we won't be able to update the terminators effectively after any
      reorderings. This is quite a bit more principled as there may be CFGs
      where the second half of the unanalyzable pair has some analyzable
      predecessor that gets placed first. Then it may get placed next,
      implicitly breaking the unanalyzable branch even though we never even
      looked at the part that isn't analyzable. I've included a test case that
      triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
      some more general ones as I dig into related issues.
      
      Also, to make this new scheme work we have to be able to handle branches
      into the middle of a chain, so add this check. We always fallback on the
      incoming ordering.
      
      Finally, this starts to really underscore a known limitation of the
      current implementation -- we don't consider broken predecessors when
      merging successors. This can caused major missed opportunities, and is
      something I'm planning on looking at next (modulo more bug reports).
      
      llvm-svn: 144994
      f3dc9eff
  11. Nov 18, 2011
  12. Nov 17, 2011
    • Chad Rosier's avatar
      When fast iseling a GEP, accumulate the offset rather than emitting a series of · f83ab704
      Chad Rosier authored
      ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
      being: (1) If we can't materialize the large constant then we'll cause fast-isel
      to bail. (2) Too large of an offset can't be directly encoded in the ADD
      resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
      have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
      for that. (3) Conversely, too low of a threshold we'll miss opportunities to 
      coalesce ADDs.
      rdar://10412592
      
      llvm-svn: 144886
      f83ab704
    • Eli Friedman's avatar
      Make sure to replace the chain properly when DAGCombining a... · ff1eaa75
      Eli Friedman authored
      Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD.  Fixes PR10747/PR11393.
      
      llvm-svn: 144863
      ff1eaa75
  13. Nov 16, 2011
Loading