  1. Nov 19, 2011
    • Chandler Carruth's avatar
      Move the handling of unanalyzable branches out of the loop-driven chain · f3dc9eff
      Chandler Carruth authored
      formation phase and into the initial walk of the basic blocks. We
      essentially pre-merge all blocks where unanalyzable fallthrough exists,
      as we won't be able to update the terminators effectively after any
      reorderings. This is quite a bit more principled as there may be CFGs
      where the second half of the unanalyzable pair has some analyzable
      predecessor that gets placed first. Then it may get placed next,
      implicitly breaking the unanalyzable branch even though we never even
      looked at the part that isn't analyzable. I've included a test case that
      triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
      some more general ones as I dig into related issues.
      
      Also, to make this new scheme work we have to be able to handle branches
      into the middle of a chain, so add this check. We always fall back on the
      incoming ordering.
      
      Finally, this starts to really underscore a known limitation of the
      current implementation -- we don't consider broken predecessors when
      merging successors. This can cause major missed opportunities, and is
      something I'm planning on looking at next (modulo more bug reports).
      
      llvm-svn: 144994
      f3dc9eff
  2. Nov 18, 2011
  3. Nov 17, 2011
    • Chad Rosier's avatar
      When fast iseling a GEP, accumulate the offset rather than emitting a series of · f83ab704
      Chad Rosier authored
      ADDs.  MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
      being: (1) If we can't materialize the large constant then we'll cause fast-isel
      to bail. (2) Too large of an offset can't be directly encoded in the ADD
      resulting in a MOV+ADD.  Generally not a bad thing because otherwise we would
      have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
      for that. (3) Conversely, with too low a threshold we'll miss opportunities to
      coalesce ADDs.
      rdar://10412592
      
      llvm-svn: 144886
      f83ab704
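The accumulate-then-flush idea in the commit above can be sketched as follows. This is a minimal illustration, not the fast-isel code itself: `coalesceOffsets` is a hypothetical helper, `maxOffs` only mirrors the `MaxOffs` threshold named in the message, and the returned vector stands in for the immediates of the ADD instructions that would be emitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: fold a run of constant GEP offsets into as few ADDs
// as possible. Each returned value stands for the immediate of one ADD.
// maxOffs plays the role of the MaxOffs threshold from the commit message.
std::vector<int64_t> coalesceOffsets(const std::vector<int64_t> &offsets,
                                     int64_t maxOffs) {
  std::vector<int64_t> adds;
  int64_t total = 0;
  for (int64_t off : offsets) {
    total += off;
    // Once the running total reaches the threshold, emit it and start over,
    // so a single immediate never grows without bound.
    if (total >= maxOffs) {
      adds.push_back(total);
      total = 0;
    }
  }
  // Flush whatever is left as one final ADD.
  if (total != 0)
    adds.push_back(total);
  return adds;
}
```

With a generous threshold, three small offsets collapse into one ADD; with a threshold that is too low, more ADDs survive, which is tradeoff (3) in the message.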
    • Eli Friedman's avatar
      Make sure to replace the chain properly when DAGCombining a... · ff1eaa75
      Eli Friedman authored
      Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD.  Fixes PR10747/PR11393.
      
      llvm-svn: 144863
      ff1eaa75
  4. Nov 16, 2011
  5. Nov 15, 2011
  6. Nov 14, 2011
    • Evan Cheng's avatar
      Avoid dereferencing off the beginning of lists. · f2fc508d
      Evan Cheng authored
      llvm-svn: 144569
      f2fc508d
    • Evan Cheng's avatar
      At -O0, multiple uses of a virtual register in the same BB are being marked · 28ffb7e4
      Evan Cheng authored
      "kill". This looks like a bug upstream. Since that's going to take some time
      to understand, loosen the assertion and disable the optimization when
      multiple kills are seen.
      
      llvm-svn: 144568
      28ffb7e4
    • Evan Cheng's avatar
      Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7
      Evan Cheng authored
      instructions of the two-address operands) in order to avoid inserting copies.
      This fixes the few regressions introduced when the two-address hack was
      disabled (without regressing the improvements).
      rdar://10422688
      
      llvm-svn: 144559
      30f44ad7
    • Jakob Stoklund Olesen's avatar
      Fix early-clobber handling in shrinkToUses. · 7e6004a3
      Jakob Stoklund Olesen authored
      I broke this in r144515; it affected most ARM testers.
      
      <rdar://problem/10441389>
      
      llvm-svn: 144547
      7e6004a3
    • Chandler Carruth's avatar
      It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98
      Chandler Carruth authored
      cleans up all the chains allocated during the processing of each
      function so that for very large inputs we don't just grow memory usage
      without bound.
      
      llvm-svn: 144533
      fd9b4d98
    • Chandler Carruth's avatar
      Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149
      Chandler Carruth authored
      tests when I forcibly enabled block placement.
      
      It is apparently possible for an unanalyzable block to fallthrough to
      a non-loop block. I don't actually believe this is correct; I believe
      that 'canFallThrough' is returning true needlessly for the code
      construct, and I've left a bit of a FIXME on the verification code to
      try to track down why this is coming up.
      
      Anyways, removing the assert doesn't degrade the correctness of the algorithm.
      
      llvm-svn: 144532
      0a31d149
    • Chandler Carruth's avatar
      Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb
      Chandler Carruth authored
      this pass. We're leaving already merged blocks on the worklist, and
      scanning them again and again only to determine each time through that
      indeed they aren't viable. We can instead remove them once we're going
      to have to scan the worklist. This is the easy way to implement removing
      them. If this remains on the profile (as I somewhat suspect it will), we
      can get a lot more clever here, as the worklist's order is essentially
      irrelevant. We can use swapping and fold the two loops to reduce
      overhead even when there are many blocks on the worklist but only a few
      of them are removed.
      
      llvm-svn: 144531
      0af6a0bb
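The two worklist-pruning strategies the commit above describes can be sketched like this. It is an illustrative sketch only: `Chain`, its `merged` flag, and both function names are hypothetical stand-ins, not the types used in the pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block chain; `merged` marks entries that are
// no longer viable candidates.
struct Chain {
  int id;
  bool merged;
};

// The "easy way": drop already merged entries in one pass before scanning,
// preserving the order of the survivors (erase-remove idiom).
void pruneWorklist(std::vector<Chain *> &worklist) {
  worklist.erase(std::remove_if(worklist.begin(), worklist.end(),
                                [](const Chain *c) { return c->merged; }),
                 worklist.end());
}

// The "more clever" variant: since order is irrelevant, overwrite a dead
// entry with the last one and pop, so each removal is O(1).
void pruneWorklistUnordered(std::vector<Chain *> &worklist) {
  for (std::size_t i = 0; i < worklist.size();) {
    if (worklist[i]->merged) {
      worklist[i] = worklist.back();
      worklist.pop_back();
    } else {
      ++i;
    }
  }
}
```

The unordered variant could be folded into the scanning loop itself, which is the direction the message hints at.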
    • Chandler Carruth's avatar
      Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7
      Chandler Carruth authored
      time it is queried to compute the probability of a single successor.
      This makes computing the probability of every successor of a block in
      sequence... really really slow. ;] This switches to a linear walk of the
      successors rather than a quadratic one. One of several quadratic
      behaviors slowing this pass down.
      
      I'm not really thrilled with moving the sum code into the public
      interface of MBPI, but I don't (at the moment) have ideas for a better
      interface. The direction I'm thinking in for a better interface is to
      have MBPI actually retain much more state and make *all* of these
      queries cheap. That's a lot of work, and would require invasive changes.
      Until then, this seems like the least bad (ie, least quadratic)
      solution. Suggestions welcome.
      
      llvm-svn: 144530
      84cd44c7
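The quadratic-versus-linear difference described in the commit above can be sketched as follows. This is a simplified model, not the MBPI interface: successors are reduced to a vector of edge weights, both function names are hypothetical, and the weights are assumed to be non-empty with a positive sum.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Per-query form: every call re-sums all successor weights, so asking for
// each of N successors in turn costs O(N^2) overall.
double probabilityOf(const std::vector<uint32_t> &weights, std::size_t i) {
  uint64_t sum = std::accumulate(weights.begin(), weights.end(), uint64_t{0});
  return static_cast<double>(weights[i]) / static_cast<double>(sum);
}

// Batched form: compute the sum once and reuse it for every successor,
// giving a single O(N) walk.
std::vector<double> allProbabilities(const std::vector<uint32_t> &weights) {
  uint64_t sum = std::accumulate(weights.begin(), weights.end(), uint64_t{0});
  std::vector<double> probs;
  probs.reserve(weights.size());
  for (uint32_t w : weights)
    probs.push_back(static_cast<double>(w) / static_cast<double>(sum));
  return probs;
}
```

Exposing the sum to the caller is what makes the batched form possible, which is the interface cost the message mentions.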