Skip to content
  1. Nov 16, 2011
  2. Nov 15, 2011
  3. Nov 14, 2011
    • Evan Cheng's avatar
      Avoid dereferencing off the beginning of lists. · f2fc508d
      Evan Cheng authored
      llvm-svn: 144569
      f2fc508d
    • Evan Cheng's avatar
      At -O0, multiple uses of a virtual registers in the same BB are being marked · 28ffb7e4
      Evan Cheng authored
      "kill". This looks like a bug upstream. Since that's going to take some time
      to understand, loosen the assertion and disable the optimization when
      multiple kills are seen.
      
      llvm-svn: 144568
      28ffb7e4
    • Evan Cheng's avatar
      Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7
      Evan Cheng authored
      instructions of the two-address operands) in order to avoid inserting copies.
      This fixes the few regressions introduced when the two-address hack was
      disabled (without regressing the improvements).
      rdar://10422688
      
      llvm-svn: 144559
      30f44ad7
    • Jakob Stoklund Olesen's avatar
      Fix early-clobber handling in shrinkToUses. · 7e6004a3
      Jakob Stoklund Olesen authored
      I broke this in r144515, it affected most ARM testers.
      
      <rdar://problem/10441389>
      
      llvm-svn: 144547
      7e6004a3
    • Chandler Carruth's avatar
      It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98
      Chandler Carruth authored
      cleans up all the chains allocated during the processing of each
      function so that for very large inputs we don't just grow memory usage
      without bound.
      
      llvm-svn: 144533
      fd9b4d98
    • Chandler Carruth's avatar
      Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149
      Chandler Carruth authored
      tests when I forcibly enabled block placement.
      
      It is apparantly possible for an unanalyzable block to fallthrough to
      a non-loop block. I don't actually beleive this is correct, I believe
      that 'canFallThrough' is returning true needlessly for the code
      construct, and I've left a bit of a FIXME on the verification code to
      try to track down why this is coming up.
      
      Anyways, removing the assert doesn't degrade the correctness of the algorithm.
      
      llvm-svn: 144532
      0a31d149
    • Chandler Carruth's avatar
      Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb
      Chandler Carruth authored
      this pass. We're leaving already merged blocks on the worklist, and
      scanning them again and again only to determine each time through that
      indeed they aren't viable. We can instead remove them once we're going
      to have to scan the worklist. This is the easy way to implement removing
      them. If this remains on the profile (as I somewhat suspect it will), we
      can get a lot more clever here, as the worklist's order is essentially
      irrelevant. We can use swapping and fold the two loops to reduce
      overhead even when there are many blocks on the worklist but only a few
      of them are removed.
      
      llvm-svn: 144531
      0af6a0bb
    • Chandler Carruth's avatar
      Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7
      Chandler Carruth authored
      time it is queried to compute the probability of a single successor.
      This makes computing the probability of every successor of a block in
      sequence... really really slow. ;] This switches to a linear walk of the
      successors rather than a quadratic one. One of several quadratic
      behaviors slowing this pass down.
      
      I'm not really thrilled with moving the sum code into the public
      interface of MBPI, but I don't (at the moment) have ideas for a better
      interface. My direction I'm thinking in for a better interface is to
      have MBPI actually retain much more state and make *all* of these
      queries cheap. That's a lot of work, and would require invasive changes.
      Until then, this seems like the least bad (ie, least quadratic)
      solution. Suggestions welcome.
      
      llvm-svn: 144530
      84cd44c7
    • Chandler Carruth's avatar
      Reuse the logic in getEdgeProbability within getHotSucc in order to · a9e71faa
      Chandler Carruth authored
      correctly handle blocks whose successor weights sum to more than
      UINT32_MAX. This is slightly less efficient, but the entire thing is
      already linear on the number of successors. Calling it within any hot
      routine is a mistake, and indeed no one is calling it. It also
      simplifies the code.
      
      llvm-svn: 144527
      a9e71faa
    • Chandler Carruth's avatar
      Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on · ed5aa547
      Chandler Carruth authored
      the sum of the edge weights not overflowing uint32, and crashed when
      they did. This is generally safe as BranchProbabilityInfo tries to
      provide this guarantee. However, the CFG can get modified during codegen
      in a way that grows the *sum* of the edge weights. This doesn't seem
      unreasonable (imagine just adding more blocks all with the default
      weight of 16), but it is hard to come up with a case that actually
      triggers 32-bit overflow. Fortuately, the single-source GCC build is
      good at this. The solution isn't very pretty, but its no worse than the
      previous code. We're already summing all of the edge weights on each
      query, we can sum them, check for an overflow, compute a scale, and sum
      them again.
      
      I've included a *greatly* reduced test case out of the GCC source that
      triggers it. It's a pretty lame test, as it clearly is just barely
      triggering the overflow. I'd like to have something that is much more
      definitive, but I don't understand the fundamental pattern that triggers
      an explosion in the edge weight sums.
      
      The buggy code is duplicated within this file. I'll colapse them into
      a single implementation in a subsequent commit.
      
      llvm-svn: 144526
      ed5aa547
    • Jakob Stoklund Olesen's avatar
      Use getVNInfoBefore() when it makes sense. · d7bcf43d
      Jakob Stoklund Olesen authored
      llvm-svn: 144517
      d7bcf43d
    • Chandler Carruth's avatar
      Teach machine block placement to cope with unnatural loops. These don't · 1071cfa4
      Chandler Carruth authored
      get loop info structures associated with them, and so we need some way
      to make forward progress selecting and placing basic blocks. The
      technique used here is pretty brutal -- it just scans the list of blocks
      looking for the first unplaced candidate. It keeps placing blocks like
      this until the CFG becomes tractable.
      
      The cost is somewhat unfortunate, it requires allocating a vector of all
      basic block pointers eagerly. I have some ideas about how to simplify
      and optimize this, but I'm trying to get the logic correct first.
      
      Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
      there are other bugs that GCC is tickling that I'm reducing and working
      on now.
      
      llvm-svn: 144516
      1071cfa4
    • Jakob Stoklund Olesen's avatar
      Use kill slots instead of the previous slot in shrinkToUses. · 69797902
      Jakob Stoklund Olesen authored
      It's more natural to use the actual end points.
      
      llvm-svn: 144515
      69797902
  4. Nov 13, 2011
    • Chandler Carruth's avatar
      Cleanup some 80-columns violations and poor formatting. These snuck by · c4a2cb34
      Chandler Carruth authored
      when I was reading through the code for style.
      
      llvm-svn: 144513
      c4a2cb34
    • Jakob Stoklund Olesen's avatar
      Terminate all dead defs at the dead slot instead of the 'next' slot. · d8f2405e
      Jakob Stoklund Olesen authored
      This makes no difference for normal defs, but early clobber dead defs
      now look like:
      
        [Slot_EarlyClobber; Slot_Dead)
      
      instead of:
      
        [Slot_EarlyClobber; Slot_Register).
      
      Live ranges for normal dead defs look like:
      
        [Slot_Register; Slot_Dead)
      
      as before.
      
      llvm-svn: 144512
      d8f2405e
    • Jakob Stoklund Olesen's avatar
      Simplify early clobber slots a bit. · ce7cc08f
      Jakob Stoklund Olesen authored
      llvm-svn: 144507
      ce7cc08f
    • Chandler Carruth's avatar
      Enhance the assertion mechanisms in place to make it easier to catch · 8e1d9067
      Chandler Carruth authored
      when we fail to place all the blocks of a loop. Currently this is
      happening for unnatural loops, and this logic helps more immediately
      point to the problem.
      
      llvm-svn: 144504
      8e1d9067
    • Jakob Stoklund Olesen's avatar
      Rename SlotIndexes to match how they are used. · 90b5e565
      Jakob Stoklund Olesen authored
      The old naming scheme (load/use/def/store) can be traced back to an old
      linear scan article, but the names don't match how slots are actually
      used.
      
      The load and store slots are not needed after the deferred spill code
      insertion framework was deleted.
      
      The use and def slots don't make any sense because we are using
      half-open intervals as is customary in C code, but the names suggest
      closed intervals.  In reality, these slots were used to distinguish
      early-clobber defs from normal defs.
      
      The new naming scheme also has 4 slots, but the names match how the
      slots are really used.  This is a purely mechanical renaming, but some
      of the code makes a lot more sense now.
      
      llvm-svn: 144503
      90b5e565
    • Chandler Carruth's avatar
      Teach MBP to force-merge layout successors for blocks with unanalyzable · 0bb42c0f
      Chandler Carruth authored
      branches that also may involve fallthrough. In the case of blocks with
      no fallthrough, we can still re-order the blocks profitably. For example
      instruction decoding will in some cases continue past an indirect jump,
      making laying out its most likely successor there profitable.
      
      Note, no test case. I don't know how to write a test case that exercises
      this logic, but it matches the described desired semantics in
      discussions with Jakob and others. If anyone has a nice example of IR
      that will trigger this, that would be lovely.
      
      Also note, there are still assertion failures in real world code with
      this. I'm digging into those next, now that I know this isn't the cause.
      
      llvm-svn: 144499
      0bb42c0f
    • Chandler Carruth's avatar
      Hoist another gross nested loop into a helper method. · f9213fe7
      Chandler Carruth authored
      llvm-svn: 144498
      f9213fe7
    • Chandler Carruth's avatar
      Add a missing doxygen comment for a helper method. · eb4ec3ae
      Chandler Carruth authored
      llvm-svn: 144497
      eb4ec3ae
    • Chandler Carruth's avatar
      Hoist a nested loop into its own method. · b336172f
      Chandler Carruth authored
      llvm-svn: 144496
      b336172f
    • Chandler Carruth's avatar
      Rewrite #3 of machine block placement. This is based somewhat on the · 8d150789
      Chandler Carruth authored
      second algorithm, but only loosely. It is more heavily based on the last
      discussion I had with Andy. It continues to walk from the inner-most
      loop outward, but there is a key difference. With this algorithm we
      ensure that as we visit each loop, the entire loop is merged into
      a single chain. At the end, the entire function is treated as a "loop",
      and merged into a single chain. This chain forms the desired sequence of
      blocks within the function. Switching to a single algorithm removes my
      biggest problem with the previous approaches -- they had different
      behavior depending on which system triggered the layout. Now there is
      exactly one algorithm and one basis for the decision making.
      
      The other key difference is how the chain is formed. This is based
      heavily on the idea Andy mentioned of keeping a worklist of blocks that
      are viable layout successors based on the CFG. Having this set allows us
      to consistently select the best layout successor for each block. It is
      expensive though.
      
      The code here remains very rough. There is a lot that needs to be done
      to clean up the code, and to make the runtime cost of this pass much
      lower. Very much WIP, but this was a giant chunk of code and I'd rather
      folks see it sooner than later. Everything remains behind a flag of
      course.
      
      I've added a couple of tests to exercise the issues that this iteration
      was motivated by: loop structure preservation. I've also fixed one test
      that was exhibiting the broken behavior of the previous version.
      
      llvm-svn: 144495
      8d150789
    • NAKAMURA Takumi's avatar
      Prune more RALinScan. RALinScan was also here! · 4784df71
      NAKAMURA Takumi authored
      llvm-svn: 144487
      4784df71
    • Jakob Stoklund Olesen's avatar
      More dead code elimination in VirtRegMap. · c601d8c7
      Jakob Stoklund Olesen authored
      This thing is looking a lot like a virtual register map now.
      
      llvm-svn: 144486
      c601d8c7
    • Jakob Stoklund Olesen's avatar
      Stop tracking spill slot uses in VirtRegMap. · 28df7ef8
      Jakob Stoklund Olesen authored
      Nobody cared, StackSlotColoring scans the instructions to find used stack
      slots.
      
      llvm-svn: 144485
      28df7ef8
Loading