  1. Nov 15, 2011
    • Jakob Stoklund Olesen's avatar
      Check all overlaps when looking for used registers. · e14ef7e6
      Jakob Stoklund Olesen authored
      A function using any RC alias is enough to enable the ExeDepsFix pass.
      
      llvm-svn: 144636
      e14ef7e6
    • Jay Foad's avatar
      Make use of MachinePointerInfo::getFixedStack. · ab9ebd35
      Jay Foad authored
      llvm-svn: 144635
      ab9ebd35
    • Jay Foad's avatar
      Remove some unnecessary includes of PseudoSourceValue.h. · 70679df6
      Jay Foad authored
      llvm-svn: 144634
      70679df6
    • Evan Cheng's avatar
      Set SeenStore to true to prevent loads from being moved; also eliminates a... · 7098c4e5
      Evan Cheng authored
      Set SeenStore to true to prevent loads from being moved; also eliminates a non-deterministic behavior.
      
      llvm-svn: 144628
      7098c4e5
    • Chandler Carruth's avatar
      Rather than trying to use the loop block sequence *or* the function · 9b548a7f
      Chandler Carruth authored
      block sequence when recovering from unanalyzable control flow
      constructs, *always* use the function sequence. I'm not sure why I ever
      went down the path of trying to use the loop sequence, it is
      fundamentally not the correct sequence to use. We're trying to preserve
      the incoming layout in the cases of unreasonable control flow, and that
      is only encoded at the function level. We already have a filter to
      select *exactly* the sub-set of blocks within the function that we're
      trying to form into a chain.
      
      The resulting code layout is also significantly better because of this.
      In several places we were ending up with completely unreasonable control
      flow constructs due to the ordering chosen by the loop structure for its
      internal storage. This change removes a completely wasteful vector of
      basic blocks, saving memory allocation in the common case even though it
      costs us CPU in the fairly rare case of unnatural loops. Finally, it
      fixes the latest crasher reduced out of GCC's single source. Thanks
      again to Benjamin Kramer for the reduction, my bugpoint skills failed at
      it.
      
      llvm-svn: 144627
      9b548a7f
    • Jakob Stoklund Olesen's avatar
      Break false dependencies before partial register updates. · f8ad336b
      Jakob Stoklund Olesen authored
      Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix
      about instructions with partial register updates causing false unwanted
      dependencies.
      
      The ExecutionDepsFix pass will break the false dependencies if the
      updated register was written in the previous N instructions.
      
      The small loop added to sse-domains.ll runs twice as fast with
      dependency-breaking instructions inserted.
      
      llvm-svn: 144602
      f8ad336b
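The "previous N instructions" test described in the commit above can be pictured as a simple distance check. This is a minimal sketch, not the code from the pass: the function name `shouldBreakDependency` and its parameters are hypothetical, and instructions are assumed to be numbered consecutively.

```cpp
// Hypothetical helper, not the LLVM API: decide whether a partial register
// update should be preceded by a dependency-breaking instruction.
// CurInstr and LastDef are assumed to be consecutive instruction numbers;
// PreferredClearance plays the role of N in the commit message.
bool shouldBreakDependency(unsigned CurInstr, unsigned LastDef,
                           unsigned PreferredClearance) {
  // Number of instructions issued since the register was last written.
  unsigned Clearance = CurInstr - LastDef;
  // A recent write means the false dependency is likely to stall.
  return Clearance < PreferredClearance;
}
```

With a preferred clearance of 16, a register written two instructions ago would get the dependency broken, while one written 92 instructions ago would be left alone.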
    • Jakob Stoklund Olesen's avatar
      Track register ages more accurately. · 543bef6e
      Jakob Stoklund Olesen authored
      Keep track of the last instruction to define each register individually
      instead of per DomainValue.  This lets us track more accurately when a
      register was last written.
      
      Also track register ages across basic blocks.  When entering a new
      basic block, use the least stale predecessor def as a worst case
      estimate for register age.
      
      The register age is used to arbitrate between conflicting domains. The
      most recently defined register wins.
      
      llvm-svn: 144601
      543bef6e
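One way to read "use the least stale predecessor def" is as a per-register minimum over the ages reported by each predecessor. The sketch below assumes an age is the number of instructions since the register was last written, measured at the end of a predecessor block; the name `entryAges` and the data layout are illustrative assumptions, not the pass's actual representation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: PredAges[P][R] is how many instructions ago register R
// was last written, measured at the end of predecessor P. Unknown is the age
// assumed for a register that no predecessor reports.
std::vector<unsigned>
entryAges(const std::vector<std::vector<unsigned>> &PredAges,
          std::size_t NumRegs, unsigned Unknown) {
  std::vector<unsigned> Ages(NumRegs, Unknown);
  for (const auto &Pred : PredAges)
    for (std::size_t R = 0; R < NumRegs && R < Pred.size(); ++R)
      // The smallest age is the most recent write, i.e. the worst case.
      Ages[R] = std::min(Ages[R], Pred[R]);
  return Ages;
}
```

Taking the minimum is the conservative choice here: if any path into the block wrote the register recently, the block is treated as if that path was taken.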
  2. Nov 14, 2011
    • Evan Cheng's avatar
      Avoid dereferencing off the beginning of lists. · f2fc508d
      Evan Cheng authored
      llvm-svn: 144569
      f2fc508d
    • Evan Cheng's avatar
      At -O0, multiple uses of a virtual register in the same BB are being marked · 28ffb7e4
      Evan Cheng authored
      "kill". This looks like a bug upstream. Since that's going to take some time
      to understand, loosen the assertion and disable the optimization when
      multiple kills are seen.
      
      llvm-svn: 144568
      28ffb7e4
    • Evan Cheng's avatar
      Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7
      Evan Cheng authored
      instructions of the two-address operands) in order to avoid inserting copies.
      This fixes the few regressions introduced when the two-address hack was
      disabled (without regressing the improvements).
      rdar://10422688
      
      llvm-svn: 144559
      30f44ad7
    • Jakob Stoklund Olesen's avatar
      Fix early-clobber handling in shrinkToUses. · 7e6004a3
      Jakob Stoklund Olesen authored
      I broke this in r144515, it affected most ARM testers.
      
      <rdar://problem/10441389>
      
      llvm-svn: 144547
      7e6004a3
    • Chandler Carruth's avatar
      It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98
      Chandler Carruth authored
      cleans up all the chains allocated during the processing of each
      function so that for very large inputs we don't just grow memory usage
      without bound.
      
      llvm-svn: 144533
      fd9b4d98
    • Chandler Carruth's avatar
      Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149
      Chandler Carruth authored
      tests when I forcibly enabled block placement.
      
      It is apparently possible for an unanalyzable block to fall through to
      a non-loop block. I don't actually believe this is correct; I believe
      that 'canFallThrough' is returning true needlessly for the code
      construct, and I've left a bit of a FIXME on the verification code to
      try to track down why this is coming up.
      
      Anyways, removing the assert doesn't degrade the correctness of the algorithm.
      
      llvm-svn: 144532
      0a31d149
    • Chandler Carruth's avatar
      Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb
      Chandler Carruth authored
      this pass. We're leaving already merged blocks on the worklist, and
      scanning them again and again only to determine each time through that
      indeed they aren't viable. We can instead remove them once we're going
      to have to scan the worklist. This is the easy way to implement removing
      them. If this remains on the profile (as I somewhat suspect it will), we
      can get a lot more clever here, as the worklist's order is essentially
      irrelevant. We can use swapping and fold the two loops to reduce
      overhead even when there are many blocks on the worklist but only a few
      of them are removed.
      
      llvm-svn: 144531
      0af6a0bb
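The "easy way" of dropping already merged blocks before scanning the worklist resembles the standard erase/remove idiom. The following is a sketch under assumed types: `Block`, its `Merged` flag, and `pruneMerged` are hypothetical stand-ins, not the pass's data structures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a basic block; Merged marks blocks that have
// already been folded into a chain and are no longer viable candidates.
struct Block {
  int Id;
  bool Merged;
};

// Remove merged blocks in one linear pass so later scans do not revisit them.
void pruneMerged(std::vector<Block *> &Worklist) {
  Worklist.erase(std::remove_if(Worklist.begin(), Worklist.end(),
                                [](const Block *B) { return B->Merged; }),
                 Worklist.end());
}
```

`std::remove_if` keeps the relative order of the survivors. Since the commit notes that the worklist's order is essentially irrelevant, a swap-with-last scheme could avoid shifting elements, at the cost of reordering.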
    • Chandler Carruth's avatar
      Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7
      Chandler Carruth authored
      time it is queried to compute the probability of a single successor.
      This makes computing the probability of every successor of a block in
      sequence... really really slow. ;] This switches to a linear walk of the
      successors rather than a quadratic one. One of several quadratic
      behaviors slowing this pass down.
      
      I'm not really thrilled with moving the sum code into the public
      interface of MBPI, but I don't (at the moment) have ideas for a better
      interface. The direction I'm thinking in for a better interface is to
      have MBPI actually retain much more state and make *all* of these
      queries cheap. That's a lot of work, and would require invasive changes.
      Until then, this seems like the least bad (ie, least quadratic)
      solution. Suggestions welcome.
      
      llvm-svn: 144530
      84cd44c7
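The difference between the quadratic and linear walks comes down to where the weight sum is computed. A minimal sketch, assuming plain integer weights and a hypothetical `Probability` pair rather than the real MBPI interface:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical numerator/denominator pair for one successor edge.
struct Probability {
  uint32_t Numerator;
  uint64_t Denominator;
};

// Compute the probability of every successor with the sum taken once.
// Re-summing inside the loop for each successor would be quadratic.
std::vector<Probability>
allSuccessorProbabilities(const std::vector<uint32_t> &Weights) {
  uint64_t Sum = 0;
  for (uint32_t W : Weights)
    Sum += W; // 64-bit accumulator, so the sum itself cannot wrap here.

  std::vector<Probability> Result;
  Result.reserve(Weights.size());
  for (uint32_t W : Weights)
    Result.push_back({W, Sum});
  return Result;
}
```

The caller is assumed to pass at least one non-zero weight; otherwise every denominator is zero.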
    • Chandler Carruth's avatar
      Reuse the logic in getEdgeProbability within getHotSucc in order to · a9e71faa
      Chandler Carruth authored
      correctly handle blocks whose successor weights sum to more than
      UINT32_MAX. This is slightly less efficient, but the entire thing is
      already linear on the number of successors. Calling it within any hot
      routine is a mistake, and indeed no one is calling it. It also
      simplifies the code.
      
      llvm-svn: 144527
      a9e71faa
    • Chandler Carruth's avatar
      Fix an overflow bug in MachineBranchProbabilityInfo. This pass relied on · ed5aa547
      Chandler Carruth authored
      the sum of the edge weights not overflowing uint32, and crashed when
      they did. This is generally safe as BranchProbabilityInfo tries to
      provide this guarantee. However, the CFG can get modified during codegen
      in a way that grows the *sum* of the edge weights. This doesn't seem
      unreasonable (imagine just adding more blocks all with the default
      weight of 16), but it is hard to come up with a case that actually
      triggers 32-bit overflow. Fortunately, the single-source GCC build is
      good at this. The solution isn't very pretty, but it's no worse than the
      previous code. We're already summing all of the edge weights on each
      query, we can sum them, check for an overflow, compute a scale, and sum
      them again.
      
      I've included a *greatly* reduced test case out of the GCC source that
      triggers it. It's a pretty lame test, as it clearly is just barely
      triggering the overflow. I'd like to have something that is much more
      definitive, but I don't understand the fundamental pattern that triggers
      an explosion in the edge weight sums.
      
      The buggy code is duplicated within this file. I'll collapse them into
      a single implementation in a subsequent commit.
      
      llvm-svn: 144526
      ed5aa547
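The "sum, check for an overflow, compute a scale, and sum again" sequence can be sketched as below. This is an illustration of the idea only: `ScaledSum` and `sumEdgeWeights` are hypothetical names, and the weights are passed as a plain vector rather than read from a block's successor list.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result type: the (possibly scaled) sum and the divisor used.
struct ScaledSum {
  uint32_t Sum;
  uint32_t Scale;
};

ScaledSum sumEdgeWeights(const std::vector<uint32_t> &Weights) {
  constexpr uint64_t Max = std::numeric_limits<uint32_t>::max();

  // First pass: sum in 64 bits so the total cannot wrap.
  uint64_t Wide = 0;
  for (uint32_t W : Weights)
    Wide += W;
  if (Wide <= Max)
    return {static_cast<uint32_t>(Wide), 1};

  // Overflow case: choose a divisor that brings the total back under Max,
  // then sum the scaled weights again.
  uint32_t Scale = static_cast<uint32_t>(Wide / Max) + 1;
  uint64_t Scaled = 0;
  for (uint32_t W : Weights)
    Scaled += W / Scale;
  return {static_cast<uint32_t>(Scaled), Scale};
}
```

A caller that uses the scaled sum as a denominator would also need to divide each individual weight by the same `Scale`, so that numerators and denominator stay consistent.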
    • Jakob Stoklund Olesen's avatar
      Use getVNInfoBefore() when it makes sense. · d7bcf43d
      Jakob Stoklund Olesen authored
      llvm-svn: 144517
      d7bcf43d
    • Chandler Carruth's avatar
      Teach machine block placement to cope with unnatural loops. These don't · 1071cfa4
      Chandler Carruth authored
      get loop info structures associated with them, and so we need some way
      to make forward progress selecting and placing basic blocks. The
      technique used here is pretty brutal -- it just scans the list of blocks
      looking for the first unplaced candidate. It keeps placing blocks like
      this until the CFG becomes tractable.
      
      The cost is somewhat unfortunate: it requires allocating a vector of all
      basic block pointers eagerly. I have some ideas about how to simplify
      and optimize this, but I'm trying to get the logic correct first.
      
      Thanks to Benjamin Kramer for the reduced test case out of GCC. Sadly
      there are other bugs that GCC is tickling that I'm reducing and working
      on now.
      
      llvm-svn: 144516
      1071cfa4
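The fallback of scanning for the first unplaced candidate amounts to a linear search over the block list. A minimal sketch, with a hypothetical `Block` type carrying only a `Placed` flag:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block in function layout order.
struct Block {
  bool Placed = false;
};

// Return the index of the first block not yet placed, scanning from Start,
// or Blocks.size() if every block has been placed.
std::size_t firstUnplaced(const std::vector<Block> &Blocks,
                          std::size_t Start = 0) {
  for (std::size_t I = Start; I < Blocks.size(); ++I)
    if (!Blocks[I].Placed)
      return I;
  return Blocks.size();
}
```

Repeating this scan after each placement is what makes the approach costly for large functions, which matches the commit's own description of it as brutal.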
    • Jakob Stoklund Olesen's avatar
      Use kill slots instead of the previous slot in shrinkToUses. · 69797902
      Jakob Stoklund Olesen authored
      It's more natural to use the actual end points.
      
      llvm-svn: 144515
      69797902
  3. Nov 13, 2011