  1. Mar 04, 2014
  2. Mar 03, 2014
    • Pass to emit DWARF path discriminators. · f5041ce5
      Diego Novillo authored
      DWARF discriminators are used to distinguish multiple control flow paths
      on the same source location. When this happens, instructions across
      basic block boundaries will share the same debug location.
      
      This pass detects this situation and assigns a new lexical scope to one
      of the two instructions. This lexical scope is a child scope of the
      original and contains a new discriminator value. This discriminator is
      then picked up by MCObjectStreamer::EmitDwarfLocDirective to be
      written to the object file.
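
      As a rough illustration of the idea (not the pass itself), the sketch
      below gives each basic block that shares a source line its own
      discriminator. ToyInst and assignDiscriminators are hypothetical names,
      and the real pass works on debug locations and lexical scopes rather
      than on plain integers.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy stand-in for an instruction: the basic block it lives in and the
// source line of its debug location. Both fields are illustrative only.
struct ToyInst {
  int block;
  int line;
};

// Returns one discriminator per instruction. The first block seen for a
// line keeps discriminator 0; each further block on that line gets the next
// unused value, so paths that share a line can be told apart.
std::vector<unsigned> assignDiscriminators(const std::vector<ToyInst> &insts) {
  std::map<int, std::map<int, unsigned>> byLine; // line -> block -> value
  std::vector<unsigned> out;
  out.reserve(insts.size());
  for (const ToyInst &inst : insts) {
    std::map<int, unsigned> &blocks = byLine[inst.line];
    auto it = blocks.find(inst.block);
    if (it == blocks.end()) {
      unsigned next = static_cast<unsigned>(blocks.size());
      it = blocks.emplace(inst.block, next).first;
    }
    out.push_back(it->second);
  }
  assert(out.size() == insts.size());
  return out;
}
```

      For example, three blocks that all carry line 10 would receive
      discriminators 0, 1 and 2 for that line under this toy scheme.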
      
      This fixes http://llvm.org/bugs/show_bug.cgi?id=18270.
      
      llvm-svn: 202752
  3. Mar 02, 2014
  4. Feb 25, 2014
  5. Feb 21, 2014
  6. Feb 18, 2014
  7. Feb 10, 2014
    • [LPM] A terribly simple fix to a terribly complex bug: PR18773. · 756c22cd
      Chandler Carruth authored
      The crux of the issue is that LCSSA doesn't preserve stateful alias
      analyses. Before r200067, LICM didn't cause LCSSA to run in the LTO pass
      manager, where LICM runs essentially without any of the other loop
      passes. As a consequence the globalmodref-aa pass run before that loop
      pass manager was able to survive the loop pass manager and be used by
      DSE to eliminate stores in the function called from the loop body in
      Adobe-C++/loop_unroll (and similar patterns in other benchmarks).
      
      When LICM was taught to preserve LCSSA it had to require it as well.
      This caused it to be run in the loop pass manager and because it did not
      preserve AA, the stateful AA was lost. Most of LLVM's AA isn't stateful
      and so this didn't manifest in most cases. Also, in most cases LCSSA was
      already running, and so there was no interesting change.
      
      The real kicker is that LCSSA by its definition (injecting PHI nodes
      only) trivially preserves AA! All we need to do is mark it, and then
      everything goes back to working as intended. It probably was blocking
      some other weird cases of stateful AA but the only one I have is
      a 1000-line IR test case from loop_unroll, so I don't really have a good
      test case here.
      
      Hopefully this fixes the regressions on performance that have been seen
      since that revision.
      
      llvm-svn: 201104
  8. Feb 04, 2014
  9. Feb 02, 2014
    • Lower llvm.expect intrinsic correctly for i1 · 1ff08e38
      Duncan P. N. Exon Smith authored
      LowerExpectIntrinsic previously only understood the idiom of an expect
      intrinsic followed by a comparison with zero. For llvm.expect.i1, the
      comparison would be stripped by the early-cse pass.
      
      Patch by Daniel Micay.
      
      llvm-svn: 200664
  10. Feb 01, 2014
  11. Jan 29, 2014
    • [LPM] Fix PR18643, another scary place where loop transforms failed to · d4be9dc0
      Chandler Carruth authored
      preserve loop simplify of enclosing loops.
      
      The problem here starts with LoopRotation which ends up cloning code out
      of the latch into the new preheader it is building. This can create
      a new edge from the preheader into the exit block of the loop which
      breaks LoopSimplify form. The code tries to fix this by splitting the
      critical edge between the latch and the exit block to get a new exit
      block that only the latch dominates. This sadly isn't sufficient.
      
      The exit block may be an exit block for multiple nested loops. When we
      clone an edge from the latch of the inner loop to the new preheader
      being built in the outer loop, we create an exiting edge from the outer
      loop to this exit block. Despite breaking the LoopSimplify form for the
      inner loop, this is fine for the outer loop. However, when we split the
      edge from the inner loop to the exit block, we create a new block which
      is in neither the inner nor outer loop as the new exit block. This is
      a predecessor to the old exit block, and so the split itself takes the
      outer loop out of LoopSimplify form. We need to split every edge
      entering the exit block from inside a loop nested more deeply than the
      exit block in order to preserve all of the loop simplify constraints.
      
      Once we try to do that, a problem with splitting critical edges
      surfaces. Previously, we tried a very brute-force way to update LoopSimplify
      form by re-computing it for all exit blocks. We don't need to do this,
      and doing this much will sometimes but not always overlap with the
      LoopRotate bug fix. Instead, the code needs to specifically handle the
      cases which can start to violate LoopSimplify -- they aren't that
      common. We need to see if the destination of the split edge was a loop
      exit block in simplified form for the loop of the source of the edge.
      For this to be true, all the predecessors need to be in the exact same
      loop as the source of the edge being split. If the dest block was
      originally in this form, we have to split all of the edges back into
      this loop to recover it. The old mechanism of doing this was
      conservatively correct because at least *one* of the exiting blocks it
      rewrote was the DestBB and so the DestBB's predecessors were fixed. But
      this is a much more targeted way of doing it. Making it targeted is
      important, because ballooning the set of edges touched prevents
      LoopRotate from being able to split edges *it* needs to split to
      preserve loop simplify in a coherent way -- the critical edge splitting
      would sometimes find some of the edges in need of splitting but not
      others.
      
      Many, *many* thanks for help from Nick reducing these test cases
      mightily. And helping lots with the analysis here as this one was quite
      tricky to track down.
      
      llvm-svn: 200393
  12. Jan 28, 2014
    • Fix pr14893. · ab73c493
      Rafael Espindola authored
      When simplifycfg moves an instruction, it must drop metadata it doesn't know
      is still valid under the changed preconditions. In particular, it must drop
      the range and tbaa metadata.
      
      The patch implements this with a utility function to drop all metadata not
      in a white list.
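
      A minimal sketch of such a white-list filter, using a toy instruction
      model instead of LLVM's own classes. ToyInstruction and
      dropMetadataNotIn are hypothetical names, and treating "dbg" as a kept
      kind in the usage note is an assumption made only for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model: metadata is a map from kind name to an opaque payload.
struct ToyInstruction {
  std::map<std::string, std::string> metadata;
};

// Erase every metadata entry whose kind is not on the white list.
void dropMetadataNotIn(ToyInstruction &inst,
                       const std::set<std::string> &whiteList) {
  for (auto it = inst.metadata.begin(); it != inst.metadata.end();) {
    if (whiteList.count(it->first) == 0)
      it = inst.metadata.erase(it); // erase returns the next valid iterator
    else
      ++it;
  }
  // Whatever survives is a subset of the white list.
  assert(inst.metadata.size() <= whiteList.size());
}
```

      With a white list of {"dbg"}, an instruction carrying dbg, range and
      tbaa entries would keep only dbg after the call. Filtering by what is
      known to be safe, rather than by what is known to be unsafe, means a
      metadata kind added later is dropped by default.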
      
      llvm-svn: 200322
    • [LPM] Fix PR18616 where the shifts to the loop pass manager to extract · d84f776e
      Chandler Carruth authored
      LCSSA from it caused a crasher with the LoopUnroll pass.
      
      This crasher is really nasty. We destroy LCSSA form in a surprising way.
      When unrolling a loop into an outer loop, we not only need to restore
      LCSSA form for the outer loop, but for all children of the outer loop.
      This is somewhat obvious in retrospect, but hey!
      
      While this seems pretty heavy-handed, it's not that bad. Fundamentally,
      we only do this when we unroll a loop, which is already a heavyweight
      operation. We're unrolling all of these hypothetical inner loops as
      well, so their size and complexity is already on the critical path. This
      is just adding another pass over them to re-canonicalize.
      
      I have a test case from PR18616 that is great for reproducing this, but
      pretty useless to check in as it relies on many 10s of nested empty
      loops that get unrolled and deleted in just the right order. =/ What's
      worse is that investigating this has exposed another source of failure
      that is likely to be even harder to test. I'll try to come up with test
      cases for these fixes, but I want to get the fixes into the tree first
      as they're causing crashes in the wild.
      
      llvm-svn: 200273
    • PGO branch weight: keep halving the weights until they can fit into · f1cb16e4
      Manman Ren authored
      uint32.
      
      When folding branches to a common destination, the updated branch weights
      can exceed uint32 by more than a factor of 2. We should keep halving the
      weights until they can fit into uint32.
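
      The halving loop can be sketched as follows. This is an illustration
      rather than the committed code: fitWeightsToUint32 is a hypothetical
      name, and halving all weights together (so their ratios are roughly
      preserved) is an assumption about the intended behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Halve every weight until the largest one fits into 32 bits, then narrow.
std::vector<uint32_t> fitWeightsToUint32(std::vector<uint64_t> weights) {
  const uint64_t limit = std::numeric_limits<uint32_t>::max();
  auto tooBig = [limit](uint64_t w) { return w > limit; };

  // A single halving is not enough when a weight exceeds the limit by more
  // than a factor of 2, so repeat until nothing is too big.
  while (std::any_of(weights.begin(), weights.end(), tooBig))
    for (uint64_t &w : weights)
      w /= 2;

  std::vector<uint32_t> out;
  out.reserve(weights.size());
  for (uint64_t w : weights) {
    assert(w <= limit);
    out.push_back(static_cast<uint32_t>(w));
  }
  return out;
}
```

      For instance, the pair (2^34, 8) needs three halvings before the first
      weight fits, which leaves (2^31, 1).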
      
      llvm-svn: 200262
  13. Jan 25, 2014
    • [LPM] Make LCSSA a utility with a FunctionPass that applies it to all · 8765cf70
      Chandler Carruth authored
      the loops in a function, and teach LICM to work in the presence of
      LCSSA.
      
      Previously, LCSSA was a loop pass. That made passes requiring it also be
      loop passes and unable to depend on function analysis passes easily. It
      also caused outer loops to have a different "canonical" form from inner
      loops during analysis. Instead, we go into LCSSA form and preserve it
      through the loop pass manager run.
      
      Note that this has the same problem as LoopSimplify that prevents
      enabling its verification -- loop passes which run at the end of the loop
      pass manager and don't preserve these are valid, but the subsequent loop
      pass runs of outer loops that do preserve this pass trigger too much
      verification and fail because the inner loop no longer verifies.
      
      The other problem this exposed is that LICM was completely unable to
      handle LCSSA form. It didn't preserve it and it actually would give up
      on moving instructions in many cases when they were used by an LCSSA phi
      node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
      hoist and sink around them. This may actually let LICM fire
      significantly more because we put everything into LCSSA form to rotate
      the loop before running LICM. =/ Now LICM should handle that fine and
      preserve it correctly. The down side is that LICM has to require LCSSA
      in order to preserve it. This is just a fact of life for LCSSA. It's
      entirely possible we should completely remove LCSSA from the optimizer.
      
      The test updates are essentially accommodating LCSSA phi nodes in the
      output of LICM, and the fact that we now completely sink every
      instruction in ashr-crash below the loop bodies prior to unrolling.
      
      With this change, LCSSA is computed only three times in the pass
      pipeline. One of them could be removed (and potentially a SCEV run and
      a separate LoopPassManager entirely!) if we had a LoopPass variant of
      InstCombine that ran InstCombine on the loop body but refused to combine
      away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
      being in the same loop pass manager as rotate, LICM, and unswitch.
      
      There is one thing that I *really* don't like -- preserving LCSSA in
      LICM is quite expensive. We end up having to re-run LCSSA twice for some
      loops after LICM runs because LICM can undo LCSSA both in the current
      loop and the parent loop. I don't really see good solutions to this
      other than to completely move away from LCSSA and using tools like
      SSAUpdater instead.
      
      llvm-svn: 200067
  14. Jan 24, 2014
    • Fix known typos · cb402911
      Alp Toker authored
      Sweep the codebase for common typos. Includes some changes to visible function
      names that were misspelt.
      
      llvm-svn: 200018
  15. Jan 23, 2014
    • [LPM] Make LoopSimplify no longer a LoopPass and instead both a utility · aa7fa5e4
      Chandler Carruth authored
      function and a FunctionPass.
      
      This has many benefits. The motivating use case was to be able to
      compute function analysis passes *after* running LoopSimplify (to avoid
      invalidating them) and then to run other passes which require
      LoopSimplify. Specifically passes like unrolling and vectorization are
      critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
      that they can be profile aware. For the LoopVectorize pass the only
      things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
      and LCSSA is next on my list.
      
      There are also a bunch of other benefits of doing this:
      - It is now very feasible to make more passes *preserve* LoopSimplify
        because they can simply run it after changing a loop. Because
        subsequent passes can assume LoopSimplify is preserved we can reduce
        the runs of this pass to the times when we actually mutate a loop
        structure.
      - The new pass manager should be able to more easily support loop passes
        factored in this way.
      - We can at long, long last observe that LoopSimplify is preserved
        across SCEV. This *halves* the number of times we run LoopSimplify!!!
      
      Now, getting here wasn't trivial. First off, the interfaces used by
      LoopSimplify are all over the map regarding how analyses are updated. We
      end up with weird "pass" parameters as a consequence. I'll try to clean
      at least some of this up later -- I'll have to have it all clean for the
      new pass manager.
      
      Next up I discovered a really frustrating bug. LoopUnroll *claims* to
      preserve LoopSimplify. That's actually a lie. But the way the
      LoopPassManager ends up running the passes, it always ran LoopSimplify
      on the unrolled-into loop, rectifying this oversight before any
      verification could kick in and point out that in fact nothing was
      preserved. So I've added code to the unroller to *actually* simplify the
      surrounding loop when it succeeds at unrolling.
      
      The only functional change in the test suite is that we now catch a case
      that was previously missed because SCEV and other loop transforms see
      their containing loops as simplified and thus don't miss some
      opportunities. One test case has been converted to check that we catch
      this case rather than checking that we miss it but at least don't get
      the wrong answer.
      
      Note that I have #if-ed out all of the verification logic in
      LoopSimplify! This is a temporary workaround while extracting these bits
      from the LoopPassManager. Currently, there is no way to have a pass in
      the LoopPassManager which preserves LoopSimplify along with one which
      does not. The LPM will try to verify on each loop in the nest that
      LoopSimplify holds but the now-Function-pass cannot distinguish what
      loop is being verified and so must try to verify all of them. The
      innermost loop is clearly no longer simplified as there is a pass which
      didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
      LoopVectorize and some other fixes) I'll be able to re-enable this check
      and catch any places where we are still failing to preserve
      LoopSimplify. If this causes problems I can back this out and try to
      commit *all* of this at once, but so far this seems to work and allow
      much more incremental progress.
      
      llvm-svn: 199884
  16. Jan 15, 2014
    • Switch-to-lookup tables: set threshold to 3 cases · 4744ac17
      Hans Wennborg authored
      There has been an old FIXME to find the right cut-off for when it's worth
      analyzing and potentially transforming a switch to a lookup table.
      
      The switches always have two or more cases. I could not measure any speed-up
      by transforming a switch with two cases. A switch with three cases gets a nice
      speed-up, and I couldn't measure any compile-time regression, so I think this
      is the right threshold.
      
      In a Clang self-host, this causes 480 new switches to be transformed,
      and reduces the final binary size by 8 KB.
      
      llvm-svn: 199294
  17. Jan 13, 2014
    • [PM] Split DominatorTree into a concrete analysis result object which · 73523021
      Chandler Carruth authored
      can be used by both the new pass manager and the old.
      
      This removes it from any of the virtual mess of the pass interfaces and
      lets it derive cleanly from the DominatorTreeBase<> template. In turn,
      tons of boilerplate interface can be nuked and it turns into a very
      straightforward extension of the base DominatorTree interface.
      
      The old analysis pass is now a simple wrapper. The names and style of
      this split should match the split between CallGraph and
      CallGraphWrapperPass. All of the users of DominatorTree have been
      updated to match using many of the same tricks as with CallGraph. The
      goal is that the common type remains the resulting DominatorTree rather
      than the pass. This will make subsequent work toward the new pass
      manager significantly easier.
      
      Also in numerous places things became cleaner because I switched from
      re-running the pass (!!! midway through some other pass's run!!!) to
      directly recomputing the domtree.
      
      llvm-svn: 199104
    • [cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c
      Chandler Carruth authored
      directory. These passes are already defined in the IR library, and it
      doesn't make any sense to have the headers in Analysis.
      
      Long term, I think there is going to be a much better way to divide
      these matters. The dominators code should be fully separated into the
      abstract graph algorithm and have that put in Support where it becomes
      obvious that even Clang's CFGBlocks can use it. Then the verifier can
      manually construct dominance information from the Support-driven
      interface while the Analysis library can provide a pass which
      caches, reconstructs, and supports a nice update API.
      
      But those are very long term, and so I don't want to leave the really
      confusing structure until that day arrives.
      
      llvm-svn: 199082
  18. Jan 12, 2014
    • Switch-to-lookup tables: Don't require a result for the default · ac114a3c
      Hans Wennborg authored
      case when the lookup table doesn't have any holes.
      
      This means we can build a lookup table for switches like this:
      
        switch (x) {
          case 0: return 1;
          case 1: return 2;
          case 2: return 3;
          case 3: return 4;
          default: exit(1);
        }
      
      The default case doesn't yield a constant result here, but that doesn't matter,
      since a default result is only necessary for filling holes in the lookup table,
      and this table doesn't have any holes.
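
      Written out by hand, the effect of the transformation on the switch
      above might look like the sketch below. The function names are
      hypothetical, and the bounds check plus array load is only an
      approximation of what the optimizer emits.

```cpp
#include <cassert>
#include <cstdlib>

// The original shape: four contiguous cases and a default that never
// produces a value.
int viaSwitch(unsigned x) {
  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: std::exit(1);
  }
}

// The table-based shape: every index in [0, 4) has an entry, so no default
// result is needed to fill holes. Out-of-range inputs still take the
// default path.
int viaTable(unsigned x) {
  static const int table[4] = {1, 2, 3, 4};
  if (x < 4)
    return table[x];
  std::exit(1);
}

// Both versions agree on every in-range input.
void checkSameResults() {
  for (unsigned x = 0; x < 4; ++x)
    assert(viaSwitch(x) == viaTable(x));
}
```

      If case 2 were missing, the table would have a hole at index 2, and a
      constant default result would be needed to fill it.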
      
      This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
      off the resulting clang binary.
      
      llvm-svn: 199025
  19. Jan 07, 2014
  20. Jan 06, 2014
  21. Jan 04, 2014
    • Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." · 5e9f3265
      Alp Toker authored
      This commit was the source of crasher PR18384:
      
      While deleting: label %for.cond127
      An asserting value handle still pointed to this value!
      UNREACHABLE executed at llvm/lib/IR/Value.cpp:671!
      
      Reverting to get the builders green, feel free to re-land after fixing up.
      (Renato has a handy isolated repro if you need it.)
      
      This reverts commit r198478.
      
      llvm-svn: 198503
    • Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. · aceac974
      Andrew Trick authored
      getSCEV for an ashr instruction creates an intermediate zext
      expression when it truncates its operand.
      
      The operand is initially inside the loop, so the narrow zext
      expression has a non-loop-invariant loop disposition.
      
      LoopSimplify then runs on an outer loop, hoists the ashr operand, and
      properly invalidates the SCEVs that are mapped to the value.
      
      The SCEV expression for the ashr is now an AddRec with the hoisted
      value as the now loop-invariant start value.
      
      The LoopDisposition of this wide value was properly invalidated during
      LoopSimplify.
      
      However, if we later get the ashr SCEV again, we again try to create
      the intermediate zext expression. We get the same SCEV that we did
      earlier, and it is still cached because it was never mapped to a
      Value. When we try to create a new AddRec we abort because we're using
      the old non-loop-invariant LoopDisposition.
      
      I don't have a solution for this other than to clear LoopDisposition
      when LoopSimplify hoists things.
      
      I think the long-term strategy should be to perform LoopSimplify on
      all loops before computing SCEV and before running any loop opts on
      individual loops. It's possible we may want to rerun LoopSimplify on
      individual loops, but it should rarely do anything, so rarely require
      invalidating SCEV.
      
      llvm-svn: 198478
  22. Dec 24, 2013
    • Add support to indvars for optimizing sadd.with.overflow. · 0ba77a07
      Andrew Trick authored
      Split sadd.with.overflow into add + sadd.with.overflow to allow
      analysis and optimization. This should ideally be done after
      InstCombine, which can perform code motion (eventually indvars should
      run after all canonical instcombines). We want ISEL to recombine the
      add and the check, at least on x86.
      
      This is currently under an option for reducing live induction
      variables: -liv-reduce. The next step is reducing liveness of IVs that
      are live out of the overflow check paths. Once the related
      optimizations are fully developed, reviewed and tested, I do expect
      this to become default.
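
      In source-level terms, the split described above resembles separating
      a checked signed add into a plain wrapping add and an independent
      overflow test, as in this sketch. The function and type names are
      hypothetical, and the code models the idea in C++ rather than in LLVM
      IR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The plain add: wraps on overflow. Unsigned arithmetic is used because
// signed overflow is undefined in C++. The conversion back to int32_t is
// modular since C++20 and on mainstream compilers before that.
int32_t wrappingAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

// The overflow check, computed independently of the sum above.
bool addOverflows(int32_t a, int32_t b) {
  const int64_t wide = static_cast<int64_t>(a) + b;
  return wide < std::numeric_limits<int32_t>::min() ||
         wide > std::numeric_limits<int32_t>::max();
}

struct SAddResult {
  int32_t sum;
  bool overflow;
};

// Recombined view: the same pair of results a fused checked add would give.
SAddResult saddWithOverflow(int32_t a, int32_t b) {
  SAddResult r{wrappingAdd(a, b), addOverflows(a, b)};
  assert(r.overflow || r.sum == static_cast<int64_t>(a) + b);
  return r;
}
```

      Keeping the sum as an ordinary add leaves it visible to analyses that
      understand plain arithmetic, while the overflow flag stays available
      for the check.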
      
      llvm-svn: 197926
  23. Dec 23, 2013
  24. Dec 20, 2013
  25. Dec 19, 2013