  1. Feb 03, 2014
    • inalloca: Don't remove dead arguments in the presence of inalloca args · d47a59a4
      Reid Kleckner authored
      It disturbs the layout of the parameters in memory and registers,
      leading to problems in the backend.
      
      The plan for optimizing internal inalloca functions going forward is to
      essentially SROA the argument memory and demote any captured arguments
      (things that aren't trivially written by a load or store) to an indirect
      pointer to a static alloca.
      
      llvm-svn: 200717
  2. Feb 02, 2014
  3. Feb 01, 2014
    • [LPM] Apply a really big hammer to fix PR18688 by recursively reforming · 1665152c
      Chandler Carruth authored
      LCSSA when we promote to SSA registers inside of LICM.
      
      Currently, this is actually necessary. The promotion logic in LICM uses
      SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
      Teaching it to do so would be a very significant undertaking. It may be
      worthwhile and I've left a FIXME about this in the code as well as
      starting a thread on llvmdev to try to figure out the right long-term
      solution.
      
For now, the PR needs to be fixed. Short of using the promotion
      SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
      I don't see a cleaner or cheaper way of achieving this. Fortunately,
      LCSSA is relatively lazy and sparse -- it should only update
      instructions which need it. We can also skip the recursive variant when
      we don't promote to SSA values.
      
      llvm-svn: 200612
    • Remove some unused #includes · fc49d198
      Eli Bendersky authored
      llvm-svn: 200611
    • Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..." · a04504fe
      Reid Kleckner authored
      This reverts commit r200576.  It broke 32-bit self-host builds by
      vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
      
      llvm-svn: 200602
  4. Jan 31, 2014
    • [SLPV] Recognize vectorizable intrinsics during SLP vectorization and · b3da389e
      Chandler Carruth authored
      transform accordingly. Based on similar code from Loop vectorization.
      Subsequent commits will include vectorization of function calls to
      vector intrinsics and form function calls to vector library calls.
      
      Patch by Raul Silvera! (Much delayed due to my not running dcommit)
      
      llvm-svn: 200576
    • [vectorizer] Tweak the way we do small loop runtime unrolling in the · c12224cb
      Chandler Carruth authored
      loop vectorizer to not do so when runtime pointer checks are needed and
      share code with the new (not yet enabled) load/store saturation runtime
      unrolling. Also ensure that we only consider the runtime checks when the
      loop hasn't already been vectorized. If it has, the runtime check cost
      has already been paid.
      
      I've fleshed out a test case to cover the scalar unrolling as well as
      the vector unrolling and comment clearly why we are or aren't following
      the pattern.
      
      llvm-svn: 200530
    • Fix a bug in gcov instrumentation introduced by r195513. <rdar://15930350> · 055a0b4c
      Bob Wilson authored
      The entry block of a function starts with all the static allocas. The change
      in r195513 splits the block before those allocas, which has the effect of
turning them into dynamic allocas. That breaks all sorts of things. Change it to
      split after the initial allocas, and also add a comment explaining why the
      block is split.
      
      llvm-svn: 200515
  5. Jan 29, 2014
    • [LPM] Fix PR18643, another scary place where loop transforms failed to · d4be9dc0
      Chandler Carruth authored
      preserve loop simplify of enclosing loops.
      
      The problem here starts with LoopRotation which ends up cloning code out
of the latch into the new preheader it is building. This can create
      a new edge from the preheader into the exit block of the loop which
      breaks LoopSimplify form. The code tries to fix this by splitting the
      critical edge between the latch and the exit block to get a new exit
      block that only the latch dominates. This sadly isn't sufficient.
      
      The exit block may be an exit block for multiple nested loops. When we
      clone an edge from the latch of the inner loop to the new preheader
      being built in the outer loop, we create an exiting edge from the outer
      loop to this exit block. Despite breaking the LoopSimplify form for the
      inner loop, this is fine for the outer loop. However, when we split the
      edge from the inner loop to the exit block, we create a new block which
      is in neither the inner nor outer loop as the new exit block. This is
      a predecessor to the old exit block, and so the split itself takes the
      outer loop out of LoopSimplify form. We need to split every edge
      entering the exit block from inside a loop nested more deeply than the
      exit block in order to preserve all of the loop simplify constraints.
      
      Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we took a very brute-force approach to updating LoopSimplify
      form by re-computing it for all exit blocks. We don't need to do this,
      and doing this much will sometimes but not always overlap with the
      LoopRotate bug fix. Instead, the code needs to specifically handle the
      cases which can start to violate LoopSimplify -- they aren't that
      common. We need to see if the destination of the split edge was a loop
      exit block in simplified form for the loop of the source of the edge.
      For this to be true, all the predecessors need to be in the exact same
      loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the edges back into
      this loop to recover it. The old mechanism of doing this was
      conservatively correct because at least *one* of the exiting blocks it
      rewrote was the DestBB and so the DestBB's predecessors were fixed. But
      this is a much more targeted way of doing it. Making it targeted is
      important, because ballooning the set of edges touched prevents
      LoopRotate from being able to split edges *it* needs to split to
      preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find some of the other edges in need of splitting, but
miss others.
      
Many, *many* thanks to Nick for mightily reducing these test cases, and
for lots of help with the analysis here, as this one was quite tricky to
track down.
      
      llvm-svn: 200393
    • [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" · 66f0b163
      Chandler Carruth authored
      because of the inside-out run of LoopSimplify in the LoopPassManager and
      the fact that LoopSimplify couldn't be "preserved" across two
      independent LoopPassManagers.
      
      Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
      node because it thought it was rewriting (via SCEV) the incoming value
      to a loop invariant value. While it may well be invariant for the
      current loop, it may be rewritten in terms of an enclosing loop's
      values. This in and of itself is fine, as the LCSSA PHI node in the
      enclosing loop for the inner loop value we're rewriting will have its
      own LCSSA PHI node if used outside of the enclosing loop. With me so
      far?
      
      Well, the current loop and the enclosing loop may share an exiting
      block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, it's not valid to RAUW through the LCSSA PHI node.
      
      Expected crazy test included.
      
      llvm-svn: 200372
    • LoopVectorizer: Don't count the induction variable multiple times · 1aab75ab
      Arnold Schwaighofer authored
When estimating register pressure, don't count the induction variable multiple
times. It is unlikely to be unrolled. This is currently disabled and hidden
      behind a flag ("enable-ind-var-reg-heur").
      
      llvm-svn: 200371
  6. Jan 28, 2014
    • Fix pr14893. · ab73c493
      Rafael Espindola authored
When SimplifyCFG moves an instruction, it must drop any metadata it cannot
prove is still valid once the preconditions change. In particular, it must
drop the range and tbaa metadata.

The patch implements this with a utility function that drops all metadata
not in a whitelist.
      
      llvm-svn: 200322
    • [vectorizer] Completely disable the block frequency guidance of the loop · b7836285
      Chandler Carruth authored
      vectorizer, placing it behind an off-by-default flag.
      
      It turns out that block frequency isn't what we want at all, here or
      elsewhere. This has been I think a nagging feeling for several of us
      working with it, but Arnold has given some really nice simple examples
      where the results are so comprehensively wrong that they aren't useful.
      
I'm planning to email the dev list with a summary of why it's not really
      useful and a couple of ideas about how to better structure these types
      of heuristics.
      
      llvm-svn: 200294
    • Update optimization passes to handle inalloca arguments · 26af2cae
      Reid Kleckner authored
      Summary:
      I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
      sites to check for inalloca if appropriate.
      
      I added tests for any change that would allow an optimization to fire on
      inalloca.
      
      Reviewers: nlewycky
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2449
      
      llvm-svn: 200281
    • [LPM] Fix PR18616 where the shifts to the loop pass manager to extract · d84f776e
      Chandler Carruth authored
      LCSSA from it caused a crasher with the LoopUnroll pass.
      
This crasher is really nasty. We destroy LCSSA form in a surprising way.
      When unrolling a loop into an outer loop, we not only need to restore
      LCSSA form for the outer loop, but for all children of the outer loop.
      This is somewhat obvious in retrospect, but hey!
      
      While this seems pretty heavy-handed, it's not that bad. Fundamentally,
      we only do this when we unroll a loop, which is already a heavyweight
      operation. We're unrolling all of these hypothetical inner loops as
      well, so their size and complexity is already on the critical path. This
      is just adding another pass over them to re-canonicalize.
      
      I have a test case from PR18616 that is great for reproducing this, but
      pretty useless to check in as it relies on many 10s of nested empty
      loops that get unrolled and deleted in just the right order. =/ What's
      worse is that investigating this has exposed another source of failure
      that is likely to be even harder to test. I'll try to come up with test
      cases for these fixes, but I want to get the fixes into the tree first
      as they're causing crashes in the wild.
      
      llvm-svn: 200273
    • LoopVectorize: Support conditional stores by scalarizing · 18865db3
      Arnold Schwaighofer authored
      The vectorizer takes a loop like this and widens all instructions except for the
      store. The stores are scalarized/unrolled and hidden behind an "if" block.
      
        for (i = 0; i < 128; ++i) {
          if (a[i] < 10)
            a[i] += val;
        }
      
  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    v0 = (extract v, 0) + val;
    v1 = (extract v, 1) + val;
    if ((extract v, 0) < 10)
      a[i] = v0;
    if ((extract v, 1) < 10)
      a[i+1] = v1;
  }
      
      The vectorizer relies on subsequent optimizations to sink instructions into the
      conditional block where they are anticipated.
      
      The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled by default for
      now.
      
      This patch also adds a change to the heuristic when the flag
      "enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
      loops until load/store ports are saturated. This heuristic uses TTI's
      getMaxUnrollFactor as a measure for load/store ports.
      
      I also added a second flag -enable-cond-stores-vec. It will enable vectorization
      of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet, so this will not do much good at the moment.
      
      rdar://15892953
      
      Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
      -vectorize-num-stores-pred=1 (before the BFI change):
      
       Performance Regressions:
         Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
         Applications/siod/siod         2.18%
       Performance improvements:
         mesa                          -4.42%
         libquantum                    -4.15%
      
       With a patch that slightly changes the register heuristics (by subtracting the
       induction variable on both sides of the register pressure equation, as the
       induction variable is probably not really unrolled):
      
       Performance Regressions:
         Benchmarks/Ptrdist/yacr2/yacr2  7.73%
         Applications/siod/siod          1.97%
      
       Performance Improvements:
         libquantum                    -13.05% (we now also unroll quantum_toffoli)
         mesa                           -4.27%
      
      llvm-svn: 200270
    • PGO branch weight: keep halving the weights until they can fit into · f1cb16e4
      Manman Ren authored
      uint32.
      
When folding branches to a common destination, the updated branch weights
can exceed uint32 by more than a factor of 2. We should keep halving the
      weights until they can fit into uint32.
      
      llvm-svn: 200262
  7. Jan 27, 2014
    • [vectorize] Initial version of respecting PGO in the vectorizer: treat · e24f3973
      Chandler Carruth authored
      cold loops as-if they were being optimized for size.
      
Nothing fancy here. Simple test case included. The nice thing is that we
      can now incrementally build on top of this to drive other heuristics.
      All of the infrastructure work is done to get the profile information
      into this layer.
      
      The remaining work necessary to make this a fully general purpose loop
      unroller for very hot loops is to make it a fully general purpose loop
      unroller. Things I know of but am not going to have time to benchmark
      and fix in the immediate future:
      
      1) Don't disable the entire pass when the target is lacking vector
         registers. This really doesn't make any sense any more.
      2) Teach the unroller at least and the vectorizer potentially to handle
         non-if-converted loops. This is trivial for the unroller but hard for
         the vectorizer.
      3) Compute the relative hotness of the loop and thread that down to the
         various places that make cost tradeoffs (very likely only the
         unroller makes sense here, and then only when dealing with loops that
         are small enough for unrolling to not completely blow out the LSD).
      
      I'm still dubious how useful hotness information will be. So far, my
      experiments show that if we can get the correct logic for determining
      when unrolling actually helps performance, the code size impact is
      completely unimportant and we can unroll in all cases. But at least
      we'll no longer burn code size on cold code.
      
      One somewhat unrelated idea that I've had forever but not had time to
      implement: mark all functions which are only reachable via the global
      constructors rigging in the module as optsize. This would also decrease
      the impact of any more aggressive heuristics here on code size.
      
      llvm-svn: 200219
    • ConstantHoisting: We can't insert instructions directly in front of a PHI node. · 9e709bce
      Benjamin Kramer authored
      Insert before the terminating instruction of the dominating block instead.
      
      llvm-svn: 200218
    • [vectorizer] Add an override for the target instruction cost and use it · edfa37ef
      Chandler Carruth authored
      to stabilize a test that really is trying to test generic behavior and
      not a specific target's behavior.
      
      llvm-svn: 200215
    • [vectorizer] Simplify code to use existing helpers on the Function · 2bb03ba6
      Chandler Carruth authored
      object and fewer pointless variables.
      
      Also, add a clarifying comment and a FIXME because the code which
      disables *all* vectorization if we can't use implicit floating point
      instructions just makes no sense at all.
      
      llvm-svn: 200214
    • [vectorizer] Teach the loop vectorizer's unroller to only unroll by · 147c2327
      Chandler Carruth authored
      powers of two. This is essentially always the correct thing given the
      impact on alignment, scaling factors that can be used in addressing
      modes, etc. Also, fix the management of the unroll vs. small loop cost
      to more accurately model things with this world.
      
      Enhance a test case to actually exercise more of the unroll machinery if
      using synthetic constants rather than a specific target model. Before
      this change, with the added flags this test will unroll 3 times instead
      of either 2 or 4 (the two sensible answers).
      
      While I don't expect this to make a huge difference, if there are lots
      of loops sitting right on the edge of hitting the 'small unroll' factor,
      they might change behavior. However, I've benchmarked moving the small
      loop cost up and down in many various ways and by a huge factor (2x)
      without seeing more than 0.2% code size growth. Small adjustments such
      as the series that led up here have led to about 1% improvement on some
      benchmarks, but it is very close to the noise floor so I mostly checked
      that nothing regressed. Let me know if you see bad behavior on other
      targets but I don't expect this to be a sufficiently dramatic change to
      trigger anything.
      
      llvm-svn: 200213
    • [vectorizer] Add some flags which are useful for conducting experiments · 7f90b453
      Chandler Carruth authored
      with the unrolling behavior in the loop vectorizer. No functionality
      changed at this point.
      
      These are a bit hack-y, but talking with Hal, there doesn't seem to be
      a cleaner way to easily experiment with different thresholds here and he
      was also interested in them so I wanted to commit them. Suggestions for
      improvement are very welcome here.
      
      llvm-svn: 200212
    • [vectorizer] Fix a trivial oversight where we always requested the · 328998b2
      Chandler Carruth authored
      number of vector registers rather than toggling between vector and
      scalar register number based on VF. I don't have a test case as
      I spotted this by inspection and on X86 it only makes a difference if
      your target is lacking SSE and thus has *no* vector registers.
      
      If someone wants to add a test case for this for ARM or somewhere else
      where this is more significant, that would be awesome.
      
      Also made the variable name a bit more sensible while I'm here.
      
      llvm-svn: 200211
    • [vectorizer] Clean up the handling of unvectorized loop unrolling in the · 56612b20
      Chandler Carruth authored
      LoopVectorize pass.
      
      The logic here doesn't make much sense. We *only* unrolled if the
      unvectorized loop was a reduction loop with a single basic block *and*
      small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic,
it makes more sense to unroll when there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* when the loop is small.
      
      This is mostly a cleanup and nothing in the test suite really exercises
      this, but I did run benchmarks across this change and saw no really
      significant changes.
      
      llvm-svn: 200198
  8. Jan 25, 2014
    • [LPM] Conclude my immediate work by making the LoopVectorizer · 3aebcb99
      Chandler Carruth authored
      a FunctionPass. With this change the loop vectorizer no longer is a loop
      pass and can readily depend on function analyses. In particular, with
      this change we no longer have to form a loop pass manager to run the
      loop vectorizer which simplifies the entire pass management of LLVM.
      
      The next step here is to teach the loop vectorizer to leverage profile
      information through the profile information providing analysis passes.
      
      llvm-svn: 200074
    • [LPM] Make LCSSA a utility with a FunctionPass that applies it to all · 8765cf70
      Chandler Carruth authored
the loops in a function, and teach LICM to work in the presence of
      LCSSA.
      
      Previously, LCSSA was a loop pass. That made passes requiring it also be
      loop passes and unable to depend on function analysis passes easily. It
      also caused outer loops to have a different "canonical" form from inner
      loops during analysis. Instead, we go into LCSSA form and preserve it
      through the loop pass manager run.
      
      Note that this has the same problem as LoopSimplify that prevents
      enabling its verification -- loop passes which run at the end of the loop
      pass manager and don't preserve these are valid, but the subsequent loop
      pass runs of outer loops that do preserve this pass trigger too much
      verification and fail because the inner loop no longer verifies.
      
      The other problem this exposed is that LICM was completely unable to
      handle LCSSA form. It didn't preserve it and it actually would give up
      on moving instructions in many cases when they were used by an LCSSA phi
      node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
      hoist and sink around them. This may actually let LICM fire
      significantly more because we put everything into LCSSA form to rotate
      the loop before running LICM. =/ Now LICM should handle that fine and
      preserve it correctly. The down side is that LICM has to require LCSSA
      in order to preserve it. This is just a fact of life for LCSSA. It's
      entirely possible we should completely remove LCSSA from the optimizer.
      
The test updates are essentially accommodating LCSSA phi nodes in the
      output of LICM, and the fact that we now completely sink every
      instruction in ashr-crash below the loop bodies prior to unrolling.
      
      With this change, LCSSA is computed only three times in the pass
      pipeline. One of them could be removed (and potentially a SCEV run and
      a separate LoopPassManager entirely!) if we had a LoopPass variant of
      InstCombine that ran InstCombine on the loop body but refused to combine
      away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
being in the same loop pass manager as rotate, LICM, and unswitch.
      
      There is one thing that I *really* don't like -- preserving LCSSA in
      LICM is quite expensive. We end up having to re-run LCSSA twice for some
      loops after LICM runs because LICM can undo LCSSA both in the current
      loop and the parent loop. I don't really see good solutions to this
      other than to completely move away from LCSSA and using tools like
      SSAUpdater instead.
      
      llvm-svn: 200067
    • Revert "Revert "Add Constant Hoisting Pass" (r200034)" · f26beda7
      Juergen Ributzka authored
      This reverts commit r200058 and adds the using directive for
      ARMTargetTransformInfo to silence two g++ overload warnings.
      
      llvm-svn: 200062
    • Revert "Add Constant Hoisting Pass" (r200034) · 4d67a2e8
      Hans Wennborg authored
      This commit caused -Woverloaded-virtual warnings. The two new
      TargetTransformInfo::getIntImmCost functions were only added to the superclass,
      and to the X86 subclass. The other targets were not updated, and the
      warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
      hiding the two new getIntImmCost variants.
      
      We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
      to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemented in those targets. The default implementations
      return TCC_Free, which I don't think is right e.g. for ARM.
      
      llvm-svn: 200058
  9. Jan 24, 2014
    • Add Constant Hoisting Pass · 4f3df4ad
      Juergen Ributzka authored
      Retry commit r200022 with a fix for the build bot errors. Constant expressions
      have (unlike instructions) module scope use lists and therefore may have users
      in different functions. The fix is to simply ignore these out-of-function uses.
      
      llvm-svn: 200034
    • InstCombine: Don't try to use aggregate elements of ConstantExprs. · 09b0f88a
      Benjamin Kramer authored
      PR18600.
      
      llvm-svn: 200028
    • Revert "Add Constant Hoisting Pass" · 50e7e80d
      Juergen Ributzka authored
      This reverts commit r200022 to unbreak the build bots.
      
      llvm-svn: 200024
    • Add Constant Hoisting Pass · 38b67d0c
      Juergen Ributzka authored
This pass identifies expensive constants to hoist and coalesces them to
better prepare the code for SelectionDAG-based code generation. This works around the
      limitations of the basic-block-at-a-time approach.
      
First it scans all instructions for integer constants and calculates their
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just that of a simple operation (TCC_Basic), then we don't
      consider it expensive and leave it alone. This is the default behavior and
      the default implementation of getIntImmCost will always return TCC_Free.
      
If the cost is more than TCC_Basic, then the integer constant can't be folded
      into the instruction and it might be beneficial to hoist the constant.
      Similar constants are coalesced to reduce register pressure and
      materialization code.
      
      When a constant is hoisted, it is also hidden behind a bitcast to force it to
      be live-out of the basic block. Otherwise the constant would be just
      duplicated and each basic block would have its own copy in the SelectionDAG.
      The SelectionDAG recognizes such constants as opaque and doesn't perform
      certain transformations on them, which would create a new expensive constant.
      
      This optimization is only applied to integer constants in instructions and
simple (that is, not nested) constant cast expressions. For example:
      %0 = load i64* inttoptr (i64 big_constant to i64*)
      
      Reviewed by Eric
      
      llvm-svn: 200022
    • Fix known typos · cb402911
      Alp Toker authored
      Sweep the codebase for common typos. Includes some changes to visible function
      names that were misspelt.
      
      llvm-svn: 200018
    • [LPM] Fix a logic error in LICM spotted by inspection. · cc497b6a
      Chandler Carruth authored
      We completely skipped promotion in LICM if the loop has a preheader or
      dedicated exits, but not *both*. We hoist if there is a preheader, and
      sink if there are dedicated exits, but either hoisting or sinking can
      move loop invariant code out of the loop!
      
      I have no idea if this has a practical consequence. If anyone has ideas
      for a test case, let me know.
      
      llvm-svn: 199966
    • [cleanup] Use the type-based preservation method rather than a string · abfa3e56
      Chandler Carruth authored
      literal that bakes a pass name and forces parsing it in the pass
      manager.
      
      llvm-svn: 199963
  10. Jan 23, 2014
    • Remove tail marker when changing an argument to an alloca. · 2a05ea5c
      Rafael Espindola authored
      Argument promotion can replace an argument of a call with an alloca. This
      requires clearing the tail marker as it is very likely that the callee is now
      using an alloca in the caller.
      
      This fixes pr14710.
      
      llvm-svn: 199909
    • [LPM] Make LoopSimplify no longer a LoopPass and instead both a utility · aa7fa5e4
      Chandler Carruth authored
      function and a FunctionPass.
      
      This has many benefits. The motivating use case was to be able to
      compute function analysis passes *after* running LoopSimplify (to avoid
      invalidating them) and then to run other passes which require
      LoopSimplify. Specifically passes like unrolling and vectorization are
      critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
      that they can be profile aware. For the LoopVectorize pass the only
      things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
      and LCSSA is next on my list.
      
      There are also a bunch of other benefits of doing this:
      - It is now very feasible to make more passes *preserve* LoopSimplify
        because they can simply run it after changing a loop. Because
subsequent passes can assume LoopSimplify is preserved, we can reduce
        the runs of this pass to the times when we actually mutate a loop
        structure.
      - The new pass manager should be able to more easily support loop passes
        factored in this way.
      - We can at long, long last observe that LoopSimplify is preserved
        across SCEV. This *halves* the number of times we run LoopSimplify!!!
      
      Now, getting here wasn't trivial. First off, the interfaces used by
      LoopSimplify are all over the map regarding how analysis are updated. We
      end up with weird "pass" parameters as a consequence. I'll try to clean
      at least some of this up later -- I'll have to have it all clean for the
      new pass manager.
      
      Next up I discovered a really frustrating bug. LoopUnroll *claims* to
      preserve LoopSimplify. That's actually a lie. But the way the
      LoopPassManager ends up running the passes, it always ran LoopSimplify
      on the unrolled-into loop, rectifying this oversight before any
      verification could kick in and point out that in fact nothing was
      preserved. So I've added code to the unroller to *actually* simplify the
      surrounding loop when it succeeds at unrolling.
      
      The only functional change in the test suite is that we now catch a case
      that was previously missed because SCEV and other loop transforms see
      their containing loops as simplified and thus don't miss some
      opportunities. One test case has been converted to check that we catch
      this case rather than checking that we miss it but at least don't get
      the wrong answer.
      
      Note that I have #if-ed out all of the verification logic in
      LoopSimplify! This is a temporary workaround while extracting these bits
      from the LoopPassManager. Currently, there is no way to have a pass in
      the LoopPassManager which preserves LoopSimplify along with one which
      does not. The LPM will try to verify on each loop in the nest that
      LoopSimplify holds but the now-Function-pass cannot distinguish what
      loop is being verified and so must try to verify all of them. The inner
      most loop is clearly no longer simplified as there is a pass which
      didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
      LoopVectorize and some other fixes) I'll be able to re-enable this check
      and catch any places where we are still failing to preserve
      LoopSimplify. If this causes problems I can back this out and try to
      commit *all* of this at once, but so far this seems to work and allow
      much more incremental progress.
      
      llvm-svn: 199884
  11. Jan 22, 2014