  1. Feb 06, 2014
  2. Feb 04, 2014
  3. Feb 01, 2014
    • Chandler Carruth's avatar
      [LPM] Apply a really big hammer to fix PR18688 by recursively reforming · 1665152c
      Chandler Carruth authored
      LCSSA when we promote to SSA registers inside of LICM.
      
      Currently, this is actually necessary. The promotion logic in LICM uses
      SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
      Teaching it to do so would be a very significant undertaking. It may be
      worthwhile and I've left a FIXME about this in the code as well as
      starting a thread on llvmdev to try to figure out the right long-term
      solution.
      
      For now, the PR needs to be fixed. Short of using the promotion
      SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
      I don't see a cleaner or cheaper way of achieving this. Fortunately,
      LCSSA is relatively lazy and sparse -- it should only update
      instructions which need it. We can also skip the recursive variant when
      we don't promote to SSA values.
      
      llvm-svn: 200612
      1665152c
  4. Jan 29, 2014
    • Chandler Carruth's avatar
      [LPM] Fix PR18643, another scary place where loop transforms failed to · d4be9dc0
      Chandler Carruth authored
      preserve loop simplify of enclosing loops.
      
      The problem here starts with LoopRotation which ends up cloning code out
      of the latch into the new preheader it is building. This can create
      a new edge from the preheader into the exit block of the loop which
      breaks LoopSimplify form. The code tries to fix this by splitting the
      critical edge between the latch and the exit block to get a new exit
      block that only the latch dominates. This sadly isn't sufficient.
      
      The exit block may be an exit block for multiple nested loops. When we
      clone an edge from the latch of the inner loop to the new preheader
      being built in the outer loop, we create an exiting edge from the outer
      loop to this exit block. Despite breaking the LoopSimplify form for the
      inner loop, this is fine for the outer loop. However, when we split the
      edge from the inner loop to the exit block, we create a new block which
      is in neither the inner nor outer loop as the new exit block. This is
      a predecessor to the old exit block, and so the split itself takes the
      outer loop out of LoopSimplify form. We need to split every edge
      entering the exit block from inside a loop nested more deeply than the
      exit block in order to preserve all of the loop simplify constraints.
      
      Once we try to do that, a problem with splitting critical edges
      surfaces. Previously, we took a very brute-force approach to updating LoopSimplify
      form by re-computing it for all exit blocks. We don't need to do this,
      and doing this much will sometimes but not always overlap with the
      LoopRotate bug fix. Instead, the code needs to specifically handle the
      cases which can start to violate LoopSimplify -- they aren't that
      common. We need to see if the destination of the split edge was a loop
      exit block in simplified form for the loop of the source of the edge.
      For this to be true, all the predecessors need to be in the exact same
      loop as the source of the edge being split. If the dest block was
      originally in this form, we have to split all of the edges back into
      this loop to recover it. The old mechanism of doing this was
      conservatively correct because at least *one* of the exiting blocks it
      rewrote was the DestBB and so the DestBB's predecessors were fixed. But
      this is a much more targeted way of doing it. Making it targeted is
      important, because ballooning the set of edges touched prevents
      LoopRotate from being able to split edges *it* needs to split to
      preserve loop simplify in a coherent way -- the critical edge splitting
      would sometimes find some of the edges in need of splitting but not
      others.
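
The targeted check described above (was the destination a simplified-form exit block for the loop containing the source of the edge?) can be sketched roughly as follows. This is a toy model rather than the LLVM implementation: blocks and loops are plain integers, and `Cfg`, `loopOf`, and `isDedicatedExitFor` are hypothetical names introduced only for illustration.

```cpp
#include <map>
#include <vector>

// Toy CFG model: blocks are ints; loopOf maps a block to the id of its
// innermost loop (-1 means "not in any loop"). All names are hypothetical.
struct Cfg {
  std::map<int, std::vector<int>> preds; // block -> predecessor blocks
  std::map<int, int> loopOf;             // block -> innermost loop id
};

// Returns true when every predecessor of `dest` sits in exactly the same
// loop as `src`, i.e. `dest` looked like a dedicated exit for src's loop.
bool isDedicatedExitFor(const Cfg &cfg, int src, int dest) {
  const int srcLoop = cfg.loopOf.at(src);
  auto it = cfg.preds.find(dest);
  if (it == cfg.preds.end())
    return false; // no recorded predecessors: not an exit block at all
  for (int pred : it->second)
    if (cfg.loopOf.at(pred) != srcLoop)
      return false;
  return true;
}
```

When the check returns true before the split, the edges from that loop into the destination are the ones that would need splitting to recover the form; when it returns false there was nothing to preserve.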
      
      Many, *many* thanks for help from Nick reducing these test cases
      mightily. And helping lots with the analysis here as this one was quite
      tricky to track down.
      
      llvm-svn: 200393
      d4be9dc0
    • Chandler Carruth's avatar
      [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" · 66f0b163
      Chandler Carruth authored
      because of the inside-out run of LoopSimplify in the LoopPassManager and
      the fact that LoopSimplify couldn't be "preserved" across two
      independent LoopPassManagers.
      
      Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
      node because it thought it was rewriting (via SCEV) the incoming value
      to a loop invariant value. While it may well be invariant for the
      current loop, it may be rewritten in terms of an enclosing loop's
      values. This in and of itself is fine, as the LCSSA PHI node in the
      enclosing loop for the inner loop value we're rewriting will have its
      own LCSSA PHI node if used outside of the enclosing loop. With me so
      far?
      
      Well, the current loop and the enclosing loop may share an exiting
      block and exit block, and when they do they also share LCSSA PHI nodes.
      In this case, it's not valid to RAUW through the LCSSA PHI node.
      
      Expected crazy test included.
      
      llvm-svn: 200372
      66f0b163
  5. Jan 28, 2014
  6. Jan 27, 2014
  7. Jan 25, 2014
    • Chandler Carruth's avatar
      [LPM] Make LCSSA a utility with a FunctionPass that applies it to all · 8765cf70
      Chandler Carruth authored
      the loops in a function, and teach LICM to work in the presence of
      LCSSA.
      
      Previously, LCSSA was a loop pass. That made passes requiring it also be
      loop passes and unable to depend on function analysis passes easily. It
      also caused outer loops to have a different "canonical" form from inner
      loops during analysis. Instead, we go into LCSSA form and preserve it
      through the loop pass manager run.
      
      Note that this has the same problem as LoopSimplify that prevents
      enabling its verification -- loop passes which run at the end of the loop
      pass manager and don't preserve these are valid, but the subsequent loop
      pass runs of outer loops that do preserve this pass trigger too much
      verification and fail because the inner loop no longer verifies.
      
      The other problem this exposed is that LICM was completely unable to
      handle LCSSA form. It didn't preserve it and it actually would give up
      on moving instructions in many cases when they were used by an LCSSA phi
      node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
      hoist and sink around them. This may actually let LICM fire
      significantly more because we put everything into LCSSA form to rotate
      the loop before running LICM. =/ Now LICM should handle that fine and
      preserve it correctly. The down side is that LICM has to require LCSSA
      in order to preserve it. This is just a fact of life for LCSSA. It's
      entirely possible we should completely remove LCSSA from the optimizer.
      
      The test updates are essentially accommodating LCSSA phi nodes in the
      output of LICM, and the fact that we now completely sink every
      instruction in ashr-crash below the loop bodies prior to unrolling.
      
      With this change, LCSSA is computed only three times in the pass
      pipeline. One of them could be removed (and potentially a SCEV run and
      a separate LoopPassManager entirely!) if we had a LoopPass variant of
      InstCombine that ran InstCombine on the loop body but refused to combine
      away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
      being in the same loop pass manager as rotate, LICM, and unswitch.
      
      There is one thing that I *really* don't like -- preserving LCSSA in
      LICM is quite expensive. We end up having to re-run LCSSA twice for some
      loops after LICM runs because LICM can undo LCSSA both in the current
      loop and the parent loop. I don't really see good solutions to this
      other than to completely move away from LCSSA and using tools like
      SSAUpdater instead.
      
      llvm-svn: 200067
      8765cf70
    • Juergen Ributzka's avatar
      Revert "Revert "Add Constant Hoisting Pass" (r200034)" · f26beda7
      Juergen Ributzka authored
      This reverts commit r200058 and adds the using directive for
      ARMTargetTransformInfo to silence two g++ overload warnings.
      
      llvm-svn: 200062
      f26beda7
    • Hans Wennborg's avatar
      Revert "Add Constant Hoisting Pass" (r200034) · 4d67a2e8
      Hans Wennborg authored
      This commit caused -Woverloaded-virtual warnings. The two new
      TargetTransformInfo::getIntImmCost functions were only added to the superclass,
      and to the X86 subclass. The other targets were not updated, and the
      warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
      hiding the two new getIntImmCost variants.
      
      We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
      to the various subclasses, or turning it off, but I suspect that it's wrong to
      leave the functions unimplemented in those targets. The default implementations
      return TCC_Free, which I don't think is right e.g. for ARM.
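
The hiding problem behind the warning, and what the using-declaration does and does not fix, can be illustrated with a small standalone sketch. The class names and the simplified signatures below are hypothetical stand-ins, not the real TargetTransformInfo interface.

```cpp
#include <string>

// Hypothetical stand-ins for a cost-model base class and two targets.
struct BaseTTI {
  virtual ~BaseTTI() = default;
  virtual std::string getIntImmCost(long) const { return "base(imm)"; }
  virtual std::string getIntImmCost(unsigned, long) const {
    return "base(opcode, imm)";
  }
};

// Overrides one overload only. The other base overload is hidden by name
// lookup in this class, which is what -Woverloaded-virtual points out.
struct HidingTTI : BaseTTI {
  std::string getIntImmCost(long) const override { return "target(imm)"; }
};

// The using-declaration makes the base overloads visible again, but the
// two-argument variant still falls back to the base default.
struct UsingTTI : BaseTTI {
  using BaseTTI::getIntImmCost;
  std::string getIntImmCost(long) const override { return "target(imm)"; }
};
```

With `UsingTTI`, a call such as `getIntImmCost(1u, 2L)` compiles but silently returns the base-class answer, which matches the concern raised above: the warning goes away while the target still has no implementation of its own.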
      
      llvm-svn: 200058
      4d67a2e8
  8. Jan 24, 2014
    • Juergen Ributzka's avatar
      Add Constant Hoisting Pass · 4f3df4ad
      Juergen Ributzka authored
      Retry commit r200022 with a fix for the build bot errors. Constant expressions
      have (unlike instructions) module scope use lists and therefore may have users
      in different functions. The fix is to simply ignore these out-of-function uses.
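
A minimal sketch of the idea behind this fix, assuming a simplified model in which every use of a constant records the function it belongs to. `Use` and `usesInFunction` are hypothetical names, not LLVM types.

```cpp
#include <string>
#include <vector>

// Hypothetical model: each use of a constant records the function it is in.
struct Use {
  std::string function;
  int instructionId;
};

// Keep only the uses located in `current`; uses that belong to other
// functions are ignored rather than rewritten.
std::vector<Use> usesInFunction(const std::vector<Use> &uses,
                                const std::string &current) {
  std::vector<Use> result;
  for (const Use &u : uses)
    if (u.function == current)
      result.push_back(u);
  return result;
}
```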
      
      llvm-svn: 200034
      4f3df4ad
    • Juergen Ributzka's avatar
      Revert "Add Constant Hoisting Pass" · 50e7e80d
      Juergen Ributzka authored
      This reverts commit r200022 to unbreak the build bots.
      
      llvm-svn: 200024
      50e7e80d
    • Juergen Ributzka's avatar
      Add Constant Hoisting Pass · 38b67d0c
      Juergen Ributzka authored
      This pass identifies expensive constants to hoist and coalesces them to
      better prepare the code for SelectionDAG-based code generation. This works around the
      limitations of the basic-block-at-a-time approach.
      
      First it scans all instructions for integer constants and calculates their
      cost. If the constant can be folded into the instruction (the cost is
      TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
      consider it expensive and leave it alone. This is the default behavior and
      the default implementation of getIntImmCost will always return TCC_Free.
      
      If the cost is more than TCC_BASIC, then the integer constant can't be folded
      into the instruction and it might be beneficial to hoist the constant.
      Similar constants are coalesced to reduce register pressure and
      materialization code.
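
The cost threshold described above amounts to a simple comparison. The sketch below assumes three ordered cost levels; `CostLevel` and `isHoistCandidate` are hypothetical names, and the `Expensive` level is an assumption added only so there is something above the basic level.

```cpp
// Hypothetical, ordered cost levels standing in for the TCC_* values named
// in the message. Expensive is an assumed third level for illustration.
enum class CostLevel { Free = 0, Basic = 1, Expensive = 2 };

// A constant is considered for hoisting only when materializing it costs
// more than a single basic operation.
bool isHoistCandidate(CostLevel cost) { return cost > CostLevel::Basic; }
```

Under this reading, a target whose cost query always reports the free level never produces a hoisting candidate, which is why the default implementation leaves everything alone.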
      
      When a constant is hoisted, it is also hidden behind a bitcast to force it to
      be live-out of the basic block. Otherwise the constant would be just
      duplicated and each basic block would have its own copy in the SelectionDAG.
      The SelectionDAG recognizes such constants as opaque and doesn't perform
      certain transformations on them, which would create a new expensive constant.
      
      This optimization is only applied to integer constants in instructions and
      simple (this means not nested) constant cast expressions. For example:
      %0 = load i64* inttoptr (i64 big_constant to i64*)
      
      Reviewed by Eric
      
      llvm-svn: 200022
      38b67d0c
    • Alp Toker's avatar
      Fix known typos · cb402911
      Alp Toker authored
      Sweep the codebase for common typos. Includes some changes to visible function
      names that were misspelt.
      
      llvm-svn: 200018
      cb402911
    • Chandler Carruth's avatar
      [LPM] Fix a logic error in LICM spotted by inspection. · cc497b6a
      Chandler Carruth authored
      We completely skipped promotion in LICM if the loop has a preheader or
      dedicated exits, but not *both*. We hoist if there is a preheader, and
      sink if there are dedicated exits, but either hoisting or sinking can
      move loop invariant code out of the loop!
      
      I have no idea if this has a practical consequence. If anyone has ideas
      for a test case, let me know.
      
      llvm-svn: 199966
      cc497b6a
    • Chandler Carruth's avatar
      [cleanup] Use the type-based preservation method rather than a string · abfa3e56
      Chandler Carruth authored
      literal that bakes a pass name and forces parsing it in the pass
      manager.
      
      llvm-svn: 199963
      abfa3e56
  9. Jan 23, 2014
    • Chandler Carruth's avatar
      [LPM] Make LoopSimplify no longer a LoopPass and instead both a utility · aa7fa5e4
      Chandler Carruth authored
      function and a FunctionPass.
      
      This has many benefits. The motivating use case was to be able to
      compute function analysis passes *after* running LoopSimplify (to avoid
      invalidating them) and then to run other passes which require
      LoopSimplify. Specifically passes like unrolling and vectorization are
      critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
      that they can be profile aware. For the LoopVectorize pass the only
      things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
      and LCSSA is next on my list.
      
      There are also a bunch of other benefits of doing this:
      - It is now very feasible to make more passes *preserve* LoopSimplify
        because they can simply run it after changing a loop. Because
        subsequent passes can assume LoopSimplify is preserved we can reduce
        the runs of this pass to the times when we actually mutate a loop
        structure.
      - The new pass manager should be able to more easily support loop passes
        factored in this way.
      - We can at long, long last observe that LoopSimplify is preserved
        across SCEV. This *halves* the number of times we run LoopSimplify!!!
      
      Now, getting here wasn't trivial. First off, the interfaces used by
      LoopSimplify are all over the map regarding how analyses are updated. We
      end up with weird "pass" parameters as a consequence. I'll try to clean
      at least some of this up later -- I'll have to have it all clean for the
      new pass manager.
      
      Next up I discovered a really frustrating bug. LoopUnroll *claims* to
      preserve LoopSimplify. That's actually a lie. But the way the
      LoopPassManager ends up running the passes, it always ran LoopSimplify
      on the unrolled-into loop, rectifying this oversight before any
      verification could kick in and point out that in fact nothing was
      preserved. So I've added code to the unroller to *actually* simplify the
      surrounding loop when it succeeds at unrolling.
      
      The only functional change in the test suite is that we now catch a case
      that was previously missed because SCEV and other loop transforms see
      their containing loops as simplified and thus don't miss some
      opportunities. One test case has been converted to check that we catch
      this case rather than checking that we miss it but at least don't get
      the wrong answer.
      
      Note that I have #if-ed out all of the verification logic in
      LoopSimplify! This is a temporary workaround while extracting these bits
      from the LoopPassManager. Currently, there is no way to have a pass in
      the LoopPassManager which preserves LoopSimplify along with one which
      does not. The LPM will try to verify on each loop in the nest that
      LoopSimplify holds but the now-Function-pass cannot distinguish what
      loop is being verified and so must try to verify all of them. The
      innermost loop is clearly no longer simplified as there is a pass which
      didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
      LoopVectorize and some other fixes) I'll be able to re-enable this check
      and catch any places where we are still failing to preserve
      LoopSimplify. If this causes problems I can back this out and try to
      commit *all* of this at once, but so far this seems to work and allow
      much more incremental progress.
      
      llvm-svn: 199884
      aa7fa5e4
  10. Jan 22, 2014
  11. Jan 19, 2014
    • Chandler Carruth's avatar
      Fix a really nasty SROA bug with how we handled out-of-bounds memcpy · 1bf38c6a
      Chandler Carruth authored
      intrinsics.
      
      Reported on the list by Evan with a couple of attempts to fix, but it
      took a while to dig down to the root cause. There are two overlapping
      bugs here, both centering around the circumstance of discovering
      a memcpy operand which is known to be completely outside the bounds of
      the alloca.
      
      First, we need to kill the *other* side of the memcpy if it was added to
      this alloca. Otherwise we'll factor it into our slicing and try to
      rewrite it even though we know for a fact that it is dead. This is made
      more tricky because we can visit the sides in either order. So we have
      to both kill the other side and skip instructions marked as dead. The
      latter really should be goodness in every case, but here is a matter of
      correctness.
      
      Second, we need to actually remove the *uses* of the alloca by the
      memcpy when queuing it for later deletion. Otherwise it may still be
      using the alloca when we go to promote it (if the rewrite re-uses the
      existing alloca instruction). Do this by factoring out the
      use-clobbering used when nixing a Phi argument and re-using it
      across the operands of a to-be-deleted instruction.
      
      llvm-svn: 199590
      1bf38c6a
  12. Jan 16, 2014
    • Quentin Colombet's avatar
      [opt][PassInfo] Allow opt to run passes that need target machine. · dc0b2ea2
      Quentin Colombet authored
      When registering a pass, a pass can now specify a second constructor that
      takes a pointer to TargetMachine as its argument.
      The PassInfo class has been updated to reflect that possibility.
      If such a constructor exists opt will use it instead of the default constructor
      when instantiating the pass.
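
The selection between the two constructors can be sketched as below. Everything here is a hypothetical stand-in written for illustration: the `TargetMachine`, `Pass`, `PassInfo`, and `PreparePass` types only share names or roles with the real classes and do not reproduce their interfaces.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these are the real LLVM classes.
struct TargetMachine {
  std::string triple;
};

struct Pass {
  virtual ~Pass() = default;
  virtual std::string describe() const = 0;
};

// Registration record holding a default constructor and, optionally, a
// second one that receives a TargetMachine pointer.
struct PassInfo {
  std::function<std::unique_ptr<Pass>()> normalCtor;
  std::function<std::unique_ptr<Pass>(TargetMachine *)> targetMachineCtor;

  // Prefer the TargetMachine-aware constructor when one was registered.
  std::unique_ptr<Pass> create(TargetMachine *tm) const {
    if (targetMachineCtor)
      return targetMachineCtor(tm);
    return normalCtor();
  }
};

// Example pass that can be built with or without a target.
struct PreparePass : Pass {
  TargetMachine *tm;
  explicit PreparePass(TargetMachine *t = nullptr) : tm(t) {}
  std::string describe() const override {
    return tm ? "prepare for " + tm->triple
              : std::string("prepare (no target)");
  }
};
```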
      
      Since such IR passes are supposed to be rare, no specific support has been
      added to this commit to allow an easy registration of such a pass.
      In other words, for such pass, the initialization function has to be
      hand-written (see CodeGenPrepare for instance).
      
      Now, codegenprepare can be tested using opt:
      opt -codegenprepare -mtriple=mytriple input.ll
      
      llvm-svn: 199430
      dc0b2ea2
  13. Jan 13, 2014
    • Chandler Carruth's avatar
      [PM] Split DominatorTree into a concrete analysis result object which · 73523021
      Chandler Carruth authored
      can be used by both the new pass manager and the old.
      
      This removes it from any of the virtual mess of the pass interfaces and
      lets it derive cleanly from the DominatorTreeBase<> template. In turn,
      tons of boilerplate interface can be nuked and it turns into a very
      straightforward extension of the base DominatorTree interface.
      
      The old analysis pass is now a simple wrapper. The names and style of
      this split should match the split between CallGraph and
      CallGraphWrapperPass. All of the users of DominatorTree have been
      updated to match using many of the same tricks as with CallGraph. The
      goal is that the common type remains the resulting DominatorTree rather
      than the pass. This will make subsequent work toward the new pass
      manager significantly easier.
      
      Also in numerous places things became cleaner because I switched from
      re-running the pass (!!! midway through some other pass's run !!!) to
      directly recomputing the domtree.
      
      llvm-svn: 199104
      73523021
    • Chandler Carruth's avatar
      [PM] Pull the generic graph algorithms and data structures for dominator · e509db41
      Chandler Carruth authored
      trees into the Support library.
      
      These are all expressed in terms of the generic GraphTraits and CFG,
      with no reliance on any concrete IR types. Putting them in support
      clarifies that and makes the fact that the static analyzer in Clang uses
      them much more sane. When moving the Dominators.h file into the IR
      library I claimed that this was the right home for it but not something
      I planned to work on. Oops.
      
      So why am I doing this? It happens to be one step toward breaking the
      requirement that IR verification can only be performed from inside of
      a pass context, which completely blocks the implementation of
      verification for the new pass manager infrastructure. Fixing it will
      also allow removing the concept of the "preverify" step (WTF???) and
      allow the verifier to cleanly flag functions which fail verification in
      a way that precludes even computing dominance information. Currently,
      that results in a fatal error even when you ask the verifier to not
      fatally error. It's awesome like that.
      
      The yak shaving will continue...
      
      llvm-svn: 199095
      e509db41
    • Chandler Carruth's avatar
      [cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c
      Chandler Carruth authored
      directory. These passes are already defined in the IR library, and it
      doesn't make any sense to have the headers in Analysis.
      
      Long term, I think there is going to be a much better way to divide
      these matters. The dominators code should be fully separated into the
      abstract graph algorithm and have that put in Support where it becomes
      obvious that even Clang's CFGBlocks can use it. Then the verifier can
      manually construct dominance information from the Support-driven
      interface while the Analysis library can provide a pass which
      caches, reconstructs, and supports a nice update API.
      
      But those are very long term, and so I don't want to leave the really
      confusing structure until that day arrives.
      
      llvm-svn: 199082
      5ad5f15c
    • Chandler Carruth's avatar
      Re-sort #include lines again, prior to moving headers around. · 07baed53
      Chandler Carruth authored
      llvm-svn: 199080
      07baed53
  14. Jan 11, 2014
    • Diego Novillo's avatar
      Extend and simplify the sample profile input file. · 9518b63b
      Diego Novillo authored
      1- Use the line_iterator class to read profile files.
      
      2- Allow comments in profile file. Lines starting with '#'
         are completely ignored while reading the profile.
      
      3- Add parsing support for discriminators and indirect call samples.
      
         Our external profiler can emit more profile information that we are
         currently not handling. This patch does not add new functionality to
         support this information, but it allows profile files to provide it.
      
         I will add actual support later on (for at least one of these
         features, I need support for DWARF discriminators in Clang).
      
         A sample line may contain the following additional information:
      
         Discriminator. This is used if the sampled program was compiled with
         DWARF discriminator support
         (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
         is currently only emitted by GCC and we just ignore it.
      
         Potential call targets and samples. If present, this line contains a
         call instruction. This models both direct and indirect calls. Each
         called target is listed together with the number of samples. For
         example,
      
                          130: 7  foo:3  bar:2  baz:7
      
         The above means that at relative line offset 130 there is a call
         instruction that calls one of foo(), bar() and baz(), with baz()
         being the most frequent call target.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2355
      
      4- Simplify format of profile input file.
      
         This implements earlier suggestions to simplify the format of the
         sample profile file. The symbol table is not necessary and function
         profiles do not need to know the number of samples in advance.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2419
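
As an illustration of the body-line shape shown in item 3 (`130: 7  foo:3  bar:2  baz:7`) and the comment rule from item 2, here is a minimal parsing sketch. It is not the parser from this patch: `SampleLine` and `parseSampleLine` are hypothetical names, discriminators are not handled, and the reading of the fields as "offset: samples callee:count ..." is taken only from the example above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one body line of a sample profile.
struct SampleLine {
  unsigned offset = 0;  // line offset relative to the function header
  unsigned samples = 0; // samples collected at that line
  std::vector<std::pair<std::string, unsigned>> targets; // callee, count
};

// Parses lines shaped like "130: 7  foo:3  bar:2". Returns false for
// comment lines (leading '#') and for lines that do not match the shape.
bool parseSampleLine(const std::string &line, SampleLine &out) {
  if (line.empty() || line[0] == '#')
    return false;
  std::istringstream in(line);
  char colon = 0;
  SampleLine parsed;
  if (!(in >> parsed.offset >> colon >> parsed.samples) || colon != ':')
    return false;
  std::string token;
  while (in >> token) {
    // Each remaining token is expected to look like "name:count".
    std::size_t pos = token.rfind(':');
    if (pos == std::string::npos || pos == 0 || pos + 1 == token.size())
      return false;
    unsigned count = 0;
    std::istringstream num(token.substr(pos + 1));
    if (!(num >> count))
      return false;
    parsed.targets.emplace_back(token.substr(0, pos), count);
  }
  out = parsed;
  return true;
}
```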
      
      llvm-svn: 198973
      9518b63b
    • Diego Novillo's avatar
      Propagation of profile samples through the CFG. · 0accb3d2
      Diego Novillo authored
      This adds a propagation heuristic to convert instruction samples
      into branch weights. It implements a similar heuristic to the one
      implemented by Dehao Chen on GCC.
      
      The propagation proceeds in 3 phases:
      
      1- Assignment of block weights. All the basic blocks in the function
         are initially assigned the same weight as their most frequently
         executed instruction.
      
      2- Creation of equivalence classes. Since samples may be missing from
         blocks, we can fill in the gaps by setting the weights of all the
         blocks in the same equivalence class to the same weight. To compute
         the concept of equivalence, we use dominance and loop information.
         Two blocks B1 and B2 are in the same equivalence class if B1
         dominates B2, B2 post-dominates B1 and both are in the same loop.
      
      3- Propagation of block weights into edges. This uses a simple
         propagation heuristic. The following rules are applied to every
         block B in the CFG:
      
         - If B has a single predecessor/successor, then the weight
           of that edge is the weight of the block.
      
         - If all the edges are known except one, and the weight of the
           block is already known, the weight of the unknown edge will
           be the weight of the block minus the sum of all the known
           edges. If the sum of all the known edges is larger than B's weight,
           we set the unknown edge weight to zero.
      
         - If there is a self-referential edge, and the weight of the block is
           known, the weight for that edge is set to the weight of the block
           minus the weight of the other incoming edges to that block (if
           known).
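
Two of the rules above lend themselves to a short sketch: the phase 1 block weight (the largest instruction sample in the block) and the phase 3 rule for a single unknown edge (block weight minus the known edges, clamped at zero). The function names are hypothetical and the code is a simplified model, not the pass itself.

```cpp
#include <cstdint>
#include <vector>

// Phase 1 (sketch): a block's initial weight is the sample count of its
// most frequently executed instruction; an empty block gets zero.
std::uint64_t blockWeightFromSamples(const std::vector<std::uint64_t> &samples) {
  std::uint64_t best = 0;
  for (std::uint64_t s : samples)
    if (s > best)
      best = s;
  return best;
}

// Phase 3 (sketch): with the block weight and all other edge weights known,
// the one unknown edge gets the remainder, or zero if the known edges
// already exceed the block weight.
std::uint64_t inferUnknownEdge(std::uint64_t blockWeight,
                               const std::vector<std::uint64_t> &knownEdges) {
  std::uint64_t known = 0;
  for (std::uint64_t w : knownEdges)
    known += w;
  return known > blockWeight ? 0 : blockWeight - known;
}
```

Passing an empty list of known edges covers the single predecessor or successor case: the lone edge simply inherits the block weight.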
      
      Since this propagation is not guaranteed to terminate for every CFG, we
      only allow it to proceed for a limited number of iterations (controlled
      by -sample-profile-max-propagate-iterations). It currently uses the same
      GCC default of 100.
      
      Before propagation starts, the pass builds (for each block) a list of
      unique predecessors and successors. This is necessary to handle
      identical edges in multiway branches. Since we visit all blocks and all
      edges of the CFG, it is cleaner to build these lists once at the start
      of the pass.
      
      Finally, the patch fixes the computation of relative line locations.
      The profiler emits lines relative to the function header. To discover
      it, we traverse the compilation unit looking for the subprogram
      corresponding to the function. The line number of that subprogram is the
      line where the function begins. That becomes line zero for all the
      relative locations.
      
      llvm-svn: 198972
      0accb3d2
  15. Jan 09, 2014
    • Chandler Carruth's avatar
      Put the functionality for printing a value to a raw_ostream as an · d48cdbf0
      Chandler Carruth authored
      operand into the Value interface just like the core print method is.
      That gives a more consistent organization to the IR printing interfaces
      -- they are all attached to the IR objects themselves. Also, update all
      the users.
      
      This removes the 'Writer.h' header which contained only a single function
      declaration.
      
      llvm-svn: 198836
      d48cdbf0
  16. Jan 07, 2014
    • Chandler Carruth's avatar
      Move the LLVM IR asm writer header files into the IR directory, as they · 9aca918d
      Chandler Carruth authored
      are part of the core IR library in order to support dumping and other
      basic functionality.
      
      Rename the 'Assembly' include directory to 'AsmParser' to match the
      library name and the only functionality left there -- printing has been
      in the core IR library for quite some time.
      
      Update all of the #includes to match.
      
      All of this started because I wanted to have the layering in good shape
      before I started adding support for printing LLVM IR using the new pass
      infrastructure, and commandline support for the new pass infrastructure.
      
      llvm-svn: 198688
      9aca918d
    • Chandler Carruth's avatar
      Re-sort all of the includes with ./utils/sort_includes.py so that · 8a8cd2ba
      Chandler Carruth authored
      subsequent changes are easier to review. About to fix some layering
      issues, and wanted to separate out the necessary churn.
      
      Also comment and sink the include of "Windows.h" in three .inc files to
      match the usage in Memory.inc.
      
      llvm-svn: 198685
      8a8cd2ba
    • Andrew Trick's avatar
      Reapply r198654 "indvars: sink truncates outside the loop." · e4a18605
      Andrew Trick authored
      This doesn't seem to have actually broken anything. It was paranoia
      on my part. Trying again now that bots are more stable.
      
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198678
      e4a18605
    • Andrew Trick's avatar
      Revert "indvars: sink truncates outside the loop." · 3c0ed089
      Andrew Trick authored
      This reverts commit r198654.
      
      One of the bots reported a SciMark failure.
      
      llvm-svn: 198659
      3c0ed089
    • Andrew Trick's avatar
      indvars: sink truncates outside the loop. · 0b8e3b2c
      Andrew Trick authored
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198654
      0b8e3b2c
    • Andrew Trick's avatar
      80 col. comment. · b70d9780
      Andrew Trick authored
      llvm-svn: 198653
      b70d9780
  17. Jan 04, 2014
    • Alp Toker's avatar
      Add missed cleanup from r198456 · f929e09b
      Alp Toker authored
      All other uses of this macro in LLVM/clang have been moved to the function
      definition so follow suit (and the usage advice) here too for consistency.
      
      llvm-svn: 198516
      f929e09b
  18. Jan 03, 2014
    • Nico Weber's avatar
      Add a LLVM_DUMP_METHOD macro. · 7408c706
      Nico Weber authored
      The motivation is to mark dump methods as used in debug builds so that they can
      be called from lldb, but to not do so in release builds so that they can be
      dead-stripped.
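
The idea can be sketched with a stand-in macro. `MY_DUMP_METHOD` below is hypothetical and is not the LLVM definition; it assumes a GCC- or Clang-compatible compiler for the `used` attribute and treats "debug build" as "NDEBUG is not defined".

```cpp
#include <string>

// Hypothetical macro, not the LLVM definition: keep the method in debug
// builds (NDEBUG undefined) so a debugger can call it, and leave release
// builds free to strip it. The attribute is GCC/Clang-specific.
#if !defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define MY_DUMP_METHOD __attribute__((used))
#else
#define MY_DUMP_METHOD
#endif

struct Node {
  int id;
  // Intended to be invoked by hand from a debugger session.
  std::string dump() const;
};

// The macro is placed on the definition rather than the declaration.
MY_DUMP_METHOD std::string Node::dump() const {
  return "Node#" + std::to_string(id);
}
```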
      
      There's lots of potential follow-up work suggested in the thread
      "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
      but everyone seems to agree on this subset.
      
      Macro name chosen by fair coin toss.
      
      llvm-svn: 198456
      7408c706
    • David Peixotto's avatar
      Fix loop rerolling pass failure with non-constant loop lower bound · ea9ba446
      David Peixotto authored
      The loop rerolling pass was failing with an assertion failure from a
      failed cast on loops like this:
      
        void foo(int *A, int *B, int m, int n) {
          for (int i = m; i < n; i+=4) {
            A[i+0] = B[i+0] * 4;
            A[i+1] = B[i+1] * 4;
            A[i+2] = B[i+2] * 4;
            A[i+3] = B[i+3] * 4;
          }
        }
      
      The code was casting the SCEV-expanded code for the new
      induction variable to a phi-node. When the loop had a non-constant
      lower bound, the SCEV expander would end the code expansion with an
      add instead of a phi node and the cast would fail.
      
      It looks like the cast to a phi node was only needed to get the
      induction variable value coming from the backedge to compute the end
      of loop condition. This patch changes the loop reroller to compare
      the induction variable to the number of times the backedge is taken
      instead of the iteration count of the loop. In other words, we stop
      the loop when the current value of the induction variable ==
      IterationCount-1. Previously, the comparison was comparing the
      induction variable value from the next iteration == IterationCount.
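
The two exit tests can be compared on a hand-rerolled version of the loop above. This is a sketch at the source level, not the IR the pass produces: the function names are hypothetical, the induction variable counts from zero, and `iterationCount` is assumed to be the (positive) trip count of the rerolled loop, i.e. four times the trip count of the original.

```cpp
#include <vector>

// Rerolled form with the exit test written against the current induction
// value: stop when iv == iterationCount - 1 (the backedge-taken count).
void rerolledCurrentValue(std::vector<int> &A, const std::vector<int> &B,
                          int m, int iterationCount) {
  for (int iv = 0;; ++iv) {
    A[m + iv] = B[m + iv] * 4;
    if (iv == iterationCount - 1)
      break;
  }
}

// Same loop with the earlier style of test: the value the induction
// variable will have on the next iteration is compared to iterationCount.
void rerolledNextValue(std::vector<int> &A, const std::vector<int> &B,
                       int m, int iterationCount) {
  for (int iv = 0;;) {
    A[m + iv] = B[m + iv] * 4;
    int next = iv + 1;
    if (next == iterationCount)
      break;
    iv = next;
  }
}
```

Both forms execute the body the same number of times; the first only needs the current value of the induction variable, so it does not depend on finding the value that flows in along the backedge.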
      
      This problem only seems to occur on 32-bit targets. For some reason,
      the loop is not rerolled on 64-bit targets.
      
      PR18290
      
      llvm-svn: 198425
      ea9ba446
  19. Jan 02, 2014
    • Hal Finkel's avatar
      Disable compare sinking in CodeGenPrepare when multiple condition registers are available · decb024c
      Hal Finkel authored
      As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively
      sinks compares to reduce pressure on the condition register(s), for targets
      such as PowerPC with multiple condition registers, this may not be the right
      thing to do. This adds a HasMultipleConditionRegisters boolean to TLI, and
      CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is
      true.
      
      This functionality will be used by the PowerPC backend in an upcoming commit.
      Especially when the PowerPC backend starts tracking individual condition
      register bits as separate allocatable entities (which will happen in this
      upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
      significantly suboptimal.
      
      llvm-svn: 198354
      decb024c