  1. Jan 13, 2014
  2. Jan 12, 2014
      Switch-to-lookup tables: Don't require a result for the default · ac114a3c
      Hans Wennborg authored
      case when the lookup table doesn't have any holes.
      
      This means we can build a lookup table for switches like this:
      
        switch (x) {
          case 0: return 1;
          case 1: return 2;
          case 2: return 3;
          case 3: return 4;
          default: exit(1);
        }
      
      The default case doesn't yield a constant result here, but that doesn't matter,
      since a default result is only necessary for filling holes in the lookup table,
      and this table doesn't have any holes.
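
As a rough illustration, the switch above behaves like the hand-written table lookup below. This is only a sketch: the names `kTable` and `lookup` are hypothetical, and the real transformation operates on LLVM IR rather than on source code.

```cpp
#include <cstdlib>

// Hypothetical hand-written equivalent of the switch above.
// Cases 0..3 are contiguous, so the table has no holes and no
// default *value* is needed; the default is only a control-flow path.
static const int kTable[4] = {1, 2, 3, 4};

int lookup(unsigned x) {
  if (x < 4)
    return kTable[x];  // covered cases: a single indexed load
  std::exit(1);        // default case: no constant result required
}
```

If the cases were 0, 1 and 3 instead, index 2 would be a hole, and the table could only be built if the default case produced a constant to store there.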
      
      This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
      off the resulting clang binary.
      
      llvm-svn: 199025
      ac114a3c
  3. Jan 11, 2014
      LoopVectorizer: Enable strided memory accesses versioning per default · 66c742ae
      Arnold Schwaighofer authored
      I saw no compile or execution time regressions on x86_64 -mavx -O3.
      
      radar://13075509
      
      llvm-svn: 199015
      66c742ae
      LoopVectorize.cpp: Appease MSC16. · 41c409ce
      NAKAMURA Takumi authored
      Excuse me, I hope msc16 builders would be fine till its end day.
      Introduce nullptr then. ;)
      
      llvm-svn: 199001
      41c409ce
      Extend and simplify the sample profile input file. · 9518b63b
      Diego Novillo authored
      1- Use the line_iterator class to read profile files.
      
      2- Allow comments in profile file. Lines starting with '#'
         are completely ignored while reading the profile.
      
      3- Add parsing support for discriminators and indirect call samples.
      
         Our external profiler can emit more profile information that we are
         currently not handling. This patch does not add new functionality to
         support this information, but it allows profile files to provide it.
      
         I will add actual support later on (for at least one of these
         features, I need support for DWARF discriminators in Clang).
      
         A sample line may contain the following additional information:
      
         Discriminator. This is used if the sampled program was compiled with
         DWARF discriminator support
         (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
         is currently only emitted by GCC and we just ignore it.
      
         Potential call targets and samples. If present, this line contains a
         call instruction. This models both direct and indirect calls. Each
         called target is listed together with the number of samples. For
         example,
      
                          130: 7  foo:3  bar:2  baz:7
      
         The above means that at relative line offset 130 there is a call
         instruction that calls one of foo(), bar() and baz(), with baz()
         being the most frequent call target.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2355
      
      4- Simplify format of profile input file.
      
         This implements earlier suggestions to simplify the format of the
         sample profile file. The symbol table is not necessary and function
         profiles do not need to know the number of samples in advance.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2419
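
The sample line shown in item 3 (`130: 7  foo:3  bar:2  baz:7`) and the comment rule from item 2 could be read along these lines. This is a minimal sketch, not the code from the patch: `SampleLine` and `parseSampleLine` are hypothetical names, it uses standard streams instead of `line_iterator`, and discriminators are left out because the message does not show their syntax.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical record for one body line of a profile.
struct SampleLine {
  unsigned offset = 0;        // line offset relative to the function header
  unsigned long samples = 0;  // samples collected at that line
  std::map<std::string, unsigned long> targets;  // call target -> samples
};

// Returns false for comment lines ('#' first) and for malformed lines.
bool parseSampleLine(const std::string &line, SampleLine &out) {
  if (line.empty() || line[0] == '#')
    return false;
  std::istringstream in(line);
  char colon = 0;
  if (!(in >> out.offset >> colon) || colon != ':')
    return false;
  if (!(in >> out.samples))
    return false;
  std::string tok;
  while (in >> tok) {  // optional "name:count" pairs
    std::size_t sep = tok.rfind(':');
    if (sep == std::string::npos || sep == 0 || sep + 1 == tok.size())
      return false;
    std::string count = tok.substr(sep + 1);
    if (count.find_first_not_of("0123456789") != std::string::npos)
      return false;
    out.targets[tok.substr(0, sep)] = std::stoul(count);
  }
  return true;
}
```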
      
      llvm-svn: 198973
      9518b63b
      Propagation of profile samples through the CFG. · 0accb3d2
      Diego Novillo authored
      This adds a propagation heuristic to convert instruction samples
      into branch weights. It implements a similar heuristic to the one
      implemented by Dehao Chen on GCC.
      
      The propagation proceeds in 3 phases:
      
      1- Assignment of block weights. All the basic blocks in the function
         are initially assigned the same weight as their most frequently
         executed instruction.
      
      2- Creation of equivalence classes. Since samples may be missing from
         blocks, we can fill in the gaps by setting the weights of all the
         blocks in the same equivalence class to the same weight. To compute
         the concept of equivalence, we use dominance and loop information.
         Two blocks B1 and B2 are in the same equivalence class if B1
         dominates B2, B2 post-dominates B1 and both are in the same loop.
      
      3- Propagation of block weights into edges. This uses a simple
         propagation heuristic. The following rules are applied to every
         block B in the CFG:
      
         - If B has a single predecessor/successor, then the weight
           of that edge is the weight of the block.
      
         - If all the edges are known except one, and the weight of the
           block is already known, the weight of the unknown edge will
           be the weight of the block minus the sum of all the known
           edges. If the sum of all the known edges is larger than B's weight,
           we set the unknown edge weight to zero.
      
         - If there is a self-referential edge, and the weight of the block is
           known, the weight for that edge is set to the weight of the block
           minus the weight of the other incoming edges to that block (if
           known).
      
      Since this propagation is not guaranteed to terminate for every CFG, we
      only allow it to proceed for a limited number of iterations (controlled
      by -sample-profile-max-propagate-iterations). It currently uses the same
      GCC default of 100.
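
The second rule of phase 3 can be sketched as a small function. The name `inferUnknownEdge` and its signature are hypothetical, and the sketch only covers the arithmetic for one block, not the iteration over the CFG.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: given a block weight and the weights of all the
// already-known edges on one side of the block, infer the single
// unknown edge. The result is clamped to zero when the known edges
// already add up to more than the block weight.
uint64_t inferUnknownEdge(uint64_t blockWeight,
                          const std::vector<uint64_t> &knownEdges) {
  uint64_t sum = 0;
  for (uint64_t w : knownEdges)
    sum += w;
  return sum > blockWeight ? 0 : blockWeight - sum;
}
```

With no known edges the function returns the block weight, which matches the first rule: a single predecessor or successor edge gets the weight of the block.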
      
      Before propagation starts, the pass builds (for each block) a list of
      unique predecessors and successors. This is necessary to handle
      identical edges in multiway branches. Since we visit all blocks and all
      edges of the CFG, it is cleaner to build these lists once at the start
      of the pass.
      
      Finally, the patch fixes the computation of relative line locations.
      The profiler emits lines relative to the function header. To discover
      it, we traverse the compilation unit looking for the subprogram
      corresponding to the function. The line number of that subprogram is the
      line where the function begins. That becomes line zero for all the
      relative locations.
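
In code, the relative location is a plain subtraction. A minimal sketch, with the hypothetical name `relativeLine`; how the header line is found from the debug info is not shown.

```cpp
#include <cassert>

// Hypothetical helper: `headerLine` is the line of the subprogram that
// corresponds to the function, which becomes line zero.
unsigned relativeLine(unsigned instLine, unsigned headerLine) {
  assert(instLine >= headerLine && "instruction precedes function header");
  return instLine - headerLine;
}
```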
      
      llvm-svn: 198972
      0accb3d2
  4. Jan 10, 2014
  5. Jan 09, 2014
      Put the functionality for printing a value to a raw_ostream as an · d48cdbf0
      Chandler Carruth authored
      operand into the Value interface just like the core print method is.
      That gives a more consistent organization to the IR printing interfaces
      -- they are all attached to the IR objects themselves. Also, update all
      the users.
      
      This removes the 'Writer.h' header which contained only a single function
      declaration.
      
      llvm-svn: 198836
      d48cdbf0
  6. Jan 08, 2014
  7. Jan 07, 2014
      Move the LLVM IR asm writer header files into the IR directory, as they · 9aca918d
      Chandler Carruth authored
      are part of the core IR library in order to support dumping and other
      basic functionality.
      
      Rename the 'Assembly' include directory to 'AsmParser' to match the
      library name and the only functionality left there -- printing has been
      in the core IR library for quite some time.
      
      Update all of the #includes to match.
      
      All of this started because I wanted to have the layering in good shape
      before I started adding support for printing LLVM IR using the new pass
      infrastructure, and commandline support for the new pass infrastructure.
      
      llvm-svn: 198688
      9aca918d
      Re-sort all of the includes with ./utils/sort_includes.py so that · 8a8cd2ba
      Chandler Carruth authored
      subsequent changes are easier to review. About to fix some layering
      issues, and wanted to separate out the necessary churn.
      
      Also comment and sink the include of "Windows.h" in three .inc files to
      match the usage in Memory.inc.
      
      llvm-svn: 198685
      8a8cd2ba
      Reapply r198654 "indvars: sink truncates outside the loop." · e4a18605
      Andrew Trick authored
      This doesn't seem to have actually broken anything. It was paranoia
      on my part. Trying again now that bots are more stable.
      
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198678
      e4a18605
      Revert "indvars: sink truncates outside the loop." · 3c0ed089
      Andrew Trick authored
      This reverts commit r198654.
      
      One of the bots reported a SciMark failure.
      
      llvm-svn: 198659
      3c0ed089
      indvars: sink truncates outside the loop. · 0b8e3b2c
      Andrew Trick authored
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198654
      0b8e3b2c
      80 col. comment. · b70d9780
      Andrew Trick authored
      llvm-svn: 198653
      b70d9780
  8. Jan 06, 2014
  9. Jan 04, 2014
      Add missed cleanup from r198456 · f929e09b
      Alp Toker authored
      All other uses of this macro in LLVM/clang have been moved to the function
      definition so follow suit (and the usage advice) here too for consistency.
      
      llvm-svn: 198516
      f929e09b
      Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things." · 5e9f3265
      Alp Toker authored
      This commit was the source of crasher PR18384:
      
      While deleting: label %for.cond127
      An asserting value handle still pointed to this value!
      UNREACHABLE executed at llvm/lib/IR/Value.cpp:671!
      
      Reverting to get the builders green, feel free to re-land after fixing up.
      (Renato has a handy isolated repro if you need it.)
      
      This reverts commit r198478.
      
      llvm-svn: 198503
      5e9f3265
      Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things. · aceac974
      Andrew Trick authored
      getSCEV for an ashr instruction creates an intermediate zext
      expression when it truncates its operand.
      
      The operand is initially inside the loop, so the narrow zext
      expression has a non-loop-invariant loop disposition.
      
      LoopSimplify then runs on an outer loop, hoists the ashr operand, and
      properly invalidates the SCEVs that are mapped to that value.
      
      The SCEV expression for the ashr is now an AddRec with the hoisted
      value as the now loop-invariant start value.
      
      The LoopDisposition of this wide value was properly invalidated during
      LoopSimplify.
      
      However, if we later get the ashr SCEV again, we again try to create
      the intermediate zext expression. We get the same SCEV that we did
      earlier, and it is still cached because it was never mapped to a
      Value. When we try to create a new AddRec we abort because we're using
      the old non-loop-invariant LoopDisposition.
      
      I don't have a solution for this other than to clear LoopDisposition
      when LoopSimplify hoists things.
      
      I think the long-term strategy should be to perform LoopSimplify on
      all loops before computing SCEV and before running any loop opts on
      individual loops. It's possible we may want to rerun LoopSimplify on
      individual loops, but it should rarely do anything, so rarely require
      invalidating SCEV.
      
      llvm-svn: 198478
      aceac974
  10. Jan 03, 2014
      Add a LLVM_DUMP_METHOD macro. · 7408c706
      Nico Weber authored
      The motivation is to mark dump methods as used in debug builds so that they can
      be called from lldb, but to not do so in release builds so that they can be
      dead-stripped.
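
The idea can be sketched with a stand-in macro. `MY_DUMP_METHOD` and `Node` are hypothetical, and this is not LLVM's actual definition of `LLVM_DUMP_METHOD`; it assumes a GCC or Clang compiler for the attribute syntax.

```cpp
#include <cstdio>

// Hypothetical stand-in. In debug builds (NDEBUG not defined) the
// method is marked "used" so the linker keeps it even without callers,
// which lets a debugger call it. In release builds the macro expands
// to nothing, so an unreferenced dump method can be dead-stripped.
#if !defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define MY_DUMP_METHOD __attribute__((used, noinline))
#else
#define MY_DUMP_METHOD
#endif

struct Node {
  int value;
  void dump() const;
};

// The macro is placed on the function definition.
MY_DUMP_METHOD void Node::dump() const {
  std::fprintf(stderr, "Node(%d)\n", value);
}
```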
      
      There's lots of potential follow-up work suggested in the thread
      "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
      but everyone seems to agree on this subset.
      
      Macro name chosen by fair coin toss.
      
      llvm-svn: 198456
      7408c706
      Fix loop rerolling pass failure with non-constant loop lower bound · ea9ba446
      David Peixotto authored
      The loop rerolling pass was failing with an assertion failure from a
      failed cast on loops like this:
      
        void foo(int *A, int *B, int m, int n) {
          for (int i = m; i < n; i+=4) {
            A[i+0] = B[i+0] * 4;
            A[i+1] = B[i+1] * 4;
            A[i+2] = B[i+2] * 4;
            A[i+3] = B[i+3] * 4;
          }
        }
      
      The code was casting the SCEV-expanded code for the new
      induction variable to a phi-node. When the loop had a non-constant
      lower bound, the SCEV expander would end the code expansion with an
      add instead of a phi node and the cast would fail.
      
      It looks like the cast to a phi node was only needed to get the
      induction variable value coming from the backedge to compute the end
      of loop condition. This patch changes the loop reroller to compare
      the induction variable to the number of times the backedge is taken
      instead of the iteration count of the loop. In other words, we stop
      the loop when the current value of the induction variable ==
      IterationCount-1. Previously, the comparison was comparing the
      induction variable value from the next iteration == IterationCount.
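
At source level, the new exit test can be pictured as follows. This is a hand-written sketch, not the pass's output: `fooUnrolled`, `fooRerolled` and the variable names are hypothetical, and the pass itself works on LLVM IR and SCEV expressions.

```cpp
// The loop from the commit message, unchanged apart from const.
void fooUnrolled(int *A, const int *B, int m, int n) {
  for (int i = m; i < n; i += 4) {
    A[i + 0] = B[i + 0] * 4;
    A[i + 1] = B[i + 1] * 4;
    A[i + 2] = B[i + 2] * 4;
    A[i + 3] = B[i + 3] * 4;
  }
}

// Hand-rerolled version. The induction variable k starts at zero and
// the loop exits when the *current* value of k equals the number of
// times the backedge is taken (IterationCount - 1).
void fooRerolled(int *A, const int *B, int m, int n) {
  if (m >= n)
    return;
  int iterationCount = ((n - m + 3) / 4) * 4;  // 4 stores per old iteration
  int backedgeTakenCount = iterationCount - 1;
  for (int k = 0;; ++k) {
    A[m + k] = B[m + k] * 4;
    if (k == backedgeTakenCount)
      break;
  }
}
```

Because the comparison uses the current value of `k`, nothing in the exit test needs the incremented value that flows around the backedge.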
      
      This problem only seems to occur on 32-bit targets. For some reason,
      the loop is not rerolled on 64-bit targets.
      
      PR18290
      
      llvm-svn: 198425
      ea9ba446
  11. Jan 02, 2014
  12. Dec 30, 2013
  13. Dec 25, 2013
  14. Dec 24, 2013
      Add support to indvars for optimizing sadd.with.overflow. · 0ba77a07
      Andrew Trick authored
      Split sadd.with.overflow into add + sadd.with.overflow to allow
      analysis and optimization. This should ideally be done after
      InstCombine, which can perform code motion (eventually indvars should
      run after all canonical instcombines). We want ISEL to recombine the
      add and the check, at least on x86.
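
A source-level analogy of the split, assuming a GCC or Clang compiler that provides `__builtin_sadd_overflow`. The names `AddResult`, `combined` and `split` are hypothetical; the actual change rewrites the `llvm.sadd.with.overflow` intrinsic in IR.

```cpp
struct AddResult {
  int sum;
  bool overflow;
};

// Combined form: one operation yields both the sum and the flag.
AddResult combined(int a, int b) {
  AddResult r;
  r.overflow = __builtin_sadd_overflow(a, b, &r.sum);
  return r;
}

// Split form: the sum comes from a plain add that other analyses can
// reason about, and the overflow check is kept only for the flag.
AddResult split(int a, int b) {
  AddResult r;
  // Wrapping add via unsigned arithmetic; the conversion back to int
  // is modular on GCC and Clang.
  r.sum = static_cast<int>(static_cast<unsigned>(a) +
                           static_cast<unsigned>(b));
  int ignored;
  r.overflow = __builtin_sadd_overflow(a, b, &ignored);
  return r;
}
```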
      
      This is currently under an option for reducing live induction
      variables: -liv-reduce. The next step is reducing liveness of IVs that
      are live out of the overflow check paths. Once the related
      optimizations are fully developed, reviewed and tested, I do expect
      this to become default.
      
      llvm-svn: 197926
      0ba77a07
  15. Dec 23, 2013
  16. Dec 20, 2013
  17. Dec 19, 2013
  18. Dec 17, 2013