  1. Jan 13, 2014
  2. Jan 11, 2014
    • Extend and simplify the sample profile input file. · 9518b63b
      Diego Novillo authored
      1- Use the line_iterator class to read profile files.
      
      2- Allow comments in profile file. Lines starting with '#'
         are completely ignored while reading the profile.
      
      3- Add parsing support for discriminators and indirect call samples.
      
         Our external profiler can emit more profile information than we
         currently handle. This patch does not add new functionality to
         support this information, but it allows profile files to provide it.
      
         I will add actual support later on (for at least one of these
         features, I need support for DWARF discriminators in Clang).
      
         A sample line may contain the following additional information:
      
         Discriminator. This is used if the sampled program was compiled with
         DWARF discriminator support
         (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
         is currently only emitted by GCC and we just ignore it.
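
         For example, a line annotated with a discriminator might look
         like this (shape hedged; the parser added here reads it as
         offset.discriminator: samples):

                          2.3: 120

         i.e., 120 samples were collected at relative line offset 2,
         discriminator 3.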
      
         Potential call targets and samples. If present, this line contains a
         call instruction. This models both direct and indirect calls. Each
         called target is listed together with the number of samples. For
         example,
      
                          130: 7  foo:3  bar:2  baz:7
      
         The above means that at relative line offset 130 there is a call
         instruction that calls one of foo(), bar(), and baz(), with baz()
         being the most frequent call target.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2355
      
      4- Simplify format of profile input file.
      
         This implements earlier suggestions to simplify the format of the
         sample profile file. The symbol table is not necessary and function
         profiles do not need to know the number of samples in advance.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2419
      
      llvm-svn: 198973
    • Propagation of profile samples through the CFG. · 0accb3d2
      Diego Novillo authored
      This adds a propagation heuristic to convert instruction samples
      into branch weights. It implements a similar heuristic to the one
      implemented by Dehao Chen on GCC.
      
      The propagation proceeds in 3 phases:
      
       1- Assignment of block weights. All the basic blocks in the function
          are initially assigned the same weight as their most frequently
          executed instruction.
      
       2- Creation of equivalence classes. Since samples may be missing from
          blocks, we can fill in the gaps by setting the weights of all the
          blocks in the same equivalence class to the same weight. To compute
          equivalence, we use dominance and loop information. Two blocks B1
          and B2 are in the same equivalence class if B1 dominates B2, B2
          post-dominates B1, and both are in the same loop (see the sketch
          after this list).
      
      3- Propagation of block weights into edges. This uses a simple
         propagation heuristic. The following rules are applied to every
         block B in the CFG:
      
         - If B has a single predecessor/successor, then the weight
           of that edge is the weight of the block.
      
         - If all the edges are known except one, and the weight of the
           block is already known, the weight of the unknown edge will
           be the weight of the block minus the sum of all the known
           edges. If the sum of all the known edges is larger than B's weight,
           we set the unknown edge weight to zero.
      
         - If there is a self-referential edge, and the weight of the block is
           known, the weight for that edge is set to the weight of the block
           minus the weight of the other incoming edges to that block (if
           known).
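
       A minimal sketch of the phase 2 equivalence test, assuming the
       standard DominatorTree, PostDominatorTree, and LoopInfo analyses
       (illustrative only, not the pass's actual code):

         #include "llvm/Analysis/LoopInfo.h"
         #include "llvm/Analysis/PostDominators.h"
         #include "llvm/IR/Dominators.h"

         // B1 and B2 are equivalent when B1 dominates B2, B2
         // post-dominates B1, and both belong to the same loop
         // (possibly no loop at all).
         static bool inSameEquivalenceClass(llvm::BasicBlock *B1,
                                            llvm::BasicBlock *B2,
                                            llvm::DominatorTree &DT,
                                            llvm::PostDominatorTree &PDT,
                                            llvm::LoopInfo &LI) {
           return DT.dominates(B1, B2) && PDT.dominates(B2, B1) &&
                  LI.getLoopFor(B1) == LI.getLoopFor(B2);
         }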
      
      Since this propagation is not guaranteed to finalize for every CFG, we
      only allow it to proceed for a limited number of iterations (controlled
      by -sample-profile-max-propagate-iterations). It currently uses the same
      GCC default of 100.
      
      Before propagation starts, the pass builds (for each block) a list of
      unique predecessors and successors. This is necessary to handle
      identical edges in multiway branches. Since we visit all blocks and all
      edges of the CFG, it is cleaner to build these lists once at the start
      of the pass.
      
       Finally, the patch fixes the computation of relative line locations.
       The profiler emits line numbers relative to the function header. To
       discover that header line, we traverse the compilation unit looking
       for the subprogram corresponding to the function. The line number of
       that subprogram is the line where the function begins, and it becomes
       line zero for all the relative locations.
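
       A minimal sketch of that lookup, assuming the DebugInfoFinder API
       (names and exact signatures vary across LLVM versions):

         #include "llvm/IR/DebugInfo.h"
         #include "llvm/IR/Function.h"

         // Return the source line where F is declared; sample offsets
         // are then computed relative to this line. Finder is assumed
         // to have already processed the module.
         static unsigned getFunctionHeaderLine(const llvm::Function &F,
                                               llvm::DebugInfoFinder &Finder) {
           for (llvm::DISubprogram *SP : Finder.subprograms())
             if (SP->describes(&F))
               return SP->getLine();
           return 0; // no debug info for F
         }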
      
      llvm-svn: 198972
  3. Jan 09, 2014
    • Put the functionality for printing a value to a raw_ostream as an · d48cdbf0
      Chandler Carruth authored
      operand into the Value interface just like the core print method is.
       That gives a more consistent organization to the IR printing interfaces
      -- they are all attached to the IR objects themselves. Also, update all
      the users.
      
      This removes the 'Writer.h' header which contained only a single function
      declaration.
      
      llvm-svn: 198836
  4. Jan 07, 2014
    • Move the LLVM IR asm writer header files into the IR directory, as they · 9aca918d
      Chandler Carruth authored
      are part of the core IR library in order to support dumping and other
      basic functionality.
      
       Rename the 'Assembly' include directory to 'AsmParser' to match the
       library name and the only functionality left there -- printing has been
       in the core IR library for quite some time.
      
      Update all of the #includes to match.
      
      All of this started because I wanted to have the layering in good shape
      before I started adding support for printing LLVM IR using the new pass
      infrastructure, and commandline support for the new pass infrastructure.
      
      llvm-svn: 198688
    • Re-sort all of the includes with ./utils/sort_includes.py so that · 8a8cd2ba
      Chandler Carruth authored
      subsequent changes are easier to review. About to fix some layering
      issues, and wanted to separate out the necessary churn.
      
      Also comment and sink the include of "Windows.h" in three .inc files to
      match the usage in Memory.inc.
      
      llvm-svn: 198685
    • Reapply r198654 "indvars: sink truncates outside the loop." · e4a18605
      Andrew Trick authored
      This doesn't seem to have actually broken anything. It was paranoia
      on my part. Trying again now that bots are more stable.
      
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198678
    • Revert "indvars: sink truncates outside the loop." · 3c0ed089
      Andrew Trick authored
      This reverts commit r198654.
      
      One of the bots reported a SciMark failure.
      
      llvm-svn: 198659
    • indvars: sink truncates outside the loop. · 0b8e3b2c
      Andrew Trick authored
      This is a follow up of the r198338 commit that added truncates for
      lcssa phi nodes. Sinking the truncates below the phis cleans up the
      loop and simplifies subsequent analysis within the indvars pass.
      
      llvm-svn: 198654
    • 80 col. comment. · b70d9780
      Andrew Trick authored
      llvm-svn: 198653
  5. Jan 04, 2014
    • Add missed cleanup from r198456 · f929e09b
      Alp Toker authored
       All other uses of this macro in LLVM/clang have been moved to the
       function definition, so follow suit (and the usage advice) here too
       for consistency.
      
      llvm-svn: 198516
  6. Jan 03, 2014
    • Add a LLVM_DUMP_METHOD macro. · 7408c706
      Nico Weber authored
      The motivation is to mark dump methods as used in debug builds so that they can
      be called from lldb, but to not do so in release builds so that they can be
      dead-stripped.
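
       A plausible sketch of the macro's shape, assuming the usual NDEBUG
       convention (the authoritative expansion lives in
       llvm/Support/Compiler.h):

         #ifndef NDEBUG
         #define LLVM_DUMP_METHOD LLVM_ATTRIBUTE_USED // callable from lldb
         #else
         #define LLVM_DUMP_METHOD // release builds: allow dead-stripping
         #endif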
      
       There's lots of potential follow-up work suggested in the thread
       "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on
       cfe-dev, but everyone seems to agree on this subset.
      
      Macro name chosen by fair coin toss.
      
      llvm-svn: 198456
    • Fix loop rerolling pass failure with non-constant loop lower bound · ea9ba446
      David Peixotto authored
      The loop rerolling pass was failing with an assertion failure from a
      failed cast on loops like this:
      
        void foo(int *A, int *B, int m, int n) {
          for (int i = m; i < n; i+=4) {
            A[i+0] = B[i+0] * 4;
            A[i+1] = B[i+1] * 4;
            A[i+2] = B[i+2] * 4;
            A[i+3] = B[i+3] * 4;
          }
        }
      
       The code was casting the SCEV-expanded code for the new
       induction variable to a phi node. When the loop had a non-constant
       lower bound, the SCEV expander would end the code expansion with an
       add instead of a phi node, and the cast would fail.
      
      It looks like the cast to a phi node was only needed to get the
      induction variable value coming from the backedge to compute the end
      of loop condition. This patch changes the loop reroller to compare
      the induction variable to the number of times the backedge is taken
      instead of the iteration count of the loop. In other words, we stop
      the loop when the current value of the induction variable ==
      IterationCount-1. Previously, the comparison was comparing the
      induction variable value from the next iteration == IterationCount.
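
       In terms of the loop above, the exit test changes roughly like this
       (conceptual sketch, not the emitted IR):

         // Before: needed the phi to read the IV's next (backedge) value.
         //   exit when iv.next == IterationCount
         // After: uses the current IV value, which also works when the
         // SCEV expander ends in a plain add for a non-constant lower
         // bound.
         //   exit when iv == IterationCount - 1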
      
      This problem only seems to occur on 32-bit targets. For some reason,
      the loop is not rerolled on 64-bit targets.
      
      PR18290
      
      llvm-svn: 198425
  7. Jan 02, 2014
    • Disable compare sinking in CodeGenPrepare when multiple condition registers are available · decb024c
      Hal Finkel authored
      As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively
      sinks compares to reduce pressure on the condition register(s), for targets
      such as PowerPC with multiple condition registers, this may not be the right
       thing to do. This adds a HasMultipleConditionRegisters boolean to TLI, and
      CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is
      true.
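
       A minimal sketch of the hook's shape (hedged; the flag defaults to
       false and targets such as PowerPC opt in from their constructors):

         class TargetLoweringBase {
         protected:
           // Set by targets whose condition values live in multiple
           // allocatable registers rather than a single flags register.
           bool HasMultipleConditionRegisters = false;

         public:
           bool hasMultipleConditionRegisters() const {
             return HasMultipleConditionRegisters;
           }
         };

       CodeGenPrepare can then skip the compare-sinking transformation
       whenever TLI->hasMultipleConditionRegisters() returns true.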
      
      This functionality will be used by the PowerPC backend in an upcoming commit.
      Especially when the PowerPC backend starts tracking individual condition
      register bits as separate allocatable entities (which will happen in this
      upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
       significantly suboptimal.
      
      llvm-svn: 198354
    • Andrew Trick · b6bc7830
    • indvars: insert truncate at loop boundary to avoid redundant IVs. · 020dd898
      Andrew Trick authored
      When widening an IV to remove s/zext, we generally try to eliminate
      the original narrow IV. However, LCSSA phi nodes outside the loop were
      still using the original IV. Clean this up more aggressively to avoid
      redundancy in generated code.
      
      llvm-svn: 198338
  8. Dec 24, 2013
    • Add support to indvars for optimizing sadd.with.overflow. · 0ba77a07
      Andrew Trick authored
      Split sadd.with.overflow into add + sadd.with.overflow to allow
      analysis and optimization. This should ideally be done after
      InstCombine, which can perform code motion (eventually indvars should
      run after all canonical instcombines). We want ISEL to recombine the
      add and the check, at least on x86.
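
       A hedged sketch of the split using IRBuilder (illustrative only;
       the in-tree code differs):

         #include "llvm/IR/IRBuilder.h"
         #include "llvm/IR/Instructions.h"

         // CI is the call to @llvm.sadd.with.overflow. Emit a plain add
         // that SCEV/indvars can analyze, and redirect the arithmetic
         // result (element 0) to it; the overflow flag (element 1) and
         // its branch stay put, so ISel can recombine add + check.
         llvm::IRBuilder<> B(CI);
         llvm::Value *Sum =
             B.CreateAdd(CI->getArgOperand(0), CI->getArgOperand(1), "sadd");
         for (llvm::User *U : CI->users())
           if (auto *EV = llvm::dyn_cast<llvm::ExtractValueInst>(U))
             if (EV->getNumIndices() == 1 && EV->getIndices()[0] == 0)
               EV->replaceAllUsesWith(Sum);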
      
      This is currently under an option for reducing live induction
      variables: -liv-reduce. The next step is reducing liveness of IVs that
      are live out of the overflow check paths. Once the related
      optimizations are fully developed, reviewed and tested, I do expect
      this to become default.
      
      llvm-svn: 197926
  9. Dec 23, 2013
  10. Dec 10, 2013
  11. Dec 07, 2013
  12. Dec 05, 2013
  13. Nov 26, 2013
    • Refactor some code in SampleProfile.cpp · c0dd1037
      Diego Novillo authored
      I'm adding new functionality in the sample profiler. This will
      require more data to be kept around for each function, so I moved
      the structure SampleProfile that we keep for each function into
      a separate class.
      
       There are no functional changes in this patch. It simply provides
       a new home for all the new data that I need to propagate weights
       through edges.
       
       There are some other naming and minor edits throughout.
      
      llvm-svn: 195780
  14. Nov 22, 2013
  15. Nov 19, 2013
    • Fix an issue where SROA computed different results based on the relative · a1262006
      Chandler Carruth authored
      order of slices of the alloca which have exactly the same size and other
      properties. This was found by a perniciously unstable sort
      implementation used to flush out buggy uses of the algorithm.
      
      The fundamental idea is that findCommonType should return the best
      common type it can find across all of the slices in the range. There
      were two bugs here previously:
      
      1) We would accept an integer type smaller than a byte-width multiple,
         and if there were different bit-width integer types, we would accept
         the first one. This caused an actual failure in the testcase updated
         here when the sort order changed.
       2) If we found a bad combination of types or a non-load, non-store use
          before an integer typed load or store, we would bail; but if the
          integer typed load or store came first, we would use it. The correct
          behavior is to always use an integer typed operation which covers
          the partition if one exists (sketched below).
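
       A hedged sketch of the corrected, order-independent rule (names
       are illustrative; see findCommonType in SROA for the real logic):

         // Scan *all* slices: remember an integer load/store type that
         // exactly covers the partition, and separately track a type
         // common to every slice. Prefer the covering integer type.
         llvm::Type *CommonTy = nullptr;
         llvm::IntegerType *CoveringIntTy = nullptr;
         bool HaveCommon = true;
         for (const Slice &S : P) {             // P: the partition's slices
           llvm::Type *Ty = sliceAccessType(S); // hypothetical accessor
           if (!Ty) { HaveCommon = false; continue; } // non-load/store use
           if (auto *ITy = llvm::dyn_cast<llvm::IntegerType>(Ty))
             if (ITy->getBitWidth() == 8 * PartitionSizeInBytes)
               CoveringIntTy = ITy;             // usable regardless of order
           if (!CommonTy)
             CommonTy = Ty;
           else if (CommonTy != Ty)
             HaveCommon = false;
         }
         return CoveringIntTy ? CoveringIntTy
                              : (HaveCommon ? CommonTy : nullptr);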
      
       While a clever debugging sort algorithm found problem #1 in our existing
       test cases, I have no useful test case ideas for #2. I spotted it by
       inspection when looking at this code.
      
      llvm-svn: 195118
  16. Nov 17, 2013
    • Fix ndebug-build unused variable in loop rerolling · 67107ea1
      Hal Finkel authored
      llvm-svn: 194941
    • Add a loop rerolling pass · bf45efde
      Hal Finkel authored
      This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
      transformation aims to take loops like this:
      
      for (int i = 0; i < 3200; i += 5) {
        a[i]     += alpha * b[i];
        a[i + 1] += alpha * b[i + 1];
        a[i + 2] += alpha * b[i + 2];
        a[i + 3] += alpha * b[i + 3];
        a[i + 4] += alpha * b[i + 4];
      }
      
      and turn them into this:
      
      for (int i = 0; i < 3200; ++i) {
        a[i] += alpha * b[i];
      }
      
      and loops like this:
      
      for (int i = 0; i < 500; ++i) {
        x[3*i] = foo(0);
        x[3*i+1] = foo(0);
        x[3*i+2] = foo(0);
      }
      
      and turn them into this:
      
      for (int i = 0; i < 1500; ++i) {
        x[i] = foo(0);
      }
      
      There are two motivations for this transformation:
      
        1. Code-size reduction (especially relevant, obviously, when compiling for
      code size).
      
        2. Providing greater choice to the loop vectorizer (and generic unroller) to
      choose the unrolling factor (and a better ability to vectorize). The loop
      vectorizer can take vector lengths and register pressure into account when
      choosing an unrolling factor, for example, and a pre-unrolled loop limits that
      choice. This is especially problematic if the manual unrolling was optimized
      for a machine different from the current target.
      
      The current implementation is limited to single basic-block loops only. The
      rerolling recognition should work regardless of how the loop iterations are
      intermixed within the loop body (subject to dependency and side-effect
      constraints), but the significant restriction is that the order of the
      instructions in each iteration must be identical. This seems sufficient to
      capture all current use cases.
      
      This pass is not currently enabled by default at any optimization level.
      
      llvm-svn: 194939
  17. Nov 13, 2013
    • SampleProfileLoader pass. Initial setup. · 8d6568b5
      Diego Novillo authored
       This adds a new scalar pass that reads a file with samples generated
       by 'perf' during runtime. The samples read from the profile are
       incorporated and emitted as IR metadata reflecting that profile.
      
      The profile file is assumed to have been generated by an external
      profile source. The profile information is converted into IR metadata,
      which is later used by the analysis routines to estimate block
      frequencies, edge weights and other related data.
      
      External profile information files have no fixed format, each profiler
      is free to define its own. This includes both the on-disk representation
      of the profile and the kind of profile information stored in the file.
      A common kind of profile is based on sampling (e.g., perf), which
      essentially counts how many times each line of the program has been
      executed during the run.
      
      The SampleProfileLoader pass is organized as a scalar transformation.
      On startup, it reads the file given in -sample-profile-file to
      determine what kind of profile it contains.  This file is assumed to
      contain profile information for the whole application. The profile
      data in the file is read and incorporated into the internal state of
      the corresponding profiler.
      
      To facilitate testing, I've organized the profilers to support two file
       formats: text and native. The native format is whatever on-disk
       representation the profiler wants to support; I expect this will
       mostly be bitcode files, but it could be anything. To do this, every
       profiler must implement the SampleProfile::loadNative() function.
      
      The text format is mostly meant for debugging. Records are separated by
      newlines, but each profiler is free to interpret records as it sees fit.
      Profilers must implement the SampleProfile::loadText() function.
      
      Finally, the pass will call SampleProfile::emitAnnotations() for each
      function in the current translation unit. This function needs to
      translate the loaded profile into IR metadata, which the analyzer will
      later be able to use.
      
      This patch implements the first steps towards the above design. I've
      implemented a sample-based flat profiler. The format of the profile is
      fairly simplistic. Each sampled function contains a list of relative
      line locations (from the start of the function) together with a count
      representing how many samples were collected at that line during
      execution. I generate this profile using perf and a separate converter
      tool.
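
       For illustration, a flat profile for a function foo might look like
       this in the text format (shape only; the exact grammar is whatever
       the text loader defines):

         foo
         4: 150
         5: 2308
         9: 147

       where the left column is the line offset from the start of foo and
       the right column is the number of samples collected at that line.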
      
      Currently, I have only implemented a text format for these profiles. I
      am interested in initial feedback to the whole approach before I send
      the other parts of the implementation for review.
      
      This patch implements:
      
      - The SampleProfileLoader pass.
      - The base ExternalProfile class with the core interface.
       - A SampleProfile sub-class using the above interface. The profiler
         generates branch weight metadata on every branch instruction that
         matches the profile.
      - A text loader class to assist the implementation of
        SampleProfile::loadText().
      - Basic unit tests for the pass.
      
      Additionally, the patch uses profile information to compute branch
      weights based on instruction samples.
      
      This patch converts instruction samples into branch weights. It
      does a fairly simplistic conversion:
      
      Given a multi-way branch instruction, it calculates the weight of
      each branch based on the maximum sample count gathered from each
      target basic block.
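
       A minimal sketch of that conversion, assuming a hypothetical helper
       getMaxSamplesIn() that returns the largest sample count seen in a
       block:

         #include "llvm/ADT/SmallVector.h"
         #include "llvm/IR/LLVMContext.h"
         #include "llvm/IR/MDBuilder.h"

         // TI is the multi-way branch terminator. Use the hottest sample
         // in each successor as that edge's weight, then attach !prof
         // metadata that later analyses can consume.
         llvm::SmallVector<uint32_t, 4> Weights;
         for (unsigned I = 0, E = TI->getNumSuccessors(); I != E; ++I)
           Weights.push_back(getMaxSamplesIn(TI->getSuccessor(I)));
         llvm::MDBuilder MDB(TI->getContext());
         TI->setMetadata(llvm::LLVMContext::MD_prof,
                         MDB.createBranchWeights(Weights));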
      
      Note that this assignment of branch weights is somewhat lossy and can be
      misleading. If a basic block has more than one incoming branch, all the
      incoming branches will get the same weight. In reality, it may be that
      only one of them is the most heavily taken branch.
      
      I will adjust this assignment in subsequent patches.
      
      llvm-svn: 194566
  18. Nov 12, 2013
  19. Nov 11, 2013
    • Fix PR17952. · 3168ab33
      Shuxin Yang authored
       The symptom is that an assertion is triggered. The assertion was added
       by me to detect the situation when a value is propagated from dead
       blocks. (We can certainly get rid of the assertion; it is safe to do
       so, because propagating a value from a dead block to a live join node
       is certainly ok.)
      
       The root cause of this bug is that edge-splitting is conducted on the
       fly; the edge being split could be a dead edge, so the block inserted
       to split the critical edge needs to be flagged "dead" as well.
      
       There are 3 ways to fix this bug:
       1) Get rid of the assertion, as I mentioned earlier.
       2) When a dead edge is split, flag the inserted block "dead".
       3) Proactively split the critical edges connecting dead and live blocks
          when new dead blocks are revealed.
       
       This fix goes with 3), adding 2 LOC.
       
       A test case was added by Rafael the other day.
      
      llvm-svn: 194424
  20. Nov 10, 2013
  21. Nov 08, 2013
    • Remove dead code from LoopUnswitch · 1a642aef
      Hal Finkel authored
       LoopUnswitch's code simplification routine has logic to convert
       conditional branches into unconditional branches, after unswitching
       makes the condition constant, and then remove any blocks that this
       renders dead. Unfortunately, this code is dead, currently broken, and
       furthermore, has never been alive (at least as far back as 2006).
      
      No functionality change intended.
      
      llvm-svn: 194277
  22. Nov 05, 2013
    • Add a runtime unrolling parameter to the LoopUnroll pass constructor · 081eaef6
      Hal Finkel authored
      As with the other loop unrolling parameters (the unrolling threshold, partial
      unrolling, etc.) runtime unrolling can now also be controlled via the
      constructor. This will be necessary for moving non-trivial unrolling late in
      the pass manager (after loop vectorization).
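
       The factory plausibly ends up with a shape like this (signature
       hedged; see LoopUnrollPass for the exact one):

         // -1 means "not set": fall back to the command-line flag or the
         // built-in default for that parameter.
         llvm::Pass *createLoopUnrollPass(int Threshold = -1, int Count = -1,
                                          int AllowPartial = -1,
                                          int Runtime = -1);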
      
      No functionality change intended.
      
      llvm-svn: 194027
  23. Oct 30, 2013
  24. Oct 25, 2013
    • Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop. · 57243da7
      Andrew Trick authored
      Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
      (affecting trunk and 3.3)
      
       When SCEV expands a recurrence outside of a loop it attempts to scale
       by the stride of the recurrence. Chained recurrences don't work that
       way. We could compute binomial coefficients, but would have to
       guarantee that the chained AddRecs are in a perfectly reduced form.
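
       For example, the chained recurrence {A,+,{B,+,C}} evaluates at
       iteration n to A + B*n + C*n*(n-1)/2: the coefficients are
       binomial, so scaling by a single stride cannot reproduce it.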
      
      llvm-svn: 193438