  1. Oct 08, 2014
  2. Oct 07, 2014
    • Duncan P. N. Exon Smith's avatar
      LoopUnroll: Create sub-loops in LoopInfo · c46cfcbb
      Duncan P. N. Exon Smith authored
      `LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
      particular, tell `LoopInfo` about copies of inner loops when unrolling
      the outer loop.
      
      Conservatively, also tell `ScalarEvolution` to forget about the original
      versions of these loops, since their inputs may have changed.
      
      Fixes PR20987.
      
      llvm-svn: 219241
    • Duncan P. N. Exon Smith's avatar
      LoopUnroll: Only check for ScalarEvolution analysis once, NFC · 9b4d37e8
      Duncan P. N. Exon Smith authored
      A follow-up commit will add a use in a tight loop.  We might as well
      just find it once anyway.
      
      llvm-svn: 219239
    • Marcello Maggioni's avatar
      Two case switch to select optimization · 963bc87d
      Marcello Maggioni authored
      This optimization tries to convert switch instructions that select a value, with only 2 unique cases plus a default block,
      into a select or a pair of selects (depending on whether the default block is reachable).
      
      The typical case this optimization targets is:
      
      Example:
      switch (a) {
        case 10:                %0 = icmp eq i32 %a, 10
          return 10;            %1 = select i1 %0, i32 10, i32 4
        case 20:        ---->   %2 = icmp eq i32 %a, 20
          return 2;             %3 = select i1 %2, i32 2, i32 %1
        default:
          return 4;
      }
      
      It also sets the base for further optimizations that are planned and being reviewed.
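
As a rough source-level illustration of the example above, the sketch below writes the switch and the two-select sequence as ordinary C++ functions and compares them. The function names are hypothetical and exist only for this illustration; the code is not taken from the patch.

```cpp
#include <cassert>

// Source-level form from the commit message: two cases plus a default.
int viaSwitch(int a) {
  switch (a) {
  case 10:
    return 10;
  case 20:
    return 2;
  default:
    return 4;
  }
}

// Hand-written mirror of the two-select IR sequence (hypothetical helper).
int viaSelects(int a) {
  bool is10 = (a == 10);      // %0 = icmp eq i32 %a, 10
  int first = is10 ? 10 : 4;  // %1 = select i1 %0, i32 10, i32 4
  bool is20 = (a == 20);      // %2 = icmp eq i32 %a, 20
  return is20 ? 2 : first;    // %3 = select i1 %2, i32 2, i32 %1
}

// Checks that both forms agree on a range covering both cases and the default.
void checkFormsAgree() {
  for (int a = -50; a <= 50; ++a)
    assert(viaSwitch(a) == viaSelects(a));
}
```

Calling `checkFormsAgree()` exercises both case values and many default values; it only demonstrates the equivalence for this one example, not the general transformation.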
      
      llvm-svn: 219223
    • David Blaikie's avatar
      DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are... · 17364d4e
      David Blaikie authored
      DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function.
      
      After some stellar (& inspired) help from Reid Kleckner providing a test
      case for some rather unstable undefined behavior showing up as
      assertions produced by r214761, I was able to fix this issue in DAE
      involving the application of both varargs removal, followed by normal
      argument removal.
      
      Indeed, I introduced this same bug into ArgumentPromotion (r212128) by
      copying the code from DAE. When I fixed the bug in ArgPromo (r213805),
      I commented in that patch that I didn't need to address the same issue
      in DAE because it was a single pass. It turns out DAE runs two passes,
      one for the varargs and one for the normal arguments, so the same fix
      is needed (at least during varargs removal). So here it is.
      
      (The observable net effect of this bug, even when it didn't result in
      an assertion failure, is that debug info would describe the DAE'd
      function in the abstract but wouldn't provide high/low_pc, variable
      locations, line table, etc. It would appear as though the function had
      been entirely optimized away; see the original PR14016 for details of
      the general problem.)
      
      I'm not recommitting the assertion just yet, as there's been another
      regression of it since I last tried. It might just be that a few test
      cases weren't adequately updated after Adrian's or Duncan's recent
      schema changes.
      
      llvm-svn: 219210
    • Suyog Sarda's avatar
      Reformat if statement to comply with LLVM standards. NFC. · 65f5ae99
      Suyog Sarda authored
      Differential Revision: http://reviews.llvm.org/D5644
      
      llvm-svn: 219203
    • Suyog Sarda's avatar
      Reformat to comply with LLVM coding standards using clang-format. · ea205517
      Suyog Sarda authored
      NFC.
      
      Differential Revision: http://reviews.llvm.org/D5645
      
      llvm-svn: 219202
    • Tilmann Scheller's avatar
      [InstCombine] Reformat if statements to comply with LLVM Coding Standards. · 2bc5cb68
      Tilmann Scheller authored
      Patch by Sonam Kumari!
      
      Differential Revision: http://reviews.llvm.org/D5643
      
      llvm-svn: 219198
    • David Majnemer's avatar
      GlobalDCE: Don't drop any COMDAT members · e025321d
      David Majnemer authored
      If we require a single member of a comdat, require all of the other
      members as well.
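
The rule can be pictured with a toy liveness model. The sketch below is not GlobalDCE's actual code: the `ComdatOf` map, the `markLive` function and the string-named globals are all hypothetical, and references between globals (which the real pass follows) are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: global name -> name of its comdat ("" means no comdat).
using ComdatOf = std::map<std::string, std::string>;

// Marks every root live and, for each root in a comdat, every other member
// of that comdat as well. Uses between globals are not modelled here.
std::set<std::string> markLive(const std::vector<std::string>& roots,
                               const ComdatOf& comdatOf) {
  // Group globals by the comdat they belong to.
  std::multimap<std::string, std::string> members;
  for (const auto& entry : comdatOf)
    if (!entry.second.empty())
      members.emplace(entry.second, entry.first);

  std::set<std::string> live;
  for (const auto& root : roots) {
    live.insert(root);
    auto it = comdatOf.find(root);
    if (it == comdatOf.end() || it->second.empty())
      continue;
    // Requiring one member requires the whole comdat.
    auto range = members.equal_range(it->second);
    for (auto m = range.first; m != range.second; ++m)
      live.insert(m->second);
  }
  return live;
}

// Small self-check: "f" and "g" share a comdat, "h" has none.
void checkComdatExample() {
  ComdatOf comdatOf;
  comdatOf["f"] = "c1";
  comdatOf["g"] = "c1";
  comdatOf["h"] = "";
  std::set<std::string> live =
      markLive(std::vector<std::string>(1, "f"), comdatOf);
  assert(live.count("f") == 1 && live.count("g") == 1);
  assert(live.count("h") == 0);
}
```

In this model, requiring only `f` keeps `g` alive because both sit in the same comdat, while the unrelated `h` can still be dropped.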
      
      This fixes PR20981.
      
      llvm-svn: 219191
    • Gerolf Hoflehner's avatar
      [InstCombine] re-commit r218721 icmp-select-icmp optimization · c0b4c20e
      Gerolf Hoflehner authored
      Takes care of the assert that caused build failures.
      Rather than asserting, the code now checks that the definition
      and use are in the same block, and does not attempt
      to optimize when that is not the case.
      
      llvm-svn: 219175
    • David Blaikie's avatar
      range-for some loops in DAE · e44ee92a
      David Blaikie authored
      llvm-svn: 219167
    • Duncan P. N. Exon Smith's avatar
      LoopUnroll: Change code order of changes to new basic blocks · e5d7d979
      Duncan P. N. Exon Smith authored
      Add new basic blocks to `LoopInfo` earlier.  No functionality change
      intended (simplifies upcoming bugfix patch).
      
      llvm-svn: 219150
    • Duncan P. N. Exon Smith's avatar
      Sink comment, NFC · 0bbf5418
      Duncan P. N. Exon Smith authored
      llvm-svn: 219149
  3. Oct 06, 2014
  4. Oct 05, 2014
    • Hal Finkel's avatar
      [InstCombine] Simplify the logic from r219067 using ValueTracking · 45646888
      Hal Finkel authored
      Joerg suggested on IRC that I look at generalizing the logic from r219067 to
      handle more general redundancies (like removing an assume(x > 3) dominated by
      an assume(x > 5)). The way to do this would be to ask ValueTracking to
      determine the value of the i1 argument. It turns out that ValueTracking is not
      very good at this right now (although it does get the trivial redundancy case)
      because it does not understand ICmps. Nevertheless, the resulting code in
      InstCombine is simpler than r219067, so we might as well do it now.
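
To make the "more general redundancy" concrete, here is a minimal sketch of the implication being described, restricted to assumptions of the form `x > bound` on the same signed value. The struct and function names are hypothetical; this does not reflect how ValueTracking or InstCombine represent assumptions.

```cpp
#include <cassert>

// Hypothetical model of an assumption `x > bound` on one signed integer x.
struct AssumeGreaterThan {
  long long bound;
};

// A dominated assumption is redundant when the dominating one already
// implies it: x > A implies x > B exactly when A >= B.
bool isRedundant(const AssumeGreaterThan& dominating,
                 const AssumeGreaterThan& candidate) {
  return dominating.bound >= candidate.bound;
}

// Self-check using the bounds from the commit message.
void checkAssumeExample() {
  AssumeGreaterThan gt5 = {5};
  AssumeGreaterThan gt3 = {3};
  assert(isRedundant(gt5, gt3));   // assume(x > 3) after assume(x > 5)
  assert(!isRedundant(gt3, gt5));  // the reverse is not redundant
}
```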
      
      llvm-svn: 219070
  5. Oct 04, 2014
  6. Oct 03, 2014
  7. Oct 02, 2014
  8. Oct 01, 2014
    • Duncan P. N. Exon Smith's avatar
      DIBuilder: Encapsulate DIExpression's element type · 611afb22
      Duncan P. N. Exon Smith authored
      `DIExpression`'s elements are 64-bit integers that are stored as
      `ConstantInt`.  The accessors already encapsulate the storage.  This
      commit updates the `DIBuilder` API to also encapsulate that.
      
      llvm-svn: 218797
    • Adrian Prantl's avatar
      Move the complex address expression out of DIVariable and into an extra · 87b7eb9d
      Adrian Prantl authored
      argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
      
      Previously, DIVariable was a variable-length field that had an optional
      reference to a Metadata array consisting of a variable number of
      complex address expressions. In the case of OpPiece expressions this
      wastes a lot of storage in IR, because when an aggregate type is, e.g.,
      SROA'd into all of its n individual members, the IR will contain n copies
      of the DIVariable, all alike, only differing in the complex address
      reference at the end.
      
      By making the complex address into an extra argument of the
      dbg.value/dbg.declare intrinsics, all of the pieces can reference the
      same variable and the complex address expressions can be uniqued across
      the CU, too.
      Down the road, this will allow us to move other flags, such as
      "indirection" out of the DIVariable, too.
      
      The new intrinsics look like this:
      declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
      declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
      
      This patch adds a new LLVM-local tag to DIExpressions, so we can detect
      and pretty-print DIExpression metadata nodes.
      
      What this patch doesn't do:
      
      This patch does not touch the "Indirect" field in DIVariable; but moving
      that into the expression would be a natural next step.
      
      http://reviews.llvm.org/D4919
      rdar://problem/17994491
      
      Thanks to dblaikie and dexonsmith for reviewing this patch!
      
      Note: I accidentally committed a bogus older version of this patch previously.
      llvm-svn: 218787
    • Adrian Prantl's avatar
      Revert r218778 while investigating buildbot breakage. · b458dc2e
      Adrian Prantl authored
      "Move the complex address expression out of DIVariable and into an extra"
      
      llvm-svn: 218782
    • Adrian Prantl's avatar
      Move the complex address expression out of DIVariable and into an extra · 25a7174e
      Adrian Prantl authored
      argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
      
      Previously, DIVariable was a variable-length field that had an optional
      reference to a Metadata array consisting of a variable number of
      complex address expressions. In the case of OpPiece expressions this
      wastes a lot of storage in IR, because when an aggregate type is, e.g.,
      SROA'd into all of its n individual members, the IR will contain n copies
      of the DIVariable, all alike, only differing in the complex address
      reference at the end.
      
      By making the complex address into an extra argument of the
      dbg.value/dbg.declare intrinsics, all of the pieces can reference the
      same variable and the complex address expressions can be uniqued across
      the CU, too.
      Down the road, this will allow us to move other flags, such as
      "indirection" out of the DIVariable, too.
      
      The new intrinsics look like this:
      declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
      declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
      
      This patch adds a new LLVM-local tag to DIExpressions, so we can detect
      and pretty-print DIExpression metadata nodes.
      
      What this patch doesn't do:
      
      This patch does not touch the "Indirect" field in DIVariable; but moving
      that into the expression would be a natural next step.
      
      http://reviews.llvm.org/D4919
      rdar://problem/17994491
      
      Thanks to dblaikie and dexonsmith for reviewing this patch!
      
      llvm-svn: 218778
    • Tom Stellard's avatar
      C API: Add LLVMCloneModule() · 0a4e9a3b
      Tom Stellard authored
      llvm-svn: 218775
    • Evgeniy Stepanov's avatar
    • Gerolf Hoflehner's avatar
      [InstCombine] Fix for assert build failures caused by r218721 · 19fc3daf
      Gerolf Hoflehner authored
      The icmp-select-icmp optimization made the implicit assumption
      that the select-icmp instructions are in the same block and asserted on it.
      The fix explicitly checks for that condition and conservatively suppresses
      the optimization when it is violated.
      
      llvm-svn: 218735
    • Gerolf Hoflehner's avatar
      [InstCombine] Optimize icmp-select-icmp · 08cc4b95
      Gerolf Hoflehner authored
      In special cases select instructions can be eliminated by
      replacing them with a cheaper bitwise operation even when the
      select result is used outside its home block. The instances implemented
      are patterns like
          %x = icmp.eq
          %y = select %x, %r, null
          %z = icmp.eq|neq %y, null
          br %z, true, false
      ==> %x = icmp.ne
          %y = icmp.eq %r, null
          %z = or %x, %y
          br %z, true, false
      The optimization is integrated into the instruction
      combiner and performed only when all uses of the select result can
      be replaced by the select operand proper. For this, dominator
      information is used, and dominance is now a required analysis pass in
      the combiner. The optimization itself is iterative. The critical step
      is to replace the select result with the non-constant select operand.
      The select then becomes local, and the combiner iteratively works out
      simpler code patterns and eventually eliminates the select.
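
The equality form of the pattern can be checked at source level. The sketch below models the `icmp.eq %y, null` variant only, with hypothetical function names and a plain `const int*` standing in for `%r`; it illustrates why the rewrite preserves the branch condition and is not the combiner's implementation.

```cpp
#include <cassert>

// Original pattern: select between %r and null, then compare with null.
bool originalPattern(bool x, const int* r) {
  const int* y = x ? r : nullptr;  // %y = select %x, %r, null
  return y == nullptr;             // %z = icmp.eq %y, null
}

// Rewritten pattern: the select is gone, replaced by an `or`.
bool rewrittenPattern(bool x, const int* r) {
  bool notX = !x;                 // %x = icmp.ne (the inverted compare)
  bool rIsNull = (r == nullptr);  // %y = icmp.eq %r, null
  return notX | rIsNull;          // %z = or %x, %y
}

// Exhaustive check over both conditions and both pointer states.
void checkPatternsAgree() {
  int value = 42;
  const int* pointers[] = {nullptr, &value};
  for (const int* r : pointers)
    for (int x = 0; x <= 1; ++x)
      assert(originalPattern(x != 0, r) == rewrittenPattern(x != 0, r));
}
```

The four combinations covered by `checkPatternsAgree()` are the whole input space for this model, since only the truth of `%x` and the nullness of `%r` matter.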
      
      rdar://17853760
      
      llvm-svn: 218721
    • Jingyue Wu's avatar
      [SimplifyCFG] threshold for folding branches with common destination · fc029670
      Jingyue Wu authored
      Summary:
      This patch adds a threshold that controls the number of bonus instructions
      allowed for folding branches with common destination. The original code allows
      at most one bonus instruction. With this patch, users can customize the
      threshold to allow multiple bonus instructions. The default threshold is still
      1, so that the code behaves the same as before when users do not specify this
      threshold.
      
      The motivation of this change is that tuning this threshold significantly (up
      to 25%) improves the performance of some CUDA programs in our internal code
      base. In general, branch instructions are very expensive for GPU programs.
      Therefore, it is sometimes worth trading more arithmetic computation for a more
      straightened control flow. Here's a reduced example:
      
        __global__ void foo(int a, int b, int c, int d, int e, int n,
                            const int *input, int *output) {
          int sum = 0;
          for (int i = 0; i < n; ++i)
            sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
          *output = sum;
        }
      
      The conditional expression in the loop body translates to two branch
      instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share
      a common destination. With the default threshold, SimplifyCFG is unable
      to fold them, because computing the condition of the second branch
      "((i | c) ^ d) > e" requires two bonus instructions. With the threshold
      increased, SimplifyCFG can fold the two branches so that the loop body
      contains only one branch, making the code conceptually look like:
      
        sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
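
For a host-side illustration of why the fold is legal here, the sketch below ports the loop to plain C++ (no `__global__`, a `std::vector` instead of raw pointers, hypothetical function names) and compares the short-circuit form with the folded form. Both conditions are free of side effects, so evaluating the second one unconditionally does not change the result.

```cpp
#include <cassert>
#include <vector>

// Short-circuit form: the second condition is only evaluated if the first holds.
int sumShortCircuit(int a, int b, int c, int d, int e,
                    const std::vector<int>& input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i)
    sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
  return sum;
}

// Folded form: both conditions are always evaluated and combined with `&`.
int sumFolded(int a, int b, int c, int d, int e,
              const std::vector<int>& input) {
  int sum = 0;
  for (int i = 0; i < static_cast<int>(input.size()); ++i)
    sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i];
  return sum;
}

// Compares the two forms over a small grid of parameter values.
void checkFoldPreservesResult() {
  std::vector<int> input;
  for (int i = 1; i <= 8; ++i)
    input.push_back(i);
  for (int a = 0; a < 4; ++a)
    for (int b = 0; b < 4; ++b)
      for (int e = 0; e < 4; ++e)
        assert(sumShortCircuit(a, b, 3, 4, e, input) ==
               sumFolded(a, b, 3, 4, e, input));
}
```

This only shows that the two forms compute the same sum; it says nothing about the performance figures quoted in the commit message, which depend on the GPU and the threshold setting.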
      
      Increasing the threshold significantly improves the performance of this
      particular example. In the configuration where both conditions are guaranteed
      to be true, increasing the threshold from 1 to 2 improves the performance by
      18.24%. Even in the configuration where the first condition is false and the
      second condition is true, which favors shortcuts, increasing the threshold from
      1 to 2 still improves the performance by 4.35%.
      
      We are still looking for a good threshold and maybe a better cost model than
      just counting the number of bonus instructions. However, according to the above
      numbers, we think it is at least worth adding a threshold to enable more
      experiments and tuning. Let me know what you think. Thanks!
      
      Test Plan: Added one test case to check the threshold is in effect
      
      Reviewers: nadav, eliben, meheff, resistor, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: hfinkel, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D5529
      
      llvm-svn: 218711
  9. Sep 30, 2014