  1. Aug 15, 2015
    • Accelerate MergeFunctions with hashing · 5e4303dc
      JF Bastien authored
      This patch makes the Merge Functions pass faster by calculating and comparing
      a hash value which captures the essential structure of a function before
      performing a full function comparison.
      
      The hash is calculated by hashing the function signature, then walking the basic
      blocks of the function in the same order as the main comparison function. The
      opcode of each instruction is hashed in sequence, which means that
      functions which are equal according to the existing total order are
      guaranteed to have the same hash, as the comparison requires the opcodes
      of the two functions to be in the same order.
      
      The hash function is a static member of the FunctionComparator class because it
      is tightly coupled to the exact comparison function used. For example, functions
      which are equivalent modulo a single variant callsite might be merged by a more
      aggressive MergeFunctions, and the hash function would need to be insensitive to
      these differences in order to exploit this.
      
      The hashing function uses a utility class which accumulates the values into an
      internal state using a standard bit-mixing function. Note that this is a
      different interface from a regular hashing routine, because the values to
      be hashed are scattered amongst the properties of an llvm::Function
      rather than laid out linearly in memory. This scheme is
      fast because only one word of state needs to be kept, and the mixing function is
      a few instructions.
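
      As a rough sketch of the accumulator idea (the class, seed, and mixing
      constants below are illustrative stand-ins, not LLVM's actual API), the
      state is a single 64-bit word folded with a splitmix64-style mixer:

        #include <cstdint>
        #include <vector>

        class HashAccumulator64 {
          uint64_t Hash;
        public:
          HashAccumulator64() : Hash(0x6acaa36bef8325c5ULL) {} // arbitrary seed
          // Fold one value into the single word of state; the mixing is only
          // a few shift/xor/multiply instructions.
          void add(uint64_t V) {
            uint64_t X = Hash + V + 0x9e3779b97f4a7c15ULL;
            X = (X ^ (X >> 30)) * 0xbf58476d1ce4e5b9ULL;
            X = (X ^ (X >> 27)) * 0x94d049bb133111ebULL;
            Hash = X ^ (X >> 31);
          }
          uint64_t getHash() const { return Hash; }
        };

        // Stand-in for the walk described above: hash the signature first,
        // then every instruction opcode in comparison order.
        uint64_t functionHash(uint64_t Signature,
                              const std::vector<unsigned> &Opcodes) {
          HashAccumulator64 H;
          H.add(Signature);
          for (unsigned Op : Opcodes)
            H.add(Op);
          return H.getHash();
        }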
      
      The main runOnModule function first computes the hash of each function, and only
      further processes functions which do not have a unique function hash. The hash
      is also used to order the sorted function set. If the hashes differ, their
      values are used to order the functions; otherwise, the full comparison is
      done.
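
      In sketch form (hypothetical names; the real logic lives in the
      MergeFunctions comparator), the ordering predicate looks like this:

        #include <cstdint>

        struct FnEntry {
          uint64_t Hash; // precomputed by something like functionHash(...)
          int Id;        // stand-in for the llvm::Function being compared
        };

        // Stub standing in for the expensive full structural comparison.
        int fullCompare(const FnEntry &A, const FnEntry &B) { return 0; }

        bool lessThan(const FnEntry &A, const FnEntry &B) {
          if (A.Hash != B.Hash)
            return A.Hash < B.Hash; // differing hashes decide the order cheaply
          return fullCompare(A, B) < 0; // tie: do the full comparison
        }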
      
      Both of these are helpful in speeding up MergeFunctions. Together they result in
      speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
      for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
      three cases, the new speed of MergeFunctions is about half that of the module
      verifier, making it relatively inexpensive even for large LTO builds with
      hundreds of thousands of functions. The same functions are merged, so this
      change is free performance.
      
      Author: jrkoenig
      
      Reviewers: nlewycky, dschuff, jfb
      
      Subscribers: llvm-commits, aemerson
      
      Differential Revision: http://reviews.llvm.org/D11923
      
      llvm-svn: 245140
    • LoopStrengthReduce: Try to pass address space to isLegalAddressingMode · 427a0fd2
      Matt Arsenault authored
      This seems to only work some of the time. In some situations, it
      uses a nonsensical type and isn't actually aware of the memory being
      accessed, e.g. if the branch condition is an icmp of a pointer, it checks
      the addressing mode of i1.
      
      llvm-svn: 245137
    • Fix a crash where a utility function wasn't aware of fcmp vectors and created... · 8075fd22
      Nick Lewycky authored
      Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458!
      
      llvm-svn: 245119
    • [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl · 9791ed47
      Bjarke Hammersholt Roune authored
      Summary:
      http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.
      
      This change makes LSR able to generate pointer induction variables for
      loops like these, where the index is 32-bit and the pointer is 64-bit (a
      hand-written sketch of the resulting rewrite follows the examples):
      
        for (int i = 0; i < numIterations; ++i)
          sum += ptr[i - offset];
      
        for (int i = 0; i < numIterations; ++i)
          sum += ptr[i * stride];
      
        for (int i = 0; i < numIterations; ++i)
          sum += ptr[3 * (i << 7)];
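
      As an illustration of what the pointer induction variable buys for the
      first loop (hand-written sketch, not compiler output): once i - offset is
      known not to wrap, the sign extension moves into the base address instead
      of being redone every iteration.

        // Illustrative source-level equivalent of the rewrite.
        long sumBefore(const int *ptr, int numIterations, int offset) {
          long sum = 0;
          for (int i = 0; i < numIterations; ++i)
            sum += ptr[i - offset]; // per-iteration sext of (i - offset)
          return sum;
        }

        long sumAfter(const int *ptr, int numIterations, int offset) {
          long sum = 0;
          const int *p = ptr - offset; // 64-bit base computed once
          for (const int *e = p + numIterations; p != e; ++p)
            sum += *p; // the pointer itself is the induction variable
          return sum;
        }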
      
      
      Reviewers: atrick, sanjoy
      
      Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben
      
      Differential Revision: http://reviews.llvm.org/D11860
      
      llvm-svn: 245118
  2. Aug 14, 2015
    • Cleanup test whitespace or lack thereof. NFC. · 67dca908
      Chad Rosier authored
      llvm-svn: 245065
    • Add support for cross block dse. · ddc2a86a
      Karthik Bhat authored
      This patch enables dead store elimination across basic blocks.
      
      Example:
      define void @test_02(i32 %N) {
        %1 = alloca i32
        store i32 %N, i32* %1
        store i32 10, i32* @x
        %2 = load i32, i32* %1
        %3 = icmp ne i32 %2, 0
        br i1 %3, label %4, label %5
      
      ; <label>:4
        store i32 5, i32* @x
        br label %7
      
      ; <label>:5
        %6 = load i32, i32* @x
        store i32 %6, i32* @y
        br label %7
      
      ; <label>:7
        store i32 15, i32* @x
        ret void
      }
      In the above example dead store "store i32 5, i32* @x" is now eliminated.
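
      In C terms the example corresponds roughly to the following
      (illustrative): the x = 5 store is dead because x = 15 overwrites it on
      every path before x is read again.

        int x, y;

        void test_02(int N) {
          x = 10;
          if (N != 0) {
            x = 5; // dead: overwritten by x = 15 before any read on this path
          } else {
            y = x; // reads the earlier x = 10
          }
          x = 15;
        }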
      
      Differential Revision: http://reviews.llvm.org/D11143
      
      llvm-svn: 245025
    • [SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow. · 1238f341
      Jingyue Wu authored
      Summary:
      This patch implements my promised optimization to reunite certain sexts from
      operands after we extract the constant offset. See the header comment of
      reuniteExts for its motivation.
      
      One key building block that enables this optimization is Bjarke's poison value
      analysis (D11212). That helps to prove "a +nsw b" can't overflow.
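
      A small self-contained demonstration (illustrative, not from the patch)
      of why the no-signed-wrap proof is the key enabler:

        #include <cassert>
        #include <cstdint>
        #include <limits>

        int64_t sext(int32_t V) { return static_cast<int64_t>(V); }

        int main() {
          // When a + b cannot sign-overflow, the rewrite is exact:
          int32_t a = 1000, b = 2000;
          assert(sext(a) + sext(b) == sext(a + b));

          // When it can overflow, the wrapped 32-bit sum disagrees:
          int32_t c = std::numeric_limits<int32_t>::max(), d = 1;
          int32_t wrapped = static_cast<int32_t>(
              static_cast<uint32_t>(c) + static_cast<uint32_t>(d));
          assert(sext(c) + sext(d) != sext(wrapped)); // 2^31 vs -2^31
          return 0;
        }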
      
      Reviewers: broune
      
      Subscribers: jholewinski, sanjoy, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D12016
      
      llvm-svn: 245003
  3. Aug 13, 2015
    • [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz · a195386c
      Davide Italiano authored
      If <src> is non-zero, we can safely set the flag to true, and this
      results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
      Thanks to majnemer for suggesting the fix and reviewing.
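
      For reference, ffs() relates to cttz as in the hand-written illustration
      below (using the GCC/Clang builtin); only the zero case needs a guard,
      which is what a known-non-zero <src> makes unnecessary:

        #include <cstdint>

        int ffsViaCttz(uint32_t x) {
          if (x == 0)
            return 0; // the only case where cttz would be undefined
          return __builtin_ctz(x) + 1; // x != 0: is_zero_undef is safe
        }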
      
      Code generated before the patch was applied:
      
      
       0:   0f bc c7                bsf    %edi,%eax
       3:   b9 20 00 00 00          mov    $0x20,%ecx
       8:   0f 45 c8                cmovne %eax,%ecx
       b:   83 c1 02                add    $0x2,%ecx
       e:   b8 01 00 00 00          mov    $0x1,%eax
      13:   85 ff                   test   %edi,%edi
      15:   0f 45 c1                cmovne %ecx,%eax
      18:   c3                      retq
      
      Code generated after the patch was applied:
      
       0:   0f bc cf                bsf    %edi,%ecx
       3:   83 c1 02                add    $0x2,%ecx
       6:   85 ff                   test   %edi,%edi
       8:   b8 01 00 00 00          mov    $0x1,%eax
       d:   0f 45 c1                cmovne %ecx,%eax
      10:   c3                      retq
      
      It seems we can still use cmove and save another 'test' instruction, but
      that can be tackled separately.
      
      Differential Revision:  http://reviews.llvm.org/D11989	
      
      llvm-svn: 244947
    • [SeparateConstOffsetFromGEP] strengthen the inbounds attribute · 13a80eac
      Jingyue Wu authored
      We used to be over-conservative about preserving inbounds. Actually, the second
      GEP (which applies the constant offset) can inherit the inbounds attribute of
      the original GEP, because the resultant pointer is equivalent to that of the
      original GEP. For example,
      
        x  = GEP inbounds a, i+5
          =>
        y = GEP a, i               // inbounds removed
        x = GEP inbounds y, 5      // inbounds preserved
      
      llvm-svn: 244937
    • Emit argmemonly attribute for intrinsics. · 30143aee
      Igor Laevsky authored
      Differential Revision: http://reviews.llvm.org/D11352
      
      llvm-svn: 244920
    • [DeadStoreElimination] remove a redundant store even if the load is in a different block. · 11fc8175
      Erik Eckstein authored
      DeadStoreElimination eliminates a store if it stores a value which was
      loaded from the same memory location. So far this worked only when the
      store was in the same block as the load. Now we can also handle stores
      which are in a different block from the load.
      Example:
      
      define i32 @test(i1, i32*) {
      entry:
        %l2 = load i32, i32* %1, align 4
        br i1 %0, label %bb1, label %bb2
      bb1:
        br label %bb3
      bb2:
        ; This store is redundant
        store i32 %l2, i32* %1, align 4
        br label %bb3
      bb3:
        ret i32 0
      }
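
      Roughly the same situation in C (illustrative): the store writes back the
      value just loaded from the same location, so it changes nothing on any
      path.

        int test(int cond, int *p) {
          int l = *p;
          if (!cond)
            *p = l; // redundant: stores the value just loaded from *p
          return 0;
        }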
      
      Differential Revision: http://reviews.llvm.org/D11854
      
      llvm-svn: 244901
    • [InstCombinePHI] Partial simplification of identity operations. · 6153698f
      Charlie Turner authored
      Consider this code:
      
      BB:
        %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
        %add = add nsw i32 %i, %b
        ...
      
      In this common case the add can be moved to the %if.else basic block, because
      adding zero is an identity operation. If we go through the %if.then branch it's
      always a win, because add is not executed; if not, the number of instructions
      stays the same.
      
      This pattern also applies to other instructions: sub, shl, shr, and ashr
      with an identity operand of 0, and mul, sdiv, and div with an identity
      operand of 1.
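
      In source-level terms (illustrative), the transformation turns the first
      form into the second, so the add executes only on the %if.else path:

        int before(int cond, int b, int c) {
          int i = cond ? 0 : c; // the phi
          return i + b;         // add executed on both paths
        }

        int after(int cond, int b, int c) {
          return cond ? b : (c + b); // add sunk into the else arm
        }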
      
      Patch by Jakub Kuderski!
      
      llvm-svn: 244887
    • [InstCombine] SSE/AVX vector shifts demanded shift amount bits · becd5e8a
      Simon Pilgrim authored
      Most SSE/AVX (non-constant) vector shift instructions only use the lower
      64 bits of the 128-bit shift amount vector operand; this patch calls
      SimplifyDemandedVectorElts to take advantage of this.
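
      The hardware behavior being exploited is easy to see with intrinsics
      (illustrative demo; requires SSE2 on x86): pslld reads only the low
      64 bits of its count vector, so garbage in the upper half is irrelevant.

        #include <emmintrin.h>
        #include <cstdio>

        int main() {
          __m128i v = _mm_set1_epi32(1);
          __m128i lo = _mm_set_epi64x(0, 3);  // upper 64 bits zero
          __m128i hi = _mm_set_epi64x(-1, 3); // upper 64 bits garbage
          __m128i r1 = _mm_sll_epi32(v, lo);
          __m128i r2 = _mm_sll_epi32(v, hi);
          // Both print 8: each lane is 1 << 3 either way.
          printf("%d %d\n", _mm_cvtsi128_si32(r1), _mm_cvtsi128_si32(r2));
          return 0;
        }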
      
      I had to refactor some of my recent InstCombiner work on the vector
      shifts to avoid quite a bit of duplicate code; this means that
      SimplifyX86immshift now (re)decodes the type of shift.
      
      Differential Revision: http://reviews.llvm.org/D11938
      
      llvm-svn: 244872
    • [RewriteStatepointsForGC] Avoid using unrelocated pointers after safepoints · 971dc3a8
      Philip Reames authored
      To be clear: this is an *optimization* not a correctness change.
      
      CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuse many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.
      
      One simple fix would have been to just adjust PlaceSafepoints to move the poll back one instruction in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post-insertion fixup seems to be more robust.
      
      I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're potentially extending the live range of its inputs.
      
      Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.
      
      Differential Revision: http://reviews.llvm.org/D11819
      
      llvm-svn: 244821
  4. Aug 12, 2015
    • [RewriteStatepointsForGC] Handle extractelement fully in the base pointer algorithm · 9ac4e38a
      Philip Reames authored
      When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)
      
      This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed.  Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement.  That change will follow in the near future.  
      
      llvm-svn: 244808
    • [InstCombine] Move SSE/AVX vector blend folding to instcombiner · 8c049d5c
      Simon Pilgrim authored
      As discussed in D11886, this patch moves the SSE/AVX vector blend folding
      from PerformINTRINSIC_WO_CHAINCombine to the instcombiner (which allows
      us to remove the former completely).

      InstCombiner already had partial support for this; I just had to add
      support for zero (ConstantAggregateZero) masks and also the case where
      both selection inputs were the same (allowing us to ignore the mask).

      I also moved all the relevant combine tests into
      InstCombine/blend_x86.ll.
      
      Differential Revision: http://reviews.llvm.org/D11934
      
      llvm-svn: 244723
    • [LoopDist] Add test for missing coverage · e2f6d34d
      Adam Nemet authored
      Add a testcase to ensure that if we can't find bounds for a necessary
      memcheck we don't distribute.
      
      llvm-svn: 244703
  5. Aug 11, 2015
    • Fix PR24354. · 827529e7
      Sanjoy Das authored
      `InstCombiner::OptimizeOverflowCheck` was asserting an
      invariant (operands to binary operations are ordered by decreasing
      complexity) that wasn't really an invariant.  Fix this by instead having
      `InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
      not hold.
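
      In generic terms (a sketch with a hypothetical complexity ranking, not
      the InstCombine code itself), "establish rather than assert" means
      reordering the operands on entry:

        #include <utility>

        // Rather than assert(complexity(L) >= complexity(R)), restore the
        // expected order whenever it doesn't already hold.
        template <typename T, typename RankFn>
        void canonicalizeOperands(T &L, T &R, RankFn complexity) {
          if (complexity(L) < complexity(R))
            std::swap(L, R); // the simpler operand (e.g. a constant) goes right
        }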
      
      llvm-svn: 244676
    • [LowerSwitch] Fix a bug when LowerSwitch deletes the default block · 10f01bd4
      Chen Li authored
      Summary: LowerSwitch crashed with the attached test case after deleting
      the default block. This happened because the current implementation of
      deleting dead blocks is wrong. After the default block is deleted, it
      contains no instructions or terminator, and it should not be traversed
      anymore. However, since the iterator is advanced before the
      processSwitchInst() function is executed, the block advanced to could be
      deleted inside processSwitchInst(). The deleted block would then be
      visited next, and dyn_cast<SwitchInst>(Cur->getTerminator()) would crash
      because Cur->getTerminator() returns a nullptr. This patch fixes the
      problem by recording dead default blocks in a list and deleting them
      after all calls to processSwitchInst() have finished. It is still
      possible to visit dead default blocks and waste time processing them,
      but that is a compile-time issue, and I plan to have another patch to
      add support for skipping dead blocks.
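
      The shape of the fix, in generic terms (illustrative, not the pass's
      actual code): defer erasure until the traversal is over, so the advanced
      iterator can never land on an element that has already been deleted.

        #include <list>
        #include <vector>

        void processBlocks(std::list<int> &Blocks) {
          std::vector<std::list<int>::iterator> DeadBlocks;
          for (auto It = Blocks.begin(); It != Blocks.end(); ++It) {
            if (*It < 0)                // stand-in for "this block went dead"
              DeadBlocks.push_back(It); // record it; don't erase mid-walk
          }
          for (auto It : DeadBlocks)
            Blocks.erase(It); // safe: the traversal above has finished
        }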
      
      Reviewers: kariddi, resistor, hans, reames
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D11852
      
      llvm-svn: 244642
    • fix typos; NFC · cdd5ec47
      Sanjay Patel authored
      llvm-svn: 244619
    • fix minsize detection: minsize attribute implies optimizing for size · fec7965b
      Sanjay Patel authored
      llvm-svn: 244617
    • Fix InstCombine test: invalid CHECK line slipped in r231270 · b10555cc
      Mehdi Amini authored
      I incorrectly wrote CHECK-NEXT without following it with ':', so the
      check was ignored by FileCheck.
      The non-inbound GEP is folded here because the DataLayout is no longer
      optional; the fold was originally guarded with a comment that said:
          We need TD information to know the pointer size unless this is inbounds.
      Now we always have "TD information" and perform the fold.
      
      Thanks Jonathan Roelofs for noticing.
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 244613
    • remove unnecessary settings/attributes from test case · b5c0c587
      Sanjay Patel authored
      llvm-svn: 244612
    • Add support for floating-point minnum and maxnum · 134bec27
      James Molloy authored
      The select pattern recognition in ValueTracking (as used by InstCombine
      and SelectionDAGBuilder) only knew about integer patterns. This teaches
      it about minimum and maximum operations.
      
      matchSelectPattern() has been extended to return a struct containing the
      existing Flavor and a new enum defining the pattern's behavior when
      given one NaN operand.
      
      C minnum() is defined to return the non-NaN operand in this case, but
      the idiomatic C "a < b ? a : b" would return the NaN operand.
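
      The difference is easy to see in a few lines (illustrative):

        #include <cmath>
        #include <cstdio>

        int main() {
          double a = 1.0, b = std::nan("");
          double m1 = std::fmin(a, b); // 1.0: fmin returns the non-NaN operand
          double m2 = a < b ? a : b;   // NaN: a < NaN is false, so b is chosen
          printf("%f %f\n", m1, m2);
          return 0;
        }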
      
      ARM and AArch64 at least have different instructions for these different cases.
      
      llvm-svn: 244580
    • Print vectorization analysis when loop hint is specified. · c94d6ad2
      Tyler Nowicki authored
      This patch and a related clang patch solve the problem of having to
      explicitly enable analysis when specifying a loop hint pragma to get the
      diagnostics. Passing AlwaysPrint as the pass name causes the front-end to
      print the diagnostic if the user has specified '-Rpass-analysis' without
      an '=<target-pass>'. Users of loop hints can pass that compiler option
      without having to specify the pass and they will get diagnostics for only
      those loops with loop hints.
      
      llvm-svn: 244555
    • Address post-commit review from r243378. · 7742b8ba
      Sanjoy Das authored
      This checks that bork_directive occurs exactly twice in the test output.
      
      llvm-svn: 244543
    • Extend late diagnostics to include late test for runtime pointer checks. · 652b0dab
      Tyler Nowicki authored
      This patch moves checking the threshold of runtime pointer checks to the
      vectorization requirements (late diagnostics) and emits a diagnostic that
      informs the user the loop would be vectorized if not for exceeding the
      pointer-check threshold. Clang will also append the options that can be
      used to allow vectorization.
      
      llvm-svn: 244523