  1. Dec 28, 2017
    • Remove superfluous copies in sample profiling. · 24cb28bb
      Benjamin Kramer authored
      No functionality change intended.
      
      llvm-svn: 321530
    • Revert r321377, it causes regression to https://reviews.llvm.org/P8055. · 29697c13
      Guozhi Wei authored
      llvm-svn: 321528
    • Avoid int to string conversion in Twine or raw_ostream contexts. · 3a13ed60
      Benjamin Kramer authored
      Some output changes from uppercase hex to lowercase hex, no other functionality change intended.
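      For illustration, the kind of change involved (a sketch, not the actual diff):
      
        // Before: builds a temporary std::string just to stream a number.
        OS << std::to_string(Width);
      
        // After: raw_ostream formats integers directly.
        OS << Width;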
      
      llvm-svn: 321526
    • [RewriteStatepoints] Fix incorrect assertion · a13e163a
      Max Kazantsev authored
      `RewriteStatepointsForGC` iterates over function blocks and their predecessors
      in order of declaration. One outcome of this is that call sites are placed in
      arbitrary order, which has nothing to do with traversal order.
      
      On the other hand, function `recomputeLiveInValues` asserts that bases are
      added to `Info.PointerToBase` before their derived pointers are updated. But
      if call sites are processed in an order different from RPOT, this is not
      necessarily true. We cannot guarantee that the base was placed there before
      every pointer derived from it. All we can guarantee is that this base was
      marked as a known base by this point.
      
      This patch replaces the assertion that the base was added to the map with an
      assertion that the base was marked as a known base.
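      A sketch of the shape of the change (identifier names are illustrative, not the exact code):
      
        // Before (sketch): assumed the base had already been recorded for this
        // call site, which RPOT-independent processing cannot guarantee.
        assert(Info.PointerToBase.count(Base) && "base not in the map");
      
        // After (sketch): only require that the value was marked as a known base.
        assert(isKnownBase(Base) && "must be a known base by this point");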
      
      Differential Revision: https://reviews.llvm.org/D41593
      
      llvm-svn: 321517
    • [InstCombine] Check for isa<Instruction> before using cast<> · 472689a1
      Simon Pilgrim authored
      Protects against casts from ConstantExpr etc.
      
      Reduced from the oss-fuzz #4788 test case.
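      The guarded pattern, roughly (a sketch rather than the exact diff):
      
        // cast<Instruction>(V) asserts if V is, e.g., a ConstantExpr.
        // Checking first avoids the crash:
        if (auto *I = dyn_cast<Instruction>(V)) {
          // ... safe to use I as an Instruction here ...
        }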
      
      llvm-svn: 321515
    • Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks" · 6d31001c
      Reid Kleckner authored
      This reverts r321138. It seems there are still underlying issues with
      memdep. PR35519 seems to still be present if debug info is enabled. We
      end up losing a memcpy. Somehow, during store-to-memset merging, we
      insert the memset after the memcpy, or fail to update the memdep analysis
      to account for the newly inserted memset of a pair.
      
      Reduced test case:
      
        #include <assert.h>
        #include <stdio.h>
        #include <string>
        #include <utility>
        #include <vector>
      
        void do_push_back(
            std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
          crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
        }
      
        int __attribute__((optnone)) main() {
          // Put some data in the vector and then remove it so we take the push_back
          // fast path.
          std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
          crl_set.push_back({"asdf", {}});
          crl_set.pop_back();
          printf("first word in vector storage: %p\n", *(void**)crl_set.data());
      
          // Do the push_back which may fail to initialize the data.
          do_push_back(&crl_set);
          auto* first = &crl_set.back().first;
          printf("first word in vector storage (should be zero): %p\n",
                 *(void**)crl_set.data());
          assert(first->empty());
          puts("ok");
        }
      
      Compile with libc++, enable optimizations, and enable debug info:
      $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib
      
      This program will assert with this change.
      
      llvm-svn: 321510
  2. Dec 23, 2017
    • [CallSiteSplitting] Remove isOrHeader restriction. · 7e932890
      Florian Hahn authored
      By following the single predecessors of the predecessors of the call
      site, we do not need to restrict the control flow.
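      Roughly, the predecessor walk looks like this (a sketch; isCandidate is a hypothetical placeholder for the pass's real checks):
      
        // Walk up through blocks that each have exactly one predecessor,
        // instead of requiring a particular CFG shape at the call site.
        BasicBlock *Pred = CallSiteBB->getSinglePredecessor();
        while (Pred && !isCandidate(Pred))
          Pred = Pred->getSinglePredecessor();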
      
      Reviewed By: junbuml, davide
      
      Differential Revision: https://reviews.llvm.org/D40729
      
      llvm-svn: 321413
    • [SCCP] Manually fold branches on undef. · 55b66343
      Davide Italiano authored
      This code was originally removed and replaced with an assertion
      because it was believed unnecessary. It turns out there was simply
      no test coverage for this case, and the constant folder doesn't
      yet know about patterns like `br undef %label1, %label2`.
      Presumably at some point the constant folder might learn about
      these patterns, but that's a broader change.
      A testcase will be added to make sure this doesn't regress again
      in the future.
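      The manual fold amounts to something like this (a sketch; the real code lives in SCCP):
      
        if (auto *BI = dyn_cast<BranchInst>(BB->getTerminator()))
          if (BI->isConditional() && isa<UndefValue>(BI->getCondition())) {
            // Branching on undef may go either way; folding to the first
            // successor is a valid choice.
            BranchInst::Create(BI->getSuccessor(0), BI);
            BI->eraseFromParent();
          }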
      
      Fixes PR35723.
      
      llvm-svn: 321402
  3. Dec 22, 2017
    • [SimplifyCFG] Don't do if-conversion if there is a long dependence chain · 33250340
      Guozhi Wei authored
      If, after if-conversion, most of the instructions in the new block form a long and slow dependence chain, the result may be slower than cmp/branch, even if the branch has a high miss rate. This is because if-conversion transforms control dependence into data dependence: control dependence can be speculated, so with a branch the second part can execute in parallel with the first part on a modern OOO processor, whereas a data dependence chain cannot.
      
      This patch checks for such a long dependence chain and gives up on if-conversion if it finds one.
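      A sketch of how such a chain can be measured (the threshold name is hypothetical):
      
        // Longest def-use chain within the block, computed in one pass.
        DenseMap<Instruction *, unsigned> Depth;
        unsigned LongestChain = 0;
        for (Instruction &I : *BB) {
          unsigned D = 0;
          for (Value *Op : I.operands())
            if (auto *OpI = dyn_cast<Instruction>(Op))
              if (OpI->getParent() == BB)
                D = std::max(D, Depth[OpI]);
          Depth[&I] = D + 1;
          LongestChain = std::max(LongestChain, D + 1);
        }
        if (LongestChain > LongDependenceThreshold) // hypothetical threshold
          return false; // give up on if-conversion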
      
      Differential Revision: https://reviews.llvm.org/D39352
      
      llvm-svn: 321377
    • Add hasProfileData() to check if a function has profile data. NFC. · a17f2205
      Easwaran Raman authored
      Summary:
      This replaces calls to getEntryCount().hasValue() with hasProfileData
      that does the same thing. This refactoring is useful to do before adding
      synthetic function entry counts but also a useful cleanup IMO even
      otherwise. I have used hasProfileData instead of hasRealProfileData as
      David had earlier suggested since I think profile implies "real" and I
      use the phrase "synthetic entry count" and not "synthetic profile count"
      but I am fine calling it hasRealProfileData if you prefer.
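      The mechanical change at each call site:
      
        // Before:
        bool HasProfile = F.getEntryCount().hasValue();
        // After:
        bool HasProfile = F.hasProfileData();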
      
      Reviewers: davidxl, silvas
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41461
      
      llvm-svn: 321331
  4. Dec 21, 2017
    • [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking. · ad371e0c
      Michael Zolotukhin authored
      If a block has N predecessors, then the current algorithm will try to
      sink common code to this block N times (whenever we visit a
      predecessor). Every attempt to sink the common code includes going
      through all predecessors, so the complexity of the algorithm becomes
      O(N^2).
      With this patch we try to sink common code only when we visit the block
      itself. With this, the complexity goes down to O(N).
      As a side effect, the moment at which the code is sunk is slightly different from
      before (the order of simplifications has been changed), which is why I had
      to adjust two tests (note that neither of the tests is supposed to test
      SimplifyCFG; a sketch of the restructuring follows the list):
      * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
      the changes that previous implementation of SimplifyCFG would do.
      * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
      code sinking by a command line flag.
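      A sketch of the restructuring described above (function names are illustrative):
      
        bool simplifyBlock(BasicBlock *BB) {
          // New: attempt sinking only here, when visiting BB itself...
          bool Changed = sinkCommonCodeFromPredecessors(BB);
          // ...rather than once per visit of each of BB's N predecessors,
          // where every attempt rescanned all N predecessors -> O(N^2).
          return Changed;
        }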
      
      llvm-svn: 321236
  5. Dec 20, 2017
    • [ICP] Expose unconditional call promotion interface · cb35c5d5
      Matthew Simpson authored
      This patch modifies the indirect call promotion utilities by exposing and using
      an unconditional call promotion interface. The unconditional promotion
      interface (i.e., call promotion without creating an if-then-else) can be used
      if it's known that an indirect call has only one possible callee. The existing
      conditional promotion interface uses this unconditional interface to promote an
      indirect call after it has been versioned and placed within the "then" block.
      
      A consequence of unconditional promotion is that the fix-up operations for phi
      nodes in the normal destination of invoke instructions are changed. This is
      necessary because the existing implementation assumed that an invoke had been
      versioned, creating a "merge" block where a return value bitcast could be
      placed. In the new implementation, the edge between a promoted invoke's parent
      block and its normal destination is split if needed to add a bitcast for the
      return value. If the invoke is also versioned, the phi node merging the return
      value of the promoted and original invoke instructions is placed in the "merge"
      block.
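      In outline, the two interfaces compose like this (a sketch; see CallPromotionUtils for the actual signatures):
      
        // Unconditional: the indirect call is known to target DirectCallee,
        // so rewrite it in place (no if-then-else is created).
        promoteCall(CS, DirectCallee);
      
        // Conditional: version the call site, then promote the clone that
        // was placed in the "then" block using the unconditional interface.
        Instruction *ThenCall = versionCallSite(CS, DirectCallee, BranchWeights);
        promoteCall(CallSite(ThenCall), DirectCallee);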
      
      Differential Revision: https://reviews.llvm.org/D40751
      
      llvm-svn: 321210
    • [hwasan] Implement -fsanitize-recover=hwaddress. · 3fd1b1a7
      Evgeniy Stepanov authored
      Summary: Very similar to AddressSanitizer, with the exception of the error type encoding.
      
      Reviewers: kcc, alekseyshl
      
      Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya
      
      Differential Revision: https://reviews.llvm.org/D41417
      
      llvm-svn: 321203
    • [InstCombine] Add debug location to new caller. · 012c8f97
      Florian Hahn authored
      Reviewers: rnk, aprantl, majnemer
      
      Reviewed By: aprantl
      
      Differential Revision: https://reviews.llvm.org/D414
      
      llvm-svn: 321191
    • Revert r320548:[SLP] Vectorize jumbled memory loads · 3a934d6a
      Mohammad Shahid authored
      llvm-svn: 321181
    • [LV] Remove unnecessary DoExtraAnalysis guard (silent bug) · 467abe3e
      Florian Hahn authored
      canVectorize only checks whether the loop has a normalized pre-header when DoExtraAnalysis is true.
      This doesn't make sense to me because reporting analysis information shouldn't alter legality
      checks. This is probably the result of a last-minute minor change before committing (?).
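      The buggy shape, roughly (a sketch):
      
        // Before: the legality check itself was guarded by the reporting flag.
        if (DoExtraAnalysis)
          if (!TheLoop->getLoopPreheader())
            return false; // never reached when DoExtraAnalysis is false
      
        // After: the check always runs; the flag only decides whether to keep
        // analyzing to collect more remarks instead of returning early.
        if (!TheLoop->getLoopPreheader()) {
          if (!DoExtraAnalysis)
            return false;
          Result = false;
        }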
      
      Patch by Diego Caballero.
      
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D40973
      
      llvm-svn: 321172
    • [memcpyopt] Teach memcpyopt to optimize across basic blocks · aa392281
      Dan Gohman authored
      This teaches memcpyopt to make a non-local memdep query when a local query
      indicates that the dependency is non-local. This notably allows it to
      eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
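      A sketch of the added fallback, using MemoryDependenceAnalysis names (details elided):
      
        MemDepResult Dep = MD->getDependency(M);
        if (Dep.isNonLocal()) {
          // New: instead of giving up, ask for the non-local dependencies and
          // check whether they all resolve to one producing store/memcpy.
          SmallVector<NonLocalDepResult, 4> Deps;
          MD->getNonLocalPointerDependency(M, Deps);
          // ... proceed only if Deps identifies a single suitable def ...
        }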
      
      This is r319482 and r319483, along with fixes for PR35519: fix the 
      optimization that merges stores into memsets to preserve cached memdep
      info, and fix memdep's non-local caching strategy to not assume that larger
      queries are always more conservative than smaller ones.
      
      Fixes PR28958 and PR35519.
      
      Differential Revision: https://reviews.llvm.org/D40802
      
      llvm-svn: 321138
  6. Dec 16, 2017
    • [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed. · 68d7f9da
      Sean Fertile authored
      If the loop operand type is i8 then there will be no residual loop for the
      unknown-size expansion. Don't create the residual-size and bytes-copied values
      when they are not needed.
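      The reasoning: with an i8 loop operand the loop copies one byte per iteration, so size % 1 == 0 and a residual loop can never execute. A sketch of the guard (names are illustrative):
      
        if (LoopOpSize != 1) {
          // Only materialize these when a remainder is actually possible.
          Value *Residual    = B.CreateURem(CopyLen, ConstantInt::get(LenTy, LoopOpSize));
          Value *BytesCopied = B.CreateSub(CopyLen, Residual);
          // ... emit the residual byte-copy loop ...
        }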
      
      llvm-svn: 320929
    • [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel · 5a0cdac1
      Sanjay Patel authored
      We want to do this for 2 reasons:
      1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
      2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
      
      More detail about what happens in the backend:
      1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
         into the shift variant. That is the opposite of this IR canonicalization.
      2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
         into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
      3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
         into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
         when that's legal/custom.
      4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
         variety of ways.
         a. For #2, the vector path is missing the case for setlt with a '1' constant.
         b. For #3, we are missing a match for commuted versions of the shift variants.
      5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
         produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
         shift sequence when not.
      6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
         produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
         decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
      
      define i32 @abs_shifty(i32 %x) {
        %signbit = ashr i32 %x, 31 
        %add = add i32 %signbit, %x  
        %abs = xor i32 %signbit, %add 
        ret i32 %abs
      }
      
      define i32 @abs_cmpsubsel(i32 %x) {
        %cmp = icmp slt i32 %x, zeroinitializer
        %sub = sub i32 zeroinitializer, %x
        %abs = select i1 %cmp, i32 %sub, i32 %x
        ret i32 %abs
      }
      
      define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
        %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
        %add = add <4 x i32> %signbit, %x  
        %abs = xor <4 x i32> %signbit, %add 
        ret <4 x i32> %abs
      }
      
      define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
        %cmp = icmp slt <4 x i32> %x, zeroinitializer
        %sub = sub <4 x i32> zeroinitializer, %x
        %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
        ret <4 x i32> %abs
      }
      
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
      > abs_shifty:
      > 	movl	%edi, %eax
      > 	negl	%eax
      > 	cmovll	%edi, %eax
      > 	retq
      > 
      > abs_cmpsubsel:
      > 	movl	%edi, %eax
      > 	negl	%eax
      > 	cmovll	%edi, %eax
      > 	retq
      > 
      > abs_shifty_vec:
      > 	vpabsd	%xmm0, %xmm0
      > 	retq
      > 
      > abs_cmpsubsel_vec:
      > 	vpabsd	%xmm0, %xmm0
      > 	retq
      > 
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
      > abs_shifty:
      > 	cmp	w0, #0                  // =0
      > 	cneg	w0, w0, mi
      > 	ret
      > 
      > abs_cmpsubsel: 
      > 	cmp	w0, #0                  // =0
      > 	cneg	w0, w0, mi
      > 	ret
      >                                        
      > abs_shifty_vec: 
      > 	abs	v0.4s, v0.4s
      > 	ret
      > 
      > abs_cmpsubsel_vec: 
      > 	abs	v0.4s, v0.4s
      > 	ret
      > 
      > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
      > abs_shifty:  
      > 	srawi 4, 3, 31
      > 	add 3, 3, 4
      > 	xor 3, 3, 4
      > 	blr
      > 
      > abs_cmpsubsel:
      > 	srawi 4, 3, 31
      > 	add 3, 3, 4
      > 	xor 3, 3, 4
      > 	blr
      > 
      > abs_shifty_vec:   
      > 	vspltisw 3, -16
      > 	vspltisw 4, 15
      > 	vsubuwm 3, 4, 3
      > 	vsraw 3, 2, 3
      > 	vadduwm 2, 2, 3
      > 	xxlxor 34, 34, 35
      > 	blr
      > 
      > abs_cmpsubsel_vec: 
      > 	vspltisw 3, -16
      > 	vspltisw 4, 15
      > 	vsubuwm 3, 4, 3
      > 	vsraw 3, 2, 3
      > 	vadduwm 2, 2, 3
      > 	xxlxor 34, 34, 35
      > 	blr
      >
      
      Differential Revision: https://reviews.llvm.org/D40984
      
      llvm-svn: 320921
    • [LV] Extend InstWidening with CM_Widen_Reverse · 5444f409
      Hal Finkel authored
      Changes to the original scalar loop during LV code gen cause the return value
      of Legal->isConsecutivePtr() to be inconsistent with the return value during
      legal/cost phases (further analysis and information of the bug is in D39346).
      This patch is an alternative fix to PR34965 following the CM_Widen approach
      proposed by Ayal and Gil in D39346. It extends InstWidening enum with
      CM_Widen_Reverse to properly record the widening decision for consecutive
      reverse memory accesses and, consequently, get rid of the
      Legal->isConsecutivePtr() call in LV code gen. I think this is a simpler/cleaner
      solution to PR34965 than the one in D39346.
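      The extended enum looks roughly like this (a sketch of the shape, not verbatim source):
      
        enum InstWidening {
          CM_Unknown,
          CM_Widen,         // consecutive, forward access
          CM_Widen_Reverse, // consecutive, reverse access (new in this patch)
          CM_Interleave,
          CM_GatherScatter,
          CM_Scalarize
        };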
      
      Fixes PR34965.
      
      Patch by Diego Caballero, thanks!
      
      Differential Revision: https://reviews.llvm.org/D40742
      
      llvm-svn: 320913
    • [SimplifyLibCalls] Inline calls to cabs when it's safe to do so · 2ff24731
      Hal Finkel authored
      When unsafe algebra is allowed, calls to cabs(r) can be replaced by:
      
        sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
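      In source terms the expansion corresponds to (a sketch; the pass itself operates on IR):
      
        #include <cmath>
        #include <complex>
      
        // Legal under fast-math: skips the overflow/underflow-safe scaling
        // a library cabs() implementation may perform.
        double cabs_expanded(std::complex<double> z) {
          return std::sqrt(z.real() * z.real() + z.imag() * z.imag());
        }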
      
      Patch by Paul Walker, thanks!
      
      Differential Revision: https://reviews.llvm.org/D40069
      
      llvm-svn: 320901