  1. Jan 28, 2017
    • [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way. · 3121334d
      Mohammad Shahid authored
      The jumbled scalar loads will be sorted while building the tree, and these
      accesses will be marked so that a shufflevector with the proper mask is
      generated after the vectorized load.
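      For illustration only (the names here are invented, not taken from the patch), the lane-reordering step might look roughly like this in C++:
      ```
      #include "llvm/ADT/ArrayRef.h"
      #include "llvm/ADT/SmallVector.h"
      #include "llvm/IR/IRBuilder.h"
      using namespace llvm;

      // Hypothetical helper: once the sorted, consecutive addresses have been
      // combined into one wide load, emit a shufflevector that restores the
      // order in which the scalar code used the elements. LaneAfterSorting[i]
      // is the lane of WideLoad that holds scalar i.
      static Value *restoreJumbledOrder(IRBuilder<> &Builder, Value *WideLoad,
                                        ArrayRef<unsigned> LaneAfterSorting) {
        SmallVector<int, 8> Mask;
        for (unsigned Lane : LaneAfterSorting)
          Mask.push_back(Lane);
        return Builder.CreateShuffleVector(WideLoad, Mask, "jumbled.reorder");
      }
      ```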
      
      Reviewers: hfinkel, mssimpso, mkuper
      
      Differential Revision: https://reviews.llvm.org/D26905
      
      Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
      llvm-svn: 293386
    • [InstCombine] Merge DebugLoc when speculatively hoisting store instruction · 505a25ae
      Taewook Oh authored
      Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. I am not sure whether just dropping the debug locations would make more sense for this case, but since the branch instruction will have at least a different discriminator from the hoisted store instruction, I think there will be no difference in practice.
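      As a rough illustration (assuming the in-tree DILocation::getMergedLocation helper; this is a sketch, not the patch itself), the merge amounts to:
      ```
      #include "llvm/IR/DebugInfoMetadata.h"
      #include "llvm/IR/Instruction.h"
      using namespace llvm;

      // Hypothetical sketch: give the hoisted store a location merged from
      // both original stores instead of keeping only one store's location.
      static void mergeHoistedStoreLoc(Instruction *Hoisted,
                                       const Instruction *S0,
                                       const Instruction *S1) {
        Hoisted->setDebugLoc(
            DILocation::getMergedLocation(S0->getDebugLoc(), S1->getDebugLoc()));
      }
      ```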
      
      Reviewers: aprantl, andreadb, danielcdh
      
      Reviewed By: aprantl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29062
      
      llvm-svn: 293372
    • Use print() instead of dump() in code · 194ded55
      Matthias Braun authored
      llvm-svn: 293371
    • MemorySSA: Allow movement to arbitrary places · ee6e3a59
      Daniel Berlin authored
      Summary: Extend the MemorySSAUpdater API to allow movement to arbitrary places
      
      Reviewers: davide, george.burgess.iv
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29239
      
      llvm-svn: 293363
    • Cleanup dump() functions. · 8c209aa8
      Matthias Braun authored
      We had various ways of defining dump() functions in LLVM. Normalize
      them (this should just consistently implement the approach discussed in
      http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html).
      
      For reference:
      - Public headers should just declare the dump() method but not use
        LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
      - The definition of a dump method should look like this:
        #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
        LLVM_DUMP_METHOD void MyClass::dump() {
          // print stuff to dbgs()...
        }
        #endif
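      For a hypothetical class, the two halves of that convention would look like this (illustrative only):
      ```
      // MyClass.h - the header only declares dump(); no macros or #ifdefs.
      class MyClass {
      public:
        void dump();
      };

      // MyClass.cpp
      #include "llvm/Support/Compiler.h" // LLVM_DUMP_METHOD
      #include "llvm/Support/Debug.h"    // llvm::dbgs()

      #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
      LLVM_DUMP_METHOD void MyClass::dump() {
        llvm::dbgs() << "MyClass state\n";
      }
      #endif
      ```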
      
      llvm-svn: 293359
    • MemorySSA: Move updater to its own file · ae6b8b69
      Daniel Berlin authored
      llvm-svn: 293357
    • Introduce a basic MemorySSA updater that supports insertDef, insertUse, moveBefore and moveAfter operations. · 60ead05f
      Daniel Berlin authored
      
      Summary:
      This creates a basic MemorySSA updater that handles arbitrary
      insertion of uses and defs into MemorySSA, as well as arbitrary
      movement around the CFG. It replaces the current splice API.
      
      It can be made to handle arbitrary control flow changes.
      Currently, it uses the same updater algorithm from D28934.
      
      The main difference is that, because MemorySSA is single-variable, we have
      the complete def and use list and don't need anyone to give it to us as
      part of the API.  We also have to rename stores below us in some
      cases.
      
      If we go that direction in that patch, I will merge all the updater
      implementations (using an updater_traits or something to provide the
      get* functions we use, called read*/write* in that patch).
      
      Sadly, the current SSAUpdater algorithm is way too slow to use for
      what we are doing here.
      
      I have updated the tests we have to basically build memoryssa
      incrementally using the updater api, and make sure it still comes out
      the same.
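      As a rough sketch of the intended usage (written against today's MemorySSAUpdater headers and interface; details may differ from this patch), moving a store while keeping MemorySSA consistent looks like:
      ```
      #include "llvm/Analysis/MemorySSA.h"
      #include "llvm/Analysis/MemorySSAUpdater.h"
      #include "llvm/IR/Instruction.h"
      using namespace llvm;

      // Hypothetical sketch: move a memory-accessing instruction before
      // another one and mirror that move in MemorySSA through the updater.
      // Both instructions are assumed to already have MemorySSA accesses.
      static void moveAndUpdate(MemorySSA &MSSA, Instruction *Store,
                                Instruction *InsertPt) {
        MemorySSAUpdater Updater(&MSSA);
        Store->moveBefore(InsertPt);
        MemoryUseOrDef *What = MSSA.getMemoryAccess(Store);
        MemoryUseOrDef *Where = MSSA.getMemoryAccess(InsertPt);
        Updater.moveBefore(What, Where);
      }
      ```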
      
      Reviewers: george.burgess.iv
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29047
      
      llvm-svn: 293356
    • [RegisterCoalescing] Recommit the patch "Remove partial redundent copy". · 35109902
      Quentin Colombet authored
      The recommit in r292621 fixed a bug related to the live interval update
      after the partially redundant copy is moved.
      
      This recommit solves an additional bug related to the lack of update of
      subranges.
      
      The original patch solves the performance problem described in
      PR27827. Register coalescing sometimes cannot remove a copy because of
      interference. But if we can find a reverse copy in one of the predecessor
      blocks of the copy, the copy is partially redundant and we may remove the
      copy partially by moving it to the predecessor block without the
      reverse copy.
      
      Differential Revision: https://reviews.llvm.org/D28585
      
      Re-apply r292621
      
      Revert "Revert rL292621. Caused some internal build bot failures in apple."
      
      This reverts commit r292984.
      
      Original patch: Wei Mi <wmi@google.com>
      Subrange fix: Mostly Matthias Braun <matze@braunis.de>
      
      llvm-svn: 293353
    • [InstCombine] move icmp transforms that might be recognized as min/max and inf-loop (PR31751) · febcb9ce
      Sanjay Patel authored
      This is a minimal patch to avoid the infinite loop in:
      https://llvm.org/bugs/show_bug.cgi?id=31751
      
      But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
      by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
      uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
      inline definitions which may be subtly different from any of the above.
      
      The test cases in this patch need a cast op to trigger because we don't
      (yet) canonicalize all min/max forms based on matchSelectPattern() in
      canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms
      based on matchSelectPattern() in visitSelectInst().
      
      The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
      I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.
      
      llvm-svn: 293345
  2. Jan 27, 2017
    • Global DCE performance improvement · 888dee44
      Mehdi Amini authored
      Change the original algorithm so that it scales better on very large
      bitcode where not every instruction involves a global.
      
      The target query is: "how do you get all the globals referenced by
      another global?"
      
      Before this patch, it answered this query by walking the body (or the
      initializer) and collecting the references. What this patch does instead
      is precompute the answer to this query for the whole module by
      walking the use-list of every global.
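      A rough sketch of the use-list-driven precomputation (illustrative only; the names are invented and the committed code is more complete):
      ```
      #include "llvm/ADT/DenseMap.h"
      #include "llvm/ADT/SmallPtrSet.h"
      #include "llvm/IR/Instruction.h"
      #include "llvm/IR/Module.h"
      using namespace llvm;

      // Hypothetical sketch: build "which globals does each global reference"
      // for the whole module by walking every global's use-list once, instead
      // of scanning the body or initializer of every user.
      static DenseMap<GlobalValue *, SmallPtrSet<GlobalValue *, 8>>
      computeDependencies(Module &M) {
        DenseMap<GlobalValue *, SmallPtrSet<GlobalValue *, 8>> Deps;
        for (GlobalValue &GV : M.global_values()) {
          for (User *U : GV.users()) {
            if (auto *I = dyn_cast<Instruction>(U))
              Deps[I->getFunction()].insert(&GV);
            else if (auto *UserGV = dyn_cast<GlobalValue>(U))
              Deps[UserGV].insert(&GV);
            // A full implementation would also have to look through constant
            // expressions and aggregate initializers that use GV.
          }
        }
        return Deps;
      }
      ```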
      
      Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
      
      Differential Revision: https://reviews.llvm.org/D28549
      
      llvm-svn: 293328
    • [PGO] add debug option to view raw count after prof use annotation · d289e454
      Xinliang David Li authored
      Differential Revision: https://reviews.llvm.org/D29045
      
      llvm-svn: 293325
    • NFC: Add debug tracing for more cases where loop unrolling fails. · e7d865e3
      Anna Thomas authored
      llvm-svn: 293313
    • [SLP] Refactoring of horizontal reduction analysis, NFC. · 4015bf83
      Alexey Bataev authored
      Some checks in the SLP horizontal reduction analysis function are performed
      several times, though it is enough to perform them only once, during the
      initial attempt at adding a candidate for the reduction
      instruction/reduced value.
      
      Differential Revision: https://reviews.llvm.org/D29175
      
      llvm-svn: 293274
    • [LICM] When we are recomputing the alias sets for a subloop, we cannot skip sub-subloops. · fd2d7c72
      Chandler Carruth authored
      
      The logic to skip subloops dated from when this code was shared with the
      cached case. Once it was factored out to only run in the case of
      recomputed subloops, it became a dangerous bug: if a sub-subloop contained
      an interfering instruction, it would be silently left out of the alias
      sets used by LICM.
      
      With the old pass manager this was extremely hard to trigger as it would
      require failing to visit these subloops with the LICM pass but then
      visiting the outer loop somehow. I've not yet contrived any test case
      that actually manages to trigger this.
      
      But with the new pass manager we don't do the cross-loop caching hack
      that the old PM does and so we recompute alias set information from
      first principles. While this seems much cleaner and simpler it exposed
      this bug and would subtly miscompile code due to failing to correctly
      model the aliasing constraints of deeply nested loops.
      
      llvm-svn: 293273
    • Fix unused variable warning. · 0b79aa33
      Richard Trieu authored
      llvm-svn: 293260
    • NewGVN: Add basic dead and redundant store elimination · c479686a
      Daniel Berlin authored
      Summary:
      This adds basic dead and redundant store elimination to
      NewGVN.  Unlike our current DSE, it will happily do cross-block DSE if
      it meets our requirements.
      
      We handle a bunch of DSE's simple.ll cases, plus some cases that DSE doesn't.
      Unlike DSE, however, we only try to eliminate stores of the same value
      to the same memory location, not just general stores to the same
      memory location.
      
      Reviewers: davide
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29149
      
      llvm-svn: 293258
    • [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC. · 25ebe2d7
      Justin Lebar authored
      llvm-svn: 293253
    • [NVPTX] Fix use-after-stack-free bug in InstCombineCalls. · e3ac0fb9
      Justin Lebar authored
      Introduced in r293244.
      
      llvm-svn: 293251
    • Constant fold switch inst when looking for trivial conditions to unswitch on. · e5f8d643
      Xin Tong authored
      Summary: Constant fold switch inst when looking for trivial conditions to unswitch on.
      
      Reviewers: sanjoy, chenli, hfinkel, efriedma
      
      Subscribers: llvm-commits, mzolotukhin
      
      Differential Revision: https://reviews.llvm.org/D29037
      
      llvm-svn: 293250
    • [PM] Port LoopLoadElimination to the new pass manager and wire it into the main pipeline. · baabda93
      Chandler Carruth authored
      
      This is a very straightforward port. Nothing weird or surprising.
      
      This brings the number of missing passes from the new PM's pipeline down
      to three.
      
      llvm-svn: 293249
    • [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. · 698c31b8
      Justin Lebar authored
      Summary:
      There are many NVVM intrinsics that we can't entirely get rid of, but
      that nonetheless often correspond to target-generic LLVM intrinsics.
      
      For example, if flush denormals to zero (ftz) is enabled, we can convert
      @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32.  On the other hand, if ftz is
      disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
      non-ftz PTX instruction.  In this case, we can, however, simplify the
      non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.
      
      These transformations are particularly useful because they let us
      constant fold instructions that appear in libdevice, the bitcode library
      that ships with CUDA and essentially functions as its libm.
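      Roughly, the shape of such a rewrite (illustrative only; the ftz check and the names here are assumptions, not the committed code) is:
      ```
      #include "llvm/IR/IRBuilder.h"
      #include "llvm/IR/Instructions.h"
      #include "llvm/IR/Intrinsics.h"
      using namespace llvm;

      // Hypothetical sketch: rewrite a call to an NVVM ceil intrinsic into the
      // target-generic @llvm.ceil, but only when the ftz-ness of the call
      // matches how the surrounding function will be compiled, so semantics
      // are preserved.
      static Value *upgradeNVVMCeil(CallInst &CI, bool FunctionUsesFtz,
                                    bool CallIsFtzVariant) {
        if (FunctionUsesFtz != CallIsFtzVariant)
          return nullptr; // Lowering would differ; keep the NVVM intrinsic.
        IRBuilder<> Builder(&CI);
        return Builder.CreateUnaryIntrinsic(Intrinsic::ceil, CI.getArgOperand(0));
      }
      ```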
      
      Reviewers: tra
      
      Subscribers: hfinkel, majnemer, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28794
      
      llvm-svn: 293244
    • [LangRef] Make @llvm.sqrt(x) return undef, rather than have UB, for negative x. · cb9b41dd
      Justin Lebar authored
      Summary:
      Some frontends emit a speculate-and-select idiom for sqrt, wherein they compute
      sqrt(x), check if x is negative, and select NaN if it is:
      
        %cmp = fcmp olt double %a, -0.000000e+00
        %sqrt = call double @llvm.sqrt.f64(double %a)
        %ret = select i1 %cmp, double 0x7FF8000000000000, double %sqrt
      
      This is technically UB as the LangRef is written today if %a is ever less than
      -0.  But emitting code that's compliant with the current definition of sqrt
      would require a branch, which would then prevent us from matching this idiom in
      SelectionDAG (which we do today -- ISD::FSQRT has defined behavior on negative
      inputs), because SelectionDAG looks at one BB at a time.
      
      Nothing in LLVM takes advantage of this undefined behavior, as far as we can
      tell, and the fact that llvm.sqrt has UB dates from its initial addition to the
      LangRef.
      
      Reviewers: arsenm, mehdi_amini, hfinkel
      
      Subscribers: wdng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28797
      
      llvm-svn: 293242
    • Revert a couple of InstCombine/Guard checkins · 7516192a
      Sanjoy Das authored
      This change reverts:
      
      r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
      r293058: "[InstCombine] Canonicalize guards for AND condition"
      
      They miscompile cases like:
      
      ```
      declare void @llvm.experimental.guard(i1, ...)
      
      define void @test_guard_not_or(i1 %A, i1 %B) {
        %C = or i1 %A, %B
        %D = xor i1 %C, true
        call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
        ret void
      }
      ```
      
      because they do not transfer the `i32 20, i32 30` parameters to the newly
      created guard instructions.
      
      llvm-svn: 293227
  3. Jan 26, 2017
    • NewGVN: Fix bug exposed by PR31761 · 1ea5f324
      Daniel Berlin authored
      Summary:
      This does not actually fix the testcase in PR31761 (discussion is
      ongoing on the testcase), but does fix a bug it exposes, where stores
      were not properly clobbering loads.
      
      We accomplish this by unifying the memory equivalence infrastructure
      back into the normal congruence infrastructure, and then properly
      destroying congruence classes when memory state leaders disappear.
      
      Reviewers: davide
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29195
      
      llvm-svn: 293216
    • [InstCombine] fold (X >>u C) << C --> X & (-1 << C) · 50753f02
      Sanjay Patel authored
      We already have this fold when the lshr has one use, but it doesn't need that
      restriction. We may be able to remove some code from foldShiftedShift().
      
      Also, move the similar:
      (X << C) >>u C --> X & (-1 >>u C)
      ...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().
      
      That whole function seems questionable since it is called by commonShiftTransforms(),
      but there's really not much in common if we're checking the shift opcodes for every
      fold.
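      For reference, a minimal sketch of the headline fold (illustrative only, not the committed code):
      ```
      #include "llvm/ADT/APInt.h"
      #include "llvm/IR/Constants.h"
      #include "llvm/IR/IRBuilder.h"
      #include "llvm/IR/PatternMatch.h"
      using namespace llvm;
      using namespace llvm::PatternMatch;

      // Hypothetical sketch: (X >>u C) << C only clears the low C bits of X,
      // so it can be replaced by a single mask, X & (-1 << C), no matter how
      // many uses the inner lshr has.
      static Value *foldLShrThenShl(BinaryOperator &Shl, IRBuilder<> &Builder) {
        Value *X;
        const APInt *C1, *C2;
        if (!match(&Shl, m_Shl(m_LShr(m_Value(X), m_APInt(C1)), m_APInt(C2))) ||
            *C1 != *C2 || C1->uge(C1->getBitWidth()))
          return nullptr;
        unsigned BitWidth = C1->getBitWidth();
        APInt Mask = APInt::getHighBitsSet(BitWidth,
                                           BitWidth - C1->getZExtValue());
        return Builder.CreateAnd(X, ConstantInt::get(X->getType(), Mask));
      }
      ```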
      
      llvm-svn: 293215
    • NewGVN: Add algorithm overview · db3c7be0
      Daniel Berlin authored
      llvm-svn: 293212