  1. Mar 08, 2016
  2. Mar 07, 2016
    • [LoopDataPrefetch] If prefetch distance is not set, skip pass · bb3680bd
      Adam Nemet authored
      This lets select sub-targets enable this pass.  The patch implements the
      idea from the recent llvm-dev thread:
      http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925
      
      The goal is to enable the LoopDataPrefetch pass only for the Cyclone
      sub-target within AArch64.
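
The gating idea can be sketched in a few lines of Python. This is a toy model, not the pass itself: the table, the function name, and the distance value of 100 are placeholders invented for illustration, and a distance of 0 is assumed to mean "not set".

```python
# Hypothetical per-sub-target prefetch distances; 0 means "not set".
# The value 100 is a placeholder, not a tuned number for any real core.
PREFETCH_DISTANCE = {"cyclone": 100, "generic": 0}


def run_loop_data_prefetch(subtarget, loops):
    """Return True if the (modelled) pass changed anything."""
    distance = PREFETCH_DISTANCE.get(subtarget, 0)
    if distance == 0:
        # Sub-target did not opt in: skip the pass entirely.
        return False
    for loop in loops:
        # Stand-in for inserting prefetches that far ahead of the access.
        loop["prefetch_distance"] = distance
    return bool(loops)
```

With this shape, a sub-target opts in simply by reporting a non-zero distance; no separate enable flag is needed.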
      
      Positive and negative tests will be included in an upcoming patch that
      enables selective prefetching of large-strided accesses on Cyclone.
      
      llvm-svn: 262844
    • Revert "Enable LoopLoadElimination by default" · 81113ef6
      Adam Nemet authored
      This reverts commit r262250.
      
      It causes SPEC2006/gcc to generate a wrong result (166.s) on AArch64 when
      running with the *ref* data set.  The error happens with
      "-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".
      
      llvm-svn: 262839
    • [DFSan] Remove an overly aggressive assert reported in PR26068. · 9ca96384
      Chandler Carruth authored
      This code has been successfully used to bootstrap libc++ in a no-asserts
      mode for a very long time, so the code that follows cannot be completely
      incorrect. I've added a test that shows the current behavior for this
      kind of code with DFSan. If it is desirable for DFSan to do something
      special when processing an invoke of a variadic function, it can be
      added, but we shouldn't keep an assert that we've been ignoring due to
      release builds anyways.
      
      llvm-svn: 262829
  3. Mar 04, 2016
  4. Mar 03, 2016
    • [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702) · 9bba7508
      Sanjay Patel authored
      Given that we're not actually reducing the instruction count in the included
      regression tests, I think we would call this a canonicalization step.
      
      The motivation comes from the example in PR26702:
      https://llvm.org/bugs/show_bug.cgi?id=26702
      
      If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
      example of:
      
      define <4 x i32> @is_negative(<4 x i32> %x) {
        %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
        %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
        %bc = bitcast <4 x i32> %not to <2 x i64>
        %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
        %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
        ret <4 x i32> %bc2
      }
      
      Simplifies to the expected:
      
      define <4 x i32> @is_negative(<4 x i32> %x) {
        %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
        ret <4 x i32> %lobit
      }
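
The equivalence of the two functions can be checked numerically. The sketch below models the IR by hand in Python (it does not run LLVM); the helper names are hypothetical, lanes are held as unsigned 32-bit values, and little-endian lane packing is assumed for the bitcast.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ashr31(lane):
    """ashr i32 %x, 31: all ones if the sign bit is set, else zero."""
    return MASK32 if lane & 0x80000000 else 0


def bitcast_4xi32_to_2xi64(v):
    # Assumed little-endian layout: lane 0 is the low half of the first i64.
    return [v[0] | (v[1] << 32), v[2] | (v[3] << 32)]


def bitcast_2xi64_to_4xi32(v):
    return [v[0] & MASK32, v[0] >> 32, v[1] & MASK32, v[1] >> 32]


def is_negative_before(x):
    """Mirrors the unoptimized IR: not, bitcast, not, bitcast back."""
    lobit = [ashr31(lane) for lane in x]
    not_ = [lane ^ MASK32 for lane in lobit]
    bc = bitcast_4xi32_to_2xi64(not_)
    notnot = [quad ^ MASK64 for quad in bc]
    return bitcast_2xi64_to_4xi32(notnot)


def is_negative_after(x):
    """Mirrors the simplified IR: only the shift remains."""
    return [ashr31(lane) for lane in x]
```

Because an xor with all ones flips every bit regardless of how the bits are grouped into lanes, the two xors cancel once they sit on the same side of the bitcast.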
      
      Differential Revision: http://reviews.llvm.org/D17583
      
      llvm-svn: 262645
    • Infrastructure for PGO enhancements in inliner · 3035719c
      Easwaran Raman authored
      This patch provides the following infrastructure for PGO enhancements in inliner:
      
      - Enable the use of block-level profile information in the inliner
      - Incrementally update block frequency information during inlining
      - Update the function entry counts of callees when they get inlined into callers
      
      Differential Revision: http://reviews.llvm.org/D16381
      
      llvm-svn: 262636
    • Use LineLocation instead of CallsiteLocation to index callsite profile. · 57d1dda5
      Dehao Chen authored
      Summary: With a discriminator, LineLocation can uniquely identify a callsite without needing to specify the callee name. Remove the callee function name from the key and put it in the value (FunctionSamples).
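
A minimal Python sketch of the indexing change, assuming the key is just a (line offset, discriminator) pair. `LineLocation` and `FunctionSamples` are named in the summary; the field names and the `record_callsite` helper are hypothetical.

```python
from collections import namedtuple

# Key: identifies the callsite on its own, without the callee name.
LineLocation = namedtuple("LineLocation", ["line_offset", "discriminator"])


class FunctionSamples:
    """Value: per-callsite samples; the callee name now lives here."""

    def __init__(self, name):
        self.name = name
        self.total_samples = 0


callsite_samples = {}


def record_callsite(line_offset, discriminator, callee, samples):
    key = LineLocation(line_offset, discriminator)
    entry = callsite_samples.setdefault(key, FunctionSamples(callee))
    entry.total_samples += samples
    return entry
```

Two calls on the same source line stay distinct as long as their discriminators differ, which is what makes the callee name redundant in the key.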
      
      Reviewers: davidxl, dnovillo
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17827
      
      llvm-svn: 262634
    • [LoopUtils, LV] Fix PR26734 · b840a6d6
      Matthew Simpson authored
      The vectorization of first-order recurrences (r261346) caused PR26734. When
      detecting these recurrences, we need to ensure that the previous value is
      actually defined inside the loop. This patch includes the fix and test case.
      
      llvm-svn: 262624
  5. Mar 02, 2016
    • Explode store of arrays in instcombine · 3b8b2ea2
      Amaury Sechet authored
      Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes a store of an array into a series of stores, one per element, allowing them to be optimized.
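
As a rough illustration of why the exploded form is equivalent, the Python sketch below writes an array of 32-bit integers into a byte buffer both ways. It is only a memory-layout model with hypothetical helper names and an assumed little-endian, 4-byte element; it says nothing about how InstCombine implements the rewrite.

```python
import struct


def store_array(memory, offset, values):
    """Aggregate store: write the whole [N x i32] value in one operation."""
    struct.pack_into("<%di" % len(values), memory, offset, *values)


def store_elements(memory, offset, values):
    """Exploded form: one i32 store per element, at offset + 4 * index."""
    for index, value in enumerate(values):
        struct.pack_into("<i", memory, offset + 4 * index, value)
```

Once each element has its own store, later passes can treat the stores individually, for example dropping one that is overwritten before being read.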
      
      Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17828
      
      llvm-svn: 262530
    • Unpack array of all sizes in InstCombine · 7cd3fe7d
      Amaury Sechet authored
      Summary: This is another step toward improving first-class aggregate (FCA) support. This unpacks a load of an array into a series of loads of the array's elements.
      
      Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D15890
      
      llvm-svn: 262521
    • Really fix ASAN leak/etc issues with MemorySSA unittests · 6412002d
      Daniel Berlin authored
      llvm-svn: 262519
    • Revert "Fix ASAN detected errors in code and test" (it was not meant to be committed yet) · 989e601b
      Daniel Berlin authored
      This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95.
      
      llvm-svn: 262512
    • Fix ASAN detected errors in code and test · 27ed1c2e
      Daniel Berlin authored
      llvm-svn: 262511
    • [AA] Hoist the logic to reformulate various AA queries in terms of other parts of the AA interface out of the base class of every single AA result object. · 12884f7f
      Chandler Carruth authored
      
      Because this logic reformulates the query in terms of some other aspect
      of the API, it would easily cause O(n^2) query patterns in alias
      analysis. These could in turn be magnified further based on the number
      of call arguments, and then further based on the number of AA queries
      made for a particular call. This ended up causing problems for Rust that
      were actually noticeable enough to get a bug (PR26564) and probably other
      places as well.
      
      When originally re-working the AA infrastructure, the desire was to
      regularize the pattern of refinement without losing any generality.
      While I think it was successful, that is clearly proving to be too
      costly. And the cost is needless: we gain no actual improvement for this
      generality of making a direct query to tbaa actually be able to
      re-use some other alias analysis's refinement logic for one of the other
      APIs, or some such. In short, this is entirely wasted work.
      
      To the extent possible, delegation to other API surfaces should be done
      at the aggregation layer so that we can avoid re-walking the
      aggregation. In fact, this significantly simplifies the logic as we no
      longer need to smuggle the aggregation layer into each alias analysis
      (or the TargetLibraryInfo into each alias analysis just so we can form
      argument memory locations!).
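
The cost difference can be made concrete with a toy query counter in Python. Everything here is a hypothetical model (the class names, the two query kinds, and the placeholder "may-alias" answer are invented), and it counts the worst case in which no analysis gives a definitive answer early.

```python
class AAResult:
    """One alias analysis; `aggregation` is None in the hoisted design."""

    def __init__(self, stats, aggregation=None):
        self.stats = stats
        self.aggregation = aggregation

    def function_query(self, fn):
        self.stats["queries"] += 1
        return "may-alias"

    def call_query(self, call):
        self.stats["queries"] += 1
        if self.aggregation is not None:
            # Old design: the base class reformulates the query and
            # re-enters the aggregation, which walks every AA again.
            return self.aggregation.function_query(call)
        return "may-alias"


class Aggregation:
    def __init__(self, count, delegate_in_base_class):
        self.stats = {"queries": 0}
        self.delegate_in_base_class = delegate_in_base_class
        owner = self if delegate_in_base_class else None
        self.results = [AAResult(self.stats, owner) for _ in range(count)]

    def function_query(self, fn):
        for result in self.results:
            result.function_query(fn)
        return "may-alias"

    def call_query(self, call):
        for result in self.results:
            result.call_query(call)
        if not self.delegate_in_base_class:
            # Hoisted design: reformulate once, at the aggregation layer.
            self.function_query(call)
        return "may-alias"


def count_queries(count, delegate_in_base_class):
    aggregation = Aggregation(count, delegate_in_base_class)
    aggregation.call_query("call")
    return aggregation.stats["queries"]
```

In this model, n analyses cost n + n^2 queries when every base class delegates back through the aggregation, and 2n when the aggregation performs the reformulation once.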
      
      However, we also have some delegation logic inside of BasicAA and some
      of it even makes sense. When the delegation logic is baking in specific
      knowledge of aliasing properties of the LLVM IR, as opposed to simply
      reformulating the query to utilize a different alias analysis interface
      entry point, it makes a lot of sense to restrict that logic to
      a different layer such as BasicAA. So one aspect of the delegation that
      was in every AA base class is that when we don't have operand bundles,
      we re-use function AA results as a fallback for callsite alias results.
      This relies on the IR properties of calls and functions w.r.t. aliasing,
      and so seems a better fit to BasicAA. I've lifted the logic up to that
      point where it seems to be a natural fit. This still does a bit of
      redundant work (we query function attributes twice, once via the
      callsite and once via the function AA query) but it is *exactly* twice
      here, no more.
      
      The end result is that all of the delegation logic is hoisted out of the
      base class and into either the aggregation layer when it is a pure
      retargeting to a different API surface, or into BasicAA when it relies
      on the IR's aliasing properties. This should fix the quadratic query
      pattern reported in PR26564, although I don't have a stand-alone test
      case to reproduce it.
      
      It also seems general goodness. Now the numerous AAs that don't need
      target library info don't carry it around and depend on it. I think
      I can even rip out the general access to the aggregation layer and only
      expose that in BasicAA as it is the only place where we re-query in that
      manner.
      
      However, this is a non-trivial change to the AA infrastructure so I want
      to get some additional eyes on this before it lands. Sadly, it can't
      wait long because we should really cherry pick this into 3.8 if we're
      going to go this route.
      
      Differential Revision: http://reviews.llvm.org/D17329
      
      llvm-svn: 262490
    • Attempt to fix ASAN failure in a MemorySSA test. · e0e6e48b
      George Burgess IV authored
      llvm-svn: 262452
    • Revert r262424 because there's a *clang test* for AArch64 that checks -O3 asm output that is broken by this change · 5e4c46de
      Sanjay Patel authored
      
      llvm-svn: 262440
    • [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701) · 147e9279
      Sanjay Patel authored
      As noted in the code comment, I don't think we can apply the transform that we do for
      *scalar* integer comparisons to *vector* integer comparisons because it might pessimize
      the general case.
      
      Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
      for integer vectors.
      
      But we should now recognize all the variants of this construct and produce the optimal code
      for the cases shown in:
      https://llvm.org/bugs/show_bug.cgi?id=26701
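
The per-lane identity behind a compare-to-shift rewrite can be sketched in Python. This is an assumption about the shape of the transform (a sign-extended "less than zero" or "greater than minus one" compare on a 32-bit lane), with hypothetical helper names; it is not taken from the patch.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(lane):
    """Interpret an unsigned 32-bit lane value as a signed integer."""
    return lane - (1 << 32) if lane & 0x80000000 else lane


def sext_is_negative(lane):
    """Models sext(icmp slt x, 0): all ones when x is negative."""
    return MASK32 if to_signed32(lane) < 0 else 0


def sext_is_positive(lane):
    """Models sext(icmp sgt x, -1): all ones when x is not negative."""
    return MASK32 if to_signed32(lane) > -1 else 0


def ashr(lane, amount):
    """Arithmetic shift right on a 32-bit lane."""
    return (to_signed32(lane) >> amount) & MASK32
```

Under these assumptions the negative test equals `ashr(x, 31)` and the positive test equals its bitwise complement, so neither needs a vector compare instruction.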
       
      
      llvm-svn: 262424
  6. Mar 01, 2016
  7. Feb 29, 2016
    • [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly · 83be06e5
      Adam Nemet authored
      We can actually have dependences between accesses with different
      underlying types.  Bail in this case.
      
      A test will follow shortly.
      
      llvm-svn: 262267
    • Enable LoopLoadElimination by default · dd9e637a
      Adam Nemet authored
      Summary:
      I re-benchmarked this and results are similar to original results in
      D13259:
      
      On ARM64:
        SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
        SingleSource/Benchmarks/Polybench/stencils/adi                   -19.78%
      
      On x86:
        SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog  -27.14%
      
      And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
      Distribution.
      
      In terms of compile time, there is a ~5% increase on both
      SingleSource/Benchmarks/Misc/oourafft and
      SingleSource/Benchmarks/Linpack/linpack-pc.  These are both very tiny
      loop-intensive programs where SCEV computations dominate compile time.
      
      The reason that time spent in SCEV increases has to do with the design
      of the old pass manager.  If a transform pass does not preserve an
      analysis we *invalidate* the analysis even if there was *no*
      modification made by the transform pass.
      
      This means that currently we don't take advantage of LLE and LV sharing
      the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
      for LLE.
      
      (There should be a way to work around this limitation in the case of
      SCEV and LAA since both compute things on demand and internally cache
      their result.  Thus we could pretend that transform passes preserve
      these analyses and manually invalidate them upon actual modification.
      On the other hand, the new pass manager is supposed to solve this, so I am not
      sure if this is worthwhile.)
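
The invalidation behaviour and the suggested workaround can be compared with a small Python model. The `PassManager` class and its methods are hypothetical and far simpler than the real pass managers; the model only counts how often an analysis is recomputed.

```python
class PassManager:
    def __init__(self):
        self.cache = {}
        self.computations = 0

    def get_analysis(self, name):
        if name not in self.cache:
            self.computations += 1
            self.cache[name] = "result of " + name
        return self.cache[name]

    def run_transform(self, uses, preserves, modifies, manual_invalidation):
        for name in uses:
            self.get_analysis(name)
        if manual_invalidation:
            # Workaround: act as if everything is preserved and drop
            # cached results only when the IR was actually modified.
            if modifies:
                self.cache.clear()
        else:
            # Behaviour described above: anything not declared preserved
            # is dropped, even when the pass changed nothing.
            self.cache = {k: v for k, v in self.cache.items() if k in preserves}


def simulate(manual_invalidation):
    """Run two passes (think LV, then LLE) that both use SCEV and LAA."""
    manager = PassManager()
    for _ in range(2):
        manager.run_transform(
            uses=["SCEV", "LAA"],
            preserves=set(),
            modifies=False,
            manual_invalidation=manual_invalidation,
        )
    return manager.computations
```

In this model the default behaviour computes both analyses twice, while the workaround computes each once when neither pass modifies the code.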
      
      Reviewers: hfinkel, dberlin
      
      Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D16300
      
      llvm-svn: 262250
    • Minor code cleanup. NFC · 9e926e8b
      Rong Xu authored
      llvm-svn: 262242
    • Move discriminator assignment to the right place. · 939993ff
      Dehao Chen authored
      Summary: The discriminator is now assigned per-function instead of per-module.
      
      Reviewers: davidxl, dnovillo
      
      Subscribers: dblaikie, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17664
      
      llvm-svn: 262240
  8. Feb 28, 2016
  9. Feb 27, 2016