1. Jan 02, 2015
• InstCombine: Detect when llvm.umul.with.overflow always overflows · c8a576b5
      David Majnemer authored
      We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
      and LHSKnownOne * RHSKnownOne overflow.
      
      llvm-svn: 225077
• Analysis: Reformulate WillNotOverflowUnsignedMul for reusability · 491331ac
      David Majnemer authored
      WillNotOverflowUnsignedMul's smarts will live in ValueTracking as
      computeOverflowForUnsignedMul.  It now returns a tri-state result:
      never overflows, always overflows and sometimes overflows.
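For illustration, here is a minimal C++ sketch of that tri-state idea for a 32-bit unsigned multiply, which also covers the always-overflows condition from the entry above. The function and enumerator names are invented for this example and are not LLVM's actual API.

  #include <cstdint>

  enum class OverflowResult { NeverOverflows, MayOverflow, AlwaysOverflows };

  OverflowResult unsignedMulOverflow(uint32_t LHSKnownZero, uint32_t LHSKnownOne,
                                     uint32_t RHSKnownZero, uint32_t RHSKnownOne) {
    // The largest value an operand can take has a 1 in every bit not known to
    // be zero; the smallest has a 1 only in the bits known to be one.
    uint64_t LHSMax = (~LHSKnownZero) & 0xFFFFFFFFull, LHSMin = LHSKnownOne;
    uint64_t RHSMax = (~RHSKnownZero) & 0xFFFFFFFFull, RHSMin = RHSKnownOne;

    if (LHSMax * RHSMax <= UINT32_MAX)   // even the largest possible product fits
      return OverflowResult::NeverOverflows;
    if (LHSMin * RHSMin > UINT32_MAX)    // even the smallest possible product overflows
      return OverflowResult::AlwaysOverflows;
    return OverflowResult::MayOverflow;
  }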
      
      llvm-svn: 225076
• [SROA] Teach SROA to be more aggressive in splitting now that we have · 24ac830d
      Chandler Carruth authored
      a pre-splitting pass over loads and stores.
      
      Historically, splitting could cause enough problems that I hamstrung the
      entire process with a requirement that splittable integer loads and
      stores must cover the entire alloca. All smaller loads and stores were
      unsplittable to prevent chaos from ensuing. With the new pre-splitting
      logic that does load/store pair splitting I introduced in r225061, we
      can now very nicely handle arbitrarily splittable loads and stores. In
      order to fully benefit from these smarts, we need to mark all of the
      integer loads and stores as splittable.
      
      However, we don't actually want to rewrite partitions with all integer
      loads and stores marked as splittable. This will fail to extract scalar
      integers from aggregates, which is kind of the point of SROA. =] In
      order to resolve this, what we really want to do is only do
      pre-splitting on the alloca slices with integer loads and stores fully
      splittable. This allows us to uncover all non-integer uses of the alloca
      that would benefit from a split in an integer load or store (and where
      introducing the split is safe because it is just memory transfer from
      a load to a store). Once done, we make all the non-whole-alloca integer
      loads and stores unsplittable just as they have historically been,
      repartition and rewrite.
      
      The result is that when there are integer loads and stores anywhere
      within an alloca (such as from a memcpy of a sub-object of a larger
      object), we can split them up if there are non-integer components to the
      aggregate hiding beneath. I've added the challenging test cases to
      demonstrate how this is able to promote to scalars even a case where we
have *partially* overlapping loads and stores.
      
      This restores the single-store behavior for small arrays of i8s which is
      really nice. I've restored both the little endian testing and big endian
      testing for these exactly as they were prior to r225061. It also forced
      me to be more aggressive in an alignment test to actually defeat SROA.
      =] Without the added volatiles there, we actually split up the weird i16
      loads and produce nice double allocas with better alignment.
      
      This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a beginning offset of
      zero. Those fixes are included, and without them the existing test cases
      explode in glorious fireworks. =]
      
      I've kept support for leaving whole-alloca integer loads and stores as
      splittable even for the purpose of rewriting, but I think that's likely
      no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. I'm not
doing that in this patch so that any performance regressions it causes stay
isolated in an easy-to-find-and-revert chunk.
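As an editor's illustration only (the struct and function names below are hypothetical, not from the commit or its tests), this is the shape of code involved: a memcpy of a sub-object lowers to an integer-style load/store pair over the alloca, while the element accesses are the non-integer uses that benefit from the split.

  #include <cstring>

  struct Inner { float a, b; };
  struct Outer { Inner in; int tag; };

  float use_inner(const Outer &o) {
    Inner tmp;
    std::memcpy(&tmp, &o.in, sizeof tmp); // sub-object copy: becomes an
                                          // integer-sized load/store pair
    return tmp.a * tmp.b;                 // scalar float uses of the same alloca
  }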
      
      llvm-svn: 225074
• [SROA] Make the computation of adjusted pointers not leak GEP · 5986b541
      Chandler Carruth authored
      instructions.
      
      I noticed this when working on dialing up how aggressively we can
      pre-split loads and stores. My test case wasn't passing because dead
      GEPs into the allocas persisted when they were built by this routine.
      This isn't terribly harmful, we still rewrote and promoted the alloca
      and I can't conceive of how to cause this to happen in a case where we
      will keep the exact same alloca but rewrite and promote the uses of it.
      If that ever happened, we'd get an assert out of mem2reg.
      
      So I don't have a direct test case yet, but the subsequent commit's test
      case wouldn't pass without this. There are other problems fixed by this
      patch that I spotted purely by inspection such as the fact that
      getAdjustedPtr could have actually deleted dead base pointers. I don't
      know how to get a base pointer to go into getAdjustedPtr today, so
      I think this bug could never have manifested (and I certainly can't
write a test case for it) but it wasn't the intent of the code. The
      code really just wanted to GC the new instructions built. That can be
      done more directly by comparing with the base pointer which is the only
      non-new instruction that this code can return.
      
      llvm-svn: 225073
• [SROA] Fix the loop exit placement to be prior to indexing the splits · 29c22fae
      Chandler Carruth authored
      array. This prevents it from walking out of bounds on the splits array.
      
      Bug found with the existing tests by ASan and by the MSVC debug build.
      
      llvm-svn: 225069
• [SROA] Fix two total think-os in r225061 that should have been caught on · c39eaa50
      Chandler Carruth authored
      a +asserts bootstrap, but my bootstrap had asserts off. Oops.
      
      Anyways, in some places it is reasonable to cast (as a sanity check) the
pointer operand of a load or store to an instruction within SROA --
      namely when the pointer operand is expected to be derived from an
      alloca, and thus always an instruction. However, the pre-splitting code
      also deals with loads and stores to non-alloca pointers and there we
      need to just use the Value*. Nothing about the code relied on the
instruction cast; it was only there essentially as an invariant
      assertion. Remove the two that don't actually hold.
      
      This should fix the proximate issue in PR22080, but I'm also doing an
      asserts bootstrap myself to see if there are other issues lurking.
      
      I'll craft a reduced test case in a moment, but I wanted to get the tree
      healthy as quickly as possible.
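A minimal sketch of the distinction being described, assuming only the standard LLVM casting utilities; the helper name is invented and this is not the actual SROA code.

  #include "llvm/IR/Instructions.h"
  #include "llvm/Support/Casting.h"

  // When the pointer is known to be derived from an alloca it is always an
  // Instruction, but in general it may be any Value (an argument, a global,
  // ...), so cast<Instruction> would assert where dyn_cast simply fails.
  void visitPointerOperand(llvm::LoadInst &LI) {
    llvm::Value *Ptr = LI.getPointerOperand();
    if (auto *I = llvm::dyn_cast<llvm::Instruction>(Ptr)) {
      (void)I; // instruction-only handling would go here
    }
    // Otherwise keep working in terms of the plain Value*.
  }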
      
      llvm-svn: 225068
2. Jan 01, 2015
• [SROA] Switch to using a more direct debug logging technique in one part · 6044c0bc
      Chandler Carruth authored
      of my new load and store splitting, and fix a bug where it logged
      a totally irrelevant slice rather than the actual slice in question.
      
      The logging here previously worked because we used to place new slices
      onto the back of the core sequence, but that caused other problems.
      I updated the actual code to store new slices in their own vector but
      didn't update the logging. There isn't a good way to reuse the logging
      any more, and frankly it wasn't needed. We can directly log this bit
      more easily.
      
      llvm-svn: 225063
• [SROA] Fix formatting with clang-format which I managed to fail to do · 994cde88
      Chandler Carruth authored
      prior to committing r225061. Sorry for that.
      
      llvm-svn: 225062
• [SROA] Teach SROA how to much more intelligently handle split loads and · 0715cba0
      Chandler Carruth authored
      stores.
      
      When there are accesses to an entire alloca with an integer
      load or store as well as accesses to small pieces of the alloca, SROA
      splits up the large integer accesses. In order to do that, it uses bit
      math to merge the small accesses into large integers. While this is
      effective, it produces insane IR that can cause significant problems in
      the rest of the optimizer:
      
      - It can cause load and store mismatches with GVN on the non-alloca side
        where we end up loading an i64 (or some such) rather than loading
        specific elements that are stored.
      - We can't always get rid of the integer bit math, which is why we can't
        always fix the loads and stores to work well with GVN.
      - This is especially bad when we have operations that mix poorly with
        integer bit math such as floating point operations.
      - It will block things like the vectorizer which might be able to handle
  the scalar stores that underlie the aggregate.
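To make the bit math described above concrete, here is a rough C++ analogue (an editor's illustration, not code from the commit): two 32-bit pieces packed into and extracted from a single 64-bit integer, which is exactly where the hops between floating point and integer come from.

  #include <cstdint>
  #include <cstring>

  uint64_t pack(float re, float im) {
    uint32_t re_bits, im_bits;
    std::memcpy(&re_bits, &re, sizeof re_bits);   // bitcast float -> i32
    std::memcpy(&im_bits, &im, sizeof im_bits);
    return (uint64_t)im_bits << 32 | re_bits;     // merge the pieces into one i64
  }

  float extract_re(uint64_t packed) {
    uint32_t bits = (uint32_t)packed;             // extract the low piece
    float re;
    std::memcpy(&re, &bits, sizeof re);           // bitcast i32 -> float
    return re;
  }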
      
      At the same time, we can't just directly split up these loads and stores
      in all cases. If there is actual integer arithmetic involved on the
      values, then using integer bit math is actually the perfect lowering
      because we can often combine it heavily with the surrounding math.
      
      The solution this patch provides is to find places where SROA is
      partitioning aggregates into small elements, and look for splittable
      loads and stores that it can split all the way to some other adjacent
      load and store. These are uniformly the cases where failing to split the
      loads and stores hurts the optimizer that I have seen, and I've looked
      extensively at the code produced both from more and less aggressive
      approaches to this problem.
      
      However, it is quite tricky to actually do this in SROA. We may have
      loads and stores to the same alloca, or other complex patterns that are
      hard to handle. This complexity leads to the somewhat subtle algorithm
      implemented here. We have to do this entire process as a separate pass
      over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can safely handle the cases of
      overlapping, including partially overlapping, loads and stores to the
      same alloca. We also have to reconstitute the post-split slice
      configuration so we can avoid iterating again over all the alloca uses
      (the slow part of SROA). But we also have to ensure that when we split
      up loads and stores to *other* allocas, we *do* re-iterate over them in
      SROA to adapt to the more refined partitioning now required.
      
      With this, I actually think we can fix a long-standing TODO in SROA
      where I avoided splitting as many loads and stores as probably should be
      splittable. This limitation historically mitigated the fallout of all
      the bad things mentioned above. Now that we have more intelligent
      handling, I plan to remove the FIXME and more aggressively mark integer
      loads and stores as splittable. I'll do that in a follow-up patch to
      help with bisecting any fallout.
      
      The net result of this change should be more fine-grained and accurate
      scalars being formed out of aggregates. At the very least, Clang now
      generates perfect code for this high-level test case using
      std::complex<float>:
      
        #include <complex>
      
        void g1(std::complex<float> &x, float a, float b) {
          x += std::complex<float>(a, b);
        }
        void g2(std::complex<float> &x, float a, float b) {
          x -= std::complex<float>(a, b);
        }
      
        void foo(const std::complex<float> &x, float a, float b,
                 std::complex<float> &x1, std::complex<float> &x2) {
          std::complex<float> l1 = x;
          g1(l1, a, b);
          std::complex<float> l2 = x;
          g2(l2, a, b);
          x1 = l1;
          x2 = l2;
        }
      
      This code isn't just hypothetical either. It was reduced out of the hot
      inner loops of essentially every part of the Eigen math library when
      using std::complex<float>. Those loops would consistently and
      pervasively hop between the floating point unit and the integer unit due
      to bit math extraction and insertion of floating point values that were
      "stored" in a 64-bit integer register around the loop backedge.
      
So far, this change has passed a bootstrap and some other testing with no
issues. That doesn't mean there won't be any, so I'll be prepared to help
with any fallout. If you see performance swings in particular, please let
me know. I'm very curious what the impact of this change will be. Stay
tuned for the follow-up to also split more integer loads and stores.
      
      llvm-svn: 225061
3. Dec 30, 2014
• Some code improvements in Masked Load/Store. · 84d1997b
      Elena Demikhovsky authored
      No functional changes.
      
      llvm-svn: 224986
• Carry facts about nullness and undef across GC relocation · 9db26ffc
      Philip Reames authored
      This change implements four basic optimizations:
      
- If a relocated value isn't used, it doesn't need to be relocated.
- If the value being relocated is null, relocation doesn't change that.
  (Technically, this might be collector specific; I don't know of one for
  which it doesn't work.)
- If the value being relocated is undef, the relocation is meaningless.
- If the value being relocated was known nonnull, the relocated pointer
  also isn't null (since it points to the same source-language object).
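As an editor's sketch of the constant cases above (the helper name is invented; the real change operates on gc.relocate calls), a null or undef input already determines what the relocation must produce:

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Value.h"
  #include "llvm/Support/Casting.h"

  // Returns the value the relocation must produce when that can be decided
  // from the input alone, or nullptr when the relocate has to stay.
  llvm::Value *foldRelocatedValue(llvm::Value *DerivedPtr) {
    // Relocating null still yields null; relocating undef is meaningless.
    if (llvm::isa<llvm::ConstantPointerNull>(DerivedPtr) ||
        llvm::isa<llvm::UndefValue>(DerivedPtr))
      return DerivedPtr;
    return nullptr; // nonnull-ness can still be propagated to the result
  }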
      
      I outlined other planned work in comments.
      
      Differential Revision: http://reviews.llvm.org/D6600
      
      llvm-svn: 224968
• Refine the notion of MayThrow in LICM to include a header specific version · b35f46ce
      Philip Reames authored
In LICM, we have a check for whether an instruction is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.
      
This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching, since control-dependent instructions - which are now likely in the loop's header block - will not be lifted by subsequent LICM runs.
      
      define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
      ; CHECK-LABEL: nothrow_header
      ; CHECK-LABEL: entry
      ; CHECK: %div = udiv i64 %x, %y
      ; CHECK-LABEL: loop
      ; CHECK: call void @use(i64 %div)
      entry:
        br label %loop
      loop: ; preds = %entry, %for.inc
        %div = udiv i64 %x, %y
        br i1 %cond, label %loop-if, label %exit
      loop-if:
        call void @use(i64 %div)
        br label %loop
      exit:
        ret void
      }
      
The current patch really only helps with non-memory instructions (i.e. divs, etc.) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invariant.load metadata or TBAA 'is constant memory' metadata.
      
      Differential Revision: http://reviews.llvm.org/D6725
      
      llvm-svn: 224965
4. Dec 29, 2014
• Loading from null is valid outside of addrspace 0 · 5ad26c35
      Philip Reames authored
This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces.
      
We really should introduce a hook to control this property on a per-target, per-address-space basis.  We may be losing valuable optimizations in some address spaces by being too conservative.
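For illustration (an editor's sketch using Clang's address_space attribute, which is assumed here and is not part of the patch): in a non-zero address space a null pointer may name a valid location, so the optimizer cannot treat the load below as unreachable.

  typedef int __attribute__((address_space(1))) as1_int;

  int read_as1(as1_int *p) {
    // Even if p is null, address space 1 may define this as a valid access,
    // unlike a null dereference in the default address space 0.
    return *p;
  }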
      
      Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.
      
      llvm-svn: 224961
5. Dec 24, 2014
• [SROA] Update the documentation and names for accessing the slices · ffb7ce56
      Chandler Carruth authored
      within a partition of an alloca in SROA.
      
      This reflects the fact that the organization of the slices isn't really
      ideal for analysis, but is the naive way in which the slices are
      available while we're processing them in the core partitioning
      algorithm.
      
      It is possible we could improve matters, and I've left a FIXME with
      one of my ideas for how to do this, but it is a lot of work, the benefit
      is somewhat minor, and it isn't clear that it would be strictly better.
      =/ Not really satisfying, but I'm out of really good ideas.
      
      This also improves one place where the debug logging failed to mark some
      split partitions. Now we log in one place, slightly later, and with
      accurate information about whether the slice is split by the partition
      being rewritten.
      
      llvm-svn: 224800
• [SROA] Refactor the integer and vector promotion testing logic to · 5031bbe8
      Chandler Carruth authored
      operate in terms of the new Partition class, and generally have a more
      clear set of arguments. No functionality changed.
      
      The most notable improvements here are consistently using the
      terminology of 'partition' for a collection of slices that will be
      rewritten together and 'slice' for a region of an alloca that is used by
      a particular instruction.
      
      This also makes it more clear that the split things are actually slices
      as well, just ones that will be split by the proposed partition.
      
      This doesn't yet address the confusing aspects of the partition's
      interface where slices that will be split by the partition and start
prior to the partition are accessed via Partition::splitSlices() while
      the core range of slices exposed by a Partition includes both unsplit
      slices and slices which will be split by the end, but started within the
      offset range of the partition. This is particularly hard to address
      because the algorithm which computes partitions quite literally doesn't
      know which slices these will end up being until too late. I'm looking at
      whether I can fix that or not, but I'm not optimistic. I'll update the
      comments and/or names to further explain this either way. I've also
      added one FIXME in this patch relating to this confusion so that I don't
      forget about it.
      
      llvm-svn: 224798
6. Dec 23, 2014
• [asan] change the coverage collection scheme so that we can easily emit... · 9fdeb37b
      Kostya Serebryany authored
      [asan] change the coverage collection scheme so that we can easily emit coverage for the entire process as a single bit set, and if coverage_bitset=1 actually emit that bitset
      
      llvm-svn: 224789
• [SimplifyCFG] Revise common code sinking · 5313da32
      Michael Liao authored
- Fix the case where more than one common instruction derived from the same
  operand cannot be sunk. When a pair of values has more than one derived
  value in both branches, only one derived value could be sunk.
- Replace the BB1 -> (BB2, PN) map with a joint value map, i.e. a map of
  (BB1, BB2) -> PN, which tracks common ops more accurately.
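An illustrative sketch (an editor's example, not from the commit's tests): both branches end in instructions derived from the same operands, the kind of common code SimplifyCFG tries to sink into the join block, and the fix above allows more than one such derived value per operand pair to be sunk.

  void sink_candidates(bool c, int x, int *p, int *q) {
    if (c) {
      *p = x + 1;   // first value derived from x
      *q = x * 2;   // second value derived from x
    } else {
      *p = x + 3;
      *q = x * 4;
    }
    // After sinking, a single pair of stores below the branch uses values
    // selected (via phi) between the two sides.
  }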
      
      llvm-svn: 224757
• Remove a bad cast in CloneModule() · 0bf33ffd
      Michael Kuperstein authored
A cast that was introduced in r209007 was accidentally left in after the changes made to GlobalAlias rules in r210062. This crashes if the aliasee is a now-legal ConstantExpr.
      
      llvm-svn: 224756
• Revert r224739: Debug info: Teach SROA how to update debug info for · c7d1e24b
      Chandler Carruth authored
      fragmented variables.
      
      This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit it, and
      I suspect a bootstrap would as well. I mailed a test case to the
      review thread.
      
      llvm-svn: 224750
• Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunction · ea37c117
      David Blaikie authored
      Since these are all created in the DenseMap before they are referenced,
      there's no problem with pointer validity by the time it's required. This
      removes another use of DeleteContainerSeconds/manual memory management
      which I'm cleaning up from time to time.
      
      llvm-svn: 224744
7. Dec 22, 2014
• [SROA] Lift the logic for traversing the alloca slices one partition at · e2f66cee
      Chandler Carruth authored
      a time into a partition iterator and a Partition class.
      
      There is a lot of knock-on simplification that this enables, largely
      stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enough stuff is changing
      as-is in this commit.
      
      This shouldn't change any observable behavior. I've worked hard to
      preserve the *exact* traversal semantics which were originally present
      even though some of them make no sense. I'll be changing some of this in
      subsequent commits now that the logic is carefully factored into
      a reusable place.
      
      The primary motivation for this change is to break the rewriting into
      phases in order to support more intelligent rewriting. For example, I'm
      planning to change how split loads and stores are rewritten to remove
      the significant overuse of integer bit packing in the resulting code and
      allow more effective secondary splitting of aggregates. For any of this
      to work, they have to share the exact traversal logic.
      
      llvm-svn: 224742
• [LCSSA] Handle PHI insertion in disjoint loops · bad65c3b
      Bruno Cardoso Lopes authored
      Take two disjoint Loops L1 and L2.
      
      LoopSimplify fails to simplify some loops (e.g. when indirect branches
      are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one such exit, we are
also inserting PHIs in L2's header.
      
      This could break LCSSA form for L2 because these inserted PHIs can also
      have uses in L2 exits, which are never handled in the current
      implementation. Provide a fix for this corner case and test that we
      don't assert/crash on that.
      
      Differential Revision: http://reviews.llvm.org/D6624
      
      rdar://problem/19166231
      
      llvm-svn: 224740
• Debug info: Teach SROA how to update debug info for fragmented variables. · a47ace59
      Adrian Prantl authored
      This allows us to generate debug info for extremely advanced code such as
      
        typedef struct { long int a; int b;} S;
      
        int foo(S s) {
          return s.b;
        }
      
      which at -O1 on x86_64 is codegen'd into
      
        define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
          ret i32 %s.coerce1, !dbg !24
        }
      
      with this patch we emit the following debug info for this
      
        TAG_formal_parameter [3]
          AT_location( 0x00000000
                       0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                       0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                       AT_name( "s" )
                       AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )
      
      Thanks to chandlerc, dblaikie, and echristo for their feedback on all
      previous iterations of this patch!
      
      llvm-svn: 224739