  1. Jan 14, 2011
    • Move some shift transforms out of instcombine and into InstructionSimplify. · 7f60dc1e
      Duncan Sands authored
      While there, I noticed that the transform "undef >>a X -> undef" was wrong.
      For example, if X is 2 then the top two bits of the result must be equal
      (both are copies of the sign bit), so the result cannot be an arbitrary
      value.  I fixed this in the constant folder as well.  Also, I made the
      transform for "X << undef" stronger: it now always folds to undef, even
      though X might be zero.  This is in accordance with the LangRef, but I
      must admit that it is fairly aggressive.  Finally, following the LangRef
      and the constant folder, I added the likewise fairly aggressive
      "i32 X << 32 -> undef".
      
      llvm-svn: 123417
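      A minimal IR sketch of the folds described (hypothetical function
      names, era-appropriate syntax); the last case shows why the old
      ashr fold was unsound:
      
      define i32 @shl_undef_amount(i32 %X) {
        %r = shl i32 %X, undef    ; "X << undef" -> undef, even if %X is 0
        ret i32 %r
      }
      
      define i32 @shl_too_wide(i32 %X) {
        %r = shl i32 %X, 32       ; "i32 X << 32" -> undef, per the LangRef
        ret i32 %r
      }
      
      define i32 @ashr_undef() {
        ; "undef >>a 2" must not fold to undef: the top two bits of the
        ; result are both copies of the sign bit, so they must be equal,
        ; and an arbitrary value cannot be substituted here.
        %r = ashr i32 undef, 2
        ret i32 %r
      }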
  2. Jan 13, 2011
    • Fix whitespace. · 328e91bb
      Bob Wilson authored
      llvm-svn: 123396
    • Check for empty structs, and for consistency, zero-element arrays. · c8056a95
      Bob Wilson authored
      llvm-svn: 123383
    • Extend SROA to handle arrays accessed as homogeneous structs and vice versa. · 08713d3c
      Bob Wilson authored
      This is a minor extension of SROA to handle a special case that is
      important for some ARM NEON operations.  Some of the NEON intrinsics
      return multiple values, which are handled as struct types containing
      multiple elements of the same vector type.  The corresponding return
      types declared in the arm_neon.h header have equivalent arrays.  We
      need SROA to recognize that it can split up those arrays and structs
      into separate vectors, even though they are not always accessed with
      the same type.  SROA already handles loads and stores of an entire
      alloca by using insertvalue/extractvalue to access the individual
      pieces, and that code works the same regardless of whether the type
      is a struct or an array.  So, all that needs to be done is to check
      for compatible arrays and homogeneous structs.
      
      llvm-svn: 123381
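      A hypothetical sketch (2011-era typed-pointer IR) of the pattern this
      lets SROA split: an alloca declared as an array of vectors but stored
      through a bitcast as a homogeneous struct of the same vector type:
      
      define <4 x i32> @array_as_struct({ <4 x i32>, <4 x i32> } %agg) {
      entry:
        %a = alloca [2 x <4 x i32>]
        %p = bitcast [2 x <4 x i32>]* %a to { <4 x i32>, <4 x i32> }*
        store { <4 x i32>, <4 x i32> } %agg, { <4 x i32>, <4 x i32> }* %p
        %q = getelementptr inbounds [2 x <4 x i32>]* %a, i32 0, i32 0
        %v = load <4 x i32>* %q
        ret <4 x i32> %v
      }
      
      Here SROA can replace %a with two separate <4 x i32> values, taking
      %agg apart with extractvalue instead of going through memory.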
    • Make SROA more aggressive with allocas containing padding. · 12eec40c
      Bob Wilson authored
      SROA only splits up structs and arrays one level at a time, so padding
      can only cause trouble if it is located between the struct or array
      elements.
      
      llvm-svn: 123380
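      A hypothetical layout sketch, assuming a typical target where i32 is
      4-byte aligned:
      
      %tail.pad  = type { i32, i8 }   ; 3 pad bytes after the last element
      %inner.pad = type { i8, i32 }   ; 3 pad bytes between the elements
      
      Splitting %tail.pad one level simply drops dead tail padding, while the
      bytes between the elements of %inner.pad are the case that needs care.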
  3. Jan 08, 2011
    • fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't · 7d6433ae
      Chris Lattner authored
      updating memdep when fusing stores together.  This fixes the crash when
      optimizing the bullet benchmark.
      
      llvm-svn: 123091
    • tryMergingIntoMemset can only handle constant length memsets. · ff6ed2ac
      Chris Lattner authored
      llvm-svn: 123090
    • Merge memsets followed by neighboring memsets and other stores into · 9a1d63ba
      Chris Lattner authored
      larger memsets.  Among other things, this fixes rdar://8760394 and
      allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
      compiling it into a single 4096-byte memset:
      
      _mad_synth_mute:                        ## @mad_synth_mute
      ## BB#0:                                ## %entry
      	pushq	%rax
      	movl	$4096, %esi             ## imm = 0x1000
      	callq	___bzero
      	popq	%rax
      	ret
      
      llvm-svn: 123089
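      A hypothetical IR sketch of the simplest input this handles (using the
      era's memset intrinsic, which still took an alignment argument): two
      adjacent memsets of the same value collapse into one:
      
      declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i32, i1)
      
      define void @adjacent_memsets(i8* %P) {
        call void @llvm.memset.p0i8.i64(i8* %P, i8 0, i64 100, i32 1, i1 false)
        %P2 = getelementptr i8* %P, i64 100
        call void @llvm.memset.p0i8.i64(i8* %P2, i8 0, i64 100, i32 1, i1 false)
        ret void
      }
      
      which the pass can rewrite as a single 200-byte memset of %P.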
    • fix an issue in IsPointerOffset that prevented us from recognizing that · 5120ebf1
      Chris Lattner authored
      P and P+1 are relative to the same base pointer.
      
      llvm-svn: 123087
    • enhance memcpyopt to merge a store and a subsequent · 4dc1fd93
      Chris Lattner authored
      memset into a single larger memset.
      
      llvm-svn: 123086
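      A minimal hypothetical example of the store-plus-memset case:
      
      declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i32, i1)
      
      define void @store_then_memset(i8* %P) {
        store i8 0, i8* %P
        %P1 = getelementptr i8* %P, i64 1
        call void @llvm.memset.p0i8.i64(i8* %P1, i8 0, i64 15, i32 1, i1 false)
        ret void
      }
      
      where the byte store and the adjacent 15-byte memset become a single
      16-byte memset starting at %P.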
    • constify TargetData references. · c638147e
      Chris Lattner authored
      Split memset formation logic out into its own
      "tryMergingIntoMemset" helper function.
      
      llvm-svn: 123081
    • When loop rotation happens, it is *very* common for the duplicated condbr · 59c82f85
      Chris Lattner authored
      to be foldable into an uncond branch.  When this happens, we can make a
      much simpler CFG for the loop, which is important for nested loop cases
      where we want the outer loop to be aggressively optimized.
      
      Handle this case more aggressively.  For example, previously on
      phi-duplicate.ll we would get this:
      
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        %cmp1 = icmp slt i64 1, 1000
        br i1 %cmp1, label %bb.nph, label %for.end
      
      bb.nph:                                           ; preds = %entry
        br label %for.body
      
      for.body:                                         ; preds = %bb.nph, %for.cond
        %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.02
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.02, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.02, 1
        br label %for.cond
      
      for.cond:                                         ; preds = %for.body
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
      
      for.cond.for.end_crit_edge:                       ; preds = %for.cond
        br label %for.end
      
      for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
        ret void
      }
      
      Now we get the much nicer:
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        br label %for.body
      
      for.body:                                         ; preds = %entry, %for.body
        %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.01
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.01, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.01, 1
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.end
      
      for.end:                                          ; preds = %for.body
        ret void
      }
      
      With all of these recent changes, we are now able to compile:
      
      void foo(char *X) {
       for (int i = 0; i != 100; ++i) 
         for (int j = 0; j != 100; ++j)
           X[j+i*100] = 0;
      }
      
      into a single memset of 10000 bytes.  This series of changes should
      be helpful for other nested loop scenarios as well.
      
      llvm-svn: 123079
    • split ssa updating code out to its own helper function. Don't bother · 30f318e5
      Chris Lattner authored
      moving the OrigHeader block anymore: we just merge it away anyway so
      its code layout doesn't matter.
      
      llvm-svn: 123077
    • Implement a TODO: Enhance loopinfo to merge away the unconditional branch · 2615130e
      Chris Lattner authored
      that it was leaving in loops after rotation (between the original latch
      block and the original header).
      
      With this change, it is possible for rotated loops to have just a single
      basic block, which is useful.
      
      llvm-svn: 123075
    • various code cleanups, enhance MergeBlockIntoPredecessor to preserve · 930b716e
      Chris Lattner authored
      loop info.
      
      llvm-svn: 123074