  1. Jan 08, 2011
    • Chris Lattner authored 4dc1fd93
      enhance memcpyopt to merge a store and a subsequent
      memset into a single larger memset.
      
      llvm-svn: 123086
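      As an illustration of the merge, the following C++ sketch models each write as a byte range relative to a shared base pointer. It assumes both writes store the same byte value at known constant offsets. `ByteRange` and `tryMergeStoreIntoMemset` are hypothetical names, not memcpyopt's actual data structures. A real transformation must also check that nothing between the two writes touches the memory, and the sketch ignores that.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <optional>

      // Hypothetical model of a write of one repeated byte value at a constant
      // offset from a shared base pointer (not memcpyopt's real representation).
      struct ByteRange {
        int64_t offset;  // start offset from the base pointer, in bytes
        int64_t length;  // number of bytes written
        uint8_t value;   // byte written to every position in the range
      };

      // Returns one memset-style range covering both writes when they store the
      // same byte and are adjacent or overlapping (so their union is contiguous);
      // otherwise returns std::nullopt.
      std::optional<ByteRange> tryMergeStoreIntoMemset(const ByteRange &store,
                                                       const ByteRange &set) {
        assert(store.length > 0 && set.length > 0);
        if (store.value != set.value)
          return std::nullopt;
        bool touches = store.offset <= set.offset + set.length &&
                       set.offset <= store.offset + store.length;
        if (!touches)
          return std::nullopt;
        int64_t begin = std::min(store.offset, set.offset);
        int64_t end = std::max(store.offset + store.length, set.offset + set.length);
        return ByteRange{begin, end - begin, store.value};
      }
      ```

      For example, a one-byte store of 0 at offset 0 followed by a 15-byte memset of 0 at offset 1 merges into a single 16-byte memset at offset 0.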
    • Chris Lattner authored 9dbbc49f
      merge two tests and filecheckify
      llvm-svn: 123082
    • Chris Lattner authored 59c82f85
      When loop rotation happens, it is *very* common for the duplicated condbr
      to be foldable into an uncond branch.  When this happens, we can make a
      much simpler CFG for the loop, which is important for nested loop cases
      where we want the outer loop to be aggressively optimized.
      
      Handle this case more aggressively.  For example, previously on
      phi-duplicate.ll we would get this:
      
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        %cmp1 = icmp slt i64 1, 1000
        br i1 %cmp1, label %bb.nph, label %for.end
      
      bb.nph:                                           ; preds = %entry
        br label %for.body
      
      for.body:                                         ; preds = %bb.nph, %for.cond
        %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.02
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.02, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.02, 1
        br label %for.cond
      
      for.cond:                                         ; preds = %for.body
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
      
      for.cond.for.end_crit_edge:                       ; preds = %for.cond
        br label %for.end
      
      for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
        ret void
      }
      
      Now we get the much nicer:
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        br label %for.body
      
      for.body:                                         ; preds = %entry, %for.body
        %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.01
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.01, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.01, 1
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.end
      
      for.end:                                          ; preds = %for.body
        ret void
      }
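      The two listings correspond roughly to the C++ sketch below. The source loop is an assumption reconstructed from the IR (G[j] = G[j] + G[j-1] for j from 1 to 999), and the function names are hypothetical. Rotation turns the top-tested loop into a guard plus a bottom-tested loop. Because the guard compares the constants 1 and 1000, it is always true and can be folded away, which is the simpler CFG shown in the second listing.

      ```cpp
      #include <array>
      #include <cassert>

      // Assumed source form of the loop in the IR above.
      void topTested(double *G) {
        for (long j = 1; j < 1000; ++j)
          G[j] = G[j] + G[j - 1];
      }

      // Shape after rotation: a guard duplicated from the loop condition,
      // followed by a bottom-tested loop.
      void rotatedWithGuard(double *G) {
        long j = 1;
        if (j < 1000) {  // duplicated condition: compares two constants, 1 < 1000
          do {
            G[j] = G[j] + G[j - 1];
            ++j;
          } while (j < 1000);
        }
      }

      // After folding the always-true guard, the loop is entered unconditionally.
      void rotatedFolded(double *G) {
        long j = 1;
        do {
          G[j] = G[j] + G[j - 1];
          ++j;
        } while (j < 1000);
      }

      // Runs all three versions on identical inputs and reports whether they agree.
      bool variantsAgree() {
        std::array<double, 1000> a{}, b{}, c{};
        for (int i = 0; i < 1000; ++i)
          a[i] = b[i] = c[i] = 0.5 * i;
        topTested(a.data());
        rotatedWithGuard(b.data());
        rotatedFolded(c.data());
        return a == b && b == c;
      }
      ```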
      
      With all of these recent changes, we are now able to compile:
      
      void foo(char *X) {
       for (int i = 0; i != 100; ++i) 
         for (int j = 0; j != 100; ++j)
           X[j+i*100] = 0;
      }
      
      into a single memset of 10000 bytes.  This series of changes
      should also help in other nested loop scenarios.
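      One way to see why the nest can become one memset: the inner loop writes 100 consecutive bytes, and the outer loop advances the base by exactly 100 bytes, so together the loops write bytes 0 through 9999 once each. The C++ sketch below checks that equivalence. `nestIsContiguous` is a hypothetical helper that assumes a unit-stride inner loop; it is not the exact reasoning LLVM's passes perform.

      ```cpp
      #include <cassert>
      #include <cstring>

      // The nested loop from the commit message.
      void fooLoops(char *X) {
        for (int i = 0; i != 100; ++i)
          for (int j = 0; j != 100; ++j)
            X[j + i * 100] = 0;
      }

      // Hypothetical legality check, assuming a unit-stride inner loop: the inner
      // loop covers innerTrip * elemSize contiguous bytes, and if the outer loop
      // advances the base by exactly that amount, consecutive inner runs abut and
      // the whole nest covers one contiguous region.
      bool nestIsContiguous(long innerTrip, long outerStrideBytes, long elemSize) {
        assert(innerTrip > 0 && elemSize > 0);
        return outerStrideBytes == innerTrip * elemSize;
      }

      // The equivalent single call: 100 * 100 = 10000 bytes of zero.
      void fooMemset(char *X) { std::memset(X, 0, 100 * 100); }
      ```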
      
      llvm-svn: 123079
    • Chris Lattner authored 063dca0f
      Three major changes:
      1. Rip out LoopRotate's domfrontier updating code.  It isn't
         needed now that LICM doesn't use DF and it is super complex
         and gross.
      2. Make DomTree updating code a lot simpler and faster.  The 
         old loop over all the blocks was just to find a block??
      3. Change the code that inserts the new preheader to just use
         SplitCriticalEdge instead of doing an overcomplex 
         reimplementation of it.
      
      No behavior change, except for the name of the inserted preheader.
      
      llvm-svn: 123072
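      For reference, here is a toy C++ model of critical-edge splitting, the operation change 3 hands off to SplitCriticalEdge. An edge is critical when its source has several successors and its destination has several predecessors. In the usual rotated shape, the edge from the guard block (loop header or exit) to the loop header (entered from the guard and the latch) is critical, so splitting it yields a dedicated preheader. `Block`, `isCriticalEdge`, and `splitEdge` are hypothetical. They ignore the PHI-node and analysis updates (such as the dominator tree) that the real utility has to keep consistent.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical toy CFG node; blocks refer to each other by index.
      struct Block {
        std::string name;
        std::vector<int> succs;  // successor block indices
        std::vector<int> preds;  // predecessor block indices
      };

      // An edge is critical if its source has several successors and its
      // destination has several predecessors.
      bool isCriticalEdge(const std::vector<Block> &cfg, int from, int to) {
        return cfg[from].succs.size() > 1 && cfg[to].preds.size() > 1;
      }

      // Inserts a new block on the edge from->to and returns its index.
      // Simplification: every from->to entry is redirected, even duplicates.
      int splitEdge(std::vector<Block> &cfg, int from, int to, std::string name) {
        assert(from >= 0 && to >= 0);
        int mid = static_cast<int>(cfg.size());
        cfg.push_back(Block{std::move(name), {to}, {from}});
        std::replace(cfg[from].succs.begin(), cfg[from].succs.end(), to, mid);
        std::replace(cfg[to].preds.begin(), cfg[to].preds.end(), from, mid);
        return mid;
      }
      ```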
    • Frits van Bommel
    • Chris Lattner authored 8c5defd0
      Have loop-rotate simplify instructions (yay instsimplify!) as it clones
      them into the loop preheader, eliminating silly instructions like
      "icmp i32 0, 100" in fixed tripcount loops.  This also better exposes the 
      bigger problem with loop rotate that I'd like to fix: once this has been
      folded, the duplicated conditional branch *often* turns into an uncond branch.
      
      Not aggressively handling this is pessimizing later loop optimizations 
      somethin' fierce by making "dominates all exit blocks" checks fail.
      
      llvm-svn: 123060
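      Here is a minimal C++ sketch of the folding described above. It assumes the cloned condition compares two constants once the induction variable's start value has been substituted. The message abbreviates the instruction; an LLVM `icmp` also carries a predicate such as `slt` or `ne`. `Pred`, `Operand`, `foldICmp`, and `foldCondBr` are hypothetical stand-ins, not the instsimplify API.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <optional>

      // Hypothetical subset of integer comparison predicates.
      enum class Pred { EQ, NE, SLT, SLE, SGT, SGE };

      // Hypothetical operand: either a known constant or an unknown value.
      struct Operand {
        bool isConstant;
        int64_t value;  // meaningful only when isConstant is true
      };

      // Folds a comparison to true/false when both operands are constants.
      std::optional<bool> foldICmp(Pred p, Operand lhs, Operand rhs) {
        if (!lhs.isConstant || !rhs.isConstant)
          return std::nullopt;
        int64_t a = lhs.value, b = rhs.value;
        switch (p) {
        case Pred::EQ:  return a == b;
        case Pred::NE:  return a != b;
        case Pred::SLT: return a < b;
        case Pred::SLE: return a <= b;
        case Pred::SGT: return a > b;
        case Pred::SGE: return a >= b;
        }
        return std::nullopt;
      }

      // A conditional branch on a folded condition becomes unconditional.
      // Returns 0 for the true destination, 1 for the false destination,
      // or std::nullopt if the branch stays conditional.
      std::optional<int> foldCondBr(std::optional<bool> cond) {
        if (!cond)
          return std::nullopt;
        return *cond ? 0 : 1;
      }
      ```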
  2. Jan 07, 2011
  3. Jan 06, 2011
  4. Jan 04, 2011
  5. Jan 03, 2011
  6. Jan 02, 2011
  7. Jan 01, 2011
  8. Dec 29, 2010
  9. Dec 27, 2010
    • Chris Lattner authored 29e14edc
      implement enough of the memset inference algorithm to recognize and insert
      memsets.  This is still missing one important validity check, but this is enough
      to compile stuff like this:
      
      void test0(std::vector<char> &X) {
        for (std::vector<char>::iterator I = X.begin(), E = X.end(); I != E; ++I)
          *I = 0;
      }
      
      void test1(std::vector<int> &X) {
        for (long i = 0, e = X.size(); i != e; ++i)
          X[i] = 0x01010101;
      }
      
      With:
        $ clang t.cpp -S -o - -O2 -emit-llvm | opt -loop-idiom | opt -O3 | llc
      
      to:
      
      __Z5test0RSt6vectorIcSaIcEE:            ## @_Z5test0RSt6vectorIcSaIcEE
      ## BB#0:                                ## %entry
      	subq	$8, %rsp
      	movq	(%rdi), %rax
      	movq	8(%rdi), %rsi
      	cmpq	%rsi, %rax
      	je	LBB0_2
      ## BB#1:                                ## %bb.nph
      	subq	%rax, %rsi
      	movq	%rax, %rdi
      	callq	___bzero
      LBB0_2:                                 ## %for.end
      	addq	$8, %rsp
      	ret
      ...
      __Z5test1RSt6vectorIiSaIiEE:            ## @_Z5test1RSt6vectorIiSaIiEE
      ## BB#0:                                ## %entry
      	subq	$8, %rsp
      	movq	(%rdi), %rax
      	movq	8(%rdi), %rdx
      	subq	%rax, %rdx
      	cmpq	$4, %rdx
      	jb	LBB1_2
      ## BB#1:                                ## %for.body.preheader
      	andq	$-4, %rdx
      	movl	$1, %esi
      	movq	%rax, %rdi
      	callq	_memset
      LBB1_2:                                 ## %for.end
      	addq	$8, %rsp
      	ret
      
      llvm-svn: 122573
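      Two conditions visible in the examples above can be sketched in C++. First, the stored value must be a single byte repeated: 0 in test0, which becomes `___bzero`, and 0x01010101 in test1, which becomes `_memset` with fill byte 1 (`movl $1, %esi`). Second, the memset length is the trip count times the element size, which is why test1's byte length is rounded to a multiple of 4 (`andq $-4, %rdx`). `byteSplat` and `memsetLength` are hypothetical helpers. They do not implement the validity check the message says is still missing.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <optional>

      // Returns the repeated byte if `value`, viewed as a `widthBytes`-byte
      // integer, is one byte repeated; otherwise std::nullopt.
      std::optional<uint8_t> byteSplat(uint64_t value, unsigned widthBytes) {
        assert(widthBytes >= 1 && widthBytes <= 8);
        uint8_t b = static_cast<uint8_t>(value & 0xff);
        for (unsigned i = 1; i < widthBytes; ++i)
          if (static_cast<uint8_t>((value >> (8 * i)) & 0xff) != b)
            return std::nullopt;
        // Bits above the stored width must be zero for the value to fit the type.
        if (widthBytes < 8 && (value >> (8 * widthBytes)) != 0)
          return std::nullopt;
        return b;
      }

      // Bytes a memset must cover to replace `tripCount` unit-stride stores
      // of `elemSize` bytes each.
      uint64_t memsetLength(uint64_t tripCount, uint64_t elemSize) {
        return tripCount * elemSize;
      }
      ```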
  10. Dec 26, 2010
  11. Dec 24, 2010
  12. Dec 23, 2010
  13. Dec 22, 2010