  1. Jan 09, 2011
  2. Jan 08, 2011
      Fix coding style. · 0939bc37
      Cameron Zwarich authored
      llvm-svn: 123093
      fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't · 7d6433ae
      Chris Lattner authored
      updating memdep when fusing stores together.  This fixes the crash optimizing
      the bullet benchmark.
      
      llvm-svn: 123091
      tryMergingIntoMemset can only handle constant length memsets. · ff6ed2ac
      Chris Lattner authored
      llvm-svn: 123090
      Merge memsets followed by neighboring memsets and other stores into · 9a1d63ba
      Chris Lattner authored
      larger memsets.  Among other things, this fixes rdar://8760394 and
      allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
      compiling it into a single 4096-byte memset:
      
      _mad_synth_mute:                        ## @mad_synth_mute
      ## BB#0:                                ## %entry
      	pushq	%rax
      	movl	$4096, %esi             ## imm = 0x1000
      	callq	___bzero
      	popq	%rax
      	ret
      
      llvm-svn: 123089
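The transformation above can be illustrated at the C level (a sketch for illustration, not the pass's actual test case): a run of neighboring memsets and plain stores that together zero one contiguous region is observably equivalent to a single larger memset, which is what entitles memcpyopt to fuse them.

```c
#include <assert.h>
#include <string.h>

/* Before the optimization: neighboring memsets and a plain store,
   together zeroing one contiguous 16-byte region. */
static void zero_piecewise(char *p) {
    memset(p, 0, 7);      /* bytes [0, 7)  */
    p[7] = 0;             /* byte   7      */
    memset(p + 8, 0, 8);  /* bytes [8, 16) */
}

/* After the optimization: the single larger memset the pass forms. */
static void zero_merged(char *p) {
    memset(p, 0, 16);
}
```

Both functions write exactly the same bytes over the same region, so replacing one with the other is a pure strength reduction.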
      fix an issue in IsPointerOffset that prevented us from recognizing that · 5120ebf1
      Chris Lattner authored
      P and P+1 are relative to the same base pointer.
      
      llvm-svn: 123087
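A rough sketch of the property IsPointerOffset establishes (the struct and function names here are hypothetical, not LLVM's API): two addresses are candidates for merging only when they decompose to the same base pointer plus constant byte offsets, as P and P+1 do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decomposition of an address into base + constant offset;
   in the real pass this comes from walking GEP chains. */
struct Addr {
    const void *base;
    ptrdiff_t offset;
};

/* Returns 1 and sets *delta when b is a constant offset from a,
   i.e. both addresses are relative to the same base pointer. */
static int pointer_offset(struct Addr a, struct Addr b, ptrdiff_t *delta) {
    if (a.base != b.base)
        return 0;   /* different underlying objects: give up */
    *delta = b.offset - a.offset;
    return 1;
}
```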
      enhance memcpyopt to merge a store and a subsequent · 4dc1fd93
      Chris Lattner authored
      memset into a single larger memset.
      
      llvm-svn: 123086
      constify TargetData references. · c638147e
      Chris Lattner authored
      Split memset formation logic out into its own
      "tryMergingIntoMemset" helper function.
      
      llvm-svn: 123081
      When loop rotation happens, it is *very* common for the duplicated condbr · 59c82f85
      Chris Lattner authored
      to be foldable into an uncond branch.  When this happens, we can make a
      much simpler CFG for the loop, which is important for nested loop cases
      where we want the outer loop to be aggressively optimized.
      
      Handle this case more aggressively.  For example, previously on
      phi-duplicate.ll we would get this:
      
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        %cmp1 = icmp slt i64 1, 1000
        br i1 %cmp1, label %bb.nph, label %for.end
      
      bb.nph:                                           ; preds = %entry
        br label %for.body
      
      for.body:                                         ; preds = %bb.nph, %for.cond
        %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.02
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.02, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.02, 1
        br label %for.cond
      
      for.cond:                                         ; preds = %for.body
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
      
      for.cond.for.end_crit_edge:                       ; preds = %for.cond
        br label %for.end
      
      for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
        ret void
      }
      
      Now we get the much nicer:
      
      define void @test(i32 %N, double* %G) nounwind ssp {
      entry:
        br label %for.body
      
      for.body:                                         ; preds = %entry, %for.body
        %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
        %arrayidx = getelementptr inbounds double* %G, i64 %j.01
        %tmp3 = load double* %arrayidx
        %sub = sub i64 %j.01, 1
        %arrayidx6 = getelementptr inbounds double* %G, i64 %sub
        %tmp7 = load double* %arrayidx6
        %add = fadd double %tmp3, %tmp7
        %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
        store double %add, double* %arrayidx10
        %inc = add nsw i64 %j.01, 1
        %cmp = icmp slt i64 %inc, 1000
        br i1 %cmp, label %for.body, label %for.end
      
      for.end:                                          ; preds = %for.body
        ret void
      }
      
      With all of these recent changes, we are now able to compile:
      
      void foo(char *X) {
       for (int i = 0; i != 100; ++i) 
         for (int j = 0; j != 100; ++j)
           X[j+i*100] = 0;
      }
      
      into a single memset of 10000 bytes.  This series of changes
      should also be helpful for other nested loop scenarios as well.
      
      llvm-svn: 123079
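The final claim is easy to sanity-check at the source level: foo from the message writes every index in [0, 10000) exactly once, so it is observably equivalent to one 10000-byte memset.

```c
#include <assert.h>
#include <string.h>

/* The nested-loop function from the commit message: indices
   j + i*100 for i, j in [0, 100) cover [0, 10000) exactly once. */
static void foo(char *X) {
    for (int i = 0; i != 100; ++i)
        for (int j = 0; j != 100; ++j)
            X[j + i * 100] = 0;
}
```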
      make domtree verification print something useful on failure. · 5f7734c4
      Chris Lattner authored
      llvm-svn: 123078
      split ssa updating code out to its own helper function. Don't bother · 30f318e5
      Chris Lattner authored
      moving the OrigHeader block anymore: we just merge it away anyway so
      its code layout doesn't matter.
      
      llvm-svn: 123077
      Implement a TODO: Enhance loopinfo to merge away the unconditional branch · 2615130e
      Chris Lattner authored
      that it was leaving in loops after rotation (between the original latch
block and the original header).
      
      With this change, it is possible for rotated loops to have just a single
      basic block, which is useful.
      
      llvm-svn: 123075
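In C terms, rotation rewrites a top-tested loop as an entry guard plus a bottom-tested loop, so the body, increment, and exit branch can all sit in one block (a schematic sketch, not the pass's output):

```c
#include <assert.h>

/* Top-tested form, as written by the programmer: the test sits in
   its own header block above the body. */
static int sum_while(int n) {
    int i = 0, s = 0;
    while (i < n) { s += i; ++i; }
    return s;
}

/* Rotated form: an entry guard plus a do-while whose single block
   contains the body and the latch test. */
static int sum_rotated(int n) {
    int i = 0, s = 0;
    if (i < n) {
        do { s += i; ++i; } while (i < n);
    }
    return s;
}
```

The guard runs the test once up front; after that the loop needs only one backwards branch per iteration, which is the single-basic-block shape the commit enables.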
      various code cleanups, enhance MergeBlockIntoPredecessor to preserve · 930b716e
      Chris Lattner authored
      loop info.
      
      llvm-svn: 123074
      inline preserveCanonicalLoopForm now that it is simple. · fee37c5f
      Chris Lattner authored
      llvm-svn: 123073
      Three major changes: · 063dca0f
      Chris Lattner authored
      1. Rip out LoopRotate's domfrontier updating code.  It isn't
         needed now that LICM doesn't use DF and it is super complex
         and gross.
      2. Make DomTree updating code a lot simpler and faster.  The 
         old loop over all the blocks was just to find a block??
      3. Change the code that inserts the new preheader to just use
         SplitCriticalEdge instead of doing an overcomplex 
         reimplementation of it.
      
      No behavior change, except for the name of the inserted preheader.
      
      llvm-svn: 123072
      reduce nesting. · 30d95f9f
      Chris Lattner authored
      llvm-svn: 123071