- Jan 02, 2011
-
-
Chris Lattner authored
so that Dominators.h is *just* domtree. Also prune #includes a bit. llvm-svn: 122714
-
Chris Lattner authored
llvm-svn: 122713
-
Chris Lattner authored
sure that the loop we're promoting into a memcpy doesn't mutate the input of the memcpy. Before we were just checking that the dest of the memcpy wasn't mod/ref'd by the loop. llvm-svn: 122712
-
Chris Lattner authored
mess with it. We'd rather peel/unroll it than convert all of its stores into memsets. llvm-svn: 122711
-
Benjamin Kramer authored
This allows us to compile: void test(char *s, int a) { __builtin_memset(s, a, 15); } into 1 mul + 3 stores instead of 3 muls + 3 stores. llvm-svn: 122710
-
Peter Collingbourne authored
llvm-svn: 122709
-
Peter Collingbourne authored
llvm-svn: 122708
-
Benjamin Kramer authored
Lower the i8 extension in memset to a multiply instead of a potentially long series of shifts and ors. We could implement a DAGCombine to turn x * 0x0101 back into logic operations on targets that doesn't support the multiply or it is slow (p4) if someone cares enough. Example code: void test(char *s, int a) { __builtin_memset(s, a, 4); } before: _test: ## @test movzbl 8(%esp), %eax movl %eax, %ecx shll $8, %ecx orl %eax, %ecx movl %ecx, %eax shll $16, %eax orl %ecx, %eax movl 4(%esp), %ecx movl %eax, 4(%ecx) movl %eax, (%ecx) ret after: _test: ## @test movzbl 8(%esp), %eax imull $16843009, %eax, %eax ## imm = 0x1010101 movl 4(%esp), %ecx movl %eax, 4(%ecx) movl %eax, (%ecx) ret llvm-svn: 122707
-
Oscar Fuentes authored
llvm-svn: 122706
-
Nick Lewycky authored
another function. llvm-svn: 122705
-
Chris Lattner authored
blocks in a loop, instead of just the header block. This makes it more aggressive, able to handle Duncan's Ada examples. llvm-svn: 122704
-
Chris Lattner authored
llvm-svn: 122703
-
Chris Lattner authored
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was *just* a tree and didn't have DFS numbers. Checking DFS numbers is faster and easier than "limiting the search of the tree". llvm-svn: 122702
-
Chris Lattner authored
llvm-svn: 122701
-
Chris Lattner authored
llvm-svn: 122700
-
Duncan Sands authored
in the PR, the pass could break LCSSA form when inserting preheaders. It probably would be easy enough to fix this, but since currently we always go into LCSSA form after running this pass, doing so is not urgent. llvm-svn: 122695
-
Cameron Zwarich authored
llvm-svn: 122693
-
Oscar Fuentes authored
llvm-svn: 122692
-
Cameron Zwarich authored
llvm-svn: 122691
-
Cameron Zwarich authored
llvm-svn: 122690
-
Cameron Zwarich authored
llvm-svn: 122689
-
Cameron Zwarich authored
llvm-svn: 122688
-
Cameron Zwarich authored
llvm-svn: 122687
-
Francois Pichet authored
llvm-svn: 122686
-
Chris Lattner authored
header for now for memset/memcpy opportunities. It turns out that loop-rotate is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for loops" into 2 basic block loops that loop-idiom was ignoring. With this fix, we form many *many* more memcpy and memsets than before, including on the "history" loops in the viterbi benchmark, which look like this: for (j=0; j<MAX_history; ++j) { history_new[i][j+1] = history[2*i][j]; } Transforming these loops into memcpy's speeds up the viterbi benchmark from 11.98s to 3.55s on my machine. Woo. llvm-svn: 122685
-
Cameron Zwarich authored
compile, and everyone's tests have shown it to be slower in practice, even for quite large graphs. I also hope to do an optimization that is only correct with the simpler data structure, which would break this even further. llvm-svn: 122684
-
Chris Lattner authored
llvm-svn: 122683
-
Chris Lattner authored
llvm-svn: 122682
-
Chris Lattner authored
size of a loop header instead of its own code size estimator. This allows it to handle bitcasts etc more precisely. llvm-svn: 122681
-
Cameron Zwarich authored
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket for each vertex. However, this is unnecessary, because each vertex is only placed into a single bucket (that of its semidominator), and each vertex's bucket is processed before it is added to any bucket itself. Instead of using a bucket per vertex, we use a single array Buckets that has two purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores the index of the first element in V's bucket. After V's bucket is processed, Buckets[i] stores the index of the next element in the bucket to which V now belongs, if any. Reading from the buckets can also be optimized. Instead of processing the bucket of V's parent at the end of processing V, we process the bucket of V itself at the beginning of processing V. This means that the case of the root vertex can be simplified somewhat. It also means that we don't need to look up the DFS number of the semidominator of every node in the bucket we are processing, since we know it is the current index being processed. This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with larger speedups of around 12% on the larger benchmarks like GCC. llvm-svn: 122680
-
Rafael Espindola authored
statements using the "x" constraint. llvm-svn: 122679
-
Chris Lattner authored
llvm-svn: 122678
-
Nick Lewycky authored
maintains the guarantee that the DenseSet expects two elements it contains to not go from inequal to equal under its nose. As a side-effect, this also lets us switch from iterating to a fixed-point to actually maintaining a work queue of functions to look at again, and we don't add thunks to our work queue so we don't need to detect and ignore them. llvm-svn: 122677
-
- Jan 01, 2011
-
-
Chris Lattner authored
llvm-svn: 122676
-
Chris Lattner authored
llvm-svn: 122675
-
Chris Lattner authored
loop idiom pass exposed. llvm-svn: 122674
-
Rafael Espindola authored
llvm-svn: 122672
-
Benjamin Kramer authored
llvm-svn: 122671
-
Rafael Espindola authored
llvm-svn: 122670
-
Rafael Espindola authored
llvm-svn: 122669
-