  1. Jan 02, 2011
    • Chris Lattner authored · ddf58010
      Allow loop-idiom to run on multiple BB loops, but still only scan the loop
      header for now for memset/memcpy opportunities.  It turns out that loop-rotate
      is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
      loops" into 2 basic block loops that loop-idiom was ignoring.
      
      With this fix, we form many *many* more memcpy and memsets than before, including
      on the "history" loops in the viterbi benchmark, which look like this:
      
              for (j=0; j<MAX_history; ++j) {
                history_new[i][j+1] = history[2*i][j];
              }
      
      Transforming these loops into memcpy's speeds up the viterbi benchmark from
      11.98s to 3.55s on my machine.  Woo.
      
      llvm-svn: 122685
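To make the transformation in the commit above concrete, here is a minimal sketch of the "history" loop next to a hand-written memcpy equivalent. The element type, the array dimensions, and the names `MaxHistory`, `NumRows`, `Table`, `shiftWithLoop`, and `shiftWithMemcpy` are assumptions for illustration; the commit message does not show the benchmark's declarations. The rewrite is only valid when source and destination do not overlap, which holds here because they are separate arrays.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Assumed sizes and element type (hypothetical, not taken from the benchmark).
constexpr std::size_t MaxHistory = 8; // stands in for MAX_history
constexpr std::size_t NumRows = 8;

// Each row has one extra slot so that index j+1 stays in bounds.
using Table = std::array<std::array<unsigned char, MaxHistory + 1>, NumRows>;

// The loop in the shape quoted by the commit message.
void shiftWithLoop(Table &history_new, const Table &history, std::size_t i) {
  for (std::size_t j = 0; j < MaxHistory; ++j)
    history_new[i][j + 1] = history[2 * i][j];
}

// Hand-written equivalent: copy MaxHistory contiguous elements from the start
// of row 2*i into row i of the destination, beginning at column 1.
void shiftWithMemcpy(Table &history_new, const Table &history, std::size_t i) {
  std::memcpy(&history_new[i][1], &history[2 * i][0],
              MaxHistory * sizeof(history[0][0]));
}
```

Both functions should leave `history_new` in the same state for any `i` with `2 * i < NumRows`; the second form is roughly what a loop-idiom pass aims to produce automatically.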
    • Cameron Zwarich authored · 528511b1
      Remove the #ifdef'd code for balancing the eval-link data structure. It doesn't
      compile, and everyone's tests have shown it to be slower in practice, even for
      quite large graphs.
      
      I also hope to do an optimization that is only correct with the simpler data
      structure, which would break this even further.
      
      llvm-svn: 122684
    • Chris Lattner authored · 5b5a043d
      remove debugging code.
      llvm-svn: 122683
    • Chris Lattner authored · 12f91bef
      add some -stats output.
      llvm-svn: 122682
    • Chris Lattner authored · 679572e5
      improve loop rotation to use CodeMetrics to analyze the
      size of a loop header instead of its own code size estimator.
      This allows it to handle bitcasts etc more precisely.
      
      llvm-svn: 122681
    • Cameron Zwarich authored · a0800337
      Speed up dominator computation some more by optimizing bucket processing. When
      naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
      for each vertex. However, this is unnecessary, because each vertex is only
      placed into a single bucket (that of its semidominator), and each vertex's
      bucket is processed before it is added to any bucket itself.
      
      Instead of using a bucket per vertex, we use a single array Buckets that has two
      purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
      the index of the first element in V's bucket. After V's bucket is processed,
      Buckets[i] stores the index of the next element in the bucket to which V now
      belongs, if any.
      
      Reading from the buckets can also be optimized. Instead of processing the bucket
      of V's parent at the end of processing V, we process the bucket of V itself at
      the beginning of processing V. This means that the case of the root vertex can
      be simplified somewhat. It also means that we don't need to look up the DFS
      number of the semidominator of every node in the bucket we are processing,
      since we know it is the current index being processed.
      
      This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
      larger speedups of around 12% on the larger benchmarks like GCC.
      
      llvm-svn: 122680
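The shared `Buckets` array described in the commit above can be sketched in isolation. This is not the LLVM implementation: it assumes the semidominator DFS numbers are already known and passed in as `Semi` (in the real algorithm they are computed during the same pass), it uses 0-based DFS numbers with `-1` as an empty marker, and the function name `drainBuckets` is hypothetical. It only demonstrates that one array can serve both as "head of my bucket" and as "next element in the bucket I belong to".

```cpp
#include <cassert>
#include <vector>

// Semi[i] is the DFS number of the semidominator of the vertex with DFS
// number i (assumed precomputed here). Semi[0] is unused because 0 is the root.
// Returns, for each vertex, the members of its bucket in the order read.
std::vector<std::vector<int>> drainBuckets(const std::vector<int> &Semi) {
  const int N = static_cast<int>(Semi.size());
  const int Empty = -1;
  std::vector<int> Buckets(N, Empty);
  std::vector<std::vector<int>> Seen(N);

  // Vertices are visited in decreasing DFS order, as in Lengauer-Tarjan.
  for (int i = N - 1; i >= 0; --i) {
    // Before i is processed, Buckets[i] is the first element of i's bucket.
    // For every member w, Buckets[w] already holds the next member.
    for (int w = Buckets[i]; w != Empty; w = Buckets[w])
      Seen[i].push_back(w);

    if (i == 0)
      break; // the root has no semidominator and joins no bucket

    // i's own bucket is finished, so slot i is free to become a "next" link.
    const int s = Semi[i];
    assert(s >= 0 && s < i && "a semidominator has a smaller DFS number");
    Buckets[i] = Buckets[s]; // link i in front of the current head of s's bucket
    Buckets[s] = i;          // i is the new head
  }
  return Seen;
}
```

For example, with `Semi = {0, 0, 1, 1, 0, 4}` vertex 1 reads the members 2 and 3 from its bucket, and the root reads 1 and 4, without any per-vertex container being allocated.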
    • Rafael Espindola authored · 47731fe3
      Add support for passing variables declared to use an xmm register to asm
      statements using the "x" constraint.
      
      llvm-svn: 122679
    • Chris Lattner authored · 85b6d81d
      teach loop idiom recognition to form memcpy's from simple loops.
      llvm-svn: 122678
    • Nick Lewycky authored · 4e250c82
      Remove functions from the FnSet when one of their callees is being merged. This
      maintains the guarantee that the DenseSet expects two elements it contains to
      not go from unequal to equal under its nose.
      
      As a side-effect, this also lets us switch from iterating to a fixed-point to
      actually maintaining a work queue of functions to look at again, and we don't
      add thunks to our work queue so we don't need to detect and ignore them.
      
      llvm-svn: 122677
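The idea in the commit above (take a function out of the set before one of its callees changes, then revisit it through a work queue) can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyFn`, `Key`, and `mergeEqualFunctions` are hypothetical names, a `std::map` stands in for the DenseSet, and "equality" is just a matching body string and callee list. It is a sketch of the pattern, not the MergeFunctions pass.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy function: two are "equal" when body and callee list both match.
struct ToyFn {
  std::string Body;
  std::vector<int> Callees; // indices into the function table
};
using Key = std::pair<std::string, std::vector<int>>;

// Returns Canon, where Canon[i] is the surviving function that i merged into.
std::vector<int> mergeEqualFunctions(std::vector<ToyFn> Fns) {
  const int N = static_cast<int>(Fns.size());
  std::vector<std::vector<int>> Callers(N);
  std::vector<int> Canon(N);
  std::deque<int> Worklist;
  for (int F = 0; F < N; ++F) {
    Canon[F] = F;
    Worklist.push_back(F);
    for (int C : Fns[F].Callees) {
      assert(C >= 0 && C < N && "callee index out of range");
      Callers[C].push_back(F);
    }
  }
  std::map<Key, int> FnSet; // key -> representative function
  auto keyOf = [&](int F) { return Key(Fns[F].Body, Fns[F].Callees); };

  while (!Worklist.empty()) {
    const int F = Worklist.front();
    Worklist.pop_front();
    if (Canon[F] != F)
      continue; // already merged away
    const int G = FnSet.emplace(keyOf(F), F).first->second;
    if (G == F)
      continue; // newly inserted, or a repeated queue entry
    // F duplicates G. Each caller of F is about to change, so remove it from
    // the set first, rewrite it, and queue it to be looked at again.
    for (int C : Callers[F]) {
      if (Canon[C] != C || C == F)
        continue;
      auto It = FnSet.find(keyOf(C));
      if (It != FnSet.end() && It->second == C)
        FnSet.erase(It);
      std::replace(Fns[C].Callees.begin(), Fns[C].Callees.end(), F, G);
      Callers[G].push_back(C);
      Worklist.push_back(C);
    }
    Canon[F] = G;
  }
  for (int F = 0; F < N; ++F) { // follow chains of merges to the survivor
    int R = F;
    while (Canon[R] != R)
      R = Canon[R];
    Canon[F] = R;
  }
  return Canon;
}
```

With four toy functions where 0 calls 2, 1 calls 3, and 2 and 3 have identical bodies, merging 3 into 2 rewrites function 1, which is then revisited from the queue and found equal to 0. No outer "repeat until nothing changes" loop is needed.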