Commits · 47f733e4ea52340b781db2af19e3ed695c5f27b0 · Roger Ferrer / llvm-epi-0.8

Dec 01, 2008

Generalize the FoldOrWithConstant method to fold for any two constants which · 47f733e4
Bill Wendling authored Dec 01, 2008
```
don't have overlapping bits.

llvm-svn: 60344
```
47f733e4
Reduce copy-and-paste code by splitting out the code into its own function. · 22e761b3
Bill Wendling authored Dec 01, 2008
```
llvm-svn: 60343
```
22e761b3
Use m_Specific() instead of double matching. · 582fe6b0
Bill Wendling authored Dec 01, 2008
```
llvm-svn: 60341
```
582fe6b0

Move pattern check outside of the if-then statement. This prevents us from... · 4eecfb65

Bill Wendling authored Dec 01, 2008

Move pattern check outside of the if-then statement. This prevents us from fiddling with constants unless we have to.

llvm-svn: 60340

4eecfb65

Rename some variables, only increment BI once at the start of the loop instead of throughout it. · 6f5bf6a7
Chris Lattner authored Dec 01, 2008
```
llvm-svn: 60339
```
6f5bf6a7
pull the predMap densemap out of the inner loop of performPRE, so · f00aae49
Chris Lattner authored Dec 01, 2008
```
that it isn't reallocated all the time.  This is a tiny speedup for
GVN: 3.90->3.88s

llvm-svn: 60338
```
f00aae49
switch a couple more calls to use array_pod_sort. · 2b07d3cc
Chris Lattner authored Dec 01, 2008
```
llvm-svn: 60337
```
2b07d3cc
don't assume iterators implicitly convert to pointers. · a29f0e19
Chris Lattner authored Dec 01, 2008
```
llvm-svn: 60336
```
a29f0e19

Introduce a new array_pod_sort function and switch LSR to use it · 2c2dd15a

Chris Lattner authored Dec 01, 2008

instead of std::sort.  This shrinks the release-asserts LSR.o file
by 1100 bytes of code on my system.

We should start using array_pod_sort where possible.

llvm-svn: 60335

2c2dd15a

Eliminate use of setvector for the DeadInsts set, just use a smallvector. · 2aebea57
Chris Lattner authored Dec 01, 2008
```
This is a lot cheaper and conceptually simpler.

llvm-svn: 60332
```
2aebea57
DeleteTriviallyDeadInstructions is always passed the · 4da78e37
Chris Lattner authored Dec 01, 2008
```
DeadInsts ivar, just use it directly.

llvm-svn: 60330
```
4da78e37

simplify DeleteTriviallyDeadInstructions again, unlike my previous · a68a5a47

Chris Lattner authored Dec 01, 2008

buggy rewrite, this notifies ScalarEvolution of a pending instruction
about to be removed and then erases it, instead of erasing it then 
notifying.

llvm-svn: 60329

a68a5a47

simplify these patterns using m_Specific. No need to grep for · 9e6b2434
Chris Lattner authored Dec 01, 2008
```
xor in testcase (or is a substring).

llvm-svn: 60328
```
9e6b2434

Teach jump threading to clean up after itself, DCE and constfolding the · 88a1f021

Chris Lattner authored Dec 01, 2008

new instructions it simplifies. Because we're threading jumps on edges
with constants coming in from PHI's, we inherently are exposing a lot more
constants to the new block. Folding them and deleting dead conditions
allows the cost model in jump threading to be more accurate as it iterates.

llvm-svn: 60327

88a1f021

The PreVerifier pass preserves everything. In practice, this · 856684d3

Chris Lattner authored Dec 01, 2008

prevents the passmgr from adding yet-another domtree invocation
for Verifier if there is already one live.

llvm-svn: 60326

856684d3

Change instcombine to use FoldPHIArgGEPIntoPHI to fold two operand PHIs · 084b3a47

Chris Lattner authored Dec 01, 2008

instead of using FoldPHIArgBinOpIntoPHI.  In addition to being more
obvious, this also fixes a problem where instcombine wouldn't merge two
phis that had different variable indices.  This prevented instcombine
from factoring big chunks of code in 403.gcc.  For example:

 insn_cuid.exit:                
-       %tmp336 = load i32** @uid_cuid, align 4      
-       %tmp337 = getelementptr %struct.rtx_def* %insn_addr.0.ph.i, i32 0, i32 3    
-       %tmp338 = bitcast [1 x %struct.rtunion]* %tmp337 to i32*               
-       %tmp339 = load i32* %tmp338, align 4           
-       %tmp340 = getelementptr i32* %tmp336, i32 %tmp339     
        br label %bb62
 
 bb61:       
-       %tmp341 = load i32** @uid_cuid, align 4     
-       %tmp342 = getelementptr %struct.rtx_def* %insn, i32 0, i32 3        
-       %tmp343 = bitcast [1 x %struct.rtunion]* %tmp342 to i32*           
-       %tmp344 = load i32* %tmp343, align 4        
-       %tmp345 = getelementptr i32* %tmp341, i32 %tmp344          
        br label %bb62
 
 bb62:      
-       %iftmp.62.0.in = phi i32* [ %tmp345, %bb61 ], [ %tmp340, %insn_cuid.exit ]         
+       %insn.pn2 = phi %struct.rtx_def* [ %insn, %bb61 ], [ %insn_addr.0.ph.i, %insn_cuid.exit ]         
+       %tmp344.pn.in.in = getelementptr %struct.rtx_def* %insn.pn2, i32 0, i32 3     
+       %tmp344.pn.in = bitcast [1 x %struct.rtunion]* %tmp344.pn.in.in to i32*  
+       %tmp341.pn = load i32** @uid_cuid     
+       %tmp344.pn = load i32* %tmp344.pn.in 
+       %iftmp.62.0.in = getelementptr i32* %tmp341.pn, i32 %tmp344.pn   
        %iftmp.62.0 = load i32* %iftmp.62.0.in     

llvm-svn: 60325

084b3a47

Teach inst combine to merge GEPs through PHIs. This is really · 9d02a70a

Chris Lattner authored Dec 01, 2008

important because it is sinking the loads using the GEPs, but
not the GEPs themselves.  This triggers 647 times on 403.gcc
and makes the .s file much much nicer.  For example before:

        je      LBB1_87 ## bb78
LBB1_62:        ## bb77
        leal    84(%esi), %eax
LBB1_63:        ## bb79
        movl    (%eax), %eax
...
LBB1_87:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
        jmp     LBB1_62 ## bb77


after:

        jne     LBB1_63 ## bb79
LBB1_62:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
LBB1_63:        ## bb79
        movl    84(%esi), %eax

The input code was (and the GEPs are merged and
the PHI is now eliminated by instcombine):

        br i1 %tmp233, label %bb78, label %bb77
bb77:           
        %tmp234 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb78:           
        call void @make_decl_rtl(%struct.tree_node* %t_addr.3, i8* null) nounwind
        %tmp235 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb79:           
        %iftmp.12.0.in = phi %struct.rtx_def** [ %tmp235, %bb78 ], [ %tmp234, %bb77 ]           
        %iftmp.12.0 = load %struct.rtx_def** %iftmp.12.0.in             

llvm-svn: 60322

9d02a70a

testcase for my previous commit. · 8facc59e
Chris Lattner authored Dec 01, 2008
```
llvm-svn: 60315
```
8facc59e

Make GVN be more intelligent about redundant load · 9ce8995d

Chris Lattner authored Dec 01, 2008

elimination: when finding dependent load/stores, realize that
they are the same if aliasing claims must alias instead of relying
on the pointers to be exactly equal.  This makes load elimination
more aggressive.  For example, on 403.gcc, we had:

<     68 gvn    - Number of instructions PRE'd
< 152718 gvn    - Number of instructions deleted
<  49699 gvn    - Number of loads deleted
<   6153 memdep - Number of dirty cached non-local responses
< 169336 memdep - Number of fully cached non-local responses
< 162428 memdep - Number of uncached non-local responses

now we have:

>     64 gvn    - Number of instructions PRE'd
> 153623 gvn    - Number of instructions deleted
>  49856 gvn    - Number of loads deleted
>   5022 memdep - Number of dirty cached non-local responses
> 159030 memdep - Number of fully cached non-local responses
> 162443 memdep - Number of uncached non-local responses

That's an extra 157 loads deleted and extra 905 other instructions nuked.

This slows down GVN very slightly, from 3.91 to 3.96s.

llvm-svn: 60314

9ce8995d

Reimplement the non-local dependency data structure in terms of a sorted · 7e61dafc

Chris Lattner authored Dec 01, 2008

vector instead of a densemap.  This shrinks the memory usage of this thing
substantially (the high water mark) as well as making operations like
scanning it faster.  This speeds up memdep slightly, gvn goes from
3.9376 to 3.9118s on 403.gcc

This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached case.  Here's the stats
for 403.gcc:

  6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses

yay for caching :)

llvm-svn: 60313

7e61dafc

Implement ((A|B)&1)|(B&-2) -> (A&1) | B transformation. This also takes care of · 5b902c5b
Bill Wendling authored Dec 01, 2008
```
permutations of this pattern.

llvm-svn: 60312
```
5b902c5b
Fix bogus assertion using getSExtValue for legitimate values, like -1 in · 6f0730ff
Eli Friedman authored Dec 01, 2008
```
an 128-bit-wide integer.  No testcase; the issue I ran into depends on 
local changes.

llvm-svn: 60311
```
6f0730ff
Cache analyses in ivars and add some useful DEBUG output. · 8541edec
Chris Lattner authored Dec 01, 2008
```
This speeds up GVN from 4.0386s to 3.9376s.

llvm-svn: 60310
```
8541edec
improve indentation, do cheap checks before expensive ones, · 80c7d81e
Chris Lattner authored Nov 30, 2008
```
remove some fixme's.  This speeds up GVN very slightly on 403.gcc 
(4.06->4.03s)

llvm-svn: 60309
```
80c7d81e
Eliminate the DepResultTy abstraction. It is now completely · 47e81d0e
Chris Lattner authored Nov 30, 2008
```
redundant with MemDepResult, and MemDepResult has a nicer interface.

llvm-svn: 60308
```
47e81d0e

Nov 30, 2008
- Minor cleanup: use getTrue and getFalse where appropriate. No · 11c15a5d
  Eli Friedman authored Nov 30, 2008
```
functional change.

llvm-svn: 60307
```
  11c15a5d
- Some minor cleanups to instcombine; no functionality change. · 55e4becb
  Eli Friedman authored Nov 30, 2008
```
Note that the FoldOpIntoPhi call is dead because it's impossible for the 
first operand of a subtraction to be both a ConstantInt and a PHINode.

llvm-svn: 60306
```
  55e4becb
- Cache TargetData/AliasAnalysis in the pass instead of calling · 13cae612
  Chris Lattner authored Nov 30, 2008
```
getAnalysis<>.  getAnalysis<> is apparently extremely expensive.
Doing this speeds up GVN on 403.gcc by 16%!

llvm-svn: 60304
```
  13cae612
- add the rest of the comparison routines. · 15b59861
  Chris Lattner authored Nov 30, 2008
```
llvm-svn: 60303
```
  15b59861
- Add instruction combining for ((A&~B)|(~A&B)) -> A^B and all permutations. · de89bc27
  Bill Wendling authored Nov 30, 2008
```
llvm-svn: 60291
```
  de89bc27
- Implement (A&((~A)|B)) -> A&B transformation in the instruction combiner. This · 9eef421e
  Bill Wendling authored Nov 30, 2008
```
takes care of all permutations of this pattern.

llvm-svn: 60290
```
  9eef421e
- Forgot one remaining call to getSExtValue(). · 2fe32298
  Bill Wendling authored Nov 30, 2008
```
llvm-svn: 60289
```
  2fe32298
- getSExtValue() doesn't work for ConstantInts with bitwidth > 64 bits. Use all · 2d2e7861
  Bill Wendling authored Nov 30, 2008
```
APInt calls instead.

This fixes PR3144.

llvm-svn: 60288
```
  2d2e7861
- Optimize memmove and memset into the LLVM builtins. Note that these · 09bc6109
  Eli Friedman authored Nov 30, 2008
```
only show up in code from front-ends besides llvm-gcc, like clang.

llvm-svn: 60287
```
  09bc6109
- A couple small cleanups, plus a new potential optimization. · e9ef170d
  Eli Friedman authored Nov 30, 2008
```
llvm-svn: 60286
```
  e9ef170d
- Moving potential optimizations out of PR2330 into lib/Target/README.txt. · e16c0ff1
  Eli Friedman authored Nov 30, 2008
```
Hopefully this isn't too much stuff to dump into this file.

llvm-svn: 60285
```
  e16c0ff1
- Followup to r60283: optimize arbitrary width signed divisions as well · c8228d26
  Eli Friedman authored Nov 30, 2008
```
as unsigned divisions.  Same caveats as before.

llvm-svn: 60284
```
  c8228d26
- Fix for PR2164: allow transforming arbitrary-width unsigned divides into · 1b7fc154
  Eli Friedman authored Nov 30, 2008
```
multiplies.

Some more cleverness would be nice, though. It would be nice if we 
could do this transformation on illegal types.  Also, we would 
prefer a narrower constant when possible so that we can use a narrower
multiply, which can be cheaper.

llvm-svn: 60283
```
  1b7fc154
- Don't make TwoToExp signed by default. · 7abf352f
  Bill Wendling authored Nov 30, 2008
```
llvm-svn: 60279
```
  7abf352f
- From Hacker's Delight: · af200e92
  Bill Wendling authored Nov 30, 2008
```
"For signed integers, the determination of overflow of x*y is not so simple. If
x and y have the same sign, then overflow occurs iff xy > 2**31 - 1. If they
have opposite signs, then overflow occurs iff xy < -2**31."

In this case, x == -1.

llvm-svn: 60278
```
  af200e92