Commits · 1dae8766b18a9d5757ff76668bbd3fd05213c3d5 · Roger Ferrer / llvm-epi-0.8

Jan 03, 2010

fix PR5930, allowing the asmprinter to emit difference between · 1dae8766
Chris Lattner authored Jan 03, 2010
```
two labels as a truncate.

llvm-svn: 92455
```
1dae8766

it isn't safe to speculatively load from a malloc, it might have · fd11f49b

Chris Lattner authored Jan 03, 2010

returned null, and may not have been big enough in any case.  
Thanks to Jay Foad for pointing this out!

llvm-svn: 92452

fd11f49b

differences between two blockaddresses don't cause a · a7cfc43a
Chris Lattner authored Jan 03, 2010
```
global variable initializer to require relocations.

llvm-svn: 92450
```
a7cfc43a
pull my debug hooks out, I'm done with this xform for now. · 48218e42
Chris Lattner authored Jan 03, 2010
```
llvm-svn: 92446
```
48218e42
Small cleanups, refactor some duplicated code into a single method. No · 475d3d12
Nick Lewycky authored Jan 03, 2010
```
functionality change.

llvm-svn: 92445
```
475d3d12

generalize the previous transformation to handle indexing into · fca0c8f9

Chris Lattner authored Jan 03, 2010

arrays of structs and other arrays, so long as all the subsequent
indexes are constants.  This triggers frequently for stuff like:

@divisions = internal constant [29 x [2 x i32]] [[2 x i32] zeroinitializer, [2 x i32] [i32 0, i32 1], [2 x i32] [i32 0, i32 2], [2 x i32] [i32 0, i32 1], [2 x i32] zeroinitializer, [2 x i32] [i32 0, i32 1], [2 x i32] [i32 0, i32 1], [2 x i32] [i32 0, i32 2], [2 x i32] [i32 0, i32 2], [2 x i32] zeroinitializer, [2 x i32] zeroinitializer, [2 x i32] zeroinitializer, [2 x i32] [i32 0, i32 2], [2 x i32] [i32 0, i32 1], [2 x i32] zeroinitializer, [2 x i32] [i32 1, i32 0], [2 x i32] [i32 1, i32 1], [2 x i32] [i32 1, i32 1], [2 x i32] [i32 1, i32 2], [2 x i32] [i32 1, i32 1], [2 x i32] [i32 1, i32 0], [2 x i32] [i32 1, i32 2], [2 x i32] [i32 1, i32 2], [2 x i32] [i32 1, i32 0], [2 x i32] [i32 1, i32 0], [2 x i32] [i32 1, i32 0], [2 x i32] [i32 1, i32 1], [2 x i32] [i32 1, i32 2], [2 x i32] [i32 1, i32 2]], align 32 ; <[29 x [2 x i32]]*> [#uses=50]

  %623 = getelementptr inbounds [29 x [2 x i32]]* @divisions, i64 0, i64 %619, i64 0 ; <i32*> [#uses=1]
  %684 = icmp eq i32 %683, 999

also for the "my_defs" table in 'gs', etc.

llvm-svn: 92444

fca0c8f9
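
A small C sketch of the kind of access fca0c8f9 describes: a variable index into an array of arrays, followed only by constant indexes, then a compare. The `pairs` table and both function names are hypothetical; the table is merely shaped like `@divisions`, and the sketch only shows that such a compare can be answered from the index alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table, shaped like @divisions but much smaller. */
enum { NUM_PAIRS = 6 };
static const int pairs[NUM_PAIRS][2] = {
    {0, 0}, {0, 1}, {0, 2}, {1, 0}, {1, 1}, {1, 2}
};

/* Original form: variable row index, constant column index, then a compare. */
static bool first_is_one_via_load(unsigned i) {
    assert(i < NUM_PAIRS);
    return pairs[i][0] == 1;
}

/* Folded form: only rows 3..5 hold a 1 in column 0, so no load is needed. */
static bool first_is_one_via_index(unsigned i) {
    assert(i < NUM_PAIRS);
    return i >= 3;
}
```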

Cleanup. · ff9cd7ac
Nick Lewycky authored Jan 03, 2010
```
llvm-svn: 92436
```
ff9cd7ac

Jan 02, 2010

teach instcombine to optimize idioms like A[i]&42 == 0. This · 98ad2b56

Chris Lattner authored Jan 02, 2010

occurs in 403.gcc in mode_mask_array, in safe-ctype.c (which
is copied in multiple apps) in _sch_istable, etc.

llvm-svn: 92427

98ad2b56
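
The masked-lookup idiom from 98ad2b56 can be sketched in C as follows. The `flags` table is hypothetical (it is not `mode_mask_array` or `_sch_istable`), and the precomputed constant `0xD5` is just the set of indexes whose entry has none of the bits in 42 set; treat this as an illustration of the idea rather than the code instcombine emits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 8-entry constant table. */
enum { NUM_FLAGS = 8 };
static const unsigned flags[NUM_FLAGS] = {1, 2, 4, 8, 16, 32, 64, 128};

/* Original form: load the entry, mask it, compare with zero. */
static bool masked_zero_via_load(unsigned i) {
    assert(i < NUM_FLAGS);
    return (flags[i] & 42u) == 0;
}

/* Folded form: bit i of 0xD5 is set exactly when (flags[i] & 42) == 0,
   i.e. for indexes 0, 2, 4, 6 and 7. */
static bool masked_zero_via_index(unsigned i) {
    assert(i < NUM_FLAGS);
    return ((0xD5u >> i) & 1u) != 0;
}
```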

Teach the table lookup optimization to generate range compares · b56bef45

Chris Lattner authored Jan 02, 2010

when a consecutive sequence of elements all satisfies the
predicate.  Like the double compare case, this generates better
code than the magic constant case and generalizes to more than
32/64 element array lookups.

Here are some examples where it triggers.  From 403.gcc, most
accesses to the rtx_class array are handled, e.g.:

@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=547]
   %142 = icmp eq i8 %141, 105
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=543]
	   %165 = icmp eq i8 %164, 60      

Also, most of the 59-element arrays (mode_class/rid_to_yy, etc) 
optimized before are actually range compares.  This lets 32-bit
machines optimize them.

400.perlbmk has stuff like this in PL_regkind, even for 32-bit:
@PL_regkind = constant [62 x i8] c"\00\00\02\02\02\06\06\06\06\09\09\0B\0B\0D\0E\0E\0E\11\12\12\14\14\16\16\18\18\1A\1A\1C\1C\1E\1F !!!$$&'((((,-.///88886789:;8$", align 32 ; <[62 x i8]*> [#uses=4]
	   %811 = icmp ne i8 %810, 33 

@PL_utf8skip = constant [256 x i8] c"\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\04\04\04\04\04\04\04\04\05\05\05\05\06\06\07\0D", align 32 ; <[256 x i8]*> [#uses=94]
	   %12 = icmp ult i8 %10, 2
           
etc.

llvm-svn: 92426

b56bef45
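
A minimal sketch of the range-compare case from b56bef45, assuming a hypothetical `kind` table in which one consecutive run of entries satisfies the predicate. The unsigned subtract-and-compare is one common way to express a range check with a single comparison; whether the emitted IR uses exactly that shape is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table: entries 3..6 form the only run equal to 7. */
enum { NUM_KINDS = 12 };
static const unsigned char kind[NUM_KINDS] = {0, 0, 0, 7, 7, 7, 7, 0, 0, 0, 0, 0};

/* Original form: load and compare. */
static bool is_seven_via_load(unsigned i) {
    assert(i < NUM_KINDS);
    return kind[i] == 7;
}

/* Folded form: 3 <= i <= 6 as one unsigned compare. For i < 3 the
   subtraction wraps to a large value, so the test is false. */
static bool is_seven_via_range(unsigned i) {
    assert(i < NUM_KINDS);
    return (i - 3u) < 4u;
}
```

Unlike a per-index bitmask, this form does not depend on the table fitting in a 32- or 64-bit constant, which matches the commit's remark that it generalizes to larger arrays.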

theoretically the negate we find could be in a different function, check · e199d2df
Chris Lattner authored Jan 02, 2010
```
for this case.

llvm-svn: 92425
```
e199d2df
use enums for the over/underdefined markers for clarity. Switch · 2fa4ec70
Chris Lattner authored Jan 02, 2010
```
to using -2/-3 instead of -1/-2 for a future xform.

llvm-svn: 92423
```
2fa4ec70
remove the random sampling framework, which is not maintained anymore. · 351e22aa
Chris Lattner authored Jan 02, 2010
```
If there is interest, it can be resurrected from SVN.  PR4912.

llvm-svn: 92422
```
351e22aa
Fix logic error in previous commit. The != case needs to become an or, not an · a67519be
Nick Lewycky authored Jan 02, 2010
```
and.

llvm-svn: 92419
```
a67519be
Optimize pointer comparison into the typesafe form, now that the backends will · 357d41b3
Nick Lewycky authored Jan 02, 2010
```
handle them efficiently. This is the opposite direction of the transformation
we used to have here.

llvm-svn: 92418
```
357d41b3

Generalize the previous xform to handle cases where exactly · cfda435c

Chris Lattner authored Jan 02, 2010

two elements match or don't match with two comparisons.  For
example, the testcase compiles into:

define i1 @test5(i32 %X) {
  %1 = icmp eq i32 %X, 2                          ; <i1> [#uses=1]
  %2 = icmp eq i32 %X, 7                          ; <i1> [#uses=1]
  %R = or i1 %1, %2                               ; <i1> [#uses=1]
  ret i1 %R
}

This generalizes the previous xforms when the array is larger than
64 elements (and this case matches) and generates better code for
cases where it overlaps with the magic bitshift case.

This generalizes more cases than you might expect.  For example,
400.perlbmk has:

@PL_utf8skip = constant [256 x i8] c"\01\01\01\...
%15 = icmp ult i8 %7, 7

403.gcc has:
@rid_to_yy = internal constant [114 x i16] [i16 259, i16 260, ...
%18 = icmp eq i16 %16, 295 

and xalancbmk has a bunch of examples, such as 
_ZN11xercesc_2_5L15gCombiningCharsE and _ZN11xercesc_2_5L10gBaseCharsE.

llvm-svn: 92417

cfda435c
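
To make the two-comparison case from cfda435c concrete, here is a C sketch with a hypothetical table in which exactly entries 2 and 7 match, mirroring the indexes in the `@test5` output above. The table contents and function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table: only entries 2 and 7 equal 1. */
enum { TABLE_LEN = 10 };
static const int table[TABLE_LEN] = {5, 5, 1, 5, 5, 5, 5, 1, 5, 5};

/* Original form: load and compare. */
static bool is_one_via_load(unsigned i) {
    assert(i < TABLE_LEN);
    return table[i] == 1;
}

/* Folded form: two index compares joined by an or. */
static bool is_one_via_compares(unsigned i) {
    assert(i < TABLE_LEN);
    return (i == 2) | (i == 7);
}
```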

fix a miscompilation I introduced of cdecl with a late change. · c6ac0784
Chris Lattner authored Jan 02, 2010
```
llvm-svn: 92416
```
c6ac0784

enhance the compare/load/index optimization to work on *any* load · 935a4a60

Chris Lattner authored Jan 02, 2010

from a global with 32/64 elements or less (depending on whether
i64 is native on the target), generating a bitshift idiom to 
determine the result.  For example, on test4 we produce:

define i1 @test4(i32 %X) {
  %1 = lshr i32 933, %X                           ; <i32> [#uses=1]
  %2 = and i32 %1, 1                              ; <i32> [#uses=1]
  %R = icmp ne i32 %2, 0                          ; <i1> [#uses=1]
  ret i1 %R
}

This triggers in a number of interesting cases, for example, here's an
fp case:
@A.3255 = internal constant [4 x double] [double 4.100000e+00, double -3.900000e+00, double -1.000000e+00, double 1.000000e+00], align 32 ; <[4 x double]*> [#uses=7]
...
	   %7 = fcmp olt double %3, 0.000000e+00

In this case we make the slen2_tab global dead, which is nice:
@slen2_tab = internal constant [16 x i32] [i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 1, i32 2, i32 3, i32 1, i32 2, i32 3, i32 2, i32 3], align 32 ; <[16 x i32]*> [#uses=1]
...
	   %204 = icmp eq i32 %46, 0     

Perl has a bunch of these, also on the 'Perl_regkind' array:
@Perl_yygindex = internal constant [51 x i16] [i16 0, i16 0, i16 0, i16 0, i16 374, i16 351, i16 0, i16 -12, i16 0, i16 946, i16 413, i16 -83, i16 0, i16 0, i16 0, i16 -311, i16 -13, i16 4007, i16 2893, i16 0, i16 0, i16 0, i16 0, i16 0, i16 372, i16 -8, i16 0, i16 0, i16 246, i16 -131, i16 43, i16 86, i16 208, i16 -45, i16 -169, i16 987, i16 0, i16 0, i16 0, i16 0, i16 308, i16 0, i16 -271, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0], align 32 ; <[51 x i16]*> [#uses=1]
...
  %1364 = icmp eq i16 %1361, 0

186.crafty really likes this on 64-bit machines, because it triggers on a bunch of globals like this:
@white_outpost = internal constant [64 x i8] c"\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\02\02\00\00\00\00\00\04\05\05\04\00\00\00\00\03\06\06\03\00\00\00\00\00\01\01\00\00\00\00\00\00\00\00\00\00\00", align 32 ; <[64 x i8]*> [#uses=2]

However the big winner is 403.gcc, which triggers hundreds of times, eliminating all the accesses to the 57-element arrays 'mode_class', mode_unit_size, mode_bitsize, regclass_map, etc.

go 64-bit machines :)

llvm-svn: 92415

935a4a60
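
A C sketch of the bitshift idiom from 935a4a60, using the `slen2_tab` contents and the `== 0` predicate quoted in the commit message. The helper names are hypothetical. The mask has bit `i` set when entry `i` satisfies the predicate, so the lookup becomes a shift and a bit test, in the same shape as the `lshr`/`and`/`icmp ne` sequence shown for `@test4`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SLEN2_LEN = 16 };
static const int slen2_tab[SLEN2_LEN] = {0, 1, 2, 3, 0, 1, 2, 3,
                                         1, 2, 3, 1, 2, 3, 2, 3};

/* Build the "magic constant": bit i is set when slen2_tab[i] == 0. */
static uint32_t build_mask(void) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < SLEN2_LEN; ++i)
        if (slen2_tab[i] == 0)
            mask |= UINT32_C(1) << i;
    return mask;
}

/* Original form: load and compare. */
static bool is_zero_via_load(unsigned i) {
    assert(i < SLEN2_LEN);
    return slen2_tab[i] == 0;
}

/* Folded form: shift the constant right by the index and test bit 0. */
static bool is_zero_via_shift(unsigned i) {
    assert(i < SLEN2_LEN);
    return ((build_mask() >> i) & 1u) != 0;
}
```

In a compiler the mask is computed once at compile time; `build_mask` recomputes it here only to keep the sketch self-checking. The table must have at most as many entries as the constant has bits, which is the 32/64-element limit the commit mentions.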

enhance the previous optimization to work with fcmp in addition · b1567bd5
Chris Lattner authored Jan 02, 2010
```
to icmp.

llvm-svn: 92412
```
b1567bd5

Teach instcombine to fold compares of loads from constant · a061859c

Chris Lattner authored Jan 02, 2010

arrays with variable indices into a comparison of the index
with a constant.  The most common occurrence of this that
I see by far is stuff like:

if ("foobar"[i] == '\0') ...

which we compile into: if (i == 6), saving a load and 
materialization of the global address.  This also exposes 
loop trip count information to later passes in many cases.

This triggers hundreds of times in xalancbmk, which is where I first
noticed it, but it also triggers in many other apps.  Here are a few 
interesting ones from various apps:

@must_be_connected_without = internal constant [8 x i8*] [i8* getelementptr inbounds ([3 x i8]* @.str64320, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str27283, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str71327, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str72328, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str18274, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8]* @.str11267, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str32288, i64 0, i64 0), i8* null], align 32 ; <[8 x i8*]*> [#uses=2]
  %scevgep.i = getelementptr [8 x i8*]* @must_be_connected_without, i64 0, i64 %indvar.i ; <i8**> [#uses=1]
  %17 = load ...
  %18 = icmp eq i8* %17, null                     ; <i1> [#uses=1]
-> icmp eq i64 %indvar.i, 7 


@yytable1095 = internal constant [84 x i8] c"\12\01(\05\06\07\08\09\0A\0B\0C\0D\0E1\0F\10\11266\1D: \10\11,-,0\03'\10\11B6\04\17&\18\1945\05\06\07\08\09\0A\0B\0C\0D\0E\1E\0F\10\11*\1A\1B\1C$3+>#%;<IJ=ADFEGH9KL\00\00\00C", align 32 ; <[84 x i8]*> [#uses=2]
  %57 = getelementptr inbounds [84 x i8]* @yytable1095, i64 0, i64 %56 ; <i8*> [#uses=1]
   %mode.0.in = getelementptr inbounds [9 x i32]* @mb_mode_table, i64 0, i64 %.pn ; <i32*> [#uses=1]
load ...
   %64 = icmp eq i8 %58, 4                         ; <i1> [#uses=1]
-> icmp eq i64 %.pn, 35             ; <i1> [#uses=0]


@gsm_DLB = internal constant [4 x i16] [i16 6554, i16 16384, i16 26214, i16 32767]
%scevgep.i = getelementptr [4 x i16]* @gsm_DLB, i64 0, i64 %indvar.i ; <i16*> [#uses=1]
%425 = load %scevgep.i
%426 = icmp eq i16 %425, -32768                 ; <i1> [#uses=0]
-> false

llvm-svn: 92411

a061859c
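
A minimal C sketch of the fold described in a061859c, using the commit's own "foobar" example. The function names are hypothetical; the sketch only illustrates that the load-and-compare and the plain index compare agree for every in-bounds index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original form: load a byte from the constant string, then compare. */
static bool is_nul_via_load(size_t i) {
    assert(i <= 6); /* "foobar" occupies 7 bytes including the NUL */
    return "foobar"[i] == '\0';
}

/* Folded form: the NUL sits at index 6, so compare the index directly. */
static bool is_nul_via_index(size_t i) {
    assert(i <= 6);
    return i == 6;
}
```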

constant fold nasty constant expressions formed by llvm-gcc, · 8fb74c6e
Chris Lattner authored Jan 02, 2010
```
wrapping up PR3351.

llvm-svn: 92410
```
8fb74c6e

remove the instcombine transformations that are inserting nasty · 2e4be2c3

Chris Lattner authored Jan 02, 2010

pointer to int casts that confuse later optimizations.  See PR3351
for details.

This improves but doesn't completely fix 483.xalancbmk because llvm-gcc
does this xform in GCC's "fold" routine as well.  Clang++ will do
better I guess.

llvm-svn: 92408

2e4be2c3

Teach codegen to handle: · 1eea3b0a

Chris Lattner authored Jan 02, 2010

 (X != null) | (Y != null) --> (X|Y) != 0
 (X == null) & (Y == null) --> (X|Y) == 0

so that instcombine can stop doing this for pointers.  This is part of PR3351,
which is a case where instcombine doing this for pointers (inserting ptrtoint)
is pessimizing code.

llvm-svn: 92406

1eea3b0a
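
The two identities in 1eea3b0a can be checked with a short C sketch. It assumes a platform where a null pointer converts to the integer 0 through `uintptr_t`, which holds on mainstream targets but is not guaranteed by the C standard. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* (X != null) | (Y != null), written with two pointer compares. */
static bool either_nonnull(const void *x, const void *y) {
    return (x != NULL) | (y != NULL);
}

/* Merged form: (X|Y) != 0, one compare on the or of the addresses. */
static bool either_nonnull_merged(const void *x, const void *y) {
    return ((uintptr_t)x | (uintptr_t)y) != 0;
}

/* (X == null) & (Y == null), written with two pointer compares. */
static bool both_null(const void *x, const void *y) {
    return (x == NULL) & (y == NULL);
}

/* Merged form: (X|Y) == 0. */
static bool both_null_merged(const void *x, const void *y) {
    return ((uintptr_t)x | (uintptr_t)y) == 0;
}

/* Usage: confirm the two-compare and merged forms agree for one pair. */
static void check_pair(const void *x, const void *y) {
    assert(either_nonnull(x, y) == either_nonnull_merged(x, y));
    assert(both_null(x, y) == both_null_merged(x, y));
}
```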

whitespace cleanup · 24576a5c
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92404
```
24576a5c
add a simple instcombine xform, simplify another one to use hasAllZeroIndices() · faf1337a
Chris Lattner authored Jan 01, 2010
```
instead of hand rolling a loop.

llvm-svn: 92403
```
faf1337a

Jan 01, 2010

generalize the pointer difference optimization to handle · 30c0a283

Chris Lattner authored Jan 01, 2010

a constantexpr gep on the 'base' side of the expression.
This completes comment #4 in PR3351, which comes from
483.xalancbmk.

llvm-svn: 92402

30c0a283

teach instcombine to optimize pointer difference idioms involving constant · 4394f717
Chris Lattner authored Jan 01, 2010
```
expressions.  This is a step towards comment #4 in PR3351.

llvm-svn: 92401
```
4394f717
use 'match' to simplify some code. · 9d4c5414
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92400
```
9d4c5414
implement the transform requested in PR5284 · 25c87e9c
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92398
```
25c87e9c

Fix a warning on gcc 4.4. · 5c35d2f6

Mikhail Glushenkov authored Jan 01, 2010

SelectionDAGBuilder.cpp:4294: warning: suggest explicit braces to avoid
ambiguous ‘else’

llvm-svn: 92395

5c35d2f6

Trailing whitespace, 80-col violations. · 2abe1b70
Mikhail Glushenkov authored Jan 01, 2010
```
llvm-svn: 92394
```
2abe1b70

Teach codegen to lower llvm.powi to an efficient (but not optimal) · 39f18e54

Chris Lattner authored Jan 01, 2010

multiply sequence when the power is a constant integer.  Before, our
codegen for std::pow(.., int) always turned into a libcall, which was
really inefficient.

This should also make many gfortran programs happier I'd imagine.

llvm-svn: 92388

39f18e54
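
One scheme that fits the "efficient (but not optimal)" description in 39f18e54 is square-and-multiply; that the commit uses exactly this scheme is an assumption. The hypothetical `powi_mul` below shows the idea in C: the number of multiplies grows with the bit length of the exponent, though for some exponents (15 is a classic case) a shorter multiply chain exists.

```c
#include <assert.h>

/* Compute x^n with repeated squaring; negative n yields the reciprocal. */
static double powi_mul(double x, int n) {
    /* Take |n| in unsigned arithmetic so that INT_MIN does not overflow. */
    unsigned un = n < 0 ? 0u - (unsigned)n : (unsigned)n;
    double result = 1.0;
    double base = x;
    while (un != 0) {
        if (un & 1u)
            result *= base; /* this bit of the exponent is set */
        base *= base;       /* x, x^2, x^4, x^8, ... */
        un >>= 1;
    }
    return n < 0 ? 1.0 / result : result;
}

/* Usage: small exact cases. */
static void powi_examples(void) {
    assert(powi_mul(2.0, 10) == 1024.0);
    assert(powi_mul(3.0, 0) == 1.0);
}
```

When the exponent is a compile-time constant, the loop unrolls completely into a fixed sequence of multiplies, which is what replaces the libcall.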

add missing line. · ee1f861d
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92384
```
ee1f861d
add a few trivial instcombines for llvm.powi. · 8330daf7
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92383
```
8330daf7
update this. To take the next step, llvm.powi should be generalized to work · 71cf7c25
Chris Lattner authored Jan 01, 2010
```
on integers as well and codegen should lower them to branch trees.

llvm-svn: 92382
```
71cf7c25

When factoring multiply expressions across adds, factor both · 0c59ac3f

Chris Lattner authored Jan 01, 2010

positive and negative forms of constants together.  This 
allows us to compile:

int foo(int x, int y) {
    return (x-y) + (x-y) + (x-y);
}

into:

_foo:                                                       ## @foo
	subl	%esi, %edi
	leal	(%rdi,%rdi,2), %eax
	ret

instead of (where the 3 and -3 were not factored):

_foo:
        imull   $-3, 8(%esp), %ecx
        imull   $3, 4(%esp), %eax
        addl    %ecx, %eax
        ret

this started out as:
    movl    12(%ebp), %ecx
    imull   $3, 8(%ebp), %eax
    subl    %ecx, %eax
    subl    %ecx, %eax
    subl    %ecx, %eax
    ret

This comes from PR5359.

llvm-svn: 92381

0c59ac3f
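
The effect of the factoring in 0c59ac3f can be mirrored at the source level. `foo` is the function from the commit message; `foo_factored` is a hypothetical hand-written equivalent that follows the improved assembly (one subtract, then `d + d*2` as in the `leal`). Inputs are assumed small enough that the signed arithmetic does not overflow.

```c
#include <assert.h>

/* The function from the commit message. */
static int foo(int x, int y) {
    return (x - y) + (x - y) + (x - y);
}

/* Factored form: compute x - y once, then multiply by 3 as d + d*2. */
static int foo_factored(int x, int y) {
    int d = x - y;
    return d + d * 2;
}

/* Usage: the two forms agree on a given input. */
static void check_foo(int x, int y) {
    assert(foo(x, y) == foo_factored(x, y));
}
```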

clean up some comments. · a552683f
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92377
```
a552683f
switch from std::map to DenseMap for rank data structures. · 17229a7c
Chris Lattner authored Jan 01, 2010
```
llvm-svn: 92375
```
17229a7c
Remove derelict serialization code. · 2fdca4b7
Ted Kremenek authored Dec 31, 2009
```
llvm-svn: 92374
```
2fdca4b7

Dec 31, 2009

reuse negates where possible instead of always creating them from scratch. · fed33976

Chris Lattner authored Dec 31, 2009

This allows us to optimize test12 into:

define i32 @test12(i32 %X) {
  %factor = mul i32 %X, -3                        ; <i32> [#uses=1]
  %Z = add i32 %factor, 6                         ; <i32> [#uses=1]
  ret i32 %Z
}

instead of:

define i32 @test12(i32 %X) {
  %Y = sub i32 6, %X                              ; <i32> [#uses=1]
  %C = sub i32 %Y, %X                             ; <i32> [#uses=1]
  %Z = sub i32 %C, %X                             ; <i32> [#uses=1]
  ret i32 %Z
}

llvm-svn: 92373

fed33976
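
The two `@test12` bodies above compute the same value, which a short C sketch can confirm. It uses `uint32_t` so that arithmetic wraps modulo 2^32, matching plain `i32` `sub`/`mul`/`add`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Before: three dependent subtractions, as in the second listing. */
static uint32_t test12_subs(uint32_t x) {
    uint32_t y = 6u - x;
    uint32_t c = y - x;
    return c - x;
}

/* After: one multiply by -3 and one add, as in the first listing. */
static uint32_t test12_factored(uint32_t x) {
    uint32_t factor = x * (uint32_t)-3;
    return factor + 6u;
}

/* Usage: the two forms agree, including when the result wraps. */
static void check_test12(uint32_t x) {
    assert(test12_subs(x) == test12_factored(x));
}
```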

we don't need a smallptrset to detect duplicates, the values are · 60c2ca74
Chris Lattner authored Dec 31, 2009
```
sorted, so we can just do a linear scan.

llvm-svn: 92372
```
60c2ca74
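
The observation in 60c2ca74 is that equal values are adjacent once the input is sorted, so comparing neighbours is enough and no set is needed. A minimal C sketch, with a hypothetical function name and plain `int` values standing in for whatever the pass actually scans:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the sorted array v[0..n) contains a repeated value. */
static bool has_duplicates_sorted(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; ++i)
        if (v[i - 1] == v[i])
            return true; /* equal neighbours imply a duplicate */
    return false;
}
```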