  1. Aug 28, 2010
    • handle the constant case of vector insertion. For something · d0214f3e
      Chris Lattner authored
      like this:
      
      struct S { float A, B, C, D; };
      
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.B;
        A.A = 42;
        return A;
      }
      
      we now generate:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	movss	LCPI1_1(%rip), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	unpcklps	%xmm2, %xmm0
      	ret
      
      instead of:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	movd	%xmm2, %eax
      	shlq	$32, %rax
      	addq	$1109917696, %rax       ## imm = 0x42280000
      	movd	%rax, %xmm0
      	ret
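
      In IR terms this extends the bitcast-to-vector rewrite from
      r112343 (the next entry below) to the case where one of the
      pieces feeding the integer is a constant.  A hand-reduced sketch
      of the pattern, with hypothetical function names and assuming
      the little-endian element order of the x86-64 target (0x42280000
      is the bit pattern of 42.0f, matching the imm in the old code):

      define <2 x float> @before(float %b) {
        %bbits = bitcast float %b to i32
        %b64 = zext i32 %bbits to i64
        %bsh = shl i64 %b64, 32
        %pack = or i64 %bsh, 1109917696          ; 0x42280000
        %vec = bitcast i64 %pack to <2 x float>
        ret <2 x float> %vec
      }

      define <2 x float> @after(float %b) {
        %v0 = insertelement <2 x float> undef, float 4.200000e+01, i32 0
        %v1 = insertelement <2 x float> %v0, float %b, i32 1
        ret <2 x float> %v1
      }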
      
      llvm-svn: 112345
    • optimize bitcasts from large integers to vector into vector · dd660104
      Chris Lattner authored
      element insertion from the pieces that feed into the vector.
      This handles a pattern that occurs frequently due to code
      generated for the x86-64 ABI.  We now compile something like
      this:
      
      struct S { float A, B, C, D; };
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.A;
        ++A.C;
        return A;
      }
      
      into all nice vector operations:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	12(%rax), %xmm3
      	pshufd	$16, %xmm2, %xmm2
      	unpcklps	%xmm2, %xmm0
      	addss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	pshufd	$16, %xmm3, %xmm2
      	unpcklps	%xmm2, %xmm1
      	ret
      
      instead of icky integer operations:
      
      _bar:                                   ## @bar
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	movd	%xmm0, %ecx
      	movl	4(%rax), %edx
      	movl	12(%rax), %esi
      	shlq	$32, %rdx
      	addq	%rcx, %rdx
      	movd	%rdx, %xmm0
      	addss	8(%rax), %xmm1
      	movd	%xmm1, %eax
      	shlq	$32, %rsi
      	addq	%rax, %rsi
      	movd	%rsi, %xmm1
      	ret
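
      The pattern being rewritten looks roughly like this at the IR
      level (a hand-reduced sketch with hypothetical function names,
      assuming little-endian element order): SRoA packs the two fields
      into an i64 with shl/or, and the new rule turns the bitcast of
      that i64 into direct element insertions:

      define <2 x float> @before(float %lo, float %hi) {
        %lobits = bitcast float %lo to i32
        %hibits = bitcast float %hi to i32
        %lo64 = zext i32 %lobits to i64
        %hi64 = zext i32 %hibits to i64
        %hish = shl i64 %hi64, 32
        %pack = or i64 %hish, %lo64
        %vec = bitcast i64 %pack to <2 x float>
        ret <2 x float> %vec
      }

      define <2 x float> @after(float %lo, float %hi) {
        %v0 = insertelement <2 x float> undef, float %lo, i32 0
        %v1 = insertelement <2 x float> %v0, float %hi, i32 1
        ret <2 x float> %v1
      }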
      
      This resolves rdar://8360454
      
      llvm-svn: 112343
    • Completely disable tail calls when fast-isel is enabled, as fast-isel · e06905d1
      Dan Gohman authored
      doesn't currently support dealing with this.
      
      llvm-svn: 112341
    • Trim a #include. · 1e06dbf8
      Dan Gohman authored
      llvm-svn: 112340
    • Fix an index calculation thinko. · fe22f1d3
      Dan Gohman authored
      llvm-svn: 112337
    • We don't need to custom-select VLDMQ and VSTMQ anymore. · 8ee93947
      Bob Wilson authored
      llvm-svn: 112336
    • Update CMake build. Add newline at end of file. · 83f9ff04
      Benjamin Kramer authored
      llvm-svn: 112332
    • When merging Thumb2 loads/stores, do not give up when the offset is one of · ca5af129
      Bob Wilson authored
      the special values that for ARM would be used with IB or DA modes.  Fall
      through and consider materializing a new base address if it would be
      profitable.
      
      llvm-svn: 112329
    • Add a prototype of a new peephole optimizing pass that uses LazyValue info to simplify PHIs and selects. · cf7f9411
      Owen Anderson authored
      This pass addresses the missed optimizations from PR2581 and PR4420.
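
      A hypothetical example of the kind of simplification LazyValue
      info enables: on a path guarded by a dominating branch, the known
      range of a value can decide a select that plain instcombine,
      which only looks at the instruction itself, cannot:

      define i32 @clamp(i32 %x) {
      entry:
        %gt = icmp sgt i32 %x, 10
        br i1 %gt, label %then, label %else
      then:
        ; here %x is known to be > 10, hence never negative,
        ; so %neg is false and the select folds to %x
        %neg = icmp slt i32 %x, 0
        %sel = select i1 %neg, i32 0, i32 %x
        ret i32 %sel
      else:
        ret i32 0
      }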
      
      llvm-svn: 112325
    • Improve the precision of getConstant(). · 38f6b7fe
      Owen Anderson authored
      llvm-svn: 112323
    • Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like · 13ce07fa
      Bob Wilson authored
      all the other LDM/STM instructions.  This fixes asm printer crashes when
      compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
      with -O0 to check this in the future.
      
      Prior to this change VLDM/VSTM used addressing mode #5, but not really.
      The offset field was used to hold a count of the number of registers being
      loaded or stored, and the AM5 opcode field was expanded to specify the IA
      or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
      was not aware of these special cases.  The crashes occurred when rewriting
      a frameindex caused the AM5 offset field to be changed so that it did not
      have a valid submode.  I don't know exactly what changed to expose this now.
      Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
      any reason to keep a count of the VLDM/VSTM registers, so we can use
      addressing mode #4 and clean things up in a lot of places.
      
      llvm-svn: 112322
    • Enhance the shift propagator to handle the case when you have: · 6c1395f6
      Chris Lattner authored
      A = shl x, 42
      ...
      B = lshr ..., 38
      
      which can be transformed into:
      A = shl x, 4
      ...
      
      iff we can prove that the would-be-shifted-in bits
      are already zero.  This eliminates two shifts in the testcase
      and allows elimination of the whole i128 chain in the real example.
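
      A concrete (hypothetical) instance: when %x comes from a zext its
      high bits are known zero, so the mismatched shl/lshr pair
      collapses to a single shift by the difference of the amounts:

      define i64 @before(i16 %v) {
        %x = zext i16 %v to i64        ; bits 16..63 of %x are zero
        %a = shl i64 %x, 42
        %b = lshr i64 %a, 38
        ret i64 %b
      }

      define i64 @after(i16 %v) {
        %x = zext i16 %v to i64
        %b = shl i64 %x, 4             ; 42 - 38 = 4
        ret i64 %b
      }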
      
      llvm-svn: 112314
    • Simplify. · f2855b14
      Devang Patel authored
      llvm-svn: 112305
    • Implement a pretty general logical shift propagation · 18d7fc8f
      Chris Lattner authored
      framework, which is good at ripping through bitfield
      operations.  This generalizes a bunch of the existing
      xforms that instcombine does, such as 
        (x << c) >> c -> and
      to handle intermediate logical nodes.  This is useful for
      ripping up the "promote to large integer" code produced by
      SRoA.
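
      For reference, the base case named above in isolation (a minimal
      sketch): with equal shift amounts the pair is exactly a mask of
      the low bits, and the framework applies the same reasoning even
      with intermediate logical nodes between the two shifts:

      define i32 @before(i32 %x) {
        %a = shl i32 %x, 8
        %b = lshr i32 %a, 8
        ret i32 %b
      }

      define i32 @after(i32 %x) {
        %b = and i32 %x, 16777215      ; 0x00FFFFFF: keep the low 24 bits
        ret i32 %b
      }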
      
      llvm-svn: 112304