Commits · 74f2e46eeff9e5ad001259a74c56a1b4d71776e3 · Roger Ferrer / llvm-epi-0.8

Apr 22, 2013

Clarify that llvm.used can contain aliases. · 74f2e46e

Rafael Espindola authored Apr 22, 2013

Also add a check for llvm.used in the verifier and simplify clients now that
they can assume they have a ConstantArray.

llvm-svn: 180019

No really, don't store anything to this since it's unconditionally set below. · cc2cfe42
Eric Christopher authored Apr 22, 2013

llvm-svn: 180015

Remove variable store that is never read. · 6647fb2c
Eric Christopher authored Apr 22, 2013

llvm-svn: 180014

Remove variable store that is never read. · 845c2ca7
Eric Christopher authored Apr 22, 2013

llvm-svn: 180013

Fix for 5.5 Parameter Passing --> Stage C: · f80f9513

Stepan Dyatkovskiy authored Apr 22, 2013

 -- C.4 and C.5 statements, when NSAA is not equal to SP.
 -- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
    variadic procedure.

Before this patch, "NSAA != 0" meant "don't use GPRs anymore". But there are
some exceptions in AAPCS.
1. For non-VA functions: allocate VFP regs for CPRCs. Once all VFP regs are
   allocated, CPRCs are sent to the stack, while non-CPRCs may still be
   allocated in GPRs.
2. For VA functions, check that all params use GPRs and then the stack.
   No exceptions, no CPRCs here.
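To make the two rules above concrete, here is a toy C++ model of Stage C. It is a sketch under heavy simplifying assumptions, not LLVM's ARM calling-convention code: every argument is 4 bytes, the only CPRC is a single-precision float (s0-s15 under the VFP variant), and double-precision back-filling, C.5 register/stack splitting and alignment are ignored. The names `ArgKind` and `assignArgs` are hypothetical. It shows that once the VFP registers run out, a CPRC goes to the stack while later integer arguments can still take core registers, and that a variadic call uses core registers and then the stack for everything.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model: each argument is 4 bytes, either an int or a
// single-precision float (a CPRC only in non-variadic calls).
enum class ArgKind { Int, Float };

// Returns one location per argument: "r0".."r3", "s0".."s15", or "stack+N".
std::vector<std::string> assignArgs(const std::vector<ArgKind>& args,
                                    bool isVariadic) {
  std::vector<std::string> locs;
  int ncrn = 0;              // next core register number (r0-r3)
  int nextVfp = 0;           // next single-precision VFP register (s0-s15)
  bool vfpExhausted = false; // set once a CPRC has been sent to the stack
  int nsaa = 0;              // next stacked argument offset, relative to SP

  for (ArgKind k : args) {
    // There are no VFP CPRCs in a variadic procedure.
    bool isCprc = (k == ArgKind::Float) && !isVariadic;
    if (isCprc) {
      if (!vfpExhausted && nextVfp < 16) {
        locs.push_back("s" + std::to_string(nextVfp++));
      } else {
        // No more VFP registers: the CPRC goes to the stack, and VFP
        // registers stay unavailable for the remaining arguments.
        vfpExhausted = true;
        locs.push_back("stack+" + std::to_string(nsaa));
        nsaa += 4;
      }
      continue;
    }
    // Non-CPRC: core registers remain usable even when NSAA != SP.
    if (ncrn < 4) {
      locs.push_back("r" + std::to_string(ncrn++));
    } else {
      locs.push_back("stack+" + std::to_string(nsaa));
      nsaa += 4;
    }
  }
  return locs;
}
```

For example, in a non-variadic call with seventeen floats followed by an int, the seventeenth float lands at `stack+0` but the int still gets `r0`.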

llvm-svn: 180011

Tidy. · 44c6aa67
Eric Christopher authored Apr 22, 2013

llvm-svn: 180000

Update comment. Whitespace. · 25e3509c
Eric Christopher authored Apr 22, 2013

llvm-svn: 179999

Revert "Revert "PR14606: debug info imported_module support"" · f55abeaf

David Blaikie authored Apr 22, 2013

This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996

Convert windows line endings to linux/unix line endings. · 7af39d7d
Craig Topper authored Apr 22, 2013

llvm-svn: 179995

Fix indentation. No functional change. · 2172ad64
Craig Topper authored Apr 22, 2013

llvm-svn: 179994

Put 'else' on same line as preceding curly brace per coding standards. No functional change. · f15655b2
Craig Topper authored Apr 22, 2013

llvm-svn: 179993

Remove an unreachable 'break' following a 'return'. · b5ba3d3b
Craig Topper authored Apr 22, 2013

llvm-svn: 179991

Legalize vector truncates by parts rather than just splitting. · 563983c8

Jim Grosbach authored Apr 21, 2013

Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result type, which is then widened and a
scalarization step is introduced, which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-of-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example, on ARM,
to perform "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequence of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.
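The split-widen-concatenate lowering above is only valid because truncation composes: narrowing each lane from 32 to 16 bits and then from 16 to 8 bits keeps the same low 8 bits as a direct 32-to-8 truncation, and concatenating the low half before the high half preserves lane order. The following C++ sketch models the v8i32 -> v8i8 case with plain arrays, one element per vector lane. It illustrates the arithmetic only, not the SelectionDAG legalizer, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models "%res = v8i8 trunc v8i32 %in" as one lane-wise truncation.
std::array<std::uint8_t, 8> truncDirect(const std::array<std::uint32_t, 8>& in) {
  std::array<std::uint8_t, 8> out{};
  for (std::size_t i = 0; i < 8; ++i)
    out[i] = static_cast<std::uint8_t>(in[i]);
  return out;
}

// Models the by-parts sequence: split, trunc each half to i16, concat, trunc to i8.
std::array<std::uint8_t, 8> truncByParts(const std::array<std::uint32_t, 8>& in) {
  std::array<std::uint16_t, 8> in16{};
  for (std::size_t i = 0; i < 4; ++i) {
    in16[i] = static_cast<std::uint16_t>(in[i]);          // %lo16 from %inlo
    in16[i + 4] = static_cast<std::uint16_t>(in[i + 4]);  // %hi16 from %inhi
  }
  // in16 now holds concat_vectors(%lo16, %hi16).
  std::array<std::uint8_t, 8> out{};
  for (std::size_t i = 0; i < 8; ++i)
    out[i] = static_cast<std::uint8_t>(in16[i]);          // %res from %in16
  return out;
}
```

On ARM each of the two i32-to-i16 steps maps to a `vmovn.i32` and the final step to a `vmovn.i16`, which matches the three VMOVNs in the new `_test2` output below.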

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

llvm-svn: 179989

Apr 21, 2013

Passing arguments to varargs functions under the SPARC v9 ABI. · 84ebe25d
Jakob Stoklund Olesen authored Apr 21, 2013

Arguments after the fixed arguments never use the floating point
registers.

llvm-svn: 179987

Tidy up comment grammar. · d4db72db
Jim Grosbach authored Apr 21, 2013

llvm-svn: 179986

Fix the SETHIimm pattern for 64-bit code. · 65d32872
Jakob Stoklund Olesen authored Apr 21, 2013

Don't ignore the high 32 bits of the immediate.

llvm-svn: 179985

SROA: Don't crash on a select with two identical operands. · 0212dc27

Benjamin Kramer authored Apr 21, 2013

This is an edge case that can happen if we modify a chain of multiple selects.
Update all operands in that case and remove the assert. PR15805.

llvm-svn: 179982

Revert "SimplifyCFG: If convert single conditional stores" · 6eb32b31

Arnold Schwaighofer authored Apr 21, 2013

There is the temptation to make this transform dependent on target information,
as it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.

This reverts commit r179957. Original commit message:

"SimplifyCFG: If convert single conditional stores

This transformation will transform a conditional store with a preceding
unconditional store to the same location:

 a[i] = X
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducible below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up."

llvm-svn: 179980

ARM: Use ldrd/strd to spill 64-bit pairs when available. · 798697d6

Tim Northover authored Apr 21, 2013

This allows common sp-offsets to be part of the instruction and is
probably faster on modern CPUs too.

llvm-svn: 179977

SLPVectorize: Add support for vectorization of casts. · c57af326
Nadav Rotem authored Apr 21, 2013

llvm-svn: 179975

SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes with multiple users. · 98ad5f0f
Nadav Rotem authored Apr 21, 2013

We did not terminate the switch case, so we executed the search routine twice.

llvm-svn: 179974

When we strength reduce an objc_retainBlock call to objc_retain, increment... · 3eab2e43

Michael Gottesman authored Apr 21, 2013

When we strength reduce an objc_retainBlock call to objc_retain, increment NumPeeps and make sure that Changed is set to true.

llvm-svn: 179968

Fixed comment typo. · 1e430042
Michael Gottesman authored Apr 21, 2013

llvm-svn: 179967

[objc-arc] Fixed typo in debug message. · df110ac9
Michael Gottesman authored Apr 21, 2013

llvm-svn: 179966

[objc-arc] Fixed comment typo. · cdb7c15c
Michael Gottesman authored Apr 21, 2013

llvm-svn: 179965

[objc-arc] Refactored OptimizeReturns so that it uses continue instead of a... · fb9ece9a
Michael Gottesman authored Apr 21, 2013

[objc-arc] Refactored OptimizeReturns so that it uses continue instead of a large multi-level nested if statement.

llvm-svn: 179964

[objc-arc] Added debug statement saying when we are resetting a sequence's progress. · 01338a44

Michael Gottesman authored Apr 20, 2013

This will make it clearer when we are actually resetting a sequence's progress
vs just changing state. This is an important distinction because the former case
clears any pointers that we are tracking while the latter does not.

llvm-svn: 179963

Compile varargs functions for SPARCv9. · a41f91ea

Jakob Stoklund Olesen authored Apr 20, 2013

With a little help from the frontend, it looks like the standard va_*
intrinsics can do the job.

Also clean up an old bitcast hack in LowerVAARG that dealt with
unaligned double loads. Load SDNodes can specify an alignment now.

Still missing: Calling varargs functions with float arguments.

llvm-svn: 179961

Fix PR15800. Do not try to vectorize vectors and structs. · 8aca44a6
Nadav Rotem authored Apr 20, 2013

llvm-svn: 179960

Apr 20, 2013

SimplifyCFG: If convert single conditional stores · 3546ccf4

Arnold Schwaighofer authored Apr 20, 2013

This transformation will transform a conditional store with a preceding
unconditional store to the same location:

 a[i] = X
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp
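As a source-level analogy of the rewrite above (the real transform works on LLVM IR inside SimplifyCFG, and it was later reverted in r179980), this C++ sketch checks that both forms leave memory in the same state and that the intervening load sees the same value, including when it aliases a[i]. The function names and the `j` index used to model the may-alias load are hypothetical.

```cpp
#include <cassert>

// Before: unconditional store of X, a load that may alias a[i], then a guarded store of Y.
int storeWithBranch(int* a, int i, int j, bool cond, int X, int Y) {
  a[i] = X;
  int loaded = a[j];  // aliases a[i] when j == i
  if (cond)
    a[i] = Y;
  return loaded;
}

// After: the branch becomes a select and the second store is unconditional.
int storeWithSelect(int* a, int i, int j, bool cond, int X, int Y) {
  a[i] = X;
  int loaded = a[j];  // still observes X when j == i
  int tmp = cond ? Y : X;
  a[i] = tmp;         // stores X again when cond is false
  return loaded;
}
```

The equivalence is what makes the rewrite legal; whether it is profitable is the branch-versus-store cost trade-off discussed next.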

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducible below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up.

llvm-svn: 179957

ARM: don't add FrameIndex offset for LDMIA (has no immediate) · d9d4211f

Tim Northover authored Apr 20, 2013

Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.

This removes the invalid 0-offset from the instruction being
produced.

llvm-svn: 179956

AArch64: remove useless comment · 56862bd6
Tim Northover authored Apr 20, 2013

llvm-svn: 179952

Move 'kw_align' case to proper section, reorganize function attribute keyword... · 7577ed57

Stephen Lin authored Apr 20, 2013

Move 'kw_align' case to proper section, reorganize function attribute keyword case statements to be consistent with r179119

llvm-svn: 179948

Remove unused ShouldFoldAtomicFences flag. · 16aba170

Tim Northover authored Apr 20, 2013

I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.

llvm-svn: 179940

Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE. · a2b53390
Tim Northover authored Apr 20, 2013

llvm-svn: 179939

VecUtils: Clean up uses of dyn_cast. · 519b2e30
Benjamin Kramer authored Apr 20, 2013

llvm-svn: 179936

SLPVectorizer: Strength reduce SmallVectors to ArrayRefs. · 4600bcc3
Benjamin Kramer authored Apr 20, 2013

Avoids a couple of copies and allows more flexibility in the clients.

llvm-svn: 179935

SLPVectorizer: Reduce the compile time by eliminating the search for some of... · ce2660d6

Nadav Rotem authored Apr 20, 2013

SLPVectorizer: Reduce the compile time by eliminating the search for some of the more expensive patterns. After this change, we will only check basic arithmetic trees that start at cmpinstr.

llvm-svn: 179933

refactor tryToVectorizePair to a new method that supports vectorization of lists. · 998e035c
Nadav Rotem authored Apr 20, 2013

llvm-svn: 179932

Fix an unused variable warning. · 89038728
Nadav Rotem authored Apr 20, 2013

llvm-svn: 179931