  1. Apr 22, 2013
    • Revert "Revert "PR14606: debug info imported_module support"" · f55abeaf
      David Blaikie authored
      This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
      
      I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
      though the debug info was clearly invalid on all of them, but this ought to fix
      it.
      
      llvm-svn: 179996
    • Legalize vector truncates by parts rather than just splitting. · 563983c8
      Jim Grosbach authored
      Rather than just splitting the input type and hoping for the best, apply
      a bit more cleverness. Just splitting the types until the source is
      legal often leads to an illegal result type, which is then widened and a
      scalarization step is introduced, leading to truly horrible code
      generation. With the loop vectorizer, these sorts of operations are much
      more common, so it's worth the extra effort to do them well.
      
      Add a legalization hook for the operands of a TRUNCATE node, which will
      be encountered after the result type has been legalized, but if the
      operand type is still illegal. If simple splitting of both types
      ends up with the result type of each half still being legal, just
      do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
      result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
      we can get more clever with power-of-two vectors. Specifically,
      split the input type, but also widen the result element size, then
      concatenate the halves and truncate again. For example, on ARM,
      to perform "%res = v8i8 trunc v8i32 %in" we transform it to:
        %inlo = v4i32 extract_subvector %in, 0
        %inhi = v4i32 extract_subvector %in, 4
        %lo16 = v4i16 trunc v4i32 %inlo
        %hi16 = v4i16 trunc v4i32 %inhi
        %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
        %res = v8i8 trunc v8i16 %in16
      
      This allows instruction selection to generate three VMOVN instructions
      instead of a sequence of moves, stores, and loads.
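
      As a standalone illustration of the semantics (a minimal C++ sketch of
      the element-wise behavior, not LLVM API code), the three narrowing
      steps correspond to:

        // Element-wise model of the v8i32 -> v8i8 truncate decomposition:
        // two half-width narrows (the VMOVN.I32 pair) followed by one
        // narrow of the concatenated halves (the VMOVN.I16).
        #include <cstdint>
        #include <cstdio>

        int main() {
          uint32_t in[8] = {1, 2, 3, 4, 300, 400, 500, 70000};

          uint16_t in16[8];                    // %in16 = concat(%lo16, %hi16)
          for (int i = 0; i < 4; ++i) {
            in16[i]     = (uint16_t)in[i];     // %lo16 = trunc v4i32 %inlo
            in16[i + 4] = (uint16_t)in[i + 4]; // %hi16 = trunc v4i32 %inhi
          }

          uint8_t res[8];
          for (int i = 0; i < 8; ++i)
            res[i] = (uint8_t)in16[i];         // %res = trunc v8i16 %in16

          for (int i = 0; i < 8; ++i)
            std::printf("%u ", res[i]);
          std::printf("\n");
          return 0;
        }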
      
      Update the ARMTargetTransformInfo to take this improved legalization
      into account.
      
      Consider the simplified IR:
      
      define <16 x i8> @test1(<16 x i32>* %ap) {
        %a = load <16 x i32>* %ap
        %tmp = trunc <16 x i32> %a to <16 x i8>
        ret <16 x i8> %tmp
      }
      
      define <8 x i8> @test2(<8 x i32>* %ap) {
        %a = load <8 x i32>* %ap
        %tmp = trunc <8 x i32> %a to <8 x i8>
        ret <8 x i8> %tmp
      }
      
      Previously, we would generate the truly hideous:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #20
      	bic	sp, sp, #7
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d24, d25}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	vld1.64	{d18, d19}, [r2:128]
      	add	r1, r0, #16
      	vmovn.i32	d22, q8
      	vld1.64	{d16, d17}, [r1:128]
      	vmovn.i32	d20, q9
      	vmovn.i32	d18, q12
      	vmov.u16	r0, d22[3]
      	strb	r0, [sp, #15]
      	vmov.u16	r0, d22[2]
      	strb	r0, [sp, #14]
      	vmov.u16	r0, d22[1]
      	strb	r0, [sp, #13]
      	vmov.u16	r0, d22[0]
      	vmovn.i32	d16, q8
      	strb	r0, [sp, #12]
      	vmov.u16	r0, d20[3]
      	strb	r0, [sp, #11]
      	vmov.u16	r0, d20[2]
      	strb	r0, [sp, #10]
      	vmov.u16	r0, d20[1]
      	strb	r0, [sp, #9]
      	vmov.u16	r0, d20[0]
      	strb	r0, [sp, #8]
      	vmov.u16	r0, d18[3]
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	vldmia	sp, {d16, d17}
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #12
      	bic	sp, sp, #7
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d20, d21}, [r0:128]
      	vmovn.i32	d18, q8
      	vmov.u16	r0, d18[3]
      	vmovn.i32	d16, q10
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	ldm	sp, {r0, r1}
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      Now, however, we generate the much more straightforward:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d20, d21}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	add	r1, r0, #16
      	vld1.64	{d18, d19}, [r2:128]
      	vld1.64	{d22, d23}, [r1:128]
      	vmovn.i32	d17, q8
      	vmovn.i32	d16, q9
      	vmovn.i32	d18, q10
      	vmovn.i32	d19, q11
      	vmovn.i16	d17, q8
      	vmovn.i16	d16, q9
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d18, d19}, [r0:128]
      	vmovn.i32	d16, q8
      	vmovn.i32	d17, q9
      	vmovn.i16	d16, q8
      	vmov	r0, r1, d16
      	bx	lr
      
      llvm-svn: 179989
  2. Apr 11, 2013
    • Add braces around || in && to pacify GCC. · e7c45bc6
      Benjamin Kramer authored
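
      The warning being silenced is GCC's -Wparentheses; a minimal
      reproduction (hypothetical code, not the actual LLVM expression):

        // gcc -Wparentheses: "suggest parentheses around '&&' within '||'",
        // because the operator precedence here is easy to misread.
        bool before(bool a, bool b, bool c) { return a || b && c; }

        // Explicit grouping states the intent and silences the warning.
        bool after(bool a, bool b, bool c) { return a || (b && c); }
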
      llvm-svn: 179275
    • Manually remove successors in if conversion when CopyAndPredicateBlock is used · 95081bff
      Hal Finkel authored
      In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is
      used because the to-be-predicated block has other predecessors, we need to
      explicitly remove the old copied block from the successors list. Normally,
      if conversion relies on TII->AnalyzeBranch combined with
      BB->CorrectExtraCFGEdges to clean up the successors list, but if the
      predicated block contained an
      un-analyzable branch (such as a now-predicated return), then this will fail.
      
      These extra successors were causing a problem on PPC because they caused
      later passes (such as PPCEarlyReturn) to leave dead return-only basic
      blocks in the code.
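
      To see why the stale edge matters, here is a toy model in plain C++
      (simple adjacency lists, not the MachineBasicBlock API): a block that
      is still on a predecessor's successor list looks reachable, so a later
      cleanup will not delete it even though it is dead.

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        int main() {
          // Toy CFG: block 0 conditionally branched to block 1, whose body
          // has since been copied into block 0 and predicated.
          std::vector<std::vector<int>> succs = {{1, 2}, {2}, {}};

          auto hasEdge = [&](int from, int to) {
            const auto &s = succs[from];
            return std::find(s.begin(), s.end(), to) != s.end();
          };
          std::printf("0 -> 1: %d\n", hasEdge(0, 1)); // 1: stale edge survives

          // The fix: explicitly drop the copied block from the successor
          // list, since branch analysis cannot do it for an un-analyzable
          // (now-predicated) branch.
          auto &s0 = succs[0];
          s0.erase(std::remove(s0.begin(), s0.end(), 1), s0.end());
          std::printf("0 -> 1: %d\n", hasEdge(0, 1)); // 0: block 1 is dead
          return 0;
        }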
      
      llvm-svn: 179227
  3. Apr 10, 2013
    • Generalize the PassConfig API and remove addFinalizeRegAlloc(). · e220323c
      Andrew Trick authored
      The target hooks are getting out of hand. What does it mean to run
      before or after regalloc anyway? Allowing either Pass* or AnalysisID
      pass identification should make it much easier for targets to use the
      substitutePass and insertPass APIs, and create less need for badly
      named target hooks.
      
      llvm-svn: 179140
  4. Apr 06, 2013
    • typo · c4bd84c1
      Nadav Rotem authored
      llvm-svn: 178949
    • Dwarf: use utostr on CUID to append to SmallString. · 5b22f9fe
      Manman Ren authored
      We used to do "SmallString += CUID", which is incorrect, since CUID will
      be truncated to a char.
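
      A minimal std::string analogue of the pitfall (illustrative only; the
      SmallString append selected the char overload in the same way):

        #include <cstdio>
        #include <string>

        int main() {
          unsigned CUID = 65;
          std::string Name = "cu_";
          Name += CUID;                        // operator+=(char): appends 'A'
          std::printf("%s\n", Name.c_str());   // "cu_A", not "cu_65"

          Name = "cu_" + std::to_string(CUID); // the utostr-style fix
          std::printf("%s\n", Name.c_str());   // "cu_65"
          return 0;
        }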
      
      rdar://problem/13573833
      
      llvm-svn: 178941
    • Reapply r178845 with fix - Fix bug in PEI's virtual-register scavenging · 3005c299
      Hal Finkel authored
      This fixes PEI as previously described, but correctly handles the case where
      the instruction defining the virtual register to be scavenged is the first in
      the block. Arnold provided me with a bugpoint-reduced test case, but even that
      seems too large to use as a regression test. If I'm successful in cleaning it
      up then I'll commit that as well.
      
      Original commit message:
      
          This change fixes a bug that I introduced in r178058. After a register is
          scavenged using one of the available spill slots, the instruction defining the
          virtual register needs to be moved to after the spill code. The scavenger has
          already processed the defining instruction so that registers killed by that
          instruction are available for definition in that same instruction. Unfortunately,
          after this, the scavenger needs to iterate through the spill code and then
          visit, again, the instruction that defines the now-scavenged register. In order
          to avoid confusion, the register scavenger needs the ability to 'back up'
          through the spill code so that it can again process the instructions in the
          appropriate order. Prior to this fix, once the scavenger reached the
          just-moved instruction, it would assert if it killed any registers because,
          having already processed the instruction, it believed they were undefined.
      
          Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
          for diagnosing the problem and testing this fix.
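
      As a toy model of the hazard (plain C++, not the RegScavenger API): a
      tracker that has already processed a def asserts if that instruction's
      kills are replayed, unless it first backs up to a consistent state.

        #include <cassert>

        struct Tracker {
          bool live = false;
          void def()  { live = true; }
          void kill() { assert(live && "kill of undefined register"); live = false; }
        };

        int main() {
          Tracker t;
          t.def();   // defining instruction processed once
          t.kill();  // its kills processed

          // The def is moved after newly inserted spill code, so it must be
          // visited again; replaying t.kill() here would assert. The fix is
          // to back up and re-process the instructions in their new order.
          t = Tracker(); // rewind to the pre-def state
          t.def();
          t.kill();      // consistent again
          return 0;
        }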
      
      llvm-svn: 178919
  5. Apr 05, 2013
    • Use the target options specified on a function to reset the back-end. · eb108bad
      Bill Wendling authored
      During LTO, the target options on functions within the same Module may
      change. This would necessitate resetting some of the back-end. Do this for X86,
      because it's a Friday afternoon.
      
      llvm-svn: 178917
    • Revert r178845 - Fix bug in PEI's virtual-register scavenging · 81c46d08
      Hal Finkel authored
      Reverting because this breaks one of the LTO builders. Original commit message:
      
          This change fixes a bug that I introduced in r178058. After a register is
          scavenged using one of the available spill slots, the instruction defining the
          virtual register needs to be moved to after the spill code. The scavenger has
          already processed the defining instruction so that registers killed by that
          instruction are available for definition in that same instruction. Unfortunately,
          after this, the scavenger needs to iterate through the spill code and then
          visit, again, the instruction that defines the now-scavenged register. In order
          to avoid confusion, the register scavenger needs the ability to 'back up'
          through the spill code so that it can again process the instructions in the
          appropriate order. Prior to this fix, once the scavenger reached the
          just-moved instruction, it would assert if it killed any registers because,
          having already processed the instruction, it believed they were undefined.
      
          Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
          for diagnosing the problem and testing this fix.
      
      llvm-svn: 178916
    • Fix bug in PEI's virtual-register scavenging · e6f48e4e
      Hal Finkel authored
      This change fixes a bug that I introduced in r178058. After a register is
      scavenged using one of the available spill slots, the instruction defining the
      virtual register needs to be moved to after the spill code. The scavenger has
      already processed the defining instruction so that registers killed by that
      instruction are available for definition in that same instruction. Unfortunately,
      after this, the scavenger needs to iterate through the spill code and then
      visit, again, the instruction that defines the now-scavenged register. In order
      to avoid confusion, the register scavenger needs the ability to 'back up'
      through the spill code so that it can again process the instructions in the
      appropriate order. Prior to this fix, once the scavenger reached the
      just-moved instruction, it would assert if it killed any registers because,
      having already processed the instruction, it believed they were undefined.
      
      Unfortunately, I don't yet have a small test case. Thanks to Pranav Bhandarkar
      for diagnosing the problem and testing this fix.
      
      llvm-svn: 178845
    • RegisterPressure heuristics currently require signed comparisons. · 80e66ce0
      Andrew Trick authored
      llvm-svn: 178823
    • Disable DFSResult for ConvergingScheduler. · 96ce3848
      Andrew Trick authored
      For now, just save the compile time since the ConvergingScheduler
      heuristics don't use this analysis. We'll probably enable it later
      after compile-time investigation.
      
      llvm-svn: 178822