Commits · b9116e69666de1d6da41af6554beb9ceb2835db8 · Roger Ferrer / llvm-epi-0.8

Apr 16, 2013

SLPVectorizer: Make it a function pass and add code for hoisting the... · b9116e69

Nadav Rotem authored Apr 15, 2013

SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops.

llvm-svn: 179562

b9116e69

Apr 15, 2013

R600/SI: Emit config values in register value pairs. · cb97e3ac

Tom Stellard authored Apr 15, 2013

Instead of emitting config values in a predefined order, the code
emitter will now emit a 32-bit register index followed by the 32-bit
config value.
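
As a rough illustration of the layout described above (not taken from the R600 backend itself), the sketch below serializes register/value pairs as consecutive 32-bit words. The register indices, the little-endian byte order, and the helper names `emit_config` and `parse_config` are assumptions made for this example.

```python
import struct

def emit_config(pairs):
    """Serialize (register index, config value) pairs as 32-bit words.

    Each pair becomes 8 bytes: the register index, then its value.
    Little-endian order is an assumption for this sketch.
    """
    out = bytearray()
    for reg, value in pairs:
        out += struct.pack("<II", reg, value)
    return bytes(out)

def parse_config(blob):
    """Read the pairs back; a consumer no longer depends on a fixed order."""
    return [struct.unpack_from("<II", blob, off) for off in range(0, len(blob), 8)]

# Hypothetical register indices, chosen only for illustration.
blob = emit_config([(0x10, 3), (0x20, 0xFFFFFFFF)])
pairs = parse_config(blob)
```

Because every value carries its own register index, a reader can skip registers it does not know instead of relying on a predefined position.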

llvm-svn: 179546

cb97e3ac

R600/SI: Emit configuration value in the .AMDGPU.config ELF section · 3a7beafb
Tom Stellard authored Apr 15, 2013
```
llvm-svn: 179545
```
3a7beafb
R600: Emit ELF formatted code rather than raw ISA. · 9991659f
Tom Stellard authored Apr 15, 2013
```
llvm-svn: 179544
```
9991659f
Fix a typo in comment. · 0f38c1e3
Jim Grosbach authored Apr 15, 2013
```
llvm-svn: 179542
```
0f38c1e3

Make the host endianness check an integer constant expression. · 41cb64f4

Rafael Espindola authored Apr 15, 2013

I will remove the isBigEndianHost function once I update clang.

The ifdef logic is designed to:
* not use configure/cmake to avoid breaking -arch i686 -arch ppc.
* default to little endian
* be as small as possible

It looks like sys/endian.h is the preferred header on most modern BSD systems,
but it is better to change this in a followup patch as machine/endian.h is
available on FreeBSD, OpenBSD, NetBSD and OS X.

llvm-svn: 179527

41cb64f4

Replace uses of the deprecated std::auto_ptr with OwningPtr. · b23ea72e

Andy Gibbs authored Apr 15, 2013

This is a rework of the broken parts in r179373 which were subsequently reverted in r179374 due to incompatibility with C++98 compilers.  This version should be ok under C++98.

llvm-svn: 179520

b23ea72e

Add an option -vectorize-slp-aggressive for running the BB vectorizer. Make... · d4dcc003

Nadav Rotem authored Apr 15, 2013

Add an option -vectorize-slp-aggressive for running the BB vectorizer. Make -fslp-vectorize run the slp-vectorizer.

llvm-svn: 179508

d4dcc003

Rename the slp-vectorizer clang/llvm flags. No functionality change. · a1e5e44e
Nadav Rotem authored Apr 15, 2013
```
llvm-svn: 179505
```
a1e5e44e
SLPVectorizer: Add support for vectorizing trees that start at compare instructions. · 5d393c41
Nadav Rotem authored Apr 15, 2013
```
llvm-svn: 179504
```
5d393c41

Mark all PPC comparison instructions as not having side effects · 95e6ea69

Hal Finkel authored Apr 15, 2013

Now that the CR spilling issues have been resolved, we can remove the
unmodeled-side-effect attributes from the comparison instructions (and also
mark them as isCompare). By allowing these, by default, to have unmodeled side
effects, we were hiding problems with CR spilling; but everything seems much
happier now.

llvm-svn: 179502

95e6ea69

Fix PPC64 CR spill location for callee-saved registers · 6736988a

Hal Finkel authored Apr 15, 2013

This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).

So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.

In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).

llvm-svn: 179500

6736988a

Apr 14, 2013

Use object file specific section type for initial text section · 334c7bc7
Nico Rieck authored Apr 14, 2013
```
llvm-svn: 179494
```
334c7bc7

Reorders two transforms that collide with each other · 1fae1955

David Majnemer authored Apr 14, 2013

One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

The problem is that certain values of C1 and C2 trigger both
transforms, but the first one blocks out the second, which
generates suboptimal code.

Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0

into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224
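
The two rewrites and the IR example above can be sanity-checked by brute force. The following is a minimal sketch, not InstCombine code: it models 8-bit unsigned arithmetic so the check stays exhaustive, and it assumes the mask form applies only when C1 and C2 differ in exactly one bit and C1 has that bit clear (a precondition the commit message does not spell out). All function names are hypothetical.

```python
MASK = 0xFF  # model 8-bit unsigned values so every input can be tried

def range_form(x, c1):
    # (X == C1 | X == C1 + 1)  ->  (X - C1) <u 2, with wrap-around subtraction
    return ((x - c1) & MASK) < 2

def mask_form(a, c1, c2):
    # (A == C1 || A == C2)  ->  (A & ~(C1 ^ C2)) == C1
    return (a & ~(c1 ^ c2) & MASK) == c1

def count_pairs_matching_both():
    """Check both rewrites for adjacent constants; count pairs where both apply."""
    both = 0
    for c1 in range(MASK):  # c2 = c1 + 1 must still fit in 8 bits
        c2 = c1 + 1
        diff = c1 ^ c2
        single_bit = diff & (diff - 1) == 0
        for x in range(MASK + 1):
            expected = (x == c1) or (x == c2)
            assert range_form(x, c1) == expected
            if single_bit:
                assert mask_form(x, c1, c2) == expected
        both += single_bit
    return both

def ir_before(x):
    # lshr 4, and 15, add -14, icmp ne 0 (32-bit wrap-around on the add)
    return ((((x >> 4) & 15) - 14) & 0xFFFFFFFF) != 0

def ir_after(x):
    # and 240, icmp ne 224
    return (x & 240) != 224
```

With adjacent constants, both rewrites apply whenever C1 is even (for example 14 and 15), whereas 13 and 14 differ in two bits and only match the range form.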

llvm-svn: 179493

1fae1955

Miscellaneous cleanups for VecUtils.h · 7d62ea86
Benjamin Kramer authored Apr 14, 2013
```
llvm-svn: 179483
```
7d62ea86
SLP: Document the scalarization cost method. · 3403c115
Nadav Rotem authored Apr 14, 2013
```
llvm-svn: 179479
```
3403c115
Document the decision to assume that the cost of floats is twice as much as integers. · 0db0690a
Nadav Rotem authored Apr 14, 2013
```
llvm-svn: 179478
```
0db0690a
Use i32 for all SPARC shift amounts, even in 64-bit mode. · eed1072f
Jakob Stoklund Olesen authored Apr 14, 2013
```
Test case by llvm-stress.

llvm-svn: 179477
```
eed1072f

SLPVectorizer: Add support for trees that don't start at binary operators, and... · 54b413d1

Nadav Rotem authored Apr 14, 2013

SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree.

llvm-svn: 179475

54b413d1

Add support for the abs64 SPARC v9 code model. · c3c28f85
Jakob Stoklund Olesen authored Apr 14, 2013
```
For when 16 TB just isn't enough.

llvm-svn: 179474
```
c3c28f85

Add support for the SPARC v9 abs44 code model. · c8fc76b0

Jakob Stoklund Olesen authored Apr 14, 2013

This is the default model for non-PIC 64-bit code. It supports
text+data+bss linked anywhere in the low 16 TB of the address space.
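
The 16 TB figure follows from a 44-bit absolute address range, since 2^44 bytes is 16 TiB. A small sketch of that arithmetic, assuming "abs44" means addresses that fit in 44 bits; the helper `fits_abs44` is hypothetical and only illustrates the bound:

```python
TIB = 1 << 40            # one tebibyte
ABS44_LIMIT = 1 << 44    # first address that no longer fits in 44 bits

def fits_abs44(address, size=1):
    """True if the byte range [address, address + size) lies below 2**44."""
    return address >= 0 and address + size <= ABS44_LIMIT

limit_in_tib = ABS44_LIMIT // TIB
```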

llvm-svn: 179473

c8fc76b0

Use target flags for printing SPARC asm operands. · 2e64d7ab

Jakob Stoklund Olesen authored Apr 14, 2013

64-bit code models need multiple relocations that can't be inferred from
the opcode like they can in 32-bit code.

llvm-svn: 179472

2e64d7ab

Also put target flags on SPARC constant pool references. · e0fc832b
Jakob Stoklund Olesen authored Apr 14, 2013
```
Constant pool entries are accessed exactly the same way as global
variables.

llvm-svn: 179471
```
e0fc832b
SLPVectorizer: add initial support for reduction variable vectorization. · 0b9cf856
Nadav Rotem authored Apr 14, 2013
```
llvm-svn: 179470
```
0b9cf856
Fix patterns for 64-bit pointers. · dc1ed578
Jakob Stoklund Olesen authored Apr 14, 2013
```
This fixes the pic32 code model for SPARC v9.

llvm-svn: 179469
```
dc1ed578

Add target flags to SPARC address operands. · 1fb08a8b

Jakob Stoklund Olesen authored Apr 14, 2013

SDNodes and MachineOperands get target flags representing the %hi() and
%lo() assembly annotations that eventually become relocations.

Also define flags to be used by the 64-bit code models.

llvm-svn: 179468

1fb08a8b

Mark all PPC CR registers to be spilled as live-in and tag MFCR appropriately · 2f293915

Hal Finkel authored Apr 13, 2013

Leaving MFCR as having unmodeled side effects is not enough to prevent
unwanted instruction reordering post-RA. We could probably apply a stronger
barrier attribute, but there is a better way: Add all (not just the first) CR
to be spilled as live-in to the entry block, and add all CRs to the MFCR
instruction as implicitly killed.

Unfortunately, I don't have a small test case.

llvm-svn: 179465

2f293915

Apr 13, 2013

Define SPARC code models. · 15b3e900

Jakob Stoklund Olesen authored Apr 13, 2013

Currently, only abs32 and pic32 are implemented. Add a test case for
abs32 with 64-bit code. 64-bit PIC code is currently broken.

llvm-svn: 179463

15b3e900

Use the correct types when matching ADDRri patterns from frame indexes. · 6a0a3eb5
Jakob Stoklund Olesen authored Apr 13, 2013
```
It doesn't seem like anybody is checking types this late in isel, so no
test case.

llvm-svn: 179462
```
6a0a3eb5
GlobalDCE: Fix an oversight in my last commit that could lead to crashes. · adc1727c
Benjamin Kramer authored Apr 13, 2013
```
There is a Constant with non-constant operands: blockaddress.

llvm-svn: 179460
```
adc1727c

Fix a scalability issue with complex ConstantExprs. · 89ca4bc6

Benjamin Kramer authored Apr 13, 2013

This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExpr multiple times.

For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, since it may consist of thousands of
nodes.

The testcase exercises this by creating an insanely large ConstantExpr out of
a loop. It's questionable whether the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.
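
The effect of the visited set can be sketched outside LLVM. The example below is an illustration under stated assumptions rather than the actual patch: `Node` is a hypothetical stand-in for a ConstantExpr, and the expression is built so that each level uses the previous level twice, as a loop-generated expression might.

```python
class Node:
    """Hypothetical stand-in for a constant expression with operands."""
    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands

def visits_naive(root):
    # Walks every path, so a shared operand is re-walked once per use.
    return 1 + sum(visits_naive(op) for op in root.operands)

def visits_with_set(root):
    # Remembers nodes already seen, so each distinct node is walked once.
    seen = set()
    stack = [root]
    visits = 0
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visits += 1
        stack.extend(node.operands)
    return visits

# Build a 16-level expression where every level references the one below twice.
expr = Node("leaf")
for _ in range(16):
    expr = Node("add", expr, expr)

naive = visits_naive(expr)        # grows exponentially with depth
deduped = visits_with_set(expr)   # grows linearly with depth
```

For this 16-level expression the naive walk makes 131071 visits, while the set-based walk touches only the 17 distinct nodes.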

llvm-svn: 179458

89ca4bc6

Spill and restore PPC CR registers using the FP when we have one · d85a04b3

Hal Finkel authored Apr 13, 2013

For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).

llvm-svn: 179457

d85a04b3

MI-Sched: DEBUG formatting. · 1f0bb69b
Andrew Trick authored Apr 13, 2013
```
llvm-svn: 179452
```
1f0bb69b
MI-Sched cleanup. If an instruction has no valid sched class, do not attempt... · be2bccbc
Andrew Trick authored Apr 13, 2013
```
MI-Sched cleanup. If an instruction has no valid sched class, do not attempt to check for a variant.

llvm-svn: 179451
```
be2bccbc

X86 machine model: reduce SandyBridge and Haswell ILPWindow. · f7fd6b9e

Andrew Trick authored Apr 13, 2013

The initial values were arbitrary. I want them to be more
conservative. This represents the number of latency cycles hidden by
OOO execution. In practice, I think it should be within a small factor
of the complex floating point operation latency so the scheduler can
make some attempt to hide latency even for smallish blocks.

These are by no means the best values, just a starting point for
tuning heuristics. Some benchmarks such as TSVC run faster with this
lower value for SandyBridge. I haven't run anything on Haswell, but
it shouldn't be 2x SB.

llvm-svn: 179450

f7fd6b9e

MI-Sched: schedule physreg copies. · e833e1cd

Andrew Trick authored Apr 13, 2013

The register allocator expects minimal physreg live ranges. Schedule
physreg copies accordingly. This is slightly tricky when they occur in
the middle of the scheduling region. For now, this is handled by
rescheduling the copy when its associated instruction is
scheduled. Eventually we may instead bundle them, but only if we can
preserve the bundles as parallel copies during regalloc.

llvm-svn: 179449

e833e1cd

Catch another case where SD fails to propagate node order. · 52b8387f

Andrew Trick authored Apr 13, 2013

I need to handle this for the test case in my following scheduler
commit.

Work is already under way to redesign the mechanism for node order
propagation because this case by case approach is unmaintainable.

llvm-svn: 179448

52b8387f

[mips] Move MipsTargetLowering::lowerINTRINSIC_W_CHAIN and · a6bbde58
Akira Hatanaka authored Apr 13, 2013
```
lowerINTRINSIC_WO_CHAIN into MipsSETargetLowering.

No functionality changes.

llvm-svn: 179444
```
a6bbde58

Finish templating MachObjectFile over endianness. · 9b709259

Rafael Espindola authored Apr 13, 2013

We are now able to handle big endian Mach-O files in llvm-readobj. Thanks to
David Fang for providing the object files.

llvm-svn: 179440

9b709259

[mips] Reapply r179420 and r179421. · 2f08822f
Akira Hatanaka authored Apr 13, 2013
```
llvm-svn: 179434
```
2f08822f