Commits · 682b850602d3380528c5ecafe352c48422e19724 · Roger Ferrer / llvm-epi

Nov 02, 2011
- Add a bunch more X86 AVX2 instructions and their corresponding intrinsics. · 682b8506
  Craig Topper authored Nov 02, 2011
```
llvm-svn: 143529
```
  682b8506
Nov 01, 2011
- Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant... · 3f5eccbe
  Eli Friedman authored Nov 01, 2011
```
Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value.  Fixes PR11289.

llvm-svn: 143498
```
  3f5eccbe
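The commit message does not say which tricks were added, so the following is an illustration only. SSE2 provides 16-bit and 32-bit arithmetic right shifts but no 8-bit one, and one generic way to get a per-byte `sra` by a constant is a logical shift followed by an xor/subtract sign fix-up. The sketch models a single 8-bit lane; the function names are hypothetical and not taken from the LLVM source.

```python
def sra_i8_via_logical(byte: int, n: int) -> int:
    """Arithmetic shift right of one 8-bit lane using a logical shift, xor and sub."""
    assert 0 <= byte <= 0xFF and 0 <= n <= 7
    logical = byte >> n          # logical shift: zeros enter from the left
    sign = 0x80 >> n             # position the original sign bit moved to
    # Sign-extend from that position: (x ^ s) - s, truncated back to 8 bits.
    return ((logical ^ sign) - sign) & 0xFF


def sra_i8_reference(byte: int, n: int) -> int:
    """Reference result: reinterpret the byte as signed, then shift."""
    signed = byte - 0x100 if byte & 0x80 else byte
    return (signed >> n) & 0xFF
```

Both functions agree for every byte value and every shift amount from 0 to 7, which is what makes the splat-constant case expressible without a byte-wide arithmetic shift.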
- Fix operand type for x86 pmadd_ub_sw intrinsic. · fec80c6a
  Craig Topper authored Nov 01, 2011
```
llvm-svn: 143455
```
  fec80c6a
Oct 31, 2011
- Fix operand type for int_x86_ssse3_phadd_sw_128 intrinsic · 9821e75e
  Craig Topper authored Oct 31, 2011
```
llvm-svn: 143336
```
  9821e75e
- Test case for X86 FS/GS Base intrinsics · 242d1f8c
  Craig Topper authored Oct 31, 2011
```
llvm-svn: 143332
```
  242d1f8c
- Begin adding AVX2 instructions. No selection support yet other than intrinsics. · cfcfdf2a
  Craig Topper authored Oct 31, 2011
```
llvm-svn: 143331
```
  cfcfdf2a
- Switch new .file directive emission off by default, change llc's flag for it to · aab6169e
  Nick Lewycky authored Oct 31, 2011
```
-enable-dwarf-directory.

llvm-svn: 143326
```
  aab6169e
Oct 30, 2011

X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift... · 7402ee6e

Benjamin Kramer authored Oct 30, 2011

X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in.

llvm-svn: 143315

7402ee6e
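The rewrite described in this commit can be modelled on a pair of adjacent bytes that share one 16-bit lane. This is a minimal sketch with hypothetical helper names; it mirrors the idea in the commit message (a wide shift, then masking off the bits that crossed a byte boundary) and is not the backend code itself.

```python
from typing import Tuple


def srl_bytes_via_word(lo: int, hi: int, n: int) -> Tuple[int, int]:
    """Logical right shift of two bytes, done as one 16-bit shift plus a mask."""
    word = (hi << 8) | lo
    shifted = word >> n
    keep = 0xFF >> n                      # bits of each byte that are real data
    shifted &= (keep << 8) | keep         # clear bits that leaked in from 'hi'
    return shifted & 0xFF, shifted >> 8


def shl_bytes_via_word(lo: int, hi: int, n: int) -> Tuple[int, int]:
    """Left shift of two bytes, done as one 16-bit shift plus a mask."""
    word = (hi << 8) | lo
    shifted = (word << n) & 0xFFFF
    keep = (0xFF << n) & 0xFF             # low n bits of each byte must be zero
    shifted &= (keep << 8) | keep         # clear bits that leaked in from 'lo'
    return shifted & 0xFF, shifted >> 8
```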

Fix return type for X86 mpsadbw intrinsic. The instruction takes in a vector... · 9cdb9ffa

Craig Topper authored Oct 30, 2011

Fix return type for X86 mpsadbw intrinsic. The instruction takes in a vector of 8-bit integers, but produces a vector of 16-bit integers.

llvm-svn: 143313

9cdb9ffa

Fix pr11266. · c602b2c4

Nadav Rotem authored Oct 30, 2011

On x86: (shl V, 1) -> add V,V

Hardware support for vector-shift is sparse and in many cases we scalarize the
result. Additionally, on sandybridge padd is faster than shl.

llvm-svn: 143311

c602b2c4
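The identity behind this rewrite is that shifting a lane left by one and adding the lane to itself produce the same bits at any fixed lane width. A small sketch, assuming lanes are modelled as unsigned Python integers and using hypothetical helper names:

```python
def shl1_lanes(lanes, bits):
    """Shift every lane left by one, wrapping at the lane width."""
    mask = (1 << bits) - 1
    return [(v << 1) & mask for v in lanes]


def add_self_lanes(lanes, bits):
    """Add every lane to itself, wrapping at the lane width."""
    mask = (1 << bits) - 1
    return [(v + v) & mask for v in lanes]
```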

- Stabilize the test by specifying an exact cpu target · 1dda6a8c
  Nadav Rotem authored Oct 30, 2011
```
llvm-svn: 143307
```
  1dda6a8c

Oct 29, 2011
- Add a new DAGCombine optimization for BUILD_VECTOR. · bf6568b5
  Nadav Rotem authored Oct 29, 2011
```
If all of the inputs are zero/any_extended, create a new simple BV
which can be further optimized by other BV optimizations.

llvm-svn: 143297
```
  bf6568b5
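One way to picture this combine: a BUILD_VECTOR whose elements are all zero-extended from a narrower type has the same bits as a BUILD_VECTOR of the narrow elements interleaved with zeros. The sketch below checks this for i16 elements extended to i32 lanes. The element types, the little-endian lane layout, and the helper names are assumptions chosen for illustration; the commit message does not list the exact conditions the combine uses.

```python
import struct


def bv_of_zext(elems16):
    """Bytes of a build_vector of i32 lanes, each a zero-extended i16."""
    return struct.pack("<%dI" % len(elems16), *elems16)


def narrow_bv_then_bitcast(elems16):
    """Bytes of a build_vector of i16 lanes: each input followed by a zero lane."""
    narrow = []
    for e in elems16:
        narrow += [e, 0]          # low half is the value, high half is zero
    return struct.pack("<%dH" % len(narrow), *narrow)
```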
- Force SSE for this test. · 932de2bc
  Benjamin Kramer authored Oct 29, 2011
```
llvm-svn: 143291
```
  932de2bc
- Revert r143206, as there are still some failing tests. · 9b9c9701
  Dan Gohman authored Oct 29, 2011
```
llvm-svn: 143262
```
  9b9c9701
Oct 28, 2011

Reapply r143177 and r143179 (reverting r143188), with scheduler · 73057ad2

Dan Gohman authored Oct 28, 2011

fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.

llvm-svn: 143206

73057ad2

Dwarf: [PR11022] Fix emitting DW_AT_const_value(>i64), to be host-endian-neutral. · 29ccdd82

NAKAMURA Takumi authored Oct 28, 2011

Don't assume APInt::getRawData() holds target-aware or host-compliant endianness. rawdata[0] holds the least significant i64, even on a big-endian host.

FIXME: Add a testcase for big endian target.

FIXME: Ditto on CompileUnit::addConstantFPValue()?

llvm-svn: 143194

29ccdd82
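The point of this fix can be sketched as follows: the wide constant is held as 64-bit words with the least significant word first, and the byte order of the emitted value is chosen from the target alone, never from the host. The helper names below are hypothetical and the sketch assumes the bit width is a multiple of 8; it is not the DwarfDebug code.

```python
def to_words_lsw_first(value, bit_width):
    """Split an unsigned integer into 64-bit words, least significant word first."""
    n_words = (bit_width + 63) // 64
    return [(value >> (64 * i)) & (2**64 - 1) for i in range(n_words)]


def emit_const_bytes(words, bit_width, little_endian):
    """Emit the constant's bytes in target order, using only shifts on the words."""
    out = []
    for i in range(bit_width // 8):
        word = words[i // 8]                      # word holding byte i
        out.append((word >> (8 * (i % 8))) & 0xFF)
    # 'out' is least-significant byte first; reverse it for big-endian targets.
    return bytes(out if little_endian else reversed(out))
```

Because the bytes are extracted with shifts rather than by reinterpreting memory, the result is the same whatever the endianness of the machine running the compiler.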

- test/CodeGen/X86/2010-08-10-DbgConstant.ll: Add explicit -mtriple=i686-linux. It must be for elf! · 88dd835f
  NAKAMURA Takumi authored Oct 28, 2011
```
llvm-svn: 143189
```
  88dd835f

Speculatively disable Dan's commits 143177 and 143179 to see if · 225a7037

Duncan Sands authored Oct 28, 2011

it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

Delete #if 0 code accidentally left in.

llvm-svn: 143188

225a7037

- Always use the string pool, even when it makes the .o larger. This may help · cc64ae14
  Nick Lewycky authored Oct 28, 2011
```
tools that read the debug info in the .o files by making the DIE sizes more
consistent.

llvm-svn: 143186
```
  cc64ae14

Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW · 4db3f7dd

Dan Gohman authored Oct 28, 2011

on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

llvm-svn: 143177

4db3f7dd

Oct 27, 2011
- Changed test to check for correct load size instead of shift as the shift might change if optimised · ce637007
  Pete Cooper authored Oct 27, 2011
```
llvm-svn: 143116
```
  ce637007
- Teach our Dwarf emission to use the string pool. · d59c0cac
  Nick Lewycky authored Oct 27, 2011
```
llvm-svn: 143097
```
  d59c0cac
- Don't crash on 128-bit sdiv by constant. Found by inspection. · e9e356ad
  Eli Friedman authored Oct 27, 2011
```
llvm-svn: 143095
```
  e9e356ad
Oct 26, 2011

- Run test with -verify-machineinstrs. · f5a15529
  Rafael Espindola authored Oct 26, 2011
```
Patch by Sanjoy Das.

llvm-svn: 143066
```
  f5a15529
- Fixes an issue reported by -verify-machineinstrs. · b3285224
  Rafael Espindola authored Oct 26, 2011
```
Patch by Sanjoy Das.

llvm-svn: 143064
```
  b3285224

This commit introduces two fake instructions MORESTACK_RET and · 66393c12

Rafael Espindola authored Oct 26, 2011

MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET
followed by a MOV respectively.  Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV).  It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.

Patch by Sanjoy Das.

llvm-svn: 143062

66393c12

Oct 23, 2011

Completely re-write the algorithm behind MachineBlockPlacement based on · bd1be4d0

Chandler Carruth authored Oct 23, 2011

discussions with Andy. Fundamentally, the previous algorithm is
counterproductive on several fronts and prioritizes things which
aren't necessarily the most important: static branch prediction.

The new algorithm uses the existing loop CFG structure information to
walk through the CFG itself to layout blocks. It coalesces adjacent
blocks within the loop where the CFG allows based on the most likely
path taken. Finally, it topologically orders the block chains that have
been formed. This allows it to choose a (mostly) topologically valid
ordering which still prioritizes fallthrough within the structural
constraints.

As a final twist in the algorithm, it does violate the CFG when it
discovers a "hot" edge, that is an edge that is more than 4x hotter than
the competing edges in the CFG. These are forcibly merged into
a fallthrough chain.

Future transformations that need to be added are rotation of loop exit
conditions to be fallthrough, and better isolation of cold block chains.
I'm also planning on adding statistics to model how well the algorithm
does at laying out blocks based on the probabilities it receives.

The old tests mostly still pass, and I have some new tests to add, but
the nested loops are still behaving very strangely. This almost seems
like working-as-intended as it rotated the exit branch to be
fallthrough, but I'm not convinced this is actually the best layout. It
is well supported by the probabilities for loops we currently get, but
those are pretty broken for nested loops, so this may change later.

llvm-svn: 142743

bd1be4d0
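To make the idea of coalescing blocks along the most likely path concrete, here is a deliberately simplified toy. It greedily follows the most probable not-yet-placed successor to build fallthrough chains. It is not the pass's algorithm: it ignores loop structure, the topological ordering of chains, and the 4x hot-edge rule described in the commit message, and the function name and data layout are hypothetical.

```python
def form_chains(blocks, succs):
    """Greedy fallthrough chains.

    blocks: block names in their original order.
    succs:  {block: {successor: probability}}.
    """
    placed = set()
    chains = []
    for block in blocks:
        if block in placed:
            continue
        chain = [block]
        placed.add(block)
        cur = block
        while True:
            # Only successors that are not already part of some chain qualify.
            candidates = {s: p for s, p in succs.get(cur, {}).items()
                          if s not in placed}
            if not candidates:
                break
            cur = max(candidates, key=candidates.get)   # most likely successor
            chain.append(cur)
            placed.add(cur)
        chains.append(chain)
    return chains
```

For a diamond where `entry` branches to `hot` with probability 0.9 and to `cold` with probability 0.1, the toy keeps `entry`, `hot` and `exit` in one chain and leaves `cold` in a chain of its own.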

Oct 22, 2011

Fix pr11193. · e649d665

Nadav Rotem authored Oct 22, 2011

SHL inserts zeros from the right, thus even when the original
sign_extend_inreg value was of 1-bit, we need to sra.

llvm-svn: 142724

e649d665
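The reasoning in this commit message can be checked on a small model: after the `shl`, the bits below the moved sign bit are zeros, so only an arithmetic right shift restores the sign, and that holds even for a 1-bit source value. A sketch assuming an 8-bit register and hypothetical helper names:

```python
def sext_inreg_shl_sra(value, from_bits, width=8):
    """sign_extend_inreg as shl followed by an arithmetic shift right."""
    shift = width - from_bits
    mask = (1 << width) - 1
    shifted = (value << shift) & mask                 # zeros fill from the right
    signed = shifted - (1 << width) if shifted >> (width - 1) else shifted
    return (signed >> shift) & mask                   # sra copies the sign bit


def sext_inreg_shl_srl(value, from_bits, width=8):
    """The incorrect variant: shl followed by a logical shift right."""
    shift = width - from_bits
    mask = (1 << width) - 1
    return ((value << shift) & mask) >> shift         # srl shifts in zeros
```

For a 1-bit input of 1, the `sra` form yields 0xFF (that is, -1), while the `srl` form yields 1.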

Oct 21, 2011

Fix pr11194. When promoting and splitting integers we need to use · 5e00bb5f

Nadav Rotem authored Oct 21, 2011

ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.

SetCC return type needs to be legalized via PromoteTargetBoolean.

llvm-svn: 142660

5e00bb5f
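Why the choice of extension depends on the operation can be seen with a promoted i8 value whose upper bits are not meaningful. The sketch below models promotion to a wider integer; the helper names are hypothetical and only illustrate the zero-extend versus sign-extend distinction, not the legalizer code.

```python
def zext_in_reg(value, from_bits):
    """Keep the low from_bits bits and clear everything above them."""
    return value & ((1 << from_bits) - 1)


def sext_in_reg(value, from_bits):
    """Interpret the low from_bits bits as a signed number."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign else value


def ugt_promoted(a, b, bits=8):
    """Unsigned greater-than on promoted values: operands must be zero-extended."""
    return zext_in_reg(a, bits) > zext_in_reg(b, bits)


def sgt_promoted(a, b, bits=8):
    """Signed greater-than on promoted values: operands must be sign-extended."""
    return sext_in_reg(a, bits) > sext_in_reg(b, bits)
```

With `a = 0x1FF` (the i8 value 0xFF plus a stray upper bit) and `b = 1`, the unsigned comparison is true and the signed one is false, so using the wrong extension changes the result.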

- Don't hard code the desired alignment for loops -- it isn't 16-bytes on · 70a38058
  Chandler Carruth authored Oct 21, 2011
```
all x86 systems. Sorry for the breakage.

llvm-svn: 142656
```
  70a38058
- 1. Fix the widening of SETCC in WidenVecOp_SETCC. Use the correct return CC type. · d315157f
  Nadav Rotem authored Oct 21, 2011
```
2. Fix a typo in CONCAT_VECTORS which exposed the bug in #1.

llvm-svn: 142648
```
  d315157f

Add loop aligning to MachineBlockPlacement based on review discussion so · 8b9737cb

Chandler Carruth authored Oct 21, 2011

it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.

This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.

Test is updated to include both basic loop alignment and nested loop
alignment.

llvm-svn: 142645

8b9737cb

Add a very basic test for MachineBlockPlacement. This is essentially the · ddfeaafd

Chandler Carruth authored Oct 21, 2011

canonical example I used when developing it, and is one of the primary
motivating real-world use cases for __builtin_expect (when buried under
a macro).

I'm working on more test cases here, but I'm trying to make sure both
that the pass is doing the right thing with the test cases and that they
aren't too brittle to changes elsewhere in the code generation pipeline.

Feedback and/or suggestions on how to test this are very welcome.
Especially feedback on whether testing the block comments is a good
strategy; I couldn't find any good examples to steal from but all the
other ideas I had were a lot uglier or more fragile.

llvm-svn: 142644

ddfeaafd

- Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with... · 039a7906
  Craig Topper authored Oct 21, 2011
```
Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code.

llvm-svn: 142642
```
  039a7906
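For reference, the three BMI instructions named in this commit compute simple bit expressions, which is presumably what makes them recognizable from ordinary code without dedicated intrinsics. The sketch below models them on 32-bit values; the Python function names are just labels, and the commit message does not show the actual lowering.

```python
MASK32 = 0xFFFFFFFF


def blsi(x):
    """Isolate the lowest set bit: x & -x."""
    return x & (-x) & MASK32


def blsmsk(x):
    """Mask up to and including the lowest set bit: x ^ (x - 1)."""
    return (x ^ (x - 1)) & MASK32


def blsr(x):
    """Reset (clear) the lowest set bit: x & (x - 1)."""
    return x & (x - 1) & MASK32
```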

Oct 20, 2011
- Fix TLS lowering bug. The CopyFromReg must be glued to the TLSCALL. rdar://10291355 · 54d678ff
  Evan Cheng authored Oct 19, 2011
```
llvm-svn: 142550
```
  54d678ff
Oct 19, 2011
- Improve code generation for vselect on SSE2: · 8824472a
  Nadav Rotem authored Oct 19, 2011
```
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availability of an
instruction is to check if it should be expanded.

llvm-svn: 142542
```
  8824472a
- Add support for the vector-widening of vselect and vector-setcc · 6652e22b
  Nadav Rotem authored Oct 19, 2011
```
llvm-svn: 142488
```
  6652e22b
- Rename PEXTR to PEXT. Add intrinsics for BMI instructions. · ef309c33
  Craig Topper authored Oct 19, 2011
```
llvm-svn: 142480
```
  ef309c33
- Added testcase for <rdar://problem/10215997> · 20a04e74
  Lang Hames authored Oct 18, 2011
```
llvm-svn: 142462
```
  20a04e74
- Add additional element-promotion tests. · 0d339335
  Nadav Rotem authored Oct 18, 2011
```
llvm-svn: 142442
```
  0d339335