Commits · bfc9a5f7d3fda13dc9545a22c23e4e16834ac9e2 · Roger Ferrer / llvm-epi

Apr 16, 2012
- Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. · bfc9a5f7
  Craig Topper authored Apr 15, 2012
```
llvm-svn: 154778
```
  bfc9a5f7
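The removed intrinsics took an 8-bit immediate, and a 4-element shuffle mask carries the same information. The following is a minimal sketch of that correspondence, assuming the usual VPERMQ/VPERMPD encoding in which bits [2i+1:2i] of the immediate select the source element for destination element i. The function names are hypothetical and are not LLVM's.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: encode a 4-element single-source shuffle mask
// (each entry in 0..3) as a VPERMQ/VPERMPD-style 8-bit immediate.
// Bits [2*i+1 : 2*i] of the immediate pick the source lane for lane i.
inline std::uint8_t permMaskToImm8(const std::array<int, 4> &mask) {
  unsigned imm = 0;
  for (int i = 0; i < 4; ++i)
    imm |= static_cast<unsigned>(mask[i] & 3) << (2 * i);
  return static_cast<std::uint8_t>(imm);
}

// Reference model of the permute itself, used to check the encoding.
inline std::array<std::uint64_t, 4>
permute4x64(const std::array<std::uint64_t, 4> &src, std::uint8_t imm) {
  std::array<std::uint64_t, 4> dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}
```

Under this encoding the lane-reversing mask {3, 2, 1, 0} corresponds to the immediate 0x1B.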
Apr 15, 2012
- Fix PR12529. The Vxx family of instructions are only supported by AVX. · 42bcd04e
  Nadav Rotem authored Apr 15, 2012
```
Use non-vex instructions for SSE4.

llvm-svn: 154770
```
  42bcd04e
- When emulating vselect using OR/AND/XOR make sure to bitcast the result back to the original type. · 02ef0c35
  Nadav Rotem authored Apr 15, 2012
```
llvm-svn: 154764
```
  02ef0c35
- Added VPERM optimization for AVX2 shuffles · 779a72b4
  Elena Demikhovsky authored Apr 15, 2012
```
llvm-svn: 154761
```
  779a72b4
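For the vselect fix above (02ef0c35), the point is that the AND/OR/XOR emulation runs on integer bits, so a floating-point result has to be bitcast back to its original type. A minimal scalar sketch of one lane, assuming an all-ones or all-zeros mask per lane; `selectLane` is a hypothetical name, not an LLVM function.

```cpp
#include <cstdint>
#include <cstring>

// Bit-level select for one float lane: returns a where mask is all-ones
// and b where mask is all-zeros. The logic operates on integers, so the
// operands are bitcast to uint32_t first and the result is bitcast back.
inline float selectLane(std::uint32_t mask, float a, float b) {
  std::uint32_t ia, ib;
  std::memcpy(&ia, &a, sizeof ia);
  std::memcpy(&ib, &b, sizeof ib);
  // ~mask is mask XOR all-ones, which is where the XOR comes in.
  std::uint32_t bits = (ia & mask) | (ib & (mask ^ 0xFFFFFFFFu));
  float result;
  std::memcpy(&result, &bits, sizeof result);  // the bitcast back to float
  return result;
}
```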
Apr 14, 2012
- Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y. · 3e8f1f6a
  Richard Smith authored Apr 13, 2012
```
llvm-svn: 154705
```
  3e8f1f6a
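The two expressions in the commit title differ for most inputs. A small sketch contrasting them, plus a compare-exchange loop that models the intended read-modify-write; the function names are hypothetical, and the loop illustrates the semantics rather than the code the X86 backend emits.

```cpp
#include <atomic>
#include <cstdint>

// NAND as the fix defines it: complement of the AND.
inline std::uint32_t nandCorrect(std::uint32_t x, std::uint32_t y) {
  return ~(x & y);
}

// The miscompiled form: complement x first, then AND with y.
inline std::uint32_t nandBuggy(std::uint32_t x, std::uint32_t y) {
  return ~x & y;
}

// Model of the atomic operation: store ~(old & y) and return the old value.
inline std::uint32_t atomicNand(std::atomic<std::uint32_t> &x, std::uint32_t y) {
  std::uint32_t old = x.load();
  // On failure compare_exchange_weak reloads 'old', so the new value is
  // recomputed from the current contents on every retry.
  while (!x.compare_exchange_weak(old, ~(old & y))) {
  }
  return old;
}
```

With x = 0xF0 and y = 0x3C, the correct form gives 0xFFFFFFCF and the buggy form gives 0x0C.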
Apr 13, 2012
- On Darwin targets, only use vfma etc. if the source uses the fma() intrinsic explicitly. · 267a4ada
  Evan Cheng authored Apr 13, 2012
```
llvm-svn: 154689
```
  267a4ada
Apr 12, 2012

Disable Hexagon test temporarily. · 1d195b9c

Sirish Pande authored Apr 12, 2012

There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA).
This assert needs to be addressed for the post-RA scheduler. Until it is,
any pass that uses the post-RA scheduler will fail, so I am temporarily disabling the
Hexagon tests until that fix is in.

The assert is as follows:
    assert(!MI->isTerminator() && !MI->isLabel() &&
               "Cannot schedule terminators or labels!");

llvm-svn: 154617

1d195b9c

Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. · d0271b27
Craig Topper authored Apr 12, 2012
```
llvm-svn: 154580
```
d0271b27
Revert changes that were accidentally committed. · c80ae58a
Akira Hatanaka authored Apr 11, 2012
```
llvm-svn: 154563
```
c80ae58a
Fix string that is being checked. · 1e962f25
Akira Hatanaka authored Apr 11, 2012
```
llvm-svn: 154547
```
1e962f25
Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user, · 47ad674f
Akira Hatanaka authored Apr 11, 2012
```
otherwise expand FNEG during legalization.

llvm-svn: 154546
```
47ad674f
Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user. · 7f4c9d14
Akira Hatanaka authored Apr 11, 2012
```
Invalid operation is signaled if the operand of these instructions is NaN.

llvm-svn: 154545
```
7f4c9d14

Fix bugs in lowering of FCOPYSIGN nodes. · 4f5c8421

Akira Hatanaka authored Apr 11, 2012

- FCOPYSIGN nodes that have operands of different types were not handled.
- Different code was generated depending on the endianness of the target.

Additionally, code is added that emits INS and EXT instructions if they are
supported by the target (they are R2 instructions).

llvm-svn: 154540

4f5c8421
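As an illustration of the mixed-type case, here is a scalar sketch of copysign where the magnitude is a float and the sign comes from a double. It works on the integer bit patterns, so it does not depend on the target's endianness. The extract and insert steps are the kind of bit-field operations that EXT and INS provide; the function name is hypothetical and this is not the lowering code itself.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of copysign(float magnitude, double sign): take the
// sign bit of the double (bit 63) and place it in bit 31 of the float.
inline float copySignFromDouble(float magnitude, double sign) {
  std::uint32_t m;
  std::uint64_t s;
  std::memcpy(&m, &magnitude, sizeof m);
  std::memcpy(&s, &sign, sizeof s);
  std::uint32_t signBit = static_cast<std::uint32_t>(s >> 63);  // extract
  m = (m & 0x7FFFFFFFu) | (signBit << 31);                      // insert
  float result;
  std::memcpy(&result, &m, sizeof result);
  return result;
}
```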

Apr 11, 2012

Add more fused mul+add/sub patterns. rdar://10139676 · 5efc4422
Evan Cheng authored Apr 11, 2012
```
llvm-svn: 154484
```
5efc4422

Reapply 154396 after fixing a test. · 9bc178ac

Nadav Rotem authored Apr 11, 2012

Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483

9bc178ac
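Because the immediate form encodes the selection at compile time, a shuffle can only use it when every lane stays in place and merely picks a source. A minimal sketch of deriving such an immediate from a two-source shuffle mask, assuming bit i set means lane i comes from the second source; `blendImmediate` is a hypothetical helper, not the function in the X86 backend.

```cpp
#include <vector>

// Hypothetical helper: if a two-source shuffle mask is a pure blend
// (lane i takes element i of either source), return the immediate with
// bit i set when lane i comes from the second source. Mask entries
// 0..N-1 index the first source and N..2N-1 index the second.
// Returns -1 for non-blend masks or masks wider than 8 lanes.
inline int blendImmediate(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  if (n > 8)
    return -1;
  int imm = 0;
  for (int i = 0; i < n; ++i) {
    if (mask[i] == i)
      continue;                 // lane i from the first source
    if (mask[i] == i + n)
      imm |= 1 << i;            // lane i from the second source
    else
      return -1;                // the lane moves, so this is not a blend
  }
  return imm;
}
```

For example, the 4-lane mask {0, 5, 2, 7} yields the immediate 0xA, while {1, 0, 2, 3} is rejected because lanes 0 and 1 swap.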

Match (fneg (fma)) to vfnma. rdar://10139676 · 67a09fc3
Evan Cheng authored Apr 11, 2012
```
llvm-svn: 154469
```
67a09fc3
Merge fma.ll into fusedMAC.ll · d0f61cbe
Evan Cheng authored Apr 11, 2012
```
llvm-svn: 154466
```
d0f61cbe
Fix test to be register assignment invariant. · 0bcf8f4b
Jakob Stoklund Olesen authored Apr 11, 2012
```
llvm-svn: 154453
```
0bcf8f4b

Move the constant-folding support for FP_ROUND in SelectionDAG from the... · 6f1ee163

Owen Anderson authored Apr 10, 2012

Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at some point.
Zap a testcase that this allows us to completely fold away.

llvm-svn: 154447

6f1ee163

Apr 10, 2012
- Handle llvm.fma.* intrinsics. rdar://10914096 · d0007f3c
  Evan Cheng authored Apr 10, 2012
```
llvm-svn: 154439
```
  d0007f3c
- Add a comment noting that the fdiv -> fmul conversion won't generate · 4f53074c
  Duncan Sands authored Apr 10, 2012
```
multiplication by a denormal, and some tests checking that.

llvm-svn: 154431
```
  4f53074c
- Temporarily revert this patch to see if it brings the buildbots back. · 65ada95b
  Eric Christopher authored Apr 10, 2012
```
llvm-svn: 154425
```
  65ada95b
- To ensure that we have more accurate line information for a block · e9abba71
  Eric Christopher authored Apr 10, 2012
```
don't elide the branch instruction if it's the only one in the block,
otherwise it's ok.

PR9796 and rdar://11215207

llvm-svn: 154417
```
  e9abba71
- Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. · f934f917
  Nadav Rotem authored Apr 10, 2012
```
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
```
  f934f917
- Transform div to mul with reciprocal only when fp imm is legal. · 4d1220de
  Anton Korobeynikov authored Apr 10, 2012
```
This fixes PR12516 and uncovers one weird problem in legalize (worked around)

llvm-svn: 154394
```
  4d1220de
- Add proper checks. · 07526249
  Evan Cheng authored Apr 10, 2012
```
llvm-svn: 154379
```
  07526249
- Fix a long standing tail call optimization bug. When a libcall is emitted · f8bad080
  Evan Cheng authored Apr 10, 2012
```
the legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
```
  f8bad080
- Don't try to zExt just to check if an integer constant is zero, it might · 1d9672bd
  Rafael Espindola authored Apr 10, 2012
```
not fit in an i64.

llvm-svn: 154364
```
  1d9672bd
- Test case for PR12495. · ec96cd06
  Lang Hames authored Apr 09, 2012
```
llvm-svn: 154359
```
  ec96cd06
Apr 09, 2012

Have TargetLowering::getPICJumpTableRelocBase return a node that points to the · 8483a6c4
Akira Hatanaka authored Apr 09, 2012
```
GOT if jump table uses 64-bit gp-relative relocation.

llvm-svn: 154341
```
8483a6c4

When performing a truncating store, it's possible to rearrange the data · e0e38f61

Chad Rosier authored Apr 09, 2012

in-register, such that we can use a single vector store rather than a
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340

e0e38f61

Pattern match a setcc of boolean value with 0 as a truncate. · 8f62b324
Rafael Espindola authored Apr 09, 2012
```
llvm-svn: 154322
```
8f62b324
Lower some x86 shuffle sequences to the vblend family of instructions. · fb7e2ae5
Nadav Rotem authored Apr 09, 2012
```
llvm-svn: 154313
```
fb7e2ae5
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```
b801ca39

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304

3779ac10
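The five cases above can be restated as a small decision function. This is only a hypothetical summary of the commit message: the struct, its fields, and the function name are invented for illustration and do not correspond to code in the X86 backend.

```cpp
// Hypothetical description of the context in which a global offset is
// considered for folding into an addressing-mode immediate.
struct AddrContext {
  bool is64Bit;
  bool smallCodeModel;
  bool ripRelative;
  bool baseAndIndexFree;  // assumed condition for fitting RIP as the base
};

// Returns true when the offset may be matched into the immediate,
// following the case breakdown in the commit message.
inline bool canFoldGlobalOffset(const AddrContext &c) {
  if (!c.is64Bit)
    return true;                // case 1: 32-bit, trivially correct
  if (!c.smallCodeModel)
    return false;               // case 2: 64-bit non-small, never matches
  if (c.ripRelative)
    return c.baseAndIndexFree;  // case 3: RIP must fit, else early exit
  return true;                  // cases 4 and 5: not RIP-relative; the
                                // patch enables the PIC variant (case 5)
}
```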

Fold 15 tiny test cases into a single file that implements the · 84b83426

Chandler Carruth authored Apr 09, 2012

comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303

84b83426

Apr 08, 2012

Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381

Duncan Sands authored Apr 08, 2012

when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296

2f1dc381
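For reference, the reciprocal of a finite binary floating-point constant is exactly representable only when the constant is a power of two and the reciprocal stays in range. A minimal sketch of such a check for double, which also rejects reciprocals that would be subnormal; `hasExactReciprocal` is a hypothetical name and this is not the check LLVM performs.

```cpp
#include <cmath>

// Returns true when 1/c is exactly representable as a normal double,
// i.e. c is a (possibly negative) power of two whose reciprocal neither
// overflows nor falls into the subnormal range.
inline bool hasExactReciprocal(double c) {
  if (!std::isfinite(c) || c == 0.0)
    return false;
  int exponent = 0;
  // frexp writes c as frac * 2^exponent with |frac| in [0.5, 1).
  double frac = std::frexp(c, &exponent);
  if (std::fabs(frac) != 0.5)
    return false;               // not a power of two
  return std::isnormal(1.0 / c);
}
```

Dividing by 4.0 can therefore be replaced by multiplying by 0.25 without changing any result, whereas dividing by 3.0 cannot be replaced exactly.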

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294

ede4a8aa
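A rough sketch of how a single PIE bit can steer TLS model selection. The commit only states that PIE switches to the *Exec models; the further split on linkage and on whether the variable is defined in the current module is an assumption added to make the example complete, and all names here are hypothetical.

```cpp
enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };

// Hypothetical selection logic. PIC code without the PIE bit may end up
// in a shared library and needs a dynamic model; PIE and non-PIC code is
// linked into an executable and may use an exec model.
inline TLSModel chooseTLSModel(bool isPIC, bool isPIE, bool isLocalOrHidden,
                               bool isDefinedHere) {
  if (isPIC && !isPIE)
    return isLocalOrHidden ? TLSModel::LocalDynamic : TLSModel::GeneralDynamic;
  // Assumption: a variable defined in this module can use local-exec,
  // while an external declaration falls back to initial-exec.
  return isDefinedHere ? TLSModel::LocalExec : TLSModel::InitialExec;
}
```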

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

82609df6

Apr 07, 2012

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266

71d07ae5
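The restricted case in item 2 can be recognized from the masks alone. A minimal sketch for single-source shuffles with fully defined masks: if t[j] = v[first[j]] and r[i] = t[second[i]], then r equals v exactly when first[second[i]] == i for every lane. The function name is hypothetical, and the real combine may also handle undefined lanes and two-source shuffles.

```cpp
#include <vector>

// Returns true when applying `first` and then `second` (single-source
// shuffle masks of equal length with entries in 0..N-1) reproduces the
// original vector.
inline bool secondUndoesFirst(const std::vector<int> &first,
                              const std::vector<int> &second) {
  if (first.size() != second.size())
    return false;
  const int n = static_cast<int>(first.size());
  for (int i = 0; i < n; ++i) {
    const int s = second[i];
    if (s < 0 || s >= n)
      return false;             // undefined or out-of-range lane
    if (first[s] != i)
      return false;             // lane i does not receive element i back
  }
  return true;
}
```

A rotation by one lane, {1, 2, 3, 0}, is undone by {3, 0, 1, 2} but not by a second application of itself.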