Commits · ec96cd0690e7f28769fbc2bf7e2c2ba90e84178c · Roger Ferrer / llvm-epi-0.8

Apr 10, 2012
- Test case for PR12495. · ec96cd06
  Lang Hames authored Apr 09, 2012
```
llvm-svn: 154359
```
  ec96cd06
Apr 09, 2012

Have TargetLowering::getPICJumpTableRelocBase return a node that points to the · 8483a6c4
Akira Hatanaka authored Apr 09, 2012
```
GOT if jump table uses 64-bit gp-relative relocation.

llvm-svn: 154341
```
8483a6c4

When performing a truncating store, it's possible to rearrange the data · e0e38f61

Chad Rosier authored Apr 09, 2012

in-register, such that we can use a single vector store rather than a
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340

e0e38f61
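
As a rough illustration of why the `vuzp.8` sequence in e0e38f61 above stores the same bytes as the four `strb` instructions, here is a small Python model. It assumes little-endian 16-bit lanes and models only the byte de-interleave; the function names are hypothetical and are not LLVM or NEON APIs.

```python
import struct


def vuzp8(d_lo: bytes, d_hi: bytes):
    """Model a byte-wise unzip of two 8-byte registers.

    Returns (even-indexed bytes, odd-indexed bytes) of the concatenation.
    """
    combined = d_lo + d_hi
    return combined[0::2], combined[1::2]


def store_scalar(lanes):
    """Four byte stores: keep the low byte of each 16-bit lane."""
    return bytes(lane & 0xFF for lane in lanes)


def store_unzipped(lanes, other=bytes(8)):
    """One 32-bit store of the first four bytes after the unzip."""
    d16 = struct.pack("<4H", *lanes)  # little-endian 16-bit lanes
    evens, _odds = vuzp8(d16, other)
    return evens[:4]
```

Both helpers produce the same four bytes for any four 16-bit lanes, because the even-indexed bytes of a little-endian i16 vector are exactly the truncated lanes.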

Pattern match a setcc of boolean value with 0 as a truncate. · 8f62b324
Rafael Espindola authored Apr 09, 2012
```
llvm-svn: 154322
```
8f62b324
Lower some x86 shuffle sequences to the vblend family of instructions. · fb7e2ae5
Nadav Rotem authored Apr 09, 2012
```
llvm-svn: 154313
```
fb7e2ae5
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```
b801ca39

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304

3779ac10
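
The five cases in 3779ac10 above can be restated as a small decision function. This is only a sketch of the case breakdown given in the message, not the code in the X86 instruction selector; the function name and all parameters are hypothetical, and `base_reg_free` stands in for "RIP fits into the base register".

```python
def offset_folds(is_64bit, small_code_model, pic, rip_relative,
                 base_reg_free=True, patched=True):
    """Return True if the global offset may fold into the addressing mode."""
    if not is_64bit:
        return True            # case 1: 32-bit, trivially correct
    if not small_code_model:
        return False           # case 2: 64-bit non-small never matches
    if rip_relative:
        # case 3: the message describes this for PIC code; the match
        # succeeds only if RIP can occupy the base register.
        return base_reg_free
    if not pic:
        return True            # case 4: small non-PIC, unchanged
    # case 5: small PIC without RIP-relative addressing (for example TLS);
    # this is the only case the patch changes.
    return patched
```

Setting `patched=False` only changes the answer for case 5, which matches the message's claim that every other case is unmodified.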

Fold 15 tiny test cases into a single file that implements the · 84b83426

Chandler Carruth authored Apr 09, 2012

comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly, expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303

84b83426

Apr 08, 2012

Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381

Duncan Sands authored Apr 08, 2012

when -ffast-math is enabled, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296

2f1dc381

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294

ede4a8aa
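
A minimal sketch of the TLS model choice described in ede4a8aa above, assuming the four usual ELF TLS model names. The commit only says that PIE selects "the *Exec models"; the split between local and non-local symbols and the function name are assumptions made for illustration.

```python
def tls_model(is_pic: bool, is_pie: bool, is_local: bool) -> str:
    """Pick a TLS access model from the relocation settings."""
    if is_pic and not is_pie:
        # Code may end up in a shared library: use the dynamic models.
        return "LocalDynamic" if is_local else "GeneralDynamic"
    # Non-PIC code, or PIC code known to be linked into an executable.
    return "LocalExec" if is_local else "InitialExec"
```

With the PIE bit set on top of PIC, the result is the same as for non-PIC code, which is the optimization the message describes.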

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

82609df6

Apr 07, 2012

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266

71d07ae5
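
The restricted optimization in 71d07ae5 above applies when the second shuffle undoes the first. A small sketch of that check for single-source shuffles follows; masks are modeled as Python lists of lane indices, and the helper names are hypothetical.

```python
def shuffle(vec, mask):
    """Single-source shuffle: result lane k takes vec[mask[k]]."""
    return [vec[i] for i in mask]


def compose(first, second):
    """Mask equivalent to applying `first`, then `second`."""
    return [first[i] for i in second]


def second_reverses_first(first, second):
    """True if the two shuffles together are the identity."""
    return compose(first, second) == list(range(len(first)))
```

When `second_reverses_first` holds, the pair of shuffles can be replaced by the original vector, so no new shuffle node is introduced.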

Convert floating point division by a constant into multiplication by the · 5f8397a9

Duncan Sands authored Apr 07, 2012

reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265

5f8397a9
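
To make the "exact reciprocal" condition in 5f8397a9 above concrete: for binary floating point, `1/c` is exact when `c` is a power of two and the reciprocal is still a normal number. The sketch below uses Python floats (IEEE doubles) and hypothetical helper names; rejecting subnormal reciprocals is a conservative assumption, not something the message states.

```python
import math
import sys


def reciprocal_is_exact(c: float) -> bool:
    """True if 1/c is exactly representable as a normal double."""
    if c == 0.0 or not math.isfinite(c):
        return False
    mantissa, _exp = math.frexp(abs(c))
    if mantissa != 0.5:  # not a power of two
        return False
    r = 1.0 / abs(c)
    return math.isfinite(r) and r >= sys.float_info.min


def lower_fdiv(c: float, fast_math: bool):
    """Return ("fmul", 1/c) when the rewrite is allowed, else ("fdiv", c)."""
    if c != 0.0 and math.isfinite(c):
        r = 1.0 / c
        if math.isfinite(r) and (fast_math or reciprocal_is_exact(c)):
            return ("fmul", r)
    return ("fdiv", c)
```

Division by 4.0 becomes multiplication by 0.25 in either mode, while division by 3.0 is only rewritten when `fast_math` is set.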

Make the test for r154235 more platform-independent with a shorter · 78fce432
Alexis Hunt authored Apr 07, 2012
```
string.

llvm-svn: 154243
```
78fce432

Output UTF-8-encoded characters as identifier characters into assembly · 0235f684

Alexis Hunt authored Apr 07, 2012

by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching the default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235

0235f684

Apr 06, 2012
- Add lines in global-address.ll to test N32 and N64 code generation. · 487e5676
  Akira Hatanaka authored Apr 06, 2012
```
llvm-svn: 154202
```
  487e5676
- Allow negative immediates in ARM and Thumb2 compares. · 967b86a0
  Jakob Stoklund Olesen authored Apr 06, 2012
```
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183
```
  967b86a0
- Test case for PR12413 · bdc9f071
  Craig Topper authored Apr 06, 2012
```
llvm-svn: 154172
```
  bdc9f071
- Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a... · 447417c9
  Craig Topper authored Apr 06, 2012
```
Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.

llvm-svn: 154166
```
  447417c9
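
For the ARM compare change in 967b86a0 above, the following sketch shows how a negative immediate can be handled by `cmn` with the negated value. It models only ARM mode (an 8-bit value rotated right by an even amount) and Thumb1 (an unsigned 8-bit immediate); Thumb2 uses a different immediate encoding and is left out. The function names are hypothetical.

```python
def is_arm_modified_imm(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount."""
    v = value & 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((v << rot) | (v >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


def pick_compare(imm: int, mode: str):
    """Return (mnemonic, encoded immediate), or None if not encodable."""
    if mode == "thumb1":
        # No cmn-with-immediate fallback is modeled for Thumb1.
        return ("cmp", imm) if 0 <= imm <= 255 else None
    if is_arm_modified_imm(imm):
        return ("cmp", imm)
    if is_arm_modified_imm(-imm):
        return ("cmn", -imm)
    return None
```

For example, comparing against -5 in ARM mode selects `cmn` with 5, while the same immediate has no single-instruction form in the Thumb1 model.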
Apr 05, 2012

Reapply test case in 154038, this time with triple to prevent the backend · 43fb2b2c
Akira Hatanaka authored Apr 05, 2012
```
from emitting gp_rel relocation.

llvm-svn: 154122
```
43fb2b2c

Don't break the IV update in TLI::SimplifySetCC(). · 37492eac

Jakob Stoklund Olesen authored Apr 05, 2012

LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.

When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:

   (icmp (add iv, stride), stride) --> (cmp iv, 0)

This forced us to use lea for the IV update, preventing the simpler
incl+cmp.

<rdar://problem/7643606>
<rdar://problem/11184260>

llvm-svn: 154119

37492eac
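
The transform named in 37492eac above is legal because the two comparisons agree for equality tests under wraparound arithmetic; the commit avoids it for code quality reasons, not correctness. A small sketch with hypothetical names, assuming fixed-width unsigned arithmetic:

```python
def latch_compare(iv: int, stride: int, bits: int = 32) -> bool:
    """(icmp eq (add iv, stride), stride) with wraparound."""
    mask = (1 << bits) - 1
    return ((iv + stride) & mask) == (stride & mask)


def simplified_compare(iv: int, bits: int = 32) -> bool:
    """(icmp eq iv, 0): same result, but reads the pre-increment value."""
    return (iv & ((1 << bits) - 1)) == 0
```

The first form only needs the incremented value, which is what lets the induction variable stay in a single register.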

An oversight when applying the patches for r150956 and r150957 to a vanilla... · 1ea64736

James Molloy authored Apr 05, 2012

An oversight when applying the patches for r150956 and r150957 to a vanilla tree meant I forgot to svn add these testcases.

Noticed while investigating PR12274!

llvm-svn: 154090

1ea64736

Pass the right sign to TLI->isLegalICmpImmediate. · f2390e83

Jakob Stoklund Olesen authored Apr 05, 2012

LSR can fold three addressing modes into its ICmpZero node:

  ICmpZero BaseReg + Offset      => ICmp BaseReg, -Offset
  ICmpZero -1*ScaleReg + Offset  => ICmp ScaleReg, Offset
  ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg

The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.

Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.

<rdar://problem/11184260>

llvm-svn: 154079

f2390e83
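
The sign issue fixed in f2390e83 above can be seen by writing out which immediate each ICmpZero form actually compares against. This is a sketch with hypothetical names; the legality predicate is a made-up asymmetric example (unsigned 8-bit only), not the ARM implementation.

```python
def icmp_immediate(kind: str, offset: int) -> int:
    """Immediate used by the folded ICmp for the first two forms."""
    if kind == "base_plus_offset":       # BaseReg + Offset == 0
        return -offset                   # ICmp BaseReg, -Offset
    if kind == "neg_scale_plus_offset":  # -1*ScaleReg + Offset == 0
        return offset                    # ICmp ScaleReg, Offset
    raise ValueError(f"unsupported form: {kind}")


def is_legal_icmp_immediate(imm: int) -> bool:
    """Hypothetical asymmetric target: only 0..255 is encodable."""
    return 0 <= imm <= 255
```

With an asymmetric predicate, querying `-Offset` where the compare really uses `Offset` can accept or reject the wrong formulas, which is why the sign passed in matters.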

Reapply 154038 without the failing test. · 121342fc
Akira Hatanaka authored Apr 04, 2012
```
llvm-svn: 154062
```
121342fc

Apr 04, 2012

Revert r154038. It was causing make check failures. · 4743c6e1
Owen Anderson authored Apr 04, 2012
```
llvm-svn: 154054
```
4743c6e1
Fix LowerGlobalAddress to produce instructions with the correct relocation · 9705c865
Akira Hatanaka authored Apr 04, 2012
```
types for N32 ABI. Add new test case and update existing ones.

llvm-svn: 154038
```
9705c865
Fix LowerConstantPool to produce instructions with the correct relocation · b3a2b8c1
Akira Hatanaka authored Apr 04, 2012
```
types for N32 ABI and update test case.

llvm-svn: 154034
```
b3a2b8c1

Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr. · 0a5b72f0

Jakob Stoklund Olesen authored Apr 04, 2012

A MOVCCr instruction can be commuted by inverting the condition. This
can help reduce register pressure and remove unnecessary copies in some
cases.

<rdar://problem/11182914>

llvm-svn: 154033

0a5b72f0
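
The commutation described in 0a5b72f0 above relies on a conditional move giving the same result when its operands are swapped and its condition inverted. A minimal sketch, with the conditional move modeled as a plain select and hypothetical function names; the table lists the standard ARM condition-code pairs.

```python
# Each ARM condition code paired with its inverse.
INVERSE = {
    "eq": "ne", "ne": "eq", "hs": "lo", "lo": "hs",
    "mi": "pl", "pl": "mi", "vs": "vc", "vc": "vs",
    "hi": "ls", "ls": "hi", "ge": "lt", "lt": "ge",
    "gt": "le", "le": "gt",
}


def movcc(cond_holds: bool, if_false, if_true):
    """Result of a conditional move: if_true when the condition holds."""
    return if_true if cond_holds else if_false


def commuted_movcc(cond_holds: bool, if_false, if_true):
    """Same instruction with operands swapped and the condition inverted."""
    return movcc(not cond_holds, if_true, if_false)
```

Because both forms compute the same value, the register allocator is free to pick whichever operand order avoids a copy.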

Fix LowerBlockAddress to produce instructions with the correct relocation · aeff24e4
Akira Hatanaka authored Apr 04, 2012
```
types for N32 ABI and update test case.

llvm-svn: 154031
```
aeff24e4

Add VSELECT to LegalizeVectorTypes::ScalariseVectorResult. Previously it... · 9511ec86

Pete Cooper authored Apr 03, 2012

Add VSELECT to LegalizeVectorTypes::ScalariseVectorResult. Previously it would crash if it encountered a 1 element VSELECT. The solution is slightly more complicated than just creating a SELECT, as we have to mask or sign extend the vector condition if it had different boolean contents from the scalar condition. Fixes <rdar://problem/11178095>

llvm-svn: 153976

9511ec86

Apr 03, 2012
- Add an additional testcase which checks ops with multiple users. · 269703f9
  Nadav Rotem authored Apr 03, 2012
```
llvm-svn: 153939
```
  269703f9
- Allocate virtual registers in ascending order. · 291007b0
  Jakob Stoklund Olesen authored Apr 02, 2012
```
This is just the fallback tie-breaker ordering, the main allocation
order is still descending size.

Patch by Shamil Kurmangaleev!

llvm-svn: 153904
```
  291007b0
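
The ordering described in 291007b0 above (descending size first, ascending virtual register number as the tie-breaker) can be sketched as a sort key. The tuple layout and function name are assumptions for illustration, not the allocator's data structures.

```python
def allocation_order(intervals):
    """Order (vreg_number, size) pairs for allocation.

    Larger intervals come first; equal sizes fall back to ascending
    virtual register number.
    """
    return sorted(intervals, key=lambda iv: (-iv[1], iv[0]))
```

For instance, two intervals of equal size are visited in the order their virtual registers were numbered.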
Apr 02, 2012
- During two-address lowering, rescheduling an instruction does not untie · aaafacd0
  Lang Hames authored Apr 02, 2012
```
operands. Make TryInstructionTransform return false to reflect this.
Fixes PR11861.

llvm-svn: 153892
```
  aaafacd0
- No need to run llvm-as. · 2e5c58e7
  Rafael Espindola authored Apr 02, 2012
```
llvm-svn: 153890
```
  2e5c58e7
- Optimizing swizzles of complex shuffles may generate additional complex shuffles. · 702f0807
  Nadav Rotem authored Apr 02, 2012
```
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.

llvm-svn: 153864
```
  702f0807
Apr 01, 2012

Enable prefetch generation on PPC64. · 322e41a9
Hal Finkel authored Apr 01, 2012
```
llvm-svn: 153851
```
322e41a9

This commit contains a few changes that had to go in together. · b0783508

Nadav Rotem authored Apr 01, 2012

1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848

b0783508
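
Item 2 of b0783508 above holds because xor/and/or work lane by lane, so permuting the lanes before or after the operation gives the same vector. A short sketch, assuming both operands use the same swizzle mask; vectors are Python lists and the helper names are hypothetical.

```python
import operator


def swizzle(vec, mask):
    """Single-source shuffle: result lane k takes vec[mask[k]]."""
    return [vec[i] for i in mask]


def lanewise(op, a, b):
    """Apply a binary operator to corresponding lanes."""
    return [op(x, y) for x, y in zip(a, b)]


def op_of_swizzles(op, a, b, mask):
    return lanewise(op, swizzle(a, mask), swizzle(b, mask))


def swizzle_of_op(op, a, b, mask):
    return swizzle(lanewise(op, a, b), mask)
```

The second form needs one shuffle instead of two, which is the saving the combine is after.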

Add instruction itinerary for the PPC64 A2 core. · 9f9f8929

Hal Finkel authored Apr 01, 2012

This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.

llvm-svn: 153842

9f9f8929

Mar 31, 2012

Add a triple to the test. · 77242fa7
Rafael Espindola authored Mar 31, 2012
```
llvm-svn: 153818
```
77242fa7

Teach CodeGen's version of computeMaskedBits to understand the range metadata. · 80c540e6

Rafael Espindola authored Mar 31, 2012

This is the CodeGen equivalent of r153747. I tested that there is no noticeable
performance difference with any combination of -O0/-O2/-g when compiling
gcc as a single compilation unit.

llvm-svn: 153817

80c540e6
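
As an illustration of what range metadata can contribute to known-bits analysis (80c540e6 above), the sketch below derives the known-zero high bits from a single unsigned, non-wrapping half-open range `[lo, hi)`. It is a simplified model with a hypothetical function name, not the logic in computeMaskedBits.

```python
def known_zero_mask(lo: int, hi: int, bits: int) -> int:
    """Bits guaranteed to be zero for any value in [lo, hi)."""
    if not (0 <= lo < hi <= (1 << bits)):
        return 0  # wrapped or malformed range: claim nothing
    used = (hi - 1).bit_length()  # bits needed for the largest value
    full = (1 << bits) - 1
    return full & ~((1 << used) - 1)
```

A 32-bit value known to lie in `[0, 10)` needs at most four bits, so the top 28 bits are known to be zero.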