Commits · 72f18bbcffe3a57fc8f23c2f4e5aa5779eec0425 · Roger Ferrer / llvm-epi-0.8

Apr 12, 2012

Fixed a case of ARM disassembly getting an assert on a bad encoding · 72f18bbc
Kevin Enderby authored Apr 11, 2012
```
of a VST instruction.

llvm-svn: 154544
```

Fix bugs in lowering of FCOPYSIGN nodes. · 4f5c8421

Akira Hatanaka authored Apr 11, 2012

- FCOPYSIGN nodes that have operands of different types were not handled.
- Different code was generated depending on the endianness of the target.

Additionally, code is added that emits INS and EXT instructions, if they are
supported by target (they are R2 instructions).

llvm-svn: 154540
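
To make the first bullet concrete, here is a minimal C++ sketch of copysign semantics when the magnitude and sign operands have different types. It is not the LLVM lowering code: the function name `copySignMixed` is hypothetical, IEEE-754 `float` and `double` layouts are assumed, and the shift-and-mask steps only loosely mirror what bit-field extract and insert instructions such as EXT and INS would do.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 32-bit float and 64-bit double");

// Hypothetical helper: returns `mag` with the sign of `sign`, where the two
// operands have different floating-point types.
double copySignMixed(double mag, float sign) {
  std::uint64_t magBits;
  std::uint32_t signBits;
  std::memcpy(&magBits, &mag, sizeof magBits);
  std::memcpy(&signBits, &sign, sizeof signBits);

  // Extract the sign bit (bit 31) of the 32-bit operand.
  std::uint64_t s = (signBits >> 31) & 1u;

  // Insert it into the sign position (bit 63) of the 64-bit operand.
  magBits = (magBits & ~(std::uint64_t{1} << 63)) | (s << 63);

  double out;
  std::memcpy(&out, &magBits, sizeof out);
  return out;
}
```

Because the sketch works on integer values rather than on byte offsets in memory, the result does not depend on the endianness of the host.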


Apr 11, 2012

ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction. · 6e536de1

Jim Grosbach authored Apr 11, 2012

While there is an encoding for it in VUZP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11222366

llvm-svn: 154511


ARM 'vzip.32 Dd, Dm' is a pseudo-instruction. · 4640c816

Jim Grosbach authored Apr 11, 2012

While there is an encoding for it in VZIP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11221911

llvm-svn: 154505


remove unused argument · 372cf151
Nadav Rotem authored Apr 11, 2012
```
llvm-svn: 154494
```
Add a C binding to the Target and TargetMachine classes to allow for emitting · 264d2e71
Duncan Sands authored Apr 11, 2012
```
binary and assembly. Patch by Carlo Kok.  Emitting was inspired by but not based
on the D llvm bindings. 

llvm-svn: 154493
```
Add more fused mul+add/sub patterns. rdar://10139676 · 5efc4422
Evan Cheng authored Apr 11, 2012
```
llvm-svn: 154484
```

Reapply 154396 after fixing a test. · 9bc178ac

Nadav Rotem authored Apr 11, 2012

Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483


Clean up ARM fused multiply + add/sub support some more: rename some isel · 48346c1c

Evan Cheng authored Apr 11, 2012

predicates.
Also remove NEON2 since it's not really useful and it is confusing. If
NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it
really mean?

rdar://10139676

llvm-svn: 154480


Match (fneg (fma)) to vfnma. rdar://10139676 · 67a09fc3
Evan Cheng authored Apr 11, 2012
```
llvm-svn: 154469
```
Add retw and lretw instructions. Also, fix Intel syntax parsing for all · 74c282b5
Charles Davis authored Apr 11, 2012
```
ret instructions.

llvm-svn: 154468
```
Fix ARM disassembly of VLD instructions with writebacks. And add a test case · d2980cd0
Kevin Enderby authored Apr 11, 2012
```
for all opcodes handled by DecodeVLDInstruction() in ARMDisassembler.cpp.

llvm-svn: 154459
```
ARM add missing Thumb1 two-operand aliases for shift-by-immediate. · ad66de15
Jim Grosbach authored Apr 11, 2012
```
rdar://11222742

llvm-svn: 154457
```

Fix a number of problems with ARM fused multiply add/subtract instructions. · aca6c822

Evan Cheng authored Apr 11, 2012

1. The new instruction itinerary entries are not properly described.
2. The asm parser can't handle vfms and vfnms.
3. There were no assembler or disassembler test cases.
4. HasNEON2 has the wrong assembler predicate.
rdar://10139676

llvm-svn: 154456


Apr 10, 2012

Handle llvm.fma.* intrinsics. rdar://10914096 · d0007f3c
Evan Cheng authored Apr 10, 2012
```
llvm-svn: 154439
```
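
For context on what a fused multiply-add means numerically, here is a small C++ sketch using the standard `std::fma`. It is only an illustration of the single-rounding semantics, not the code from this commit; the function names `mulThenAdd` and `fusedMulAdd` are hypothetical.

```cpp
#include <cmath>

// Unfused: the product is rounded to double before the addition.
// The volatile temporary keeps the compiler from contracting the
// expression into a fused operation on its own.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused: a * b + c is computed as if to infinite precision and
// rounded once at the end.
double fusedMulAdd(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27` and `c = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0` as a double, so the unfused form returns `0` while the fused form keeps the `-2^-54` residue.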
Whitespace. · f7345b02
Chad Rosier authored Apr 10, 2012
```
llvm-svn: 154427
```
Revert r154396, which looks to be the real culprit behind the bot failures. · 235a7a17
Chad Rosier authored Apr 10, 2012
```
llvm-svn: 154426
```
Temporarily revert this patch to see if it brings the buildbots back. · 65ada95b
Eric Christopher authored Apr 10, 2012
```
llvm-svn: 154425
```

ARM fix cc_out operand handling for t2SUBrr instructions. · df5a2447

Jim Grosbach authored Apr 10, 2012

We were incorrectly conflating some add variants which don't have a
cc_out operand with the mirroring sub encodings, which do. Part of the
awesome non-orthogonality legacy of thumb1. Similarly, handling of
add/sub of an immediate was sometimes incorrectly removing the cc_out
operand for add/sub register variants.

rdar://11216577

llvm-svn: 154411


Remove unused variable. · 27351366
David Blaikie authored Apr 10, 2012
```
llvm-svn: 154398
```

Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. · f934f917

Nadav Rotem authored Apr 10, 2012

blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
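
The distinction between the two blend flavours can be modelled in scalar C++. This is a rough sketch rather than the X86 lowering code: `Vec4`, `blendVariable` and `blendImmediate` are hypothetical names, and the model assumes a four-lane vector in which a set sign bit (variable form) or a set bit `i` of the immediate (immediate form) selects the lane from the second operand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<std::int32_t, 4>;

// Variable blend: the selector is a runtime value held in a register.
// Lane i comes from b when the sign bit of mask[i] is set.
Vec4 blendVariable(const Vec4& a, const Vec4& b, const Mask4& mask) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] < 0) ? b[i] : a[i];
  return r;
}

// Immediate blend: the selector is a compile-time constant.
// Lane i comes from b when bit i of Imm is set.
template <unsigned Imm>
Vec4 blendImmediate(const Vec4& a, const Vec4& b) {
  static_assert(Imm < 16, "only four lanes in this sketch");
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = ((Imm >> i) & 1u) ? b[i] : a[i];
  return r;
}
```

When the shuffle mask is known at compile time, the immediate form needs no extra register to hold the selector, which is the practical difference the message describes.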


Fix a long standing tail call optimization bug. When a libcall is emitted · f8bad080

Evan Cheng authored Apr 10, 2012

the legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370


ARM LDR/LDRT has the same encoding collision as STR/STRT. · 8f99bc3a
Jim Grosbach authored Apr 10, 2012
```
Generalized logic of r154141.

llvm-svn: 154362
```

Apr 09, 2012

When performing a truncating store, it's possible to rearrange the data · e0e38f61

Chad Rosier authored Apr 09, 2012

in-register, such that we can use a single vector store rather than a
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340
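
The idea behind the combine can be sketched at the source level in C++. This is only a scalar model under assumed lane counts (four 16-bit lanes truncated to bytes); it is not the DAG combine itself, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One narrow store per lane, as in the "before" assembly.
void truncStoreScalar(const std::uint16_t (&lanes)[4], std::uint8_t* out) {
  assert(out != nullptr);
  for (int i = 0; i < 4; ++i)
    out[i] = static_cast<std::uint8_t>(lanes[i]);
}

// Rearrange first, then store once: the low bytes are gathered into a
// contiguous 4-byte value and written with a single 32-bit copy.
void truncStorePacked(const std::uint16_t (&lanes)[4], std::uint8_t* out) {
  assert(out != nullptr);
  std::uint8_t packed[4];
  for (int i = 0; i < 4; ++i)
    packed[i] = static_cast<std::uint8_t>(lanes[i]);
  std::memcpy(out, packed, sizeof packed);
}
```

Both functions write the same four bytes; the second mirrors the shape of the "after" sequence, where the lanes are shuffled in a register and stored with one instruction.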


Update comments and remove unnecessary isVolatile() check. · 99cbde9e
Chad Rosier authored Apr 09, 2012
```
llvm-svn: 154336
```

Fix accidentally constant conditions found by uncommitted improvements to -Wconstant-conversion. · e6b6fae8

David Blaikie authored Apr 09, 2012

A couple of cases where we were accidentally creating constant conditions by
something like "x == a || b" instead of "x == a || x == b". In one case a
conditional & then unreachable was used - I transformed this into a direct
assert instead.

llvm-svn: 154324
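
The class of bug described here is easy to reproduce in isolation. The following C++ sketch uses hypothetical names (`KindA`, `KindB`, `matchesBuggy`, `matchesFixed`, `kindIndex`) and is not taken from the patched files; it only illustrates why `x == a || b` is a constant condition and what the assert-based form looks like.

```cpp
#include <cassert>

const int KindA = 1;
const int KindB = 2;

// Buggy: parsed as (x == KindA) || KindB. KindB is nonzero, so the
// condition is true for every x.
bool matchesBuggy(int x) {
  return x == KindA || KindB;
}

// Intended: x is compared against each alternative.
bool matchesFixed(int x) {
  return x == KindA || x == KindB;
}

// A precondition stated as a direct assert instead of a conditional
// followed by an unreachable marker.
int kindIndex(int x) {
  assert((x == KindA || x == KindB) && "unexpected kind");
  return x - 1;
}
```

A compiler may warn about the constant operand in `matchesBuggy`, which is the kind of diagnostic the commit title refers to.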


This patch adds X86 instruction itineraries, which were missed by the · 2eec3672
Preston Gurd authored Apr 09, 2012
```
original patch to add itineraries, to X86InstrArithmetic.td.

llvm-svn: 154320
```
Lower some x86 shuffle sequences to the vblend family of instructions. · fb7e2ae5
Nadav Rotem authored Apr 09, 2012
```
llvm-svn: 154313
```
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304


Apr 08, 2012

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294
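
For orientation, the kind of source that these TLS model choices affect is an ordinary thread-local access like the one below. This is a minimal C++ sketch with hypothetical names (`counter`, `bump`); which TLS model a compiler actually selects for it depends on the target and on flags such as PIC or PIE, and that selection is not shown here.

```cpp
// Each thread gets its own copy of this variable.
thread_local int counter = 0;

// Every call reads and writes the calling thread's copy. How the address
// of `counter` is computed is what the TLS model determines.
int bump() {
  return ++counter;
}
```

In code that is known to end up in an executable rather than a shared library, the variable's offset from the thread pointer can be resolved more directly, which is the optimization opportunity the message describes.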


Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb

Chandler Carruth authored Apr 08, 2012

in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292


AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284


Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2

Craig Topper authored Apr 07, 2012

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1.

llvm-svn: 154272


Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a

Craig Topper authored Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.

llvm-svn: 154268


Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> · 6f9be7e2

Bob Wilson authored Apr 07, 2012

The tLDRr instruction with the last register operand set to the zero register
prints in assembly as if no register was specified, and the assembler encodes
it as a tLDRi instruction with a zero immediate. With the integrated assembler,
that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
is broken. Emit the instruction as tLDRi with a zero immediate. I don't
know if there's a good way to write a testcase for this. Suggestions welcome.

Opportunities for follow-up work:
1) The asm printer should complain if a non-optional register operand is set
to the zero register, instead of silently dropping it.
2) The integrated assembler should complain in the same situation, instead of
silently emitting the operand as "r0".

llvm-svn: 154261


Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. · b95f6413

NAKAMURA Takumi authored Apr 07, 2012

Cygwin-1.7 supports dw2. Some recent mingw distros support it, too.
I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin.

llvm-svn: 154247


Output UTF-8-encoded characters as identifier characters into assembly · 0235f684

Alexis Hunt authored Apr 07, 2012

by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching the default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235


Tidy up. 80 columns. · 0c509fa6
Jim Grosbach authored Apr 06, 2012
```
llvm-svn: 154226
```

Apr 06, 2012
ARMPat is equivalent to Requires<[IsARM]>. · baa35660
Jakob Stoklund Olesen authored Apr 06, 2012
```
llvm-svn: 154210
```