- Jan 01, 2012
- Benjamin Kramer authored
llvm-svn: 147404
- Craig Topper authored
llvm-svn: 147394
- Craig Topper authored
llvm-svn: 147393
- Craig Topper authored
Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected.
llvm-svn: 147392
- Dec 30, 2011
- Bruno Cardoso Lopes authored
llvm-svn: 147383
- Bruno Cardoso Lopes authored
Implement the encoder methods getJumpTargetOpValue and getBranchTargetOpValue for the jmptarget and brtarget Mips tablegen operand types in the code emitter for the old-style JIT. Rename the pc-relative relocation for branches; the new name is Mips::reloc_mips_pc16. Patch by Sasa Stankovic.
llvm-svn: 147382
- Craig Topper authored
Make FMA4 imply AVX so that YMM registers are available. This necessitates removing FMA4 from the Bulldozer CPU types, since it would otherwise enable AVX code generation implicitly. Also make SSE4A imply SSE3: without some level of SSE implied, XMM registers wouldn't be legal.
llvm-svn: 147369
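A minimal sketch of the feature-implication mechanism this relies on (hypothetical table and names; LLVM's real data lives in the X86.td SubtargetFeature definitions): enabling one feature transitively enables everything it implies, which is exactly why FMA4 had to come off the Bulldozer CPU definitions once it implied AVX.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical implication table; illustrative only.
static const std::map<std::string, std::vector<std::string>> Implies = {
    {"fma4", {"avx"}}, {"sse4a", {"sse3"}}, {"avx", {"sse42"}}};

// Turning on a feature recursively turns on everything it implies, so a
// CPU listing "fma4" would implicitly get "avx" -- the effect the commit
// avoids by removing FMA4 from the Bulldozer definitions.
void enableFeature(const std::string &F, std::set<std::string> &Enabled) {
  if (!Enabled.insert(F).second)
    return; // already enabled
  auto It = Implies.find(F);
  if (It == Implies.end())
    return;
  for (const std::string &Dep : It->second)
    enableFeature(Dep, Enabled);
}
```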
- Craig Topper authored
llvm-svn: 147368
- Craig Topper authored
llvm-svn: 147367
- Craig Topper authored
Separate the concept of having memory access in operand 4 from the concept of having the W bit set for XOP instructions. Removes ORing of W-bits in the encoder and will similarly simplify the disassembler implementation.
llvm-svn: 147366
- Craig Topper authored
llvm-svn: 147365
- Craig Topper authored
llvm-svn: 147364
- Craig Topper authored
Change FMA4 memory forms to use memopv* instead of alignedloadv*. There is no need to force alignment on these instructions. Add a couple of testcases for the memory forms.
llvm-svn: 147361
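To illustrate the difference at the source level (a hedged sketch using the FMA4 compiler intrinsics rather than the .td patterns themselves): memopv*-style patterns allow a load with no alignment guarantee to be folded into the instruction, where alignedloadv* would refuse.

```cpp
#include <x86intrin.h> // FMA4 intrinsics; build with -mfma4

// The load from 'p' carries no 16-byte alignment guarantee; with memopv*
// patterns the compiler may still fold it into the FMA4 memory form.
__m128 fma4_unaligned(__m128 a, __m128 b, const float *p) {
  __m128 c = _mm_loadu_ps(p);  // unaligned 128-bit load
  return _mm_macc_ps(a, b, c); // a*b + c
}
```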
- Craig Topper authored
Fix the load size for FMA4 SS/SD instructions. They need to use the f32 and f64 sizes, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere.
llvm-svn: 147360
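Roughly what the fix means in user terms (a sketch with intrinsics; function and parameter names are illustrative): the SS form should read only a 32-bit float from memory even though the intrinsic's operands are full vectors.

```cpp
#include <x86intrin.h> // FMA4 intrinsics; build with -mfma4

// Only 4 bytes are read from 'p', matching the f32 load size in the
// pattern, while the intrinsic itself still traffics in __m128 values.
__m128 fma4_scalar(__m128 a, __m128 b, const float *p) {
  __m128 c = _mm_load_ss(p);   // 32-bit load, upper lanes zeroed
  return _mm_macc_ss(a, b, c); // a*b + c in lane 0 only
}
```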
- Hal Finkel authored
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).
2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.
llvm-svn: 147359
- Dec 29, 2011
- Craig Topper authored
llvm-svn: 147353
- Craig Topper authored
llvm-svn: 147351
- Craig Topper authored
Make FMA3 imply that AVX needs to be enabled, particularly because 256-bit types aren't valid unless AVX is enabled.
llvm-svn: 147349
- Craig Topper authored
llvm-svn: 147348
- Craig Topper authored
llvm-svn: 147347
- Craig Topper authored
Mark non-VEX forms of PCLMUL instructions as requiring SSE2 to be enabled along with CLMUL. That's required for the XMM registers to be valid for integer data. Doesn't change any behavior, since the CLMUL instructions don't have patterns yet.
llvm-svn: 147345
- Craig Topper authored
Mark non-VEX forms of AES instructions as requiring SSE2 to be enabled along with AES, since that's required for the XMM registers to be valid for integer data. Doesn't change any behavior, though, since you can't use an intrinsic with an illegal type anyway. Just makes it consistent with the VEX forms.
llvm-svn: 147344
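For context, a small example of why the predicate matters (assumes the standard AES-NI intrinsics): the instruction operates on XMM registers holding integer data, a register class that only becomes legal once SSE2 is enabled.

```cpp
#include <wmmintrin.h> // AES-NI intrinsics; build with -maes (implies SSE2)

// One AES encryption round: both operands are integer-typed XMM values,
// which is exactly what requires SSE2 alongside the AES feature.
__m128i aes_round(__m128i state, __m128i round_key) {
  return _mm_aesenc_si128(state, round_key);
}
```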
- Craig Topper authored
Remove the separate explicit AES instruction patterns. They are equivalent to the patterns specified by the instructions. Also remove unnecessary bitconverts from the AES patterns.
llvm-svn: 147342
- Craig Topper authored
Make SSE4.2 and SSE4A not imply POPCNT. POPCNT should be able to be disabled on its own without disabling SSE4.2 or SSE4A.
llvm-svn: 147339
- Craig Topper authored
llvm-svn: 147337
- Craig Topper authored
llvm-svn: 147336
- Craig Topper authored
Remove trailing spaces. Fix an assert to use && instead of || before the string. Add the same assert on a similar code path.
llvm-svn: 147335
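The assert fix is worth spelling out, since it's an easy bug to write:

```cpp
#include <cassert>

void example(bool ok) {
  // Correct: the string literal documents the failure without changing
  // the condition.
  assert(ok && "something went wrong");

  // Broken: a non-null string literal is always truthy, so the whole
  // expression is true and the assert can never fire.
  // assert(ok || "something went wrong");
}
```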
- Dec 28, 2011
- Eli Friedman authored
llvm-svn: 147323
- Elena Demikhovsky authored
Matching the MOVLP mask for AVX (256-bit vectors) was wrong. The failure was detected by conformance tests.
llvm-svn: 147308
- Dec 27, 2011
- Benjamin Kramer authored
llvm-svn: 147289
- Craig Topper authored
Add handling of x86_avx2_pmovmskb to computeMaskedBitsForTargetNode for consistency. Add comments and an assert for BMI instructions to PerformXorCombine, since the enabling of the combine is conditional on it, but the function itself isn't.
llvm-svn: 147287
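The known-bits fact being modeled, shown with the SSE2 variant (a sketch; the commit's code works on the SelectionDAG node, not on intrinsics): pmovmskb packs one bit per input byte, so the upper bits of the i32 result are always zero.

```cpp
#include <emmintrin.h> // SSE2

// _mm_movemask_epi8 produces one bit per byte of a 16-byte vector, so
// bits 16..31 of the result are known zero -- the kind of fact
// computeMaskedBits tracks; the AVX2 form covers 32 bytes instead.
bool upper_bits_zero(__m128i v) {
  int mask = _mm_movemask_epi8(v);
  return (mask >> 16) == 0; // always true
}
```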
- Dec 25, 2011
- Venkatraman Govindaraju authored
llvm-svn: 147269
- Dec 24, 2011
- Rafael Espindola authored
Replace the x86-specific reloc_coff_secrel32 with a generic FK_SecRel_4.
llvm-svn: 147252
- Chandler Carruth authored
LZCNT instructions are available. Force promotion to i32 to get a smaller encoding, since the fix-ups necessary are just as complex for either promoted type. We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;]
llvm-svn: 147251
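The "perfect xor" in question, sketched in plain C++ terms: bsr returns the MSB index, and for an 8-bit input that index never exceeds 7, so 7 - idx and 7 ^ idx coincide; the promote-everything-to-i32 route would instead need a subtract of 24 after the 32-bit xor.

```cpp
#include <cassert>
#include <cstdint>

// ctlz of an 8-bit value via a single xor on the bsr result.
unsigned ctlz8(uint8_t x) {
  assert(x != 0 && "bsr/clz are undefined for zero input");
  unsigned msb = 31u - __builtin_clz(x); // what x86 'bsr' computes
  return 7u ^ msb; // msb <= 7 here, so 7 ^ msb == 7 - msb
}
```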
- Chandler Carruth authored
inspection earlier.
llvm-svn: 147250
- Benjamin Kramer authored
llvm-svn: 147247
- Chandler Carruth authored
'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now.
llvm-svn: 147246
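The behavioral difference at stake, in portable form (a sketch): 'tzcnt' is defined for a zero input, returning the operand width, while 'bsf' leaves its destination undefined, so a branch-free count-trailing-zeros needs the former.

```cpp
#include <cstdint>

// tzcnt semantics: __builtin_ctz matches bsf/tzcnt for nonzero inputs,
// but only tzcnt defines the zero case (returning 32 for a 32-bit operand).
unsigned cttz32(uint32_t x) {
  return x ? __builtin_ctz(x) : 32u;
}
```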
- Chandler Carruth authored
X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused.
Also, while here, fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a *lot* to be improved on in this type of code.
llvm-svn: 147244
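The idiom from the message, in runnable form: the xor expression computes the index of the most significant set bit, which is exactly bsr's result, so matching it early lets the two xors cancel.

```cpp
#include <cassert>
#include <cstdint>

// (sizeof(x)*8 - 1) ^ clz(x) == 31 - clz(x) == index of the most
// significant set bit -- precisely what 'bsr' returns, so the combine
// can drop both xors and keep the bare instruction.
unsigned msb_index(uint32_t x) {
  assert(x != 0 && "clz/bsr undefined for zero");
  return (sizeof(x) * 8 - 1) ^ __builtin_clz(x);
}
```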
- Jakob Stoklund Olesen authored
llvm-svn: 147238
- Akira Hatanaka authored
loadRegFromStackSlot.
llvm-svn: 147235