- Aug 30, 2010
-
-
Bob Wilson authored
operand is killed, add it to the expanded instruction as an implicit kill operand instead of marking the individual subregs with kill flags. This should work better in general and also handles the case for VST3 where one of the subregs was not referenced in the expanded instruction and so was not marked killed. llvm-svn: 112494
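For reference, a hedged sketch of the mechanism named above, using the generic MachineInstrBuilder API (the helper name is illustrative, not from the patch):

  // Attach the whole source register to the expanded instruction as a single
  // implicit kill operand, instead of setting kill flags on each sub-register.
  #include "llvm/CodeGen/MachineInstrBuilder.h"
  using namespace llvm;

  static void markSourceKilled(MachineInstrBuilder MIB, unsigned SrcReg) {
    MIB.addReg(SrcReg, RegState::Implicit | RegState::Kill);
  }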
-
Bill Wendling authored
optional modified register (instead of reg0). Along with r112461 it will make sure that the optional define of CPSR is marked as "def" and will thus mark the instructions using these classes (t2ANDS*) as setting the 's' flag. llvm-svn: 112462
-
- Aug 29, 2010
-
-
Kalle Raiskila authored
The IDX was treated as a byte index, not an element index. llvm-svn: 112422
-
Bill Wendling authored
llvm-svn: 112421
-
Bob Wilson authored
IR add/sub operations with one or both operands sign- or zero-extended. Auto-upgrade the old intrinsics. llvm-svn: 112416
-
Eli Friedman authored
llvm-svn: 112411
-
Bill Wendling authored
- Create T2I_bin_sw_irs to be like T2I_bin_w_irs, except that it sets the S bit. llvm-svn: 112399
-
Chris Lattner authored
llvm-svn: 112397
-
Bill Wendling authored
llvm-svn: 112395
-
Bill Wendling authored
llvm-svn: 112394
-
Bill Wendling authored
it sets the CPSR register. llvm-svn: 112393
-
- Aug 28, 2010
-
-
Chris Lattner authored
times. This patch causes llc and llvm-mc (which both default to verbose-asm) to print out comments after a few common shuffle instructions which indicate the shuffle mask, e.g.:

  insertps $113, %xmm3, %xmm0  ## xmm0 = zero,xmm0[1,2],xmm3[1]
  unpcklps %xmm1, %xmm0        ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
  pshufd   $1, %xmm1, %xmm1    ## xmm1 = xmm1[1,0,0,0]

This is carefully factored to keep the information extraction (of the shuffle mask) separate from the printing logic. I plan to move the extraction part out somewhere else at some point for other parts of the x86 backend that want to introspect on the behavior of shuffles. llvm-svn: 112387
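These comments can be reproduced by hand from the instruction's immediate. Below is a minimal standalone sketch (not the backend's actual extraction code, and the helper name is made up) of decoding the SSE4.1 insertps immediate, whose fields are the source element (bits 7:6), the destination lane (bits 5:4), and a zero mask (bits 3:0). Note the real printer additionally collapses adjacent lanes from the same register, e.g. xmm0[1],xmm0[2] into xmm0[1,2]; this sketch prints the uncompressed form.

  // Standalone illustration of the documented insertps immediate layout.
  #include <cstdio>

  void printInsertpsComment(unsigned imm, const char *dst, const char *src) {
    unsigned CountS = (imm >> 6) & 0x3; // element read from the source register
    unsigned CountD = (imm >> 4) & 0x3; // destination lane it is written to
    unsigned ZMask  = imm & 0xF;        // lanes forced to zero afterwards

    std::printf("%s = ", dst);
    for (unsigned lane = 0; lane != 4; ++lane) {
      if (lane) std::printf(",");
      if (ZMask & (1u << lane))
        std::printf("zero");
      else if (lane == CountD)
        std::printf("%s[%u]", src, CountS);
      else
        std::printf("%s[%u]", dst, lane);
    }
    std::printf("\n");
  }

  int main() {
    // $113 == 0b01110001: take element 1 of xmm3, write lane 3, zero lane 0.
    printInsertpsComment(113, "xmm0", "xmm3"); // xmm0 = zero,xmm0[1],xmm0[2],xmm3[1]
  }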
-
Chris Lattner authored
insertp[sd] $0, which is a noop. Before:

  _f32:                       ## @f32
    pshufd   $1, %xmm1, %xmm2
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm2, %xmm3
    addss    %xmm1, %xmm0
                              ## kill: XMM0<def> XMM0<kill> XMM0<def>
    insertps $0, %xmm0, %xmm0
    insertps $16, %xmm3, %xmm0
    ret

After:

  _f32:                       ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm1, %xmm3
    movdqa   %xmm2, %xmm0
    insertps $16, %xmm3, %xmm0
    ret

The extra movs are due to a random (poor) scheduling decision. llvm-svn: 112379
-
Chris Lattner authored
when the top elements of a vector are undefined. This happens all the time for X86-64 ABI stuff because only the low 2 elements of a 4 element vector are defined. For example, on:

  _Complex float f32(_Complex float A, _Complex float B) {
    return A+B;
  }

We used to produce (with SSE2, SSE4.1+ uses insertps):

  _f32:                       ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $16, %xmm2, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm0
    addss    %xmm1, %xmm0
    pshufd   $16, %xmm0, %xmm1
    movdqa   %xmm2, %xmm0
    unpcklps %xmm1, %xmm0
    ret

We now produce:

  _f32:                       ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm1, %xmm3
    movaps   %xmm2, %xmm0
    unpcklps %xmm3, %xmm0
    ret

This implements rdar://8368414 llvm-svn: 112378
-
Chris Lattner authored
a new EltStride variable instead of reusing the NumElems variable for a non-obvious purpose. No functionality change. llvm-svn: 112377
-
Chris Lattner authored
and hasn't kept up with ToT. Approved by Anton. llvm-svn: 112375
-
Bob Wilson authored
llvm-svn: 112357
-
Chris Lattner authored
being actively maintained, improved, or extended. llvm-svn: 112356
-
Bruno Cardoso Lopes authored
Also teach this logic how to handle target-specific shuffles if needed; this is necessary while searching recursively for zeroed scalar elements in vector shuffle operands. llvm-svn: 112348
-
Bob Wilson authored
llvm-svn: 112336
-
Bob Wilson authored
the special values that for ARM would be used with IB or DA modes. Fall through and consider materializing a new base address if it would be profitable. llvm-svn: 112329
-
Bob Wilson authored
all the other LDM/STM instructions. This fixes asm printer crashes when compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run with -O0 to check this in the future.

Prior to this change, VLDM/VSTM used addressing mode #5, but not really. The offset field was used to hold a count of the number of registers being loaded or stored, and the AM5 opcode field was expanded to specify the IA or DB mode, instead of the standard ADD/SUB specifier. Much of the backend was not aware of these special cases. The crashes occurred when rewriting a frameindex caused the AM5 offset field to be changed so that it did not have a valid submode. I don't know exactly what changed to expose this now. Maybe we've never done much with -O0 and NEON.

Regardless, there's no longer any reason to keep a count of the VLDM/VSTM registers, so we can use addressing mode #4 and clean things up in a lot of places. llvm-svn: 112322
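To illustrate why that scheme was fragile, here is a hypothetical packed-operand encoding (illustration only, not ARM's actual AM5 layout): once a submode and a register count share one immediate, code that treats the immediate as a plain offset and adds to it can silently leave an invalid submode behind.

  // Hypothetical packed operand: submode in the high bits, register count
  // (the old VLDM/VSTM trick) in the low bits. Illustration only.
  #include <cstdio>

  enum SubMode { IA = 1, DB = 2 };   // increment-after / decrement-before

  unsigned encode(SubMode M, unsigned Count) { return (unsigned(M) << 8) | (Count & 0xFF); }
  unsigned getSubMode(unsigned Imm) { return Imm >> 8; }

  int main() {
    unsigned Imm = encode(DB, 3);    // "store 3 registers, DB mode"
    // A rewrite that assumes the immediate is an ordinary offset and bumps it
    // past the count field corrupts the submode bits:
    unsigned Rewritten = Imm + 260;
    std::printf("submode before=%u after=%u\n", getSubMode(Imm), getSubMode(Rewritten));
    // before=2 (DB), after=3 (not a valid submode) -> downstream code chokes
  }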
-
- Aug 27, 2010
-
-
Bob Wilson authored
llvm-svn: 112300
-
Anton Korobeynikov authored
value should be copied to the corresponding shadow reg as well. Patch by Cameron Esfahani! llvm-svn: 112262
-
Daniel Dunbar authored
X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard-to-find miscompiles with the integrated assembler. llvm-svn: 112250
-
Jim Grosbach authored
to try to re-use scavenged frame index reference registers. rdar://8277890 llvm-svn: 112241
-
- Aug 26, 2010
-
-
Jim Grosbach authored
llvm-svn: 112228
-
Jim Grosbach authored
still having a significant effect. It shouldn't be, now that the pre-RA virtual base reg stuff is in. Assuming that's validated by the nightly testers, we can simplify a lot of the PEI frame index code. llvm-svn: 112220
-
Bruno Cardoso Lopes authored
llvm-svn: 112218
-
Bob Wilson authored
llvm-svn: 112208
-
Bill Wendling authored
llvm-svn: 112206
-
Bob Wilson authored
llvm-svn: 112202
-
Jim Grosbach authored
encodable as a 16-bit wide instruction. llvm-svn: 112195
-
Dan Gohman authored
llvm-svn: 112191
-
Dan Gohman authored
fix: add a flag to MapValue and friends which indicates whether any module-level mappings are being made. In the common case of inlining, no module-level mappings are needed, so MapValue doesn't need to examine non-function-local metadata, which can be very expensive in the case of a large module with really deep metadata (e.g. a large C++ program compiled with -g). This flag is a little awkward; perhaps eventually it can be moved into the ClonedCodeInfo class. llvm-svn: 112190
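A minimal standalone sketch of the fast path this describes, using stand-in types (this is not the actual MapValue signature, and the flag name here is illustrative): when no module-level mappings are being made, a metadata node can simply be reused instead of being remapped recursively.

  // Stand-in model, for illustration only: 'Node' plays the role of a Value,
  // and IsMetadata marks non-function-local metadata.
  #include <unordered_map>
  #include <vector>

  struct Node {
    std::vector<Node *> Operands;
    bool IsMetadata = false;
  };

  using ValueMap = std::unordered_map<Node *, Node *>;

  Node *mapValue(Node *V, ValueMap &VM, bool ModuleLevelChanges) {
    if (auto It = VM.find(V); It != VM.end())
      return It->second;                  // already mapped
    if (V->IsMetadata && !ModuleLevelChanges)
      return VM[V] = V;                   // common inlining case: reuse as-is
    // Slow path: clone and remap operands; for deep metadata graphs this is
    // exactly the expensive walk the flag above lets callers skip.
    Node *NewV = new Node(*V);
    VM[V] = NewV;                         // record before recursing (cycles)
    for (Node *&Op : NewV->Operands)
      Op = mapValue(Op, VM, ModuleLevelChanges);
    return NewV;
  }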
-
Bill Wendling authored
comparison with 0. These two pieces of code should give identical results:

  rsbs r1, r1, 0
  cmp  r0, r1
  mov  r0, #0
  it   ls
  mov  r0, #1

and:

  cmn  r0, r1
  mov  r0, #0
  it   ls
  mov  r0, #1

However, the CMN gives the *opposite* result when r1 is 0. This is because the carry flag is set in the CMP case but not in the CMN case. In short, the CMP instruction doesn't perform a truncate of the (logical) NOT of 0 plus the value of r0 and the carry bit (because the "carry bit" parameter to AddWithCarry is defined as 1 in this case, the carry flag will always be set when r0 >= 0). The CMN instruction doesn't perform a NOT of 0 so there is never a "carry" when this AddWithCarry is performed (because the "carry bit" parameter to AddWithCarry is defined as 0).

The AddWithCarry in the CMP case seems to be relying upon the identity:

  ~x + 1 = -x

However when x is 0 and unsigned, this doesn't hold:

  x = 0
  ~x = 0xFFFF FFFF
  ~x + 1 = 0x1 0000 0000
  (-x = 0) != (0x1 0000 0000 = ~x + 1)

Therefore, we should disable *all* versions of CMN, especially when comparing against zero, until we can limit when the CMN instruction is used (when we know that the RHS is not 0) or when we have a hardware fix for this. (See the ARM docs for the "AddWithCarry" pseudo-code.) This is related to <rdar://problem/7569620>. llvm-svn: 112176
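A quick standalone check of the flag difference described above, using the ARM ARM's AddWithCarry definition (the CMP form computes r0 + NOT(operand) + 1, the CMN form computes r0 + operand + 0); this only illustrates the reasoning and is not code from the patch:

  // Shows that the rsbs+cmp sequence and cmn set the carry flag differently
  // when r1 is 0, which is what flips LS/HI-style conditions.
  #include <cstdint>
  #include <cstdio>

  // Carry-out of the ARM ARM AddWithCarry(x, y, carry_in) pseudo-code.
  bool addWithCarryCarryOut(uint32_t x, uint32_t y, uint32_t carryIn) {
    uint64_t unsignedSum = uint64_t(x) + uint64_t(y) + carryIn;
    return (unsignedSum >> 32) != 0;  // carry set iff the 32-bit result wrapped
  }

  int main() {
    uint32_t r0 = 5, r1 = 0;
    // rsbs r1, r1, 0 ; cmp r0, r1   -> AddWithCarry(r0, NOT(-r1), 1)
    bool carryCMP = addWithCarryCarryOut(r0, ~(0u - r1), 1);
    // cmn r0, r1                    -> AddWithCarry(r0, r1, 0)
    bool carryCMN = addWithCarryCarryOut(r0, r1, 0);
    std::printf("CMP carry=%d  CMN carry=%d\n", carryCMP, carryCMN); // 1 vs 0
  }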
-
Chris Lattner authored
llvm-svn: 112171
-
Bob Wilson authored
llvm-svn: 112170
-
Chris Lattner authored
apparently try to support. llvm-svn: 112168
-
Chris Lattner authored
llvm-svn: 112131
-