Commits · 6789f8b6aedf6d7a3b26807a9d37eaa4ad272070 · Roger Ferrer / llvm-epi-0.8

Sep 01, 2010

We have a chance for an optimization. Consider this code: · 6789f8b6

Bill Wendling authored Aug 31, 2010

int x(int t) {
  if (t & 256)
    return -26;
  return 0;
}

We generate this:

     tst.w   r0, #256
     mvn     r0, #25
     it      eq
     moveq   r0, #0

while gcc generates this:

     ands    r0, r0, #256
     it      ne
     mvnne   r0, #25
     bx      lr

Scandalous really!

During ISel, we can look for this particular pattern: one where we have a
"MOVCC" that uses the flag produced by a CMPZ, which itself compares an AND
instruction against 0. Something like this (greatly simplified):

  %r0 = ISD::AND ...
  ARMISD::CMPZ %r0, 0         @ sets [CPSR]
  %r0 = ARMISD::MOVCC 0, -26  @ reads [CPSR]

All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when the result is zero. The zero value will already be in the %r0 register, and we only
need to change it if the AND wasn't zero. Easy!

llvm-svn: 112664
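
The two code sequences above can be checked against the C source with a small model. This is a sketch, not LLVM code: the function names are hypothetical, each ARM instruction is modeled by one C statement, and the Z flag is modeled as a plain `int`.

```c
#include <assert.h>
#include <stdint.h>

/* The source function from the commit message. */
static int32_t x_ref(int32_t t) {
    if (t & 256)
        return -26;
    return 0;
}

/* Hypothetical model of the sequence LLVM emitted before this change. */
static int32_t x_llvm(int32_t r0) {
    int z = (r0 & 256) == 0; /* tst.w r0, #256: Z is set when the AND is zero */
    r0 = ~25;                /* mvn r0, #25: bitwise NOT of 25 is -26 */
    if (z)                   /* it eq */
        r0 = 0;              /* moveq r0, #0 */
    return r0;
}

/* Hypothetical model of the gcc sequence: the AND result doubles as the 0. */
static int32_t x_gcc(int32_t r0) {
    r0 = r0 & 256;           /* ands r0, r0, #256: writes r0 and sets Z */
    int z = (r0 == 0);
    if (!z)                  /* it ne */
        r0 = ~25;            /* mvnne r0, #25 */
    return r0;               /* bx lr: if Z was set, r0 already holds 0 */
}
```

In the gcc form the AND result is already 0 on the path that returns 0, so no separate `moveq r0, #0` is needed; that is the instruction the ANDS conversion saves.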

Use x86 specific MOVSLDUP node, add more patterns to match it and remove useless load nodes · 4b56d872
Bruno Cardoso Lopes authored Aug 31, 2010
```
llvm-svn: 112661
```
Use x86 specific MOVSHDUP node and add more patterns to match it · 61996ef8
Bruno Cardoso Lopes authored Aug 31, 2010
```
llvm-svn: 112657
```
Add ANDS pattern to match the t2ANDS pattern. · d657d825
Bill Wendling authored Aug 31, 2010
```
llvm-svn: 112654
```

Aug 31, 2010
- Make %EFLAGS unallocatable. · 33e9fce2
  Jakob Stoklund Olesen authored Aug 31, 2010
```
No CCR virtual registers should exist, and %EFLAGS is used in ways that can
surprise RegAllocFast.

llvm-svn: 112650
```
- Use MOVHLPS node instead of matching using movhlps and movhlps_undef pattern fragments · 5de15ce4
  Bruno Cardoso Lopes authored Aug 31, 2010
```
llvm-svn: 112644
```
- Use MOVLHPS and MOVHLPS x86 nodes whenever possible. Also remove some useless nodes · 03e4c353
  Bruno Cardoso Lopes authored Aug 31, 2010
```
llvm-svn: 112642
```
- SP relative offsets need to be adjusted by the local allocation size when · 9ce9210e
  Jim Grosbach authored Aug 31, 2010
```
determining if they're likely to be in range of the SP when resolving
frame references.

llvm-svn: 112624
```
- this assert should just be a condition, since this function is just asking if · 6f6b590b
  Jim Grosbach authored Aug 31, 2010
```
the offset is legally encodable, not actually trying to do the encoding.

llvm-svn: 112622
```
- - Cleanup some whitespaces. · b70dc877
  Bill Wendling authored Aug 31, 2010
```
- Convert {0,1} and friends into 0b01, which is identical and more consistent.

llvm-svn: 112593
```
- Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the... · dfd9dd5d
  Bruno Cardoso Lopes authored Aug 31, 2010
```
Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the handling of those nodes when searching for scalars inside vector shuffles

llvm-svn: 112570
```
- Rewrite slightly so we can expand for floating point types easier. · 901176a7
  Eric Christopher authored Aug 31, 2010
```
llvm-svn: 112568
```
- If we have an unhandled type then assert, we shouldn't get here for · bbd10989
  Eric Christopher authored Aug 30, 2010
```
things we can't handle.

llvm-svn: 112559
```
- Expand MOVi32imm in ARM mode after regalloc. This provides · 48043d01
  Anton Korobeynikov authored Aug 30, 2010
```
scheduling opportunities (an extra instruction can go between the
MOVT / MOVW pair, removing the stall).

llvm-svn: 112546
```
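
The MOVi32imm expansion in the commit above splits a 32-bit constant into a MOVW of the low half and a MOVT of the high half. The sketch below models only that arithmetic (MOVW writes the low 16 bits and clears the rest; MOVT replaces the top 16 bits and keeps the low ones). The function names are hypothetical and this is not the LLVM expansion code.

```c
#include <assert.h>
#include <stdint.h>

/* MOVW rd, #imm16: rd = imm16, upper half cleared. */
static uint32_t model_movw(uint16_t imm16) {
    return (uint32_t)imm16;
}

/* MOVT rd, #imm16: replace the upper half of rd, keep the lower half. */
static uint32_t model_movt(uint32_t rd, uint16_t imm16) {
    return (rd & 0xFFFFu) | ((uint32_t)imm16 << 16);
}

/* Materialize a 32-bit immediate as a MOVW/MOVT pair. Because the two
   halves are separate instructions, a scheduler may place unrelated
   work between them. */
static uint32_t materialize_imm32(uint32_t imm) {
    uint32_t rd = model_movw((uint16_t)(imm & 0xFFFFu));
    /* ...an unrelated instruction could be scheduled here... */
    rd = model_movt(rd, (uint16_t)(imm >> 16));
    return rd;
}
```

Keeping the pair as one pseudo-instruction until after register allocation hides that gap from the scheduler, which is why expanding it late creates the opportunity.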
- Use the existing T2I_bin_s_irs pattern instead of creating T2I_bin_sw_irs, which · 87bb14c5
  Bill Wendling authored Aug 30, 2010
```
is meant to do exactly the same thing. Thanks to Jim Grosbach for pointing this
out! :-)

llvm-svn: 112538
```
Aug 30, 2010

Remember to clear the shadow kill flag at the same time as clearing the real · 4d30f90e

Jakob Stoklund Olesen authored Aug 30, 2010

kill flag.

This could cause duplicate kill flags when the same register was used twice in a
continuous sequence of STRs.

There is no small test case. <rdar://problem/8218046>

llvm-svn: 112534

Remove NEON vmovn intrinsic, replacing it with vector truncate operations. · 4cd8a126
Bob Wilson authored Aug 30, 2010
```
Auto-upgrade the old intrinsic and update tests.

llvm-svn: 112507
```
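
The vector truncate that replaces `vmovn` keeps the low half of each lane. A sketch of the lane semantics for the 16-to-8-bit case, with a hypothetical function name; unsigned types are used so the truncation is fully defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* vmovn.i16: narrow each 16-bit lane to 8 bits by dropping the high byte.
   An LLVM IR `trunc <8 x i16> ... to <8 x i8>` keeps the same low bits. */
static void model_vmovn_i16(uint8_t out[8], const uint16_t in[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)in[i];
}
```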

Make ARM add rN, sp, #imm instructions rematerializable. That's how the... · fef37287

Jim Grosbach authored Aug 30, 2010

Make ARM add rN, sp, #imm instructions rematerializable. That's how the address of locals is calculated, so this should
help relieve register pressure a bit. Recalculating the local address is
almost always going to be better than spilling.

llvm-svn: 112503

When expanding NEON VST pseudo instructions, if the original super-register · e2f8bdac

Bob Wilson authored Aug 30, 2010

operand is killed, add it to the expanded instruction as an implicit kill
operand instead of marking the individual subregs with kill flags.  This
should work better in general and also handles the case for VST3 where one
of the subregs was not referenced in the expanded instruction and so was
not marked killed.

llvm-svn: 112494

Create Thumb2sI_cpsr and T2sI_cpsr. These new classes indicate that CPSR is the · f8dfa461

Bill Wendling authored Aug 30, 2010

optional modified register (instead of reg0). Along with r112461 it will make
sure that the optional define of CPSR is marked as "def" and will thus mark the
instructions using these classes (t2ANDS*) as setting the 's' flag.

llvm-svn: 112462

Aug 29, 2010
- Fix lowering of INSERT_VECTOR_ELT in SPU. · 1e616572
  Kalle Raiskila authored Aug 29, 2010
```
The IDX was treated as byte index, not element index.

llvm-svn: 112422
```
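
The SPU fix above hinges on the difference between an element index and a byte offset. A minimal sketch, using a plain byte array as a hypothetical stand-in for a 16-byte vector register (the function name is also hypothetical, not the SPU lowering code): the index has to be scaled by the element size before any bytes are touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Insert one element into a 16-byte vector held as raw bytes.
   `idx` is an element index; the byte offset is idx * elt_size. */
static void insert_vector_elt(uint8_t vec[16], const void *elt,
                              size_t elt_size, size_t idx) {
    assert(elt_size > 0 && idx < 16 / elt_size);
    size_t byte_offset = idx * elt_size; /* using idx directly as a byte offset was the bug */
    memcpy(vec + byte_offset, elt, elt_size);
}
```

For a vector of 32-bit elements, element 2 starts at byte 8; treating the index as a byte offset would have written at byte 2 instead.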
- Fix whitespaces. No functionality changes. · 8fc2b590
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112421
```
- Remove NEON vaddl, vaddw, vsubl, and vsubw intrinsics. Instead, use llvm · d0c05488
  Bob Wilson authored Aug 29, 2010
```
IR add/sub operations with one or both operands sign- or zero-extended.
Auto-upgrade the old intrinsics.

llvm-svn: 112416
```
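
The replacement IR in the commit above is ordinary widening arithmetic. As a sketch, assuming the signed 8-bit forms (`vaddl.s8`, `vaddw.s8`) on 8-lane vectors, the lane-wise meaning is: vaddl sign-extends both narrow operands before adding, while vaddw extends only the narrow one. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* vaddl.s8: sign-extend both 8-bit operands to 16 bits, then add.
   The sum always fits: it lies in [-256, 254]. */
static void model_vaddl_s8(int16_t out[8], const int8_t a[8], const int8_t b[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)((int16_t)a[i] + (int16_t)b[i]);
}

/* vaddw.s8: the first operand is already 16 bits wide; only b is extended.
   The result is kept to 16 bits (wrapping on overflow on common targets). */
static void model_vaddw_s8(int16_t out[8], const int16_t a[8], const int8_t b[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)(a[i] + (int16_t)b[i]);
}
```

In IR terms, each loop body corresponds to a `sext` of the narrow operand(s) followed by a 16-bit `add`; the unsigned forms would use `zext` instead.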
- A couple of small missed optimizations. · f75de6ea
  Eli Friedman authored Aug 29, 2010
```
llvm-svn: 112411
```
- - Add a parameter to T2I_bin_irs for those patterns which set the S bit. · df9ec17d
  Bill Wendling authored Aug 29, 2010
```
- Create T2I_bin_sw_irs to be like T2I_bin_w_irs, except that it sets the S bit.

llvm-svn: 112399
```
- add a bunch more common shuffles to the instprinter. · 38ccc8b8
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112397
```
- Rename ANDflag to ANDS, which is less stupid. · b0dc465c
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112395
```
- File missing from last commit. · ac64ed09
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112394
```
- Create an ARMISD::AND node. This node is exactly like the "ARM::AND" node, but · 0a65116c
  Bill Wendling authored Aug 29, 2010
```
it sets the CPSR register.

llvm-svn: 112393
```
Aug 28, 2010

I have manually decoded the imm field of an insertps one too many · 7a05e6dc

Chris Lattner authored Aug 28, 2010

times.  This patch causes llc and llvm-mc (which both default to
verbose-asm) to print out comments after a few common shuffle
instructions which indicate the shuffle mask, e.g.:

	insertps	$113, %xmm3, %xmm0     ## xmm0 = zero,xmm0[1,2],xmm3[1]
	unpcklps	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	pshufd	$1, %xmm1, %xmm1        ## xmm1 = xmm1[1,0,0,0]

This is carefully factored to keep the information extraction (of the
shuffle mask) separate from the printing logic.  I plan to move the
extraction part out somewhere else at some point for other parts of
the x86 backend that want to introspect on the behavior of shuffles.

llvm-svn: 112387
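
The comment on the insertps line above can be derived by hand from the immediate. A minimal sketch, based on the documented INSERTPS immediate layout (bits 7:6 select the source element, bits 5:4 the destination lane, bits 3:0 a zero mask) rather than on LLVM's printer code. The function name is hypothetical and, unlike LLVM's comments, it lists every lane separately instead of merging ranges such as `xmm0[1,2]`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Describe the four lanes written by `insertps $imm, src, dst`
   (AT&T operand order). Each lane string fits in 16 bytes. */
static void decode_insertps(unsigned imm, const char *dst, const char *src,
                            char lanes[4][16]) {
    unsigned src_elt = (imm >> 6) & 3u;  /* element taken from src */
    unsigned dst_elt = (imm >> 4) & 3u;  /* lane of dst that receives it */
    unsigned zmask = imm & 0xFu;         /* lanes forced to zero afterwards */
    for (unsigned i = 0; i < 4; ++i) {
        if (zmask & (1u << i))
            snprintf(lanes[i], 16, "zero");
        else if (i == dst_elt)
            snprintf(lanes[i], 16, "%s[%u]", src, src_elt);
        else
            snprintf(lanes[i], 16, "%s[%u]", dst, i);
    }
}
```

For `$113` (0x71) this gives source element 1, destination lane 3 and zero mask 0b0001, i.e. `zero, xmm0[1], xmm0[2], xmm3[1]`, matching the printed comment. With `$0` and the same register as source and destination every lane maps to itself, which is why the `insertps $0, %xmm0, %xmm0` in the "Before" listing below is a no-op.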

fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c

Chris Lattner authored Aug 28, 2010

insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379

fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a

Chris Lattner authored Aug 28, 2010

when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
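
A lane-level sketch of why the new sequence is enough, assuming (as the commit says) that only the low two elements are meaningful: `unpcklps` interleaves the low two lanes of its operands, so with the real sum in lane 0 of one register and the imaginary sum in lane 0 of the other, lanes 0 and 1 of the result hold the `_Complex float` pair. The type `v4f` and both function names are hypothetical stand-ins, not LLVM code.

```c
#include <assert.h>

typedef struct { float lane[4]; } v4f; /* stand-in for an XMM register */

/* unpcklps src, dst (AT&T order): dst = {dst[0], src[0], dst[1], src[1]}. */
static v4f model_unpcklps(v4f dst, v4f src) {
    v4f r = {{dst.lane[0], src.lane[0], dst.lane[1], src.lane[1]}};
    return r;
}

/* Model of the new f32 body: A and B arrive as {re, im, ?, ?}. */
static v4f model_f32(v4f a, v4f b) {
    v4f re = a, im = {{0}};               /* upper lanes of im are don't-care */
    re.lane[0] = a.lane[0] + b.lane[0];   /* movdqa + addss: real parts */
    im.lane[0] = a.lane[1] + b.lane[1];   /* pshufd $1 + addss: imaginary parts */
    return model_unpcklps(re, im);        /* movaps + unpcklps */
}
```

Only lanes 0 and 1 of the result are meaningful, which is exactly why the extra `pshufd $16` shuffles of the old sequence were pointless.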

improve comments in the unpcklps generating logic, introduce · 96db6e66

Chris Lattner authored Aug 28, 2010

a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose.  No functionality change.

llvm-svn: 112377

remove the MSIL backend. It isn't maintained, is buggy, has no testcases · bd244047
Chris Lattner authored Aug 28, 2010
```
and hasn't kept up with ToT.  Approved by Anton.

llvm-svn: 112375
```
Use pseudo instructions for VST1 and VST2. · 950882be
Bob Wilson authored Aug 28, 2010
```
llvm-svn: 112357
```
remove unions from LLVM IR. They are severely buggy and not · 13ee795c
Chris Lattner authored Aug 28, 2010
```
being actively maintained, improved, or extended.

llvm-svn: 112356
```

Clean up the logic of vector shuffles -> vector shifts. · a982aa24

Bruno Cardoso Lopes authored Aug 28, 2010

Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.

llvm-svn: 112348

We don't need to custom-select VLDMQ and VSTMQ anymore. · 8ee93947
Bob Wilson authored Aug 28, 2010
```
llvm-svn: 112336
```

When merging Thumb2 loads/stores, do not give up when the offset is one of · ca5af129

Bob Wilson authored Aug 27, 2010

the special values that for ARM would be used with IB or DA modes.  Fall
through and consider materializing a new base address if it would be
profitable.

llvm-svn: 112329

Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like · 13ce07fa

Bob Wilson authored Aug 27, 2010

all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occurred when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
