- Jul 30, 2010
-
-
Jim Grosbach authored
have 4 bits per register in the operand encoding), but have undefined behavior when the operand value is 13 or 15 (SP and PC, respectively). The trivial coalescer in linear scan sometimes will merge a copy from SP into a subsequent instruction which uses the copy, and if that instruction cannot legally reference SP, we get bad code such as:
  mls r0, r9, r0, sp
instead of:
  mov r2, sp
  mls r0, r9, r0, r2
This patch adds a new register class for use by Thumb2 that excludes the problematic registers (SP and PC) and is used instead of GPR for those operands which cannot legally reference PC or SP. The trivial coalescer explicitly requires that the register class of the destination for the COPY instruction contain the source register for the COPY to be considered for coalescing. This prevents errant instructions like that above. PR7499. llvm-svn: 109842
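As a minimal, self-contained sketch of the guard described here (the types and names below are illustrative, not LLVM's actual data structures): a COPY is only considered for coalescing when the destination's register class also contains the source register, so a copy from SP into the new SP/PC-free class is left alone.

  #include <set>
  #include <string>

  // Toy model of the coalescing guard described in the commit message.
  struct RegClass {
    std::set<std::string> Regs;
    bool contains(const std::string &R) const { return Regs.count(R) != 0; }
  };

  // Only coalesce a COPY when the destination's class can also hold the source.
  bool canCoalesceCopy(const RegClass &DstClass, const std::string &SrcReg) {
    return DstClass.contains(SrcReg);
  }

  // Example: a class modeled on the new Thumb2 class (GPR minus SP and PC)
  // rejects a copy from "sp", so the bad "mls ..., sp" form above is never created.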
-
Nate Begeman authored
llvm-svn: 109813
-
- Jul 29, 2010
-
-
Bob Wilson authored
transformations. llvm-svn: 109800
-
Dale Johannesen authored
integers with mov + vdup. 8003375. This is currently disabled by default because LICM will not hoist a VDUP, so it pessimizes the code if the construct occurs inside a loop (8248029). llvm-svn: 109799
-
Bob Wilson authored
PR7745. llvm-svn: 109788
-
Nate Begeman authored
Add intrinsics __builtin_arm_qadd & __builtin_arm_qsub to allow access to the QADD & QSUB instructions. They behave identically to the __qadd & __qsub RealView instruction intrinsics. llvm-svn: 109770
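A brief usage sketch (assuming a Clang build targeting ARM, where these builtins are exposed; the function name is just illustrative):

  // Saturating signed arithmetic via the new builtins, matching the intent of
  // RealView's __qadd/__qsub.
  int saturating_mac(int acc, int a, int b) {
    int diff = __builtin_arm_qsub(a, b);   // saturating a - b (QSUB)
    return __builtin_arm_qadd(acc, diff);  // saturating acc + diff (QADD)
  }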
-
Jakob Stoklund Olesen authored
We do sometimes load from a too-small stack slot when dealing with x86 arguments (varargs and smaller-than-32-bit args). It looks like we know what we are doing in those cases, so I am going to remove the assert instead of artificially enlarging stack slot sizes. The assert in storeRegToStackSlot stays in. We don't want to write beyond the bounds of a stack slot. llvm-svn: 109764
-
Jim Grosbach authored
ARM mode version of r109693. Remove incorrect substitution pattern for UXTB16. It wrongly assumed the input shift was actually a rotate. rdar://8240138 llvm-svn: 109696
-
Jim Grosbach authored
Remove incorrect substitution pattern for UXTB16. It wrongly assumed the input shift was actually a rotate. rdar://8240138 llvm-svn: 109693
-
Jim Grosbach authored
llvm-svn: 109691
-
- Jul 28, 2010
-
-
Jakob Stoklund Olesen authored
The size of this object isn't used for anything - technically it is of variable size. This avoids a false positive from the assert in X86InstrInfo::loadRegFromStackSlot, and fixes PR7735. llvm-svn: 109652
-
Dan Gohman authored
of a std::vector. llvm-svn: 109597
-
Dan Gohman authored
to avoid undefined behavior on overflow, noticed by John Regehr. llvm-svn: 109594
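The commit title is elided above, so the exact change isn't shown here. As a general, hypothetical illustration of this class of fix (not the code this commit touched): signed integer overflow is undefined behavior in C++, and a common remedy is to do the arithmetic in an unsigned type, where wraparound is well defined.

  #include <cstdint>

  // Hypothetical example only: negating INT32_MIN directly overflows and is
  // undefined behavior, so perform the negation in an unsigned type and
  // convert back (well defined on two's-complement targets).
  int32_t safe_negate(int32_t x) {
    return static_cast<int32_t>(0u - static_cast<uint32_t>(x));
  }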
-
Nate Begeman authored
This is about 4x faster and smaller than the existing scalarization. llvm-svn: 109566
-
Nate Begeman authored
~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types coming in future patches. For:
  define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
  entry:
    %shl = shl <4 x i32> %r, %a                  ; <<4 x i32>> [#uses=1]
    %tmp2 = bitcast <4 x i32> %shl to <2 x i64>  ; <<2 x i64>> [#uses=1]
    ret <2 x i64> %tmp2
  }
We get:
  _shl:                  ## @shl
    pslld     $23, %xmm1
    paddd     LCPI0_0, %xmm1
    cvttps2dq %xmm1, %xmm1
    pmulld    %xmm1, %xmm0
    ret
Instead of:
  _shl:                  ## @shl
    pshufd    $3, %xmm0, %xmm2
    movd      %xmm2, %eax
    pshufd    $3, %xmm1, %xmm2
    movd      %xmm2, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm2
    pshufd    $1, %xmm0, %xmm3
    movd      %xmm3, %eax
    pshufd    $1, %xmm1, %xmm3
    movd      %xmm3, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm3
    punpckldq %xmm2, %xmm3
    movd      %xmm0, %eax
    movd      %xmm1, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm2
    movhlps   %xmm0, %xmm0
    movd      %xmm0, %eax
    movhlps   %xmm1, %xmm1
    movd      %xmm1, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm0
    punpckldq %xmm0, %xmm2
    movdqa    %xmm2, %xmm0
    punpckldq %xmm3, %xmm0
    ret
llvm-svn: 109549
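For reference, the same trick can be written with SSE intrinsics (a hedged sketch, not code from this patch): the shift amounts are moved into a float's exponent field, so the cvttps2dq produces 2^amt per lane, and the pmulld then performs the shift as a multiply; the constant 0x3f800000 (the float exponent bias, 127 << 23) presumably corresponds to LCPI0_0 above.

  #include <smmintrin.h>  // SSE4.1, for _mm_mullo_epi32 (pmulld)

  // Sketch: v << amt computed as v * 2^amt, with 2^amt built by placing amt
  // in the exponent field of an IEEE float and truncating back to integer.
  static __m128i shl_v4i32(__m128i v, __m128i amt) {
    __m128i exp  = _mm_add_epi32(_mm_slli_epi32(amt, 23),      // amt << 23 (pslld)
                                 _mm_set1_epi32(0x3f800000));  // + bias    (paddd)
    __m128i pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(exp));    // 2^amt     (cvttps2dq)
    return _mm_mullo_epi32(v, pow2);                           // v * 2^amt (pmulld)
  }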
-
- Jul 27, 2010
-
-
Michael J. Spencer authored
llvm-svn: 109494
-
Jakob Stoklund Olesen authored
subregister operands like this:
  %reg1040:sub_32bit<def> = MOV32rm <fi#-2>, 1, %reg0, 0, %reg0, %reg1040<imp-def>; mem:LD4[FixedStack-2](align=8)
Make them return false when subreg operands are present. VirtRegRewriter is making bad assumptions otherwise. This fixes PR7713. llvm-svn: 109489
-
Jakob Stoklund Olesen authored
with a too-big register class. llvm-svn: 109488
-
Eli Friedman authored
llvm-svn: 109458
-
Anton Korobeynikov authored
llvm-svn: 109456
-
- Jul 26, 2010
-
-
Evan Cheng authored
llvm-svn: 109450
-
Anton Korobeynikov authored
llvm-svn: 109448
-
Bruno Cardoso Lopes authored
we are using AVX and no AVX version of the desired instruction is present, this is better for incremental dev (without fallbacks it's easier to spot what's missing). Not sure this is the best hack though (we can also disable all HasSSE* predicates by dynamically marking them 'false' if AVX is present). llvm-svn: 109434
-
Anton Korobeynikov authored
This assumption is not satisfied due to global merging. Work around the issue by temporarily disabling merging of const globals. Also, ignore LLVM "special" globals. This fixes PR7716. llvm-svn: 109423
-
Evan Cheng authored
llvm-svn: 109421
-
- Jul 25, 2010
-
-
Douglas Gregor authored
llvm-svn: 109373
-
Douglas Gregor authored
llvm-svn: 109372
-
- Jul 24, 2010
-
-
Anton Korobeynikov authored
llvm-svn: 109359
-
Evan Cheng authored
appropriate for targets without detailed instruction itineraries. The scheduler schedules for increased instruction level parallelism in low register pressure situations; it schedules to reduce register pressure when the register pressure becomes high. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%. llvm-svn: 109300
-
Bruno Cardoso Lopes authored
llvm-svn: 109295
-
Jim Grosbach authored
function live-in set. This will give us tGPR for Thumb1 and GPR otherwise, so the copy will be spillable. rdar://8224931 llvm-svn: 109293
-
Dale Johannesen authored
comments explaining why it was wrong. 8225024. Fix the real problem in 8213383: the code that splits very large blocks when no other place to put constants can be found was not considering the case that the block contained a Thumb tablejump. llvm-svn: 109282
-
Evan Cheng authored
it's too late to start backing off aggressive latency scheduling when most of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for the hybrid scheduler. For ARM, this is almost always a win on # of instructions. It's runtime neutral for most of the tests. But for some kernels with high register pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by 54 and sped up by 20%. llvm-svn: 109279
-
Bruno Cardoso Lopes authored
llvm-svn: 109276
-
- Jul 23, 2010
-
-
Bruno Cardoso Lopes authored
llvm-svn: 109248
-
Gabor Greif authored
llvm-svn: 109224
-
Gabor Greif authored
llvm-svn: 109222
-
Bruno Cardoso Lopes authored
llvm-svn: 109207
-
Bruno Cardoso Lopes authored
llvm-svn: 109206
-
Bruno Cardoso Lopes authored
Add complete assembler support for FMA3 instructions, with descriptions and encodings taken from the AVX manual llvm-svn: 109204
-