  1. Mar 12, 2009
    • Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add x, c))" · 4147f08e
      Chris Lattner authored
      related transformations out of target-specific dag combine into the
      ARM backend.  These were added by Evan in r37685 with no testcases
      and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
      
      Add some simple X86-specific (for now) DAG combines that turn things
      like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
      with the recently added constant-pool ("cp") constant select optimization, but is a
      very general xform.  For example, we now compile the second example
      in const-select.ll to:
      
      _test:
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              seta    %al
              movzbl  %al, %eax
              movl    4(%esp), %ecx
              movsbl  (%ecx,%eax,4), %eax
              ret
      
      instead of:
      
      _test:
              movl    4(%esp), %eax
              leal    4(%eax), %ecx
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              cmovbe  %eax, %ecx
              movsbl  (%ecx), %eax
              ret
      
      This passes MultiSource and DejaGNU.
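
      A minimal sketch of the equivalence the combine exploits (hypothetical
      function names; the real transform runs on SelectionDAG nodes, not C++
      source):

      	// For a boolean cond, (cond ? 8 : 0) == ((int)cond << 3),
      	// since cond is exactly 0 or 1 and 8 == 1 << 3.
      	int sel(bool cond)   { return cond ? 8 : 0; }
      	int xform(bool cond) { return (int)cond << 3; }  // no branch, no cmov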
      
      llvm-svn: 66779
    • improve comment. · a492d29c
      Chris Lattner authored
      llvm-svn: 66778
    • On x86, if the only use of an i64 load is an i64 store, generate a pair of... · ef0b7cc2
      Evan Cheng authored
      On x86, if the only use of an i64 load is an i64 store, generate a double load and store pair instead.
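
      A sketch of the eligible pattern in source terms (hypothetical names;
      the actual match is on i64 load/store DAG nodes):

      	// On x86-32 an i64 copy was two 32-bit load/store pairs; with
      	// this change it can become one f64 load and store (e.g. movsd
      	// when SSE2 is available), halving the memory operations.
      	void copy64(long long *dst, const long long *src) {
      	    *dst = *src;
      	}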
      
      llvm-svn: 66776
    • Revert r66024. The JIT encoding for CALLpcrel32 is wrong -- see PR3773, and the · 5637df37
      Dan Gohman authored
      assembly text output uses an indirect call ("call *") instead of a direct call.
      
      llvm-svn: 66735
  2. Mar 07, 2009
    • Introduce new linkage types linkonce_odr, weak_odr, common_odr · 12da8ce3
      Duncan Sands authored
      and extern_weak_odr.  These are the same as the non-odr versions,
      except that they indicate that the global will only be overridden
      by an *equivalent* global.  In C, a function with weak linkage can
      be overridden by a function which behaves completely differently.
      This means that IP passes have to skip weak functions, since any
      deductions made from the function definition might be wrong: the
      definition could be replaced by something completely different at
      link time.  This is not allowed in C++, thanks to the ODR
      (One Definition Rule): if a function is replaced by another at
      link-time, then the new function must be the same as the original
      function.  If a language knows that a function or other global can
      only be overridden by an equivalent global, it can give it the
      weak_odr linkage type, and the optimizers will understand that it
      is alright to make deductions based on the function body.  The
      code generators, on the other hand, map weak and weak_odr linkage
      to the same thing.
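
      A sketch of the two source-level situations described above
      (hypothetical names; weak spelled with the GCC attribute):

      	// C-style weak: the linker may substitute a definition that
      	// behaves completely differently, so IP passes must skip it.
      	extern "C" __attribute__((weak)) int f() { return 1; }

      	// C++ inline: any other definition must be identical under the
      	// ODR, so it can be emitted as linkonce_odr/weak_odr and the
      	// optimizers may make deductions from its body.
      	inline int g() { return 2; }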
      
      llvm-svn: 66339
    • Arithmetic instructions don't set the EFLAGS OF and CF bits · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
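
      A sketch of the safe and unsafe cases in source terms (hypothetical
      names):

      	// eq/ne read only ZF, and "add" sets ZF exactly as "test" would,
      	// so the test can be dropped:
      	bool eq(int a, int b) { return (a + b) == 0; }  // add; je
      	// signed order checks also read OF and SF; "test" always clears
      	// OF while "add" sets it on overflow, so the test must stay:
      	bool gt(int a, int b) { return (a + b) > 0; }   // add; test; jg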
      
      llvm-svn: 66318
  3. Feb 27, 2009
    • Refactor TLS code and add some tests. The tests and expected results are: · 000421ea
      Rafael Espindola authored
       pic |  declaration | linkage  | visibility | tests                   | TLS model
      
      !pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
       pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
      !pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
       pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
      
      !pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
       pic |  declaration | external | hidden     | X                       | local dynamic
      !pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
       pic | !declaration | external | hidden     | X                       | local dynamic
      
      !pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
       pic |  declaration | internal | default    | X                       | local dynamic
      
      The combinations marked with an X have no tests yet, since the local dynamic model is not implemented.
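
      A rough source-level mirror of two of the rows above (hypothetical
      names; the actual tests are .ll files, built with and without -fPIC):

      	extern __thread int a;       // declaration, external, default:
      	                             // local exec when !pic, general
      	                             // dynamic when pic
      	static __thread int b = 0;   // internal, default: local exec when
      	                             // !pic; pic would need local dynamic (X)
      	int *pa() { return &a; }
      	int *pb() { return &b; }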
      
      llvm-svn: 65632
  4. Feb 23, 2009
    • Fast-isel can't do TLS yet, so it should fall back to SDISel · 318d7376
      Dan Gohman authored
      if it sees TLS addresses.
      
      llvm-svn: 65341
    • Only v1i64 (i.e. __m64) is returned via RAX / RDX. · 9f8fddee
      Evan Cheng authored
      llvm-svn: 65313
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3, where it needs fewer uops.
      Document the shuffle matching logic and add some FIXMEs for further
        cleanups later.
      New tests that test the above.
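
      The SSSE3 win comes from pshufb's table-lookup semantics: each byte of
      the mask selects one source byte, or zero if the mask byte's high bit
      is set, so an arbitrary byte permutation is a single shuffle. A sketch
      via the corresponding intrinsic (hypothetical function; this
      byte-swap-within-words pattern is what _shuf1 below appears to compute
      with its rolw $8 sequence):

      	#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 == pshufb
      	// dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f]
      	__m128i swap_bytes_in_words(__m128i v) {
      	    const __m128i mask = _mm_set_epi8(14, 15, 12, 13, 10, 11, 8, 9,
      	                                      6, 7, 4, 5, 2, 3, 0, 1);
      	    return _mm_shuffle_epi8(v, mask);
      	}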
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      =========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
    • Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR · 9d31aca6
      Scott Michel authored
      instruction. The class also consolidates the code for detecting constant
      splats that's shared across the PowerPC and CellSPU backends (and might be
      useful for other backends). Also introduces SelectionDAG::getBUILD_VECTOR()
      for generating new BUILD_VECTOR nodes.
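
      A sketch of the intended use, assuming the 2009-era API (the
      isConstantSplat parameter list is approximate):

      	#include "llvm/CodeGen/SelectionDAG.h"
      	using namespace llvm;

      	// Returns true and fills SplatBits when every defined element of
      	// the BUILD_VECTOR is the same constant.
      	static bool isSplat(SDNode *N, APInt &SplatBits) {
      	    BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N);
      	    if (!BV) return false;
      	    APInt SplatUndef;
      	    unsigned SplatBitSize;
      	    bool HasAnyUndefs;
      	    return BV->isConstantSplat(SplatBits, SplatUndef, SplatBitSize,
      	                               HasAnyUndefs);
      	}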
      
      llvm-svn: 65296