  1. Mar 13, 2009
      These instructions have special lowering that may lower them to SSE · 798fd56d
      Bill Wendling authored
      instructions. Prevent that if we don't want implicit uses of SSE.
      
      llvm-svn: 66877
      Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd
      Evan Cheng authored
      Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues.
      
      1. The ConstantPoolSDNode alignment field holds the log2 of the alignment requirement. This is not consistent with other SDNode variants.
      2. The MachineConstantPool alignment field is also a log2 value.
      3. However, some places create ConstantPoolSDNodes with raw alignment values rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
      4. Constant pool entry offsets are computed when the entries are created, but the asm printer then groups them by section, so those offsets are no longer valid; the asm printer nevertheless uses them to determine the padding between entries.
      5. The asm printer uses an expensive multimap to track constant pool entries by section.
      6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries, which is non-deterministic.
      
      
      Solutions:
      1. The ConstantPoolSDNode alignment field now stores the non-log2 value.
      2. The MachineConstantPool alignment field now also stores the non-log2 value.
      3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
      4. MachineConstantPoolEntry no longer keeps an offset field; it is replaced with an alignment field. Offsets are no longer computed when constant pool entries are created; they are computed on the fly in the asm printer and the JIT.
      5. The asm printer uses a cheaper data structure to group constant pool entries.
      6. The asm printer computes entry offsets after grouping is done (a sketch of this follows the list).
      7. The JIT likewise computes entry offsets on the fly.
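      
      A minimal sketch of the on-the-fly offset computation described in points 4, 6 and 7, assuming power-of-two byte alignments; the struct and function names here are hypothetical, not the actual LLVM code:
      
      #include <cstdint>
      #include <vector>
      
      struct PoolEntry {
        uint64_t Size;   // size of the constant in bytes
        uint64_t Align;  // required alignment in bytes (not log2), power of two
      };
      
      // Walk the entries in emission order and give each one the next offset
      // that satisfies its alignment; the gap between consecutive offsets is
      // exactly the padding the printer has to emit.
      std::vector<uint64_t> computeOffsets(const std::vector<PoolEntry> &Entries) {
        std::vector<uint64_t> Offsets;
        uint64_t Offset = 0;
        for (const PoolEntry &E : Entries) {
          Offset = (Offset + E.Align - 1) & ~(E.Align - 1); // round up to alignment
          Offsets.push_back(Offset);
          Offset += E.Size;
        }
        return Offsets;
      }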
      
      llvm-svn: 66875
      generalize the previous code to use the full generality of LEA · 99cc1337
      Chris Lattner authored
      for i32/i64 expressions (we could also do i16 on cpus where
      i16 lea is fast, but I didn't add this).  On the example, we now
      generate:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$42, (%eax)
      	setl	%al
      	movzbl	%al, %eax
      	leal	4(%eax,%eax,8), %eax
      	ret
      
      instead of:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	movl	$4, %ecx
      	movl	$13, %eax
      	cmovg	%ecx, %eax
      	ret
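      
      In source terms, the transformation is roughly the following (an illustrative sketch, not the DAG-combine code itself; the function name is made up):
      
      // cond ? 13 : 4 is rewritten as 4 + 9 * zext(cond), and 4 + 9*x is
      // exactly what leal 4(%eax,%eax,8) computes in one instruction.
      int select_constants(const int *p) {
        int cond = (*p < 42);   // materialized with cmpl/setl/movzbl
        return 4 + 9 * cond;    // folded into a single LEA, no branch or cmov
      }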
      
      llvm-svn: 66869
      optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d
      Chris Lattner authored
      example to:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	setg	%al
      	movzbl	%al, %eax
      	orl	$4294967294, %eax
      	ret
      
      instead of:
      
              movl    4(%esp), %eax
              cmpl    $41, (%eax)
      	movl	$4294967294, %ecx
      	movl	$4294967295, %eax
      	cmova	%ecx, %eax
      	ret
      
      which is smaller in code size and faster. rdar://6668608
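      
      Roughly, the combine recognizes a select between two adjacent constants and folds it into arithmetic on the zero-extended condition bit (illustrative sketch only; the function name is made up):
      
      // cond ? -1 : -2 (0xFFFFFFFF : 0xFFFFFFFE) becomes zext(cond) | 0xFFFFFFFE,
      // i.e. the setg/movzbl/orl sequence above; cond ? 42 : 41 would similarly
      // become 41 + zext(cond).
      unsigned select_adjacent(const int *p) {
        unsigned cond = (*p > 41);   // setg + movzbl
        return cond | 0xFFFFFFFEu;   // -2 when cond == 0, -1 when cond == 1
      }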
      
      llvm-svn: 66868
  2. Mar 12, 2009
      Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add x, c))" · 4147f08e
      Chris Lattner authored
      related transformations out of target-specific dag combine into the
      ARM backend.  These were added by Evan in r37685 with no testcases
      and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
      
      Add some simple X86-specific (for now) DAG combines that turn things
      like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
      with the recently added cp constant select optimization, but is a
      very general xform.  For example, we now compile the second example
      in const-select.ll to:
      
      _test:
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              seta    %al
              movzbl  %al, %eax
              movl    4(%esp), %ecx
              movsbl  (%ecx,%eax,4), %eax
              ret
      
      instead of:
      
      _test:
              movl    4(%esp), %eax
              leal    4(%eax), %ecx
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              cmovbe  %eax, %ecx
              movsbl  (%ecx), %eax
              ret
      
      This passes multisource and dejagnu.
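      
      The new X86 combine corresponds roughly to this source-level rewrite (illustrative only; the function name is hypothetical):
      
      // A select between a power-of-two constant and zero becomes a shift of
      // the zero-extended condition: cond ? 8 : 0  ==>  zext(cond) << 3.
      unsigned select_pow2_or_zero(bool cond) {
        return (unsigned)cond << 3;   // 8 when cond is true, 0 otherwise
      }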
      
      llvm-svn: 66779
      On x86, if the only use of an i64 load is an i64 store, generate a pair of... · ef0b7cc2
      Evan Cheng authored
      On x86, if the only use of an i64 load is an i64 store, generate a double load and store pair instead.
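      
      The triggering pattern looks like the following at the source level (a sketch of the case being matched, not the backend change itself):
      
      // A 64-bit value that is only loaded and then stored can be copied with
      // one 8-byte FP/SSE load/store pair (e.g. movsd) instead of two 32-bit
      // integer load/store pairs on 32-bit x86.
      void copy_i64(const long long *src, long long *dst) {
        *dst = *src;
      }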
      
      llvm-svn: 66776
  3. Mar 07, 2009
      Arithmetic instructions don't set the EFLAGS OF and CF bits · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
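      
      A worked example of why the restriction matters (hypothetical illustration, not the pass itself):
      
      // "testl %eax, %eax" sets ZF and SF from the value and clears OF and CF,
      // while "subl" sets OF and CF from the subtraction itself.  COND_E and
      // COND_NE read only ZF, so they see the same result either way; a
      // condition such as COND_L (SF != OF) would read the sub's OF and can
      // disagree when the subtraction overflows.
      bool nonzero_difference(int a, int b) {
        int r = a - b;
        return r != 0;   // COND_NE: safe to drop the testl and reuse the sub's ZF
      }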
      
      llvm-svn: 66318
  4. Feb 27, 2009
      Refactor TLS code and add some tests. The tests and expected results are: · 000421ea
      Rafael Espindola authored
       pic |  declaration | linkage  | visibility | tests                   | model
      
      !pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
       pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
      !pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
       pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
      
      !pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
       pic |  declaration | external | hidden     | X                       | local dynamic
      !pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
       pic | !declaration | external | hidden     | X                       | local dynamic
      
      !pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
       pic |  declaration | internal | default    | X                       | local dynamic
      
      The cases marked with an X have no tests yet, since the local dynamic model is not implemented.
      
      llvm-svn: 65632
  5. Feb 23, 2009
      Only v1i16 (i.e. _m64) is returned via RAX / RDX. · 9f8fddee
      Evan Cheng authored
      llvm-svn: 65313
      Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids the stack).
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it takes fewer uops.
      Document the shuffle matching logic and add some FIXMEs for further cleanup later.
      Add new tests covering the above.
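      
      At the intrinsic level, the pshufb path corresponds roughly to the following (illustrative sketch with a made-up control mask, not the backend code); the generated-code examples follow below.
      
      #include <tmmintrin.h>   // SSSE3
      
      // One pshufb with a constant control mask performs an arbitrary v16i8
      // shuffle: byte i of the result is byte mask[i] of the input.
      __m128i shuffle_bytes(__m128i v) {
        const __m128i mask = _mm_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4,
                                           11, 10, 9, 8, 15, 14, 13, 12);
        return _mm_shuffle_epi8(v, mask);   // single pshufb, no stack traffic
      }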
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      ========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
      Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR · 9d31aca6
      Scott Michel authored
      instruction. The class also consolidates the code for detecting constant
      splats that is shared by the PowerPC and CellSPU backends (and might be
      useful for other backends). Also introduces SelectionDAG::getBUILD_VECTOR() for
      generating new BUILD_VECTOR nodes.
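      
      A minimal sketch of the kind of splat check being consolidated (hypothetical code, not the actual BuildVectorSDNode interface):
      
      #include <cstdint>
      #include <vector>
      
      // A BUILD_VECTOR is a constant splat when every element is the same
      // constant; backends can then emit a single splatted immediate or load.
      bool isConstantSplat(const std::vector<uint64_t> &Elts, uint64_t &SplatVal) {
        if (Elts.empty())
          return false;
        for (uint64_t V : Elts)
          if (V != Elts.front())
            return false;      // elements differ: not a splat
        SplatVal = Elts.front();
        return true;
      }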
      
      llvm-svn: 65296