  1. Apr 08, 2009
    • Re-apply 68552. · 3b2df10c
      Rafael Espindola authored
      Tested by bootstrapping llvm-gcc and using that to build llvm.
      
      llvm-svn: 68645
    • Implement support for modeling implicit-zero-extension on x86-64 · ad3e549a
      Dan Gohman authored
      with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
      SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
      instructions), and teach the DAGCombiner to take advantage of this on
      targets which support it. This eliminates many redundant
      zero-extension operations on x86-64.
      
      This adds a new TargetLowering hook, isZExtFree. It's similar to
      isTruncateFree, except it only applies to actual definitions, and not
      no-op truncates which may not zero the high bits.
      
      Also, this adds a new optimization to SimplifyDemandedBits: transform
      operations like x+y into (zext (add (trunc x), (trunc y))) on targets
      where all the casts are no-ops. In contexts where the high part of the
      add is explicitly masked off, this allows the mask operation to be
      eliminated. Fix the DAGCombiner to avoid undoing these transformations
      to eliminate casts on targets where the casts are no-ops.
      
      Also, this adds a new two-address lowering heuristic. Since
      two-address lowering runs before coalescing, it helps to be able to
      look through copies when deciding whether commuting and/or
      three-address conversion are profitable.
      
      Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
      the case that a clobber range extended both before and beyond an
      existing live range. In that case, multiple live ranges need to be
      added. This was exposed by the new subreg coalescing code.
      
      Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
      spiller behavior it was looking for no longer occurs with the new
      instruction selection.
      
      llvm-svn: 68576
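      
      As a rough illustration of the new hook's shape, a minimal mock with
      stand-in types (not LLVM's actual declarations; the x86-64 rule below
      follows the commit's description):
      
      	// Stand-in for LLVM's MVT value-type enum.
      	enum MVT { i8, i16, i32, i64 };
      
      	struct TargetLowering {
      	  // Default: assume a zero-extension costs an instruction.
      	  virtual bool isZExtFree(MVT VT1, MVT VT2) const { return false; }
      	  virtual ~TargetLowering() = default;
      	};
      
      	struct X86TargetLowering : TargetLowering {
      	  // On x86-64, writing a 32-bit register implicitly zeroes the
      	  // upper 32 bits, so i32 -> i64 zero-extension is free.
      	  bool isZExtFree(MVT VT1, MVT VT2) const override {
      	    return VT1 == i32 && VT2 == i64;
      	  }
      	};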
    • Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79
      Bill Wendling authored
      builds.
      
      --- Reverse-merging (from foreign repository) r68552 into '.':
      U    test/CodeGen/X86/tls8.ll
      U    test/CodeGen/X86/tls10.ll
      U    test/CodeGen/X86/tls2.ll
      U    test/CodeGen/X86/tls6.ll
      U    lib/Target/X86/X86Instr64bit.td
      U    lib/Target/X86/X86InstrSSE.td
      U    lib/Target/X86/X86InstrInfo.td
      U    lib/Target/X86/X86RegisterInfo.cpp
      U    lib/Target/X86/X86ISelLowering.cpp
      U    lib/Target/X86/X86CodeEmitter.cpp
      U    lib/Target/X86/X86FastISel.cpp
      U    lib/Target/X86/X86InstrInfo.h
      U    lib/Target/X86/X86ISelDAGToDAG.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
      U    lib/Target/X86/X86ISelLowering.h
      U    lib/Target/X86/X86InstrInfo.cpp
      U    lib/Target/X86/X86InstrBuilder.h
      U    lib/Target/X86/X86RegisterInfo.td
      
      llvm-svn: 68560
  2. Mar 07, 2009
    • Arithmetic instructions don't set the EFLAGS OF and CF bits · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
      
      llvm-svn: 66318
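      
      A minimal C++ sketch of the pattern the optimization targets (my
      illustration, not code from the commit):
      
      	// The ADD already sets ZF, so a separate TEST against zero is
      	// redundant for equality branches (COND_E / COND_NE). OF and CF
      	// from the ADD would not match what TEST produces, hence the
      	// restriction described above.
      	int f(int a, int b) {
      	  int s = a + b;
      	  return (s == 0) ? 1 : 0;  // setcc/branch can reuse the add's ZF
      	}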
  3. Feb 23, 2009
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3, where it takes fewer uops.
      Document the shuffle matching logic and add some FIXMEs for later cleanups.
      New tests that test the above.
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      =========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
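      
      For reference, pshufb's semantics via its intrinsic: each mask byte
      selects one source byte, and a set high bit zeroes that lane. The
      mask below is an illustrative v16i8 shuffle, not one from the tests:
      
      	#include <tmmintrin.h>  // SSSE3
      
      	__m128i shuffle_bytes(__m128i v) {
      	  // Gather the even-indexed bytes, then the odd-indexed bytes.
      	  const __m128i mask = _mm_setr_epi8(0, 2, 4, 6, 8, 10, 12, 14,
      	                                     1, 3, 5, 7, 9, 11, 13, 15);
      	  return _mm_shuffle_epi8(v, mask);
      	}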
  4. Jan 17, 2009
    • Implement a special algorithm for converting uint_to_fp for i32 values on · 4d527590
      Bill Wendling authored
      X86. This code:
      
      #include <stdint.h>
      
      float f(uint32_t x) {
        float y = (float)x;
        return y;
      }
      
      used to be:
      
           movl     %eax, -8(%ebp)
           movl     [2^52 double], -4(%ebp)
           movsd    -8(%ebp), %xmm0
           subsd    [2^52 double], %xmm0
           cvtsd2ss %xmm0, %xmm0
      
      Is now:
      
         movsd        [2^52 double], %xmm0
         movsd        %xmm0, %xmm1
         movd         %ecx, %xmm2
         orps         %xmm2, %xmm1
         subsd        %xmm0, %xmm1
         cvtsd2ss     %xmm1, %xmm0
      
      This is faster on X86. Note that there's an extra copy of %xmm0 into
      %xmm1; that will be addressed by a later coalescer fix.
      
      llvm-svn: 62404
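      
      The trick works because OR-ing a 32-bit x into the mantissa of the
      double 2^52 yields exactly 2^52 + x, so one subtraction recovers x.
      A scalar sketch of the idea (illustrative; not the emitted sequence):
      
      	#include <cstdint>
      	#include <cstring>
      
      	float u32_to_float(uint32_t x) {
      	  const double two52 = 4503599627370496.0;  // 2^52
      	  uint64_t bits;
      	  std::memcpy(&bits, &two52, sizeof bits);
      	  bits |= x;                  // splice x into the low mantissa bits
      	  double d;
      	  std::memcpy(&d, &bits, sizeof d);
      	  return (float)(d - two52);  // d == 2^52 + x exactly
      	}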
  5. Jan 13, 2009
    • Use DebugInfo interface to lower dbg_* intrinsics. · 5c6e1e3b
      Devang Patel authored
      
      llvm-svn: 62127
  6. Jan 01, 2009
    • Fix PR3274: when promoting the condition of a BRCOND node, · 8feb694e
      Duncan Sands authored
      promote from i1 all the way up to the canonical SetCC type.
      In order to discover an appropriate type to use, pass
      MVT::Other to getSetCCResultType.  In order to be able to
      do this, change getSetCCResultType to take a type as an
      argument, not a value (this is also more logical).
      
      llvm-svn: 61542
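      
      A rough mock of the new shape (stand-in types; the exact spelling in
      the tree may differ):
      
      	// Stand-in for LLVM's MVT value-type enum.
      	enum MVT { i1, i8, i32, Other };
      
      	struct TargetLowering {
      	  // Previously the hook took a sample value:
      	  //   virtual MVT getSetCCResultType(const SDValue &) const;
      	  // Now the argument is a type, and passing MVT::Other asks for
      	  // the target's canonical SetCC result type.
      	  virtual MVT getSetCCResultType(MVT VT) const { return i1; }
      	  virtual ~TargetLowering() = default;
      	};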
  7. Dec 01, 2008
    • Change the interface to the type legalization method · 6ed40141
      Duncan Sands authored
      ReplaceNodeResults: rather than returning a node which
      must have the same number of results as the original
      node (which means mucking around with MERGE_VALUES,
      and which is also easy to get wrong since SelectionDAG
      folding may mean you don't get the node you expect),
      return the results in a vector.
      
      llvm-svn: 60348
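      
      A sketch of the new shape with stand-in types (LLVM passes results in
      a SmallVectorImpl<SDValue>&; std::vector stands in here):
      
      	#include <vector>
      
      	struct SDNode {};
      	struct SDValue {};
      	struct SelectionDAG {};
      
      	struct TargetLowering {
      	  // Append one SDValue per result of N to Results, instead of
      	  // returning a single replacement node via MERGE_VALUES.
      	  virtual void ReplaceNodeResults(SDNode *N,
      	                                  std::vector<SDValue> &Results,
      	                                  SelectionDAG &DAG) {}
      	  virtual ~TargetLowering() = default;
      	};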
  8. Oct 21, 2008
    • Add an SSE2 algorithm for uint64->f64 conversion. · 28929589
      Dale Johannesen authored
      The same one Apple gcc uses; it is faster. It also gets the
      extreme case in gcc.c-torture/execute/ieee/rbug.c correct,
      which we didn't get right before. That alone is not sufficient
      to make the test pass, though; there is another bug.
      
      llvm-svn: 57926
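      
      The widely used formulation of this trick (whether the committed code
      matches it exactly is an assumption) splices the two 32-bit halves of
      the input into the mantissas of 2^52 and 2^84, then cancels both
      biases; a scalar sketch:
      
      	#include <cstdint>
      	#include <cstring>
      
      	double u64_to_double(uint64_t x) {
      	  const double two52 = 4503599627370496.0;            // 2^52
      	  const double two84 = 19342813113834066795298816.0;  // 2^84
      	  // 2^52 with lo32 in its mantissa; 2^84 with hi32 in its mantissa.
      	  uint64_t lo = 0x4330000000000000ULL | (x & 0xffffffffULL);
      	  uint64_t hi = 0x4530000000000000ULL | (x >> 32);
      	  double dlo, dhi;
      	  std::memcpy(&dlo, &lo, sizeof dlo);
      	  std::memcpy(&dhi, &hi, sizeof dhi);
      	  // dlo == 2^52 + lo32 and dhi == 2^84 + hi32*2^32, both exact.
      	  return (dhi - (two84 + two52)) + dlo;
      	}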
  9. Oct 18, 2008
    • Teach DAGCombine to fold constant offsets into GlobalAddress nodes, · 2fe6bee5
      Dan Gohman authored
      and add a TargetLowering hook for it to use to determine when this
      is legal (i.e. not in PIC mode, etc.)
      
      This allows instruction selection to emit folded constant offsets
      in more cases, such as the included testcase, eliminating the need
      for explicit arithmetic instructions.
      
      This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
      that attempted to achieve the same effect, but wasn't as effective.
      
      Also, fix handling of offsets in GlobalAddressSDNodes in several
      places, including changing GlobalAddressSDNode's offset from
      int to int64_t.
      
      The Mips, Alpha, Sparc, and CellSPU targets appear to be
      unaware of GlobalAddress offsets currently, so set the hook to
      false on those targets.
      
      llvm-svn: 57748
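      
      In later trees this hook is spelled isOffsetFoldingLegal (treat the
      name as an assumption here); a minimal mock of its shape:
      
      	struct GlobalAddressSDNode;  // opaque stand-in
      
      	struct TargetLowering {
      	  // Return true when a global plus a constant offset can be
      	  // encoded as a single address operand; PIC-style indirection
      	  // generally cannot fold the offset, so such targets say false.
      	  virtual bool isOffsetFoldingLegal(const GlobalAddressSDNode *) const {
      	    return false;
      	  }
      	  virtual ~TargetLowering() = default;
      	};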