Commits · a9cda8abf28bc8f684348cc312ffbc3cd8354906 · Roger Ferrer / llvm-epi-0.8

May 28, 2009

Added optimization that narrow load / op / store and the 'op' is a bit... · a9cda8ab

Evan Cheng authored May 28, 2009

Added an optimization that narrows a load / op / store sequence when the 'op' is a bit-twiddling instruction and its second operand is an immediate. If the bits touched by 'op' can be handled with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code.
e.g.
orl     $65536, 8(%rax)
=>
orb     $1, 10(%rax)

Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, the DAG combiner consults the target before performing the optimization.
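The narrowing in the example above can be sketched in a few lines. This is an illustrative model only, not the DAG combiner code: it assumes a little-endian target, handles only OR with an immediate, and the function names `narrow_or_immediate` and `or32` are hypothetical.

```python
def narrow_or_immediate(imm, offset, wide_bytes=4):
    """Return (narrow_imm, narrow_offset) if every set bit of imm lies in a
    single byte of the wide value, otherwise None.

    Assumes little-endian layout: byte i of the wide value is at offset + i.
    """
    if imm == 0:
        return None
    for i in range(wide_bytes):
        byte_mask = 0xFF << (8 * i)
        if (imm & ~byte_mask) == 0:
            return (imm >> (8 * i), offset + i)
    return None


def or32(mem, offset, imm):
    """Model of `orl $imm, offset(base)` on a little-endian byte buffer."""
    value = int.from_bytes(mem[offset:offset + 4], "little") | imm
    mem[offset:offset + 4] = value.to_bytes(4, "little")


# orl $65536, 8(%rax) touches only bit 0 of the byte at offset 10.
print(narrow_or_immediate(65536, 8))
```

An immediate such as 0x101 spans two bytes, so the sketch returns None for it and the wide operation would be kept.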

llvm-svn: 72507

a9cda8ab

May 27, 2009
- Get rid of some dead code. · a56159b7
  Eli Friedman authored May 27, 2009
```
llvm-svn: 72494
```
  a56159b7
- Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and · acb851a8
  Eli Friedman authored May 27, 2009
```
FP_TO_XINT.  Necessary for some cleanups I'm working on.  Updated 
from the previous version (r72431) to fix a bug and make some things a 
bit clearer.

llvm-svn: 72445
```
  acb851a8
May 26, 2009
- Back out r72431, it is causing a number of compilation crashes with clang. · d96b1178
  Daniel Dunbar authored May 26, 2009
```
llvm-svn: 72436
```
  d96b1178
- Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and · 8c7bff96
  Eli Friedman authored May 26, 2009
```
FP_TO_XINT.  Necessary for some cleanups I'm working on. 

llvm-svn: 72431
```
  8c7bff96
May 24, 2009
- Make the X86 backend mark EXTRACT_SUBVECTOR as Expand, at least for the · 2199ed39
  Eli Friedman authored May 23, 2009
```
moment.

llvm-svn: 72350
```
  2199ed39
May 23, 2009

Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit · dfe4f253

Eli Friedman authored May 23, 2009

systems instead of attempting to promote them to a 64-bit SINT_TO_FP or 
FP_TO_SINT.  This is in preparation for removing the type legalization 
code from LegalizeDAG: once type legalization is gone from LegalizeDAG, 
it won't be able to handle the i64 operand/result correctly.

This isn't quite ideal, but I don't think any other operation for any 
target ends up in this situation, so treating this case specially seems 
reasonable.

llvm-svn: 72324

dfe4f253

May 13, 2009
- Run code placement optimization for targets that want it (arm and x86 for now). · ab0d2339
  Evan Cheng authored May 13, 2009
```
llvm-svn: 71726
```
  ab0d2339
May 08, 2009
- Fix PR4152: asm constraint validation happens before dag combine, so we · f1d9b914
  Chris Lattner authored May 08, 2009
```
need to work a bit to combine things like (x+c1+c2) into x+c3.

llvm-svn: 71232
```
  f1d9b914
Apr 30, 2009
- Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. · 7e6e3527
  Nate Begeman authored Apr 29, 2009
```
llvm-svn: 70425
```
  7e6e3527
Apr 29, 2009
- Implement review feedback for vector shuffle work. · 5f829d89
  Nate Begeman authored Apr 29, 2009
```
llvm-svn: 70372
```
  5f829d89
Apr 27, 2009

2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. · 8d6d4b92

Nate Begeman authored Apr 27, 2009

PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
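As a rough illustration of the integer-mask representation, the sketch below models a two-operand shuffle in plain Python. It assumes the usual convention that indices 0..n-1 select from the first operand and n..2n-1 from the second. `apply_shuffle` and `commute_mask` are hypothetical helper names, not LLVM APIs.

```python
UNDEF = -1  # mask value that marks an undefined result element


def apply_shuffle(v1, v2, mask):
    """Apply an integer shuffle mask to two equal-length vectors.

    Undefined elements are modeled as None.
    """
    n = len(v1)
    assert len(v2) == n
    concat = list(v1) + list(v2)
    return [None if m == UNDEF else concat[m] for m in mask]


def commute_mask(mask, n):
    """Rewrite a mask so it gives the same result with the operands swapped."""
    return [m if m == UNDEF else (m + n if m < n else m - n) for m in mask]


v1 = [10, 11, 12, 13]
v2 = [20, 21, 22, 23]
mask = [0, 4, UNDEF, 7]
print(apply_shuffle(v1, v2, mask))
print(apply_shuffle(v2, v1, commute_mask(mask, 4)))
```

Because the mask is a plain array of integers, rewrites such as commuting the operands are simple index arithmetic, which is one way to picture the canonicalization the message refers to.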

llvm-svn: 70225

8d6d4b92

Apr 24, 2009

Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not · c1396a23
Rafael Espindola authored Apr 24, 2009
```
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
```
c1396a23
Revert 69952. Causes testsuite failures on linux x86-64. · b93db668
Rafael Espindola authored Apr 24, 2009
```
llvm-svn: 69967
```
b93db668

PR2957 · bb881d66

Nate Begeman authored Apr 24, 2009

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952

bb881d66

Apr 21, 2009
- Get rid of what looks like a copy-and-pasted typo. · 7ce5cc6b
  Duncan Sands authored Apr 21, 2009
```
Spotted by gcc-4.5.

llvm-svn: 69673
```
  7ce5cc6b
Apr 20, 2009

Move duplicated AddLiveIn function from X86 and ARM backends to be a method · f8b85477

Bob Wilson authored Apr 20, 2009

in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock.  Thanks to Anton for suggesting this.

llvm-svn: 69615

f8b85477

Apr 17, 2009

For general dynamic TLS access we must use · 355fe12c

Rafael Espindola authored Apr 17, 2009

leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350

355fe12c

Apr 13, 2009
- X86-64 TLS support for local exec and initial exec. · 6d6c6043
  Rafael Espindola authored Apr 13, 2009
```
llvm-svn: 68947
```
  6d6c6043
Apr 10, 2009
- Remove the obsolete SelectionDAG::getNodeValueTypes and simplify · de912e24
  Dan Gohman authored Apr 09, 2009
```
code that uses it by using SelectionDAG::getVTList instead.

llvm-svn: 68744
```
  de912e24
Apr 09, 2009
- Fix grammaros in comments. · f1545486
  Dan Gohman authored Apr 09, 2009
```
llvm-svn: 68666
```
  f1545486
Apr 08, 2009

Re-apply 68552. · 3b2df10c

Rafael Espindola authored Apr 08, 2009

Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645

3b2df10c

Avoid a hard coded constant. · d173f423
Rafael Espindola authored Apr 08, 2009
```
llvm-svn: 68603
```
d173f423

Implement support for modeling implicit zero-extension on x86-64 · ad3e549a

Dan Gohman authored Apr 08, 2009

with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.

This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.

Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
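The arithmetic behind the SimplifyDemandedBits change can be checked with a small model. This sketch assumes a 64-bit add whose upper 32 bits are masked off. The helper names are hypothetical and only mimic the trunc / add / zext node semantics on Python integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64(x, y):
    return (x + y) & MASK64


def add32(x, y):
    return (x + y) & MASK32


def trunc32(x):
    return x & MASK32


def zext32to64(x):
    # Zero-extension keeps the value; the upper 32 bits are zero.
    return x & MASK32


def masked_wide(x, y):
    """(x + y) & 0xFFFFFFFF computed with a 64-bit add and an explicit mask."""
    return add64(x, y) & MASK32


def masked_narrow(x, y):
    """zext(add(trunc x, trunc y)): no explicit mask is needed."""
    return zext32to64(add32(trunc32(x), trunc32(y)))


x, y = 0x1_FFFF_FFFF, 0x2_0000_0005
print(hex(masked_wide(x, y)), hex(masked_narrow(x, y)))
```

Both forms agree because the low 32 bits of a sum depend only on the low 32 bits of its operands, so the narrow form makes the explicit mask redundant.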

Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.

Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
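The case described here, a clobber range that starts before and ends after an existing live range, can be pictured with half-open intervals. The sketch below is not the LiveInterval code. It is a hypothetical helper, `uncovered_pieces`, that assumes the existing ranges are sorted and disjoint and returns the pieces of the clobber range that would still need to be added.

```python
def uncovered_pieces(clobber, live_ranges):
    """Return the parts of the half-open range `clobber` that are not covered
    by `live_ranges` (assumed sorted and pairwise disjoint)."""
    start, end = clobber
    pieces = []
    for lo, hi in live_ranges:
        if hi <= start:
            continue  # existing range lies entirely before the clobber
        if lo >= end:
            break  # existing range lies entirely after the clobber
        if lo > start:
            pieces.append((start, lo))  # gap in front of the existing range
        start = max(start, hi)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))  # tail after the last overlapping range
    return pieces


# The clobber extends on both sides of the live range [3, 5),
# so two separate pieces must be added.
print(uncovered_pieces((0, 10), [(3, 5)]))
```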

Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurs with the new
instruction selection.

llvm-svn: 68576

ad3e549a

Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79

Bill Wendling authored Apr 07, 2009

builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560

4aa25b79

Apr 07, 2009

Reduce code duplication on the TLS implementation. · 1edda067

Rafael Espindola authored Apr 07, 2009

This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552

1edda067

Apr 03, 2009
- Added an x86 dag combine to increase the chances of using a · 9c186c5d
  Mon P Wang authored Apr 03, 2009
```
movq for v2i64 on x86-32.

llvm-svn: 68368
```
  9c186c5d
Apr 02, 2009
- silence warning in release-asserts build. · d2eb0a63
  Chris Lattner authored Apr 01, 2009
```
llvm-svn: 68253
```
  d2eb0a63
Mar 31, 2009
- i128 shift libcalls are not available on x86. · d9d6e427
  Evan Cheng authored Mar 31, 2009
```
llvm-svn: 68133
```
  d9d6e427
Mar 30, 2009

When optimizing a mul by immediate into two, the resulting mul's should get a... · a84a3188

Evan Cheng authored Mar 30, 2009

When optimizing a mul by immediate into two, the resulting muls should get an x86-specific node to prevent the DAG combiner from hacking on them further.

llvm-svn: 68066

a84a3188

Mar 28, 2009

Have only one definition of X86AddrNumOperands. · 6ff3dabb
Rafael Espindola authored Mar 28, 2009
```
llvm-svn: 67949
```
6ff3dabb

Optimize some 64-bit multiplication by constants into two lea's or one lea +... · fd81c73c

Evan Cheng authored Mar 28, 2009

Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g.
x * 40
=>
shlq    $3, %rdi
leaq    (%rdi,%rdi,4), %rax

This has the added benefit of allowing more multiplies to be folded into addressing modes. e.g.
a * 24 + b
=>
leaq    (%rdi,%rdi,2), %rax
leaq    (%rsi,%rax,8), %rax
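One way to picture the decomposition is as a search for a factor that a single lea can apply (3, 5 or 9, i.e. x + x*2, x + x*4, x + x*8) with the remainder being either another such factor or a power of two. The sketch below is a simplified model under that assumption, not the X86 lowering code. `decompose` and `evaluate` are hypothetical names.

```python
LEA_FACTORS = (9, 5, 3)  # multipliers a single lea can apply: x + x*8, x + x*4, x + x*2


def decompose(c):
    """Split constant c into (lea_factor, op, amount), or return None.

    op is "lea" when the remaining factor is itself 3, 5 or 9, and "shl"
    when it is a power of two (amount is then the shift count).
    """
    for f in LEA_FACTORS:
        if c % f == 0:
            rest = c // f
            if rest in LEA_FACTORS:
                return (f, "lea", rest)
            if rest > 1 and (rest & (rest - 1)) == 0:
                return (f, "shl", rest.bit_length() - 1)
    return None


def evaluate(x, plan):
    """Compute x * c by following a plan returned by decompose."""
    f, op, amount = plan
    x = x * f  # one lea
    return x << amount if op == "shl" else x * amount


print(decompose(40))  # 5 * 2**3, matching the shlq $3 / leaq (%rdi,%rdi,4) pair
print(decompose(24))  # 3 * 2**3; in a * 24 + b the * 8 folds into the address scale
```

The order of the two steps does not change the product, which is why the second example can apply the multiply by 3 first and leave the scale of 8 to the addressing mode of the final lea.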

llvm-svn: 67917

fd81c73c

Mar 27, 2009
- I am trying to add a segment to the X86 addresses matching to · e7280193
  Rafael Espindola authored Mar 27, 2009
```
improve TLS support (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090309/075220.html), but that code is VERY brittle.

This patch just makes it a bit more resistant.

llvm-svn: 67843
```
  e7280193
- -no-implicit-float means explicit fp operations are legal. · d88ebc35
  Evan Cheng authored Mar 26, 2009
```
llvm-svn: 67784
```
  d88ebc35
Mar 26, 2009

Pull transform from target-dependent code into target-independent code. · aa28be65
Bill Wendling authored Mar 26, 2009
```
llvm-svn: 67742
```
aa28be65

Match this pattern so that we can generate simpler code: · 94f299f2

Bill Wendling authored Mar 26, 2009

  %a = ...
  %b = and i32 %a, 2
  %c = srl i32 %b, 1
  %d = br i32 %c, 

into

  %a = ...
  %b = and %a, 2
  %c = X86ISD::CMP %b, 0
  %d = X86ISD::BRCOND %c ...

This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.
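The stated precondition is easy to check in isolation. The sketch below models it on plain integers, with hypothetical function names. It tests that the AND constant has exactly one bit set and that the shift amount equals its log2, then compares the branch condition before and after the rewrite.

```python
def matches_bit_test(and_const, srl_const):
    """True when and_const has exactly one bit set and srl_const == log2(and_const)."""
    single_bit = and_const > 0 and (and_const & (and_const - 1)) == 0
    return single_bit and srl_const == and_const.bit_length() - 1


def branch_original(a, and_const, srl_const):
    """Branch taken when ((a & and_const) >> srl_const) is non-zero."""
    return ((a & and_const) >> srl_const) != 0


def branch_rewritten(a, and_const):
    """Branch taken when (a & and_const) compared against 0 is non-zero."""
    return (a & and_const) != 0


print(matches_bit_test(2, 1))  # the constants from the pattern above
print(all(branch_original(a, 2, 1) == branch_rewritten(a, 2) for a in range(16)))
```

When the precondition holds, the shift only moves the single surviving bit to position 0, so dropping it cannot change whether the value is zero.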

llvm-svn: 67728

94f299f2

Mar 13, 2009

These instructions have special lowering that may lower them to SSE · 798fd56d
Bill Wendling authored Mar 13, 2009
```
instructions. Prevent that if we don't want implicit uses of SSE.

llvm-svn: 66877
```
798fd56d

Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd

Evan Cheng authored Mar 13, 2009

Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues.

1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. The MachineConstantPool alignment field is also a log2 value.
3. However, some places create ConstantPoolSDNodes with an alignment value rather than a log2 value. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups entries by section, which means the offsets are no longer valid. The asm printer nevertheless uses them to determine the size of the padding between entries.
5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when it is emitting constant pool entries. This is non-deterministic.

Solutions:
1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It is replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in the asm printer and the JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done.
7. The JIT code is changed to compute entry offsets on the fly.
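To illustrate why offsets have to be computed after grouping, the sketch below lays out entries per section from their sizes and byte alignments. It is a simplified model, not the asm printer or JIT code. The helper names are hypothetical and the `.literal16` section name is used only as an example alongside `.literal4`.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment (in bytes, not log2)."""
    return (offset + alignment - 1) // alignment * alignment


def layout_by_section(entries):
    """entries: list of (section, size, alignment) in creation order.

    Returns {section: [(entry_index, offset), ...]} with offsets computed
    only after the entries have been grouped by section.
    """
    groups = {}  # dicts keep insertion order, so the result is deterministic
    for index, (section, size, alignment) in enumerate(entries):
        groups.setdefault(section, []).append((index, size, alignment))
    result = {}
    for section, members in groups.items():
        offset = 0
        placed = []
        for index, size, alignment in members:
            offset = align_to(offset, alignment)  # padding only where needed
            placed.append((index, offset))
            offset += size
        result[section] = placed
    return result


entries = [(".literal4", 4, 4), (".literal16", 16, 16), (".literal4", 4, 4)]
print(layout_by_section(entries))
```

If the offsets were fixed at creation time, the second 4-byte entry would be placed after the 16-byte entry. After grouping, it sits directly behind the first one with no padding.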

llvm-svn: 66875

1fb8aedd

generalize the previous code to use the full generality of LEA · 99cc1337

Chris Lattner authored Mar 13, 2009

for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret

llvm-svn: 66869

99cc1337

optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d

Chris Lattner authored Mar 13, 2009

example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608
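The idea for adjacent constants can be modeled as adding the zero-extended condition bit to the smaller constant. When that constant is even, the add sets no bit that is already set, so it can equally be written as an or, which is how this sketch reads the `orl $4294967294` in the listing. The function names are hypothetical and the arithmetic is done modulo 2^32.

```python
MASK32 = 0xFFFFFFFF


def select_adjacent_add(cond, false_val):
    """cond ? false_val + 1 : false_val, via setcc + movzbl + add (32-bit)."""
    bit = 1 if cond else 0
    return (false_val + bit) & MASK32


def select_adjacent_or(cond, false_val):
    """Same select using or; only valid when bit 0 of false_val is clear."""
    assert false_val & 1 == 0
    bit = 1 if cond else 0
    return (false_val | bit) & MASK32


print(select_adjacent_add(True, 41), select_adjacent_add(False, 41))
print(select_adjacent_or(True, 4294967294), select_adjacent_or(False, 4294967294))
```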

llvm-svn: 66868

4be6df5d