Commits · 8673657275819f80a34fec9afe112d061ee6e8e3 · Roger Ferrer / llvm-epi-0.8

May 13, 2009
- Run code placement optimization for targets that want it (arm and x86 for now). · ab0d2339
  Evan Cheng authored May 13, 2009
```
llvm-svn: 71726
```
May 08, 2009
- Fix PR4152: asm constraint validation happens before dag combine, so we · f1d9b914
  Chris Lattner authored May 08, 2009
```
need to work a bit to combine things like (x+c1+c2) into x+c3.

llvm-svn: 71232
```
Apr 30, 2009
- Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. · 7e6e3527
  Nate Begeman authored Apr 29, 2009
```
llvm-svn: 70425
```
Apr 29, 2009
- Implement review feedback for vector shuffle work. · 5f829d89
  Nate Begeman authored Apr 29, 2009
```
llvm-svn: 70372
```
Apr 27, 2009

2nd attempt, fixing SSE4.1 issues and implementing feedback from Duncan. · 8d6d4b92

Nate Begeman authored Apr 27, 2009

PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
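
As a rough illustration of the mask representation described above (not LLVM's actual data structure), a two-input shuffle can be modeled as a list of integer indices into the concatenation of the inputs, with -1 marking an undefined lane. The name `apply_shuffle` and the use of `None` for an undefined result lane are assumptions made for this sketch.

```python
def apply_shuffle(v1, v2, mask):
    """Model a two-input vector shuffle driven by an integer mask.

    Indices 0..n-1 select from v1, n..2n-1 select from v2, and -1
    marks a lane whose value is undefined (modeled here as None).
    This is a hypothetical helper, not an LLVM API.
    """
    n = len(v1)
    assert len(v2) == n and len(mask) == n
    combined = list(v1) + list(v2)
    result = []
    for idx in mask:
        if idx == -1:
            result.append(None)  # UNDEF lane: any value would be acceptable
        else:
            assert 0 <= idx < 2 * n, "mask index out of range"
            result.append(combined[idx])
    return result
```

For example, `apply_shuffle([1, 2, 3, 4], [5, 6, 7, 8], [0, 4, -1, 7])` picks lane 0 of the first input, lanes 0 and 3 of the second, and leaves the third result lane undefined.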

llvm-svn: 70225

Apr 24, 2009

Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not · c1396a23
Rafael Espindola authored Apr 24, 2009
```
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
```
Revert 69952. Causes testsuite failures on linux x86-64. · b93db668
Rafael Espindola authored Apr 24, 2009
```
llvm-svn: 69967
```

PR2957 · bb881d66

Nate Begeman authored Apr 24, 2009

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952

Apr 21, 2009
- Get rid of what looks like a copy-and-pasted typo. · 7ce5cc6b
  Duncan Sands authored Apr 21, 2009
```
Spotted by gcc-4.5.

llvm-svn: 69673
```
Apr 20, 2009

Move duplicated AddLiveIn function from X86 and ARM backends to be a method · f8b85477

Bob Wilson authored Apr 20, 2009

in the MachineFunction class, renaming it to addLiveIn for consistency with
the same method in MachineBasicBlock.  Thanks to Anton for suggesting this.

llvm-svn: 69615

Apr 17, 2009

For general dynamic TLS access we must use · 355fe12c

Rafael Espindola authored Apr 17, 2009

leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350

Apr 13, 2009
- X86-64 TLS support for local exec and initial exec. · 6d6c6043
  Rafael Espindola authored Apr 13, 2009
```
llvm-svn: 68947
```
Apr 10, 2009
- Remove the obsolete SelectionDAG::getNodeValueTypes and simplify · de912e24
  Dan Gohman authored Apr 09, 2009
```
code that uses it by using SelectionDAG::getVTList instead.

llvm-svn: 68744
```
Apr 09, 2009
- Fix grammaros in comments. · f1545486
  Dan Gohman authored Apr 09, 2009
```
llvm-svn: 68666
```
Apr 08, 2009

Re-apply 68552. · 3b2df10c

Rafael Espindola authored Apr 08, 2009

Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645

Avoid a hard-coded constant. · d173f423
Rafael Espindola authored Apr 08, 2009
```
llvm-svn: 68603
```

Implement support for modeling implicit-zero-extension on x86-64 · ad3e549a

Dan Gohman authored Apr 08, 2009

with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
instructions), and teach the DAGCombiner to take advantage of this on
targets which support it. This eliminates many redundant
zero-extension operations on x86-64.

This adds a new TargetLowering hook, isZExtFree. It's similar to
isTruncateFree, except it only applies to actual definitions, and not
no-op truncates which may not zero the high bits.

Also, this adds a new optimization to SimplifyDemandedBits: transform
operations like x+y into (zext (add (trunc x), (trunc y))) on targets
where all the casts are no-ops. In contexts where the high part of the
add is explicitly masked off, this allows the mask operation to be
eliminated. Fix the DAGCombiner to avoid undoing these transformations
to eliminate casts on targets where the casts are no-ops.
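
The narrowing rewrite described above can be checked with a small arithmetic model. This is only a sketch under stated assumptions: 64-bit and 32-bit registers are modeled as masked Python integers, and the helper names (`trunc32`, `zext32_to_64`, `masked_add_wide`, `masked_add_narrow`) are made up for the illustration rather than taken from LLVM.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def trunc32(x):
    """Keep only the low 32 bits (a no-op cast on x86-64 sub-registers)."""
    return x & MASK32

def zext32_to_64(x):
    """Zero-extend a 32-bit value; models the implicit zeroing of the high half."""
    return x & MASK32

def masked_add_wide(x, y):
    """64-bit add followed by an explicit mask of the high 32 bits."""
    return ((x + y) & MASK64) & MASK32

def masked_add_narrow(x, y):
    """zext (add (trunc x), (trunc y)): the mask is no longer needed."""
    return zext32_to_64((trunc32(x) + trunc32(y)) & MASK32)
```

The two forms agree whenever the high part of the add is masked off. Without the mask they differ, for instance when the 32-bit add wraps around, which is why the rewrite is tied to contexts where the high bits are not demanded.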

Also, this adds a new two-address lowering heuristic. Since
two-address lowering runs before coalescing, it helps to be able to
look through copies when deciding whether commuting and/or
three-address conversion are profitable.

Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
the case that a clobber range extended both before and beyond an
existing live range. In that case, multiple live ranges need to be
added. This was exposed by the new subreg coalescing code.
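
The case that was mishandled can be pictured with plain intervals. The following is a minimal sketch, not the LiveInterval implementation: live ranges are modeled as sorted, disjoint, half-open `(start, end)` tuples, and the hypothetical helper `uncovered_parts` returns the pieces of a clobber range that still have to be added.

```python
def uncovered_parts(live_ranges, clobber):
    """Return the pieces of `clobber` not covered by any live range.

    `live_ranges` is a sorted list of disjoint half-open (start, end)
    intervals; `clobber` is a single half-open (start, end) interval.
    Hypothetical helper for illustration only.
    """
    start, end = clobber
    pieces = []
    for lo, hi in live_ranges:
        if hi <= start:
            continue                     # range lies entirely before the clobber
        if lo >= end:
            break                        # range lies entirely after the clobber
        if lo > start:
            pieces.append((start, lo))   # gap in front of this live range
        start = max(start, hi)           # resume after this live range
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))      # tail beyond the last overlapping range
    return pieces
```

With one existing range `(10, 20)` and a clobber of `(5, 25)`, the result is two pieces, `(5, 10)` and `(20, 25)`, which is the "multiple live ranges need to be added" situation from the message.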

Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
spiller behavior it was looking for no longer occurs with the new
instruction selection.

llvm-svn: 68576

Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79

Bill Wendling authored Apr 07, 2009

builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560

Apr 07, 2009

Reduce code duplication on the TLS implementation. · 1edda067

Rafael Espindola authored Apr 07, 2009

This introduces a small regression in generated code
quality in the case where we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552

Apr 03, 2009
- Added an x86 dag combine to increase the chances to use a · 9c186c5d
  Mon P Wang authored Apr 03, 2009
```
movq for v2i64 on x86-32.

llvm-svn: 68368
```
Apr 02, 2009
- silence warning in release-asserts build. · d2eb0a63
  Chris Lattner authored Apr 01, 2009
```
llvm-svn: 68253
```
Mar 31, 2009
- i128 shift libcalls are not available on x86. · d9d6e427
  Evan Cheng authored Mar 31, 2009
```
llvm-svn: 68133
```
Mar 30, 2009

When optimizing a mul by immediate into two, the resulting muls should get a... · a84a3188

Evan Cheng authored Mar 30, 2009

When optimizing a mul by immediate into two, the resulting muls should get an x86-specific node to prevent the dag combiner from hacking on them further.

llvm-svn: 68066

Mar 28, 2009

Have only one definition of X86AddrNumOperands. · 6ff3dabb
Rafael Espindola authored Mar 28, 2009
```
llvm-svn: 67949
```

Optimize some 64-bit multiplication by constants into two lea's or one lea +... · fd81c73c

Evan Cheng authored Mar 28, 2009

Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g.
x * 40
=>
shlq    $3, %rdi
leaq    (%rdi,%rdi,4), %rax

This has the added benefit of allowing more multiplies to be folded into addressing modes. e.g.
a * 24 + b
=>
leaq    (%rdi,%rdi,2), %rax
leaq    (%rsi,%rax,8), %rax
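
The two sequences above can be sanity-checked with a small model of the `lea` address computation (base + index * scale + displacement, with scale limited to 1, 2, 4, or 8). The function names `lea`, `shl`, `mul40`, and `mul24_add` are assumptions made for this sketch; registers are modeled as 64-bit masked integers.

```python
MASK64 = (1 << 64) - 1

def lea(base, index, scale, disp=0):
    """Model the x86 address computation base + index*scale + disp."""
    assert scale in (1, 2, 4, 8), "x86 addressing only supports these scales"
    return (base + index * scale + disp) & MASK64

def shl(x, n):
    """Model a 64-bit left shift."""
    return (x << n) & MASK64

def mul40(x):
    t = shl(x, 3)          # shlq $3, %rdi          -> 8 * x
    return lea(t, t, 4)    # leaq (%rdi,%rdi,4)     -> 8x + 4 * 8x = 40x

def mul24_add(a, b):
    t = lea(a, a, 2)       # leaq (%rdi,%rdi,2)     -> 3a
    return lea(b, t, 8)    # leaq (%rsi,%rax,8)     -> b + 8 * 3a = 24a + b
```

Both helpers agree with ordinary 64-bit multiplication, including wraparound, because every step is reduced modulo 2^64.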

llvm-svn: 67917

Mar 27, 2009
- I am trying to add a segment to the X86 addresses matching to · e7280193
  Rafael Espindola authored Mar 27, 2009
```
improve TLS support (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090309/075220.html), but that code is VERY brittle.

This patch just makes it a bit more resistant.

llvm-svn: 67843
```
- -no-implicit-float means explicit fp operations are legal. · d88ebc35
  Evan Cheng authored Mar 26, 2009
```
llvm-svn: 67784
```
Mar 26, 2009

Pull transform from target-dependent code into target-independent code. · aa28be65
Bill Wendling authored Mar 26, 2009
```
llvm-svn: 67742
```

Match this pattern so that we can generate simpler code: · 94f299f2

Bill Wendling authored Mar 26, 2009

  %a = ...
  %b = and i32 %a, 2
  %c = srl i32 %b, 1
  %d = br i32 %c, 

into

  %a = ...
  %b = and %a, 2
  %c = X86ISD::CMP %b, 0
  %d = X86ISD::BRCOND %c ...

This applies only when the AND constant value has one bit set and the SRL
constant is equal to the log2 of the AND constant. The back-end is smart enough
to convert the result into a TEST/JMP sequence.
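
The condition stated above is what makes the shift redundant: when the AND constant has a single bit set and the shift amount is its log2, the shifted value is nonzero exactly when the masked value is nonzero. A minimal sketch, assuming the branch is taken on a nonzero value and using hypothetical helper names:

```python
def branch_original(a, and_const, shift):
    """Branch condition of the original pattern: ((a & C) >> S) != 0."""
    return ((a & and_const) >> shift) != 0

def branch_rewritten(a, and_const):
    """Branch condition after the rewrite: compare (a & C) against 0."""
    return (a & and_const) != 0

def transform_applies(and_const, shift):
    """True when C has exactly one bit set and S == log2(C)."""
    is_single_bit = and_const > 0 and (and_const & (and_const - 1)) == 0
    return is_single_bit and shift == and_const.bit_length() - 1
```

For the constants in the example (`and_const = 2`, `shift = 1`) the precondition holds and both conditions agree for every input. With a constant such as 6, the precondition fails and the helper rejects the rewrite.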

llvm-svn: 67728

Mar 13, 2009

These instructions have special lowering that may lower them to SSE · 798fd56d
Bill Wendling authored Mar 13, 2009
```
instructions. Prevent that if we don't want implicit uses of SSE.

llvm-svn: 66877
```

Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd

Evan Cheng authored Mar 13, 2009

Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues.

1. ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places create ConstantPoolSDNode with an alignment value rather than a log2 value. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups them by section, which means the offsets are no longer valid. The asm printer nevertheless uses them to determine the size of padding between entries.
5. The asm printer uses an expensive data structure, multimap, to track constant pool entries by section.
6. The asm printer iterates over a SmallPtrSet when it emits constant pool entries. This is non-deterministic.

Solutions:
1. ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
2. MachineConstantPool alignment field is also changed to keep a non-log2 value.
3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It is replaced with an alignment field. Offsets are not computed when constant pool entries are created; they are computed on the fly in the asm printer and JIT.
5. The asm printer uses a cheaper data structure to group constant pool entries.
6. The asm printer computes entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
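
Computing offsets on the fly, after entries have been grouped into a section, amounts to a running offset that is rounded up to each entry's alignment. The sketch below assumes alignments are byte values (not log2) and powers of two; `align_to` and `layout_section` are hypothetical names, not functions from the asm printer or JIT.

```python
def align_to(offset, alignment):
    """Round offset up to the next multiple of alignment (bytes, power of two)."""
    assert alignment > 0 and (alignment & (alignment - 1)) == 0
    return (offset + alignment - 1) & ~(alignment - 1)

def layout_section(entries):
    """Lay out one section's constant pool entries in order.

    `entries` is a list of (size, alignment) pairs in bytes. Returns the
    list of entry offsets and the total section size. Illustration only.
    """
    offsets = []
    offset = 0
    for size, alignment in entries:
        offset = align_to(offset, alignment)  # padding is inserted only here
        offsets.append(offset)
        offset += size
    return offsets, offset
```

For a section holding a 4-byte, a 16-byte, and an 8-byte entry with natural alignment, `layout_section([(4, 4), (16, 16), (8, 8)])` places them at offsets 0, 16, and 32 for a total of 40 bytes. Because the padding depends on neighbouring entries, offsets computed before grouping cannot be reused afterwards.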

llvm-svn: 66875

generalize the previous code to use the full generality of LEA · 99cc1337

Chris Lattner authored Mar 13, 2009

for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret
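
The new sequence selects between 13 and 4 without a conditional move: the condition is materialized as 0 or 1 and a single `lea` computes `4 + cond * 9`. A minimal model follows, assuming 32-bit values and assuming the set of usable differences is the one a single `lea` can express with the same register as base and index (1, 2, 3, 4, 5, 8, 9). The name `select_via_lea` is made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

# Multipliers one lea can apply to a single register: index*scale with
# scale in {1, 2, 4, 8}, optionally plus the same register as the base.
LEA_MULTIPLIERS = (1, 2, 3, 4, 5, 8, 9)

def select_via_lea(cond, true_val, false_val):
    """Model `cond ? true_val : false_val` as false_val + zext(cond) * diff."""
    diff = (true_val - false_val) & MASK32
    if diff not in LEA_MULTIPLIERS:
        raise ValueError("difference is not expressible by a single lea")
    bit = 1 if cond else 0                       # setcc + movzbl
    return (false_val + bit * diff) & MASK32     # leal false_val(%eax,%eax,8) for diff 9
```

In the example, the comparison `setl` after `cmpl $42` yields 1 for values below 42, so the result is `4 + 1 * 9 = 13`, and 4 otherwise.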

llvm-svn: 66869

optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d

Chris Lattner authored Mar 13, 2009

example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608
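
When the two selected constants differ by one, the zero-extended condition can simply be combined with the smaller constant. A hedged sketch of that idea on 32-bit values follows; the helper names are assumptions, and the OR form is only valid when the smaller constant has its lowest bit clear, as with 4294967294 in the listing above.

```python
MASK32 = 0xFFFFFFFF

def select_adjacent(cond, low):
    """Return low + 1 if cond else low, without a conditional move."""
    bit = 1 if cond else 0          # setcc + movzbl: zero-extend the condition
    return (low + bit) & MASK32     # add the 0/1 value to the smaller constant

def select_adjacent_or(cond, low):
    """Same selection using OR; requires the lowest bit of `low` to be clear."""
    assert low & 1 == 0, "OR form needs an even low constant"
    bit = 1 if cond else 0
    return (low | bit) & MASK32
```

So `cond ? 42 : 41` corresponds to `select_adjacent(cond, 41)`, and the sequence shown, which ORs the condition into 4294967294, corresponds to `select_adjacent_or(cond, 0xFFFFFFFE)`.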

llvm-svn: 66868

Mar 12, 2009

Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))" · 4147f08e

Chris Lattner authored Mar 12, 2009

related transformations out of target-specific dag combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret
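
Both rewrites mentioned in this message are simple identities. The sketch below models them on plain integers; the function names are hypothetical and the code only illustrates the algebra, not the DAG combine itself.

```python
def select_pow2_or_zero(cond, value):
    """cond ? value : 0 rewritten as zext(cond) << log2(value)."""
    assert value > 0 and (value & (value - 1)) == 0, "value must be a power of two"
    shift = value.bit_length() - 1
    return (1 if cond else 0) << shift

def add_select_before(cond, c, x):
    """(add (select cc, 0, c), x)"""
    return (0 if cond else c) + x

def add_select_after(cond, c, x):
    """(select cc, x, (add x, c))"""
    return x if cond else x + c
```

In the generated code above, the shifted condition is folded into the addressing mode: `(%ecx,%eax,4)` adds `cond * 4` to the pointer instead of selecting between two precomputed addresses.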

This passes multisource and dejagnu.

llvm-svn: 66779

On x86, if the only use of an i64 load is an i64 store, generate a pair of... · ef0b7cc2
Evan Cheng authored Mar 12, 2009
```
On x86, if the only use of an i64 load is an i64 store, generate a pair of double load and store instead.

llvm-svn: 66776
```

Mar 11, 2009
- Add a -no-implicit-float flag. This acts like -soft-float, but may generate · 42adc73a
  Bill Wendling authored Mar 11, 2009
```
floating point instructions that are explicitly specified by the user.

llvm-svn: 66719
```
- For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits. · 25c6a46a
  Mon P Wang authored Mar 11, 2009
```
llvm-svn: 66684
```
- Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw. · ce6a26cb
  Mon P Wang authored Mar 11, 2009
```
llvm-svn: 66645
```
- formatting change, reduce indentation. No functionality change. · 248ad00a
  Chris Lattner authored Mar 11, 2009
```
llvm-svn: 66642
```
Mar 07, 2009

Arithmetic instructions don't set the EFLAGS bits OF and CF · ff659b5b

Dan Gohman authored Mar 07, 2009

the same way the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
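
The difference can be seen by modeling the flags directly. The sketch below assumes 32-bit operands and models only ZF, CF, and OF; `flags_after_add` and `flags_after_test` are hypothetical helpers. It relies on the architectural behavior that `test` clears CF and OF, while `add` sets them from the carry out and the signed overflow of the addition.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def flags_after_add(a, b):
    """Return (result, flags) for a 32-bit add, modeling ZF, CF and OF."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    cf = full > MASK32                                  # unsigned carry out
    of = ((a ^ result) & (b ^ result) & SIGN32) != 0    # signed overflow
    return result, {"ZF": result == 0, "CF": cf, "OF": of}

def flags_after_test(value):
    """Flags after `test value, value`: ZF from the value, CF and OF cleared."""
    value &= MASK32
    return {"ZF": value == 0, "CF": False, "OF": False}
```

ZF always matches between the two, which is why conditions that read only ZF (COND_E and COND_NE) are safe. For `0x7FFFFFFF + 1`, however, the add sets OF while a `test` of the result would leave it clear, so conditions that depend on OF or CF could change meaning if the `test` were dropped.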

llvm-svn: 66318
