Commits · 839ad0a5f37ab3653b25724d6aca0309b928818d · Roger Ferrer / llvm-epi-0.8

Mar 17, 2009

CellSPU: · 839ad0a5

Scott Michel authored Mar 17, 2009

- Fix fabs, fneg for f32 and f64.
- Use BuildVectorSDNode.isConstantSplat, now that the functionality exists
- Continue to improve i64 constant lowering. Lower certain special constants
  to the constant pool when they correspond to SPU's shufb instruction's
  special mask values. This avoids the overhead of performing a shuffle on a
  zero-filled vector just to get the special constant when the memory load
  suffices.

llvm-svn: 67067

839ad0a5

Mar 16, 2009

CellSPU: · d1db1aba

Scott Michel authored Mar 16, 2009

Incorporate Tilmann's 128-bit operation patch. Evidently, it gets the
llvm-gcc bootstrap a bit further along.

llvm-svn: 67048

d1db1aba

This causes incorrect stack frame allocation when the last object is an array... · aa7db252

Bruno Cardoso Lopes authored Mar 15, 2009

This causes incorrect stack frame allocation when the last object is an array allocated on the stack, which would lead
the compiled program to run over its stack. Thanks to Gil Dogon.

llvm-svn: 67034

aa7db252

Mar 14, 2009

Use %rip-relative addressing on x86-64 whenever practical, as · f98cd1b4

Dan Gohman authored Mar 14, 2009

it has a smaller encoding than absolute addressing.

llvm-svn: 67002

f98cd1b4

Don't forego folding of loads into 64-bit adds when the other · 2293eb60

Dan Gohman authored Mar 14, 2009

operand is a signed 32-bit immediate. Unlike with the 8-bit
signed immediate case, it isn't actually smaller to fold a
32-bit signed immediate instead of a load. In fact, it's
larger in the case of 32-bit unsigned immediates, because
they can be materialized with movl instead of movq.

llvm-svn: 67001

2293eb60

Improve FastISel's handling of truncates to i1, and implement · a62e4ab6

Dan Gohman authored Mar 13, 2009

ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.

llvm-svn: 66988

a62e4ab6

Mar 13, 2009

Fix FastISel's assumption that i1 values are always zero-extended · c0bb9595

Dan Gohman authored Mar 13, 2009

by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.

llvm-svn: 66941

c0bb9595
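
A minimal sketch of the i1 problem described in the entry above, written in Python rather than LLVM's C++. It assumes an 8-bit register model in which a truncate to i1 leaves the upper seven bits untouched; the helper names are hypothetical and do not come from FastISel.

```python
def trunc_to_i1(reg8: int) -> int:
    """Truncate to i1 without cleanup: only bit 0 is meaningful afterwards."""
    return reg8 & 0xFF  # bits 1..7 still hold whatever was there before


def zext_i1(reg8: int) -> int:
    """Explicit zero extension: keep bit 0 and clear everything above it."""
    return reg8 & 1


raw = trunc_to_i1(0b1010_0111)  # bit 0 is set, upper bits are leftovers
assert raw != 1                 # treating the register as already extended is wrong
assert zext_i1(raw) == 1        # the explicit mask recovers the i1 value
```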

add 8 and 16 bit TLS moves. · 997b74ac

Rafael Espindola authored Mar 13, 2009

add a fixme note on how to remove code duplication.

llvm-svn: 66932

997b74ac

Improve sext and zext of TLS variables. · 71144973

Rafael Espindola authored Mar 13, 2009

llvm-svn: 66922

71144973

generalize this code so that fast isel handles integer truncates to i1, which · 3fb71c8f

Chris Lattner authored Mar 13, 2009

codegen to the same thing as integer truncates to i8 (the top bits are 
just undefined).  This implements rdar://6667338

llvm-svn: 66902

3fb71c8f

These instructions have special lowering that may lower them to SSE · 798fd56d

Bill Wendling authored Mar 13, 2009

instructions. Prevent that if we don't want implicit uses of SSE.

llvm-svn: 66877

798fd56d

Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd

Evan Cheng authored Mar 13, 2009

Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.

1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups entries by section, so those offsets are no longer valid, yet the asm printer still uses them to determine the size of padding between entries.
5. Asm printer uses an expensive data structure, multimap, to track constant pool entries by section.
6. Asm printer iterates over a SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.

Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses a cheaper data structure to group constant pool entries.
6. Asm printer computes entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.

llvm-svn: 66875

1fb8aedd
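
The idea behind solutions 4 to 6 in the entry above can be sketched as follows. This is a Python illustration under stated assumptions, not the asm printer's code: entries are plain tuples, alignments are byte values (not log2), and the function names and the `.literal16` section are made up for the example.

```python
from collections import defaultdict


def align_to(offset: int, alignment: int) -> int:
    """Round offset up to a multiple of alignment (bytes, power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout_by_section(entries):
    """entries: list of (name, section, size, alignment_in_bytes).

    Entries are grouped by section in first-seen order, and offsets are
    assigned only after grouping, so padding reflects the final neighbours.
    """
    sections = defaultdict(list)  # insertion-ordered, so the result is deterministic
    for name, section, size, alignment in entries:
        sections[section].append((name, size, alignment))

    layout = {}
    for section, members in sections.items():
        offset = 0
        for name, size, alignment in members:
            offset = align_to(offset, alignment)
            layout[name] = (section, offset)
            offset += size
    return layout


pool = [
    ("f0", ".literal4", 4, 4),
    ("v0", ".literal16", 16, 16),
    ("f1", ".literal4", 4, 4),
]
layout = layout_by_section(pool)
assert layout["f1"] == (".literal4", 4)   # no padding between the two 4-byte entries
assert layout["v0"] == (".literal16", 0)
```

Had the offsets been fixed at creation time, `f1` would have been placed after the 16-byte vector and the printer would have padded `.literal4` for no reason.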

generalize the previous code to use the full generality of LEA · 99cc1337

Chris Lattner authored Mar 13, 2009

for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this).  On the example, we now
generate:

_test:
	movl	4(%esp), %eax
	cmpl	$42, (%eax)
	setl	%al
	movzbl	%al, %eax
	leal	4(%eax,%eax,8), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4, %ecx
	movl	$13, %eax
	cmovg	%ecx, %eax
	ret

llvm-svn: 66869

99cc1337
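
A sketch of the arithmetic behind the `leal 4(%eax,%eax,8), %eax` sequence in the entry above, in Python. It assumes the select is between two constants whose difference is a multiplier a single LEA can produce (scale 1, 2, 4 or 8, optionally plus the same register as base); the set and the function name are illustrative, not taken from the backend.

```python
# Multipliers reachable as index*scale, or base + index*scale with base == index.
LEA_MULTIPLIERS = {1, 2, 3, 4, 5, 8, 9}


def select_via_lea(cond: bool, true_val: int, false_val: int):
    """Return (multiplier, displacement, result), or None if the pattern does not fit."""
    diff = true_val - false_val
    if diff not in LEA_MULTIPLIERS:
        return None
    result = int(cond) * diff + false_val  # zext(cond) * diff + false_val
    return diff, false_val, result


# The example in the entry selects 13 or 4: multiplier 9, displacement 4.
assert select_via_lea(True, 13, 4) == (9, 4, 13)
assert select_via_lea(False, 13, 4) == (9, 4, 4)
```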

optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d

Chris Lattner authored Mar 13, 2009

example to:

_test:
	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	setg	%al
	movzbl	%al, %eax
	orl	$4294967294, %eax
	ret

instead of:

	movl	4(%esp), %eax
	cmpl	$41, (%eax)
	movl	$4294967294, %ecx
	movl	$4294967295, %eax
	cmova	%ecx, %eax
	ret

which is smaller in code size and faster. rdar://6668608

llvm-svn: 66868

4be6df5d
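
The transformation in the entry above amounts to replacing a select between adjacent constants with the zero-extended condition plus the lower constant. A small Python check of that identity, assuming 32-bit wraparound; the function names are hypothetical. The OR form matches the `orl $4294967294` listing and is only valid when bit 0 of the lower constant is clear.

```python
MASK32 = 0xFFFFFFFF


def select_adjacent(cond: bool, low: int) -> int:
    """cond ? low + 1 : low, computed as zext(cond) + low in 32-bit arithmetic."""
    return (int(cond) + low) & MASK32


def select_adjacent_or(cond: bool, low: int) -> int:
    """Same result using OR; requires bit 0 of low to be clear."""
    assert low & 1 == 0, "OR form needs an even lower constant"
    return (int(cond) | low) & MASK32


assert select_adjacent(True, 41) == 42
assert select_adjacent(False, 41) == 41
assert select_adjacent_or(True, 4294967294) == select_adjacent(True, 4294967294)
```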

Enhance address-mode folding of ISD::ADD to handle cases where the · a1d92423

Dan Gohman authored Mar 13, 2009

operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.

llvm-svn: 66865

a1d92423

Mar 12, 2009

Re-apply 66024 with fixes: 1. Fixed indirect call to immediate address... · 2a332aa8

Evan Cheng authored Mar 12, 2009

Re-apply 66024 with fixes: 1. Fixed indirect call to immediate address assembly. 2. Fixed JIT encoding by making the address pc-relative.

llvm-svn: 66803

2a332aa8

Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))" · 4147f08e

Chris Lattner authored Mar 12, 2009

related transformations out of target-specific dag combine into the
ARM backend.  These were added by Evan in r37685 with no testcases
and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).

Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
with the recently added cp constant select optimization, but is a
very general xform.  For example, we now compile the second example
in const-select.ll to:

_test:
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        seta    %al
        movzbl  %al, %eax
        movl    4(%esp), %ecx
        movsbl  (%ecx,%eax,4), %eax
        ret

instead of:

_test:
        movl    4(%esp), %eax
        leal    4(%eax), %ecx
        movsd   LCPI2_0, %xmm0
        ucomisd 8(%esp), %xmm0
        cmovbe  %eax, %ecx
        movsbl  (%ecx), %eax
        ret

This passes multisource and dejagnu.

llvm-svn: 66779

4147f08e
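
The `cond ? 8 : 0 -> (zext(cond) << 3)` combine from the entry above can be checked with a few lines of Python. This is only a sketch of the identity, assuming the non-zero arm is a power of two; the function name is made up.

```python
def select_pow2_or_zero(cond: bool, value: int) -> int:
    """cond ? value : 0 for a power-of-two value, as zext(cond) << log2(value)."""
    assert value > 0 and value & (value - 1) == 0, "value must be a power of two"
    shift = value.bit_length() - 1  # log2 of a power of two
    return int(cond) << shift


assert select_pow2_or_zero(True, 8) == 8
assert select_pow2_or_zero(False, 8) == 0
```

In the first listing of that entry, the shift appears to be absorbed by the scale factor of the addressing mode `(%ecx,%eax,4)`, which is why no separate shift instruction shows up.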

improve comment. · a492d29c

Chris Lattner authored Mar 12, 2009

llvm-svn: 66778

a492d29c

On x86, if the only use of a i64 load is a i64 store, generate a pair of... · ef0b7cc2

Evan Cheng authored Mar 12, 2009

On x86, if the only use of a i64 load is a i64 store, generate a pair of double load and store instead.

llvm-svn: 66776

ef0b7cc2

Forgot to check-in this as part of 7761. · 8bb50e23

Sanjiv Gupta authored Mar 12, 2009

llvm-svn: 66763

8bb50e23

Banksel optimization is now based on the section names of symbols, since the... · f883419b

Sanjiv Gupta authored Mar 12, 2009

Banksel optimization is now based on the section names of symbols, since the symbols in one section will always be put into one bank.

llvm-svn: 66761

f883419b
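
One way to picture the section-based banksel rule from the entry above, as a hedged Python sketch. The backend's actual logic is not shown in this log, so everything here is an assumption: a straight-line sequence of symbol accesses, a `section_of` mapping, and a bank select emitted only when the section changes.

```python
def count_bank_selects(accesses, section_of):
    """Count bank selects needed for a straight-line sequence of symbol accesses.

    Symbols in the same section share a bank, so a select is needed only when
    the section differs from that of the previously accessed symbol.
    """
    selects = 0
    current = None
    for symbol in accesses:
        section = section_of[symbol]
        if section != current:
            selects += 1
            current = section
    return selects


sections = {"a": "sec1", "b": "sec1", "c": "sec2"}
assert count_bank_selects(["a", "b", "c", "a"], sections) == 3
assert count_bank_selects(["a", "b"], sections) == 1
```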

Revert r66024. The JIT encoding for CALLpcrel32 is wrong -- see PR3773, and the · 5637df37

Dan Gohman authored Mar 11, 2009

assembly text output uses an indirect call ("call *") instead of a direct call.

llvm-svn: 66735

5637df37

Mar 11, 2009

optimize i8 and i16 tls values. · 294943c9

Rafael Espindola authored Mar 11, 2009

llvm-svn: 66725

294943c9

Add a -no-implicit-float flag. This acts like -soft-float, but may generate · 42adc73a

Bill Wendling authored Mar 11, 2009

floating point instructions that are explicitly specified by the user.

llvm-svn: 66719

42adc73a

It makes no sense to have a ODR version of common · 4581bebf

Duncan Sands authored Mar 11, 2009

linkage, so remove it.

llvm-svn: 66690

4581bebf

For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits. · 25c6a46a

Mon P Wang authored Mar 11, 2009

llvm-svn: 66684

25c6a46a

fix PR3785, a valgrind error on test/CodeGen/ARM/pr3502.ll · 93e87652

Chris Lattner authored Mar 11, 2009

llvm-svn: 66660

93e87652

Remove the one-definition-rule version of extern_weak · e2881053

Duncan Sands authored Mar 11, 2009

linkage: this linkage type only applies to declarations,
but ODR is only relevant to globals with definitions.

llvm-svn: 66650

e2881053

Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw. · ce6a26cb

Mon P Wang authored Mar 11, 2009

llvm-svn: 66645

ce6a26cb

formatting change, reduce indentation. No functionality change. · 248ad00a

Chris Lattner authored Mar 11, 2009

llvm-svn: 66642

248ad00a

Mar 10, 2009

Mark the Defs and Uses of STATUS register correctly, plus some reformatting. · afb355f2

Sanjiv Gupta authored Mar 10, 2009

llvm-svn: 66540

afb355f2

Add more information to the EFLAGS note. · b0d4009e

Dan Gohman authored Mar 10, 2009

llvm-svn: 66515

b0d4009e

Add a note about EFLAGS optimization. · d5b35ee2

Dan Gohman authored Mar 09, 2009

llvm-svn: 66508

d5b35ee2

Mar 09, 2009

ARM target now also recognizes triplets like thumbv6-apple-darwin and sets thumb... · 0ee0da84

Evan Cheng authored Mar 09, 2009

ARM target now also recognizes triplets like thumbv6-apple-darwin and sets thumb mode and arch subversion. Eventually thumb triplets will go away and be replaced with function notes.

llvm-svn: 66435

0ee0da84
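
A rough Python sketch of the kind of triplet recognition the entry above describes: derive thumb mode and the architecture version from a prefix such as `thumbv6-` or `armv5-`. The regular expression and function name are illustrative assumptions, not the ARM target's actual parsing code.

```python
import re

# "arm" or "thumb", an optional "v<digits>" with optional suffix letters, then "-".
TRIPLE_RE = re.compile(r"^(arm|thumb)(?:v(\d+)[a-z0-9]*)?-")


def parse_arm_triple(triple: str):
    """Return (is_thumb, arch_version or None), or None for non-ARM triplets."""
    match = TRIPLE_RE.match(triple)
    if match is None:
        return None
    version = int(match.group(2)) if match.group(2) else None
    return match.group(1) == "thumb", version


assert parse_arm_triple("thumbv6-apple-darwin") == (True, 6)
assert parse_arm_triple("arm-apple-darwin") == (False, None)
```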

ARM isLegalAddressImmediate should check if type is a simple type now that... · ce5dfb69

Evan Cheng authored Mar 09, 2009

ARM isLegalAddressImmediate should check if type is a simple type now that optimizer can create values of funky scalar types.

llvm-svn: 66429

ce5dfb69

Mar 08, 2009

do not export all the X86FastISel symbols, ever. · d5ac9d87

Chris Lattner authored Mar 08, 2009

llvm-svn: 66382

d5ac9d87

Recognize triplets starting with armv5-, armv6- etc. And set the ARM arch version accordingly. · ec415efb

Evan Cheng authored Mar 08, 2009

llvm-svn: 66365

ec415efb

add a note. · 393ac628

Chris Lattner authored Mar 08, 2009

llvm-svn: 66360

393ac628

add a note. · cfd1f7aa

Chris Lattner authored Mar 08, 2009

llvm-svn: 66359

cfd1f7aa

Mar 07, 2009

Introduce new linkage types linkonce_odr, weak_odr, common_odr · 12da8ce3

Duncan Sands authored Mar 07, 2009

and extern_weak_odr.  These are the same as the non-odr versions,
except that they indicate that the global will only be overridden
by an *equivalent* global.  In C, a function with weak linkage can
be overridden by a function which behaves completely differently.
This means that IP passes have to skip weak functions, since any
deductions made from the function definition might be wrong, since
the definition could be replaced by something completely different
at link time.   This is not allowed in C++, thanks to the ODR
(One-Definition-Rule): if a function is replaced by another at
link-time, then the new function must be the same as the original
function.  If a language knows that a function or other global can
only be overridden by an equivalent global, it can give it the
weak_odr linkage type, and the optimizers will understand that it
is alright to make deductions based on the function body.  The
code generators on the other hand map weak and weak_odr linkage
to the same thing.

llvm-svn: 66339

12da8ce3
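
The distinction drawn in the entry above can be summarised in a short Python sketch. It covers only the eight linkage names mentioned there and treats every other linkage as not overridable, which is an assumption of this example; the function names are hypothetical and this is not how LLVM represents linkage.

```python
# Linkage types named in the entry. A non-ODR global may be replaced at link
# time by a definition that behaves completely differently.
OVERRIDABLE_ANY = {"linkonce", "weak", "common", "extern_weak"}
OVERRIDABLE_EQUIVALENT = {"linkonce_odr", "weak_odr", "common_odr", "extern_weak_odr"}


def may_be_overridden(linkage: str) -> bool:
    """True if another global may replace this one at link time."""
    return linkage in OVERRIDABLE_ANY or linkage in OVERRIDABLE_EQUIVALENT


def can_trust_body(linkage: str) -> bool:
    """True if deductions made from the current definition survive linking."""
    return linkage not in OVERRIDABLE_ANY


def codegen_linkage(linkage: str) -> str:
    """Code generators treat an _odr linkage like its non-ODR counterpart."""
    return linkage[:-4] if linkage.endswith("_odr") else linkage


assert may_be_overridden("weak_odr") and can_trust_body("weak_odr")
assert may_be_overridden("weak") and not can_trust_body("weak")
assert codegen_linkage("weak_odr") == codegen_linkage("weak")
```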