Commits · b9bfd0945aaee4eef416ff6becec6154815d53e4 · Roger Ferrer / llvm-epi-0.8

Dec 07, 2010
- Remove target specific node MipsISD::CMov, which is not used because all... · b9bfd094
  Bruno Cardoso Lopes authored Dec 07, 2010
```
Remove target specific node MipsISD::CMov, which is not used because all conditional moves are directly matched using tablegen patterns. If there's a need in the future, we can introduce it again

llvm-svn: 121164
```
  b9bfd094
- Match a pattern generated by a dag combiner opt where: · f0c6e378
  Bruno Cardoso Lopes authored Dec 07, 2010
```
(select (load (load tga0)) (load tga1)) => (load (select (load tga0) tga1))

Thanks to Akira for pointing that out.

llvm-svn: 121163
```
  f0c6e378
- Encode the literal field for tCMPzi instruction. · 6e517d65
  Jim Grosbach authored Dec 07, 2010
```
llvm-svn: 121153
```
  6e517d65
- Add parens to pacify gcc. · cfa9a893
  Benjamin Kramer authored Dec 07, 2010
```
llvm-svn: 121142
```
  cfa9a893
- PR5207: Change APInt methods trunc(), sext(), zext(), sextOrTrunc() and · 583abbc4
  Jay Foad authored Dec 07, 2010
```
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.

llvm-svn: 121120
```
  583abbc4
- lib/Target/X86/X86MCAsmInfo.cpp: [PR8741] On Win64, specify explicit PrivateGlobalPrefix as ".L". · 547cc6f0
  NAKAMURA Takumi authored Dec 07, 2010
```
Otherwise, global symbols @Lxxxx might be treated as temporary symbols by MCSymbol.

llvm-svn: 121103
```
  547cc6f0
- Second attempt at converting Thumb2's LDRpci, including updating the gazillion... · 99ea8a35
  Owen Anderson authored Dec 07, 2010
```
Second attempt at converting Thumb2's LDRpci, including updating the gazillion places that need to know about it.

llvm-svn: 121082
```
  99ea8a35
- Add fixup for Thumb1 BL/BLX instructions. · 9e199469
  Jim Grosbach authored Dec 06, 2010
```
llvm-svn: 121072
```
  9e199469
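The APInt change in 583abbc4 (PR5207) above moves from methods that modify the object in place to const methods that return a new value. The following is a minimal sketch of that calling convention only. `FixedInt` is a hypothetical stand-in written for illustration; it is not `llvm::APInt` and does not reproduce its interface beyond the method names `trunc` and `zext`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-width integer used only to illustrate the API shape.
// Invariant assumed here: bits above `bits` in `value` are always zero.
struct FixedInt {
    unsigned bits;        // width in bits, 1..64
    std::uint64_t value;  // payload, stored zero-extended

    // Const truncation: returns a narrower copy, leaves *this untouched.
    FixedInt trunc(unsigned width) const {
        assert(width >= 1 && width <= bits && "trunc must not widen");
        const std::uint64_t mask =
            width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
        return FixedInt{width, value & mask};
    }

    // Const zero extension: returns a wider copy, leaves *this untouched.
    FixedInt zext(unsigned width) const {
        assert(width >= bits && width <= 64 && "zext must not narrow");
        return FixedInt{width, value};
    }
};
```

With this shape, a caller writes `FixedInt narrow = wide.trunc(8);` and `wide` keeps its original width, so the methods can be used on const objects and chained in expressions.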
Dec 06, 2010
- Adding bug fix that was supposed to be part of 121044. · dba03b05
  Wesley Peck authored Dec 06, 2010
```
patch contributed by Jack Whitham!

llvm-svn: 121049
```
  dba03b05
- Fixed reversed operands for IDIV and CMP instructions in MBlaze backend. · 8da34b6c
  Wesley Peck authored Dec 06, 2010
```
Use BRAD instead of BRD for indirect branches in MBlaze backend.

patch contributed by Jack Whitham!

llvm-svn: 121044
```
  8da34b6c
- Fix a 16-bit immediate value detection bug in the MBlaze delay slot filler. · 6ce9b608
  Wesley Peck authored Dec 06, 2010
```
Address more hazards in the MBlaze delay slot filler.

patch contributed by Jack Whitham!

llvm-svn: 121037
```
  6ce9b608
- Remove the instruction fragment to data fragment lowering since it was causing · 0f30fec0
  Rafael Espindola authored Dec 06, 2010
```
freed data to be read. I will open a bug to track it being reenabled.

llvm-svn: 121028
```
  0f30fec0
- Revert r121021, which broke the buildbots. · c1ee8e35
  Owen Anderson authored Dec 06, 2010
```
llvm-svn: 121026
```
  c1ee8e35
- Trailing whitespace. · 67f13b19
  Jim Grosbach authored Dec 06, 2010
```
llvm-svn: 121024
```
  67f13b19
- Improve handling of Thumb2 PC-relative loads by converting LDRpci (and friends) to Pseudos. · bb4a76fc
  Owen Anderson authored Dec 06, 2010
```
llvm-svn: 121021
```
  bb4a76fc
- Encode the register operand of ARM CondCode operands correctly. ARM::CPSR if · 968c9272
  Jim Grosbach authored Dec 06, 2010
```
the instruction is predicated, reg0 otherwise.

llvm-svn: 121020
```
  968c9272
- The ARM AsmMatcher needs to know that the CCOut operand is a register value, · 0bfb4d50
  Jim Grosbach authored Dec 06, 2010
```
not an immediate. It stores either ARM::CPSR or reg0.

llvm-svn: 121018
```
  0bfb4d50
- Second try at making direct object emission produce the same results · 44bbe36d
  Rafael Espindola authored Dec 06, 2010
```
as llc + llvm-mc. This time ELF is not changed and I tested that llvm-gcc
bootstraps on darwin10 using darwin9's assembler and linker.

llvm-svn: 121006
```
  44bbe36d
- ptx: add shift instructions · 9f2af628
  Che-Liang Chiou authored Dec 06, 2010
```
llvm-svn: 120982
```
  9f2af628
- Eliminate unneeded #include's. · abd6d274
  Evan Cheng authored Dec 05, 2010
```
llvm-svn: 120971
```
  abd6d274
- ARM/CMakeLists.txt: Add missing MLxExpansionPass.cpp since r120960. · 70fbbf53
  NAKAMURA Takumi authored Dec 05, 2010
```
llvm-svn: 120966
```
  70fbbf53
- Code clean up. · 12f4d615
  Evan Cheng authored Dec 05, 2010
```
llvm-svn: 120965
```
  12f4d615
- Remove an unused variable. · b8a662f0
  Evan Cheng authored Dec 05, 2010
```
llvm-svn: 120964
```
  b8a662f0
Dec 05, 2010

Making use of VFP / NEON floating point multiply-accumulate / subtraction is · 62c7b5bf

Evan Cheng authored Dec 05, 2010

difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) cycles can
   cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimal solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960

62c7b5bf
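Point B of 62c7b5bf concerns whether the multiply in (fadd (fmul)) has a single use. The sketch below shows the two source-level shapes; the function names are hypothetical and the comments describe what an instruction selector could do, not what any particular LLVM revision emits.

```cpp
// (fadd (fmul a, b), acc): the product has exactly one use, so a single
// multiply-accumulate could stand in for the vmul + vadd pair.
float mulAddSingleUse(float acc, float a, float b) {
    return acc + a * b;
}

// Here the product has two uses: the add and the store through prodOut.
// Selecting a multiply-accumulate for the add would still leave a separate
// multiply for prodOut, so the product would be computed twice.
float mulAddTwoUses(float acc, float a, float b, float *prodOut) {
    const float prod = a * b;
    *prodOut = prod;
    return acc + prod;
}
```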

Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717

Chris Lattner authored Dec 05, 2010

result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936

68861717

It turns out that when the ".with.overflow" intrinsics were added to the X86 · 364bb0a0

Chris Lattner authored Dec 05, 2010

backend, they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.

llvm-svn: 120935

364bb0a0
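The size computation behind `new int[count]` in 364bb0a0 can be written out by hand as an overflow-checked unsigned multiply. This is a sketch, assuming a GCC or Clang compiler that provides `__builtin_mul_overflow`; the function name `arrayAllocSize` is hypothetical, and the sketch takes an unsigned count rather than the `long` used in the commit's test case.

```cpp
#include <cstddef>
#include <cstdint>

// Byte count for an array of `count` ints. On overflow, return SIZE_MAX
// (all ones, the -1 seen in the assembly) so that the allocation fails.
std::size_t arrayAllocSize(std::size_t count) {
    std::size_t bytes;
    // GCC/Clang builtin: returns true if count * sizeof(int) overflowed.
    if (__builtin_mul_overflow(count, sizeof(int), &bytes))
        return SIZE_MAX;
    return bytes;
}
```

The point of checking the flag produced by the multiply itself is that no second comparison of the high half is needed.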

generalize the previous check to handle -1 on either side of the · 116580a1

Chris Lattner authored Dec 05, 2010

select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

llvm-svn: 120932

116580a1
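The cmpq / sbbq / notq / orq sequence in 116580a1 can be mirrored step by step in C++. This is a sketch for checking the arithmetic; the function names are hypothetical, and `hi` and `lo` stand for the high and low halves of the product (%rdx and %rax).

```cpp
#include <cstdint>

// Reference form: -1 sits on the "not equal to zero" side of the select.
std::uint64_t selectRef(std::uint64_t hi, std::uint64_t lo) {
    return hi == 0 ? lo : ~std::uint64_t{0};
}

// Branch-free form following the four instructions one by one.
std::uint64_t selectBranchFree(std::uint64_t hi, std::uint64_t lo) {
    const std::uint64_t borrow = hi < 1 ? 1 : 0;           // cmpq $1, hi sets CF
    const std::uint64_t mask = std::uint64_t{0} - borrow;  // sbbq r, r: 0 or all ones
    return ~mask | lo;                                     // notq, then orq
}
```

When `hi` is zero the mask is all ones, the not clears it, and the or yields `lo`; otherwise the not produces all ones and the result is -1.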

Improve an integer select optimization in two ways: · 342e6ea5

Chris Lattner authored Dec 05, 2010

1. generalize 
    (select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
    (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or
memory operands.  For example for (x == 0) ? -1 : 1, before 
we got:

< 	testb	%sil, %sil
< 	movl	$-1, %ecx
< 	movl	$1, %eax
< 	cmovel	%ecx, %eax

now we get:

> 	cmpb	$1, %sil
> 	sbbl	%eax, %eax
> 	orl	$1, %eax

llvm-svn: 120929

342e6ea5
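The generalized rewrite in 342e6ea5 can be checked with a small C++ model. This is a sketch: the names are hypothetical, and the "sign_bit (x - 1)" term is modeled as the borrow of the unsigned comparison x < 1, which is what the cmp $1 / sbb pair computes.

```cpp
#include <cstdint>

// Reference form: (select (x == 0), -1, y).
std::int32_t selectEqZero(std::int32_t x, std::int32_t y) {
    return x == 0 ? -1 : y;
}

// Branch-free form: broadcast the borrow of (x - 1) and OR it with y.
std::int32_t selectEqZeroBranchFree(std::int32_t x, std::int32_t y) {
    const std::uint32_t borrow = static_cast<std::uint32_t>(x) < 1u ? 1u : 0u;  // cmp $1, x
    const std::uint32_t mask = 0u - borrow;                                     // sbb r, r
    // Conversion back to signed is modular on mainstream compilers
    // (and guaranteed to be so from C++20 on).
    return static_cast<std::int32_t>(mask | static_cast<std::uint32_t>(y));     // or y, r
}
```

For x == 0 the mask is all ones, so the OR gives -1 regardless of y; for any other x the mask is zero and y passes through unchanged.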

Initialize HasPOPCNT. · 2bce78e8
Bill Wendling authored Dec 04, 2010
```
llvm-svn: 120923
```
2bce78e8

Dec 04, 2010

Add patterns for the x86 popcnt instruction. · 2f489236

Benjamin Kramer authored Dec 04, 2010

- Also adds a new POPCNT subtarget feature that is currently enabled if the target
  supports SSE4.2 (nehalem) or SSE4A (barcelona).

llvm-svn: 120917

2f489236
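As an illustration of source code that can benefit from popcnt patterns, here is a minimal sketch using the GCC/Clang builtin `__builtin_popcount`. The wrapper name `countBits` is hypothetical, and whether a popcnt instruction is actually emitted depends on the compiler version and on the target feature being enabled.

```cpp
#include <cstdint>

// Number of set bits in v. With a POPCNT-capable target selected, a compiler
// may turn this into a single popcnt instruction; otherwise it falls back to
// a bit-twiddling sequence or a library call.
unsigned countBits(std::uint32_t v) {
    return static_cast<unsigned>(__builtin_popcount(v));
}
```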

Simplify code. No functionality change. · 8ceebfaa
Benjamin Kramer authored Dec 04, 2010
```
llvm-svn: 120907
```
8ceebfaa
The Thumb tADDrSPi instruction is not valid when the destination is SP. · ed854baa
Bob Wilson authored Dec 04, 2010
```
Check for that and try narrowing it to tADDspi instead.  Radar 8724703.

llvm-svn: 120892
```
ed854baa

There are two reasons why we might want to use · 1c8ac8f0

Rafael Espindola authored Dec 04, 2010

foo = a - b
.long foo
instead of just
.long a - b

First, on darwin9 64 bits the assembler produces the wrong result. Second,
if "a" is the end of the section all darwin assemblers (9, 10 and mc) will not
consider a - b to be a constant but will if the dummy foo is created.

Split how we handle these cases. The first one is something MC should take care
of. The second one has to be handled by the caller.

llvm-svn: 120889

1c8ac8f0

Encode condition code for Thumb1 conditional branch instruction. · ce18d7eb
Jim Grosbach authored Dec 04, 2010
```
llvm-svn: 120865
```
ce18d7eb
Correctly size-reduce the t2CMPzrr instruction to tCMPzr when possible. · 5bae054f
Jim Grosbach authored Dec 03, 2010
```
tCMPzhir has undefined behavior when both source registers are low registers.
rdar://8728577

llvm-svn: 120858
```
5bae054f
Use correct variable names to match the patterns. · 127d7485
Bill Wendling authored Dec 03, 2010
```
llvm-svn: 120857
```
127d7485
Match pattern operand names to expected encoding field names. This corrects the · a09cbbee
Jim Grosbach authored Dec 03, 2010
```
operand encoding ordering of the instruction.

llvm-svn: 120852
```
a09cbbee

Dec 03, 2010
- Remove incorrect BL target encoding (it's similar to, but not the same as the · e4fee204
  Jim Grosbach authored Dec 03, 2010
```
ARM instruction). Add encoding of bits 13 and 11.

llvm-svn: 120849
```
  e4fee204
- Encode the 32-bit wide Thumb (and Thumb2) instructions with the high order · 567ebd0c
  Jim Grosbach authored Dec 03, 2010
```
halfword being emitted to the stream first. rdar://8728174

llvm-svn: 120848
```
  567ebd0c
- Revert this change since it breaks a couple of the AVX tests. · a6c55a31
  Nate Begeman authored Dec 03, 2010
```
I'm unclear if the tests are actually correct or not, but reverting for now.

llvm-svn: 120847
```
  a6c55a31
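The halfword ordering described in 567ebd0c can be sketched as follows. This is an illustrative model, not the MC code emitter: the function names are hypothetical, the byte stream is a plain `std::vector`, and a little-endian target is assumed.

```cpp
#include <cstdint>
#include <vector>

// Append one 16-bit halfword to the stream in little-endian byte order.
void emitHalfwordLE(std::vector<std::uint8_t> &out, std::uint16_t hw) {
    out.push_back(static_cast<std::uint8_t>(hw & 0xFF));
    out.push_back(static_cast<std::uint8_t>(hw >> 8));
}

// Emit a 32-bit wide Thumb encoding: the high-order halfword goes first,
// followed by the low-order halfword.
void emitThumb32(std::vector<std::uint8_t> &out, std::uint32_t insn) {
    emitHalfwordLE(out, static_cast<std::uint16_t>(insn >> 16));
    emitHalfwordLE(out, static_cast<std::uint16_t>(insn & 0xFFFF));
}
```

For the value 0x11223344 this produces the bytes 22 11 44 33, whereas writing the whole word little-endian would give 44 33 22 11.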