Commits · 3e600a29d31e7dd6212f91f9203d4d19ec48235c · Roger Ferrer / llvm-epi-0.8

Feb 05, 2011

· 96d07a82

David Greene authored Feb 05, 2011

[AVX] Revert 124910 until clients are ready.

llvm-svn: 124912

96d07a82

· bdd48150

David Greene authored Feb 04, 2011

[AVX] Add some utilities to insert and extract 128-bit subvectors.
This allows us to easily support 256-bit operations that don't have
native 256-bit support.  This applies to integer operations, certain
types of shuffles and various other things.

llvm-svn: 124910

bdd48150
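
To illustrate the idea in the commit above, here is a minimal sketch in C of a 256-bit integer add built from two 128-bit adds plus extract/insert of the halves. The types and function names (`v4i32`, `v8i32`, `extract_half`, `insert_half`, `add_v8i32`) are hypothetical and are not the utilities added by the commit.

```c
#include <stdint.h>

typedef struct { uint32_t lane[4]; } v4i32;   /* models one 128-bit register */
typedef struct { uint32_t lane[8]; } v8i32;   /* models one 256-bit value */

/* Stand-in for a natively supported 128-bit integer add. */
v4i32 add_v4i32(v4i32 a, v4i32 b)
{
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Extract the 128-bit half selected by `half` (0 = low, 1 = high). */
v4i32 extract_half(v8i32 v, int half)
{
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = v.lane[4 * half + i];
    return r;
}

/* Return `v` with the selected 128-bit half replaced by `sub`. */
v8i32 insert_half(v8i32 v, v4i32 sub, int half)
{
    for (int i = 0; i < 4; ++i)
        v.lane[4 * half + i] = sub.lane[i];
    return v;
}

/* 256-bit add expressed only in terms of 128-bit operations. */
v8i32 add_v8i32(v8i32 a, v8i32 b)
{
    v8i32 r = {{0}};
    r = insert_half(r, add_v4i32(extract_half(a, 0), extract_half(b, 0)), 0);
    r = insert_half(r, add_v4i32(extract_half(a, 1), extract_half(b, 1)), 1);
    return r;
}
```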

Feb 04, 2011

· 653f1eed

David Greene authored Feb 04, 2011

[AVX] Support VINSERTF128 with more patterns and appropriate
infrastructure.  This makes lowering 256-bit vectors to 128-bit
vectors simple when 256-bit vector support is not available.

llvm-svn: 124868

653f1eed

Feb 03, 2011

· c4da110f

David Greene authored Feb 03, 2011

[AVX] VEXTRACTF128 support.  This commit includes patterns for
matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines
to examine and translate index values.  VINSERTF128 comes next.  With
these two in place we can begin supporting more AVX operations as
INSERT/EXTRACT can be used as a fallback when 256-bit support is not
available.

llvm-svn: 124797

c4da110f
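
As a rough illustration of the index translation mentioned in the VEXTRACTF128 commit above, the sketch below maps the first element index of a subvector to a 128-bit lane number. It assumes the subvector has to start on a 128-bit boundary. The function names are hypothetical, and this is not the code from the commit.

```c
#include <stdbool.h>

/* True when the element at `elt_index` starts on a 128-bit boundary.
 * `elt_bits` is the element width in bits (e.g. 32 for float). */
bool starts_on_128bit_boundary(unsigned elt_index, unsigned elt_bits)
{
    return (elt_index * elt_bits) % 128u == 0;
}

/* Translate an element index into the 128-bit lane number that an
 * extract/insert instruction would take as its immediate operand.
 * Only meaningful when starts_on_128bit_boundary() holds. */
unsigned lane_number(unsigned elt_index, unsigned elt_bits)
{
    return (elt_index * elt_bits) / 128u;
}
```

For a vector of eight 32-bit elements, element index 4 is the start of the upper half, so it maps to lane 1, while index 2 is not a valid starting point.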

Fix PR9127 by reversing the operands even if they have more than one use. · d11311f2

Rafael Espindola authored Feb 03, 2011

Reversing the operands allows us to fold, but doesn't force us to. Also, at
this point the DAG is still being optimized, so the check for hasOneUse is not
very precise.

llvm-svn: 124773

d11311f2

Feb 01, 2011
- Patches to build EFI with Clang/LLVM. By Carl Norum. · d22a4a1f
  Evan Cheng authored Feb 01, 2011
```
llvm-svn: 124639
```
  d22a4a1f
Jan 31, 2011
- Keep track of incoming argument's location while emitting LiveIns. · 56cc5fdf
  Devang Patel authored Jan 31, 2011
```
llvm-svn: 124611
```
  56cc5fdf
Jan 27, 2011

· 34f7c0d8

David Greene authored Jan 27, 2011

[AVX] Clean up the code to configure target lowering for AVX.  Specify
how to lower more/new operations.  This is a prerequisite for adding
additional AVX lowering.

llvm-svn: 124447

34f7c0d8

Jan 26, 2011

· bab5e6ed

David Greene authored Jan 26, 2011

[AVX] Add INSERT_SUBVECTOR and support it on x86.  This provides a
default implementation for x86, going through the stack in a similar
fashion to how the codegen implements BUILD_VECTOR.  Eventually this
will get matched to VINSERTF128 if AVX is available.

llvm-svn: 124307

bab5e6ed

· b6f16119

David Greene authored Jan 26, 2011

[AVX] Support EXTRACT_SUBVECTOR on x86.  This provides a default
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similar fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.

llvm-svn: 124292

b6f16119
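
The "going through the stack" fallback described in the two commits above can be pictured with a small C model: the whole vector is stored to a temporary, and the subvector is then read from or written into that temporary. This is only a sketch of the concept. The types and function names are hypothetical, and `first_elt` is assumed to be 0 or 4.

```c
#include <string.h>

typedef struct { float lane[8]; } v8f32;   /* models a 256-bit vector */
typedef struct { float lane[4]; } v4f32;   /* models a 128-bit vector */

/* EXTRACT_SUBVECTOR modelled as: store the vector, reload part of it. */
v4f32 extract_subvector_via_stack(v8f32 v, unsigned first_elt)
{
    float slot[8];                                        /* stack temporary */
    v4f32 out;
    memcpy(slot, v.lane, sizeof slot);                    /* store whole vector */
    memcpy(out.lane, &slot[first_elt], sizeof out.lane);  /* reload subvector */
    return out;
}

/* INSERT_SUBVECTOR modelled as: store the vector, overwrite part of the
 * temporary with the subvector, then reload the whole vector. */
v8f32 insert_subvector_via_stack(v8f32 v, v4f32 sub, unsigned first_elt)
{
    float slot[8];
    memcpy(slot, v.lane, sizeof slot);
    memcpy(&slot[first_elt], sub.lane, sizeof sub.lane);
    memcpy(v.lane, slot, sizeof slot);
    return v;
}
```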

Target/X86: Tweak win64's tailcall. · 0cfdac07
NAKAMURA Takumi authored Jan 26, 2011
```
llvm-svn: 124272
```
0cfdac07
Fix whitespace. · 9d29eff1
NAKAMURA Takumi authored Jan 26, 2011
```
llvm-svn: 124270
```
9d29eff1

Jan 16, 2011
- fix PR8981, a crash trying to form a conditional inc with a floating point compare. · 218092e6
  Chris Lattner authored Jan 16, 2011
```
llvm-svn: 123560
```
  218092e6
Jan 10, 2011
- Rename TargetFrameInfo into TargetFrameLowering. Also, put a couple of FIXMEs... · 2f931281
  Anton Korobeynikov authored Jan 10, 2011
```
Rename TargetFrameInfo into TargetFrameLowering. Also, put a couple of FIXMEs and fixes here and there.

llvm-svn: 123170
```
  2f931281
- Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic. · 2fb5b315
  Jakob Stoklund Olesen authored Jan 10, 2011
```
These functions no longer assert when passed 0, but simply return false instead.

No functional change intended.

llvm-svn: 123155
```
  2fb5b315
Jan 08, 2011
- Recognize inline asm 'rev $0, $1' as a bswap intrinsic call. · 078b0b09
  Evan Cheng authored Jan 08, 2011
```
llvm-svn: 123048
```
  078b0b09
Jan 07, 2011

Revert r122955. It seems using movups to lower memcpy can cause massive... · a048c83f

Evan Cheng authored Jan 07, 2011

Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.

llvm-svn: 123015

a048c83f

Jan 06, 2011

Use movups to lower memcpy and memset even if it's not fast (like corei7). · 7998b1d6

Evan Cheng authored Jan 06, 2011

The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955

7998b1d6

Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy · 3ae2b79a

Evan Cheng authored Jan 06, 2011

etc. take an OptSize option. If OptSize is true, they return
the inline limit for functions with the OptSize attribute.

llvm-svn: 122952

3ae2b79a

Dec 23, 2010

X86: Lower a select directly to a setcc_carry if possible. · 6020ed9d

Benjamin Kramer authored Dec 22, 2010

  int test(unsigned long a, unsigned long b) { return -(a < b); }
compiles to
  _test:                              ## @test
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    sbbl  %eax, %eax                  ## encoding: [0x19,0xc0]
    ret                               ## encoding: [0xc3]
instead of
  _test:                              ## @test
    xorl  %ecx, %ecx                  ## encoding: [0x31,0xc9]
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    movl  $-1, %eax                   ## encoding: [0xb8,0xff,0xff,0xff,0xff]
    cmovael %ecx, %eax                ## encoding: [0x0f,0x43,0xc1]
    ret                               ## encoding: [0xc3]

llvm-svn: 122451

6020ed9d
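
The `cmp`/`sbb` sequence in the commit above works because `cmp` sets the carry flag exactly when the unsigned comparison `a < b` holds, and `sbb %eax, %eax` computes `eax - eax - CF`, which is 0 or all ones. A minimal C model of that reasoning follows. The function name `borrow_mask` is hypothetical, and the result is returned as `uint32_t` rather than `int` to keep the arithmetic well defined.

```c
#include <stdint.h>

/* Branch-free model of -(a < b): all ones when a < b, otherwise 0. */
uint32_t borrow_mask(uint64_t a, uint64_t b)
{
    uint32_t carry = (uint32_t)(a < b);  /* CF after `cmpq b, a` */
    return 0u - carry;                   /* `sbbl %eax, %eax` */
}
```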

Dec 21, 2010

Add some x86 specific dagcombines for conditional increments. · f6ddc4a1

Benjamin Kramer authored Dec 21, 2010

(add Y, (sete  X, 0)) -> cmp X, 1; adc  0, Y
(add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y
(sub (sete  X, 0), Y) -> cmp X, 1; sbb  0, Y
(sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y

for
  unsigned foo(unsigned a, unsigned b) {
    if (a == 0) b++;
    return b;
  }
we now get:
  foo:
    cmpl  $1, %edi
    movl  %esi, %eax
    adcl  $0, %eax
    ret
instead of:
  foo:
    testl %edi, %edi
    sete  %al
    movzbl  %al, %eax
    addl  %esi, %eax
    ret

llvm-svn: 122364

f6ddc4a1
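
The four rewrites in the commit above all rely on the same fact: after `cmp X, 1` the carry flag is set exactly when `X == 0` (unsigned `X < 1`). The C sketch below models each `adc`/`sbb` form with explicit wrap-around arithmetic so the identities can be checked. The function names are hypothetical.

```c
#include <stdint.h>

/* Carry flag after `cmp X, 1`: set exactly when X == 0. */
uint32_t carry_after_cmp1(uint32_t x) { return (uint32_t)(x < 1u); }

/* (add Y, (sete X, 0))  ->  adc 0, Y   computes Y + 0 + CF */
uint32_t add_sete(uint32_t y, uint32_t x)
{
    return y + 0u + carry_after_cmp1(x);
}

/* (add Y, (setne X, 0)) ->  sbb -1, Y  computes Y - (-1) - CF */
uint32_t add_setne(uint32_t y, uint32_t x)
{
    return y - 0xFFFFFFFFu - carry_after_cmp1(x);
}

/* Y minus (sete X, 0)   ->  sbb 0, Y   computes Y - 0 - CF */
uint32_t sub_sete(uint32_t y, uint32_t x)
{
    return y - 0u - carry_after_cmp1(x);
}

/* Y minus (setne X, 0)  ->  adc -1, Y  computes Y + (-1) + CF */
uint32_t sub_setne(uint32_t y, uint32_t x)
{
    return y + 0xFFFFFFFFu + carry_after_cmp1(x);
}
```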

rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for · 3e5fbd74
Chris Lattner authored Dec 21, 2010
```
something that just glues two nodes together, even if it is
sometimes used for flags.

llvm-svn: 122310
```
3e5fbd74

Dec 20, 2010

Implement feedback from Bruno on making pblendvb an x86-specific ISD node in... · 4b9db07b

Nate Begeman authored Dec 20, 2010

Implement feedback from Bruno on making pblendvb an x86-specific ISD node in addition to being an intrinsic, and convert
lowering to use it. Hopefully the pattern fragment is doing the right thing with XMM0; it looks correct in testing.

llvm-svn: 122277

4b9db07b

now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results, · 5c00d416

Chris Lattner authored Dec 20, 2010

the same as setcc.  Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS).  This is
a step towards finishing off PR5443.  In the testcase in that bug we now get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbq	%rcx, %rcx
	testb	$1, %cl
	setne	%dl
	ret

instead of:

	movq	%rdi, %rax
	addq	%rsi, %rax
	movl	$0, %ecx
	adcq	$0, %rcx
	testq	%rcx, %rcx
	setne	%dl
	ret

llvm-svn: 122219

5c00d416
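
For readers unfamiliar with the pattern, the new sequence in the commit above materializes the carry out of an addition as a mask (`sbbq %rcx, %rcx`) and then tests its low bit. The sketch below models that in C. The function name `add_carries` is hypothetical, and the PR5443 testcase itself is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds a and b, stores the wrapped sum, and reports whether the
 * addition carried out of 64 bits. */
bool add_carries(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;                                  /* addq */
    uint64_t mask = 0u - (uint64_t)(*sum < a);     /* sbbq %rcx, %rcx: 0 or all ones */
    return (mask & 1u) != 0;                       /* testb $1, %cl ; setne */
}
```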

use for loop over types. · 9c26d271
Chris Lattner authored Dec 20, 2010
```
llvm-svn: 122214
```
9c26d271

Change the X86 backend to stop using the evil ADDC/ADDE/SUBC/SUBE nodes (which · 846c20d4

Chris Lattner authored Dec 20, 2010

model their carry dependencies with MVT::Flag operands) and use clean and beautiful
EFLAGS dependences instead.

We do this by changing the modelling of SBB/ADC to have EFLAGS inputs and outputs
(which is what requires the previous scheduler change) and changing X86 ISelLowering
to custom lower ADDC and friends down to X86ISD::ADD/ADC/SUB/SBB nodes.

With the previous series of changes, this causes no changes in the testsuite, woo.

llvm-svn: 122213

846c20d4

Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types · 1064992c
Mon P Wang authored Dec 19, 2010
```
has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type.

llvm-svn: 122206
```
1064992c

Dec 19, 2010

improve the setcc -> setcc_carry optimization to happen more · 9edf3f50

Chris Lattner authored Dec 19, 2010

consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201

9edf3f50

simplify some code to just reuse a setcc if we can instead of · 6dddab2f
Chris Lattner authored Dec 19, 2010
```
going through the CSE maps to get it.

llvm-svn: 122196
```
6dddab2f
now that generic vector types aren't selected onto MMX operations, · c37bb023
Chris Lattner authored Dec 19, 2010
```
we don't need -disable-mmx anymore.

llvm-svn: 122189
```
c37bb023
reduce copy/paste programming with the power of for loops. · ae756e19
Chris Lattner authored Dec 19, 2010
```
llvm-svn: 122187
```
ae756e19

X86 supports i8/i16 overflow ops (except i8 multiplies), we should · 1e8c032a

Chris Lattner authored Dec 19, 2010

generate them.  

Now we compile:

define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
entry:
  %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue %0 %0, 1
  br i1 %cmp, label %if.then, label %if.end

into:

_X:                                     ## @X
## BB#0:                                ## %entry
	subl	$12, %esp
	movb	16(%esp), %al
	addb	20(%esp), %al
	jo	LBB0_2

Before we were generating:

_X:                                     ## @X
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movb	12(%ebp), %al
	testb	%al, %al
	setge	%cl
	movb	8(%ebp), %dl
	testb	%dl, %dl
	setge	%ah
	cmpb	%cl, %ah
	sete	%cl
	addb	%al, %dl
	testb	%dl, %dl
	setge	%al
	cmpb	%al, %ah
	setne	%al
	andb	%cl, %al
	testb	%al, %al
	jne	LBB0_2

llvm-svn: 122186

1e8c032a
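
The condition that the single `jo` tests in the commit above is signed 8-bit overflow of the addition. A portable C sketch of that condition is shown below: it performs the add in a wider type and checks the range. The function name `sadd8_overflows` is hypothetical, and this models the semantics only, not how the backend selects the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds two signed 8-bit values, stores the wrapped 8-bit result, and
 * returns true when the exact sum does not fit in int8_t. */
bool sadd8_overflows(int8_t a, int8_t b, int8_t *result)
{
    int wide = (int)a + (int)b;          /* exact sum, cannot overflow int */
    uint8_t low = (uint8_t)wide;         /* low 8 bits, as `addb` produces */
    /* Reinterpret the low byte as a signed value without relying on
     * implementation-defined narrowing conversions. */
    *result = (int8_t)(low < 128 ? low : low - 256);
    return wide < INT8_MIN || wide > INT8_MAX;   /* the OF flag */
}
```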

Dec 17, 2010
- Add support for matching psign & pblendvb to the x86 target · 97b72c99
  Nate Begeman authored Dec 17, 2010
```
Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts

llvm-svn: 122098
```
  97b72c99
Dec 10, 2010

Formalize the notion that AVX and SSE are non-overlapping extensions from the... · 8b08f523

Nate Begeman authored Dec 10, 2010

Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view.  Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX".  Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.

llvm-svn: 121439

8b08f523

Dec 09, 2010
- Rewrite the darwin tlv support to use a chain and return to copying · a8aaaee3
  Eric Christopher authored Dec 09, 2010
```
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.

llvm-svn: 121358
```
  a8aaaee3
- Stop confusing people, it's not really a chain, or a tumor. · 87830740
  Eric Christopher authored Dec 09, 2010
```
llvm-svn: 121340
```
  87830740
- Remove extraneous copy from DAG conversion for darwin tls. This was · d84970ae
  Eric Christopher authored Dec 09, 2010
```
popping up at O0 when it wasn't folded and the fast allocator would
complain.

llvm-svn: 121330
```
  d84970ae
Dec 05, 2010

Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717

Chris Lattner authored Dec 05, 2010

result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936

68861717
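
The assembly in the commit above computes the allocation size for `new int[count]`: the byte count if the multiplication does not overflow, and all ones otherwise, so that the allocator is asked for an impossible size. A portable C sketch of that computation follows. The function name `checked_array_bytes` is hypothetical, and the division-based check is an assumption made for portability; the generated code instead reads the overflow flag set by `mul`.

```c
#include <stdint.h>

/* Returns count * elem_size, or UINT64_MAX when the product would not
 * fit in 64 bits. */
uint64_t checked_array_bytes(uint64_t count, uint64_t elem_size)
{
    if (elem_size != 0 && count > UINT64_MAX / elem_size)
        return UINT64_MAX;            /* movq $-1, %rdi */
    return count * elem_size;         /* mulq result kept via cmovno */
}
```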

it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0

Chris Lattner authored Dec 05, 2010

backend, they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.

llvm-svn: 120935

364bb0a0

generalize the previous check to handle -1 on either side of the · 116580a1

Chris Lattner authored Dec 05, 2010

select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

llvm-svn: 120932

116580a1
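
The `cmpq`/`sbbq`/`notq`/`orq` sequence in the commit above is a branch-free select between the low half of the product and -1, keyed on whether the high half is zero. The C sketch below walks through the same four steps. The function name `size_or_all_ones` is hypothetical; `lo` and `hi` stand for the values `mulq` leaves in `%rax` and `%rdx`.

```c
#include <stdint.h>

/* Returns lo when hi == 0, and all ones when hi != 0. */
uint64_t size_or_all_ones(uint64_t lo, uint64_t hi)
{
    uint64_t carry = (uint64_t)(hi < 1u);  /* cmpq $1, %rdx: CF set when hi == 0 */
    uint64_t mask = 0u - carry;            /* sbbq %rdi, %rdi: all ones when hi == 0 */
    mask = ~mask;                          /* notq %rdi: all ones when hi != 0 */
    return lo | mask;                      /* orq %rax, %rdi */
}
```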