- Dec 05, 2010
-
Cameron Zwarich authored
StrongPHIElimination. llvm-svn: 120961
-
Evan Cheng authored
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.
Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough, but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable:
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute both an fmul and an fmla.
C. Add additional isel checks for vmla, avoiding cases where a vmla is feeding into other fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.
Work in progress; only A+B are enabled. llvm-svn: 120960
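For illustration, a minimal LLVM IR sketch (function and value names are hypothetical) of the single-use (fadd (fmul)) pattern from point B that isel may now select as a vmla:

  define float @muladd(float %a, float %b, float %acc) {
  entry:
    ; %prod has exactly one use, so per B the fmul + fadd pair is a
    ; candidate for selection as a single vmla
    %prod = fmul float %a, %b
    %sum = fadd float %prod, %acc
    ret float %sum
  }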
-
Cameron Zwarich authored
llvm-svn: 120959
-
Frits van Bommel authored
Clarify some of the differences between indexing with getelementptr and indexing with insertvalue/extractvalue. llvm-svn: 120957
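As a hedged illustration of that difference (type, function, and value names are hypothetical; getelementptr is written in its modern explicit-type syntax): getelementptr computes an address and takes a leading index that steps over the pointer operand itself, while extractvalue indexes directly into an aggregate value and takes no such leading index:

  %struct.S = type { i32, [4 x i32] }

  define i32 @example(%struct.S* %p, %struct.S %v) {
  entry:
    ; address computation: the leading 'i64 0' steps over %p itself
    %addr = getelementptr %struct.S, %struct.S* %p, i64 0, i32 1, i64 2
    %a = load i32, i32* %addr
    ; value indexing: no leading zero index
    %b = extractvalue %struct.S %v, 1, 2
    %sum = add i32 %a, %b
    ret i32 %sum
  }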
-
Frits van Bommel authored
Also add asserts that the indices are valid in InsertValueInst::init(). ExtractValueInst already asserts when constructed with invalid indices. llvm-svn: 120956
-
Greg Clayton authored
token. llvm-svn: 120954
-
Cameron Zwarich authored
PHIElimination.h. llvm-svn: 120953
-
Cameron Zwarich authored
time, this method existed, but now PHIElimination uses the method of the same name on MachineBasicBlock. llvm-svn: 120952
-
Cameron Zwarich authored
function so that it can be shared with StrongPHIElimination. llvm-svn: 120951
-
Greg Clayton authored
llvm-svn: 120949
-
Greg Clayton authored
have event data. llvm-svn: 120948
-
Greg Clayton authored
a ProcessEventData so clients can get the process from these events. llvm-svn: 120947
-
Frits van Bommel authored
Should have no functional change other than the order of two transformations that are mutually exclusive and the exact formatting of debug output. Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls. llvm-svn: 120946
-
Frits van Bommel authored
llvm-svn: 120945
-
Frits van Bommel authored
(indirectbr (select cond, blockaddress(@fn, BlockA), blockaddress(@fn, BlockB))) into (br cond, BlockA, BlockB). llvm-svn: 120943
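A before/after sketch of this fold in LLVM IR (function and block names are hypothetical):

  define void @fn(i1 %cond) {
  entry:
    ; before the fold:
    ;   %dest = select i1 %cond, i8* blockaddress(@fn, %BlockA),
    ;                            i8* blockaddress(@fn, %BlockB)
    ;   indirectbr i8* %dest, [label %BlockA, label %BlockB]
    ; after the fold, the same control flow is a plain conditional branch:
    br i1 %cond, label %BlockA, label %BlockB
  BlockA:
    ret void
  BlockB:
    ret void
  }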
-
Chris Lattner authored
result. This allows us to compile:
  void *test12(long count) { return new int[count]; }
into:
test12:
        movl $4, %ecx
        movq %rdi, %rax
        mulq %rcx
        movq $-1, %rdi
        cmovnoq %rax, %rdi
        jmp __Znam                    ## TAILCALL
instead of:
test12:
        movl $4, %ecx
        movq %rdi, %rax
        mulq %rcx
        seto %cl
        testb %cl, %cl
        movq $-1, %rdi
        cmoveq %rax, %rdi
        jmp __Znam
Of course it would be even better if the regalloc inverted the cmov to 'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'. llvm-svn: 120936
-
Chris Lattner authored
backend that they were all implemented except umul. This one fell back to the default implementation that did a hi/lo multiply and compared the top. Fix this to check the overflow flag that the 'mul' instruction sets, so we can avoid an explicit test. Now we compile:
  void *func(long count) { return new int[count]; }
into:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        seto %cl                      ## encoding: [0x0f,0x90,0xc1]
        testb %cl, %cl                ## encoding: [0x84,0xc9]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL
instead of:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        testq %rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL
Other than the silly seto+test, this is using the o bit directly, so it's going in the right direction. llvm-svn: 120935
-
Chris Lattner authored
llvm-svn: 120933
-
Chris Lattner authored
select, inserting a not to compensate. Add a missing isZero check that I lost somehow. This improves codegen of:
  void *func(long count) { return new int[count]; }
from:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        testq %rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL ## encoding: [0xeb,A]
to:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        cmpq $1, %rdx                 ## encoding: [0x48,0x83,0xfa,0x01]
        sbbq %rdi, %rdi               ## encoding: [0x48,0x19,0xff]
        notq %rdi                     ## encoding: [0x48,0xf7,0xd7]
        orq %rax, %rdi                ## encoding: [0x48,0x09,0xc7]
        jmp __Znam                    ## TAILCALL ## encoding: [0xeb,A]
llvm-svn: 120932
-
John McCall authored
Fix a bug in the emission of complex compound assignment l-values. Introduce a method to emit an expression whose value isn't relevant. Make that method evaluate its operand as an l-value if it is one. Fixes our volatile compliance in C++. llvm-svn: 120931
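A hedged sketch (global and function names are hypothetical, written in modern LLVM IR syntax) of what a volatile compound assignment such as 'v += 1' on a 'volatile int v' should lower to; both accesses stay volatile even when the expression's value is discarded:

  @v = global i32 0

  define void @bump() {
  entry:
    ; the l-value is evaluated once; the read-modify-write keeps
    ; its volatile load and volatile store
    %old = load volatile i32, i32* @v
    %new = add i32 %old, 1
    store volatile i32 %new, i32* @v
    ret void
  }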
-
Chris Lattner authored
llvm-svn: 120930
-
Chris Lattner authored
1. Generalize (select (x == 0), -1, 0) -> (sign_bit (x - 1)) to:
   (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
cmov is often high latency and can't fold immediates or memory operands. For example, for (x == 0) ? -1 : 1, before we got:
<       testb %sil, %sil
<       movl $-1, %ecx
<       movl $1, %eax
<       cmovel %ecx, %eax
now we get:
>       cmpb $1, %sil
>       sbbl %eax, %eax
>       orl $1, %eax
llvm-svn: 120929
-
Chris Lattner authored
llvm-svn: 120928
-
Chris Lattner authored
llvm-svn: 120927
-
Chris Lattner authored
llvm-svn: 120926
-
Anders Carlsson authored
llvm-svn: 120925
-
Anders Carlsson authored
llvm-svn: 120924
-
Bill Wendling authored
llvm-svn: 120923
-
Anders Carlsson authored
llvm-svn: 120922
-
- Dec 04, 2010
-
Rafael Espindola authored
valid. Addresses will not change. llvm-svn: 120921
-
Rafael Espindola authored
having to evaluate the expression again when writing. llvm-svn: 120920
-
Fariborz Jahanian authored
AST. llvm-svn: 120919
-
Cameron Zwarich authored
llvm-svn: 120918
-
Benjamin Kramer authored
- Also adds a new POPCNT subtarget feature that is currently enabled if the target supports SSE4.2 (nehalem) or SSE4A (barcelona). llvm-svn: 120917
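A hedged LLVM IR sketch (function name is hypothetical) of code this affects: with the POPCNT feature enabled, the ctpop intrinsic should select to the single hardware instruction rather than a bit-twiddling expansion:

  declare i32 @llvm.ctpop.i32(i32)

  define i32 @count_bits(i32 %x) {
  entry:
    ; selects to 'popcnt' when the POPCNT subtarget feature is available
    %n = call i32 @llvm.ctpop.i32(i32 %x)
    ret i32 %n
  }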
-
Bill Wendling authored
may determine that they cannot be used uninitialized. But that might be a bit too much for the compiler to determine. llvm-svn: 120916
-
Howard Hinnant authored
llvm-svn: 120915
-
Howard Hinnant authored
llvm-svn: 120914
-
Michael J. Spencer authored
llvm-svn: 120913
-
Benjamin Kramer authored
llvm-svn: 120912
-
Benjamin Kramer authored
llvm-svn: 120911
-