- Dec 05, 2010
-
Cameron Zwarich authored
StrongPHIElimination. llvm-svn: 120961
-
Evan Cheng authored
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.
Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough, but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable:
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute both an fmul and an fmla.
C. Add additional isel checks for vmla, avoiding cases where a vmla is feeding into other fp instructions (except for the #3 exceptional case).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.
Work in progress; only A+B are enabled. llvm-svn: 120960
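For illustration, a minimal LLVM IR sketch (function and value names are hypothetical) of the single-use (fadd (fmul)) pattern from point B that isel may now select as a vmla:

  define float @muladd(float %a, float %b, float %acc) {
  entry:
    ; %prod has exactly one use, so per B the fmul + fadd pair is a
    ; candidate for selection as a single vmla
    %prod = fmul float %a, %b
    %sum = fadd float %prod, %acc
    ret float %sum
  }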
-
Cameron Zwarich authored
llvm-svn: 120959
-
Frits van Bommel authored
Clarify some of the differences between indexing with getelementptr and indexing with insertvalue/extractvalue. llvm-svn: 120957
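As a hedged illustration of that difference (type, function, and value names are hypothetical; getelementptr is written in its modern explicit-type syntax): getelementptr computes an address and takes a leading index that steps over the pointer operand itself, while extractvalue indexes directly into an aggregate value and takes no such leading index:

  %struct.S = type { i32, [4 x i32] }

  define i32 @example(%struct.S* %p, %struct.S %v) {
  entry:
    ; address computation: the leading 'i64 0' steps over %p itself
    %addr = getelementptr %struct.S, %struct.S* %p, i64 0, i32 1, i64 2
    %a = load i32, i32* %addr
    ; value indexing: no leading zero index
    %b = extractvalue %struct.S %v, 1, 2
    %sum = add i32 %a, %b
    ret i32 %sum
  }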
-
Frits van Bommel authored
Also add asserts that the indices are valid in InsertValueInst::init(). ExtractValueInst already asserts when constructed with invalid indices. llvm-svn: 120956
-
Greg Clayton authored
token. llvm-svn: 120954
-
Cameron Zwarich authored
PHIElimination.h. llvm-svn: 120953
-
Cameron Zwarich authored
time, this method existed, but now PHIElimination uses the method of the same name on MachineBasicBlock. llvm-svn: 120952
-
Cameron Zwarich authored
function so that it can be shared with StrongPHIElimination. llvm-svn: 120951
-
Greg Clayton authored
llvm-svn: 120949
-
Greg Clayton authored
have event data. llvm-svn: 120948
-
Greg Clayton authored
a ProcessEventData so clients can get the process from these events. llvm-svn: 120947
-
Frits van Bommel authored
Should have no functional change other than the order of two transformations that are mutually exclusive and the exact formatting of debug output. Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls. llvm-svn: 120946
-
Frits van Bommel authored
llvm-svn: 120945
-
Frits van Bommel authored
(indirectbr (select cond, blockaddress(@fn, BlockA), blockaddress(@fn, BlockB))) into (br cond, BlockA, BlockB). llvm-svn: 120943
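A before/after sketch of this fold in LLVM IR (function and block names are hypothetical):

  define void @fn(i1 %cond) {
  entry:
    ; before the fold:
    ;   %dest = select i1 %cond, i8* blockaddress(@fn, %BlockA),
    ;                            i8* blockaddress(@fn, %BlockB)
    ;   indirectbr i8* %dest, [label %BlockA, label %BlockB]
    ; after the fold, the same control flow is a plain conditional branch:
    br i1 %cond, label %BlockA, label %BlockB
  BlockA:
    ret void
  BlockB:
    ret void
  }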
-
Chris Lattner authored
result. This allows us to compile:
  void *test12(long count) { return new int[count]; }
into:
test12:
        movl $4, %ecx
        movq %rdi, %rax
        mulq %rcx
        movq $-1, %rdi
        cmovnoq %rax, %rdi
        jmp __Znam                    ## TAILCALL
instead of:
test12:
        movl $4, %ecx
        movq %rdi, %rax
        mulq %rcx
        seto %cl
        testb %cl, %cl
        movq $-1, %rdi
        cmoveq %rax, %rdi
        jmp __Znam
Of course it would be even better if the regalloc inverted the cmov to 'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'. llvm-svn: 120936
-
Chris Lattner authored
backend that they were all implemented except umul. This one fell back to the default implementation that did a hi/lo multiply and compared the top. Fix this to check the overflow flag that the 'mul' instruction sets, so we can avoid an explicit test. Now we compile:
  void *func(long count) { return new int[count]; }
into:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        seto %cl                      ## encoding: [0x0f,0x90,0xc1]
        testb %cl, %cl                ## encoding: [0x84,0xc9]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL
instead of:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        testq %rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL
Other than the silly seto+test, this is using the o bit directly, so it's going in the right direction. llvm-svn: 120935
-
Chris Lattner authored
llvm-svn: 120933
-
Chris Lattner authored
select, inserting a not to compensate. Add a missing isZero check that I lost somehow. This improves codegen of:
  void *func(long count) { return new int[count]; }
from:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        testq %rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
        movq $-1, %rdi                ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
        cmoveq %rax, %rdi             ## encoding: [0x48,0x0f,0x44,0xf8]
        jmp __Znam                    ## TAILCALL ## encoding: [0xeb,A]
to:
__Z4funcl:                            ## @_Z4funcl
        movl $4, %ecx                 ## encoding: [0xb9,0x04,0x00,0x00,0x00]
        movq %rdi, %rax               ## encoding: [0x48,0x89,0xf8]
        mulq %rcx                     ## encoding: [0x48,0xf7,0xe1]
        cmpq $1, %rdx                 ## encoding: [0x48,0x83,0xfa,0x01]
        sbbq %rdi, %rdi               ## encoding: [0x48,0x19,0xff]
        notq %rdi                     ## encoding: [0x48,0xf7,0xd7]
        orq %rax, %rdi                ## encoding: [0x48,0x09,0xc7]
        jmp __Znam                    ## TAILCALL ## encoding: [0xeb,A]
llvm-svn: 120932
-
John McCall authored
Fix a bug in the emission of complex compound assignment l-values. Introduce a method to emit an expression whose value isn't relevant. Make that method evaluate its operand as an l-value if it is one. Fixes our volatile compliance in C++. llvm-svn: 120931
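A hedged sketch (global and function names are hypothetical, written in modern LLVM IR syntax) of what a volatile compound assignment such as 'v += 1' on a 'volatile int v' should lower to; both accesses stay volatile even when the expression's value is discarded:

  @v = global i32 0

  define void @bump() {
  entry:
    ; the l-value is evaluated once; the read-modify-write keeps
    ; its volatile load and volatile store
    %old = load volatile i32, i32* @v
    %new = add i32 %old, 1
    store volatile i32 %new, i32* @v
    ret void
  }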
-
Chris Lattner authored
llvm-svn: 120930
-
Chris Lattner authored
1. Generalize (select (x == 0), -1, 0) -> (sign_bit (x - 1)) to:
   (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
cmov is often high latency and can't fold immediates or memory operands. For example, for (x == 0) ? -1 : 1, before we got:
<       testb %sil, %sil
<       movl $-1, %ecx
<       movl $1, %eax
<       cmovel %ecx, %eax
now we get:
>       cmpb $1, %sil
>       sbbl %eax, %eax
>       orl $1, %eax
llvm-svn: 120929
-
Chris Lattner authored
llvm-svn: 120928
-
Chris Lattner authored
llvm-svn: 120927
-
Chris Lattner authored
llvm-svn: 120926
-
Anders Carlsson authored
llvm-svn: 120925
-
Anders Carlsson authored
llvm-svn: 120924
-
Bill Wendling authored
llvm-svn: 120923
-
Anders Carlsson authored
llvm-svn: 120922
-
- Dec 04, 2010
-
Rafael Espindola authored
valid. Addresses will not change. llvm-svn: 120921
-
Rafael Espindola authored
having to evaluate the expression again when writing. llvm-svn: 120920
-
Fariborz Jahanian authored
AST. llvm-svn: 120919
-
Cameron Zwarich authored
llvm-svn: 120918
-
Benjamin Kramer authored
- Also adds a new POPCNT subtarget feature that is currently enabled if the target supports SSE4.2 (nehalem) or SSE4A (barcelona). llvm-svn: 120917
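A hedged LLVM IR sketch (function name is hypothetical) of code this affects: with the POPCNT feature enabled, the ctpop intrinsic should select to the single hardware instruction rather than a bit-twiddling expansion:

  declare i32 @llvm.ctpop.i32(i32)

  define i32 @count_bits(i32 %x) {
  entry:
    ; selects to 'popcnt' when the POPCNT subtarget feature is available
    %n = call i32 @llvm.ctpop.i32(i32 %x)
    ret i32 %n
  }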
-
Bill Wendling authored
may determine that they cannot be used uninitialized. But that might be a bit too much for the compiler to determine. llvm-svn: 120916
-
Howard Hinnant authored
llvm-svn: 120915
-
Howard Hinnant authored
llvm-svn: 120914
-
Michael J. Spencer authored
llvm-svn: 120913
-
Benjamin Kramer authored
llvm-svn: 120912
-
Benjamin Kramer authored
llvm-svn: 120911
-