- Apr 20, 2011
  - Rafael Espindola authored (llvm-svn: 129844)
  - Daniel Dunbar authored: … triple component. (llvm-svn: 129838)
  - Johnny Chen authored (llvm-svn: 129837)
- Apr 19, 2011
  - Daniel Dunbar authored: … predicates. (llvm-svn: 129816)
  - Daniel Dunbar authored (llvm-svn: 129813)
  - Daniel Dunbar authored (llvm-svn: 129812)
  - Daniel Dunbar authored (llvm-svn: 129811)
  - Daniel Dunbar authored (llvm-svn: 129810)
  - Daniel Dunbar authored (llvm-svn: 129809)
  - Daniel Dunbar authored (llvm-svn: 129803)
  - Eric Christopher authored (llvm-svn: 129781)
  - Bob Wilson authored (rdar://8659675):
    Making use of VFP / NEON floating-point multiply-accumulate / subtract is
    difficult on current ARM implementations for a few reasons:
    1. Even though a single vmla has a latency that is one cycle shorter than a
       pair of vmul + vadd, a RAW hazard during the first few cycles (4? on
       Cortex-A8) can cause an additional pipeline stall, so it is frequently
       better to simply codegen vmul + vadd.
    2. A vmla followed by a vmul, vmadd, or vsub causes the second fp
       instruction to stall for 4 cycles; we need to schedule them apart.
    3. A vmla followed by a vmla is a special case. Obviously, issuing
       back-to-back RAW vmla + vmla is very bad, but this isn't ideal either:
           vmul
           vadd
           vmla
       Instead, we want to expand the second vmla:
           vmla
           vmul
           vadd
       Even with the 4-cycle vmul stall, the second sequence is still 2 cycles
       faster.
    Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works
    well enough, but it isn't the optimal solution. This patch attempts to make
    it possible to use vmla / vmls in the cases where it is profitable:
    A. Add the missing isel predicates that cause vmla to be codegen'ed.
    B. Make sure the fmul in (fadd (fmul)) has a single use; we don't want to
       compute both an fmul and an fmla.
    C. Add additional isel checks for vmla, avoiding cases where vmla feeds
       into fp instructions (except for the exceptional case in #3).
    D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
    E. Add a special pre-regalloc case to expand vmla / vmls when it is likely
       they will trigger one of the special hazards.
    Enable these fp vmlx codegen changes for Cortex-A9. (llvm-svn: 129775)
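    A minimal C++ sketch of the single-use check described in item B, using
    SelectionDAG-style APIs; an illustration under those assumptions, not the
    code from the patch:

        #include "llvm/CodeGen/SelectionDAGNodes.h"
        using namespace llvm;

        // Only consider turning (fadd x, (fmul a, b)) into a vmla when the
        // fmul has no other users; otherwise we would end up emitting both a
        // vmul (for the other users) and a vmla.
        static bool isMulCandidateForVMLA(SDValue FMul) {
          return FMul.getOpcode() == ISD::FMUL && FMul.getNode()->hasOneUse();
        }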
  - Bob Wilson authored: … (and add a false dependency) when it isn't dependent
    on the last CPSR-defining instruction. (rdar://8928208, llvm-svn: 129773)
  - Bob Wilson authored: Add an avoidWriteAfterWrite() target hook to identify
    register classes that suffer from write-after-write hazards. For those
    register classes, try to avoid writing the same register in two consecutive
    instructions. This is currently disabled by default; we should not spill to
    avoid hazards! The command-line flag -avoid-waw-hazard can be used to
    enable WAW avoidance. (llvm-svn: 129772)
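    A hedged sketch of the kind of check a scheduler could make with such a
    hook; the helper and its parameters are illustrative, not the interface
    from the patch:

        #include <set>

        // Returns true if an instruction writing DestReg should not be issued
        // immediately after an instruction that also wrote DestReg, given
        // that the target flagged DestReg's class as WAW-hazardous.
        static bool shouldDelayForWAW(bool ClassHasWAWHazard, unsigned DestReg,
                                      const std::set<unsigned> &PrevWrites) {
          return ClassHasWAWHazard && PrevWrites.count(DestReg) != 0;
        }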
  - Bob Wilson authored: … pipelines, at least on Cortex-A9. (llvm-svn: 129771)
  - Bob Wilson authored (llvm-svn: 129770)
  - Eli Friedman authored (llvm-svn: 129765)
  - Chris Lattner authored: … en masse for C++ PODs. On my C++ test file, this
    cuts the fast isel rejects by 10x and shrinks the generated .s file by 5%.
    (llvm-svn: 129755)
  - Chris Lattner authored (llvm-svn: 129753)
  - Chris Lattner authored: … when they are a truncate from something else.
    This eliminates fully half of all the fastisel rejections on a test C++
    file I'm working with, which should make a substantial improvement for -O0
    compiles of C++ code. This fixes rdar://9297003 - fast isel bails out on
    all functions taking bools. (llvm-svn: 129752)
  - Chris Lattner authored: … Before, we would bail out on i1 arguments
    altogether; now we just bail on non-constant ones. Also, we used to emit
    extraneous code. For example, test12 was:
        movb $0, %al
        movzbl %al, %edi
        callq _test12
    and test13 was:
        movb $0, %al
        xorl %edi, %edi
        movb %al, 7(%rsp)
        callq _test13f
    Now we get:
        movl $0, %edi
        callq _test12
    and:
        movl $0, %edi
        callq _test13f
    (llvm-svn: 129751)
  - Chris Lattner authored: …
            testb $1, %al
            je LBB0_2
        ## BB#1:                ## %if.then
            movb $0, %al
    instead of:
            testb $1, %al
            jne LBB0_1
            jmp LBB0_2
        LBB0_1:                 ## %if.then
            movb $0, %al
    How 'bout that. (llvm-svn: 129749)
  - Chris Lattner authored (rdar://9297006): … a common cause of fast isel
    rejects on C++ code. (llvm-svn: 129748)
  - Evan Cheng authored: … is, it assumes addresses are 64-bit aligned (which
    should be the more common case). If the address is found not to be aligned,
    getOperandLatency() adjusts the operand latency computation by one to
    compensate for it. (rdar://9294833, llvm-svn: 129742)
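    A minimal sketch of the adjustment described above; the helper name and
    parameters are illustrative only:

        // Assume 64-bit alignment by default; charge one extra cycle of
        // operand latency when the address turns out not to be aligned.
        static unsigned adjustLoadLatency(unsigned BaseLatency,
                                          bool Is64BitAligned) {
          return Is64BitAligned ? BaseLatency : BaseLatency + 1;
        }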
  - Evan Cheng authored (llvm-svn: 129738)
- Apr 18, 2011
  - Jim Grosbach authored (llvm-svn: 129723)
  - Eric Christopher authored: … true on success and false on failure. Update
    callers. (llvm-svn: 129722)
  - Sean Callanan authored: … superclass variable is instantiated properly.
    (llvm-svn: 129713)
  - Chris Lattner authored: … the generated FastISel. X86 doesn't need to
    generate code to match ADD16ri8, since ADD16ri will do just fine. This is a
    small codesize win in the generated instruction selector. (llvm-svn: 129692)
  - Chris Lattner authored: … simplifying them and exposing more information to
    tblgen. It would be nice if other target authors adopted this as well,
    particularly ARM, since it has fastisel. (llvm-svn: 129676)
  - Chris Lattner authored: … kind of predicate: one that is specific to imm
    nodes. The predicate function specified here just checks an int64_t
    directly instead of messing around with SDNodes. The virtue of this is that
    fastisel and other things can reason about these predicates.
    (llvm-svn: 129675)
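    A hedged C++ illustration of the difference described here; the function
    name and its constraint are hypothetical, not tblgen output:

        #include <cstdint>

        // An imm-specific predicate is a plain function of the immediate
        // value, so FastISel can evaluate it on an integer it already has in
        // hand, without materializing an SDNode first.
        static bool isSExt8Imm(int64_t Imm) {
          return Imm >= -128 && Imm <= 127;
        }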
- Apr 17, 2011
  - Chris Lattner authored: … structure and fix some FIXMEs. We now have a
    TreePredicateFn class that handles all of the decoding of these things.
    This is an internal cleanup that has no impact on the code generated by
    tblgen. (llvm-svn: 129670)
  - Chris Lattner authored: …
    2. Implement rdar://9289501 - fast isel should fold trivial multiplies to
       shifts.
    3. Teach tblgen to handle shift immediates that are different sizes than
       the shifted operands, eliminating some code from the X86 fast isel
       backend.
    4. Have FastISel::SelectBinaryOp use the (poorly named) FastEmit_ri_
       function instead of FastEmit_ri to simplify code. (llvm-svn: 129666)
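    A minimal sketch of the fold in item 2, assuming the power-of-two helpers
    from llvm/Support/MathExtras.h; not the actual FastISel code:

        #include "llvm/Support/MathExtras.h"
        #include <cstdint>

        // If multiplying by C can be replaced by a left shift, return the
        // shift amount; otherwise return -1.
        static int getShiftForMul(uint64_t C) {
          return llvm::isPowerOf2_64(C) ? (int)llvm::Log2_64(C) : -1;
        }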
  - Chris Lattner authored: … when we have a global variable base and an index.
    Instead, just give up on folding the global variable. Before, we'd
    generate:
        _test:                  ## @test
        ## BB#0:
            movq _rtx_length@GOTPCREL(%rip), %rax
            leaq (%rax), %rax
            addq %rdi, %rax
            movzbl (%rax), %eax
            ret
    Now we generate:
        _test:                  ## @test
        ## BB#0:
            movq _rtx_length@GOTPCREL(%rip), %rax
            movzbl (%rax,%rdi), %eax
            ret
    The difference is even more significant when there is a scale involved.
    This fixes rdar://9289558 - total fail with addr mode formation at
    -O0/x86-64. (llvm-svn: 129664)
  - Chris Lattner authored: … less trivial things) into a dummy lea. Before, we
    generated:
        _test:                  ## @test
            movq _G@GOTPCREL(%rip), %rax
            leaq (%rax), %rax
            ret
    Now we produce:
        _test:                  ## @test
            movq _G@GOTPCREL(%rip), %rax
            ret
    This is part of rdar://9289558. (llvm-svn: 129662)
  - Chris Lattner authored (llvm-svn: 129661)
  - Eli Friedman authored (llvm-svn: 129654)
- Apr 16, 2011
  - Francois Pichet authored (llvm-svn: 129640)
  - Rafael Espindola authored: … error in foo.o; no .eh_frame_hdr table will be
    created. (llvm-svn: 129635)