- Apr 21, 2011
-
Evan Cheng authored
llvm-svn: 129884
-
Jakob Stoklund Olesen authored
llvm-svn: 129883
-
Jakob Stoklund Olesen authored
TII::isTriviallyReMaterializable() shouldn't depend on any properties of the register being defined by the instruction. Rematerialization is going to create a new virtual register anyway. llvm-svn: 129882
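To make the invariant concrete, here is a minimal sketch using invented stand-in types (this is not LLVM's actual TargetInstrInfo interface): the predicate consults only the instruction itself, never the register it happens to define.

```cpp
// Invented stand-in for a machine instruction; not LLVM's MachineInstr.
struct Instr {
  bool HasSideEffects;  // stores, calls, volatile accesses, ...
  bool ReadsMemory;     // loads that could observe changing memory
  unsigned NumRegUses;  // register operands the instruction reads
};

// Rematerialization clones the instruction and defines a brand-new
// virtual register, so nothing about the old destination register
// (its class, its uses, its live range) belongs in this predicate.
bool isTriviallyRematerializable(const Instr &MI) {
  return !MI.HasSideEffects && !MI.ReadsMemory && MI.NumRegUses == 0;
}
```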
-
- Apr 20, 2011
-
rdar://problem/9184212
Cameron Zwarich authored
generated by llvm-gcc, since llvm-gcc uses 2 i64s for passing a 4 x float vector on ARM rather than an i64 array like Clang. llvm-svn: 129878
-
Cameron Zwarich authored
delete it. llvm-svn: 129877
-
Cameron Zwarich authored
more cases. llvm-svn: 129876
-
Jakob Stoklund Olesen authored
On the x86-64 and thumb2 targets, some registers are more expensive to encode than others in the same register class. Add a CostPerUse field to the TableGen register description, and make it available from TRI->getCostPerUse. This represents the cost of a REX prefix or a 32-bit instruction encoding required by choosing a high register. Teach the greedy register allocator to prefer cheap registers for busy live ranges (as indicated by spill weight). llvm-svn: 129864
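As a rough illustration of how a per-register use cost can steer allocation (invented types and names, not the actual TableGen or TargetRegisterInfo interfaces):

```cpp
#include <cstdint>
#include <vector>

// Invented stand-in for an allocatable physical register.
struct PhysReg {
  unsigned Id;
  uint8_t CostPerUse;  // e.g. 1 if using it forces a REX prefix or a
                       // 32-bit Thumb2 encoding, 0 otherwise
};

// For a busy live range, prefer the cheapest-to-encode register among
// the candidates. 'Order' stands in for the target's allocation order
// and is assumed non-empty.
unsigned pickCheapRegister(const std::vector<PhysReg> &Order) {
  unsigned Best = Order.front().Id;
  uint8_t BestCost = Order.front().CostPerUse;
  for (const PhysReg &R : Order) {
    if (R.CostPerUse < BestCost) {
      Best = R.Id;
      BestCost = R.CostPerUse;
    }
  }
  return Best;
}
```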
-
Daniel Dunbar authored
llvm-svn: 129852
-
Justin Holewinski authored
used by Clang. To help Clang integration, the PTX target has been split into two targets, ptx32 and ptx64, depending on the desired pointer size.
- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64
llvm-svn: 129851
-
Rafael Espindola authored
llvm-svn: 129850
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129848
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129847
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129846
-
Nick Lewycky authored
llvm is built with unsigned chars, where an immediate such as 0xff would be zero-extended to 64 bits, turning "cmp $0xff,%eax" into "cmp $0xffffffffffffffff,%eax". llvm-svn: 129845
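For context, a small standalone C++ program (not part of the change) showing how the host's plain-char signedness decides whether a 0xff byte widens to 0xff or to 0xffffffffffffffff:

```cpp
#include <cstdio>

int main() {
  char C = (char)0xff;
  // Implementation-defined: -1 where plain char is signed, 255 where unsigned.
  long long AsPlain = C;
  long long AsSigned = (signed char)C;      // always sign-extends: ffffffffffffffff
  long long AsUnsigned = (unsigned char)C;  // always zero-extends: ff
  std::printf("%llx %llx %llx\n", (unsigned long long)AsPlain,
              (unsigned long long)AsSigned, (unsigned long long)AsUnsigned);
  return 0;
}
```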
-
Rafael Espindola authored
llvm-svn: 129844
-
Eric Christopher authored
manually and pass all (now) 4 arguments to the mul libcall. Add a new ExpandLibCall for just this (copied gratuitously from type legalization). Fixes rdar://9292577 llvm-svn: 129842
-
Sean Callanan authored
MCInst operands for ARM. This allows it to be more tolerant of malformed MCInsts or incorrect instruction metadata. llvm-svn: 129840
-
Daniel Dunbar authored
triple component. llvm-svn: 129838
-
Johnny Chen authored
llvm-svn: 129837
-
Daniel Dunbar authored
instead. llvm-svn: 129836
-
Daniel Dunbar authored
Triple::OSX once Clang has moved. llvm-svn: 129833
-
- Apr 19, 2011
-
-
Daniel Dunbar authored
predicates. llvm-svn: 129816
-
Daniel Dunbar authored
llvm-svn: 129815
-
Daniel Dunbar authored
enumeration values. llvm-svn: 129814
-
Daniel Dunbar authored
llvm-svn: 129813
-
Daniel Dunbar authored
llvm-svn: 129812
-
Daniel Dunbar authored
llvm-svn: 129811
-
Daniel Dunbar authored
llvm-svn: 129810
-
Daniel Dunbar authored
llvm-svn: 129809
-
Daniel Dunbar authored
llvm-svn: 129803
-
Daniel Dunbar authored
- There is a minor semantic change here (evidenced by the test change) for Darwin triples that have no version component. I debated changing the default behavior of isOSVersionLT, but decided it made more sense for triples to be explicit.
llvm-svn: 129802
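A brief usage sketch (the helper and its version cutoff are mine, and the "no version component compares as older" reading follows the note above rather than documented behavior):

```cpp
#include "llvm/ADT/Triple.h"
using namespace llvm;

// True for Darwin triples whose version is below darwin10. A triple
// with no version component presumably also compares as older, which
// is why the commit asks clients to spell versions out explicitly.
static bool deploysBeforeDarwin10(const Triple &T) {
  return T.isOSDarwin() && T.isOSVersionLT(10);
}
```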
-
Daniel Dunbar authored
llvm-svn: 129799
-
Daniel Dunbar authored
llvm-svn: 129798
-
Eric Christopher authored
llvm-svn: 129781
-
rdar://8659675
Bob Wilson authored
Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons:
1. Even though a single vmla has a latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8) can cause an additional pipeline stall, so it's frequently better to simply codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles; we need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad, but this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4-cycle vmul stall, the second sequence is still 2 cycles faster.
Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well enough, but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable:
A. Add the missing isel predicates that cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use; we don't want to compute both an fmul and an fmla.
C. Add additional isel checks for vmla; avoid cases where vmla feeds into fp instructions (except for the exceptional case #3).
D. Add an ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely that the vmla / vmls will trigger one of the special hazards.
Enable these fp vmlx codegen changes for Cortex-A9.
llvm-svn: 129775
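A toy model of hazard D (invented types, not LLVM's ScheduleHazardRecognizer API): track how recently a vmla / vmls was issued and flag fp multiply / accumulate instructions that would land inside its 4-cycle stall window.

```cpp
#include <string>

// Toy recognizer for the "vmla followed by vmul/vmadd/vsub stalls
// 4 cycles" hazard described above. Real code would inspect opcodes
// and scheduling itineraries, not strings.
struct ToyVMLxHazardRecognizer {
  int CyclesSinceVMLx = 1000;  // large value: no recent vmla/vmls

  static bool isFpMulOrMac(const std::string &Op) {
    return Op == "vmul" || Op == "vmadd" || Op == "vsub" ||
           Op == "vmla" || Op == "vmls";
  }

  // Would issuing Op this cycle hit the post-vmla stall window?
  bool hasHazard(const std::string &Op) const {
    return isFpMulOrMac(Op) && CyclesSinceVMLx < 4;
  }

  void emitInstruction(const std::string &Op) {
    if (Op == "vmla" || Op == "vmls")
      CyclesSinceVMLx = 0;
    else
      ++CyclesSinceVMLx;
  }

  void advanceCycle() { ++CyclesSinceVMLx; }
};
```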