- Apr 22, 2011
-
-
Benjamin Kramer authored
On x86 this allows folding a load into the cmp, greatly reducing register pressure.

  movzbl (%rdi), %eax
  cmpl   $47, %eax

->

  cmpb   $47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
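For illustration only (a hypothetical example, not part of the commit), the pattern typically comes from source that loads a byte just to compare it against an immediate, which previously produced the movzbl + cmpl pair shown above:

  // 47 is ASCII '/'; with the load folded, this can become the single
  // cmpb $47, (%rdi) from the example above.
  bool startsWithSlash(const char *p) {
    return *p == '/';
  }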
-
Devang Patel authored
llvm-svn: 129995
-
Benjamin Kramer authored
X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) << C1. (Part of PR5039)

This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is

  uint64_t foo(uint64_t x) { return (x&1) << 42; }

which used to compile into bloated code:

  shlq    $42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
  movabsq $4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
  andq    %rdi, %rax              ## encoding: [0x48,0x21,0xf8]
  ret                             ## encoding: [0xc3]

with this patch we can fold the immediate into the and:

  andq    $1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
  movq    %rdi, %rax              ## encoding: [0x48,0x89,0xf8]
  shlq    $42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
  ret                             ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq', but I currently see no way of doing that without making this code even more complicated. See the TODOs in the code.

llvm-svn: 129990
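A quick sanity check of the identity behind the rewrite (an illustrative snippet, not part of the patch): (x << C1) & C2 equals (x & (C2 >> C1)) << C1 for unsigned values, because x << C1 already has its low C1 bits clear.

  #include <cassert>
  #include <cstdint>

  int main() {
    const unsigned C1 = 42;
    const uint64_t C2 = 1ULL << 42;   // the mask from the foo() example above
    for (uint64_t x = 0; x < 1024; ++x)
      assert(((x << C1) & C2) == ((x & (C2 >> C1)) << C1));
    return 0;
  }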
-
Evan Cheng authored
  add <rd>, sp, #<imm8>
  ldr <rd>, [sp, #<imm8>]

When the offset from sp is a multiple of 4 and in the range 0-1020. This saves code size by utilizing 16-bit instructions.

rdar://9321541
llvm-svn: 129971
-
Rafael Espindola authored
X8664_ELFTargetObjectFile::getFDEEncoding to match reality. llvm-svn: 129959
-
Rafael Espindola authored
llvm-svn: 129955
-
Devang Patel authored
llvm-svn: 129952
-
Devang Patel authored
llvm-svn: 129947
-
- Apr 21, 2011
-
-
Devang Patel authored
llvm-svn: 129922
-
Justin Holewinski authored
llvm-svn: 129913
-
Che-Liang Chiou authored
This patch depends on the prior fix r129908, which switches to std::find, rather than std::binary_search, on an unordered array.

Patch by Dan Bailey

llvm-svn: 129909
-
Che-Liang Chiou authored
llvm-svn: 129908
-
Evan Cheng authored
llvm-svn: 129884
-
- Apr 20, 2011
-
-
Jakob Stoklund Olesen authored
On the x86-64 and thumb2 targets, some registers are more expensive to encode than others in the same register class.

Add a CostPerUse field to the TableGen register description, and make it available from TRI->getCostPerUse. This represents the cost of a REX prefix or a 32-bit instruction encoding required by choosing a high register.

Teach the greedy register allocator to prefer cheap registers for busy live ranges (as indicated by spill weight).

llvm-svn: 129864
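A toy sketch of the allocation-order idea (names and structure are assumptions for illustration; this is not the LLVM allocator code):

  #include <algorithm>
  #include <vector>

  struct Candidate {
    unsigned Reg;
    unsigned CostPerUse;   // e.g. nonzero for registers that need a REX prefix
  };

  // Prefer cheap-to-encode registers, but only for busy live ranges; for cold
  // ranges the encoding cost is not worth constraining the allocator.
  void orderByCost(std::vector<Candidate> &Cands, float SpillWeight,
                   float BusyThreshold) {
    if (SpillWeight <= BusyThreshold)
      return;
    std::stable_sort(Cands.begin(), Cands.end(),
                     [](const Candidate &A, const Candidate &B) {
                       return A.CostPerUse < B.CostPerUse;
                     });
  }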
-
-
-
Justin Holewinski authored
used by Clang. To help Clang integration, the PTX target has been split into two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129848
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129847
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129846
-
Nick Lewycky authored
llvm is built with unsigned chars where an immediate such as 0xff would be zero extended to 64-bits, turning "cmp $0xff,%eax" into "cmp $0xffffffffffffffff,%eax". llvm-svn: 129845
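For reference, a small standalone illustration of the underlying char-signedness behavior (not the patched code): the same 0xff byte widens differently depending on whether plain char is signed or unsigned on the build host.

  #include <cstdint>
  #include <cstdio>

  int main() {
    signed char   sc = static_cast<signed char>(0xff);   // -1 (two's complement)
    unsigned char uc = 0xff;                              // 255 on all hosts
    int64_t s = sc;   // sign-extends: 0xffffffffffffffff
    int64_t u = uc;   // zero-extends: 0x00000000000000ff
    std::printf("%llx %llx\n",
                (unsigned long long)s, (unsigned long long)u);
    return 0;
  }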
-
Rafael Espindola authored
llvm-svn: 129844
-
Daniel Dunbar authored
triple component. llvm-svn: 129838
-
Johnny Chen authored
llvm-svn: 129837
- Apr 19, 2011
-
-
Daniel Dunbar authored
predicates. llvm-svn: 129816
-
Daniel Dunbar authored
llvm-svn: 129813
-
Daniel Dunbar authored
llvm-svn: 129812
-
Daniel Dunbar authored
llvm-svn: 129811
-
Daniel Dunbar authored
llvm-svn: 129810
-
Daniel Dunbar authored
llvm-svn: 129809
-
Daniel Dunbar authored
llvm-svn: 129803
-
Eric Christopher authored
llvm-svn: 129781
-
Bob Wilson authored
Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons.

1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) can cause additional pipeline stall. So it's frequently better to just codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:

     vmul
     vadd
     vmla

   Instead, we want to expand the second vmla:

     vmla
     vmul
     vadd

   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.

Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

rdar://8659675
llvm-svn: 129775
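A minimal C++ illustration of point B above (the functions are assumptions for clarity, not the isel code):

  // Single-use fmul feeding an fadd: a natural vmla candidate.
  float mac(float a, float b, float c) {
    return a + b * c;
  }

  // The fmul result is reused, so it should stay a separate vmul + vadd;
  // selecting a vmla here would still require computing b * c on its own.
  float reuse(float a, float b, float c, float &out) {
    float m = b * c;
    out = m;
    return a + m;
  }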
-
-
Bob Wilson authored
(and add false dependency) when it isn't dependent on last CPSR defining instruction. rdar://8928208 llvm-svn: 129773
-
Bob Wilson authored
Add an avoidWriteAfterWrite() target hook to identify register classes that suffer from write-after-write hazards. For those register classes, try to avoid writing the same register in two consecutive instructions.

This is currently disabled by default. We should not spill to avoid hazards! The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
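A toy sketch of the avoidance idea (hypothetical names, not the actual hook or scheduler code):

  #include <vector>

  // Pick a destination register, skipping the one written by the previous
  // instruction when the register class is marked as WAW-hazard prone.
  // Note the fallback: we never spill just to avoid a hazard.
  // Assumes Candidates is non-empty.
  unsigned pickDest(const std::vector<unsigned> &Candidates,
                    unsigned PrevWritten, bool ClassHasWAWHazard) {
    for (unsigned R : Candidates)
      if (!ClassHasWAWHazard || R != PrevWritten)
        return R;
    return Candidates.front();
  }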
-
Bob Wilson authored
pipelines, at least on Cortex-A9. llvm-svn: 129771
-
Bob Wilson authored
llvm-svn: 129770
-
Eli Friedman authored
llvm-svn: 129765
-