- Apr 12, 2008
Arnold Schwaighofer authored
Optimized x86-64 (and x86) calls so that they work (at least for my test cases). Should fix the following problems:
Problem 1: When I introduced the optimized handling of arguments for tail-called functions (using a sequence of copyto/copyfrom virtual registers instead of always lowering to the top of the stack), I did not handle byval arguments correctly; they did not work at all.
Problem 2: On x86-64, after the arguments of the tail-called function are moved to their registers (which include ESI/RSI etc.), tail call optimization performs byval lowering, which causes the xSI, xDI, and xCX registers to be overwritten. This patch handles that by moving the arguments to virtual registers first; after the byval lowering, the arguments are moved from those virtual registers back to RSI/RDI/RCX. llvm-svn: 49584
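A minimal C-level sketch (not from the commit; names are hypothetical) of the Problem 2 scenario: a tail call that passes a byval aggregate alongside scalar arguments, where the memcpy-like byval lowering itself uses xSI/xDI/xCX and would clobber arguments already placed there.

    // Hypothetical illustration only.
    struct Payload { char data[64]; };      // passed byval at the IR level

    int callee(int a, int b, Payload p);    // a and b travel in argument registers

    int caller(int x, int y, Payload p) {
        // Under tail call optimization, copying 'p' into place uses RSI/RDI/RCX,
        // so x and y are staged in virtual registers and restored afterwards.
        return callee(y, x, p);
    }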
Dan Gohman authored
on any current target and aren't optimized in DAGCombiner. Instead of using intermediate nodes, expand the operations immediately, choosing between simple loads/stores, target-specific code, and library calls. Previously, the code to emit optimized code for these operations was only used at initial SelectionDAG construction time; now it is used at all times. This fixes some cases where rep;movs was being used for small copies where simple loads/stores would be better. This also cleans up the code that checks for alignments less than 4, letting the targets make that decision instead of doing it in target-independent code. This allows x86 to use rep;movs in low-alignment cases. Also, this fixes a bug that resulted in the use of rep;stos for memsets of 0 with non-constant memory size when the alignment was at least 4. It's better to use the library in this case, which can be significantly faster when the size is large. This also preserves more SourceValue information when memory intrinsics are lowered into simple loads/stores. llvm-svn: 49572
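Two hypothetical cases (not from the commit) of the kind described: a small constant-size copy that is better expanded to plain loads/stores, and a variable-size memset of 0 where the library call can beat rep;stos.

    #include <cstring>

    void small_copy(char *dst, const char *src) {
        std::memcpy(dst, src, 8);   // small constant size: a couple of loads/stores win over rep;movs
    }

    void big_clear(char *dst, std::size_t n) {
        std::memset(dst, 0, n);     // non-constant size: the library memset can be much faster
    }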
- Apr 09, 2008
Dan Gohman authored
llvm-svn: 49446
- Mar 21, 2008
Chris Lattner authored
x86-64 return conventions correct, but was never enabled. We can now do the "right thing" with multiple return values. llvm-svn: 48635
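As a hedged illustration (not taken from the commit), the kind of case that multiple-return-value support covers: a small aggregate that the x86-64 convention returns in registers (RAX:RDX) rather than through a hidden sret pointer.

    struct Pair { long first, second; };   // small enough to be returned in two registers

    Pair make_pair(long a, long b) {
        Pair p = { a, b };
        return p;                          // two register copies, no sret slot needed
    }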
- Mar 19, 2008
Arnold Schwaighofer authored
llvm-svn: 48545
- Mar 10, 2008
Chris Lattner authored
copyfromreg/copytoreg instead. llvm-svn: 48174
Scott Michel authored
return ValueType can depend on its operands' ValueType. This is a cosmetic change; no functionality impacted. llvm-svn: 48145
- Mar 09, 2008
Chris Lattner authored
llvm-svn: 48094
Chris Lattner authored
isel'ing value-preserving FP roundings from one FP stack register to another into a noop, instead of stack traffic. llvm-svn: 48093
- Mar 05, 2008
Evan Cheng authored
For x86, if SSE2 is available, it's not a good idea, since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink FP constants, since fldt is very expensive. llvm-svn: 47931
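A tiny hypothetical example of the trade-off: 0.5 is exactly representable as a float, so its constant-pool entry could be shrunk to 4 bytes and widened at load time. With SSE2 that widening is a cvtss2sd and blocks load folding, so it is not worth it; on x87, avoiding an expensive wide fld of the constant is a win.

    double bias(double x) {
        return x + 0.5;   // 0.5 fits exactly in a float; shrinking the constant
                          // only pays off on x87, not with SSE2
    }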
Andrew Lenharth authored
llvm-svn: 47929
- Mar 01, 2008
Andrew Lenharth authored
llvm-svn: 47798
- Feb 26, 2008
Arnold Schwaighofer authored
llvm-svn: 47635
Arnold Schwaighofer authored
calls. Before, arguments that could overwrite each other were explicitly lowered to a stack slot, not giving the register allocator a chance to optimize. Now a sequence of copyto/copyfrom virtual registers ensures that arguments are loaded into (virtual) registers before they are lowered to the stack slot (where they might overwrite each other). Also, parameter stack slots are marked mutable for (potentially) tail-calling functions. llvm-svn: 47593
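A hypothetical example (not from the commit) of arguments that could overwrite each other in a tail call: the outgoing arguments are a permutation of the incoming ones, so writing them straight into their final slots could clobber a value that is still needed. Staging them in copyto/copyfrom virtual registers lets the register allocator resolve the cycle.

    int callee(int a, int b);

    int caller(int x, int y) {
        return callee(y, x);   // swapped arguments: naive in-place lowering would
                               // overwrite x (or y) before it has been read
    }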
- Feb 19, 2008
Evan Cheng authored
- When the DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check whether it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> into <10, 0, u, u, u, u, u, u>; instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalizes SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC. llvm-svn: 47290
- Feb 13, 2008
Dan Gohman authored
to pass the mask APInt by value, not by reference. llvm-svn: 47096
Dan Gohman authored
Add an overload that supports the uint64_t interface for use by clients that haven't been updated yet. llvm-svn: 47039
- Feb 11, 2008
Nate Begeman authored
Add some notes to the README. llvm-svn: 46949
- Feb 10, 2008
Dan Gohman authored
llvm-svn: 46930
- Jan 31, 2008
Dan Gohman authored
with the real FLT_ROUNDS (defined in <float.h>). llvm-svn: 46587
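For reference, a small (hypothetical) use of the real FLT_ROUNDS macro from <float.h>/<cfloat>: it expands to the current rounding mode, where 1 means round-to-nearest.

    #include <cfloat>

    bool rounds_to_nearest() {
        return FLT_ROUNDS == 1;   // 0: toward zero, 1: to nearest, 2: toward +inf, 3: toward -inf
    }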
- Jan 30, 2008
Evan Cheng authored
Even though InsertAtEndOfBasicBlock is an ugly hack, it still deserves a proper name. Rename it to EmitInstrWithCustomInserter, since it does not necessarily insert the instruction at the end. llvm-svn: 46562
- Jan 29, 2008
Evan Cheng authored
Work in progress. This patch *fixes* x86-64 calls which are modelled as StructRet but really should be returned in registers, e.g. _Complex long double and some 128-bit aggregates. This is a short-term solution that is necessary only because llvm, for now, cannot model i128 nor calls with multiple results. Status: this only works for direct calls, and only the caller side is done. Disabled for now. llvm-svn: 46527
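A hedged sketch of one affected signature (not from the commit): a 128-bit aggregate result that the x86-64 convention returns in registers, but that was being modelled as a StructRet (hidden out-pointer) call.

    struct Int128 { unsigned long long lo, hi; };   // one of the 128-bit aggregate cases

    Int128 wide_add(Int128 a, Int128 b);            // should lower to a register return,
                                                    // not to a hidden sret argument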
Dale Johannesen authored
llvm-svn: 46485
- Jan 24, 2008
Evan Cheng authored
Let each target decide byval alignment. For X86, it's 4 bytes unless the aggregate contains SSE vector(s). For x86-64, it's the max of 8 and the alignment of the type. llvm-svn: 46286
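Hypothetical types illustrating the rule (not from the commit): on x86-32 a byval aggregate gets 4-byte alignment unless it contains an SSE vector; on x86-64 it gets the larger of 8 bytes and the type's own alignment.

    #include <xmmintrin.h>

    struct Plain  { int a, b, c; };     // X86: byval slot aligned to 4 bytes
    struct HasVec { __m128 v; int a; }; // contains an SSE vector, so it needs 16-byte alignment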
- Jan 18, 2008
Chris Lattner authored
llvm-svn: 46159
- Jan 16, 2008
Chris Lattner authored
llvm-svn: 46058
Chris Lattner authored
some code. No functionality change. llvm-svn: 46055
- Jan 15, 2008
Chris Lattner authored
llvm-svn: 46015
Anton Korobeynikov authored
as well as PPC codegen llvm-svn: 46001
- Jan 05, 2008
Gordon Henriksen authored
unifying the copied algorithms and saving over 500 LOC. There should be no functionality change, but please test on your favorite x86 target. llvm-svn: 45627
- Dec 29, 2007
Chris Lattner authored
llvm-svn: 45418
- Dec 14, 2007
Evan Cheng authored
llvm-svn: 45024
- Nov 24, 2007
Chris Lattner authored
1) Change the interface to TargetLowering::ExpandOperationResult to take and return entire NODES that need a result expanded, not just the value. This allows us to handle things like READCYCLECOUNTER, which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new ExpandOperationResult. This makes the result simpler and fully general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT, allowing them to work with LegalizeDAGTypes.
LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when type legalization in LegalizeDAG is ifdef'd out. llvm-svn: 44300
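For context on item 6, a hypothetical inline-asm sketch of what READCYCLECOUNTER amounts to on 32-bit x86: the 64-bit counter comes back split across EDX:EAX, which is exactly the kind of illegal-i64 result that the expansion support has to handle.

    unsigned long long read_tsc() {
        unsigned lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));   // result arrives in EAX (low) and EDX (high)
        return ((unsigned long long)hi << 32) | lo;
    }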
- Nov 16, 2007
Anton Korobeynikov authored
llvm-svn: 44183
- Nov 09, 2007
Evan Cheng authored
Then:

    call "L1$pb"
"L1$pb":
    popl %eax
    ...
LBB1_1: # entry
    imull $4, %ecx, %ecx
    leal LJTI1_0-"L1$pb"(%eax), %edx
    addl LJTI1_0-"L1$pb"(%ecx,%eax), %edx
    jmpl *%edx
    .align 2
    .set L1_0_set_3,LBB1_3-LJTI1_0
    .set L1_0_set_2,LBB1_2-LJTI1_0
    .set L1_0_set_5,LBB1_5-LJTI1_0
    .set L1_0_set_4,LBB1_4-LJTI1_0
LJTI1_0:
    .long L1_0_set_3
    .long L1_0_set_2

Now:

    call "L1$pb"
"L1$pb":
    popl %eax
    ...
LBB1_1: # entry
    addl LJTI1_0-"L1$pb"(%eax,%ecx,4), %eax
    jmpl *%eax
    .align 2
    .set L1_0_set_3,LBB1_3-"L1$pb"
    .set L1_0_set_2,LBB1_2-"L1$pb"
    .set L1_0_set_5,LBB1_5-"L1$pb"
    .set L1_0_set_4,LBB1_4-"L1$pb"
LJTI1_0:
    .long L1_0_set_3
    .long L1_0_set_2

llvm-svn: 43924
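A hypothetical C-level switch of the sort that produces a jump table like the one above; in the "Now" form the scaling and the picbase-relative table entries are folded into a single addl.

    int dispatch(int i) {
        switch (i) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        default: return -1;
        }
    }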
- Nov 06, 2007
Rafael Espindola authored
Thanks for the suggestions Bill :-) llvm-svn: 43742
- Oct 29, 2007
Evan Cheng authored
transformation. Previously, it was restricted by ensuring that the number of load uses is one. Now the restriction is loosened by allowing setcc uses to be "extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq). llvm-svn: 43465
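A hypothetical source pattern of the kind affected: a narrow load that feeds both an extension and a comparison. Rewriting the setcc over the extended operands (setcc sext(x), sext(c)) lets the load still be folded into the extend even though it has more than one use.

    int classify(const signed char *p) {
        signed char c = *p;   // narrow load
        int wide = c;         // sign-extend use of the load
        if (c == 7)           // setcc use of the same, un-extended load
            ++wide;
        return wide;
    }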
- Oct 26, 2007
Evan Cheng authored
Loosen up IV reuse to allow reuse of the same stride but a larger type when truncating from the larger type to the smaller type is free. E.g., this turns the loop:

LBB1_1: # entry.bb_crit_edge
    xorl %ecx, %ecx
    xorw %dx, %dx
    movw %dx, %si
LBB1_2: # bb
    movl L_X$non_lazy_ptr, %edi
    movw %si, (%edi)
    movl L_Y$non_lazy_ptr, %edi
    movw %dx, (%edi)
    addw $4, %dx
    incw %si
    incl %ecx
    cmpl %eax, %ecx
    jne LBB1_2 # bb

into

LBB1_1: # entry.bb_crit_edge
    xorl %ecx, %ecx
    xorw %dx, %dx
LBB1_2: # bb
    movl L_X$non_lazy_ptr, %esi
    movw %cx, (%esi)
    movl L_Y$non_lazy_ptr, %esi
    movw %dx, (%esi)
    addw $4, %dx
    incl %ecx
    cmpl %eax, %ecx
    jne LBB1_2 # bb

llvm-svn: 43375
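A hypothetical C loop of the shape shown above: two 16-bit induction variables plus a 32-bit loop counter. Because truncating i32 to i16 is free on x86, the counter (same stride) can be reused and one of the 16-bit induction variables dropped.

    extern short X, Y;

    void store_ivs(int n) {
        short a = 0, b = 0;
        for (int i = 0; i < n; ++i) {
            X = a;      // 'a' has the same stride as the loop counter, so it can reuse i
            Y = b;
            a += 1;
            b += 4;
        }
    }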
- Oct 11, 2007
Arnold Schwaighofer authored
enabled by passing -tailcallopt to llc. The optimization is performed if the following conditions are satisfied:
* caller/callee are fastcc
* elf/pic is disabled OR elf/pic enabled + callee is in module + callee has visibility protected or hidden
llvm-svn: 42870
- Oct 09, 2007
Dan Gohman authored
llvm-svn: 42787