- Mar 24, 2011
-
-
Andrew Trick authored
I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix. llvm-svn: 128181
-
- Mar 23, 2011
-
-
Andrew Trick authored
(target-specific branchless method for double-width relational comparisons on x86) llvm-svn: 128175
-
- Mar 21, 2011
-
-
Evan Cheng authored
Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified. llvm-svn: 127981
-
- Mar 19, 2011
-
-
Daniel Dunbar authored
to canonicalize IR", it broke a lot of things. llvm-svn: 127954
-
Evan Cheng authored
to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953
-
Nadav Rotem authored
not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing. llvm-svn: 127951
-
- Mar 18, 2011
-
-
Eli Friedman authored
llvm-svn: 127909
-
Eli Friedman authored
comparisons on x86. Essentially, the way this works is that SUB+SBB sets the relevant flags the same way a double-width CMP would. This is a substantial improvement over the generic lowering in LLVM. The output is also shorter than the gcc-generated output; I haven't done any detailed benchmarking, though. llvm-svn: 127852
-
- Mar 17, 2011
-
-
Cameron Zwarich authored
llvm-svn: 127809
-
Cameron Zwarich authored
llvm-svn: 127807
-
- Mar 16, 2011
-
-
Cameron Zwarich authored
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not generate incorrect code. This just fixes the zext at the return. We still insert an i32 ZextAssert when reading a function's arguments, but it is followed by a truncate and another i8 ZextAssert so it is not optimized. llvm-svn: 127766
-
- Mar 11, 2011
-
-
Eric Christopher authored
corresponding testcases back to the previous versions. Fixes some performance regressions only seen on 32-bit. llvm-svn: 127441
-
- Mar 10, 2011
-
-
Stuart Hastings authored
llvm-svn: 127382
-
- Mar 09, 2011
-
-
-
NAKAMURA Takumi authored
llvm-svn: 127328
-
- Mar 08, 2011
-
-
Benjamin Kramer authored
Found by inspection. llvm-svn: 127247
-
Eric Christopher authored
testcases accordingly. Some are currently xfailed and will be filed as bugs to be fixed or understood. Performance results: roughly neutral on SPEC some micro benchmarks in the llvm suite are up between 100 and 150%, only a pair of regressions that are due to be investigated john-the-ripper saw: 10% improvement in traditional DES 8% improvement in BSDI DES 59% improvement in FreeBSD MD5 67% improvement in OpenBSD Blowfish 14% improvement in LM DES Small compile time impact. llvm-svn: 127208
-
- Mar 07, 2011
-
-
Cameron Zwarich authored
llvm-svn: 127175
-
- Mar 05, 2011
-
-
Andrew Trick authored
regs. This is the only change in this checkin that may affects the default scheduler. With better register tracking and heuristics, it doesn't make sense to artificially lower the register limit so much. Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to give the scheduler a way to account for div and sqrt on targets that don't have an itinerary. It is currently defaults to 10 (the actual number doesn't matter much), but only takes effect on non-default schedulers: list-hybrid and list-ilp. Added several heuristics that can be individually disabled for the non-default sched=list-ilp mode. This helps us determine how much better we can do on a given benchmark than the default scheduler. Certain compute intensive loops run much faster in this mode with the right set of heuristics, and it doesn't seem to have much negative impact elsewhere. Not all of the heuristics are needed, but we still need to experiment to decide which should be disabled by default for sched=list-ilp. llvm-svn: 127067
-
- Mar 02, 2011
-
-
David Greene authored
missing patterns for them. Add a SIMD test subdirectory to hold tests for SIMD instruction selection correctness and quality. ' llvm-svn: 126845
-
- Feb 28, 2011
-
-
David Greene authored
[AVX] Add decode support for VUNPCKLPS/D instructions, both 128-bit and 256-bit forms. Because the number of elements in a vector does not determine the vector type (4 elements could be v4f32 or v4f64), pass the full type of the vector to decode routines. llvm-svn: 126664
-
- Feb 25, 2011
-
-
Owen Anderson authored
llvm-svn: 126518
-
- Feb 24, 2011
-
-
Chris Lattner authored
llvm-svn: 126441
-
- Feb 23, 2011
-
-
David Greene authored
[AVX] General VUNPCKL codegen support. llvm-svn: 126264
-
- Feb 22, 2011
-
-
Devang Patel authored
In other words, do not keep track of argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body. This requires some coordination with debugger to get this working. - The debugger needs to be aware of prolog_end attribute attached with line table entries. - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+) llvm-svn: 126155
-
- Feb 20, 2011
-
-
Eric Christopher authored
since one needs to be a register operand. Just use movss instead of forcing an operand into a register. Fixes PR9239 llvm-svn: 126072
-
- Feb 19, 2011
-
-
Eric Christopher authored
llvm-svn: 126018
-
- Feb 17, 2011
-
-
David Greene authored
[AVX] Recorganize X86ShuffleDecode into its own library (LLVMX86Utils.a) to break cyclic library dependencies between LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in a header file and marked static but AVX requires some additional functionality here that won't be used by all clients. Since including unused static functions causes a gcc compiler warning, keeping it as a header would break builds that use -Werror. Putting this in its own library solves both problems at once. llvm-svn: 125765
-
- Feb 16, 2011
-
-
Stuart Hastings authored
other getNode() methods. Radar 9002173. llvm-svn: 125665
-
- Feb 13, 2011
-
-
Chris Lattner authored
have their low bits set to zero. This allows us to optimize out explicit stack alignment code like in stack-align.ll:test4 when it is redundant. Doing this causes the code generator to start turning FI+cst into FI|cst all over the place, which is general goodness (that is the canonical form) except that various pieces of the code generator don't handle OR aggressively. Fix this by introducing a new SelectionDAG::isBaseWithConstantOffset predicate, and using it in places that are looking for ADD(X,CST). The ARM backend in particular was missing a lot of addressing mode folding opportunities around OR. llvm-svn: 125470
-
- Feb 11, 2011
-
-
David Greene authored
[AVX] Implement 256-bit vector lowering for SCALAR_TO_VECTOR. This largely completes support for 128-bit fallback lowering for code that is not 256-bit ready. llvm-svn: 125315
-
- Feb 10, 2011
-
-
David Greene authored
[AVX] Implement 256-bit vector lowering for EXTRACT_VECTOR_ELT. llvm-svn: 125284
-
- Feb 09, 2011
-
-
David Greene authored
[AVX] Implement 256-bit vector lowering for INSERT_VECTOR_ELT. llvm-svn: 125187
-
- Feb 08, 2011
-
-
David Greene authored
[AVX] Implement BUILD_VECTOR lowering for 256-bit vectors. For anything but the simplest of cases, lower a 256-bit BUILD_VECTOR by splitting it into 128-bit parts and recombining. llvm-svn: 125105
-
- Feb 07, 2011
-
-
David Greene authored
[AVX] Insert/extract subvector lowering support. This includes a couple of utility functions that will be used in other places for more AVX lowering. llvm-svn: 125029
-
- Feb 05, 2011
-
-
NAKAMURA Takumi authored
Target/X86: Tweak allocating shadow area (aka home) on Win64. It must be enough for caller to allocate one. llvm-svn: 124949
-
NAKAMURA Takumi authored
llvm-svn: 124948
-
NAKAMURA Takumi authored
llvm-svn: 124946
-
David Greene authored
[AVX] Revert 124910 until clients are ready. llvm-svn: 124912
-
David Greene authored
[AVX] Add some utilities to insert and extract 128-bit subvectors. This allows us to easily support 256-bit operations that don't have native 256-bit support. This applies to integer operations, certain types of shuffles and various othher things. llvm-svn: 124910
-