- Dec 21, 2012
Nadav Rotem authored
llvm-svn: 170830
- Dec 19, 2012
Patrik Hagglund authored
MVTs, instead of EVTs. llvm-svn: 170537
Patrik Hagglund authored
of EVT. llvm-svn: 170532
- Dec 17, 2012
Craig Topper authored
Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible. llvm-svn: 170305
- Dec 15, 2012
Benjamin Kramer authored
We match the pattern "x >= y ? x-y : 0" into "subus x, y", plus two special cases when y is a constant. DAGCombiner canonicalizes those, so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. llvm-svn: 170273
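A minimal scalar sketch of the matched shape (a hypothetical example, not from the commit): after vectorization, each lane computes exactly this, which maps to a single subus (PSUBUS) instruction.

#include <cstdint>

// "x >= y ? x - y : 0" is an unsigned saturating subtract; the
// vectorized form becomes PSUBUS on x86.
uint8_t usat_sub(uint8_t x, uint8_t y) {
    return x >= y ? x - y : 0;
}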
- Dec 12, 2012
Evan Cheng authored
mention the inline memcpy / memset expansion code is a mess? This patch splits the ZeroOrLdSrc argument in two: IsMemset and ZeroMemset. The first indicates whether it is expanding a memset or a memcpy / memmove. The latter indicates whether the memset is a memset of zero. It's totally possible (likely even) that targets may want to do different things for memcpy and memset of zero. llvm-svn: 169959
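As a hedged illustration of what the split enables, a self-contained C++ sketch (the enum, function name, and heuristics are hypothetical stand-ins, not the actual TargetLowering hook):

// Hypothetical stand-in for the expansion-type decision described above:
// distinguish memset-of-zero from a general memset and from memcpy/memmove.
enum class WideTy { I32, F64, V4I32 };

WideTy pickMemOpType(bool IsMemset, bool ZeroMemset, bool HasSSE2) {
    if (IsMemset && ZeroMemset && HasSSE2)
        return WideTy::V4I32; // a zeroed XMM register covers 16 bytes per store
    if (IsMemset)
        return WideTy::I32;   // general memset: splat the byte into a GPR first
    return HasSSE2 ? WideTy::V4I32 : WideTy::F64; // memcpy: widest legal type
}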
Evan Cheng authored
Also added more comments to explain why it is generally OK to return true. - Rename the getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to be true for a loaded source (memcpy) or a zero constant (memset). The poor name choice is probably some kind of legacy issue. llvm-svn: 169954
Evan Cheng authored
f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944
- Dec 11, 2012
Patrik Hagglund authored
llvm-svn: 169854
Patrik Hagglund authored
MVTs, instead of EVTs. Accordingly, add bitsLT (and similar) to MVT. llvm-svn: 169850
Patrik Hagglund authored
of EVT. llvm-svn: 169845
Evan Cheng authored
1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On x86, use two pairs of movups / movaps for 17-31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g. x86 and ARM.
3. When memcpy'ing from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (requires a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they are "fast".
5. Update ARM target hooks to use unaligned load / stores, e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791
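Point 1 can be illustrated with a short, self-contained C++ sketch (a hypothetical helper, not LLVM code): two overlapping 16-byte moves cover any 17-31 byte copy, and each 16-byte std::memcpy lowers to a movups load / store pair on x86.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes (17 <= n <= 31) with two overlapping 16-byte moves
// instead of a byte-by-byte loop.
void copy17to31(uint8_t *dst, const uint8_t *src, std::size_t n) {
    std::memcpy(dst, src, 16);                   // leading 16 bytes
    std::memcpy(dst + n - 16, src + n - 16, 16); // trailing 16, overlaps the first
}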
- Dec 09, 2012
Shuxin Yang authored
- Fix a bug which caused a segfault. - Add two test cases which were causing crashes. llvm-svn: 169687
- Dec 08, 2012
Chandler Carruth authored
There are still bugs in this pass, as well as other issues that are being worked on, but the bugs are crashers that occur pretty easily in the wild. Test cases have been sent to the original commit's review thread. This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
- Dec 06, 2012
Evan Cheng authored
understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 llvm-svn: 169536
Evan Cheng authored
and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded-bits optimizations. e.g.

define void @foo(i16* %ptr, i32 %a) nounwind {
entry:
  %tmp1 = icmp ult i32 %a, 100
  br i1 %tmp1, label %bb1, label %bb2
bb1:
  %tmp2 = load i16* %ptr, align 2
  br label %bb2
bb2:
  %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
  %cmp = icmp ult i16 %tmp3, 24
  br i1 %cmp, label %bb3, label %exit
bb3:
  call void @bar() nounwind
  br label %exit
exit:
  ret void
}

This compiles to the following before:
  push {lr}
  mov r2, #0
  cmp r1, #99
  bhi LBB0_2
@ BB#1: @ %bb1
  ldrh r2, [r0]
LBB0_2: @ %bb2
  uxth r0, r2
  cmp r0, #23
  bhi LBB0_4
@ BB#3: @ %bb3
  bl _bar
LBB0_4: @ %exit
  pop {lr}
  bx lr

The uxth is not needed since ldrh implicitly zero-extends the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459
- Dec 05, 2012
Elena Demikhovsky authored
Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2. llvm-svn: 169366
- Dec 04, 2012
Chandler Carruth authored
missed in the first pass because the script didn't yet handle include guards. Note that the script is now able to handle all of these headers without manual edits. =] llvm-svn: 169224
- Nov 29, 2012
Shuxin Yang authored
rdar://12100355 - This revision attempts to recognize the following population-count pattern: while(a) { c++; ... ; a &= a - 1; ... }, where <c> and <a> could be used multiple times in the loop body. TODO: On x86-64 and ARM, __builtin_ctpop() is not expanded to an efficient instruction sequence, which needs to be improved in following commits. Reviewed by Nadav, really appreciated! llvm-svn: 168931
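The scalar shape being recognized, as a compilable C++ sketch (a hypothetical example built from the pattern above):

// Clearing the lowest set bit on each iteration counts the set bits, so
// the whole loop is equivalent to a single population count (ctpop).
unsigned popcount_loop(unsigned a) {
    unsigned c = 0;
    while (a) {
        c++;
        a &= a - 1; // clear the lowest set bit
    }
    return c;
}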
- Nov 11, 2012
Craig Topper authored
llvm-svn: 167696
- Nov 10, 2012
Craig Topper authored
llvm-svn: 167670
Craig Topper authored
llvm-svn: 167669
Craig Topper authored
llvm-svn: 167652
- Nov 08, 2012
Michael Liao authored
- Add RTM code generation support through 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a transaction region. llvm-svn: 167573
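A usage sketch in C++ (compile with -mrtm; needs RTM-capable hardware at run time). The counter logic is a hypothetical example; the intrinsics are exposed as _xbegin/_xend/_xabort in immintrin.h:

#include <immintrin.h>

static int counter = 0;

void increment() {
    if (_xbegin() == _XBEGIN_STARTED) { // start a transaction region
        if (counter < 0)
            _xabort(1);                 // explicitly abort with code 1
        ++counter;
        _xend();                        // commit the transaction
    } else {
        ++counter;                      // aborted: non-transactional fallback
                                        // (locking elided in this sketch)
    }
}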
- Nov 06, 2012
Nadav Rotem authored
llvm-svn: 167480
Nadav Rotem authored
llvm-svn: 167421
- Nov 03, 2012
Nadav Rotem authored
llvm-svn: 167347
Nadav Rotem authored
Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers. llvm-svn: 167333
- Oct 31, 2012
Michael Liao authored
llvm-svn: 167104
- Oct 30, 2012
Manman Ren authored
We used to generate a store (movq) + a load. Now we use movd. rdar://9946746 llvm-svn: 167056
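For context, a bit-preserving f64-to-i64 move in C++ (a hypothetical illustration of the kind of GPR <-> XMM transfer involved, not the commit's test case):

#include <cstdint>
#include <cstring>

// Move a double's bit pattern into an integer register; the point of the
// change above is that such moves become a single register-to-register
// instruction instead of a store followed by a load.
int64_t double_bits(double d) {
    int64_t i;
    std::memcpy(&i, &d, sizeof i); // compilers lower this to a register move
    return i;
}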
- Oct 23, 2012
Michael Liao authored
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce DAG combine overhead. - Extend the support to v4i16/v8i16 as well. llvm-svn: 166487
Michael Liao authored
llvm-svn: 166486
- Oct 19, 2012
Michael Liao authored
- If INSERT_VECTOR_ELT is supported (above SSE2, either custom or by a sequence of legal insns), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most of the elements can be built from the SHUFFLE with few (so far 1) elements being inserted. llvm-svn: 166288
- Oct 16, 2012
Michael Liao authored
- Add custom FP_TO_SINT on v8i16 (and v8i8, which is legalized as v8i16 due to vector element-wise widening) to reduce the DAG combiner overhead added in the X86 backend. llvm-svn: 166036
Michael Liao authored
- Besides its use in SjLj exception handling, __builtin_setjmp/__builtin_longjmp is also used as a light-weight replacement of setjmp/longjmp to implement continuations, user-level threading, etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling, as zero-cost DWARF exception handling is used by default on X86. llvm-svn: 165989
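A minimal sketch of the lightweight (non-exception) usage: the builtins take a five-word buffer, and __builtin_longjmp's second argument must be the constant 1. The control flow here is a hypothetical example.

#include <cstdio>

static void *jmpbuf[5]; // five-word buffer required by the builtins

static void escape() {
    __builtin_longjmp(jmpbuf, 1); // resume at the matching __builtin_setjmp
}

int main() {
    if (__builtin_setjmp(jmpbuf) == 0)
        escape();                 // first pass: jump back via longjmp
    else
        std::puts("resumed via __builtin_longjmp");
    return 0;
}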
- Oct 10, 2012
Michael Liao authored
- Due to the current vector-element matching constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraint. llvm-svn: 165631
Michael Liao authored
- Due to the current vector-element matching constraints in ISD::FP_EXTEND, extending from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraint. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern, and thus significantly reduces the overhead of supporting non-power-of-2 vector FP extends. llvm-svn: 165625
- Oct 08, 2012
Micah Villmow authored
llvm-svn: 165402
- Sep 25, 2012
Michael Liao authored
- Turn on atomic6432.ll and add a specific test case as well. llvm-svn: 164616
Evan Cheng authored
Fix an illegal tailcall opt where the callee returns a double via xmm while the caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588
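The shape of the bug in C++ (hypothetical functions): on x86-64 the callee's double comes back in xmm0, while the caller's x86_fp80 result must be returned in st0, so tail-calling would skip the required move.

double callee(double x); // defined elsewhere; returns in xmm0

long double caller(double x) {
    return callee(x); // must not be a tail call: result needs an xmm0 -> st0 move
}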