  Dec 08, 2012
      Revert the patches adding a popcount loop idiom recognition pass. · 91e47532
      Chandler Carruth authored
      There are still bugs in this pass, as well as other issues that are
      being worked on, but the bugs are crashers that occur pretty easily in
      the wild. Test cases have been sent to the original commit's review
      thread.
      
      This reverts the commits:
        r169671: Fix a logic error.
        r169604: Move the popcnt tests to an X86 subdirectory.
        r168931: Initial commit adding the pass.
      
      llvm-svn: 169683
  Dec 06, 2012
      Replace r169459 with something safer. Rather than having computeMaskedBits · 9ec512d7
      Evan Cheng authored
      understand the target's implementation of any_extend / extload, just generate
      zero_extend in place of any_extend for liveouts when the target knows the
      zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
      
      rdar://12771555
      
      llvm-svn: 169536
      Let targets provide hooks that compute known zero and ones for any_extend · 5213139f
      Evan Cheng authored
      and extloads. If they are implemented as a zero-extend, or implicitly
      zero-extend, then this can enable more demanded-bits optimizations, e.g.:
      
      define void @foo(i16* %ptr, i32 %a) nounwind {
      entry:
        %tmp1 = icmp ult i32 %a, 100
        br i1 %tmp1, label %bb1, label %bb2
      bb1:
        %tmp2 = load i16* %ptr, align 2
        br label %bb2
      bb2:
        %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
        %cmp = icmp ult i16 %tmp3, 24
        br i1 %cmp, label %bb3, label %exit
      bb3:
        call void @bar() nounwind
        br label %exit
      exit:
        ret void
      }
      
      Before this change, this compiled to the following:
              push    {lr}
              mov     r2, #0
              cmp     r1, #99
              bhi     LBB0_2
      @ BB#1:                                 @ %bb1
              ldrh    r2, [r0]
      LBB0_2:                                 @ %bb2
              uxth    r0, r2
              cmp     r0, #23
              bhi     LBB0_4
      @ BB#3:                                 @ %bb3
              bl      _bar
      LBB0_4:                                 @ %exit
              pop     {lr}
              bx      lr
      
      The uxth is not needed since ldrh implicitly zero-extends the high bits. With
      this change it is eliminated.
      
      rdar://12771555
      
      llvm-svn: 169459
  Nov 29, 2012
      rdar://12100355 (part 1) · abcc3704
      Shuxin Yang authored
      This revision attempts to recognize the following population-count pattern:

       while (a) { c++; ... ; a &= a - 1; ... }

      where <c> and <a> could be used multiple times in the loop body.
      
      TODO: On X86-64 and ARM, __builtin_ctpop() is not expanded to an efficient
      instruction sequence; this needs to be improved in following commits.

      Reviewed by Nadav; much appreciated!
      
      llvm-svn: 168931
  Nov 08, 2012
      Add support for RTM from the TSX extension · 73cffddb
      Michael Liao authored
      - Add RTM code generation support through 3 X86 intrinsics:
        xbegin()/xend() to start/end a transaction region, and xabort() to abort a
        transaction region
      
      llvm-svn: 167573
  Oct 19, 2012
      Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 · 4b7ccfca
      Michael Liao authored
      - If INSERT_VECTOR_ELT is supported (above SSE2, either natively or by a
        custom sequence of legal insns), transform BUILD_VECTOR into SHUFFLE +
        INSERT_VECTOR_ELT if most elements can be built from a SHUFFLE, with few
        (so far 1) elements being inserted.
      
      llvm-svn: 166288
  Oct 16, 2012
      Support v8f32 to v8i8/v8i16 conversion through custom lowering · 02ca3454
      Michael Liao authored
      - Add custom FP_TO_SINT on v8i16 (and v8i8, which is legalized as v8i16 due
        to vector element-wise widening) to reduce DAG-combiner overhead added in
        the X86 backend.
      
      llvm-svn: 166036
      Add __builtin_setjmp/_longjmp support in X86 backend · 97bf363a
      Michael Liao authored
      - Besides its use in SjLj exception handling, __builtin_setjmp/__builtin_longjmp
        is also used as a lightweight replacement for setjmp/longjmp, e.g. to
        implement continuations, user-level threading, etc. The support added in
        this patch ONLY addresses this usage and is NOT intended to support SjLj
        exception handling, as zero-cost DWARF exception handling is used by
        default on X86.
      
      llvm-svn: 165989
  Oct 10, 2012
      Add support for FP_ROUND from v2f64 to v2f32 · e999b865
      Michael Liao authored
      - Due to the constraint in ISD::FP_ROUND that vector element counts must
        match, rounding from v2f64 to v4f32 (after legalization from v2f32) is
        scalarized. Add a customized v2f32 widening to convert it into a
        target-specific X86ISD::VFPROUND to work around this constraint.
      
      llvm-svn: 165631
      Add alternative support for FP_EXTEND from v2f32 to v2f64 · effae0c8
      Michael Liao authored
      - Due to the constraint in ISD::FP_EXTEND that vector element counts must
        match, extending from v2f32 to v2f64 is scalarized. Add a customized
        v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to
        work around this constraint. This patch also reverts a previous attempt
        to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern,
        and thus significantly reduces the overhead of supporting
        non-power-of-2 vector FP extends.
      
      llvm-svn: 165625