  1. Mar 26, 2013
  2. Mar 01, 2013
    • Fix PR10475 · 6af16fc3
      Michael Liao authored
      - ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
        but TLI.getShiftAmountTy() so far only returned a scalar type. As a
        result, backend logic assuming matching operand kinds breaks.
      - Rename the original TLI.getShiftAmountTy() to
        TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
        return the target-specified scalar type or the same vector type as the
        1st operand.
      - Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a
        simple scalar type.
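      The split described above can be sketched as follows (a minimal
      illustration with made-up types, not the actual LLVM API):

      ```cpp
      #include <cassert>
      #include <string>

      // Illustrative stand-in for an LLVM value type: just a vector flag
      // and a name.  Real code uses EVT/MVT with far more machinery.
      struct VTSketch {
          bool IsVector;
          std::string Name;
      };

      struct TLISketch {
          // Old behavior, now under a new name: always a scalar type
          // (e.g. the target's native shift-amount type).
          VTSketch getScalarShiftAmountTy() const { return {false, "i8"}; }

          // New behavior: mirror the 1st operand.  Vector shifts get the
          // same vector type, so ISD::SHL/SRL/SRA see matching kinds.
          VTSketch getShiftAmountTy(const VTSketch &LHSTy) const {
              return LHSTy.IsVector ? LHSTy : getScalarShiftAmountTy();
          }
      };
      ```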
      
      llvm-svn: 176364
  3. Feb 15, 2013
  4. Jan 29, 2013
    • Teach SDISel to combine fsin / fcos into an fsincos node if the following · 0e88c7d8
      Evan Cheng authored
      conditions are met:
      1. They share the same operand and are in the same BB.
      2. Both outputs are used.
      3. The target has a native instruction that maps to ISD::FSINCOS node or
         the target provides a sincos library call.
      
      Implemented the generic optimization in SDISel and enabled it for
      Mac OS X. Also added an additional optimization for x86_64 Mac OS X by
      using an alternative entry point __sincos_stret which returns the two
      results in xmm0 / xmm1.
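      The source pattern this combine targets looks like the following
      sketch: fsin and fcos of the same operand, in the same basic block,
      with both results used.

      ```cpp
      #include <cmath>

      // Before the change: two separate libcalls.  After it, SDISel can
      // emit one ISD::FSINCOS node (or a sincos libcall / __sincos_stret
      // on x86_64 Mac OS X) because all three conditions above hold here.
      double sum_sin_cos(double x) {
          double s = std::sin(x);  // fsin, operand x
          double c = std::cos(x);  // fcos, same operand, same block
          return s + c;            // both outputs are used
      }
      ```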
      
      rdar://13087969
      PR13204
      
      llvm-svn: 173755
  5. Jan 28, 2013
  6. Jan 21, 2013
  7. Jan 20, 2013
  8. Jan 09, 2013
  9. Jan 07, 2013
    • Switch TargetTransformInfo from an immutable analysis pass that requires · 664e354d
      Chandler Carruth authored
      a TargetMachine to construct (and thus isn't always available), to an
      analysis group that supports layered implementations much like
      AliasAnalysis does. This is a pretty massive change, with a few parts
      that I was unable to easily separate (sorry), so I'll walk through it.
      
      The first step of this conversion was to make TargetTransformInfo an
      analysis group, and to sink the nonce implementations in
      ScalarTargetTransformInfo and VectorTargetTransformInfo into
      a NoTargetTransformInfo pass. This allows other passes to add a hard
      requirement on TTI, and assume they will always get at least one
      implementation.
      
      The TargetTransformInfo analysis group leverages the delegation chaining
      trick that AliasAnalysis uses, where the base class for the analysis
      group delegates to the previous analysis *pass*, allowing all but the
      NoFoo analysis passes to only implement the parts of the interfaces they
      support. It also introduces a new trick where each pass in the group
      retains a pointer to the top-most pass that has been initialized. This
      allows passes to implement one API in terms of another API and benefit
      when some other pass above them in the stack has more precise results
      for the second API.
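      The delegation-chaining trick can be sketched roughly like this
      (names here are illustrative, not LLVM's actual classes):

      ```cpp
      // Each pass keeps a pointer to the previous pass in the group and
      // to the top-most initialized pass.  Queries a pass does not
      // override fall through to Prev; a pass can also answer one API in
      // terms of another by asking Top, so it benefits from more precise
      // implementations stacked above it.
      struct TTISketch {
          const TTISketch *Prev = nullptr;  // previous pass in the chain
          const TTISketch *Top = nullptr;   // top-most pass in the stack

          virtual ~TTISketch() = default;

          virtual unsigned getNumberOfRegisters() const {
              return Prev ? Prev->getNumberOfRegisters() : 0;  // NoTTI default
          }
          virtual unsigned getCostOfOp() const {
              return Prev ? Prev->getCostOfOp() : 1;
          }
      };

      struct BasicTTISketch : TTISketch {
          unsigned getNumberOfRegisters() const override { return 16; }
          // getCostOfOp is not overridden: it delegates to Prev.
      };

      struct TargetTTISketch : TTISketch {
          unsigned getCostOfOp() const override {
              // Implemented in terms of another API, via the top-most pass.
              return Top->getNumberOfRegisters() >= 16 ? 1 : 2;
          }
      };
      ```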
      
      The second step of this conversion is to create a pass that implements
      the TargetTransformInfo analysis using the target-independent
      abstractions in the code generator. This replaces the
      ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
      lib/Target with a single pass in lib/CodeGen called
      BasicTargetTransformInfo. This class actually provides most of the TTI
      functionality, basing it upon the TargetLowering abstraction and other
      information in the target independent code generator.
      
      The third step of the conversion adds support to all TargetMachines to
      register custom analysis passes. This allows building those passes with
      access to TargetLowering or other target-specific classes, and it also
      allows each target to customize the set of analysis passes desired in
      the pass manager. The baseline LLVMTargetMachine implements this
      interface to add the BasicTTI pass to the pass manager, and all of the
      tools that want to support target-aware TTI passes call this routine on
      whatever target machine they end up with to add the appropriate passes.
      
      The fourth step of the conversion created target-specific TTI analysis
      passes for the X86 and ARM backends. These passes contain the custom
      logic that was previously in their extensions of the
      ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
      I separated them into their own file, as now all of the interface bits
      are private and they just expose a function to create the pass itself.
      Then I extended these target machines to set up a custom set of analysis
      passes, first adding BasicTTI as a fallback, and then adding their
      customized TTI implementations.
      
      The fourth step required logic that was shared between the target
      independent layer and the specific targets to move to a different
      interface, as they no longer derive from each other. As a consequence,
      helper functions were added to TargetLowering representing the common
      logic needed both in the target implementation and the codegen
      implementation of the TTI pass. While technically this is the only
      change that could have been committed separately, it would have been
      a nightmare to extract.
      
      The final step of the conversion was just to delete all the old
      boilerplate. This got rid of the ScalarTargetTransformInfo and
      VectorTargetTransformInfo classes, all of the support in all of the
      targets for producing instances of them, and all of the support in the
      tools for manually constructing a pass based around them.
      
      Now that TTI is a relatively normal analysis group, two things become
      straightforward. First, we can sink it into lib/Analysis, which is a
      more natural layer for it to live in. Second, clients of this interface can
      depend on it *always* being available which will simplify their code and
      behavior. These (and other) simplifications will follow in subsequent
      commits, this one is clearly big enough.
      
      Finally, I'm very aware that much of the comments and documentation
      needs to be updated. As soon as I had this working, and plausibly well
      commented, I wanted to get it committed and in front of the build bots.
      I'll be doing a few passes over documentation later if it sticks.
      
      Commits to update DragonEgg and Clang will be made presently.
      
      llvm-svn: 171681
  10. Jan 04, 2013
    • LoopVectorizer: · e1d5c4b8
      Nadav Rotem authored
      1. Add code to estimate register pressure.
      2. Add code to select the unroll factor based on register pressure.
      3. Add bits to TargetTransformInfo to provide the number of registers.
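      The shape of step 2 can be sketched as follows (a minimal
      illustration under assumed inputs, not the LoopVectorizer's actual
      heuristic): divide the register count the target reports (step 3)
      by the peak number of live values in the loop (step 1), and clamp.

      ```cpp
      #include <algorithm>

      // Pick an unroll factor that should not cause spilling: roughly how
      // many copies of the loop body's live values fit in the register
      // file, clamped to [1, MaxUnroll].
      unsigned selectUnrollFactor(unsigned TargetRegs, unsigned MaxLiveValues,
                                  unsigned MaxUnroll = 8) {
          if (MaxLiveValues == 0)
              return MaxUnroll;
          unsigned UF = TargetRegs / MaxLiveValues;
          return std::max(1u, std::min(UF, MaxUnroll));
      }
      ```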
      
      llvm-svn: 171469
  11. Jan 03, 2013
  12. Dec 28, 2012
  13. Dec 27, 2012
  14. Dec 21, 2012
  15. Dec 19, 2012
  16. Dec 17, 2012
  17. Dec 15, 2012
  18. Dec 12, 2012
  19. Dec 11, 2012
  20. Dec 09, 2012
  21. Dec 08, 2012
    • Revert the patches adding a popcount loop idiom recognition pass. · 91e47532
      Chandler Carruth authored
      There are still bugs in this pass, as well as other issues that are
      being worked on, but the bugs are crashers that occur pretty easily in
      the wild. Test cases have been sent to the original commit's review
      thread.
      
      This reverts the commits:
        r169671: Fix a logic error.
        r169604: Move the popcnt tests to an X86 subdirectory.
        r168931: Initial commit adding the pass.
      
      llvm-svn: 169683
  22. Dec 06, 2012
    • Replace r169459 with something safer. Rather than having computeMaskedBits · 9ec512d7
      Evan Cheng authored
      understand target implementation of any_extend / extload, just generate
      zero_extend in place of any_extend for liveouts when the target knows the
      zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
      
      rdar://12771555
      
      llvm-svn: 169536
    • Let targets provide hooks that compute known zero and ones for any_extend · 5213139f
      Evan Cheng authored
      and extloads. If they are implemented as zero-extend, or implicitly
      zero-extend, then this can enable more demanded bits optimizations. e.g.
      
      define void @foo(i16* %ptr, i32 %a) nounwind {
      entry:
        %tmp1 = icmp ult i32 %a, 100
        br i1 %tmp1, label %bb1, label %bb2
      bb1:
        %tmp2 = load i16* %ptr, align 2
        br label %bb2
      bb2:
        %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
        %cmp = icmp ult i16 %tmp3, 24
        br i1 %cmp, label %bb3, label %exit
      bb3:
        call void @bar() nounwind
        br label %exit
      exit:
        ret void
      }
      
      This compiles to the following before:
              push    {lr}
              mov     r2, #0
              cmp     r1, #99
              bhi     LBB0_2
      @ BB#1:                                 @ %bb1
              ldrh    r2, [r0]
      LBB0_2:                                 @ %bb2
              uxth    r0, r2
              cmp     r0, #23
              bhi     LBB0_4
      @ BB#3:                                 @ %bb3
              bl      _bar
      LBB0_4:                                 @ %exit
              pop     {lr}
              bx      lr
      
      The uxth is not needed since ldrh implicitly zero-extends the high bits. With
      this change it's eliminated.
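      The known-bits reasoning behind dropping the uxth can be sketched
      like this (illustrative helpers, not LLVM's computeMaskedBits): if
      the target reports that a narrow extload is implicitly
      zero-extending (like ldrh), the high bits are known zero and a
      later mask of those bits is a no-op.

      ```cpp
      #include <cstdint>

      // Bits known to be zero in the 32-bit result of an extload of
      // `LoadBits` bits, given whether the target zero-extends implicitly.
      uint32_t knownZeroForExtLoad(unsigned LoadBits, bool ImplicitZext) {
          if (!ImplicitZext || LoadBits >= 32)
              return 0;                 // nothing known about the high bits
          return ~0u << LoadBits;       // high bits are known zero
      }

      // A mask is redundant when every bit it clears is known zero anyway.
      bool maskIsRedundant(uint32_t Mask, uint32_t KnownZero) {
          return (~Mask & ~KnownZero) == 0;
      }
      ```

      With a 16-bit ldrh reported as implicitly zero-extending, the
      `& 0xFFFF` that uxth performs clears only known-zero bits, so it
      can be eliminated.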
      
      rdar://12771555
      
      llvm-svn: 169459
  23. Dec 05, 2012
  24. Dec 04, 2012
  25. Nov 29, 2012
    • rdar://12100355 (part 1) · abcc3704
      Shuxin Yang authored
      This revision attempts to recognize the following population-count pattern:
      
       while(a) { c++; ... ; a &= a - 1; ... },
        where <c> and <a> could be used multiple times in the loop body.
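      The loop shape being recognized can be written out directly (a
      minimal C++ illustration, not the pass itself): each `a &= a - 1`
      clears the lowest set bit, so the trip count equals the population
      count of `a`, and the whole loop can be rewritten as one ctpop.

      ```cpp
      #include <cstdint>

      // Counts set bits the slow way the idiom-recognition pass detects:
      // the loop runs once per set bit because a &= a - 1 clears the
      // lowest one each iteration.
      unsigned popcountLoop(uint32_t a) {
          unsigned c = 0;
          while (a) {
              c++;
              a &= a - 1;  // clear the lowest set bit
          }
          return c;
      }
      ```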
      
       TODO: On X86-64 and ARM, __builtin_ctpop() is not expanded to an efficient
      instruction sequence, which needs to be improved in the following commits.
      
      Reviewed by Nadav, really appreciated!
      
      llvm-svn: 168931
  26. Nov 11, 2012
  27. Nov 10, 2012