Commits · 7dee697faa5361fc953909b8d4547f2d08af5cab · Roger Ferrer / llvm-epi-0.8

Aug 13, 2013
- AVX-512: Added CMP and BLEND instructions. · 60b1f289
  Elena Demikhovsky authored Aug 13, 2013
```
Lowering for SETCC.

llvm-svn: 188265
```
Aug 11, 2013
- AVX-512: Added VPERM* instructions and MOV* zmm-to-zmm instructions. · cf5b1458
  Elena Demikhovsky authored Aug 11, 2013
```
Added a test for shuffles using VPERM.

llvm-svn: 188147
```
Aug 08, 2013
- Fix the comment. · b5ab81d5
  Jakub Staszak authored Aug 08, 2013
```
llvm-svn: 187984
```
Aug 07, 2013
- AVX-512 set: Added BROADCAST instructions · 45c54ad8
  Elena Demikhovsky authored Aug 07, 2013
```
with lowering logic and a test.

llvm-svn: 187884
```
Aug 06, 2013

Refactor isInTailCallPosition handling · a4415854

Tim Northover authored Aug 06, 2013

This change came about primarily because of two issues in the existing code.
Neither of:

define i64 @test1(i64 %val) {
  %in = trunc i64 %val to i32
  tail call i32 @ret32(i32 returned %in)
  ret i64 %val
}

define i64 @test2(i64 %val) {
  tail call i32 @ret32(i32 returned undef)
  ret i32 42
}

should be tail calls, and the function sameNoopInput is responsible. The main
problem is that it is completely symmetric in the "tail call" and "ret" value,
but in reality different things are allowed on each side.

For these cases:
1. Any truncation should lead to a larger value being generated by "tail call"
   than needed by "ret".
2. Undef should only be allowed as a source for ret, not as a result of the
   call.

Along the way I noticed that a mismatch between what this function treats as a
valid truncation and what the backends see can lead to invalid calls as well
(see x86-32 test case).

This patch refactors the code so that instead of being based primarily on
values which it recurses into when necessary, it starts by inspecting the type
and considers each fundamental slot that the backend will see in turn. For
example, given a pathological function that returned {{}, {{}, i32, {}}, i32}
we would consider each "real" i32 in turn, and ask if it passes through
unchanged. This is much closer to what the backend sees as a result of
ComputeValueVTs.
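
As a rough illustration of the slot-by-slot view described above, the following C++ sketch flattens a nested aggregate into its scalar leaf slots without recursion. The `Type` struct and `collectLeafSlots` are hypothetical stand-ins invented for this example; they are not LLVM's classes or the iterators the patch actually adds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy type: either a scalar leaf (e.g. "i32") or a struct whose
// member list may be empty ("{}"). This is not llvm::Type.
struct Type {
    std::string scalarName;     // non-empty only for scalars
    std::vector<Type> members;  // members of a struct type
    bool isScalar() const { return !scalarName.empty(); }
};

// Walk the type with an explicit stack and record, for every scalar leaf, the
// path of member indices that leads to it, in left-to-right order.
std::vector<std::vector<unsigned>> collectLeafSlots(const Type &root) {
    struct Frame { const Type *ty; unsigned next; };
    std::vector<std::vector<unsigned>> slots;
    std::vector<unsigned> path;
    if (root.isScalar()) {
        slots.push_back(path);  // a bare scalar is its own single slot
        return slots;
    }
    std::vector<Frame> stack;
    stack.push_back({&root, 0});
    while (!stack.empty()) {
        Frame &top = stack.back();
        if (top.next == top.ty->members.size()) {
            // Finished this struct: drop it and the index that led into it.
            stack.pop_back();
            if (!path.empty()) path.pop_back();
            continue;
        }
        unsigned idx = top.next++;
        const Type *child = &top.ty->members[idx];
        path.push_back(idx);
        if (child->isScalar()) {
            slots.push_back(path);  // a "real" value the backend would see
            path.pop_back();
        } else {
            stack.push_back({child, 0});  // descend; empty structs add nothing
        }
    }
    return slots;
}
```

For the pathological type {{}, {{}, i32, {}}, i32} this yields two slots, at index paths {1, 1} and {2}; the empty structs contribute nothing, which matches the idea of only considering each "real" i32.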

Aside from the bug fixes, this eliminates the recursion that's going on and, I
believe, makes the bulk of the code significantly easier to understand. The
trade-off is the nasty iterators needed to find the real types inside a
returned value.

llvm-svn: 187787

Aug 05, 2013
- AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types. · 40864b69
  Elena Demikhovsky authored Aug 05, 2013
```
Added intrinsics and tests.

llvm-svn: 187717
```
Aug 04, 2013

X86: Turn fp selects into mask operations. · 5bc180c1

Benjamin Kramer authored Aug 04, 2013

double test(double a, double b, double c, double d) { return a<b ? c : d; }

before:
_test:
	ucomisd	%xmm0, %xmm1
	ja	LBB0_2
	movaps	%xmm3, %xmm2
LBB0_2:
	movaps	%xmm2, %xmm0

after:
_test:
	cmpltsd	%xmm1, %xmm0
	andpd	%xmm0, %xmm2
	andnpd	%xmm3, %xmm0
	orpd	%xmm2, %xmm0

Small speedup on Benchmarks/SmallPT
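
The "after" sequence builds an all-ones or all-zeros mask from the comparison and blends the two candidates with bitwise operations instead of branching. A minimal C++ sketch of the same idea is below; `selectLess` is a hypothetical name used only for illustration and is not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit double");

// Branch-free equivalent of `a < b ? c : d`, mirroring the roles of
// cmpltsd (mask), andpd (mask & c), andnpd (~mask & d) and orpd (combine).
double selectLess(double a, double b, double c, double d) {
    std::uint64_t cBits, dBits;
    std::memcpy(&cBits, &c, sizeof c);
    std::memcpy(&dBits, &d, sizeof d);
    // All ones when a < b, all zeros otherwise (including when either is NaN).
    std::uint64_t mask = (a < b) ? ~std::uint64_t{0} : std::uint64_t{0};
    std::uint64_t bits = (mask & cBits) | (~mask & dBits);
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Here `selectLess(1.0, 2.0, 3.0, 4.0)` picks `c` and `selectLess(2.0, 1.0, 3.0, 4.0)` picks `d`.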

llvm-svn: 187706

Jul 31, 2013

Added INSERT and EXTRACT instructions from AVX-512 ISA. · 67b05fc0

Elena Demikhovsky authored Jul 31, 2013

All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
Added a test.

llvm-svn: 187491

Jul 09, 2013

AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all · 73de7bf5

Stephen Lin authored Jul 09, 2013

in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:

1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.

2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.

3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.

The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.

llvm-svn: 185956

Jun 22, 2013
- The getRegForInlineAsmConstraint function should only accept MVT value types. · 295bd43a
  Chad Rosier authored Jun 22, 2013
```
llvm-svn: 184642
```
Jun 07, 2013
- Don't cache the instruction and register info from the TargetMachine, because · 8f26840c
  Bill Wendling authored Jun 07, 2013
```
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183571
```
May 25, 2013

Track IR ordering of SelectionDAG nodes 2/4. · ef9de2a7

Andrew Trick authored May 25, 2013

Change SelectionDAG::getXXXNode() interfaces as well as call sites of
these functions to pass in SDLoc instead of DebugLoc.

llvm-svn: 182703

May 18, 2013
- Add LLVMContext argument to getSetCCResultType · 75865923
  Matt Arsenault authored May 18, 2013
```
llvm-svn: 182180
```
Apr 05, 2013

Use the target options specified on a function to reset the back-end. · eb108bad

Bill Wendling authored Apr 05, 2013

During LTO, the target options on functions within the same Module may
change. This would necessitate resetting some of the back-end. Do this for X86,
because it's a Friday afternoon.

llvm-svn: 178917

Mar 29, 2013
- Add support of RDSEED defined in AVX2 extension · a486a11d
  Michael Liao authored Mar 28, 2013
```
llvm-svn: 178314
```
Mar 26, 2013
- Add XTEST codegen support · 03f9ad0e
  Michael Liao authored Mar 26, 2013
```
llvm-svn: 178083
```
Mar 01, 2013

Fix PR10475 · 6af16fc3

Michael Liao authored Mar 01, 2013

- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic that relies on that invariant breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() is a simple scalar
  type.
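
The split between the two hooks can be pictured with a toy model. This is only a sketch under stated assumptions: `ValueType` is a hypothetical two-field stand-in for the real value-type classes, and the i8 result of the scalar hook is an arbitrary choice for illustration.

```cpp
#include <cassert>

// Hypothetical, heavily simplified value type: numElts == 0 means scalar.
struct ValueType {
    unsigned bits;     // width of the scalar, or of each vector element
    unsigned numElts;  // 0 for scalars, lane count for vectors
    bool isVector() const { return numElts != 0; }
    bool operator==(const ValueType &o) const {
        return bits == o.bits && numElts == o.numElts;
    }
};

// Stand-in for the renamed hook: the scalar type a target wants for shift
// amounts. Returning i8 is an assumption made for this example.
ValueType getScalarShiftAmountTy(ValueType /*lhs*/) { return ValueType{8, 0}; }

// Stand-in for the redefined hook: vector shifts take a shift amount of the
// same vector type as the first operand; scalar shifts defer to the target.
ValueType getShiftAmountTy(ValueType lhs) {
    return lhs.isVector() ? lhs : getScalarShiftAmountTy(lhs);
}
```

With this model a v4i32 shift gets a v4i32 amount, while an i64 shift gets whatever scalar type the target picked, so both operands are always scalar or always vector.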

llvm-svn: 176364

Feb 15, 2013
- The operand listing is very much outdated. · a1c6635c
  Eli Bendersky authored Feb 14, 2013
```
llvm-svn: 175220
```
Jan 29, 2013

Teach SDISel to combine fsin / fcos into a fsincos node if the following · 0e88c7d8

Evan Cheng authored Jan 29, 2013

conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.
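
For orientation, here is the kind of source pattern the three conditions describe, next to one that does not qualify. The function names are hypothetical, and whether the calls are actually merged depends on the target and optimization settings.

```cpp
#include <cassert>
#include <cmath>

struct SinCos {
    double sine;
    double cosine;
};

// Both calls take the same operand, sit in the same basic block, and both
// results are used, so this is a candidate for a single sincos operation.
SinCos sinCosPair(double angle) {
    return SinCos{std::sin(angle), std::cos(angle)};
}

// Only the sine is computed here, so there is no second output to pair up
// and nothing to combine.
double sineOnly(double angle) {
    return std::sin(angle);
}
```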

rdar://13087969
PR13204

llvm-svn: 173755

Jan 28, 2013
- Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction. · 8fb09f0a
  Craig Topper authored Jan 28, 2013
```
llvm-svn: 173667
```
Jan 21, 2013
- Make helper method static. · 2cd37589
  Craig Topper authored Jan 21, 2013
```
llvm-svn: 173005
```
Jan 20, 2013
- Capitalize lowerTRUNCATE so that it matches the other lower functions in this... · e65a08be
  Craig Topper authored Jan 20, 2013
```
Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards.

llvm-svn: 172994
```
- Make LowerVSETCC a static function and use MVT instead of EVT. · ce61fdf0
  Craig Topper authored Jan 20, 2013
```
llvm-svn: 172969
```
- Make some helper methods static. · 9976974c
  Craig Topper authored Jan 20, 2013
```
llvm-svn: 172936
```
- Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file. · bb772d27
  Craig Topper authored Jan 19, 2013
```
llvm-svn: 172927
```
Jan 09, 2013

Efficient lowering of vector sdiv when the divisor is a splatted power of two constant. · 977e0be4

Nadav Rotem authored Jan 09, 2013

PR 14848. The lowered sequence is based on the existing sequence the target-independent
DAG Combiner creates for the scalar case.
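
The scalar sequence referred to here is presumably the familiar bias-then-shift trick for signed division by 2^k. Below is a minimal scalar sketch, assuming 32-bit values and an arithmetic right shift on signed integers (guaranteed from C++20, and the behaviour of mainstream compilers before that). `sdivPow2` is a hypothetical helper, not code from the patch; a vector lowering would apply the same steps to every lane.

```cpp
#include <cassert>
#include <cstdint>

// Signed division by 2^k that rounds toward zero, like the IR sdiv.
// Valid for 1 <= k <= 30.
std::int32_t sdivPow2(std::int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);
    // Arithmetic shift: 0 for non-negative x, -1 (all ones) for negative x.
    std::int32_t sign = x >> 31;
    // Logical shift of the sign mask: 2^k - 1 for negative x, 0 otherwise.
    std::uint32_t bias = static_cast<std::uint32_t>(sign) >> (32 - k);
    // Adding the bias makes the final shift round toward zero, not down.
    std::int32_t biased =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(x) + bias);
    return biased >> k;
}
```

Without the bias, a plain arithmetic shift would turn -7 / 2 into -4; with it, the result is -3, matching signed division.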

Patch by Zvi Rackover.

llvm-svn: 171953

Jan 07, 2013

Switch TargetTransformInfo from an immutable analysis pass that requires · 664e354d

Chandler Carruth authored Jan 07, 2013

a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.

The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
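
The two tricks in this paragraph (delegating to the previous pass, and keeping a pointer to the top-most pass) can be sketched in a few lines of plain C++. Everything here is hypothetical: `CostInfo`, its two queries, and the three layers are invented for illustration and do not mirror the real pass infrastructure.

```cpp
#include <cassert>

// Hypothetical miniature of an "analysis group". The two queries stand in
// for a much larger interface.
class CostInfo {
public:
    virtual ~CostInfo() = default;

    // Stack this layer on top of `prev` and tell every layer in the chain
    // which layer is now top-most.
    void pushOnto(CostInfo *prev) {
        Prev = prev;
        for (CostInfo *layer = this; layer; layer = layer->Prev)
            layer->Top = this;
    }

    // Default behaviour: forward the query to the layer underneath.
    virtual unsigned getAddCost() const { return Prev->getAddCost(); }
    virtual unsigned getMulCost() const { return Prev->getMulCost(); }

protected:
    CostInfo *Prev = nullptr;
    CostInfo *Top = this;
};

// Bottom of the stack: it must answer every query, so it never delegates.
class NoCostInfo : public CostInfo {
public:
    unsigned getAddCost() const override { return 1; }
    unsigned getMulCost() const override { return 1; }
};

// Middle layer: implements one query in terms of another, asking through Top
// so that more precise layers above it are consulted.
class BasicCostInfo : public CostInfo {
public:
    unsigned getMulCost() const override { return 4 * Top->getAddCost(); }
};

// Target layer: overrides only the query it knows more about.
class TargetCostInfo : public CostInfo {
public:
    unsigned getAddCost() const override { return 2; }
};
```

Stacking `BasicCostInfo` on `NoCostInfo` gives a multiply cost of 4; adding `TargetCostInfo` on top raises it to 8 without the middle layer changing, because its multiply query reaches the target's add cost through the top-most pointer.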

The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.

The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.

The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

llvm-svn: 171681

Jan 04, 2013

LoopVectorizer: · e1d5c4b8

Nadav Rotem authored Jan 04, 2013

1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.

llvm-svn: 171469

Jan 03, 2013

Add a subtype parameter to VTTI::getShuffleCost · 95de3f30

Hal Finkel authored Jan 03, 2013

In order to cost subvector insertion and extraction, we need to know
the type of the subvector being extracted.

No functionality change.

llvm-svn: 171453

Dec 28, 2012

CostModel: initial checkin for code that estimates the cost of special shuffles. · 9785f519
Nadav Rotem authored Dec 28, 2012
```
llvm-svn: 171180
```

AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these... · 3da9ac72

Nadav Rotem authored Dec 28, 2012

AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend.

llvm-svn: 171178

Dec 27, 2012
- AVX/AVX2: Move the SEXT lowering code from a target specific DAGco to a lowering function. · 3b341901
  Nadav Rotem authored Dec 27, 2012
```
llvm-svn: 171170
```
Dec 21, 2012
- X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics · 4669d188
  Benjamin Kramer authored Dec 21, 2012
```
This is very mechanical, no functionality change. Preparation for PR14667.

llvm-svn: 170898
```
- Add a missing "virtual" keyword. · eacbb731
  Nadav Rotem authored Dec 21, 2012
```
llvm-svn: 170842
```
- Improve the X86 cost model for loads and stores. · 6d4fdd6d
  Nadav Rotem authored Dec 21, 2012
```
llvm-svn: 170830
```
Dec 19, 2012
- Change TargetLowering::getTypeForExtArgOrReturn to take and return · e09cac9a
  Patrik Hagglund authored Dec 19, 2012
```
MVTs, instead of EVTs.

llvm-svn: 170537
```
- Change TargetLowering::findRepresentativeClass to take an MVT, instead · f9eb168e
  Patrik Hagglund authored Dec 19, 2012
```
of EVT.

llvm-svn: 170532
```
Dec 17, 2012
- Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add... · f3ff6ae0
  Craig Topper authored Dec 17, 2012
```
Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible.

llvm-svn: 170305
```
Dec 15, 2012

X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. · b16ccde7

Benjamin Kramer authored Dec 15, 2012

We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases
if y is a constant. DAGCombiner canonicalizes those so we first have to undo the
canonicalization for those cases. The pattern occurs in gzip when the loop
vectorizer is enabled. Part of PR14613.
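
The equivalence being matched can be checked lane by lane in plain C++. This is a minimal sketch assuming eight unsigned 16-bit lanes; `selectForm` and `subusForm` are hypothetical names, and the second function only models what a saturating subtract computes rather than using the instruction itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint16_t, 8>;

// The select pattern from the commit message, written per lane.
Lanes selectForm(const Lanes &x, const Lanes &y) {
    Lanes out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = x[i] >= y[i] ? static_cast<std::uint16_t>(x[i] - y[i])
                              : std::uint16_t{0};
    return out;
}

// Unsigned saturating subtract: the difference clamps at zero instead of
// wrapping around.
Lanes subusForm(const Lanes &x, const Lanes &y) {
    Lanes out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        int diff = static_cast<int>(x[i]) - static_cast<int>(y[i]);
        out[i] = static_cast<std::uint16_t>(diff < 0 ? 0 : diff);
    }
    return out;
}
```

Both functions produce the same lanes for any input, which is why the select can be replaced by a single saturating subtract.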

llvm-svn: 170273

Dec 12, 2012

Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I · 962711ee

Evan Cheng authored Dec 12, 2012

mention the inline memcpy / memset expansion code is a mess?

This patch splits the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The latter is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.
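
To show why two separate flags are useful, here is a toy policy function. It is a sketch only: `pickMemOpType`, the `MemOpType` enum, and the size thresholds are hypothetical, and the real getOptimalMemOpType() hook takes more parameters than the three modelled here.

```cpp
#include <cassert>
#include <cstdint>

enum class MemOpType { Byte, Word64, Vector128 };

// Hypothetical policy: copies and zero fills may use wide vector stores,
// while a memset of a non-zero byte sticks to 64-bit scalar stores, on the
// assumption that splatting the byte into a vector is not worth it.
MemOpType pickMemOpType(std::uint64_t size, bool isMemset, bool zeroMemset) {
    assert(isMemset || !zeroMemset);  // zeroMemset only makes sense for memset
    if (size < 8)
        return MemOpType::Byte;
    bool cheapVector = !isMemset || zeroMemset;
    if (size >= 16 && cheapVector)
        return MemOpType::Vector128;
    return MemOpType::Word64;
}
```

A single combined flag could not distinguish a memcpy from a memset of zero, whereas here the two cases could be given different policies if a target wanted that.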

llvm-svn: 169959
