- Sep 05, 2016
-
Simon Pilgrim authored
llvm-svn: 280661
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D23983 llvm-svn: 280650
-
Craig Topper authored
[AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator not to use the extended XMM/YMM register space. Previously we were widening these copies to the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration, as the instructions to spill them aren't available. llvm-svn: 280648
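For illustration, a minimal standalone sketch of the opcode choice described above (hypothetical helper, not the actual X86InstrInfo::copyPhysReg code): 128/256-bit copies keep the VEX-encoded moves, and only genuine 512-bit copies use the EVEX ZMM move.

```cpp
// Hypothetical sketch, not LLVM code: pick a move opcode for a full-register
// vector copy under AVX-512 without VLX. The 128/256-bit cases stay on the
// VEX encodings and rely on the register allocator avoiding XMM16-31/YMM16-31,
// since those registers cannot be spilled in this configuration.
enum class VecCopyOpc { VMOVAPSrr, VMOVAPSYrr, VMOVAPSZrr };

VecCopyOpc pickVectorCopyOpcode(unsigned sizeInBits) {
  if (sizeInBits == 512)
    return VecCopyOpc::VMOVAPSZrr; // EVEX-encoded ZMM copy
  if (sizeInBits == 256)
    return VecCopyOpc::VMOVAPSYrr; // VEX-encoded YMM copy
  return VecCopyOpc::VMOVAPSrr;    // VEX-encoded XMM copy
}
```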
-
Craig Topper authored
llvm-svn: 280645
-
- Sep 04, 2016
-
Simon Pilgrim authored
llvm-svn: 280634
-
Craig Topper authored
[AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR. llvm-svn: 280633
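As a rough illustration of what "upgrade to native IR" means here, the masked arithmetic becomes a plain FP operation plus a select on the mask. A minimal sketch using IRBuilder (the helper name is illustrative, not the actual AutoUpgrade code):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Sketch: a masked vector fadd expressed in native IR. The mask selects, per
// lane, between the arithmetic result and the pass-through value.
Value *emitMaskedFAdd(IRBuilder<> &B, Value *A, Value *Bv,
                      Value *PassThru, Value *Mask) {
  Value *Sum = B.CreateFAdd(A, Bv);           // plain IR arithmetic
  return B.CreateSelect(Mask, Sum, PassThru); // masking via select
}
```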
-
Simon Pilgrim authored
llvm-svn: 280631
-
Simon Pilgrim authored
llvm-svn: 280629
-
Igor Breger authored
https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625
-
Simon Pilgrim authored
llvm-svn: 280624
-
Craig Topper authored
llvm-svn: 280611
-
- Sep 03, 2016
-
Xinliang David Li authored
CGP currently drops the select's MD_prof profile data when generating the conditional branch, which can lead to bad code layout. The patch fixes the issue. Differential Revision: http://reviews.llvm.org/D24169 llvm-svn: 280600
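A minimal sketch of the idea (hypothetical helper, not the exact CodeGenPrepare change): when the select is turned into a conditional branch, its branch-weight metadata should be carried over to the new branch rather than dropped.

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Carry the select's !prof branch weights over to the branch that replaces it.
void transferProfMetadata(SelectInst *Sel, BranchInst *Br) {
  if (MDNode *Prof = Sel->getMetadata(LLVMContext::MD_prof))
    Br->setMetadata(LLVMContext::MD_prof, Prof);
}
```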
-
Craig Topper authored
[AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test. llvm-svn: 280593
-
Craig Topper authored
llvm-svn: 280581
-
Wei Mi authored
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86. llvm-svn: 280568
-
- Sep 02, 2016
-
Andrea Di Biagio authored
This fixes a regression introduced by revision 268094. Revision 268094 added the following dag combine rule:

  // trunc (shl x, K) -> shl (trunc x), K  =>  K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2). The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this:

  // trunc (shl x, K) -> shl (trunc x), K  =>  K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482
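The bound can be checked with ordinary integer arithmetic; the standalone program below (not LLVM code) verifies that for an i64 -> i32 truncate, any shift amount below the scalar width of the truncated type gives the same result whether the truncate is applied before or after the shift.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t samples[] = {0x1234567890ABCDEFULL, ~0ULL, 1ULL};
  for (uint32_t k = 0; k < 32; ++k) {      // any k < 32 is safe for i64 -> i32
    for (uint64_t x : samples) {
      uint32_t truncOfShl = static_cast<uint32_t>(x << k); // trunc (shl x, k)
      uint32_t shlOfTrunc = static_cast<uint32_t>(x) << k; // shl (trunc x), k
      assert(truncOfShl == shlOfTrunc);
    }
  }
  // k >= 32 would make the narrowed shift undefined, which is exactly what the
  // corrected K < vt.getScalarSizeInBits() check rules out.
  return 0;
}
```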
-
Craig Topper authored
[AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded. llvm-svn: 280467
-
Craig Topper authored
[AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type. These are needed in order to remove the masked floating point logical operation intrinsics and use native IR. llvm-svn: 280465
-
Craig Topper authored
[AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same. llvm-svn: 280464
-
Michael Kuperstein authored
When expanding a SETCC for which the low half is known to evaluate to false, we can only throw it away for LT/GT comparisons, not LE/GE. This fixes PR29170. Differential Revision: https://reviews.llvm.org/D24151 llvm-svn: 280424
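A standalone illustration (not the legalizer code): splitting a 64-bit unsigned compare into 32-bit halves. When the low-half compare is false, the high-half compare alone gives the right answer for LT but not for LE.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // High halves equal, low half of a greater than low half of b, so both the
  // low-half "<" and "<=" compares evaluate to false.
  uint64_t a = 0x0000000100000005ULL, b = 0x0000000100000003ULL;
  uint32_t hiA = a >> 32, hiB = b >> 32;
  uint32_t loA = static_cast<uint32_t>(a), loB = static_cast<uint32_t>(b);
  assert(!(loA < loB) && !(loA <= loB));

  // LT: replacing the whole compare with the high-half compare is still correct.
  assert((a < b) == (hiA < hiB));
  // LE: the same replacement would be wrong -- the low half still matters when
  // the high halves are equal. This is the PR29170 situation.
  assert((a <= b) != (hiA <= hiB));
  return 0;
}
```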
-
- Sep 01, 2016
-
Michael Kuperstein authored
Prior to this, we could generate a vector_shuffle from an IR shuffle when the size of the result was exactly the sum of the sizes of the input vectors. If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8> and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of AVX2 combines:
  (vector_shuffle <mask> (concat_vectors t1, undef) (concat_vectors t2, undef))
into:
  (vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without the DAGBuilder change.)

llvm-svn: 280418
-
Andrey Turetskiy authored
According to the spec, the cvtdq2pd and cvtps2pd instructions don't require the memory operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table. Differential Revision: https://reviews.llvm.org/D23919 llvm-svn: 280402
-
Michael Kuperstein authored
Legalization tends to create anyext(trunc) patterns. This should always be combined - into either a single trunc, a single ext, or nothing if the types match exactly. But if we happen to combine the trunc first, we may pull the trunc away from the anyext or make it implicit (e.g. the truncate(extract) -> extract(bitcast) fold). To prevent this, we can avoid doing the fold, similarly to how we already handle fpround(fpextend). Differential Revision: https://reviews.llvm.org/D23893 llvm-svn: 280386
-
Elena Demikhovsky authored
Optimize FMA combined with FNEG: negation of the whole result, like -(a*b+c), and FNEG + FMA, like a*b-c or (-a)*b+c. The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892 Differential Revision: https://reviews.llvm.org/D23313 llvm-svn: 280368
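The identity being exploited can be checked numerically; the snippet below (standalone, not LLVM code) shows that negating a fused multiply-add result equals negating two of its inputs, which is what allows the FNEG to be folded into the FMA.

```cpp
#include <cassert>
#include <cmath>

int main() {
  const double vals[] = {1.5, -2.25, 3.0, 0.125};
  for (double a : vals)
    for (double b : vals)
      for (double c : vals)
        // -(a*b + c) as one fused op equals (-a)*b + (-c) as one fused op:
        // negation is exact, so the single rounding step matches.
        assert(-std::fma(a, b, c) == std::fma(-a, b, -c));
  return 0;
}
```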
-
Dean Michael Berris authored
Summary:
This change promotes the 'isTailCall(...)' member function to TargetInstrInfo as a query interface for determining on a per-target basis whether a given MachineInstr is a tail call instruction. We build upon this in the XRay instrumentation pass to emit special sleds for tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and function exit sleds. Form-wise, the sled comes before the "jmp" instruction that implements the tail call similar to how we do it for the function entry sled. Functionally, because we know this is a tail call, it behaves much like an exit sled -- i.e. at runtime we may use the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt, so that we can start intercepting these initially as exits, but also have the option to have different log entries to more accurately reflect that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
llvm-svn: 280334
-
- Aug 31, 2016
-
Tim Northover authored
More preparation for dropping source types from MachineInstrs: registers coming out of already-selected code (i.e. non-generic instructions) don't have a type, but that information is needed, so we must add it manually. This is done via a new G_TYPE instruction. llvm-svn: 280292
-
Philip Reames authored
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics.

Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
-
Simon Pilgrim authored
Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles. Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280249
-
Simon Pilgrim authored
Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280214
-
Craig Topper authored
This is needed in order to replace the masked floating point logical op intrinsics with native IR. llvm-svn: 280195
-
Craig Topper authored
[AVX-512] Add test cases for masked floating point logic operations with bitcasts between the logic ops and the select. We don't currently select masked operations for these cases. Test cases taken from optimized clang output after trying to convert the masked floating point logical op intrinsics to native IR. llvm-svn: 280194
-
Craig Topper authored
llvm-svn: 280193
-
Dean Michael Berris authored
Add a .mir test to catch this case, and fix the xray-instrumentation pass to handle it appropriately. llvm-svn: 280192
-
- Aug 29, 2016
-
Reid Kleckner authored
We should revert this change once we drop support for MSVC 2013. llvm-svn: 279979
-
Sanjay Patel authored
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of trapping behavior than any other FP op. I.e., FP ops do not trap with the default FP env. This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a similar bug in Constant::canTrap(). This bug manifests in PR29114: https://llvm.org/bugs/show_bug.cgi?id=29114 ...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> type. Differential Revision: https://reviews.llvm.org/D23974 llvm-svn: 279970
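A quick standalone check of the premise: under the default FP environment, dividing by zero raises a status flag and produces inf, but it does not trap.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float x = 1.0f, y = 0.0f;  // volatile keeps the division at run time
  float q = x / y;                    // default FP env: no trap, result is +inf
  std::printf("q = %f, isinf = %d, FE_DIVBYZERO = %d\n",
              q, std::isinf(q) ? 1 : 0,
              std::fetestexcept(FE_DIVBYZERO) ? 1 : 0);
  return 0;
}
```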
-
Krzysztof Parzyszek authored
MRI::getMaxLaneMaskForVReg does not always cover the whole register. For example, on X86 the upper 16 bits of EAX cannot be accessed via any subregister. Consequently, there is no lane mask that covers only that part of EAX. getMaxLaneMaskForVReg will return the union of the lane masks for all subregisters, and in the case of EAX, that union will not cover the upper 16 bits. This fixes https://llvm.org/bugs/show_bug.cgi?id=29132 llvm-svn: 279969
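A bit-level analogy (lane masks in LLVM are per-subregister-lane, not per-bit, so this is only a conceptual model): the union of EAX's addressable sub-registers never reaches the upper 16 bits.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Model EAX as 32 bits. Its addressable sub-registers are AX (bits 0-15),
  // AL (bits 0-7) and AH (bits 8-15); no sub-register covers bits 16-31.
  const uint32_t AX = 0x0000FFFFu, AL = 0x000000FFu, AH = 0x0000FF00u;
  const uint32_t unionOfSubRegs = AX | AL | AH; // analogue of the "max lane mask"
  assert(unionOfSubRegs == 0x0000FFFFu);        // upper 16 bits are not covered
  return 0;
}
```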
-
Igor Breger authored
The problem occurs when the node is not updated in place and UpdateNodeOperation() returns a node that already exists. In this case an assert fails in PromoteIntegerOperand(), since N has 2 results (val + chain). Differential Revision: http://reviews.llvm.org/D23756 llvm-svn: 279961
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960
-
Craig Topper authored
[X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered. This allows broadcast loads to be used when available. llvm-svn: 279958
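The constant in question is the usual sign-bit mask; below is a standalone sketch of the bit trick itself (not the backend lowering code).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// FABS clears the sign bit, FNEG flips it; x86 lowers these to AND/XOR with a
// mask constant, which is why how that constant is materialized matters.
float fabsViaMask(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0x7FFFFFFFu;                 // clear the sign bit
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

float fnegViaMask(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= 0x80000000u;                 // flip the sign bit
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

int main() {
  assert(fabsViaMask(-3.5f) == 3.5f && fabsViaMask(2.0f) == 2.0f);
  assert(fnegViaMask(4.25f) == -4.25f);
  return 0;
}
```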
-
Craig Topper authored
llvm-svn: 279956
-