Commits · b3f802265e048991c43fc004bf0645c7156fb83e · Roger Ferrer / llvm-epi

Jan 08, 2018
- [llvm-readobj] Support -needed-libs option for Mach-O files · b3f80226
  Petr Hosek authored Jan 08, 2018
```
This implements the -needed-libs option in Mach-O dumper.

Differential Revision: https://reviews.llvm.org/D41527

llvm-svn: 321980
```
- [X86] Simplify some code in lower1BitVectorShuffle by relying on getNode's... · 9f5859e3
  Craig Topper authored Jan 07, 2018
```
[X86] Simplify some code in lower1BitVectorShuffle by relying on getNode's ability to constant fold vector SIGN_EXTEND.

llvm-svn: 321979
```
- [X86] Add VSHUFF32X4 and similar instructions to load folding tables. · 03d8e516
  Craig Topper authored Jan 07, 2018
```
llvm-svn: 321978
```
Jan 07, 2018

Revert "[SCCP] Manually fold branches on undef." · e15bffe9

Davide Italiano authored Jan 07, 2018

I thought this was responsible for PR35723, but I was
wrong; the issue lies elsewhere. Revert while I debug.

llvm-svn: 321975


[SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). · 4c39758a

Davide Italiano authored Jan 07, 2018

The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested reverting this.

llvm-svn: 321974


[X86] Revert accidental change to CMakeLists.txt in r321952 · e9f44e1b

Craig Topper authored Jan 07, 2018

I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.

llvm-svn: 321971


X86 Tests: Add Tests for PMADDWD selection. NFC. · 93b8bd49
Zvi Rackover authored Jan 07, 2018
```
Support for ISel to be added.

llvm-svn: 321970
```

[DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcasts · 998180da

Simon Pilgrim authored Jan 07, 2018

Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplification in cases where bitcasts of constants generated during or after legalization can't be folded away and thus weren't picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering an lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.

Committed on behalf of @sameconrad (Sam Conrad)

Differential Revision: https://reviews.llvm.org/D41643

llvm-svn: 321969


[X86] Remove unneeded code from combineGatherScatter that used to delete... · c1ec57c3

Craig Topper authored Jan 07, 2018

[X86] Remove unneeded code from combineGatherScatter that used to delete SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL.

v2i1/v4i1 are now legal on KNL so no sign_extend_inreg is generated.

llvm-svn: 321968


[X86] Make v2i1 and v4i1 legal types without VLX · d58c1655

Craig Topper authored Jan 07, 2018

Summary:
There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.

It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
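
As a rough illustration of widening with 0s, the sketch below models a k register as a plain integer bit mask and vector lanes as list elements. It is not LLVM code; `widen_mask` and `masked_load` are hypothetical helper names, and the lane counts are chosen only for the example.

```python
def widen_mask(mask, narrow_lanes, wide_lanes):
    """Widen a narrow_lanes-bit mask to wide_lanes bits, zero-filling the top.

    Any bits above narrow_lanes are cleared so the extra lanes stay inactive.
    """
    if not 0 < narrow_lanes <= wide_lanes:
        raise ValueError("narrow_lanes must be in 1..wide_lanes")
    return mask & ((1 << narrow_lanes) - 1)


def masked_load(memory, base, mask, lanes, passthru):
    """Model a masked load: lane i reads memory only if mask bit i is set."""
    return [
        memory[base + i] if (mask >> i) & 1 else passthru[i]
        for i in range(lanes)
    ]


# A 4-lane load executed as a 16-lane operation. Memory holds only four
# elements, so the load is safe only because the upper mask bits are zero.
memory = [10, 20, 30, 40]
wide_mask = widen_mask(0b1011, narrow_lanes=4, wide_lanes=16)
result = masked_load(memory, 0, wide_mask, lanes=16, passthru=[0] * 16)
```

In this model the first four lanes carry the real result and the remaining twelve are never read from memory, which is the property the zero fill is meant to guarantee.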

This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.

We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.

I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.

There's definitely room for improvement with some follow up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967


[LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp · 0f1314c5

Hal Finkel authored Jan 07, 2018

Another small step forward to move VPlan stuff outside of LoopVectorize.cpp.

* VPlanBuilder.h is renamed to LoopVectorizationPlanner.h.

* The LoopVectorizationPlanner class is moved from LoopVectorize.cpp to
  LoopVectorizationPlanner.h.

* The LoopVectorizationCostModel::VectorizationFactor class is moved to
  LoopVectorizationPlanner.h (used by the planner class). This needs further
  streamlining work in later patches, so all I did was take it out of the
  CostModel class and move it to the header file.

* The callback function had to stay inside LoopVectorize.cpp since it calls an
  InnerLoopVectorizer member function declared in it.

Next steps: make InnerLoopVectorizer, LoopVectorizationCostModel, and other
classes more modular and more aligned with the VPlan direction, in small
increments.

Previous step was: r320900 (https://reviews.llvm.org/D41045)

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41420

llvm-svn: 321962


[CodeExtractor] Use subset of function attributes for extracted function. · 55be37e7

Florian Hahn authored Jan 07, 2018

In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:

* attributes related to memory accesses, as alloca instructions
  could be moved in/out of the extracted block

* control-flow dependent attributes, like no_return or thunk, as the
  relevant instructions might or might not get extracted.

Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.


Reviewers: efriedma, davidxl, davide, silvas

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41334

llvm-svn: 321961


[PowerPC] Add an ISD::TRUNCATE to the legalization for ppc_is_decremented_ctr_nonzero · d461aefe

Craig Topper authored Jan 07, 2018

Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.

There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.

With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D41654

llvm-svn: 321959


[X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables. · a21f5511
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321958
```

[X86] Correct the load folding flags for xmm fp->mmx conversion instructions. · d0859a03

Craig Topper authored Jan 07, 2018

The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.

llvm-svn: 321956


[X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load folding table. · aa739411
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321955
```
[X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables. · 85657d59
Craig Topper authored Jan 07, 2018
```
EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.

llvm-svn: 321954
```
[X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables. · 89293a2a
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321953
```
[X86] Add some 8 and 16-bit instructions to the load folding tables. · a124ab10
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321952
```
[X86] Add EVEX vcvtph2ps to the load folding tables. · 11aede13
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321951
```

[X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex... · 40cc8338

Craig Topper authored Jan 07, 2018

[X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.

The memory form of the xmm->xmm version only writes 64 bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128 bits, which will pull in garbage. But without the spill, the upper bits of the register would have been zero. By not folding, we preserve the zeros.
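
The hazard can be modeled with plain byte buffers. This is only a sketch, assuming a 16-byte spill slot that still holds stale data from an earlier use; the function names are hypothetical and do not correspond to anything in LLVM.

```python
SLOT_BYTES = 16    # size of an xmm spill slot
RESULT_BYTES = 8   # the xmm->xmm cvtps2ph result occupies the low 64 bits


def register_form(result):
    """Register destination: low 64 bits hold the result, upper 64 are zero."""
    if len(result) != RESULT_BYTES:
        raise ValueError("result must be 8 bytes")
    return result + bytes(SLOT_BYTES - RESULT_BYTES)


def folded_store_then_reload(result, stale_slot):
    """Store folded into the instruction: only 8 bytes of the slot are written,
    but the reload reads all 16, so the upper half is whatever was there."""
    if len(result) != RESULT_BYTES or len(stale_slot) != SLOT_BYTES:
        raise ValueError("unexpected buffer size")
    slot = bytearray(stale_slot)
    slot[:RESULT_BYTES] = result
    return bytes(slot)


def spill_then_reload(result, stale_slot):
    """No folding: the full 16-byte register is spilled, zeros included."""
    if len(stale_slot) != SLOT_BYTES:
        raise ValueError("unexpected buffer size")
    slot = bytearray(stale_slot)
    slot[:] = register_form(result)
    return bytes(slot)


result = bytes(range(1, 9))
stale = b"\xaa" * SLOT_BYTES
folded = folded_store_then_reload(result, stale)
unfolded = spill_then_reload(result, stale)
```

Here `unfolded` matches the register contents, while the upper half of `folded` is stale slot data.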

llvm-svn: 321950


[X86] Add CMP8ri8 to load folding tables. · 8fa800b8
Craig Topper authored Jan 07, 2018
```
llvm-svn: 321949
```

Jan 06, 2018

[X86] Remove assembler predicates from all AVX512 related feature flags. · cf93feb9

Craig Topper authored Jan 06, 2018

We don't do fine grained feature control like this on features prior to AVX512.

We do still have checks in place in the assembly parser itself that prevent %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough.

llvm-svn: 321947


[X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm matcher table. · 61d8a60e
Craig Topper authored Jan 06, 2018
```
This is also needed to fix PR35837.

llvm-svn: 321946
```
[X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si. · 0f4ccb78
Craig Topper authored Jan 06, 2018
```
llvm-svn: 321945
```

[X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no... · 90353a9f

Craig Topper authored Jan 06, 2018

[X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead.

For consistency use the _Int version of VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int for the assembler as well.

llvm-svn: 321944


[InlineFunction] Preserve calling convention when forwarding VarArgs. · a82eef23

Florian Hahn authored Jan 06, 2018

Reviewers: efriedma, rnk, davide

Reviewed By: rnk, davide

Differential Revision: https://reviews.llvm.org/D41556

llvm-svn: 321943


[InlineFunction] Preserve attributes when forwarding VarArgs. · de10e6e0

Florian Hahn authored Jan 06, 2018

Reviewers: rnk, efriedma

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D41555

llvm-svn: 321942


[ORC] Remove AsynchronousSymbolQuery while I debug an issue on one of the · 0b93cd73
Lang Hames authored Jan 06, 2018
```
builders.

llvm-svn: 321941
```

[InlineFunction] Inline vararg functions that do not access varargs. · 80788d80

Florian Hahn authored Jan 06, 2018

If the varargs are not accessed by a function, we can inline the
function.

Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D41335

llvm-svn: 321940


[X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the assembler matcher table · a49c354a

Craig Topper authored Jan 06, 2018

We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.

Fixes PR35837.

llvm-svn: 321939


[TableGen] Make the ambiguous match debug messages from the AsmMatcherEmitter slightly more useful. · ad89541a
Craig Topper authored Jan 06, 2018
```
Don't report ambiguous matches on different variants. Print the variant number in the output.

llvm-svn: 321938
```

[InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b) · 26a6fcde

Sanjay Patel authored Jan 06, 2018

In the minimal case, this won't remove instructions, but it still improves
uses of existing values.

In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603
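
The fold named in the title relies on bitwise NOT reversing the ordering of values, so that max(~a, ~b) equals ~min(a, b) and vice versa. A minimal sketch that checks this exhaustively for 8-bit values, in both unsigned and signed interpretations; the helper names are hypothetical and this is not the InstCombine implementation.

```python
BITS = 8
MASK = (1 << BITS) - 1


def bit_not(x):
    """Bitwise NOT of an unsigned BITS-wide value."""
    return ~x & MASK


def to_signed(x):
    """Reinterpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def umax(a, b):
    return max(a, b)


def umin(a, b):
    return min(a, b)


def smax(a, b):
    return a if to_signed(a) >= to_signed(b) else b


def smin(a, b):
    return a if to_signed(a) <= to_signed(b) else b


def identity_holds(max_fn, min_fn):
    """Check max(~a, ~b) == ~min(a, b) and min(~a, ~b) == ~max(a, b)."""
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            if max_fn(bit_not(a), bit_not(b)) != bit_not(min_fn(a, b)):
                return False
            if min_fn(bit_not(a), bit_not(b)) != bit_not(max_fn(a, b)):
                return False
    return True


unsigned_ok = identity_holds(umax, umin)
signed_ok = identity_holds(smax, smin)
```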

llvm-svn: 321936


[InstCombine] add more tests for max(~a, ~b) and PR35834; NFC · f7e77529
Sanjay Patel authored Jan 06, 2018
```
llvm-svn: 321935
```

[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325) · 5a48aef3

Sanjay Patel authored Jan 06, 2018

This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branches and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
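
The trade described above can be sketched for an equality-only comparison: each block loads two pairs of chunks, XORs each pair, ORs the differences together, and branches once per block instead of once per load pair. The 8-byte load size, the requirement that the length be a multiple of it, and the function names are assumptions made for this illustration; this is not the MemCmpExpansion code.

```python
def load(buf, offset, size):
    """Model a fixed-size load as a little-endian integer."""
    return int.from_bytes(buf[offset:offset + size], "little")


def blocks_equal(a, b, length, load_size=8, loads_per_block=2):
    """Return True if a[:length] == b[:length].

    Simplifying assumption: length is a multiple of load_size.
    """
    if length % load_size:
        raise ValueError("length must be a multiple of load_size")
    if len(a) < length or len(b) < length:
        raise ValueError("buffers are shorter than length")
    block_bytes = load_size * loads_per_block
    offset = 0
    while offset < length:
        end = min(offset + block_bytes, length)
        diff = 0
        for off in range(offset, end, load_size):
            # XOR is zero only when the two chunks match; OR accumulates.
            diff |= load(a, off, load_size) ^ load(b, off, load_size)
        if diff != 0:  # a single branch covers every load pair in the block
            return False
        offset = end
    return True


lhs = bytes(range(32))
rhs = bytearray(lhs)
rhs[17] ^= 1  # introduce a difference in the third 8-byte chunk
rhs = bytes(rhs)
```

With these buffers, `blocks_equal(lhs, rhs, 16)` is true because the difference lies beyond the first block, while `blocks_equal(lhs, rhs, 32)` is false.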

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934


[X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFC · b18d6221
Craig Topper authored Jan 06, 2018
```
This makes the names consistent with the mnemonics like every other instruction.

llvm-svn: 321931
```

[X86] When parsing rounding mode operands, provide a proper end location so we... · 36d8da33

Craig Topper authored Jan 06, 2018

[X86] When parsing rounding mode operands, provide a proper end location so we don't crash when trying to print an error message using it.

llvm-svn: 321930


[X86] Call lowerShuffleAsRepeatedMaskAndLanePermute from lowerV4I64VectorShuffle. · 8c2ea74e
Craig Topper authored Jan 06, 2018
```
llvm-svn: 321929
```
[X86] Run dos2unix on a test file. NFC · af1d2575
Craig Topper authored Jan 06, 2018
```
llvm-svn: 321928
```
[ORC] Yet more debugging output to diagnose test failures. · 4b6cae19
Lang Hames authored Jan 06, 2018
```
llvm-svn: 321927
```