Commits · 1d68112c4b9ec7502ed9776555fd499cb2483347 · Lorenzo Albano / LLVM bpEVL

Jan 25, 2018

[InstCombine] narrow masked zexted binops (PR35792) · 1d68112c

Sanjay Patel authored Jan 25, 2018

This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792

Alive proofs for the cases handled here as well as the bitwise 
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97
https://rise4fun.com/Alive/Lc5E
https://rise4fun.com/Alive/kdf

llvm-svn: 323437
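
As a rough illustration of why this kind of narrowing is sound, the Python sketch below models one plausible shape of the fold, and (binop (zext X), C), (zext X) --> zext (and (binop X, trunc C), X), for i8 values widened to i32. The pattern shape, the widths, and every function name are assumptions made for illustration; they are not taken from the patch, and the shouldChangeType() legality guard is not modeled.

```python
# Toy model (not LLVM code): exhaustively compare a masked binop computed in a
# wide type against the same computation done in the narrow type.
import operator

NARROW_BITS, WIDE_BITS = 8, 32            # assumed widths, for illustration only
NARROW_MASK = (1 << NARROW_BITS) - 1
WIDE_MASK = (1 << WIDE_BITS) - 1


def wide_form(op, x, c):
    """and (binop (zext x), c), (zext x), evaluated in the wide type."""
    zx = x                                # zext keeps the unsigned value
    return (op(zx, c) & WIDE_MASK) & zx


def narrow_form(op, x, c):
    """zext (and (binop x, trunc c), x), evaluated in the narrow type."""
    narrow_c = c & NARROW_MASK            # trunc c
    return (op(x, narrow_c) & NARROW_MASK) & x


def check(op, constants):
    """True if both forms agree for every i8 value and every listed constant."""
    return all(wide_form(op, x, c) == narrow_form(op, x, c)
               for x in range(1 << NARROW_BITS)
               for c in constants)


CONSTANTS = [0, 1, 44, 255, 256, 0x12345678, WIDE_MASK]
RESULTS = {name: check(op, CONSTANTS)
           for name, op in [("add", operator.add),
                            ("sub", operator.sub),
                            ("mul", operator.mul)]}
print(RESULTS)
```

The two forms agree because the mask has no bits set above the narrow width, and the low bits of add, sub and mul depend only on the low bits of their operands.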


[InstCombine] add tests for PR35792; NFC · 0f95dd23
Sanjay Patel authored Jan 25, 2018
```
llvm-svn: 323436
```
Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." · a0b2c78e
Alexey Bataev authored Jan 25, 2018
```
This reverts commit r323430 to fix buildbots.

llvm-svn: 323432
```

[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. · ad51fe36

Alexey Bataev authored Jan 25, 2018

Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as a ShuffleInstr with the
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323430


[X86][SSE] Add tests for vector truncation with signed saturation · fb01d066
Simon Pilgrim authored Jan 25, 2018
```
AVX512 isn't using X86ISD::VTRUNCS and SSE/AVX isn't using PACKSS/PACKUS

llvm-svn: 323428
```
[X86][SSE] Add tests for vector truncation with unsigned saturation · e59bf81e
Simon Pilgrim authored Jan 25, 2018
```
AVX512 tends to do a good job, but there are some missed opportunities with SSE/AVX

llvm-svn: 323422
```

X86 Tests: Add AVX+XOP config to SDIV combine tests · 0fb9638e

Zvi Rackover authored Jan 25, 2018

As pointed out in D42479, XOP also needs to be covered as it supports
vector shifts with variable shift amount.

llvm-svn: 323418


Another try to commit 323321 (aggressive instruction combine). · f1f57a31
Amjad Aboud authored Jan 25, 2018
```
llvm-svn: 323416
```

[GlobalOpt] Emit fragments using field offsets from struct layout · 886edf8f

Mikael Holmen authored Jan 25, 2018

Summary:
When creating the debug fragments for a SRA'd struct, use the fields'
offsets, taken from the struct layout, as the offsets for the resulting
fragments. This fixes an issue where GlobalOpt would emit fragments with
incorrect offsets for padded fields.

This should solve PR36016.

Patch by David Stenberg.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42489

llvm-svn: 323411
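
To make the padding issue concrete, here is a small Python sketch that computes debug-fragment offsets for a hypothetical struct both from a padded layout and from a naive running sum of field sizes. The field list, the helper names, and the "naive" scheme are assumptions for illustration; the commit message does not say how the incorrect offsets were previously derived.

```python
# Toy model (not LLVM code): fragment offsets from a padded struct layout
# versus offsets obtained by simply summing field sizes.

def layout_offsets(fields):
    """Byte offset of each field when every field is aligned to its alignment."""
    offsets, pos = [], 0
    for _name, size, align in fields:
        pos = (pos + align - 1) // align * align   # round up to the alignment
        offsets.append(pos)
        pos += size
    return offsets


def packed_offsets(fields):
    """Byte offset of each field if padding is ignored (the buggy assumption)."""
    offsets, pos = [], 0
    for _name, size, _align in fields:
        offsets.append(pos)
        pos += size
    return offsets


def fragments(fields, offsets):
    """(name, offset_in_bits, size_in_bits) for each field."""
    return [(name, off * 8, size * 8)
            for (name, size, _align), off in zip(fields, offsets)]


# Hypothetical struct: { i8 a; i32 b; i16 c } as (name, size, alignment) in bytes.
FIELDS = [("a", 1, 1), ("b", 4, 4), ("c", 2, 2)]

FROM_LAYOUT = fragments(FIELDS, layout_offsets(FIELDS))
FROM_SIZES = fragments(FIELDS, packed_offsets(FIELDS))
print(FROM_LAYOUT)
print(FROM_SIZES)
```

Only the layout-based offsets place `b` at bit 32, after the three padding bytes that follow `a`.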


[IRMover] Add comment and fix test case · 41e45955
Eugene Leviant authored Jan 25, 2018
```
llvm-svn: 323407
```

[X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model. · b369cdba

Craig Topper authored Jan 25, 2018

The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.

This changes them all to use explicit instrs instead.

While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.

llvm-svn: 323406


[X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC · dbddac09

Craig Topper authored Jan 25, 2018

MMX instructions all start with MMX_ so the 64 isn't needed for disambiguation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.

llvm-svn: 323402


Jan 24, 2018

[GlobalISel] Add a requires: asserts to a test. · 5ee03988
Amara Emerson authored Jan 24, 2018
```
llvm-svn: 323384
```

[InstCombine] fix datalayout in test file · 60c13c77

Sanjay Patel authored Jan 24, 2018

The only part of the datalayout that should matter for these tests
is the part that specifies the legal int widths ('n*'). But there
was a bug - that part of the string was not correctly separated with
the expected '-' character, so we were testing as if there were no
legal int widths at all. Removed the leading cruft so we have some 
legal ints to test with.

I noticed this while testing a potential change to the way we 
transform shifts and sexts in D42424.

llvm-svn: 323377


[AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load. · 4f84f886

Amara Emerson authored Jan 24, 2018

The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.

llvm-svn: 323371


[GlobalISel] Don't fall back to FastISel. · f386e2b0

Amara Emerson authored Jan 24, 2018

Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.

llvm-svn: 323369


[X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more leading zeros · 9f551ad6

Simon Pilgrim authored Jan 24, 2018

As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB the resulting code is no faster, but also no slower, so we may as well keep it.
On KNL it has only half the throughput, so I've disabled it there; ideally there'd be a better way than this.

Differential Revision: https://reviews.llvm.org/D42258

llvm-svn: 323367
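
The "17 or more leading zeros" condition can be checked with a small model of a single 32-bit lane. The sketch below assumes the usual PMADDWD semantics (signed 16x16 products of adjacent halves, summed into 32 bits) and uses made-up helper names; it is an illustration, not the lowering code.

```python
# Toy model (not LLVM code): one 32-bit lane of PMADDWD versus PMULLD.

def to_signed16(v):
    """Interpret the low 16 bits of v as a signed 16-bit integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def pmaddwd_lane(a, b):
    """Signed products of the low halves and of the high halves, summed."""
    lo = to_signed16(a) * to_signed16(b)
    hi = to_signed16(a >> 16) * to_signed16(b >> 16)
    return (lo + hi) & 0xFFFFFFFF


def pmulld_lane(a, b):
    """Low 32 bits of the 32x32 product."""
    return (a * b) & 0xFFFFFFFF


def lanes_agree(values):
    return all(pmaddwd_lane(a, b) == pmulld_lane(a, b)
               for a in values for b in values)


# Values with at least 17 leading zero bits: high half is 0, low half is a
# non-negative signed 16-bit number.
SMALL = [0, 1, 2, 1234, 0x4000, 0x7FFE, 0x7FFF]
AGREE_SMALL = lanes_agree(SMALL)

# With only 16 leading zeros, bit 15 is read as a sign bit and the results differ.
MISMATCH = pmaddwd_lane(0x8000, 1) != pmulld_lane(0x8000, 1)
print(AGREE_SMALL, MISMATCH)
```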


[X86][SSE] Add slow-pmulld attribute (silvermont-style) test · 21f17d40
Simon Pilgrim authored Jan 24, 2018
```
Requested by @zvi on D42258

llvm-svn: 323364
```
Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." · 0affccc8
Alexey Bataev authored Jan 24, 2018
```
This reverts commit r323348 because of the broken buildbots.

llvm-svn: 323359
```
Revert "[ThinLTO] Add call edges' relative block frequency to per-module summary." · bf38deef
Easwaran Raman authored Jan 24, 2018
```
Causes buildbot regressions.

llvm-svn: 323358
```

Revert r321751, "StructurizeCFG: Fix broken backedge detection" · 4afb64e4

Nicolai Haehnle authored Jan 24, 2018

It causes regressions in various OpenGL test suites.

Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.

Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355


[ARM] Expand long shifts for Thumb1 to __aeabi_ calls · 665784f1

Weiming Zhao authored Jan 24, 2018

Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

llvm-svn: 323354


[X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW · 05af43fb
Craig Topper authored Jan 24, 2018
```
The weirdest being that PEXTRWrr was tagged as a memory operation.

llvm-svn: 323353
```

[X86] Adjust names of PINSRW/PEXTRW instructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI · b85b484f

Craig Topper authored Jan 24, 2018

llvm-svn: 323352


[ThinLTO] Add call edges' relative block frequency to per-module summary. · 5f7aff9a

Easwaran Raman authored Jan 24, 2018

Summary:
This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D42212

llvm-svn: 323349


[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. · 4bd8e533

Alexey Bataev authored Jan 24, 2018

Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and the cost of
this gather is counted as the cost of InsertElementInstrs for each gathered
value. But we can consider these elements as a ShuffleInstr with the
SK_PermuteSingle shuffle kind.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38697

llvm-svn: 323348


[Hexagon] Run late copy propagation and dead code elimination passes · cf3ad584
Krzysztof Parzyszek authored Jan 24, 2018
```
llvm-svn: 323346
```

InstSimplify: If divisor element is undef simplify to undef · 51f0d64b

Zvi Rackover authored Jan 24, 2018

Summary:
If any vector divisor element is undef, we can arbitrarily choose it to be
zero, which would make the div/rem an undef value by definition.

Reviewers: spatel, reames

Reviewed By: spatel

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42485

llvm-svn: 323343
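
A toy sketch of the rule described in the summary, using a hypothetical UNDEF sentinel and plain Python lists in place of LLVM constant vectors. It only mirrors the stated condition and is not the InstSimplify code.

```python
# Toy model (not LLVM code): fold a vector div/rem to undef when any divisor
# element is undef.

UNDEF = object()   # hypothetical stand-in for an undef vector element


def simplify_div_by_vector(divisor_elements):
    """Return UNDEF when the whole div/rem can fold to undef, else None."""
    if any(elt is UNDEF for elt in divisor_elements):
        # An undef element may be chosen to be zero, so the result is undef.
        return UNDEF
    return None      # no simplification from this rule


FOLDED = simplify_div_by_vector([4, UNDEF, 2, 8])
NOT_FOLDED = simplify_div_by_vector([4, 1, 2, 8])
```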


[ValueTracking] add recursion depth param to matchSelectPattern · 1d91ec34

Sanjay Patel authored Jan 24, 2018

We're getting bug reports:
https://bugs.llvm.org/show_bug.cgi?id=35807
https://bugs.llvm.org/show_bug.cgi?id=35840
https://bugs.llvm.org/show_bug.cgi?id=36045
...where we blow up the stack in value tracking because other passes are sending 
in selects that have an operand that is itself the select.

We don't currently have a reliable way to avoid analyzing dead code that may take 
non-standard forms, so bail out when things go too far.

This mimics the recursion depth limitations in other parts of value tracking.

Unfortunately, this pushes the underlying problems for other passes (jump-threading,
simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those
again, the first draft of this patch on Phab would do that (it would assert rather
than bail out).

Differential Revision: https://reviews.llvm.org/D42442

llvm-svn: 323331
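
The bail-out can be pictured with a minimal depth-limited walk over a select that refers to itself. The Select class, the find_leaf helper, and the limit of 6 are all hypothetical names and values chosen for this sketch; the real matchSelectPattern interface is not reproduced here.

```python
# Toy model (not LLVM code): a recursive walk that gives up after a fixed
# depth instead of recursing forever on a self-referential select.

MAX_DEPTH = 6   # hypothetical limit, for illustration only


class Select:
    def __init__(self, name):
        self.name = name
        self.true_value = None
        self.false_value = None


def find_leaf(value, depth=0):
    """Follow the true operand of nested selects down to a non-select leaf.

    Returns None (the "bail out" answer) once MAX_DEPTH levels were visited.
    """
    if depth == MAX_DEPTH:
        return None
    if isinstance(value, Select):
        return find_leaf(value.true_value, depth + 1)
    return value


# A select whose operand is itself, as can appear in dead code.
cyclic = Select("s")
cyclic.true_value = cyclic
cyclic.false_value = 0

# An ordinary select with a constant operand.
normal = Select("t")
normal.true_value = 42
normal.false_value = 0

CYCLIC_RESULT = find_leaf(cyclic)
NORMAL_RESULT = find_leaf(normal)
```

Returning a "don't know" answer keeps the analysis conservative; as the message notes, it also hides the malformed input from whoever produced it.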


X86 Tests: Add more sdiv combine cases. NFC · 22bfa7e5
Zvi Rackover authored Jan 24, 2018
```
Add cases with vector non-splat pow2 constant divisor.

llvm-svn: 323329
```
Regenerate shuffle sink test · f15886eb
Simon Pilgrim authored Jan 24, 2018
```
llvm-svn: 323328
```
Reverted 323321. · d53504e3
Amjad Aboud authored Jan 24, 2018
```
llvm-svn: 323326
```

[AArch64] Avoid unnecessary vector byte-swapping in big-endian · 9b3d4c01

Pablo Barrio authored Jan 24, 2018

Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but the same vector size. This is not a
problem in little-endian but, in big-endian, it requires additional
byte reversals to preserve the lane ordering while keeping the right
endianness of the data inside each lane.
For example:

  %1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

  ld1 { v0.2s }, [x1]
  rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

  ld1 { v0.4h }, [x1]

Reviewers: olista01, SjoerdMeijer, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42235

llvm-svn: 323325


[DebugInfo] Emit DWARF reference for DIVariable 'count' in DISubrange · dc00becd

Sander de Smalen authored Jan 24, 2018

Summary:
This patch implements the codegen of DWARF debug info for non-constant
'count' fields for DISubrange.

This is patch [2/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.

Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie

Reviewed By: aprantl

Subscribers: fhahn, aemerson, rengolin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41696

llvm-svn: 323323


[InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine). · e4453233

Amjad Aboud authored Jan 24, 2018

Combine expression patterns to form expressions with fewer, simpler instructions.
This pass does not modify the CFG.

For example, this pass reduces the width of expressions post-dominated by a
TruncInst to a smaller width when applicable.

It differs from the instcombine pass in that it contains pattern optimizations
that require more than O(1) complexity; thus, it should run fewer times than
the instcombine pass.

Differential Revision: https://reviews.llvm.org/D38313

llvm-svn: 323321
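
The width reduction mentioned above can be sanity-checked on a small expression tree. The sketch below evaluates a made-up expression, ((a + b) * c) ^ a, once in 32 bits followed by a truncation to 8 bits and once directly in 8 bits. The expression, the widths, and the helper names are assumptions for illustration; the commit message does not list which operations the pass handles.

```python
# Toy model (not LLVM code): an expression computed in a wide type and then
# truncated gives the same result as the expression computed in the narrow type.

def eval_expr(a, b, c, bits):
    """Evaluate ((a + b) * c) ^ a with every step wrapped to `bits` bits."""
    mask = (1 << bits) - 1
    total = (a + b) & mask
    product = (total * c) & mask
    return (product ^ a) & mask


def trunc(value, bits):
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


SAMPLES = range(0, 256, 5)   # a spread of i8 inputs
SAME = all(trunc(eval_expr(a, b, c, 32), 8) == eval_expr(a, b, c, 8)
           for a in SAMPLES for b in SAMPLES for c in SAMPLES)
print(SAME)
```

This holds for operations such as add, mul and xor, whose low result bits depend only on the low bits of their operands. Operations like logical right shift or division would need extra conditions.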


[Metadata] Extend 'count' field of DISubrange to take a metadata node · fdf40917

Sander de Smalen authored Jan 24, 2018

Summary:
This patch extends the DISubrange 'count' field to take either a
(signed) constant integer value or a reference to a DILocalVariable
or DIGlobalVariable.

This is patch [1/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.

Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie

Reviewed By: aprantl

Subscribers: rnk, probinson, fhahn, aemerson, rengolin, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41695

llvm-svn: 323313


[DAGCombiner] Bail out if vector size is not a multiple · e8404780

Sven van Haastregt authored Jan 24, 2018

For the included test case, the DAG transformation
  concat_vectors(scalar, undef) -> scalar_to_vector(sclr)
would attempt to create a v2i32 vector for a v9i8
concat_vector.  Bail out to avoid creating a bitcast with
mismatching sizes later on.

Differential Revision: https://reviews.llvm.org/D42379

llvm-svn: 323312
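
The size mismatch is simple arithmetic: a v9i8 vector is 72 bits wide, which is not a multiple of a 32-bit scalar, so a v2i32 (64 bits) cannot be bitcast to it. The Python sketch below shows the kind of divisibility check involved; the function name is hypothetical and the 32-bit scalar width is inferred from the "v2i32" in the message.

```python
# Toy model (not LLVM code): decide whether a vector of `scalar_bits`-wide
# elements can exactly cover a vector of `vec_elts` x `elt_bits`.

def scalar_vector_shape(scalar_bits, vec_elts, elt_bits):
    """Return (element_count, element_bits), or None to bail out."""
    total_bits = vec_elts * elt_bits
    if total_bits % scalar_bits != 0:
        return None                       # sizes would not match; give up
    return (total_bits // scalar_bits, scalar_bits)


V9I8 = scalar_vector_shape(32, 9, 8)      # 72 bits: not a multiple of 32
V8I8 = scalar_vector_shape(32, 8, 8)      # 64 bits: fits as v2i32
print(V9I8, V8I8)
```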


[NFC] Remove overconfident assert from IRCE · 0f720e12

Max Kazantsev authored Jan 24, 2018

This patch removes an assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV is sometimes unable to do this because
its cache does not update properly. The assert will be restored once this
problem is resolved.

llvm-svn: 323309


[ARM] Call __chkstk for dynamic stack allocation in all windows environments · 4ed94a06

Martin Storsjö authored Jan 24, 2018

This matches what MSVC does for alloca() function calls on ARM.
Even though MSVC doesn't support VLAs at the language level, it does
support the alloca function.

At the clang level, the _alloca() builtin (which is what alloca()
expands to when emulating MSVC), the __builtin_alloca() builtin, and
VLAs all map to the same LLVM IR "alloca" instruction, so within LLVM
they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

llvm-svn: 323308


[GlobalMerge] Don't merge dllexport globals · e8248f2e

Martin Storsjö authored Jan 24, 2018

Merging such globals loses the dllexport attribute. Add a test
to check that normal globals are still merged.

Differential Revision: https://reviews.llvm.org/D42127

llvm-svn: 323307
