Commits · 3fbacd4964edb44bce797de8fe248512a835524c · Lorenzo Albano / LLVM bpEVL

Feb 11, 2019

[NFC][ARM] Simplify loop-indexing codegen test · 3fbacd49

Sam Parker authored Feb 11, 2019

Remove unnecessary offset checks, CHECK-BASE checks and add some
extra -NOT checks and TODO comments.

llvm-svn: 353689

3fbacd49

[DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics. · e848d426

Carlos Alberto Enciso authored Feb 11, 2019

Check that when SimplifyCFG is flattening a 'br', all of its debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

As the test case involves a CFG transformation, move it to the correct location.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353682

e848d426

[ARM] LoadStoreOptimizer: reorder limit · 150ccb88

Sjoerd Meijer authored Feb 11, 2019

The whole design of generating LDMs/STMs is fragile and unreliable: it depends on
rescheduling here in the LoadStoreOptimizer that isn't register-pressure aware
and on regalloc that isn't aware of generating LDMs/STMs.
This patch adds a (hidden) option to control the total number of instructions that
can be reordered. I appreciate this looks only a tiny bit better than a hard-coded
constant, but at least it allows easier experimentation with different values
for now. Ideally we would calculate this reorder limit based on some heuristics and
take register pressure into account. I might look into that next.

Differential Revision: https://reviews.llvm.org/D57954

llvm-svn: 353678

150ccb88

Feb 10, 2019

[GlobalISel] Regex the opcodes in unit test to fix non-deterministic ordering · ea246114

Mandeep Singh Grang authored Feb 10, 2019

Differential Revision: https://reviews.llvm.org/D57988

llvm-svn: 353652

ea246114

[CodeGen][X86] Don't scalarize vector saturating add/sub · a0e96bd5

Nikita Popov authored Feb 10, 2019

Now that we have vector support for [US](ADD|SUB)O we no longer
need to scalarize when expanding [US](ADD|SUB)SAT.

This matches what the cost model already does.

Differential Revision: https://reviews.llvm.org/D57348

llvm-svn: 353651

a0e96bd5
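
The expansion described in this commit can be pictured per lane: compute the wrapping result together with an overflow flag, then select the saturated value where the flag is set. The following is a minimal Python sketch of the unsigned case only, not the SelectionDAG code; the 8-bit lane width and the helper names `uaddo`, `usubo`, `uadd_sat`, and `usub_sat` are assumptions made for illustration.

```python
BITS = 8                      # assumed lane width, for illustration only
MASK = (1 << BITS) - 1


def uaddo(a, b):
    """Model of an unsigned add-with-overflow: (wrapped sum, overflow flag)."""
    s = a + b
    return s & MASK, s > MASK


def usubo(a, b):
    """Model of an unsigned sub-with-overflow: (wrapped difference, borrow flag)."""
    return (a - b) & MASK, a < b


def uadd_sat(xs, ys):
    """Lane-wise unsigned saturating add: select all-ones where the add overflowed."""
    out = []
    for a, b in zip(xs, ys):
        s, overflow = uaddo(a, b)
        out.append(MASK if overflow else s)
    return out


def usub_sat(xs, ys):
    """Lane-wise unsigned saturating sub: select zero where the sub borrowed."""
    out = []
    for a, b in zip(xs, ys):
        d, overflow = usubo(a, b)
        out.append(0 if overflow else d)
    return out
```

Because every step is lane-wise, the whole vector can be processed with vector overflow ops and a vector select instead of being split into scalars.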

[AArch64] Regenerate bswap tests · a303186e

Simon Pilgrim authored Feb 10, 2019

llvm-svn: 353648

a303186e

[X86] Add basic bitreverse/bswap combine tests · ce103129

Simon Pilgrim authored Feb 10, 2019

Shows missing SimplifyDemandedBits support

llvm-svn: 353647

ce103129

[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts · 5a82a788

Simon Pilgrim authored Feb 10, 2019

Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.

Differential Revision: https://reviews.llvm.org/D58009

llvm-svn: 353645

5a82a788
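
To see why a funnel shift with a zero argument is just a plain shift, it helps to model the operation on a concatenated double-width value. This is a small Python sketch under an assumed 32-bit width; `fshl` and `fshr` here are illustrative models of the funnel-shift semantics, not LLVM code.

```python
BITS = 32                     # assumed bit width, for illustration only
MASK = (1 << BITS) - 1


def fshl(x, y, s):
    """Funnel shift left: shift the concatenation x:y left, keep the high word."""
    s %= BITS
    concat = (x << BITS) | y
    return ((concat << s) >> BITS) & MASK


def fshr(x, y, s):
    """Funnel shift right: shift the concatenation x:y right, keep the low word."""
    s %= BITS
    concat = (x << BITS) | y
    return (concat >> s) & MASK


x, y = 0xDEADBEEF, 0x12345678
for s in range(BITS):
    # With a zero low word, fshl only ever shifts zeros in: a plain left shift.
    assert fshl(x, 0, s) == (x << s) & MASK
    # With a zero high word, fshr only ever shifts zeros in: a plain logical right shift.
    assert fshr(0, y, s) == y >> s
```

The equivalence only holds for in-range shift amounts, since the funnel shift takes its amount modulo the bit width while a plain shift does not.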

[X86] Add masked variable tests for funnel undef/zero argument combines · 06a61b0b

Simon Pilgrim authored Feb 10, 2019

I've avoided 'modulo' masks as we'll SimplifyDemandedBits those in the future, and we just need to check that the shift variable is 'in range'

llvm-svn: 353644

06a61b0b

[x86] narrow 256-bit horizontal ops via demanded elements · 833550fc

Sanjay Patel authored Feb 10, 2019

256-bit horizontal math ops are an x86 monstrosity (and thankfully have
not been extended to 512-bit AFAIK).

The two 128-bit halves operate on separate halves of the inputs. So if we
don't demand anything in the upper half of the result, we can extract the
low halves of the inputs, do the math, and then insert that result into a
256-bit output.

All of the extract/insert is free (ymm<-->xmm), so we're left with a
narrower (cheaper) version of the original op.

In the affected tests based on:
https://bugs.llvm.org/show_bug.cgi?id=33758
https://bugs.llvm.org/show_bug.cgi?id=38971
...we see that the h-op narrowing can result in further narrowing of other
math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from
extract_vector_elt, but I thought we might see diffs starting from
extract_subvector too, so I changed it to a more general demanded elements
solution. There are no extra existing regression test improvements from
that switch though, so we could go back.

Differential Revision: https://reviews.llvm.org/D57841

llvm-svn: 353641

833550fc
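
The narrowing argument in this commit rests on each 128-bit half of the result depending only on the matching halves of the inputs. Below is a minimal Python model of a 4-lane and an 8-lane horizontal add that makes this visible; the function names are hypothetical and integers stand in for the vector elements.

```python
def hadd128(a, b):
    """128-bit horizontal add on two 4-lane inputs: pairwise sums of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def hadd256(a, b):
    """256-bit horizontal add: the two 128-bit halves are computed independently."""
    return hadd128(a[:4], b[:4]) + hadd128(a[4:], b[4:])


def narrowed_low_half(a, b):
    """Extract the low halves and do the 128-bit op; the upper result half is not computed."""
    return hadd128(a[:4], b[:4])


a = list(range(8))            # lanes 0..7
b = list(range(10, 18))       # lanes 10..17
assert hadd256(a, b)[:4] == narrowed_low_half(a, b)
```

When nothing in the upper half of the result is demanded, the narrow form produces every lane that is actually used.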

[X86] Add additional tests for funnel undef/zero argument combines · 76683e7b

Simon Pilgrim authored Feb 10, 2019

As suggested on D58009

llvm-svn: 353640

76683e7b

[TargetLowering] refactor setcc folds to fix another miscompile (PR40657) · 2f319420

Sanjay Patel authored Feb 10, 2019

SimplifySetCC still has much room for improvement, but this should
fix the remaining problem examples from:
https://bugs.llvm.org/show_bug.cgi?id=40657

The initial fix for this problem was rL353615.

llvm-svn: 353639

2f319420

[X86][SSE] Add SimplifyDemandedBits test for BLENDVPD · fd541e9a

Simon Pilgrim authored Feb 10, 2019

llvm-svn: 353638

fd541e9a

Feb 09, 2019

[X86] Add tests for funnel undef argument combines · a561d466

Simon Pilgrim authored Feb 09, 2019

If one of the shifted arguments is undef we should be folding to a regular shift.

llvm-svn: 353628

a561d466

[X86] CombineOr - fold to generic funnel shifts · 6bf7b30b

Simon Pilgrim authored Feb 09, 2019

As discussed on D57389, this is a first step towards moving the SHLD/SHRD matching code to DAGCombiner using FSHL/FSHR instead.

There's a bit of work to do before I can do that, so this just folds to FSHL/FSHR in the existing code (handling the different SHRD/FSHR argument ordering), which fixes the issue we had with i16 shift amounts not being correctly masked.

llvm-svn: 353626

6bf7b30b

[x86] add another test for setcc miscompile (PR40657); NFC · 586ad01f

Sanjay Patel authored Feb 09, 2019

llvm-svn: 353625

586ad01f

[TargetLowering] add tests to show effect of setcc sub->shift; NFC · 74675104

Sanjay Patel authored Feb 09, 2019

There's effectively no difference for the cases with variables.
We just trade a sub for an add on those. But the case with a
subtract from constant would require an extra move instruction
on x86, so this looks like a reasonable generic combine.

llvm-svn: 353619

74675104

[x86] add test for setcc sub->shift transform; NFC · f31cf49c

Sanjay Patel authored Feb 09, 2019

llvm-svn: 353618

f31cf49c

[X86] Regenerate test. · ab283217

Simon Pilgrim authored Feb 09, 2019

llvm-svn: 353616

ab283217

[TargetLowering] avoid miscompile in setcc transform (PR40657) · 887ac1b3

Sanjay Patel authored Feb 09, 2019

llvm-svn: 353615

887ac1b3

[X86][SSE] Generalize X86ISD::BLENDI support to more value types · 690a2889

Simon Pilgrim authored Feb 09, 2019

D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.

With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors.

This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.

I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).

The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.

Differential Revision: https://reviews.llvm.org/D57888

llvm-svn: 353610

690a2889

[AMDGPU] Split idot4/8 signed and unsigned tests. NFC. · 344968fd

Stanislav Mekhanoshin authored Feb 09, 2019

llvm-svn: 353593

344968fd

Recommit "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR" · c230c13d

Jessica Paquette authored Feb 09, 2019

After r353586, we won't fail on the AMDGPU floor pattern that was killing the
importer before.

llvm-svn: 353589

c230c13d

[x86] add test for miscompiling setcc transform (PR40657); NFC · 1386d99c

Sanjay Patel authored Feb 08, 2019

llvm-svn: 353580

1386d99c

Re-apply r353553 "[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder" · 8bc57953

Francis Visoiu Mistrih authored Feb 08, 2019

With a fix after r353563 that adds some more opcodes.

llvm-svn: 353579

8bc57953

Feb 08, 2019

Revert r353553 "[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder" · decba8aa

Francis Visoiu Mistrih authored Feb 08, 2019

This reverts commit r353553.

This breaks CodeGen/AArch64/GlobalISel/legalize-ext-csedebug-output.mir:

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/57963/console

llvm-svn: 353575

decba8aa

[X86] Add FPCW as an implicit use on floating point load instructions. · fcb63c4c

Craig Topper authored Feb 08, 2019

These instructions can generate a stack overflow exception so technically they read the stack overflow exception mask bit.

llvm-svn: 353564

fcb63c4c

Implementation of asm-goto support in LLVM · 784929d0

Craig Topper authored Feb 08, 2019

This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly, as in GCC and as used by the Linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563

784929d0

AMDGPU/GlobalISel: Fix broken tests · ca9583a7

Matt Arsenault authored Feb 08, 2019

llvm-svn: 353559

ca9583a7

[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X)) · 92a8c367

Nemanja Ivanovic authored Feb 08, 2019

The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557

92a8c367
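
The rewrite relies on the identity sqrt(X) * sqrt(sqrt(X)) = X^(1/2) * X^(1/4) = X^(3/4). A quick Python check of that identity, including the signed-zero case mentioned in the message, might look like this; `pow_075` is a hypothetical helper name, and the sketch says nothing about the fast-math conditions under which the combine actually fires.

```python
import math


def pow_075(x):
    """Compute x**0.75 as sqrt(x) * sqrt(sqrt(x)), reusing the first square root."""
    r = math.sqrt(x)
    return r * math.sqrt(r)


# Agrees with pow() up to rounding on a few sample inputs.
for x in (0.0, 0.5, 1.0, 2.0, 16.0, 1e10):
    assert math.isclose(pow_075(x), math.pow(x, 0.75), rel_tol=1e-12)

# Signed zero: sqrt(-0.0) is -0.0, but the product of two negative zeros is +0.0,
# matching pow(-0.0, 0.75), which is +0.0.
assert math.copysign(1.0, pow_075(-0.0)) == 1.0
assert math.copysign(1.0, math.pow(-0.0, 0.75)) == 1.0
```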

[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder · 01e818a9

Aditya Nandakumar authored Feb 08, 2019

https://reviews.llvm.org/D57932

Add some logging + tests to make sure CSEInfo prints debug output.

reviewed by: arsenm

llvm-svn: 353553

01e818a9

AMDGPU: Remove GCN features and predicates · d7047276

Matt Arsenault authored Feb 08, 2019

These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548

d7047276

[TargetLowering] Use ISD::FSHR in expandFixedPointMul · eb6a47a4

Simon Pilgrim authored Feb 08, 2019

Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720.

llvm-svn: 353546

eb6a47a4
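
The scale == 0 problem can be seen in a small model: the OR(SHL, SRL) form needs a left shift by the full bit width when scale is 0, which is out of range, while a funnel shift right by 0 simply returns the low word. The sketch below is an unsigned, 32-bit Python illustration only; `fshr` and `fixed_point_mul` are hypothetical names and this is not the expandFixedPointMul implementation.

```python
BITS = 32                     # assumed bit width, for illustration only
MASK = (1 << BITS) - 1


def fshr(hi, lo, s):
    """Funnel shift right: low BITS bits of the concatenation hi:lo shifted right by s."""
    s %= BITS
    return (((hi << BITS) | lo) >> s) & MASK


def fixed_point_mul(a, b, scale):
    """Unsigned fixed-point multiply: take the double-width product, drop `scale` bits."""
    prod = a * b
    lo = prod & MASK
    hi = (prod >> BITS) & MASK
    # OR(SHL(hi, BITS - scale), SRL(lo, scale)) would shift by BITS when scale == 0;
    # the funnel shift is well defined there and yields lo.
    return fshr(hi, lo, scale)
```

For example, with 16 fractional bits, multiplying the encodings of 3.0 and 2.0 gives the encoding of 6.0, and with scale 0 the function degenerates to an ordinary truncating multiply.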

[TargetLowering] Add SimplifyDemandedBits funnel shift support · 478bb907

Simon Pilgrim authored Feb 08, 2019

llvm-svn: 353539

478bb907

[X86] Add basic funnel shift demanded bits tests · 68457c1e

Simon Pilgrim authored Feb 08, 2019

llvm-svn: 353534

68457c1e

[AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs · 494b8ac9

Carl Ritson authored Feb 08, 2019

Summary:
Prior to GCN3 s_load_dword offsets are in dwords rather than bytes.
Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56496

llvm-svn: 353530

494b8ac9
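
The adjustment described in the summary amounts to a unit conversion on the offset. As a rough Python sketch (the function name and the `pre_gcn3` flag are hypothetical and do not reflect the backend code):

```python
def s_load_dword_offset(byte_offset, pre_gcn3):
    """Return the offset in the units the target expects for s_load_dword.

    Assumption for illustration: pre-GCN3 targets take the offset in dwords
    (4-byte units), later targets take it in bytes.
    """
    if pre_gcn3:
        if byte_offset % 4 != 0:
            raise ValueError("byte offset must be dword-aligned")
        return byte_offset // 4
    return byte_offset
```

Passing a byte offset unchanged to a pre-GCN3 target would read from four times the intended distance, which is the kind of error the fix addresses.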

AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2 · b0a22704

Matt Arsenault authored Feb 08, 2019

clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.

llvm-svn: 353526

b0a22704
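
The observation about clampScalar can be illustrated with a toy model of scalar bit widths: clamping to a range leaves an in-range, non-power-of-2 width such as 24 untouched, so a separate rounding step is needed. The Python helpers below are hypothetical stand-ins for the legalizer rules, and the combination shown is only one possible reading of the "combination rule" idea, not necessarily what the patch does.

```python
def clamp_scalar(bits, min_bits, max_bits):
    """Model of clamping a scalar width to [min_bits, max_bits]."""
    return max(min_bits, min(bits, max_bits))


def widen_to_next_pow2(bits):
    """Round a scalar width up to the next power of two."""
    return 1 << (bits - 1).bit_length()


def legalize_width(bits, min_bits, max_bits):
    """Round to a power of two first, then clamp into the supported range."""
    return clamp_scalar(widen_to_next_pow2(bits), min_bits, max_bits)


assert clamp_scalar(24, 16, 64) == 24      # clamping alone does nothing here
assert legalize_width(24, 16, 64) == 32    # rounding first gives a legal width
```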

AMDGPU/GlobalISel: Fix non-power-of-2 implicit_def · 0f2debb1

Matt Arsenault authored Feb 08, 2019

llvm-svn: 353522

0f2debb1

[MIPS GlobalISel] Select any extending load and truncating store · c98b26d3

Petar Avramovic authored Feb 08, 2019

Make the behavior of G_LOAD in widenScalar the same as for G_ZEXTLOAD and
G_SEXTLOAD. That is, perform widenScalarDst to the size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve the desired
behavior.

Select an extending load that does not have a specified type of extension
as a zero-extending load.

Select a truncating store that stores the number of bytes indicated by the
size in MachineMemOperand.

Differential Revision: https://reviews.llvm.org/D57454

llvm-svn: 353520

c98b26d3
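
An any-extending load leaves the upper bits of the wide result unspecified, so zero-extension is one valid choice; a truncating store writes only the low bytes given by the memory size. The following Python sketch models both on a byte array. The helper names are hypothetical and little-endian byte order is assumed purely for the illustration.

```python
def zext_load(mem, addr, size):
    """Load `size` bytes and zero-extend them into a wider integer."""
    return int.from_bytes(mem[addr:addr + size], "little")


def trunc_store(mem, addr, value, size):
    """Store only the low `size` bytes of `value`, discarding the upper bits."""
    truncated = value & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = truncated.to_bytes(size, "little")


mem = bytearray(8)
trunc_store(mem, 0, 0x12345678, 1)   # only 0x78 reaches memory
assert mem[0] == 0x78 and mem[1] == 0
assert zext_load(mem, 0, 1) == 0x78  # upper bits of the loaded value are zero
```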

AMDGPU/GlobalISel: Don't use a copy in addrspacecast lowering · dc88a2ce

Matt Arsenault authored Feb 08, 2019

llvm-svn: 353516

dc88a2ce