Commits · 3fbacd4964edb44bce797de8fe248512a835524c · Lorenzo Albano / LLVM bpEVL

Feb 11, 2019

[NFC][ARM] Simplify loop-indexing codegen test · 3fbacd49

Sam Parker authored Feb 11, 2019

Remove unnecessary offset checks, CHECK-BASE checks and add some
extra -NOT checks and TODO comments.

llvm-svn: 353689

3fbacd49

[TEST] Add failing test from PR40454 · 8ec0c5e0
Max Kazantsev authored Feb 11, 2019
```
llvm-svn: 353688
```
8ec0c5e0

[DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics. · e848d426

Carlos Alberto Enciso authored Feb 11, 2019

Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

As the test case involves a CFG transformation, move it to the correct location.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353682

e848d426

[ARM] LoadStoreOptimizer: reoder limit · 150ccb88

Sjoerd Meijer authored Feb 11, 2019

The whole design of generating LDMs/STMs is fragile and unreliable: it depends on
rescheduling here in the LoadStoreOptimizer that isn't register pressure aware
and regalloc that isn't aware of generating LDMs/STMs.
This patch adds a (hidden) option to control the total number of instructions that
can be re-ordered. I appreciate this looks only a tiny bit better than a hard-coded
constant, but at least it allows more easy experimentation with different values
for now. Ideally we calculate this reorder limit based on some heuristics, and take
register pressure into account. I might be looking into that next.

Differential Revision: https://reviews.llvm.org/D57954

llvm-svn: 353678

150ccb88

Feb 10, 2019

[GlobalISel] Regex the opcodes in unit test to fix non-deterministic ordering · ea246114
Mandeep Singh Grang authored Feb 10, 2019
```
Differential Revision: https://reviews.llvm.org/D57988

llvm-svn: 353652
```
ea246114

[CodeGen][X86] Don't scalarize vector saturating add/sub · a0e96bd5

Nikita Popov authored Feb 10, 2019

Now that we have vector support for [US](ADD|SUB)O we no longer
need to scalarize when expanding [US](ADD|SUB)SAT.

This matches what the cost model already does.

Differential Revision: https://reviews.llvm.org/D57348

llvm-svn: 353651

a0e96bd5

[AArch64] Regenerate bswap tests · a303186e
Simon Pilgrim authored Feb 10, 2019
```
llvm-svn: 353648
```
a303186e
[X86] Add basic bitreverse/bswap combine tests · ce103129
Simon Pilgrim authored Feb 10, 2019
```
Shows missing SimplifyDemandedBits support

llvm-svn: 353647
```
ce103129

[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts · 5a82a788

Simon Pilgrim authored Feb 10, 2019

Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.

Differential Revision: https://reviews.llvm.org/D58009

llvm-svn: 353645

5a82a788

[X86] Add masked variable tests for funnel undef/zero argument combines · 06a61b0b

Simon Pilgrim authored Feb 10, 2019

I've avoided 'modulo' masks as we'll SimplifyDemandedBits those in the future, and we just need to check that the shift variable is 'in range'

llvm-svn: 353644

06a61b0b

[x86] narrow 256-bit horizontal ops via demanded elements · 833550fc

Sanjay Patel authored Feb 10, 2019

256-bit horizontal math ops are an x86 monstrosity (and thankfully have
not been extended to 512-bit AFAIK).

The two 128-bit halves operate on separate halves of the inputs. So if we
don't demand anything in the upper half of the result, we can extract the
low halves of the inputs, do the math, and then insert that result into a
256-bit output.

All of the extract/insert is free (ymm<-->xmm), so we're left with a
narrower (cheaper) version of the original op.

In the affected tests based on:
https://bugs.llvm.org/show_bug.cgi?id=33758
https://bugs.llvm.org/show_bug.cgi?id=38971
...we see that the h-op narrowing can result in further narrowing of other
math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from
extract_vector_elt, but I thought we might see diffs starting from
extract_subvector too, so I changed it to a more general demanded elements
solution. There are no extra existing regression test improvements from
that switch though, so we could go back.

Differential Revision: https://reviews.llvm.org/D57841

llvm-svn: 353641

833550fc

[X86] Add additional tests for funnel undef/zero argument combines · 76683e7b
Simon Pilgrim authored Feb 10, 2019
```
As suggested on D58009 

llvm-svn: 353640
```
76683e7b

[TargetLowering] refactor setcc folds to fix another miscompile (PR40657) · 2f319420

Sanjay Patel authored Feb 10, 2019

SimplifySetCC still has much room for improvement, but this should
fix the remaining problem examples from:
https://bugs.llvm.org/show_bug.cgi?id=40657

The initial fix for this problem was rL353615.

llvm-svn: 353639

2f319420

[X86][SSE] Add SimplifyDemandedBits test for BLENDVPD · fd541e9a
Simon Pilgrim authored Feb 10, 2019
```
llvm-svn: 353638
```
fd541e9a

[yaml2obj] - Fix .dynamic section entries writing for 32bit targets. · 5cb31731

George Rimar authored Feb 10, 2019

This was introduced by me in r353613.

I tried to fix Big-endian bot and replaced
uintX_t -> ELFT::Xword. But ELFT::Xword is a packed<uint64_t>,
so it is always 8 bytes and that was obviously incorrect.

My intention was to use something like packed<uint> actually, which
size is target dependent.

Patch fixes this bug and adds a test case, since no bots seems reported this.

llvm-svn: 353636

5cb31731

Feb 09, 2019

[X86] Add tests for funnel undef argument combines · a561d466
Simon Pilgrim authored Feb 09, 2019
```
If one of the shifted arguments is undef we should be folding to a regular shift.

llvm-svn: 353628
```
a561d466

[X86] CombineOr - fold to generic funnel shifts · 6bf7b30b

Simon Pilgrim authored Feb 09, 2019

As discussed on D57389, this is a first step towards moving the SHLD/SHRD matching code to DAGCombiner using FSHL/FSHR instead.

There's a bit of work to do before I can do that, so this just folds to FSHL/FSHR in the existing code (handling the different SHRD/FSHR argument ordering), which fixes the issue we had with i16 shift amounts not being correctly masked.

llvm-svn: 353626

6bf7b30b

[x86] add another test for setcc miscompile (PR40657); NFC · 586ad01f
Sanjay Patel authored Feb 09, 2019
```
llvm-svn: 353625
```
586ad01f
llvm-lib: Implement /list flag · a2f60933
Nico Weber authored Feb 09, 2019
```
Differential Revision: https://reviews.llvm.org/D57952

llvm-svn: 353620
```
a2f60933

[TargetLowering] add tests to show effect of setcc sub->shift; NFC · 74675104

Sanjay Patel authored Feb 09, 2019

There's effectively no difference for the cases with variables.
We just trade a sub for an add on those. But the case with a
subtract from constant would require an extra move instruction
on x86, so this looks like a reasonable generic combine.

llvm-svn: 353619

74675104

[x86] add test for setcc sub->shift transform; NFC · f31cf49c
Sanjay Patel authored Feb 09, 2019
```
llvm-svn: 353618
```
f31cf49c
[X86] Regenerate test. · ab283217
Simon Pilgrim authored Feb 09, 2019
```
llvm-svn: 353616
```
ab283217
[TargetLowering] avoid miscompile in setcc transform (PR40657) · 887ac1b3
Sanjay Patel authored Feb 09, 2019
```
llvm-svn: 353615
```
887ac1b3

[X86][SSE] Generalize X86ISD::BLENDI support to more value types · 690a2889

Simon Pilgrim authored Feb 09, 2019

D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.

With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors.

This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.

I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).

The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.

Differential Revision: https://reviews.llvm.org/D57888

llvm-svn: 353610

690a2889

[yaml2obj][obj2yaml] - Add support for dumping/parsing .dynamic sections. · 0e7ed912

George Rimar authored Feb 09, 2019

This teaches the tools to parse and dump
the .dynamic section and its dynamic tags.

Differential revision: https://reviews.llvm.org/D57691

llvm-svn: 353606

0e7ed912

[GlobalOpt] Simplify __cxa_atexit elimination · 6e679f8b

Fangrui Song authored Feb 09, 2019

cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.

This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited function and it may visit a
function exponential times (multiplication principle).

llvm-svn: 353603

6e679f8b

Extra processing for BitCast + PHI in InstCombine · 53980b24

Gabor Buella authored Feb 09, 2019

For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met

- SROA kicks in and creates new alloca;
- there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
- the alloca is then used in A->B->A + PHI chain;
- there is a loop unrolling.

As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewed By: Carrot

Differential Revision: https://reviews.llvm.org/D57053

llvm-svn: 353595

53980b24

[AMDGPU] Split idot4/8 signed and unsigned tests. NFC. · 344968fd
Stanislav Mekhanoshin authored Feb 09, 2019
```
llvm-svn: 353593
```
344968fd
Recommit "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR"" · c230c13d
Jessica Paquette authored Feb 09, 2019
```
After r353586, we won't fail on the AMDGPU floor pattern that was killing the
importer before.

llvm-svn: 353589
```
c230c13d

[GlobalISel] Skip patterns that define complex suboperands twice instead of dying · 1ed1dd6d

Jessica Paquette authored Feb 09, 2019

If we run into a pattern that looks like this:

add
  (complex $x, $y)
  (complex $x, $z)

We should skip the pattern instead of asserting/doing something unpredictable.

This makes us return an Error in that case, and adds a testcase for skipped
patterns.

Differential Revision: https://reviews.llvm.org/D57980

llvm-svn: 353586

1ed1dd6d

[x86] add test for miscompiling setcc transform (PR40657); NFC · 1386d99c
Sanjay Patel authored Feb 08, 2019
```
llvm-svn: 353580
```
1386d99c
Re-apply r353553 "[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder" · 8bc57953
Francis Visoiu Mistrih authored Feb 08, 2019
```
With a fix after r353563 that adds some more opcodes.

llvm-svn: 353579
```
8bc57953

Feb 08, 2019

Revert r353553 "[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder" · decba8aa

Francis Visoiu Mistrih authored Feb 08, 2019

This reverts commit r353553.

This breaks CodeGen/AArch64/GlobalISel/legalize-ext-csedebug-output.mir:

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/57963/console

llvm-svn: 353575

decba8aa

[X86] Add FPCW as an implicit use on floating point load instructions. · fcb63c4c

Craig Topper authored Feb 08, 2019

These instructions can generate a stack overflow exception so technically they read the stack overflow exception mask bit.

llvm-svn: 353564

fcb63c4c

Implementation of asm-goto support in LLVM · 784929d0

Craig Topper authored Feb 08, 2019

This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563

784929d0

AMDGPU/GlobalISel: Fix broken tests · ca9583a7
Matt Arsenault authored Feb 08, 2019
```
llvm-svn: 353559
```
ca9583a7

[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X)) · 92a8c367

Nemanja Ivanovic authored Feb 08, 2019

The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.

Patch by Whitney Tsang (Whitney)

Differential revision: https://reviews.llvm.org/D57434

llvm-svn: 353557

92a8c367

[GISel][NFC]: Add missing call to record CSE hits in the CSEMIRBuilder · 01e818a9

Aditya Nandakumar authored Feb 08, 2019

https://reviews.llvm.org/D57932

Add some logging + tests to make sure CSEInfo prints debug output.

reviewed by: arsenm

llvm-svn: 353553

01e818a9

AMDGPU: Remove GCN features and predicates · d7047276

Matt Arsenault authored Feb 08, 2019

These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548

d7047276

[InstrProf] Implement static profdata registration · 987d331f

Reid Kleckner authored Feb 08, 2019

Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.

Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.

Now, Windows uses static registration and is in line with all the other
platforms.

Reviewers: davidxl, wmi, inglorion, void, calixte

Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D57929

llvm-svn: 353547

987d331f