Commits · 5b41fe58deb330c28e0421b96b52c7eadbf073ed · Roger Ferrer / llvm-epi

Jun 04, 2019

[AARCH64][ELF][llvm-readobj] Support for AArch64 .note.gnu.property · 580c6d31

Peter Smith authored Jun 04, 2019

    
ELF for the 64-bit Arm Architecture defines a processor specific property
type GNU_PROPERTY_AARCH64_FEATURE_1_AND as GNU_PROPERTY_LOPROC. This
property works in a similar way to the existing X86 processor specific
property GNU_PROPERTY_GNU_X86_FEATURE_1_AND.

Two feature bits are defined for GNU_PROPERTY_AARCH64_FEATURE_1_AND:
- GNU_PROPERTY_AARCH64_FEATURE_1_BTI 0x1
- GNU_PROPERTY_AARCH64_FEATURE_1_PAC 0x2

This patch defines the property and its feature bits, and implements support
for printing them in llvm-readobj.
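
As an illustration of how such a feature word can be decoded, here is a minimal sketch. The constant values come from the message above (GNU_PROPERTY_LOPROC is 0xc0000000 in the GNU property note ABI); the helper name `describeAArch64Features` and the exact output strings are assumptions made for this example, not the llvm-readobj implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Property type and feature bits as listed in the commit message.
constexpr uint32_t GNU_PROPERTY_LOPROC = 0xc0000000;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_AND = GNU_PROPERTY_LOPROC;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_BTI = 0x1;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_PAC = 0x2;

// Hypothetical helper: render a feature word as a comma-separated list.
// Bits that are not recognised are reported rather than silently dropped.
std::string describeAArch64Features(uint32_t Bits) {
  std::string Out;
  auto Append = [&Out](const char *Name) {
    if (!Out.empty())
      Out += ", ";
    Out += Name;
  };
  if (Bits & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) {
    Append("BTI");
    Bits &= ~GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
  }
  if (Bits & GNU_PROPERTY_AARCH64_FEATURE_1_PAC) {
    Append("PAC");
    Bits &= ~GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
  }
  if (Bits != 0)
    Append("<unknown flags>");
  if (Out.empty())
    Out = "<None>";
  return Out;
}
```

Because the property is an "AND" property, a linker would be expected to intersect the feature words of all input objects; the helper above only covers the printing side.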

Differential Revision: https://reviews.llvm.org/D62595

llvm-svn: 362490

580c6d31

[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) · 3dce0326

Roman Lebedev authored Jun 04, 2019

Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but I'm not sure yet.

As far as I can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.

This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/vMd3
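
The fold in the title is an identity of wrapping integer arithmetic. The sketch below checks it on a couple of 32-bit values, including one that wraps; the function names `beforeFold` and `afterFold` are made up for this example, and unsigned C++ arithmetic stands in for add/sub nodes without nsw/nuw flags.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, which models plain add/sub nodes.
constexpr uint32_t beforeFold(uint32_t C, uint32_t X, uint32_t Y) {
  return (C - X) - Y; // (C - x) - y
}

constexpr uint32_t afterFold(uint32_t C, uint32_t X, uint32_t Y) {
  return C - (X + Y); // C - (x + y)
}

// Both forms agree, also when intermediate results wrap around.
static_assert(beforeFold(32, 5, 7) == afterFold(32, 5, 7), "simple case");
static_assert(beforeFold(0, 0xFFFFFFFFu, 1) == afterFold(0, 0xFFFFFFFFu, 1),
              "wrapping case");
```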

Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62774

llvm-svn: 362488

3dce0326

[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold · be6ce7b3

Roman Lebedev authored Jun 04, 2019

Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M
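
For readers without access to the Alive link, the identity can also be checked by brute force at a small bit width. This is only a sketch: the helper name `foldHoldsForAll8BitInputs` is hypothetical, and 8-bit wrapping arithmetic is used as a stand-in for the general case.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively compare (C - x) + y with (y - x) + C for every 8-bit x and y.
// The arithmetic is done in unsigned int and truncated to 8 bits, which gives
// the same result as wrapping 8-bit arithmetic.
bool foldHoldsForAll8BitInputs(uint8_t C) {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      uint8_t Before = static_cast<uint8_t>((C - X) + Y);
      uint8_t After = static_cast<uint8_t>((Y - X) + C);
      if (Before != After)
        return false;
    }
  }
  return true;
}
```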

The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.

Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma

Reviewed By: efriedma

Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62266

llvm-svn: 362487

be6ce7b3

[SelectionDAG] ComputeNumSignBits - support constant pool values from target · ad298f86

Simon Pilgrim authored Jun 04, 2019

As I mentioned on D61887, we don't get as many hits on ComputeNumSignBits as we did on computeKnownBits.

The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.

It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
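
To make the connection between sign bits and the select removal concrete, here is a small scalar sketch. It assumes that a constant vector's sign-bit count is the minimum over its elements, and that a conditional-negate style combine relies on the identity select(M, -X, X) == (X ^ M) - M when M is all-zeros or all-ones. All function names are hypothetical; this is not the SelectionDAG code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
unsigned numSignBits(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  if (V < 0)
    U = ~U; // after inversion we only need to count leading zeros
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && !(U & Bit); Bit >>= 1)
    ++N;
  return N;
}

// For a constant vector, take the minimum over all elements.
unsigned minSignBits(const std::vector<int32_t> &Elts) {
  assert(!Elts.empty() && "expected at least one element");
  unsigned Min = 32;
  for (int32_t E : Elts)
    Min = std::min(Min, numSignBits(E));
  return Min;
}

// If the mask is known to be 0 or -1 (32 sign bits), a select between X and
// -X can be replaced by xor and sub.
int32_t conditionalNegate(int32_t X, int32_t M) {
  assert(numSignBits(M) == 32 && "mask must be all-zeros or all-ones");
  uint32_t UX = static_cast<uint32_t>(X);
  uint32_t UM = static_cast<uint32_t>(M);
  return static_cast<int32_t>((UX ^ UM) - UM);
}
```

With a mask vector such as {-1, 0, -1}, `minSignBits` reports 32, which is the kind of fact that lets the xor/sub form be used in place of a blend.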

Differential Revision: https://reviews.llvm.org/D62777

llvm-svn: 362486

ad298f86

[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI. · 3178546a
Simon Pilgrim authored Jun 04, 2019
```
Pre-commit requested for D62777.

llvm-svn: 362485
```
3178546a

[llvm-ar] Reapply Fix relative thin archive path handling · 5d5078e3

Owen Reynolds authored Jun 04, 2019

Includes a fix for an introduced build failure due to a post-C++11 use of std::mismatch.

This fixes some thin archive relative path issues: paths are shortened where possible, and paths are output correctly when using the display table command.
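
One way to shorten a member path relative to the archive's directory is to skip the common leading components and climb out of what remains. The sketch below does that with the four-iterator overload of std::mismatch, which was added in C++14. The helpers `splitPath` and `relativeTo` are hypothetical, assume '/'-separated and already normalized paths, and are not the llvm-ar implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a '/'-separated path into its non-empty components.
std::vector<std::string> splitPath(const std::string &Path) {
  std::vector<std::string> Parts;
  std::stringstream SS(Path);
  std::string Part;
  while (std::getline(SS, Part, '/'))
    if (!Part.empty())
      Parts.push_back(Part);
  return Parts;
}

// Express Member relative to the directory ArchiveDir. Both paths are assumed
// to be relative to the same base directory.
std::string relativeTo(const std::string &ArchiveDir,
                       const std::string &Member) {
  std::vector<std::string> From = splitPath(ArchiveDir);
  std::vector<std::string> To = splitPath(Member);
  // Four-iterator std::mismatch (C++14): stops at the end of either range.
  auto Diff = std::mismatch(From.begin(), From.end(), To.begin(), To.end());
  std::string Result;
  for (auto I = Diff.first; I != From.end(); ++I)
    Result += "../"; // climb out of each remaining archive directory
  for (auto I = Diff.second; I != To.end(); ++I) {
    Result += *I;
    if (I + 1 != To.end())
      Result += '/';
  }
  return Result;
}
```

For example, an archive in `out/lib` referring to `out/obj/foo.o` would store `../obj/foo.o`.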

Differential Revision: https://reviews.llvm.org/D59491

llvm-svn: 362484

5d5078e3

[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold · 3018d505
Simon Pilgrim authored Jun 04, 2019
```
Follow up to D62807.

Differential Revision: https://reviews.llvm.org/D62811

llvm-svn: 362483
```
3018d505

[ARM] Add FP16 vector insert/extract patterns · 08da01b4

Mikhail Maltsev authored Jun 04, 2019

This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
extractions of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.
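
The even/odd lane distinction can be modelled on raw bit patterns. This sketch assumes that each 32-bit S sub-register holds two FP16 lanes, with the even lane in the low half and the odd lane in the high half, and that VMOVX moves the top half of its source into the bottom half of the result. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Even lane: the low 16 bits of the S register; the top half is ignored.
uint16_t extractEvenLane(uint32_t SReg) { return static_cast<uint16_t>(SReg); }

// Model of VMOVX: top half moved to the bottom half, top half cleared.
uint32_t vmovx(uint32_t SReg) { return SReg >> 16; }

// Odd lane: one extra VMOVX, then the same low-half read as for even lanes.
uint16_t extractOddLane(uint32_t SReg) { return extractEvenLane(vmovx(SReg)); }

// Lane L of a D/Q register lives in S sub-register L / 2.
uint16_t extractLane(const uint32_t *SRegs, unsigned Lane) {
  uint32_t S = SRegs[Lane / 2];
  return (Lane % 2 == 0) ? extractEvenLane(S) : extractOddLane(S);
}
```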

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
  and will not be implementable in pure tablegen

Because of this, insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
  instruction
* an FP16 load and insertion into a single VLD1 instruction

Differential Revision: https://reviews.llvm.org/D62651

llvm-svn: 362482

08da01b4

Silenced a warning "implicit conversion turns string literal into bool" introduced in r362473 · 63846039
Dmitri Gribenko authored Jun 04, 2019
```
llvm-svn: 362480
```
63846039
Include what you use in PPC.h · 73a15d4b
Dmitri Gribenko authored Jun 04, 2019
```
llvm-svn: 362477
```
73a15d4b
Include what you use in PPCMachineScheduler.cpp · 067a17b5
Dmitri Gribenko authored Jun 04, 2019
```
llvm-svn: 362476
```
067a17b5
Include what you use in PPCRegisterInfo.h · 9d1c5ea1
Dmitri Gribenko authored Jun 04, 2019
```
llvm-svn: 362475
```
9d1c5ea1
[HWASAN][CMake] Allow instrumenting LLVM/clang · 3e39961e
Eugene Leviant authored Jun 04, 2019
```
Differential revision: https://reviews.llvm.org/D62813

llvm-svn: 362474
```
3e39961e

Make SwitchInstProfUpdateWrapper safer · 4f9e6814

Yevgeny Rouban authored Jun 04, 2019

While prof branch_weights inconsistencies are being fixed patch
by patch (pass by pass) we need SwitchInstProfUpdateWrapper to
be safe with respect to inconsistent metadata that can come from
passes that have not been fixed yet. See the bug found by @nikic
in https://reviews.llvm.org/D62126.

This patch introduces one more state (called Invalid) to the
wrapper class that allows users to work with the underlying
SwitchInst ignoring the prof metadata changes.

Created a unit test for the SwitchInstProfUpdateWrapper class.

Reviewers: davidx, nikic, eraman, reames, chandlerc
Reviewed By: davidx
Differential Revision: https://reviews.llvm.org/D62656

llvm-svn: 362473

4f9e6814

[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores · 11de0e71

QingShan Zhang authored Jun 04, 2019

This opportunity was found in SPEC 2017 557.xz_r, where it is used by the SHA encrypt/decrypt code. See sha-2/sha512.c:

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 pattern is already handled by https://reviews.llvm.org/D26149.
This patch implements the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the target
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);
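
The two patterns above can be checked independently of the host's byte order by modelling memory as a byte array. In this sketch, `storeLowByteFirst` produces the image that a single 32-bit store leaves in memory on a little-endian target, and the high-byte-first sequence is shown to match a little-endian store of the byte-swapped value. All names are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<uint8_t, 4>;

// p[i] = (val >> (8 * i)) & 0xFF: on a little-endian target this is the
// memory image of one 32-bit store of Val.
Bytes storeLowByteFirst(uint32_t Val) {
  Bytes P{};
  for (unsigned I = 0; I != 4; ++I)
    P[I] = static_cast<uint8_t>((Val >> (8 * I)) & 0xFF);
  return P;
}

// p[i] = (val >> (8 * (3 - i))) & 0xFF: the byte-reversed sequence.
Bytes storeHighByteFirst(uint32_t Val) {
  Bytes P{};
  for (unsigned I = 0; I != 4; ++I)
    P[I] = static_cast<uint8_t>((Val >> (8 * (3 - I))) & 0xFF);
  return P;
}

// Portable 32-bit byte swap.
uint32_t bswap32(uint32_t V) {
  return (V >> 24) | ((V >> 8) & 0x0000FF00u) | ((V << 8) & 0x00FF0000u) |
         (V << 24);
}
```

For any value `V`, `storeHighByteFirst(V)` equals `storeLowByteFirst(bswap32(V))`, which is why the second pattern can become a BSWAP followed by one wide store.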

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472

11de0e71

[NFC] Update the test to check the endianness after the CodeGenPrepare instead... · 72667b4e

QingShan Zhang authored Jun 04, 2019

[NFC] Update the test to check the endianness after the CodeGenPrepare instead of checking the assembly instructions.

llvm-svn: 362471

72667b4e

[ARM] Turn some undefined encoding bits into 0s. · ac024455

Simon Tatham authored Jun 04, 2019

The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.

In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.


Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma

Reviewed By: dmgreen, efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60705

llvm-svn: 362470

ac024455

[PowerPC] add testcases for reordering LSR and PPCCTRLoops - NFC · a050b255
Chen Zheng authored Jun 04, 2019
```
llvm-svn: 362468
```
a050b255
[NFC][X86] Fixup FileCheck prefixes - drop duplicates · b3650868
Roman Lebedev authored Jun 03, 2019
```
llvm-svn: 362460
```
b3650868
[X86] Add test cases for 32 and 64 bit versions of PR42118. NFC · ac062bba
Craig Topper authored Jun 03, 2019
```
llvm-svn: 362457
```
ac062bba

[NFC][Codegen] Add tests for hoisting and-by-const from "logical shift", when... · 6dc8ce32

Roman Lebedev authored Jun 03, 2019

[NFC][Codegen] Add tests for hoisting and-by-const from "logical shift", when then eq-comparing with 0

This was initially reported as: https://reviews.llvm.org/D62818

https://rise4fun.com/Alive/oPH

llvm-svn: 362455

6dc8ce32

Fix DWARF DebugInfo unit test errors when cross-compiling · 552fda83

Jason Liu authored Jun 03, 2019

Summary:
When building with a Default Target set we can experience issues
in the DWARF DebugInfo unit tests because:

- They assume we can generate object files for the host platform.
- Some tests assume that the endianness of the target we are generating
  DWARF for matches that of the host.

This patch corrects these issues by ensuring the tests which
generate objects in memory are run with respect to
LLVM_DEFAULT_TARGET_TRIPLE and its endianness.

We also make sure we don't use the host's address size for the line tests
and split the triple util function in DwarfUtils into a version
that takes an address size and one that doesn't.

See also for discussion:
http://lists.llvm.org/pipermail/llvm-dev/2019-March/131212.html

Patch by: daltenty

Differential Revision: https://reviews.llvm.org/D62084

llvm-svn: 362454

552fda83

Revert r362451 "foo" and r362452 "[X86] Add test cases for 32 and 64 bit versions of PR42118. NFC" · 099f4a9f
Craig Topper authored Jun 03, 2019
```
I failed to squash these properly

llvm-svn: 362453
```
099f4a9f
[X86] Add test cases for 32 and 64 bit versions of PR42118. NFC · 17728e7c
Craig Topper authored Jun 03, 2019
```
llvm-svn: 362452
```
17728e7c
foo · 27a54661
Craig Topper authored Jun 03, 2019
```
llvm-svn: 362451
```
27a54661

[ORC] Use uint8_t for bitfields in SymbolTableEntry. · 357e8a39

Lang Hames authored Jun 03, 2019

This allows for better struct packing on MSVC, and as a bonus will eliminate a
warning on GCC builds.

llvm-svn: 362450

357e8a39

Jun 03, 2019

[SCCP] Add UnaryOperator visitor to SCCP for unary FNeg · 89f9af54
Cameron McInally authored Jun 03, 2019
```
Differential Revision: https://reviews.llvm.org/D62819

llvm-svn: 362449
```
89f9af54
Propagate fmf for setcc in SDAG for select folds · 6ff978ee
Michael Berg authored Jun 03, 2019
```
llvm-svn: 362448
```
6ff978ee

AMDGPU: Disable stack realignment for kernels · 0ceda9fb

Matt Arsenault authored Jun 03, 2019

This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and is for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447

0ceda9fb

[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp · 7500c97c

Jessica Paquette authored Jun 03, 2019

Instead of emitting all of the test instructions for a compare when it's only
used by a select, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit the remaining test
instructions.

For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.

Also add a test.

Differential Revision: https://reviews.llvm.org/D62695

llvm-svn: 362446

7500c97c

gn build: Merge r361896. · bddab42f
Peter Collingbourne authored Jun 03, 2019
```
llvm-svn: 362445
```
bddab42f
CFLAA: reflow comments; NFC · c24a2f4a
George Burgess IV authored Jun 03, 2019
```
llvm-svn: 362442
```
c24a2f4a

[CFLGraph] Add FAdd to visitConstantExpr. · 7a4eabef

Craig Topper authored Jun 03, 2019

This looks like an oversight as all the other binary operators are present.

Accidentally noticed while auditing places that need FNeg handling.

No test because as noted in the review it would be contrived and amount to "don't crash"

Differential Revision: https://reviews.llvm.org/D62790

llvm-svn: 362441

7a4eabef

[X86] Fix the pattern for merge masked vcvtps2pd. · dcf865f0

Craig Topper authored Jun 03, 2019

r362199 fixed it for zero masking, but not merge masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.

llvm-svn: 362440

dcf865f0

Propagate fmf for setcc/select folds · 0b7f98da

Michael Berg authored Jun 03, 2019

Summary: This change propagates the fmf placed on a setcc created from an fcmp through folds with selects, so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439

0b7f98da

[PowerPC] Look through copies for compare elimination · bad43d8f

Nemanja Ivanovic authored Jun 03, 2019

We currently miss the opportunities for optimizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.

This patch simply lets the optimization peek through copies.

Differential revision: https://reviews.llvm.org/D59633

llvm-svn: 362438

bad43d8f

TTI: Improve default costs for addrspacecast · 8dbeb925

Matt Arsenault authored Jun 03, 2019

For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.

Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.

llvm-svn: 362436

8dbeb925

gn build: Merge r362371 · 6f83c75d
Nico Weber authored Jun 03, 2019
```
llvm-svn: 362433
```
6f83c75d
Add ScalarEvolutionsTest::SCEVExpandInsertCanonicalIV tests · 786a85dc
Artur Pilipenko authored Jun 03, 2019
```
Test insertion of canonical IV in canonical expansion mode.

llvm-svn: 362432
```
786a85dc

[ConstantRange] Add sdiv() support · c061b99c

Nikita Popov authored Jun 03, 2019

The implementation is conceptually simple: We separate the LHS and
RHS into positive and negative components and then also compute the
positive and negative components of the result, taking into account
that e.g. only pos/pos and neg/neg will give a positive result.

However, there's one significant complication: SignedMin / -1 is UB
for sdiv, and we can't just ignore it, because the APInt result of
SignedMin would break the sign segregation. Instead we drop SignedMin
or -1 from the corresponding ranges, taking into account some edge
cases with wrapped ranges.

Because of the sign segregation, the implementation ends up being
nearly fully precise even for wrapped ranges (the remaining
imprecision is due to ranges that are both signed and unsigned
wrapping and are divided by a trivial divisor like 1). This means
that the testing cannot just check the signed envelope as we
usually do. Instead we collect all possible results in a bitvector
and construct a better sign wrapped range (than the full envelope).
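
The testing idea of collecting every possible result in a bitvector can be sketched at 8 bits. This is a simplified stand-in that handles only non-wrapped ranges with inclusive bounds and skips the undefined cases (division by zero and SignedMin / -1); `sdivResults` and `allNonNegative` are hypothetical helpers, not the ConstantRange test code.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Collect every result of L sdiv R for L in [LLo, LHi] and R in [RLo, RHi].
// Each result is recorded at the index given by its 8-bit two's complement
// encoding, so negative results land in bits 128..255.
std::bitset<256> sdivResults(int8_t LLo, int8_t LHi, int8_t RLo, int8_t RHi) {
  assert(LLo <= LHi && RLo <= RHi && "only non-wrapped ranges are handled");
  std::bitset<256> Seen;
  for (int L = LLo; L <= LHi; ++L) {
    for (int R = RLo; R <= RHi; ++R) {
      if (R == 0)
        continue; // division by zero is undefined
      if (L == INT8_MIN && R == -1)
        continue; // SignedMin / -1 is undefined for sdiv
      // C++ integer division truncates toward zero, like sdiv.
      Seen.set(static_cast<uint8_t>(L / R));
    }
  }
  return Seen;
}

// True if no recorded result has its sign bit set.
bool allNonNegative(const std::bitset<256> &Results) {
  for (unsigned I = 128; I < 256; ++I)
    if (Results.test(I))
      return false;
  return true;
}
```

Dividing a negative range by a negative range yields only non-negative results, which matches the sign segregation described above, while a positive range divided by a negative range does not.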

Differential Revision: https://reviews.llvm.org/D61238

llvm-svn: 362430

c061b99c