  1. Nov 08, 2019
    • Gil Rapaport's avatar
      [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) · 11ed1c02
      Gil Rapaport authored
      This recommits 100e797a (reverted in
      009e0326 for failing an assert). While the
      root cause was independently reverted in eaff3004,
      this commit includes a LIT test to make sure IVDescriptor's SinkAfter logic does
      not try to sink branch instructions.
      11ed1c02
    • Simon Pilgrim's avatar
      BinaryStream - fix static analyzer warnings. NFCI. · ef459ded
      Simon Pilgrim authored
       - uninitialized variables
       - documentation warnings
       - shadow variable names
      ef459ded
    • Djordje Todorovic's avatar
      Reland: [TII] Use optional destination and source pair as a return value; NFC · 8d2ccd1a
      Djordje Todorovic authored
      Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
      to return optional machine operand pair of destination and source
      registers.
      
      Patch by Nikola Prica
      
      Differential Revision: https://reviews.llvm.org/D69622
      8d2ccd1a
    • Russell Gallop's avatar
      [cmake] Enable thin lto cache when building with lld-link · 0a8bd77e
      Russell Gallop authored
      This was already enabled for other platforms; this adds an option for Windows/lld-link.
      
      Differential Revision: https://reviews.llvm.org/D69941
      0a8bd77e
    • Hans Wennborg's avatar
      Revert d91ed80e "[codeview] Reference types in type parent scopes" · ff3b5134
      Hans Wennborg authored
      This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
      details and reproducer.
      
      > Without this change, when a nested tag type of any kind (enum, class,
      > struct, union) is used as a variable type, it is emitted without
      > emitting the parent type. In CodeView, parent types point to their inner
      > types, and inner types do not point back to their parents. We already
      > walk over all of the parent scopes to build the fully qualified name.
      > This change simply requests their type indices as we go along to ensure
      > they are all emitted.
      >
      > Fixes PR43905
      >
      > Reviewers: akhuang, amccarth
      >
      > Differential Revision: https://reviews.llvm.org/D69924
      ff3b5134
    • Sanne Wouda's avatar
      [RAGreedy] Enable -consider-local-interval-cost for AArch64 · f649f24d
      Sanne Wouda authored
      Summary:
      The greedy register allocator occasionally decides to insert a large number of
      unnecessary copies, see below for an example.  The -consider-local-interval-cost
      option (which X86 already enables by default) fixes this.  We enable this option
      for AArch64 only after receiving feedback that this change is not beneficial for
      PowerPC.
      
      We evaluated the impact of this change on compile time, code size and
      performance benchmarks.
      
      This option has a small impact on compile time, measured on CTMark. A 0.1%
      geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5%
      on individual benchmarks.
      
      The effect on both code size and performance on AArch64 for the LLVM test suite
      is nil on the geomean with individual outliers (ignoring short exec_times)
      between:
      
                       best     worst
        size..text     -3.3%    +0.0%
        exec_time      -5.8%    +2.3%
      
      On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
      most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
      geomean.  Neither intrate nor fprate show any change in performance.
      
      This patch makes the following changes.
      
      - For the AArch64 target, enableAdvancedRASplitCost() now returns true.
      
      - Ensures that -consider-local-interval-cost=false can disable the new
        behaviour if necessary.
      
      This matrix multiply example:
      
         $ cat test.c
         long A[8][8];
         long B[8][8];
         long C[8][8];

         void run_test() {
           for (int k = 0; k < 8; k++) {
             for (int i = 0; i < 8; i++) {
               for (int j = 0; j < 8; j++) {
                 C[i][j] += A[i][k] * B[k][j];
               }
             }
           }
         }
      
      results in the following generated code on AArch64:
      
        $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
        [...]
                                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              str     q0, [sp, #16]           // 16-byte Folded Spill
              ldr     q0, [x14]
              mov     v2.16b, v15.16b
              mov     v15.16b, v14.16b
              mov     v14.16b, v13.16b
              mov     v13.16b, v12.16b
              mov     v12.16b, v11.16b
              mov     v11.16b, v10.16b
              mov     v10.16b, v9.16b
              mov     v9.16b, v8.16b
              mov     v8.16b, v31.16b
              mov     v31.16b, v30.16b
              mov     v30.16b, v29.16b
              mov     v29.16b, v28.16b
              mov     v28.16b, v27.16b
              mov     v27.16b, v26.16b
              mov     v26.16b, v25.16b
              mov     v25.16b, v24.16b
              mov     v24.16b, v23.16b
              mov     v23.16b, v22.16b
              mov     v22.16b, v21.16b
              mov     v21.16b, v20.16b
              mov     v20.16b, v19.16b
              mov     v19.16b, v18.16b
              mov     v18.16b, v17.16b
              mov     v17.16b, v16.16b
              mov     v16.16b, v7.16b
              mov     v7.16b, v6.16b
              mov     v6.16b, v5.16b
              mov     v5.16b, v4.16b
              mov     v4.16b, v3.16b
              mov     v3.16b, v1.16b
              mov     x12, v0.d[1]
              fmov    x15, d0
              ldp     q1, q0, [x14, #16]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldr     q1, [x14, #48]
              mul     x3, x15, x1
              mov     x14, v0.d[1]
              fmov    x17, d0
              mov     x18, v1.d[1]
              fmov    x0, d1
              mov     v1.16b, v3.16b
              mov     v3.16b, v4.16b
              mov     v4.16b, v5.16b
              mov     v5.16b, v6.16b
              mov     v6.16b, v7.16b
              mov     v7.16b, v16.16b
              mov     v16.16b, v17.16b
              mov     v17.16b, v18.16b
              mov     v18.16b, v19.16b
              mov     v19.16b, v20.16b
              mov     v20.16b, v21.16b
              mov     v21.16b, v22.16b
              mov     v22.16b, v23.16b
              mov     v23.16b, v24.16b
              mov     v24.16b, v25.16b
              mov     v25.16b, v26.16b
              mov     v26.16b, v27.16b
              mov     v27.16b, v28.16b
              mov     v28.16b, v29.16b
              mov     v29.16b, v30.16b
              mov     v30.16b, v31.16b
              mov     v31.16b, v8.16b
              mov     v8.16b, v9.16b
              mov     v9.16b, v10.16b
              mov     v10.16b, v11.16b
              mov     v11.16b, v12.16b
              mov     v12.16b, v13.16b
              mov     v13.16b, v14.16b
              mov     v14.16b, v15.16b
              mov     v15.16b, v2.16b
              ldr     q2, [sp]                // 16-byte Folded Reload
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      With -consider-local-interval-cost the same section of code results in the
      following:
      
        $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
        [...]
        .LBB0_1:                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              ldp     q0, q1, [x14]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x12, v0.d[1]
              fmov    x15, d0
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldp     q0, q1, [x14, #32]
              mul     x3, x15, x1
              cmp     x9, #512                // =512
              mov     x14, v0.d[1]
              fmov    x17, d0
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
      
      Reviewed By: dmgreen
      
      Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69437
      f649f24d
    • Roger Ferrer's avatar
      [RISCV] Fix evaluation of %pcrel_lo · 41449c58
      Roger Ferrer authored
      The following testcase
      
        function:
        .Lpcrel_label1:
        	auipc	a0, %pcrel_hi(other_function)
        	addi	a1, a0, %pcrel_lo(.Lpcrel_label1)
        	.p2align	2          # Causes a new fragment to be emitted
      
        	.type	other_function,@function
        other_function:
        	ret
      
      exposes an odd behaviour in which only the %pcrel_hi relocation is
      evaluated but not the %pcrel_lo.
      
        $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump  -d -r -
      
        <stdin>:	file format ELF64-riscv
      
        Disassembly of section .text:
        0000000000000000 function:
               0:	17 05 00 00	auipc	a0, 0
               4:	93 05 05 00	mv	a1, a0
        		0000000000000004:  R_RISCV_PCREL_LO12_I	other_function+4
      
        0000000000000008 other_function:
               8:	67 80 00 00	ret
      
      The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we
      only consider the fragment but in RISCVMCExpr::evaluatePCRelLo we
      consider the section. This usually works, but there are cases where the
      section is still the same while the fragment is a different one. In
      that case we end up forcing a %pcrel_lo relocation without any %pcrel_hi.
      
      This patch makes RISCVAsmBackend::shouldForceRelocation use the section,
      if any, to determine if the relocation must be forced or not.
      
      Differential Revision: https://reviews.llvm.org/D60657
      41449c58
    • Daniil Suchkov's avatar
      [NFC][IndVarS] Adjust a comment · 7b9f5401
      Daniil Suchkov authored
      (test commit)
      7b9f5401
    • Roman Lebedev's avatar
      [CR] ConstantRange::sshl_sat(): check signedness of the min/max, not ranges · 72a21ad6
      Roman Lebedev authored
      This was pointed out in review,
      but I forgot to stage this change into the commit itself.
      72a21ad6
    • Roman Lebedev's avatar
      [ConstantRange] Add `ushl_sat()`/`sshl_sat()` methods. · e0ea842b
      Roman Lebedev authored
      Summary:
      To be used in `ConstantRange::shlWithNoOverflow()`,
      may in future be useful for when saturating shift/mul ops are added.
      
      Unlike `ConstantRange::shl()`, these are precise.
      
      Reviewers: nikic, spatel, reames
      
      Reviewed By: nikic
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69960
      e0ea842b
    • Yonghong Song's avatar
      [BPF] turn on -mattr=+alu32 for cpu version v3 and later · 6b8baf30
      Yonghong Song authored
      -mattr=+alu32 has shown good performance compared to builds without the attribute.
      Based on discussion at
        https://lore.kernel.org/bpf/1ec37838-966f-ec0b-5223-ca9b6eb0860d@fb.com/T/#t
      cpu version v3 should support -mattr=+alu32.
      This patch enables alu32 if the cpu version is v3, whether specified by the
      user or probed by LLVM.
      
      Differential Revision: https://reviews.llvm.org/D69957
      6b8baf30
    • Nemanja Ivanovic's avatar
      [PowerPC] Option for enabling absolute jumptables with command line · 9af28400
      Nemanja Ivanovic authored
      This option allows the user to specify the use of absolute jumptables instead
      of relative ones, which are the default on most PPC subtargets.
      
      Patch by Kamauu Bridgeman
      
      Differential revision: https://reviews.llvm.org/D69108
      9af28400
    • Shu-Chun Weng's avatar
      [llvm/test] Update test comments · 79367983
      Shu-Chun Weng authored
      79367983
    • Fangrui Song's avatar
    • Craig Topper's avatar
      [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement · 6749dc34
      Craig Topper authored
      x86_mmx is conceptually a vector already. Don't introduce an extra conversion between it and scalar i64.
      
      I'm using VectorType::isValidElementType which checks for floating point, integer, and pointers to hopefully make this more readable than just blacklisting x86_mmx.
      
      Differential Revision: https://reviews.llvm.org/D69964
      6749dc34
    • Sanjay Patel's avatar
      2f32da3d
  2. Nov 07, 2019