- Nov 08, 2019
-
-
Gil Rapaport authored
This reverts commit 11ed1c02, which causes an assert failure.
-
Nikita Popov authored
Fix cache invalidation by not guarding the dereferenced-pointer cache erasure by SeenBlocks. SeenBlocks is only populated when actually caching a value in the block, which doesn't necessarily happen just because dereferenced pointers were calculated.

-----

Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: for integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or, in conservative approximation, at the end of the block.

This patch changes the treatment of dereferenceability: instead of including it inside the block value, we treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferenceability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change).

Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead walk the full BB only once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
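A minimal C++ sketch of the caching scheme this commit describes; all names here (DerefPointerCache, scanBlock, isDereferenced) are simplified stand-ins, not the actual LVI interface:

  #include <map>
  #include <set>

  struct Value;      // stand-ins for the corresponding LLVM types
  struct BasicBlock;

  class DerefPointerCache {
    // One entry per scanned block: every pointer dereferenced in it.
    std::map<const BasicBlock *, std::set<const Value *>> Cache;

    // Walks the whole block once, collecting the underlying objects of
    // all memory operands. Stubbed out here.
    std::set<const Value *> scanBlock(const BasicBlock *) { return {}; }

  public:
    // Only meaningful at the block terminator, mirroring the assume-like
    // model described above.
    bool isDereferenced(const BasicBlock *BB, const Value *Ptr) {
      auto It = Cache.find(BB);
      if (It == Cache.end())
        // A single walk caches *all* dereferenced pointers, so queries
        // for other values in the same block hit the cache afterwards.
        It = Cache.emplace(BB, scanBlock(BB)).first;
      return It->second.count(Ptr) != 0;
    }

    // Per the fix above: erase unconditionally instead of guarding the
    // erasure by a "seen blocks" set that may never have been populated.
    void eraseBlock(const BasicBlock *BB) { Cache.erase(BB); }
  };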
-
Jan Korous authored
Differential Revision: https://reviews.llvm.org/D69648
-
Tom Stellard authored
Summary: The options aren't supported so they can be removed.

Reviewers: beanz, smeenai, compnerd
Reviewed By: compnerd
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69877
-
evgeny authored
This patch enables import of write-only variables with non-trivial initializers to fix linker errors. Initializers of imported variables are converted to 'zeroinitializer' to avoid promotion of referenced objects.

Differential revision: https://reviews.llvm.org/D70006
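A hedged sketch of the initializer conversion described above; it uses real LLVM IR APIs but is not the actual ThinLTO import code:

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/GlobalVariable.h"

  using namespace llvm;

  // Replace the (possibly non-trivial) initializer of an imported
  // write-only global with zeroinitializer, so that objects referenced
  // by the old initializer are not pulled in and promoted.
  static void convertInitializerToZero(GlobalVariable &GV) {
    if (GV.hasInitializer())
      GV.setInitializer(Constant::getNullValue(GV.getValueType()));
  }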
-
Tom Stellard authored
Reviewers: phosek
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69682
-
Kazu Hirata authored
Reviewers: kazu
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70013
-
Nikita Popov authored
This reverts commit 15bc4dc9. The clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few compile-time regressions in test-suite; will investigate.
-
Nikita Popov authored
Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: for integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or, in conservative approximation, at the end of the block.

This patch changes the treatment of dereferenceability: instead of including it inside the block value, we treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferenceability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change).

Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead walk the full BB only once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
Remove default values from constructor.
-
Philip Reames authored
This patch implements a correct, but not terribly useful, transform. In particular, if we have a dynamic alloca in a loop which is guaranteed to execute, and provably not captured, we hoist the alloca out of the loop. The capture tracking is needed so that we can prove that each previous stack region dies before the next one is allocated. The transform decreases the amount of stack allocation needed by a linear factor (e.g. the iteration count of the loop).

Now, I really hope no one is actually using dynamic allocas. As such, why this patch? Well, the actual problem I'm hoping to make progress on is allocation hoisting. There's a large draft patch out for review (https://reviews.llvm.org/D60056), and this patch was the smallest chunk of testable functionality I could come up with which takes a step vaguely in that direction.

Once this is in, it makes motivating the changes to capture tracking mentioned in TODOs testable. After that, I hope to extend this to trivial malloc free regions (i.e. free dominating all loop exits) and allocation functions for GCed languages.

Differential Revision: https://reviews.llvm.org/D69227
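A hedged before/after illustration of the transform in C++ (hand-written, not compiler output); it assumes the loop body is guaranteed to execute, the alloca size is loop-invariant, and use() does not capture the pointer:

  void use(char *buf, int len);

  void before(int n, int len) {
    for (int i = 0; i < n; ++i) {
      // A fresh dynamic stack region every iteration: stack usage grows
      // linearly with the iteration count.
      char *buf = (char *)__builtin_alloca(len);
      use(buf, len);
    }
  }

  void after(int n, int len) {
    // Hoisted: each old region provably died before the next was
    // allocated, so a single region can be reused for all iterations.
    char *buf = (char *)__builtin_alloca(len);
    for (int i = 0; i < n; ++i)
      use(buf, len);
  }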
-
Philip Reames authored
The change itself is straightforward and obvious, but ... there's an existing test checking for exactly the opposite. Both I and Artur think this is simply conservatism in the initial implementation. If anyone bisects a problem to this, a counterexample will be very interesting.

Differential Revision: https://reviews.llvm.org/D69907
-
Tim Renouf authored
ShuffleVectorInst::isExtractSubvectorMask, introduced in "[CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)", erroneously thought that

  %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef>

is a subvector extract, even though it runs off the end of the parent vector at the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that by not considering the above a subvector extract.

Differential Revision: https://reviews.llvm.org/D70005
Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
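A hedged C++ sketch of the corrected check; the real helper is a static member of ShuffleVectorInst, and this simplification only handles extraction from the first source operand. Undef mask elements are encoded as -1:

  #include <vector>

  static bool isExtractSubvectorMask(const std::vector<int> &Mask,
                                     int NumSrcElts, int &Index) {
    int Start = -1;
    for (int i = 0, e = (int)Mask.size(); i != e; ++i) {
      if (Mask[i] < 0)
        continue;                 // undef element matches any position
      int Offset = Mask[i] - i;   // implied start of the subvector
      if (Offset < 0 || (Start != -1 && Start != Offset))
        return false;             // elements not consecutive
      Start = Offset;
    }
    // The fix: the whole mask, undef tail included, must stay inside
    // the source. <2, 3, undef> from a 4-element vector fails (2+3 > 4).
    if (Start < 0 || Start + (int)Mask.size() > NumSrcElts)
      return false;
    Index = Start;
    return true;
  }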
-
Yi-Hong Lyu authored
We lower known CR bit spills (CRSET/CRUNSET) to load and spill the known value, but forgot to remove the redundant spills. For example, this sequence was used to spill a CRUNSET:

  crclr  4*cr5+lt
  mfocrf r3,4
  rlwinm r3,r3,20,0,0
  stw    r3,132(r1)

Custom lowering of known CR bit spills lowers it to:

  crxor 4*cr5+lt, 4*cr5+lt, 4*cr5+lt
  li    r3,0
  stw   r3,132(r1)

The crxor is redundant if there is no use of 4*cr5+lt, so we should remove it.

Differential revision: https://reviews.llvm.org/D67722
-
Simon Pilgrim authored
- uninitialized variables
- make BufferKind a scoped enum class
-
Simon Pilgrim authored
-
Roman Lebedev authored
-
Roman Lebedev authored
Summary: To be used in `ConstantRange::mulWithNoOverflow()`; may be useful in the future when saturating shift/mul ops are added. These are precise as far as I can tell.

I initially thought I would need `APInt::[us]mul_sat()` for these, but it turned out much simpler to do what `ConstantRange::multiply()` does: perform the multiplication in twice the bitwidth and then truncate. Though here we want a saturating signed truncation.

Reviewers: nikic, reames, spatel
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69994
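The widening trick from the message, shown with plain fixed-width integers instead of APInt: multiply in twice the bit width, then truncate with signed saturation.

  #include <algorithm>
  #include <cstdint>

  static int16_t smul_sat16(int16_t a, int16_t b) {
    int32_t Wide = int32_t(a) * int32_t(b);    // exact in 32 bits
    Wide = std::min<int32_t>(Wide, INT16_MAX); // saturating signed
    Wide = std::max<int32_t>(Wide, INT16_MIN); // truncation back to 16
    return int16_t(Wide);
  }
  // e.g. smul_sat16(300, 300) == 32767, where plain wrapping
  // multiplication would yield 24464.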
-
Roman Lebedev authored
Summary: The signed one is needed for the implementation of `ConstantRange::smul_sat()`; the unsigned one is for completeness only.

Reviewers: nikic, RKSimon, spatel
Reviewed By: nikic
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69993
-
Kristof Beyls authored
-
Simon Pilgrim authored
- uninitialized variables
- make getBufferCapacity() const
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Aditya Kumar authored
This required adding support for resolving R_AARCH64_ABS64 relocations, so that accurate addresses could be obtained for the function names being resolved.

Authored by: ianlevesque (Ian Levesque)

Reviewers: dberris, phosek, smeenai, tetsuo-cpp
Differential Revision: https://reviews.llvm.org/D69967
-
LLVM GN Syncbot authored
-
Jason Liu authored
Summary: We were using symbols to represent labels and csects interchangeably before, and that could be a problem. There are cases where we would need to add a storage mapping class to a symbol if that symbol is actually the name of a csect, but it's hard for us to figure out whether that symbol is a label or a csect.

This patch intends to do the following:
1. Construct a QualName (a name that includes the storage mapping class) MCSymbolXCOFF for every MCSectionXCOFF.
2. Keep a pointer to that QualName inside of MCSectionXCOFF.
3. Use that QualName whenever we need a symbol that refers to that MCSectionXCOFF.
4. Adapt the snowball effect from the above changes in XCOFFObjectWriter.cpp.

Reviewers: xingxue, DiggerLin, sfertile, daltenty, hubert.reinterpretcast
Reviewed By: DiggerLin, daltenty
Subscribers: wuzish, nemanjai, mgorny, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69633
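A hedged C++ sketch of steps 1-3 with simplified stand-in types, not the real MC classes:

  #include <string>

  struct MCSymbolXCOFF {
    std::string Name; // qualified name, e.g. ".text[PR]": csect name
                      // plus storage mapping class
  };

  struct MCSectionXCOFF {
    // Step 2: the section keeps a pointer to its QualName symbol,
    // constructed alongside the section (step 1).
    MCSymbolXCOFF *QualName;

    // Step 3: anything that needs a symbol referring to this csect asks
    // for the QualName, so a plain label can no longer be mistaken for
    // the csect itself.
    MCSymbolXCOFF *getQualNameSymbol() const { return QualName; }
  };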
-
Dmitry Preobrazhensky authored
See https://bugs.llvm.org/show_bug.cgi?id=40903

Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D69888
-
Gil Rapaport authored
This recommits 100e797a (reverted in 009e0326 for failing an assert). While the root cause was independently reverted in eaff3004, this commit includes a LIT test to make sure IVDescriptor's SinkAfter logic does not try to sink branch instructions.
-
Simon Pilgrim authored
- uninitialized variables
- documentation warnings
- shadow variable names
-
Djordje Todorovic authored
Refactor usage of the isCopyInstrImpl, isCopyInstr and isAddImmediate methods so that they return an optional machine operand pair of destination and source registers.

Patch by Nikola Prica

Differential Revision: https://reviews.llvm.org/D69622
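A hedged sketch of the refactored shape: one optional result instead of out-parameters. The types here are stand-ins; LLVM of this era spells the result llvm::Optional<DestSourcePair>.

  #include <optional>

  struct MachineOperand; // stand-ins for the LLVM types
  struct MachineInstr;

  struct DestSourcePair {
    const MachineOperand *Destination;
    const MachineOperand *Source;
  };

  // Target hook: engaged iff MI is a copy-like instruction. Stubbed
  // here; a real target inspects the opcode and operands.
  std::optional<DestSourcePair> isCopyInstr(const MachineInstr &) {
    return std::nullopt;
  }

  void example(const MachineInstr &MI) {
    if (auto Copy = isCopyInstr(MI)) {
      // Use Copy->Destination / Copy->Source here.
    }
  }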
-
Russell Gallop authored
This was already enabled for other platforms. This change adds the option for Windows/lld-link.

Differential Revision: https://reviews.llvm.org/D69941
-
Hans Wennborg authored
This triggered asserts in the Chromium build; see https://crbug.com/1022729 for details and a reproducer.

> Without this change, when a nested tag type of any kind (enum, class,
> struct, union) is used as a variable type, it is emitted without
> emitting the parent type. In CodeView, parent types point to their inner
> types, and inner types do not point back to their parents. We already
> walk over all of the parent scopes to build the fully qualified name.
> This change simply requests their type indices as we go along to ensure
> they are all emitted.
>
> Fixes PR43905
>
> Reviewers: akhuang, amccarth
>
> Differential Revision: https://reviews.llvm.org/D69924
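A hedged C++ sketch of the (here reverted) idea: while walking enclosing scopes to build the fully qualified name, also request each parent tag's type index so the parents get emitted. The names (Scope, requestTypeIndex) are hypothetical simplifications of the CodeView debug-info emitter:

  #include <string>

  struct Scope {
    const Scope *Parent = nullptr; // enclosing scope, or null
    std::string Name;
    bool IsTagType = false;        // enum/class/struct/union
  };

  // Stub: the real code returns a CodeView TypeIndex, forcing emission.
  void requestTypeIndex(const Scope &) {}

  // Build "Outer::Inner::..." and, per the change above, request a type
  // index for every enclosing tag so parent types are emitted too.
  std::string getFullyQualifiedName(const Scope &S) {
    std::string QName = S.Name;
    for (const Scope *P = S.Parent; P; P = P->Parent) {
      QName = P->Name + "::" + QName;
      if (P->IsTagType)
        requestTypeIndex(*P);
    }
    return QName;
  }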
-
Sanne Wouda authored
Summary: The greedy register allocator occasionally decides to insert a large number of unnecessary copies; see below for an example. The -consider-local-interval-cost option (which X86 already enables by default) fixes this. We enable this option for AArch64 only, after receiving feedback that this change is not beneficial for PowerPC.

We evaluated the impact of this change on compile time, code size and performance benchmarks. This option has a small impact on compile time, measured on CTMark: a 0.1% geomean regression at -O1 and -O2, and 0.2% geomean at -O3, with at most 0.5% on individual benchmarks. The effect on both code size and performance on AArch64 for the LLVM test suite is nil on the geomean, with individual outliers (ignoring short exec_times) between:

              best    worst
  size..text  -3.3%   +0.0%
  exec_time   -5.8%   +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at most) in code size on some benchmarks, with a tiny movement (-0.01%) on the geomean. Neither intrate nor fprate show any change in performance.

This patch makes the following changes:
- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new behaviour if necessary.

This matrix multiply example:

  $ cat test.c
  long A[8][8];
  long B[8][8];
  long C[8][8];

  void run_test() {
    for (int k = 0; k < 8; k++) {
      for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
          C[i][j] += A[i][k] * B[k][j];
        }
      }
    }
  }

results in the following generated code on AArch64:

  $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
  [...]
                       // %for.cond1.preheader
                       // =>This Inner Loop Header: Depth=1
  add x14, x11, x9
  str q0, [sp, #16]    // 16-byte Folded Spill
  ldr q0, [x14]
  mov v2.16b, v15.16b
  mov v15.16b, v14.16b
  mov v14.16b, v13.16b
  mov v13.16b, v12.16b
  mov v12.16b, v11.16b
  mov v11.16b, v10.16b
  mov v10.16b, v9.16b
  mov v9.16b, v8.16b
  mov v8.16b, v31.16b
  mov v31.16b, v30.16b
  mov v30.16b, v29.16b
  mov v29.16b, v28.16b
  mov v28.16b, v27.16b
  mov v27.16b, v26.16b
  mov v26.16b, v25.16b
  mov v25.16b, v24.16b
  mov v24.16b, v23.16b
  mov v23.16b, v22.16b
  mov v22.16b, v21.16b
  mov v21.16b, v20.16b
  mov v20.16b, v19.16b
  mov v19.16b, v18.16b
  mov v18.16b, v17.16b
  mov v17.16b, v16.16b
  mov v16.16b, v7.16b
  mov v7.16b, v6.16b
  mov v6.16b, v5.16b
  mov v5.16b, v4.16b
  mov v4.16b, v3.16b
  mov v3.16b, v1.16b
  mov x12, v0.d[1]
  fmov x15, d0
  ldp q1, q0, [x14, #16]
  ldur x1, [x10, #-256]
  ldur x2, [x10, #-192]
  add x9, x9, #64      // =64
  mov x13, v1.d[1]
  fmov x16, d1
  ldr q1, [x14, #48]
  mul x3, x15, x1
  mov x14, v0.d[1]
  fmov x17, d0
  mov x18, v1.d[1]
  fmov x0, d1
  mov v1.16b, v3.16b
  mov v3.16b, v4.16b
  mov v4.16b, v5.16b
  mov v5.16b, v6.16b
  mov v6.16b, v7.16b
  mov v7.16b, v16.16b
  mov v16.16b, v17.16b
  mov v17.16b, v18.16b
  mov v18.16b, v19.16b
  mov v19.16b, v20.16b
  mov v20.16b, v21.16b
  mov v21.16b, v22.16b
  mov v22.16b, v23.16b
  mov v23.16b, v24.16b
  mov v24.16b, v25.16b
  mov v25.16b, v26.16b
  mov v26.16b, v27.16b
  mov v27.16b, v28.16b
  mov v28.16b, v29.16b
  mov v29.16b, v30.16b
  mov v30.16b, v31.16b
  mov v31.16b, v8.16b
  mov v8.16b, v9.16b
  mov v9.16b, v10.16b
  mov v10.16b, v11.16b
  mov v11.16b, v12.16b
  mov v12.16b, v13.16b
  mov v13.16b, v14.16b
  mov v14.16b, v15.16b
  mov v15.16b, v2.16b
  ldr q2, [sp]         // 16-byte Folded Reload
  fmov d0, x3
  mul x3, x12, x1
  [...]

With -consider-local-interval-cost the same section of code results in the following:

  $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
  [...]
  .LBB0_1:             // %for.cond1.preheader
                       // =>This Inner Loop Header: Depth=1
  add x14, x11, x9
  ldp q0, q1, [x14]
  ldur x1, [x10, #-256]
  ldur x2, [x10, #-192]
  add x9, x9, #64      // =64
  mov x12, v0.d[1]
  fmov x15, d0
  mov x13, v1.d[1]
  fmov x16, d1
  ldp q0, q1, [x14, #32]
  mul x3, x15, x1
  cmp x9, #512         // =512
  mov x14, v0.d[1]
  fmov x17, d0
  fmov d0, x3
  mul x3, x12, x1
  [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437
-
Roger Ferrer authored
The following testcase

  function:
  .Lpcrel_label1:
          auipc a0, %pcrel_hi(other_function)
          addi  a1, a0, %pcrel_lo(.Lpcrel_label1)

          .p2align 2          # Causes a new fragment to be emitted
          .type other_function,@function
  other_function:
          ret

exposes an odd behaviour in which only the %pcrel_hi relocation is evaluated, but not the %pcrel_lo:

  $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump -d -r -

  <stdin>: file format ELF64-riscv

  Disassembly of section .text:
  0000000000000000 function:
         0: 17 05 00 00   auipc a0, 0
         4: 93 05 05 00   mv a1, a0
                  0000000000000004: R_RISCV_PCREL_LO12_I other_function+4

  0000000000000008 other_function:
         8: 67 80 00 00   ret

The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we only consider the fragment, but in RISCVMCExpr::evaluatePCRelLo we consider the section. This usually works, but there are cases where the section may still be the same while the fragment is a different one. In that case we end up forcing a %pcrel_lo relocation without any %pcrel_hi.

This patch makes RISCVAsmBackend::shouldForceRelocation use the section, if any, to determine whether the relocation must be forced.

Differential Revision: https://reviews.llvm.org/D60657
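A hedged C++ sketch of the fix's core idea, with simplified stand-in types rather than the real MC classes: decide based on the section, not the fragment, since a .p2align can start a new fragment within the same section and that alone must not force a lone %pcrel_lo.

  struct MCSection;
  struct MCFragment {
    const MCSection *Parent; // the section containing this fragment
  };

  static bool requiresForcedRelocation(const MCFragment *HiFrag,
                                       const MCFragment *LoFrag) {
    if (!HiFrag || !LoFrag)
      return true; // be conservative when a fragment is unknown
    // Old (buggy) check: HiFrag != LoFrag, which fired even for two
    // fragments of the same section.
    return HiFrag->Parent != LoFrag->Parent;
  }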
-
Daniil Suchkov authored
(test commit)
-