- Nov 14, 2019
-
Reid Kleckner authored
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout:

  recompiles  touches  affected_files  header
  342380      95       3604            llvm/include/llvm/ADT/STLExtras.h
  314730      234      1345            llvm/include/llvm/InitializePasses.h
  307036      118      2602            llvm/include/llvm/ADT/APInt.h
  213049      59       3611            llvm/include/llvm/Support/MathExtras.h
  170422      47       3626            llvm/include/llvm/Support/Compiler.h
  162225      45       3605            llvm/include/llvm/ADT/Optional.h
  158319      63       2513            llvm/include/llvm/ADT/Triple.h
  140322      39       3598            llvm/include/llvm/ADT/StringRef.h
  137647      59       2333            llvm/include/llvm/Support/Error.h
  131619      73       1803            llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
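The mechanical fix behind numbers like these is ordinary include hygiene; a minimal sketch of the pattern, with invented file and pass names (this is not the actual diff):

  // WidgetAnalysis.h -- a widely included header: no InitializePasses.h here.
  namespace llvm {
  class PassRegistry;
  // A forward declaration of the initializer is all that most callers need.
  void initializeWidgetAnalysisPass(PassRegistry &);
  } // namespace llvm

  // WidgetAnalysis.cpp -- only the implementation file pays for the big header.
  #include "llvm/InitializePasses.h"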
-
Reid Kleckner authored
WIP stats
-
Reid Kleckner authored
This method is private and only called from this file and doesn't need to be inline. Saves a TargetMachine.h include in MachineFunction.h, a popular header. The include was introduced in 98603a81 despite the forward decl of LLVMTargetMachine.
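A generic sketch of this kind of change, with invented names (not the actual MachineFunction edit): the method body moves into the .cpp, so the header only needs a forward declaration.

  // Widget.h -- before, an inline body here forced a TargetMachine.h include.
  namespace llvm {
  class LLVMTargetMachine; // forward declaration now suffices in the header

  class Widget {
    const LLVMTargetMachine *TM = nullptr;
    void recomputeTargetBits(); // private helper; body moved out of the header
  };
  } // namespace llvm

  // Widget.cpp -- the definition, and the heavyweight include, live here.
  #include "llvm/Target/TargetMachine.h"

  void llvm::Widget::recomputeTargetBits() {
    // ... code that needs the complete LLVMTargetMachine type goes here ...
  }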
-
Craig Topper authored
No instruction takes mxcsr as an operand, so we should always treat it as an identifier name.
-
- Nov 13, 2019
-
Craig Topper authored
Instead, do custom promotion in the handler so that we can still allow i16 to be used with fp80, and f64 without SSE2.
-
Craig Topper authored
-
Craig Topper authored
[X86] Move all the FP_TO_XINT/XINT_TO_FP setOperationActions into the same !useSoftFloat block. Qualify all of the Promote actions for these with !useSoftFloat too. NFCI

The Promote action doesn't apply until LegalizeDAG. By the time we get there, we would have already softened all the FP operations if useSoftFloat was true. So there wouldn't be any operation left to Promote.
-
Hiroshi Yamauchi authored
Summary:
This temporarily disables the large working set size behavior in profile guided size optimization due to internal benchmark regressions.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70207
-
Sanjay Patel authored
The bug manifests as replacing a reduction operand with an undef value.

The problem appears to be limited to cases where a min/max reduction has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and an "IgnoreList" of the reduction operations, but those may not apply to the final compare+select in a min/max reduction. For that, we use replaceAllUsesWith (RAUW) to ensure that the new vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
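For reference, a tiny hedged sketch of what the RAUW step amounts to; the function and variable names below are invented, and this is not the SLPVectorizer code itself:

  #include "llvm/IR/Value.h"
  using namespace llvm;

  // Redirect every remaining user of the old scalar min/max select --
  // including "extra" uses that the reduction bookkeeping did not track --
  // to the new vectorized value.
  static void redirectReductionUsers(Value *ScalarMinMaxSelect,
                                     Value *VectorizedResult) {
    ScalarMinMaxSelect->replaceAllUsesWith(VectorizedResult);
  }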
-
Craig Topper authored
[TargetLowering] Increase the storage size of NumRegistersForVT to allow the type break down for v256i1 and other types to be stored correctly

v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.

This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.

Differential Revision: https://reviews.llvm.org/D70138
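The underlying overflow is easy to demonstrate outside of LLVM; a standalone sketch of the problem and of the guarding assert (the names are invented, not the TargetLowering members themselves):

  #include <cassert>
  #include <cstdint>
  #include <iostream>

  int main() {
    // v256i1 on X86 without AVX-512 breaks down into 256 i8 parts.
    unsigned NumParts = 256;

    uint8_t Narrow = static_cast<uint8_t>(NumParts);  // old storage: 256 wraps to 0
    uint16_t Wide = static_cast<uint16_t>(NumParts);  // new storage: 256 fits

    // The patch also asserts that nothing is silently truncated on store.
    assert(Wide == NumParts && "NumRegistersForVT entry would lose information");

    std::cout << "narrow=" << unsigned(Narrow) << " wide=" << Wide << "\n";
    // Prints: narrow=0 wide=256
    return 0;
  }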
-
Simon Atanasyan authored
-
Simon Atanasyan authored
-
Quentin Colombet authored
During register coalescing, we update the live-intervals on-the-fly. To do that we are in this strange mode where the live-intervals can be slightly out-of-sync (more precisely they are forward looking) compared to what the IR actually represents. This happens because the register coalescer only updates the IR when it is done with updating the live-intervals, and it has to do it this way because updating the IR on-the-fly would actually clobber some information about what the live-ranges that are being updated look like.

This is problematic for updates that rely on the IR to accurately represent the state of the live-ranges. Right now, we have only one of those: stripValuesNotDefiningMask.

To reconcile this need for out-of-sync IR, this patch introduces a new argument to LiveInterval::refineSubRanges that allows the code doing the live range updates to reason about what the code will look like after the coalescer has rewritten the registers. Essentially this captures how a subregister index will be offset to match its position in a new register class.

E.g., let's say we want to merge:

  V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32> overlap, i.e., by choosing a class where we can find "offset + 1 == 3". Put differently, we align V2's sub3 with V1's sub1:

  V2: sub0 sub1 sub2 sub3
  V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the class:

  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now if we didn't rewrite the uses and def of V1, all the checks for V1 need to account for this offset to match what the live intervals intend to capture.

Prior to this patch, we would fail to recognize the uses and def of V1 and would end up with machine verifier errors ("No live segment at def"). This could lead to miscompiles as we would drop some live-ranges and thus miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask while having a mismatch between the IR and the live-ranges (i.e., we have to apply a subreg offset to the IR). This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled, since that is what requires looking at the IR to decide what is alive and what is not, i.e., calling stripValuesNotDefiningMask, while the IR may be out-of-sync with the information the coalescer maintains for the live-ranges.

None of the targets that currently use subreg liveness (i.e., the targets that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and #2, so this patch also artificially enables subreg liveness for ARM, so that a nice test case can be attached.
-
Ahmed Bougacha authored
RETA always implicitly uses LR, unlike RET which merely has an alias that defaults it to LR. Additionally, RETA implicitly uses SP as well, which it uses as a discriminator to authenticate LR. This isn't usually noticeable, because RET_ReallyLR is used in most of the backend. However, the post-RA scheduler, if enabled, will cause miscompiles if the imp-uses are missing. While there, fix a typo in the lone affected testcase.
-
Ahmed Bougacha authored
The instruction definition has been retroactively expanded to allow for an alias for '[xN, 0]!' as '[xN]!'. That wouldn't make sense on LDR, but does for LDRA.
-
David Stenberg authored
-
Sanjay Patel authored
-
Matthew Malcomson authored
-
Sanjay Patel authored
As noted by the FIXME comment, this is not correct based on our current FMF semantics. We should be propagating FMF from the final value in a sequence (in this case the 'select'). So the behavior even without this patch is wrong, but we did not allow FMF on 'select' until recently.

But if we do the correct thing right now in this patch, we'll inevitably introduce regressions because we have not wired up FMF propagation for 'phi' and 'select' in other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a better incremental way to make progress.

That said, the potential extra damage over the existing wrong behavior from this patch is very limited. AFAIK, the only way to have different FMF on IR in the same function is if we have LTO inlined IR from 2 modules that were compiled using different fast-math settings.

As seen in the tests, we may actually see some improvements with this patch because adding the FMF to the 'select' allows matching to min/max intrinsics that were previously missed (in the common case, the 'fcmp' and 'select' should have identical FMF to begin with).

Next steps in the transition:
- Make similar changes in instcombine as needed.
- Enable phi-to-select FMF propagation in SimplifyCFG.
- Remove dependencies on fcmp with FMF.
- Deprecate FMF on fcmp.

Differential Revision: https://reviews.llvm.org/D69720
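A hedged sketch of the kind of propagation being described, written against the public IR API (the helper name is invented; the in-tree change lives in the reduction/min-max matching code and is not necessarily shaped like this):

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/Operator.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  // Give a floating-point select the same fast-math flags as the fcmp that
  // feeds its condition, so later passes can match the pair as a min/max.
  static void giveSelectCompareFMF(SelectInst &Sel) {
    auto *Cmp = dyn_cast<FCmpInst>(Sel.getCondition());
    if (!Cmp || !isa<FPMathOperator>(Sel))
      return; // FMF is only valid on FP-valued selects
    Sel.copyFastMathFlags(Cmp);
  }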
-
Pavel Labath authored
Summary:
This avoids the need to duplicate the location lists searching logic in various users. The "inline location list dumping" code (which is the only user actually updated to handle DWARF v5 location lists) is switched to this method. After adding v4 location list support, I'll switch other users too.

Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70084
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
-
Simon Pilgrim authored
- Fix uninitialized variable warnings.
- Fix null dereference warnings.
-
Simon Pilgrim authored
Fixes cppcheck warnings.
-
Florian Hahn authored
I think we have to be a bit more careful when it comes to moving ops across shuffles, if the op does restrict undef. For example, without this patch, we would move 'and %v, <0, 0, -1, -1>' over a 'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first 2 lanes of the result are undef after the combine, but they really should be 0, unless I am missing something.

For ops that do fold to undef on undef operands, the current behavior should be fine. I've added a conservative check, OpDoesRestrictUndef; maybe there's a better existing utility?

Reviewers: spatel, RKSimon, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D70093
-
Luís Marques authored
test/DebugInfo/RISCV/relax-debug-frame.ll wasn't properly updated.
-
Sjoerd Meijer authored
This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops.

This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125.

Differential Revision: https://reviews.llvm.org/D69845
-
Luís Marques authored
Summary:
Removes CFI CFA directives that could incorrectly propagate beyond the basic block they were intended for. Specifically it removes the epilogue CFI directives. See the branch_and_tail_call test for an example of the issue. Should fix the stack unwinding issues caused by the incorrect directives.

Reviewers: asb, lenary, shiva0217

Reviewed By: lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69723
-
Simon Pilgrim authored
These are really just placeholders that use approximately the right resources - once we have CPU scheduler models that support these instructions, they will need revisiting.

In the meantime this means that all instructions have a class of some kind, meaning models can be more easily flagged as complete.
-
Hans Wennborg authored
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced repro is small enough to fit here:

  $ cat /tmp/a.c
  unsigned char f(unsigned char *p) {
    unsigned char result = 0;
    for (int shift = 0; shift < 1; ++shift)
      result |= p[0] << (shift * 8);
    return result;
  }
  $ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
  f:                                      # @f
          .cfi_startproc
  # %bb.0:                                # %entry
          xorl    %eax, %eax
          retq

That's nicely optimized, but I don't think it's the right result :-)

> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f2.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571

This reverts commit 57dd4b03.
-
Mirko Brkusanin authored
The ldi.fmt instruction can be considered cheap enough to avoid spilling and restoring the value it produces, since it is loaded from an immediate.

Differential Revision: https://reviews.llvm.org/D69898
-
Simon Atanasyan authored
When a 64-bit triple is used, emit an error if the CPU only supports 32-bit code.

Patch by Miloš Stojanović.

Differential Revision: https://reviews.llvm.org/D70018
-
Daniil Suchkov authored
Revert due to sanitizer-windows buildbot failure. This reverts commit bbb29738.
-
David Stenberg authored
Summary:
Entry values are considered for parameters that have register-described DBG_VALUEs in the entry block (along with other conditions).

If a parameter's value has been propagated from the caller to the callee, then the parameter's DBG_VALUE in the entry block may be described using a register defined by some instruction, and entry values should not be emitted for the parameter, which can currently occur. One such case was seen in the attached test case, in which the second parameter, which is described by a redefinition of the first parameter's register, would incorrectly get an entry value using the first parameter's register.

This commit intends to solve such cases by keeping track of register defines, and ignoring DBG_VALUEs in the entry block that are described by such registers.

In a RelWithDebInfo build of clang-8, the average size of the set was 27, and in a RelWithDebInfo+ASan build it was 30.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D69889
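A hedged sketch of the bookkeeping described in the summary (invented names, not the in-tree debug-info code): walk the entry block once, remember every register defined so far, and collect the DBG_VALUEs described by such a register so they do not become entry-value candidates.

  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallSet.h"
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineInstr.h"
  using namespace llvm;

  static SmallPtrSet<const MachineInstr *, 8>
  findNonEntryValueDbgValues(const MachineBasicBlock &EntryMBB) {
    SmallSet<unsigned, 32> DefinedRegs; // the commit reports ~30 regs on average
    SmallPtrSet<const MachineInstr *, 8> Skip;
    for (const MachineInstr &MI : EntryMBB) {
      if (MI.isDebugValue()) {
        const MachineOperand &Loc = MI.getOperand(0);
        // Described by a register that was already (re)defined in the entry
        // block: the value was produced locally, not passed in by the caller.
        if (Loc.isReg() && Loc.getReg() && DefinedRegs.count(Loc.getReg()))
          Skip.insert(&MI);
        continue;
      }
      // Any non-debug instruction may (re)define registers; record its defs.
      for (const MachineOperand &MO : MI.operands())
        if (MO.isReg() && MO.isDef() && MO.getReg())
          DefinedRegs.insert(MO.getReg());
    }
    return Skip;
  }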
-
David Stenberg authored
Summary:
The conditions that are used to determine if entry values should be emitted for a parameter are quite many, and will grow slightly in a follow-up commit, so move those to a helper function, as was suggested in the code review for D69889.

Reviewers: djtodoro, NikolaPrica

Reviewed By: djtodoro

Subscribers: probinson, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69955
-
Sander de Smalen authored
This patch allows the register allocator to spill SVE registers to the stack.

Reviewers: ostannard, efriedma, rengolin, cameron.mcinally

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70082
-
Daniil Suchkov authored
When all incoming values of a PHI are equal pointers, this transformation inserts a definition of such a pointer right after the definition of the base pointer and replaces both the PHI and all its incoming pointers with this value.

The primary goal of this transformation is canonicalization of this pattern in order to enable optimizations that can't handle PHIs. Non-inbounds pointers aren't currently supported.

Reviewers: spatel, RKSimon, lebedev.ri, apilipenko

Reviewed By: apilipenko

Tags: #llvm

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D68128
-