Commits · 0de8aeae7249d314f25b5188c91b04b9a24003ad · Lorenzo Albano / LLVM bpEVL

Mar 10, 2021

[VPlan] Support to widen select instructions in VPlan native path · 0de8aeae

Mauri Mustonen authored Mar 10, 2021

Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This recipe is already used by the inner loop vectorizer.

Previously, select instructions were handled by the wrong recipe, which resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D97136

0de8aeae

GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM · 4c6ab48f

Christudasan Devadasan authored Mar 10, 2021

It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. This effectively reduces
the number of instructions compared with expanding them individually.
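
As a rough illustration of why one shared expansion is cheaper, the sketch below computes both results with a single division and derives the remainder from the quotient. The `DivRem` struct and `sdivrem` function are hypothetical names for this example, not GlobalISel APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type holding both outputs of a combined operation.
struct DivRem {
  int32_t Quot;
  int32_t Rem;
};

// One shared expansion: divide once, then derive the remainder from the
// quotient with a multiply and a subtract instead of a second division.
DivRem sdivrem(int32_t LHS, int32_t RHS) {
  assert(RHS != 0 && "division by zero");
  assert(!(LHS == std::numeric_limits<int32_t>::min() && RHS == -1) &&
         "signed overflow");
  int32_t Quot = LHS / RHS;       // the only division
  int32_t Rem = LHS - Quot * RHS; // same value as LHS % RHS
  return {Quot, Rem};
}
```

Expanding `div` and `rem` separately would repeat the division; sharing it is the saving the combine is after.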

Reviewed By: arsenm, paquette

Differential Revision: https://reviews.llvm.org/D96013

4c6ab48f

[SampleFDO] Support enabling -funique-internal-linkage-name. · ee35784a

Wei Mi authored Jan 19, 2021

Now that the -funique-internal-linkage-name flag is available, we want to flip
it on by default, since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As preparation, we
want to avoid regressions caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from a binary built without -funique-internal-linkage-name, so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce a transient regression.

To avoid such a mismatch, we introduce a NameTable section flag indicating
whether any name in the profile contains the uniq suffix. The compiler
decides whether to keep the uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for the
extbinary format. For other formats, the compiler keeps the uniq
suffix by default, so they will only experience a transient regression when
-funique-internal-linkage-name is first flipped.
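
The decision described above can be sketched roughly as follows. This is a simplified model, not the actual `getCanonicalFnName` code: the function name `canonicalName`, the boolean parameter, and the assumed suffix form `.__uniq.<digits>` are all illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch: the unique suffix looks like ".__uniq.<digits>".
// ProfileHasUniqSuffix models the NameTable section flag read from the profile.
std::string canonicalName(const std::string &Name, bool ProfileHasUniqSuffix) {
  assert(!Name.empty() && "expected a symbol name");
  static const std::string Marker = ".__uniq.";
  // Profile names carry the suffix: keep it so IR and profile names match.
  if (ProfileHasUniqSuffix)
    return Name;
  // Profile names lack the suffix: strip it from the IR name.
  std::size_t Pos = Name.find(Marker);
  if (Pos == std::string::npos)
    return Name;
  return Name.substr(0, Pos);
}
```

With the flag clear, an IR function named `foo.__uniq.123` would be looked up in the profile as `foo`, which is what avoids the mismatch right after the flip.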

Another type of regression is caused by places where we fail to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932

ee35784a

Mar 09, 2021

Revert "[llvm-cov] reset executation count to 0 after wrapped segment" · 8d5c3ae3

Zequan Wu authored Mar 05, 2021

This reverts D85036

Differential Revision: https://reviews.llvm.org/D98084

8d5c3ae3

[Support][test] Unconditionally use setenv macro when compiling on Windows · 1956288f

Markus Böck authored Mar 09, 2021

This test currently fails to compile with a MinGW toolchain because setenv is not defined: setenv is a POSIX function that Windows does not implement.

This patch enables the setenv macro used in the unit test for all of Windows, making the test compile and run successfully.
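
A minimal sketch of such a shim, assuming it maps `setenv` onto `_putenv_s` from the Microsoft C runtime (the exact macro in the unit test may differ). The helper `setAndCheck` is a hypothetical name used only for this example.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
// Windows has no setenv; _putenv_s takes the name and the value.
// The POSIX "overwrite" argument is dropped because _putenv_s always overwrites.
#define setenv(name, value, overwrite) _putenv_s(name, value)
#endif

// Hypothetical helper: set a variable and confirm it can be read back.
bool setAndCheck(const char *Name, const char *Value) {
  assert(Name && Value);
  if (setenv(Name, Value, 1) != 0)
    return false;
  const char *Got = getenv(Name);
  return Got && strcmp(Got, Value) == 0;
}
```

Keying the macro on `_WIN32` rather than on a specific compiler is what makes it apply to MSVC and MinGW alike.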

Differential Revision: https://reviews.llvm.org/D98271

1956288f

[DebugInfo] Add replaceArg function to simplify DBG_VALUE_LIST expressions · f0513413

gbtozers authored Sep 11, 2020

The LiveDebugValues and LiveDebugVariables implementations for handling
DBG_VALUE_LIST instructions can be simplified significantly if they do not have
to deal with any duplicated operands, such as a DBG_VALUE_LIST that uses the
same register multiple times in its expression. This patch adds a function,
replaceArg, that can be used to simplify a DIExpression in the case of
duplicated operands.
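
To make the idea concrete, here is a simplified model in which an expression is reduced to the list of operand indices it references. It is a sketch only: the real function operates on a DIExpression, and the requirement that the surviving index be smaller than the removed one is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model: ArgRefs holds the operand index referenced by each
// argument op in an expression; all other ops are omitted.
// References to OldArg are redirected to NewArg, and references above OldArg
// shift down by one because the duplicate operand is dropped from the list.
std::vector<uint64_t> replaceArg(const std::vector<uint64_t> &ArgRefs,
                                 uint64_t OldArg, uint64_t NewArg) {
  assert(NewArg < OldArg && "sketch assumes the surviving operand comes first");
  std::vector<uint64_t> Result;
  Result.reserve(ArgRefs.size());
  for (uint64_t Arg : ArgRefs) {
    if (Arg == OldArg)
      Result.push_back(NewArg);
    else if (Arg > OldArg)
      Result.push_back(Arg - 1);
    else
      Result.push_back(Arg);
  }
  return Result;
}
```

For an operand list `[r1, r1, r2, r3]`, replacing index 1 with index 0 lets the list shrink to `[r1, r2, r3]` while the expression still refers to the same values.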

Differential Revision: https://reviews.llvm.org/D83896

f0513413

[DebugInfo] Handle multiple variable location operands in IR · df69c694

gbtozers authored Sep 30, 2020

This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232

df69c694

[llvm-readelf] Support dumping the BB address map section with --bb-addr-map. · c245c21c

Rahman Lavaee authored Mar 08, 2021

This patch lets llvm-readelf dump the content of the BB address map
section in the following format:
```
Function {
  At: <address>
  BB entries [
    {
      Offset:   <offset>
      Size:     <size>
      Metadata: <metadata>
    },
    ...
  ]
}
...
```

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95511

c245c21c

Mar 08, 2021

Fix 2: [DebugInfo] Support DIArgList in DbgVariableIntrinsic · 57a0e0d4

Stephen Tozer authored Mar 08, 2021

Changes to function calls in LocalTest resulted in comparisons between
unsigned values and signed literals; the latter have been updated to be
unsigned to prevent this warning.

57a0e0d4

[DebugInfo] Support DIArgList in DbgVariableIntrinsic · e5d958c4

gbtozers authored Sep 30, 2020

This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.

e5d958c4

Mar 05, 2021

[DebugInfo] Add DIArgList MD to store multiple values in DbgVariableIntrinsics · 65600cb2

gbtozers authored Sep 30, 2020

This patch adds a new metadata node, DIArgList, which contains a list of SSA
values. This node is in many ways similar in function to the existing
ValueAsMetadata node, with the difference being that it tracks a list instead of
a single value. Internally, it uses ValueAsMetadata to track the individual
values, but there is also a reasonable amount of DIArgList-specific
value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
case in parsing and printing due to the fact that it requires a function state
(as it may reference function-local values).

This patch should not result in any immediate functional change; it allows for
DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
recognize them as a valid argument (outside of parsing).

Differential Revision: https://reviews.llvm.org/D88175

65600cb2

Mar 04, 2021

Reland [GlobalISel] Start using vectors in GISelKnownBits · d7834556

Petar Avramovic authored Mar 04, 2021

This is a recommit of 4c8fb7dd.
MIR in one unit test had mismatched types.

For vectors, we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). The KnownBits BitWidth for a vector
type is the size of a vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in the pre-legalizer combiner.
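
The rule for vectors can be sketched as an intersection over the elements. The example below uses 8-bit constant elements and a hypothetical `KnownBits8` struct rather than LLVM's `KnownBits` class; it also shows the identity that a urem-by-power-of-two combine relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit stand-in for a known-bits record:
// a set bit in Zero means "known 0", a set bit in One means "known 1".
struct KnownBits8 {
  uint8_t Zero;
  uint8_t One;
};

// A bit is known for the vector only if every element agrees on it.
// The width of the result is the element width, not the whole vector width.
KnownBits8 knownBitsOfVector(const std::vector<uint8_t> &Elts) {
  assert(!Elts.empty());
  KnownBits8 Known{0xFF, 0xFF};
  for (uint8_t E : Elts) {
    Known.Zero &= static_cast<uint8_t>(~E);
    Known.One &= E;
  }
  return Known;
}

// For a power-of-two divisor, an unsigned remainder is just a mask.
uint8_t uremPow2AsMask(uint8_t X, uint8_t Pow2) {
  assert(Pow2 != 0 && (Pow2 & (Pow2 - 1)) == 0 && "divisor must be 2^k");
  return X & static_cast<uint8_t>(Pow2 - 1);
}
```

Presumably, once known bits can see through a build vector of constants, the combiner can recognize a vector divisor as a power of two and apply the mask form.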

Differential Revision: https://reviews.llvm.org/D96122

d7834556

Revert "[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream" · 6b8cf735

Nicolas Guillemot authored Mar 04, 2021

This reverts commit 7479a2e0.

This commit causes compile errors on clang-x64-windows-msvc, so I'm
reverting the patch for now.

For reference, the error in question is:

```
error C2280: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': attempting to reference a deleted
function

note: compiler has generated 'llvm::raw_ostream_iterator<char,char>::operator ='
here

note: 'llvm::raw_ostream_iterator<char,char>
&llvm::raw_ostream_iterator<char,char>::operator =(const
llvm::raw_ostream_iterator<char,char> &)': function was implicitly deleted
because 'llvm::raw_ostream_iterator<char,char>' has a data member
'llvm::raw_ostream_iterator<char,char>::OutStream' of reference type
```
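
The diagnostic follows from a general C++ rule: a class with a reference data member has an implicitly deleted copy assignment operator. The sketch below reproduces that rule with two hypothetical structs; holding a pointer instead is one common workaround and is shown only as an illustration, not as the fix that was eventually chosen.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <type_traits>

// A reference member cannot be reseated, so copy assignment is deleted.
struct RefIterator {
  std::ostream &OutStream;
};

// A pointer member can be reassigned, so copy assignment is available.
struct PtrIterator {
  std::ostream *OutStream;
};

static_assert(!std::is_copy_assignable<RefIterator>::value,
              "reference member deletes operator=");
static_assert(std::is_copy_assignable<PtrIterator>::value,
              "pointer member keeps operator=");

// Hypothetical helper that needs assignment, as some algorithm
// implementations do for the iterators passed to them.
void retarget(PtrIterator &It, const PtrIterator &Other) {
  assert(Other.OutStream && "expected a valid stream");
  It = Other;
}
```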

6b8cf735

[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream · 7479a2e0

Nicolas Guillemot authored Mar 04, 2021

Adds a class `raw_ostream_iterator` that behaves like
std::ostream_iterator, but can be used with raw_ostream.
This is useful for using raw_ostream with std algorithms.

For example, it can be used to output std containers as follows:

```
std::vector<int> V = { 1, 2, 3 };
std::copy(V.begin(), V.end(), raw_ostream_iterator<int>(outs(), ", "));
// Output: "1, 2, 3, "
```

The API tries to follow std::ostream_iterator as closely as is
practically possible.
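
For comparison, this is the standard-library behavior the class is modeled on, written against `std::ostringstream` so that it does not depend on LLVM headers. The helper name `joinWithTrailingDelimiter` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// std::ostream_iterator writes the delimiter after every element,
// including the last one.
std::string joinWithTrailingDelimiter(const std::vector<int> &V,
                                      const char *Delim) {
  assert(Delim && "delimiter must not be null");
  std::ostringstream OS;
  std::copy(V.begin(), V.end(), std::ostream_iterator<int>(OS, Delim));
  return OS.str();
}
```

Calling it with `{1, 2, 3}` and `", "` yields `"1, 2, 3, "`, matching the trailing delimiter in the example above.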

Reviewed By: dblaikie, mkitzan

Differential Revision: https://reviews.llvm.org/D78795

7479a2e0

[mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero · 9fc2be6f

Daniel Sanders authored Mar 02, 2021

:: (store 1 + 4, addrspace 1)
->
:: (store 1 into undef + 4, addrspace 1)

An offset without a base isn't terribly useful but it's convenient to update
the offset without checking the value. For example, when breaking apart
stores into smaller units.

Differential Revision: https://reviews.llvm.org/D97812

9fc2be6f

[CMake][AIX] Adjust plugin library extension used on AIX · e9f9ec83

Xiangling Liao authored Mar 04, 2021

As stated in the CMake manual, we are supposed to use MODULE rules to generate
plugin libraries:

"MODULE libraries are plugins that are not linked into other targets but may be
loaded dynamically at runtime using dlopen-like functionality"

Besides, LLVM's plugin infrastructure fits with the AIX treatment of .so
shared objects more than it fits with the AIX treatment of .a library archives
(which may contain shared objects).

Differential revision: https://reviews.llvm.org/D96282

e9f9ec83

[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC · 36a489d1

Sanjay Patel authored Mar 04, 2021

Similar to b3a33553, but this shows a TODO and a potential
miscompile is already present.

We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems opposite of the common
reading.

I also removed one getter method by rolling the null check into
the access. Further simplification may be possible.

The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.

The new test shows that there is an existing bug somewhere in
the callers. We assumed that the original code was fully 'fast'
and so we produced IR with 'fast' even though it was just 'reassoc'.

36a489d1

Revert "[GlobalISel] Start using vectors in GISelKnownBits" · 4b101536

Nico Weber authored Mar 04, 2021

This reverts commit 4c8fb7dd.
Breaks check-llvm everywhere, see https://reviews.llvm.org/D96122

4b101536

[GlobalISel] Start using vectors in GISelKnownBits · 4c8fb7dd

Petar Avramovic authored Mar 04, 2021

For vectors, we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). The KnownBits BitWidth for a vector
type is the size of a vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in the pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D96122

4c8fb7dd

[ARM] Remove new ARMSelectionDAGTest unittest. · 098aea95

David Green authored Mar 04, 2021

This removes the unit test from a968e7b8 as it reportedly causes
some link problems. It can be reinstated once the issues are understood
and sorted out.

098aea95

[ARM] Fix linking of the new unittest from a968e7b8 · 1bdb6366

Martin Storsjö authored Mar 04, 2021

1bdb6366

[ARM] KnownBits for CSINC/CSNEG/CSINV · a968e7b8

David Green authored Mar 04, 2021

This adds some simple known bits handling for the three CSINC/NEG/INV
instructions. From the operands' known bits we can compute the common
bits of the first operand and the incremented/negated/inverted second
operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the
tests. The others are rarer, so a unit test for them is added.
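
A small sketch of the "common bits" idea, restricted to fully known (constant) operands for brevity. The `Known32` struct and the three helper functions are hypothetical names; the real code works on partially known operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit known-bits record.
struct Known32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// The result is one of two candidates, so only bits on which both
// candidates agree are known.
Known32 commonBits(uint32_t A, uint32_t B) {
  Known32 K{~A & ~B, A & B};
  assert((K.Zero & K.One) == 0 && "a bit cannot be known 0 and 1");
  return K;
}

// Each instruction yields either Op1 or a transformed Op2.
Known32 knownBitsCSINC(uint32_t Op1, uint32_t Op2) {
  return commonBits(Op1, Op2 + 1);
}
Known32 knownBitsCSINV(uint32_t Op1, uint32_t Op2) {
  return commonBits(Op1, ~Op2);
}
Known32 knownBitsCSNEG(uint32_t Op1, uint32_t Op2) {
  return commonBits(Op1, 0u - Op2);
}
```

For CSINC ZR, ZR the two candidates are 0 and 1, so every bit except bit 0 is known to be zero.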

Differential Revision: https://reviews.llvm.org/D97788

a968e7b8

Mar 03, 2021

[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm · c3ce7bae

Piotr Sobczak authored Mar 02, 2021

 * Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency, as the "strict"
prefix will become an important distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow; specifically, an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257

c3ce7bae

Mar 02, 2021

GlobalISel: Merge and cleanup more AMDGPU call lowering code · fd82cbcf

Matt Arsenault authored Feb 09, 2021

This merges more AMDGPU ABI lowering code into the generic call
lowering. Start cleaning up by factoring away more of the pack/unpack
logic into the buildCopy{To|From}Parts functions. These could use more
improvement, and the SelectionDAG versions are significantly more
complex, and we'll eventually have to emulate all of those cases too.

This is mostly NFC, but does result in some minor instruction
reordering. It also removes some of the limitations with mismatched
sizes the old code had. However, similarly to the merge on the input,
this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we
actually want, but SelectionDAG is stuck using the weird emergent
ABI).

This also changes the load/store size for stack passed EVTs for
AArch64, which makes it consistent with the DAG behavior.

fd82cbcf

[AA] Cache (optionally) estimated PartialAlias offsets. · 6e967834

dfukalov authored Dec 21, 2020

When two loads clobber each other and one loaded object is fully contained
in the other, `BasicAAResult::aliasGEP` returns just `PartialAlias`, which
actually denotes the more common case of partial overlap; it says nothing
about the actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.
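
One way to picture the cached information is a table keyed by the queried pair that optionally carries an offset alongside the alias result. The sketch below is a toy model with invented names (`AliasCache`, `ValueId`, `recordPartialAlias`); it does not reflect the actual data structures in the alias analysis implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using ValueId = unsigned; // hypothetical stand-in for a pointer value

// Toy cache: for pairs known to partially alias, optionally remember the
// estimated relative offset between the two accesses.
class AliasCache {
  struct Entry {
    bool HasOffset;
    int64_t Offset;
  };
  std::map<std::pair<ValueId, ValueId>, Entry> Entries;

public:
  // Record PartialAlias with a known relative offset.
  void recordPartialAlias(ValueId A, ValueId B, int64_t Offset) {
    assert(A != B && "a value trivially aliases itself");
    Entries[std::make_pair(A, B)] = Entry{true, Offset};
  }

  // Record PartialAlias when no offset could be estimated.
  void recordPartialAlias(ValueId A, ValueId B) {
    assert(A != B && "a value trivially aliases itself");
    Entries[std::make_pair(A, B)] = Entry{false, 0};
  }

  // Returns true and sets Offset only if an offset was cached for (A, B).
  bool lookupOffset(ValueId A, ValueId B, int64_t &Offset) const {
    auto It = Entries.find(std::make_pair(A, B));
    if (It == Entries.end() || !It->second.HasOffset)
      return false;
    Offset = It->second.Offset;
    return true;
  }
};
```

A client in the style of GVN or DSE could then ask for the offset and, when one is present, reason about which bytes of the larger access cover the smaller one.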

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529

6e967834

AArch64: report fp16 arithmetic is present for apple-a11 CPU. · 888c5c24

Tim Northover authored Mar 02, 2021

AArch64.td got it right, but the target-parser dropped it, leading to missing
feature flags in Clang.

888c5c24

[Orc] Fix a file header (NFC) · b66b73be

Stefan Gränitz authored Feb 18, 2021

b66b73be

Feb 26, 2021

[debug-info] refactor emitDwarfUnitLength · d39bc36b

Chen Zheng authored Feb 25, 2021

Remove the `Hi` and `Lo` arguments from `emitDwarfUnitLength` to
simplify its callers.

Reviewed By: MaskRay, dblaikie, ikudrin

Differential Revision: https://reviews.llvm.org/D96409

d39bc36b

Feb 24, 2021

Revert "[Profile] Include a few asserts in coverage mapping test" · ae7528a3

Petr Hosek authored Feb 24, 2021

This reverts commit 80f329bc.

ae7528a3

[Profile] Include a few asserts in coverage mapping test · 80f329bc

Petr Hosek authored Feb 24, 2021

These should catch any accidental use of the compilation directory.

Differential Revision: https://reviews.llvm.org/D97402

80f329bc

Make sure some types are indeed trivially_copyable per llvm::is_trivially_copyable · ca0bb0e8

serge-sans-paille authored Feb 11, 2021

Test a few types used as llvm::SmallVector parameters. It is important to ensure
we have consistent behavior for these types to prevent ABI issues such as the one
we hit in https://bugs.llvm.org/show_bug.cgi?id=39427.
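
A check of this kind can be expressed as a compile-time assertion. The sketch below uses `std::is_trivially_copyable` and a made-up `Range` struct; the actual test uses `llvm::is_trivially_copyable` on LLVM's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical element type that a small-vector might store.
struct Range {
  unsigned Begin;
  unsigned End;
};

// If either assertion ever flips, the build fails instead of the layout or
// copy strategy silently changing.
static_assert(std::is_trivially_copyable<Range>::value,
              "Range must stay trivially copyable");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string owns memory and is not trivially copyable");

// Containers may pick a memcpy-based path for trivially copyable elements.
template <typename T> void relocate(T *Dst, const T *Src, std::size_t N) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  assert(Dst && Src);
  std::memcpy(Dst, Src, N * sizeof(T));
}
```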

Differential Revision: https://reviews.llvm.org/D96536

ca0bb0e8

[Coverage][Unittest] Fix stringref issue · ff6dc053

Jinsong Ji authored Feb 24, 2021

We pass a StringRef and change it in the reader.
But we reuse the same Filename vector without clearing it,
so on some systems we may clobber previous results.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D97353

ff6dc053

[Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class · be5d92e3

Chen Zheng authored Feb 23, 2021

We may need to customize the DWARF unit length in DWARF
section headers for some targets and some code generation paths.

For example, for XCOFF in the assembly path, the AIX assembler does not require
the debug section to contain its debug unit length in the header.

Move emitDwarfUnitLength to the MCStreamer class so that we can
customize it in different streamers.
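
The customization point can be pictured as a virtual method on the streamer that a target-specific subclass overrides. The classes below are hypothetical stand-ins written for this example, not the MCStreamer interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generic streamer.
class Streamer {
public:
  virtual ~Streamer() = default;

  // Default behavior: write the unit length into the section header.
  virtual void emitDwarfUnitLength(uint64_t Length) {
    assert(Length > 0 && "empty unit");
    Out += "length " + std::to_string(Length) + "\n";
  }

  // Shared header emission that goes through the customization point.
  void emitSectionHeader(uint64_t Length) {
    emitDwarfUnitLength(Length);
    Out += "version\n";
  }

  std::string Out;
};

// Hypothetical streamer for an assembler that supplies the length itself.
class NoLengthAsmStreamer : public Streamer {
public:
  void emitDwarfUnitLength(uint64_t) override {}
};
```

Callers keep emitting headers the same way; only the streamer decides whether the length field is written.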

Reviewed By: ikudrin

Differential Revision: https://reviews.llvm.org/D95932

be5d92e3

Feb 23, 2021

[WebAssembly] Fix incorrect grouping and sorting of exceptions · ea8c6375

Heejin Ahn authored Feb 17, 2021

This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.

---

`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationships between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor of the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.

---

`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationships in this way: if
EHPad A dominates EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.

This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
  try
  catch
    ...   <- %dest_bb is among here!
  end
delegate %dest_bb
```

So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure exceptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.

But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.

---

CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.

`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there are discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, a loop consists of
BBs that 1. are dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.

Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are dominated by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.

One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have a path back to the header, to be placed after
the loop, and it will likely create more branches and blocks.

So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)

When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.
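
The check described in the paragraph above can be sketched as follows. This is a simplified model under stated assumptions: blocks are plain integers, each unwind destination has a single source, and "contains" means set membership. `Region` and `deferIfUnwindDest` are names invented for this sketch, not the CFGSort implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using BlockId = int;

// Hypothetical region on the sort stack: its member blocks plus the blocks
// whose scheduling is postponed until the region is popped.
struct Region {
  std::set<BlockId> Blocks;
  std::vector<BlockId> Deferred;
};

// Returns true if BB was deferred; false means the caller may add BB to the
// Preferred list right away.
bool deferIfUnwindDest(BlockId BB,
                       const std::map<BlockId, BlockId> &UnwindDestToSrc,
                       std::vector<Region> &Entries) {
  auto It = UnwindDestToSrc.find(BB);
  if (It == UnwindDestToSrc.end())
    return false; // not an unwind destination
  BlockId Src = It->second;
  assert(Src != BB && "a block cannot unwind to itself");
  // Traverse the region stack backwards, innermost region first.
  for (auto R = Entries.rbegin(); R != Entries.rend(); ++R) {
    if (R->Blocks.count(Src)) {
      R->Deferred.push_back(BB);
      return true;
    }
  }
  return false; // the unwind source is not in any open region
}
```

When a region is popped, its `Deferred` blocks would then be moved to the `Preferred` list, which keeps the unwind destination outside the region that holds its source.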

---

This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of the existing test cases. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.

Fixes https://github.com/emscripten-core/emscripten/issues/13514.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97247

ea8c6375

[Support] Add reserve() method to the raw_ostream. · 875b3b2c

Alexey Lapshin authored Feb 08, 2021

If the resulting size of the output stream is already known,
then the space for stream data can be preallocated
in some cases. For example, raw_string_ostream can
preallocate the space for the target string, which
avoids reallocations while writing into the stream.
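
The effect can be illustrated with a tiny string-backed sink built on `std::string::reserve`. `StringSink` is a hypothetical class for this sketch, and treating the argument as extra space beyond the current size is an assumption, not a statement about the method's actual contract.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string-backed output sink.
class StringSink {
  std::string &Target;

public:
  explicit StringSink(std::string &S) : Target(S) {}

  // Preallocate room for ExtraSize more bytes so later writes do not
  // trigger reallocation of the target string.
  void reserve(std::size_t ExtraSize) {
    assert(ExtraSize <= Target.max_size() - Target.size());
    Target.reserve(Target.size() + ExtraSize);
  }

  StringSink &operator<<(const std::string &Chunk) {
    Target += Chunk;
    return *this;
  }
};
```

If the caller knows it will write 64 bytes, one `reserve(64)` up front replaces the repeated grow-and-copy steps that would otherwise happen as the string fills.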

Differential Revision: https://reviews.llvm.org/D91693

875b3b2c

[GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE · 8f956a5e

Cassie Jones authored Feb 22, 2021

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96673

8f956a5e

[GlobalISel] Implement narrowScalar for SADDO/SSUBO · e1532649

Cassie Jones authored Feb 22, 2021

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96672

e1532649

[GlobalISel] Implement narrowScalar for UADDO/USUBO · c63b33b7

Cassie Jones authored Feb 22, 2021

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96671

c63b33b7
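
As background for what narrowing these operations involves, the sketch below splits a 64-bit unsigned add-with-overflow into two 32-bit pieces: an add with carry-out for the low half and an add with carry-in for the high half. All names are hypothetical, and this shows the arithmetic only, not the legalizer code.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult32 {
  uint32_t Sum;
  bool Carry;
};

struct AddResult64 {
  uint64_t Sum;
  bool Carry;
};

// Add with carry-out (the UADDO shape).
AddResult32 uaddo32(uint32_t A, uint32_t B) {
  uint32_t S = A + B;
  return {S, S < A};
}

// Add with carry-in and carry-out (the UADDE shape).
AddResult32 uadde32(uint32_t A, uint32_t B, bool CarryIn) {
  uint32_t S1 = A + B;
  bool C1 = S1 < A;
  uint32_t S2 = S1 + (CarryIn ? 1u : 0u);
  bool C2 = S2 < S1;
  return {S2, C1 || C2};
}

// A 64-bit add-with-overflow built from two 32-bit operations.
AddResult64 narrowedUaddo64(uint64_t A, uint64_t B) {
  AddResult32 Lo = uaddo32(static_cast<uint32_t>(A), static_cast<uint32_t>(B));
  AddResult32 Hi = uadde32(static_cast<uint32_t>(A >> 32),
                           static_cast<uint32_t>(B >> 32), Lo.Carry);
  uint64_t Sum = (static_cast<uint64_t>(Hi.Sum) << 32) | Lo.Sum;
  assert(Sum == A + B && "must match the native wrapping add");
  return {Sum, Hi.Carry};
}
```

The carry out of the high half is the overflow flag of the wide operation, which is why the carry-in variants are needed alongside the plain overflow ones.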

[JITLink] Add a getFixupAddress convenience method to Block. · 430817d0

Lang Hames authored Feb 23, 2021

430817d0

[JITLink] Don't allow creation of sections with duplicate names. · adf2098b

Lang Hames authored Feb 22, 2021

adf2098b