Commits · a80d5c34e4b99f21fa371160ac7eb7e9db093997 · Lorenzo Albano / LLVM bpEVL

Jan 31, 2022

[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() · 002b944d

Kerry McLaughlin authored Jan 31, 2022

Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca()
when we call this function for a scalable alloca instruction, caused
by the implicit conversion of TySize to uint64_t.
This patch changes TySize to a TypeSize as returned by getTypeAllocSize()
and ensures the allocation size is multiplied by vscale for scalable vectors.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D118372

002b944d
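
The failure mode in the visitAlloca() fix above can be illustrated without LLVM. The sketch below is a toy model only: `ToyTypeSize`, `getFixedValue` and `allocSizeInBytes` are hypothetical names chosen for illustration, not LLVM's `TypeSize` API. It models a size that is either fixed or a multiple of a runtime `vscale`, rejects a request for a plain integer when the size is scalable, and multiplies by `vscale` only in the scalable case.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a possibly scalable size; not LLVM's TypeSize.
struct ToyTypeSize {
  uint64_t MinValue; // known minimum size in bytes
  bool Scalable;     // true if the real size is MinValue * vscale

  // Models the problematic conversion: a scalable size has no single
  // compile-time integer value, so the request is rejected.
  uint64_t getFixedValue() const {
    if (Scalable)
      throw std::logic_error("Invalid size request on a scalable vector");
    return MinValue;
  }
};

// Computes the allocation size in bytes, multiplying by VScale only when
// the element size is scalable.
inline uint64_t allocSizeInBytes(ToyTypeSize TySize, uint64_t NumElts,
                                 uint64_t VScale) {
  uint64_t PerElt =
      TySize.Scalable ? TySize.MinValue * VScale : TySize.MinValue;
  return PerElt * NumElts;
}
```

In this model, `VScale` is an ordinary argument; in a real compiler it is only known at run time, so the multiplication has to be emitted as code rather than folded to a constant.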

[Analysis] Attribute noundef should not prevent tail call optimization · ae990a3c
Dávid Bolvanský authored Jan 31, 2022
```
Very similar to https://reviews.llvm.org/D101230
Fixes https://github.com/llvm/llvm-project/issues/53501
```
ae990a3c

[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits... · 7ec8fc29

Simon Pilgrim authored Jan 31, 2022

[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails

We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero; this extends that to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.

This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.

7ec8fc29

[DebugInfo][InstrRef] Don't fully propagate single assigned variables · c703d77a

Jeremy Morse authored Jan 31, 2022

If we only assign a variable value a single time, we can take a short-cut
when computing its location: the variable value is only valid up to the
dominance frontier of where the assignment happens. Past that point, there
are other predecessors from where the variable has no value, meaning the
variable has no location past that point.

This patch recognises this scenario, and avoids expensive SSA computation,
to improve compile-time performance.

Differential Revision: https://reviews.llvm.org/D117877

c703d77a

[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero · 2d1390ef
Simon Pilgrim authored Jan 31, 2022

2d1390ef
[X86] Limit mul(x,x) knownbits tests with not undef/poison check · 48f45f6b
Simon Pilgrim authored Jan 31, 2022
```
We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison
```
48f45f6b
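
The arithmetic fact behind the two mul(x,x) commits above is that bit[1] of a square is always zero: an even x gives a multiple of 4, and an odd x gives a value congruent to 1 modulo 8. A minimal sketch that checks this exhaustively for 16-bit values with wrapping multiplication (the function name is made up for illustration, and the check says nothing about undef/poison inputs, which is what the second commit guards against):

```cpp
#include <cassert>
#include <cstdint>

// Returns true if bit[1] of x*x is zero for every 16-bit x, where the
// product is truncated back to 16 bits (wrapping multiply).
inline bool squareBit1AlwaysZero() {
  for (uint32_t V = 0; V <= 0xFFFF; ++V) {
    uint16_t X = static_cast<uint16_t>(V);
    // Multiply in 32 bits to avoid signed overflow from integer promotion.
    uint16_t Sq = static_cast<uint16_t>(static_cast<uint32_t>(X) * X);
    if (Sq & 0x2)
      return false;
  }
  return true;
}
```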
[mlgo][regalloc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds... · 0e691aed
Fangrui Song authored Jan 30, 2022
```
[mlgo][regalloc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after a8a7bf92
```
0e691aed

Jan 30, 2022

[mlgo][regalloc] Fix register masking · a8a7bf92

Mircea Trofin authored Jan 30, 2022

If AllocationOrder has fewer than 32 elements, we were treating the extra
positions as if they were valid. This was detected by a subsequent
assert. The fix also tightens the asserts.

a8a7bf92

[Support][NFC] Fix generic `ChildrenGetterTy` of `IDFCalculatorBase` · e0b11c76

Markus Böck authored Jan 30, 2022

Both IDFCalculatorBase and its accompanying DominatorTreeBase only support pointer nodes. The template argument is the block type itself and any use of GraphTraits is therefore done via a pointer to the node type.
However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo have worked around this issue by providing specializations of GraphTraits for the node type directly, or have not been affected because they use specializations instead of the generic case. These workarounds are unnecessary, however, and the generic code should be fixed instead.

An example from within the tree is the use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates an IDFCalculatorBase<MachineBasicBlock, false> but, due to the bug above, then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead).

Similar dead code exists in clang which defines redundant GraphTraits to work around this bug.

This patch fixes both the original issue and removes the dead code that was used to work around the issue.

Differential Revision: https://reviews.llvm.org/D118386

e0b11c76

[CodeGen] Use default member initialization (NFC) · 2bea207d
Kazu Hirata authored Jan 30, 2022
```
Identified with modernize-use-default-member-init.
```
2bea207d

Jan 29, 2022

[MLGO] Regalloc: allow multiple occurrences of -regalloc-enable-advisor · bc5644ee
Mircea Trofin authored Jan 29, 2022
```
This allows scenarios where some central config sets it one way and a
user wants to override it.
```
bc5644ee

Jan 28, 2022

[lld] Add module name to LTO inline asm diagnostic · 33b38339

Fangrui Song authored Jan 28, 2022

Close #52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file
name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which
module has the issue.

With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO.

```
% clang -flto=thin -c asm.c && myld.lld asm.o -e f
ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid'
        invalid
        ^~~~~~~
```

For regular LTO, unfortunately the original module name is lost and we only get
ld-temp.o.

Reviewed By: #lld-macho, ychen, Jez Ng

Differential Revision: https://reviews.llvm.org/D118434

33b38339

[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors · 5d089d9a

Cullen Rhodes authored Jan 28, 2022

If we have a vector FP division with a splatted divisor, use
getVectorMinNumElements when scaling the num of uses by splat factor.

For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's
above the fdiv threshold (3) when scaling num uses by splat factor, but the
codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x
double> case (splat + vector fdiv).

If the combine could be converted into a scalar FP division by
scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated
on the isExtractVecEltCheap TLI function which is implemented for x86 but not
AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses
by splat if the division can be converted into scalar op.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118343

5d089d9a

[MVerifier] Don't check liveness of any debug instruction operands · 76fd78b4

Jeremy Morse authored Jan 28, 2022

Shiny new DBG_PHI instructions usually have physical registers as operands
-- however, the machine verifier checks to see whether they're live, and
occasionally this fails. There's a filter for DBG_VALUE instructions to not
get verified in this way: expand it to exempt all debug instructions from
liveness checking, which means DBG_PHIs get treated like DBG_VALUEs.

This also future-proofs against us adding new debug instructions.

Differential Revision: https://reviews.llvm.org/D117891

76fd78b4

[CodeGen] Emit COFF symbol type for function aliases · f7d2afba

Martin Storsjö authored Jan 27, 2022

On the level of the generated object files, both symbols (both
original and alias) are generally indistinguishable - both are
regular defined symbols. But previously, only the original
function had the COFF ComplexType set to IMAGE_SYM_DTYPE_FUNCTION,
while the symbol created via an alias had the type set to
IMAGE_SYM_DTYPE_NULL.

This matches what GCC does, which emits directives for setting the
COFF symbol type for this kind of alias symbol too.

This makes a difference when GNU ld.bfd exports symbols without
dllexport directives or a def file - it seems to decide between
function or data exports based on the COFF symbol type. This means
that functions created via aliases, like some C++ constructors,
are exported as data symbols (missing the thunk for calling without
dllimport).

This hasn't been an issue when doing the same with LLD, as LLD decides
between function or data export based on the flags of the section
that the symbol points at.

This should fix the root cause of
https://github.com/msys2/MINGW-packages/issues/10547.

Differential Revision: https://reviews.llvm.org/D118328

f7d2afba

[InstrProf] Add single byte coverage mode · 11d30742

Ellis Hoag authored Jan 27, 2022

Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track function coverage. This mode has significantly less size overhead in both code and data because:
  * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
  * We use a single byte per function rather than 8 bytes per block

The trade-off, of course, is that this mode only tells you whether a function has been covered. This is useful, for example, to detect dead code.

When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D116180

11d30742
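
The size difference described in the single byte coverage commit above can be sketched with a toy model. Everything below is an assumption for illustration (the struct names, and the choice of 1 to mean "covered"); it is not the LLVM profiling runtime or its data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of default instrumentation: one 8-byte counter per block.
struct CounterProfile {
  std::vector<uint64_t> Counters;
  explicit CounterProfile(size_t NumBlocks) : Counters(NumBlocks, 0) {}
  // An increment needs a load, an add and a store.
  void hit(size_t Block) {
    assert(Block < Counters.size());
    ++Counters[Block];
  }
  size_t dataBytes() const { return Counters.size() * sizeof(uint64_t); }
};

// Toy model of function entry coverage: one byte per function.
struct EntryCoverage {
  std::vector<uint8_t> Covered;
  explicit EntryCoverage(size_t NumFunctions) : Covered(NumFunctions, 0) {}
  // Marking a function as covered is a single store; repeating it is harmless.
  void enter(size_t Function) {
    assert(Function < Covered.size());
    Covered[Function] = 1;
  }
  bool isCovered(size_t Function) const { return Covered[Function] != 0; }
  size_t dataBytes() const { return Covered.size(); }
};
```

The model loses the execution count by design: `enter` called once or a thousand times leaves the same state, which is exactly the trade-off the commit message describes.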

Jan 27, 2022

[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars · fdd3e2c9

Simon Pilgrim authored Jan 27, 2022

We already perform some basic folds (add/sub with zero etc.) on scalar types; this patch adds basic support for constant splats as well in a few cases (we can add more with future test coverage).

In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests.

I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants.

Differential Revision: https://reviews.llvm.org/D118264

fdd3e2c9

[SelectionDAG][VP] Provide expansion for VP_MERGE · 84e85e02

Fraser Cormack authored Jan 24, 2022

This patch adds support for expanding VP_MERGE through a sequence of
vector operations producing a full-length mask setting up the elements
past EVL/pivot to be false, combining this with the original mask, and
culminating in a full-length vector select.

This expansion should work for any data type, though the only use for
RVV is for boolean vectors, which themselves rely on an expansion for
the VSELECT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118058

84e85e02
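
The VP_MERGE expansion described above (a full-length mask that is false past EVL/pivot, combined with the original mask, then a full-length select) can be modelled lane by lane. This is a scalar sketch on fixed-length arrays with a hypothetical function name; it is not the SelectionDAG code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of the expansion for a vector of N lanes.
template <typename T, size_t N>
std::array<T, N> vpMergeExpanded(const std::array<bool, N> &Mask,
                                 const std::array<T, N> &OnTrue,
                                 const std::array<T, N> &OnFalse,
                                 size_t EVL) {
  assert(EVL <= N && "EVL must not exceed the vector length");
  std::array<T, N> Result{};
  for (size_t I = 0; I < N; ++I) {
    bool BelowPivot = I < EVL;           // full-length mask, false past EVL
    bool Select = Mask[I] && BelowPivot; // combined with the original mask
    Result[I] = Select ? OnTrue[I] : OnFalse[I]; // full-length select
  }
  return Result;
}
```

Lanes at or past EVL always take the on-false operand, regardless of the original mask.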

Jan 26, 2022

Fix UB in DwarfExpression::emitLegacyZExt() · ee72b173

Adrian Prantl authored Jan 25, 2022

A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.

Relanding (I accidentally pushed an earlier version of the patch previously).

Differential Revision: https://reviews.llvm.org/D118183

ee72b173
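
For context on the undefined behaviour mentioned above: in C++, shifting a 64-bit value left by 64 or more is undefined, so a host-side mask computation of the form `(1 << Bits) - 1` is only safe for small `Bits`. The patch avoids the host-side shift by emitting the computation into the DWARF expression. The sketch below shows a different, generic guard, purely to illustrate the hazard; `lowBitsMask` is a hypothetical helper, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns a mask with the low `Bits` bits set. Well defined for any Bits,
// because the shift is only performed when Bits < 64.
inline uint64_t lowBitsMask(unsigned Bits) {
  if (Bits >= 64)
    return ~uint64_t{0};
  return (uint64_t{1} << Bits) - 1;
}
```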

Revert "Fix UB in DwarfExpression::emitLegacyZExt()" · f400a601
Adrian Prantl authored Jan 26, 2022
```
This reverts commit 216002c4
while investigating bot breakage.
```
f400a601

GlobalISel: Avoid crash on asm with lying result types · 2d670de8

Matt Arsenault authored Jan 19, 2022

The physical register in the asm has the wrong type for the declared
IR. It seems to work in the DAG by extracting the 4 elements that are
defined in the IR from the register, but that isn't handled here. This
doesn't seem to be a well tested path since other mismatched cases are
crashing the DAG asm handling.

2d670de8

Fix UB in DwarfExpression::emitLegacyZExt() · 216002c4

Adrian Prantl authored Jan 25, 2022

A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.

Differential Revision: https://reviews.llvm.org/D118183

216002c4

[DebugInfo] Add stringLocationExp field to DIStringType · 28bfa57a

Chih-Ping Chen authored Jan 18, 2022

DIStringType is used to encode the debug info of a character object
in Fortran. A Fortran deferred-length character object is typically
implemented as a pair of the following two pieces of info: An address
of the raw storage of the characters, and the length of the object.
The stringLocationExp field contains the DIExpression to get to the
raw storage.

This patch also enables the emission of DW_AT_data_location attribute
in a DW_TAG_string_type debug info entry based on stringLocationExp
in DIStringType.

A test is also added to ensure that the bitcode reader is backward
compatible with the old DIStringType format.

Differential Revision: https://reviews.llvm.org/D117586

28bfa57a

Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" · f15014ff

Benjamin Kramer authored Jan 26, 2022

This reverts commit ef820632.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat

f15014ff

[SDAG] fix bug in ComputeNumSignBits of target constant · 63daea8b

Sanjay Patel authored Jan 26, 2022

The loop below the changed line assumes that the element
width of the target constant is the same as the element
width of the loaded value, but that is not always true.

We could try harder to do some kind of min/max calc even
if the sizes don't match, but that can be another patch
if needed. This fixes #53401 (miscompile) and does not
change the motivating cases added when this analysis
was introduced:
ad298f86

63daea8b

Rename llvm::array_lengthof into llvm::size to match std::size from C++17 · ef820632

serge-sans-paille authored Jan 26, 2022

As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).

ef820632

[AMDGPU] Enable divergence-driven XNOR selection · 5157f984

alex-t authored Dec 24, 2021

Currently, the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence.
This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 (V_XOR_B32)
on those targets which have no V_XNOR.
The current change enables the patterns which explicitly select not (xor_one_use) to the appropriate form.
We assume that xor (not) has already been turned into not (xor) by the combiner.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116270

5157f984
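
The canonicalisation assumed by the XNOR commit above rests on a bitwise identity: not (xor a, b) equals xor (not a), b. A small sketch with made-up function names; it says nothing about instruction selection itself, only that the two forms compute the same value.

```cpp
#include <cassert>
#include <cstdint>

// XNOR written as not (xor a, b), the form the patterns match.
constexpr uint32_t xnorViaNotXor(uint32_t A, uint32_t B) { return ~(A ^ B); }

// XNOR written as xor (not a), b, the form the combiner is assumed to rewrite.
constexpr uint32_t xnorViaXorNot(uint32_t A, uint32_t B) { return (~A) ^ B; }

// Compile-time spot check of the identity.
static_assert(xnorViaNotXor(0xF0F0F0F0u, 0x12345678u) ==
                  xnorViaXorNot(0xF0F0F0F0u, 0x12345678u),
              "both forms must agree");
```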

[AMDGPU][GlobalISel] Combine unmerge of undef · 4723f3cf
Sebastian Neubauer authored Jan 25, 2022
```
Fold (unmerge undef) -> undef, undef, ...

Differential Revision: https://reviews.llvm.org/D118138
```
4723f3cf

[DAG] Create fptoui.sat from clamped fptoui · 57356d6b

David Green authored Jan 26, 2022

This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D114964

57356d6b
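
The equivalence behind the fptoui.sat fold above can be sketched in scalar code: a wide unsigned conversion followed by an unsigned min against the narrow type's maximum gives the same result as a saturating conversion to the narrow type, for every input where the wide conversion is defined. The function names and the choice of an 8-bit result are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// The matched pattern: convert to a wide unsigned type, then clamp with
// an unsigned min. Only defined for 0 <= F < 2^32.
inline uint8_t clampedFpToU8(float F) {
  assert(F >= 0.0f && F < 4294967296.0f && "wide conversion out of range");
  uint32_t Wide = static_cast<uint32_t>(F);
  return static_cast<uint8_t>(std::min<uint32_t>(Wide, 255u));
}

// Saturating conversion: NaN and negative inputs give 0, inputs that are
// too large give the maximum value, everything else truncates toward zero.
inline uint8_t fpToU8Sat(float F) {
  if (std::isnan(F) || F <= 0.0f)
    return 0;
  if (F >= 255.0f)
    return 255;
  return static_cast<uint8_t>(F);
}
```

The saturating form is additionally defined for negative and NaN inputs, which the clamped pattern is not.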

[regalloc] Fix assertion error when LiveInterval is empty · 85974582

wangpc authored Jan 26, 2022

When evicting interference, it causes an assertion error
since LiveIntervals::intervalIsInOneMBB assumes that input
is not empty.

This patch fixes the bug mentioned in D118020.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D118124

85974582

Jan 25, 2022

Revert accidentally pushed commit. It was not yet reviewed. · 3efa016d
Adrian Prantl authored Jan 25, 2022
```
"Fix UB in DwarfExpression::emitLegacyZExt()"

This reverts commit e37de5d3.
```
3efa016d

Fix UB in DwarfExpression::emitLegacyZExt() · e37de5d3

Adrian Prantl authored Jan 25, 2022

A shift-left > 63 triggers a UBSAN failure. This patch kicks the can
down the road (to the consumer) by emitting a more compact
representation of the shift computation in DWARF expressions.

Differential Revision: https://reviews.llvm.org/D118183

e37de5d3

[PowerPC][AIX] Override markFunctionEnd() · a2505bd0

Sean Fertile authored Jan 20, 2022

During fast-isel, calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to
__ehinfo.N symbol, but not emitting a definition to said symbol and the
resulting file fails to assemble.

Differential Revision: https://reviews.llvm.org/D117040

a2505bd0

[GlobalISel] Avoid pointer element type access during InlineAsm lowering · a3a2239a
Nikita Popov authored Jan 25, 2022
```
Same change as has been made for the SDAG lowering.
```
a3a2239a

[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks · 15e2be29

Simon Pilgrim authored Jan 25, 2022

Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants.

15e2be29

[DAGCombine] Fold SRA of a load into a narrower sign-extending load · 109cc5ad

Bjorn Pettersson authored Jan 12, 2022

An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.

Differential Revision: https://reviews.llvm.org/D116930

109cc5ad
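
The value-level identity behind the SRA fold above: arithmetically shifting a 32-bit value right by 16 gives the same result as sign-extending its upper 16 bits. The sketch works on values only; which bytes the narrower load would read depends on endianness and is not modelled. It assumes two's-complement conversions and an arithmetic right shift on signed values, which C++20 guarantees and mainstream compilers provide in earlier standards. Function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: treat the loaded 32 bits as signed and shift right by 16.
inline int32_t sraOfWide(uint32_t Loaded) {
  return static_cast<int32_t>(Loaded) >> 16;
}

// Narrow form: take only the upper 16 bits and sign-extend them.
inline int32_t sextOfNarrowHigh(uint32_t Loaded) {
  uint16_t High = static_cast<uint16_t>(Loaded >> 16);
  return static_cast<int32_t>(static_cast<int16_t>(High));
}
```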

[SelectionDAG][VP] Add widening support for VP_MERGE · 7cb452bf

Fraser Cormack authored Jan 24, 2022

This patch adds widening support for ISD::VP_MERGE, which widens
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118030

7cb452bf

[SelectionDAG][VP] Add splitting support for VP_MERGE · 5f5c5603

Fraser Cormack authored Jan 24, 2022

This patch adds splitting support for ISD::VP_MERGE, which splits
identically to VP_SELECT and similarly to other select-like nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D118032

5f5c5603

[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter · 2233befa

Victor Perez authored Jan 25, 2022

Split these nodes in a similar way as their masked versions.

Reviewed By: frasercrmck, craig.topper

Differential Revision: https://reviews.llvm.org/D117760

2233befa

[NFC] Remove uses of PointerType::getElementType() · aa97bc11

Nikita Popov authored Jan 21, 2022

Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.

aa97bc11