Commits · ba22c403b2b316d59902ba55e8774a7a34d2d056 · Lorenzo Albano / LLVM bpEVL

Oct 24, 2020

[Inliner][NPM] Properly pass callee AAResults · ba22c403

Arthur Eubanks authored Oct 16, 2020

Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592

Oct 23, 2020

GC-parseable element atomic memcpy/memmove · 6ec2c5e4

Artur Pilipenko authored Oct 01, 2020

This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way the runtime can provide an
implementation that can take a safepoint during the copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861

[IR] add fn attr for no_stack_protector; prevent inlining on mismatch · b7926ce6

Nick Desaulniers authored Oct 23, 2020

It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

Code that manipulates the stack via inline assembly or that has to set
up its own stack canary (such as the Linux kernel) commonly needs to
avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's not nearly as ergonomic as a function
attribute, and it still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when one of the caller and callee has `nossp`
and the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
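
The inlining rule above can be sketched in a few lines of C++. This is a minimal illustration, not LLVM's inliner code: `SSPLevel`, `isSSPCompatibleForInlining` and `callerLevelAfterInlining` are hypothetical names, the enum stands in for the `nossp`/`ssp`/`sspstrong`/`sspreq` attributes, and the "take the stronger level" step only approximates what the message attributes to adjustCallerSSPLevel().

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the function attributes discussed above, ordered
// from weakest to strongest. None = no attribute (the source did not care);
// NoSSP = explicit no_stack_protector request.
enum class SSPLevel { None, NoSSP, SSP, SSPStrong, SSPReq };

bool hasStackProtector(SSPLevel L) {
  return L == SSPLevel::SSP || L == SSPLevel::SSPStrong ||
         L == SSPLevel::SSPReq;
}

// Inlining is blocked when one side is nossp and the other has ssp,
// sspstrong or sspreq; every other combination is allowed.
bool isSSPCompatibleForInlining(SSPLevel Caller, SSPLevel Callee) {
  if (Caller == SSPLevel::NoSSP && hasStackProtector(Callee))
    return false;
  if (Callee == SSPLevel::NoSSP && hasStackProtector(Caller))
    return false;
  return true;
}

// When inlining does happen, the caller takes the stronger of the two levels.
// Assumption: NoSSP ranks above None, so an explicit nossp caller stays nossp.
SSPLevel callerLevelAfterInlining(SSPLevel Caller, SSPLevel Callee) {
  assert(isSSPCompatibleForInlining(Caller, Callee));
  return std::max(Caller, Callee);
}
```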

[LSR] ignore profitable chain when reg num is not major cost. · 1e0b6c1d

Chen Zheng authored Oct 20, 2020

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665

[InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags. · 1cab3bf0

Simon Pilgrim authored Oct 23, 2020

matchBSwapOrBitReverse was hardcoded to match only bswaps; we're going to need to expose the ability to match bitreverse as well, so make the choice part of the function call.
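
For readers unfamiliar with the two idioms, here is a hypothetical C++ sketch of classifying a 32-bit bit permutation as bswap or bitreverse, with two booleans playing the role of the matching flags the commit exposes. It assumes the pattern has already been reduced to bit provenance (which source bit feeds each result bit); `classifyProvenance`, `probe` and the open-coded helpers are illustrative names, not InstCombine's API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class Match { None, BSwap, BitReverse };

// Prov[i] is the index of the source bit that lands in result bit i,
// or -1 if result bit i is not a copy of exactly one source bit.
using BitProvenance = std::array<int, 32>;

// Classify a 32-bit permutation, honoring which idioms the caller asked for.
Match classifyProvenance(const BitProvenance &Prov, bool MatchBSwaps,
                         bool MatchBitReversals) {
  bool IsBSwap = true;
  bool IsBitReverse = true;
  for (int I = 0; I < 32; ++I) {
    // bswap: byte I/8 comes from byte 3 - I/8; bit order inside a byte is kept.
    IsBSwap = IsBSwap && Prov[I] == (3 - I / 8) * 8 + I % 8;
    // bitreverse: result bit I comes from source bit 31 - I.
    IsBitReverse = IsBitReverse && Prov[I] == 31 - I;
  }
  if (MatchBSwaps && IsBSwap)
    return Match::BSwap;
  if (MatchBitReversals && IsBitReverse)
    return Match::BitReverse;
  return Match::None;
}

// Recover provenance from a function assumed to be a pure bit permutation,
// by feeding it one set bit at a time.
template <typename Fn> BitProvenance probe(Fn F) {
  BitProvenance Prov;
  Prov.fill(-1);
  for (int Src = 0; Src < 32; ++Src) {
    uint32_t Out = F(uint32_t(1) << Src);
    for (int Dst = 0; Dst < 32; ++Dst)
      if (Out == (uint32_t(1) << Dst))
        Prov[Dst] = Src;
  }
  return Prov;
}

// Open-coded idioms of the kind a bswap/bitreverse matcher tries to recognize.
uint32_t openCodedBSwap(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0xFF00u) | ((X << 8) & 0xFF0000u) | (X << 24);
}

uint32_t openCodedBitReverse(uint32_t X) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I)
    R |= ((X >> I) & 1u) << (31 - I);
  return R;
}
```

With only one flag set, the other idiom is deliberately reported as `Match::None`, which is the behavior a caller asking for bswaps only would expect.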

[InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI. · 19a13bf5

Simon Pilgrim authored Oct 23, 2020

This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.

[mem2reg] Remove dbg.values describing contents of dead allocas · fea067bd

OCHyams authored Oct 23, 2020

This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.

As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
silently dropped instead of being made `undef`, so we're just returning to
previous behaviour with these patches.

Testing:
`llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
  mem2reg-promote-alloca-1.ll
  mem2reg-promote-alloca-2.ll
  mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.

[SVE]Clarify TypeSize comparisons in llvm/lib/Transforms · 24156364

Caroline Concatto authored Oct 16, 2020

Use the isKnownXY comparators when one of the operands can be a
scalable vector, and getFixedSize() in all other cases.

This patch also fixes bugs in uses of getPrimitiveSizeInBits by calling
getFixedSize() near the places where TypeSize values are compared.

Differential Revision: https://reviews.llvm.org/D89703
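
A miniature model of why the isKnownXY comparators are needed: a scalable size is MinSize times vscale, with vscale >= 1 unknown at compile time, so some comparisons can only be answered conservatively. `SizeSketch` is a hypothetical stand-in for llvm::TypeSize written for this note; its comparison logic is a reading of the intended semantics, not a copy of the LLVM source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a possibly-scalable size: the real size is
// MinSize * vscale for scalable sizes (vscale >= 1, unknown at compile time)
// and exactly MinSize otherwise.
struct SizeSketch {
  uint64_t MinSize;
  bool Scalable;

  // Only meaningful for fixed sizes; asking a scalable size is a bug.
  uint64_t getFixedSize() const {
    assert(!Scalable && "scalable size has no fixed value");
    return MinSize;
  }

  // True only if LHS < RHS for every possible vscale.
  static bool isKnownLT(SizeSketch LHS, SizeSketch RHS) {
    if (!LHS.Scalable || RHS.Scalable)
      return LHS.MinSize < RHS.MinSize;
    return false; // scalable LHS vs fixed RHS: LHS may grow past RHS
  }

  // True only if LHS > RHS for every possible vscale.
  static bool isKnownGT(SizeSketch LHS, SizeSketch RHS) {
    if (LHS.Scalable || !RHS.Scalable)
      return LHS.MinSize > RHS.MinSize;
    return false; // fixed LHS vs scalable RHS: RHS may grow past LHS
  }
};
```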

[SCEV][NFC] Cache symbolic max exit count · 6e574abf

Max Kazantsev authored Oct 23, 2020

We want a caching version of the symbolic BE exit count
rather than recomputing it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
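
The caching idea amounts to memoization keyed by loop. A minimal sketch under stated assumptions: `ExitCountCache` is hypothetical, an `int` loop id replaces `const Loop *`, a `long` replaces the SCEV expression, and `forgetLoop` models the invalidation a real cache would also need when a loop is modified.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical memoizing wrapper around an expensive per-loop computation.
class ExitCountCache {
public:
  explicit ExitCountCache(std::function<long(int)> Compute)
      : Compute(std::move(Compute)) {}

  long getSymbolicMaxExitCount(int LoopId) {
    auto It = Cache.find(LoopId);
    if (It != Cache.end())
      return It->second; // cache hit: no recomputation
    long Result = Compute(LoopId);
    Cache.emplace(LoopId, Result);
    return Result;
  }

  // Drop the cached value, e.g. after the loop has been transformed.
  void forgetLoop(int LoopId) { Cache.erase(LoopId); }

private:
  std::function<long(int)> Compute;
  std::unordered_map<int, long> Cache;
};
```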

[Inliner] Run always-inliner in inliner-wrapper · 0291e2c9

Arthur Eubanks authored Sep 01, 2020

An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.

Previously for the following, the inliner would first inline @a() into @b(),

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending up with

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988

Oct 22, 2020

Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)" · 099bffe7

Vedant Kumar authored Oct 22, 2020

This reverts commit 26ee8aff.

It's necessary to bitcast the pointer operand of a lifetime
marker if it has an opaque pointer type.

rdar://70560161

Port -instnamer to NPM · 92d9a386

Arthur Eubanks authored Oct 21, 2020

Some clang tests use this.

Reviewed By: akhuang

Differential Revision: https://reviews.llvm.org/D89931

[InstCombine][NFC] Use ConstantExpr::getBinOpIdentity · d49911c2

Layton Kifer authored Oct 22, 2020

Delete the duplicate implementation getSelectFoldableConstant and
replace it with ConstantExpr::getBinOpIdentity.

Differential Revision: https://reviews.llvm.org/D89839
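
What the shared helper computes: an identity constant C with `X op C == X` for all X. The hypothetical `binOpIdentity` below models this for 32-bit integers; the `AllowRHSConstant` flag (identity valid only as the right-hand operand, as for `sub` and `shl`) is modelled on the parameter of the same name in ConstantExpr::getBinOpIdentity, but treat the exact signature as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical subset of binary opcodes.
enum class BinOp { Add, Sub, Mul, And, Or, Xor, Shl };

// Returns C such that `X op C == X` for every 32-bit X, if one exists.
// For non-commutative ops the identity only works as the right operand,
// so it is returned only when AllowRHSConstant is set.
std::optional<uint32_t> binOpIdentity(BinOp Op, bool AllowRHSConstant = false) {
  switch (Op) {
  case BinOp::Add:
  case BinOp::Or:
  case BinOp::Xor:
    return 0u;
  case BinOp::Mul:
    return 1u;
  case BinOp::And:
    return UINT32_MAX; // all ones
  case BinOp::Sub:
  case BinOp::Shl:
    if (AllowRHSConstant)
      return 0u; // X - 0 == X and X << 0 == X, but 0 - X != X
    return std::nullopt;
  }
  return std::nullopt;
}
```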

[MemCpyOpt] Move GEP during call slot optimization · 3e375431

Nikita Popov authored Oct 17, 2020

When performing a call slot optimization to a GEP destination, it
will currently usually fail, because the GEP is directly before the
memcpy and as such does not dominate the call. We should move it
above the call if that satisfies the domination requirement.

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.

Differential Revision: https://reviews.llvm.org/D89623

[NFC][PartialInliner]: Clean up code · e6521ce0

Ettore Tiotto authored Oct 22, 2020

Make member functions const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D89895

[InstCombine] Remove dbg.values describing contents of dead allocas · 3419252a

Vedant Kumar authored Oct 20, 2020

When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

```
  define void @foo(i32 %0) {
    %a = alloca i32              ; This alloca is erased.
    store i32 %0, i32* %a
    dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
    dbg.value(i32* %a, "arg0", DW_OP_deref)
    call void @trivially_inlinable_no_op(i32* %a)
    ret void
  }
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

```
  void use(int *);
  void t(int a, int b) {
    int *local = &a;     // dbg.value(i32* %a.addr, "local")
    local = &b;          // dbg.value(i32* undef, "local")
    use(&a);             //           (note: %b.addr is optimized out)
    local = &a;          // dbg.value(i32* %a.addr, "local")
  }
```

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
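
The deletion rule from this commit (and from the mem2reg port above) can be modelled as a filter over debug records. This is a simplified, hypothetical model, not the InstCombine or mem2reg implementation: `DbgRecord` and `eraseDbgUsersOfDeadAlloca` are invented for illustration, and the records mirror the `@foo` and "local" examples above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of debug intrinsics.
enum class DbgKind { Declare, Addr, Value };

struct DbgRecord {
  DbgKind Kind;
  std::string Operand;  // value the intrinsic refers to, e.g. "%a"
  std::string Variable; // source variable, e.g. "arg0"
  bool HasDeref;        // expression contains DW_OP_deref
};

// When an alloca is deleted, drop every record that describes the *contents*
// of that alloca: dbg.declare/dbg.addr on it, and dbg.value(alloca, DW_OP_deref).
// A dbg.value that uses the alloca's *address* as the variable's value (no
// deref) is kept; in the "local" example it becomes dbg.value(undef) instead.
void eraseDbgUsersOfDeadAlloca(std::vector<DbgRecord> &Records,
                               const std::string &DeadAlloca) {
  auto DescribesContents = [&](const DbgRecord &R) {
    if (R.Operand != DeadAlloca)
      return false;
    return R.Kind != DbgKind::Value || R.HasDeref;
  };
  Records.erase(std::remove_if(Records.begin(), Records.end(), DescribesContents),
                Records.end());
}
```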

[IRCE] consolidate profitability check · 75d0e0cd

Serguei Katkov authored Oct 20, 2020

Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.

Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773

Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation"" · 2f293411

Zequan Wu authored Oct 21, 2020

This reverts commit 716f7636.

Revert "SimplifyCFG: Clean up optforfuzzing implementation" · 716f7636

Zequan Wu authored Oct 21, 2020

See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006ee.

Oct 21, 2020

[BlockExtract][NewPM] Port -extract-blocks to NPM · 8d9466a3

Arthur Eubanks authored Oct 07, 2020

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015

[LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result · aa6c3053

Arthur Eubanks authored Oct 09, 2020

PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175

[RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy · e8cce5ad

Artur Pilipenko authored Oct 01, 2020

For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.

[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. · 7b4a8284
Simon Pilgrim authored Oct 21, 2020

[Passes] Move ADCE before DSE & LICM. · 88241ffb

Florian Hahn authored Oct 21, 2020

The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset, and the size there actually decreases by 0.1%, with
no significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly fewer instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322

Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A +... · 4de215ff

Martin Storsjö authored Oct 21, 2020

Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"

Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of
explicit comparisons. NFCI." to make the primarily intended revert
work.

This reverts commits ce135497 and
e372a5f8.

This commit caused assertion failures, e.g.:

```
$ cat repro.cpp
bool a(char b) {
  return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z';
}
$ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp
clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.
```

Remove unnecessary header include which violates layering · c17ae291

Geoffrey Martin-Noble authored Oct 20, 2020

This was introduced in https://reviews.llvm.org/D89774, but I don't
think it should be necessary.

Reviewed By: TaWeiTu, aeubanks

Differential Revision: https://reviews.llvm.org/D89843

Oct 20, 2020

DomTree: Extract (mostly) read-only logic into type-erased base classes · 848a68a0

Nicolai Hähnle authored Oct 20, 2020

Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explicitly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089

[NPM] port -unify-loop-exits to NPM · 529ecd19

Ta-Wei Tu authored Oct 20, 2020

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774

[NPM] Port -mergereturn to NPM · 59286b36

Ta-Wei Tu authored Oct 20, 2020

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781

[DSE] Do not scan users of memory terminators for further reads. · 2e580102

Florian Hahn authored Oct 20, 2020

isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
its users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

```
Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch    diff
test-suite...000/197.parser/197.parser.test      4.00    8.00  100.0%
test-suite...rolangs-C++/family/family.test      4.00    7.00   75.0%
test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00   27.1%
test-suite...CFP2000/177.mesa/177.mesa.test     30.00   38.00   26.7%
test-suite :: External/Nurbs/nurbs.test         44.00   49.00   11.4%
test-suite...lications/sqlite3/sqlite3.test    115.00  128.00   11.3%
test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00   11.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test    237.00  261.00   10.1%
test-suite...tions/lambda-0.1.3/lambda.test     40.00   44.00   10.0%
test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00    8.0%
test-suite...abench/jpeg/jpeg-6a/cjpeg.test     13.00   14.00    7.7%
test-suite...oxyApps-C++/miniFE/miniFE.test     43.00   46.00    7.0%
test-suite...lications/ClamAV/clamscan.test    230.00  246.00    7.0%
test-suite...006/450.soplex/450.soplex.test    284.00  299.00    5.3%
test-suite...nsumer-jpeg/consumer-jpeg.test     21.00   22.00    4.8%
```
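
The worklist change described at the top of this message can be sketched on a toy use graph. Everything here is hypothetical (`Access`, `Node`, `storeIsDead`); the real code walks MemorySSA users and uses isMemTerminator, and this sketch conservatively keeps scanning past ordinary writes.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: each node is a later access related to the stored location.
enum class Access { Read, Write, Terminator };

struct Node {
  Access Kind;
  std::vector<int> Users; // indices of follow-on accesses (MemorySSA-style users)
};

// Returns true if no Read is reachable from Start's users. With
// SkipTerminatorUsers, the users of a memory terminator (e.g. lifetime.end of
// the same object) are not scanned, since they cannot read the old contents.
bool storeIsDead(const std::vector<Node> &G, int Start, bool SkipTerminatorUsers) {
  std::vector<int> Worklist(G[Start].Users.begin(), G[Start].Users.end());
  std::set<int> Seen(Worklist.begin(), Worklist.end());
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (G[N].Kind == Access::Read)
      return false; // the stored value may be observed
    if (G[N].Kind == Access::Terminator && SkipTerminatorUsers)
      continue; // new behavior: stop at the terminator
    for (int U : G[N].Users)
      if (Seen.insert(U).second)
        Worklist.push_back(U);
  }
  return true;
}
```

On a chain store -> lifetime.end -> later read, the old behavior (scanning past the terminator) keeps the store, while the new behavior removes it, which matches the direction of the NumFastStores increase above.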

[InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. · ec228fbf
Simon Pilgrim authored Oct 20, 2020

[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. · ce135497
Simon Pilgrim authored Oct 20, 2020

[DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem. · 6439fde6

Florian Hahn authored Oct 20, 2020

This change should not currently have any impact, but it guards against
future inconsistencies between MemoryLocation and function attributes.

[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3))... · e372a5f8

Simon Pilgrim authored Oct 20, 2020

[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support

Reapplied rGa704d8238c86 with a check for integer/integer vector types to prevent matching with pointer types

Introduce CfgTraits abstraction · c0cdd22c

Nicolai Hähnle authored Oct 20, 2020

The CfgTraits abstraction simplifies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in some cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete; the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
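
A compressed, hypothetical sketch of the layering described above: a concrete CFG, a traits struct with wrapRef/unwrapRef, a virtual CfgInterface over the opaque CfgBlockRef, and a non-template algorithm written against that interface. Only the names CfgBlockRef, CfgInterface, wrapRef and unwrapRef come from the commit message; the member functions, `ToyBlock` and `countReachable` are invented for illustration and do not match the real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <vector>

// Opaque handle: wraps a void* whose meaning depends on the concrete CFG.
struct CfgBlockRef {
  void *Ptr = nullptr;
  bool operator<(const CfgBlockRef &O) const {
    return std::less<void *>()(Ptr, O.Ptr);
  }
};

// Type-erased interface: generic algorithms only call these virtual methods.
class CfgInterface {
public:
  virtual ~CfgInterface() = default;
  virtual std::vector<CfgBlockRef> successors(CfgBlockRef B) const = 0;
};

// A toy concrete CFG.
struct ToyBlock {
  std::vector<ToyBlock *> Succs;
};

// Traits for the toy CFG: convert between concrete and opaque references.
struct ToyCfgTraits {
  using BlockRef = ToyBlock *;
  static CfgBlockRef wrapRef(BlockRef B) { return CfgBlockRef{B}; }
  static BlockRef unwrapRef(CfgBlockRef R) { return static_cast<BlockRef>(R.Ptr); }
};

// Bridges the traits to the virtual interface.
class ToyCfgInterface : public CfgInterface {
public:
  std::vector<CfgBlockRef> successors(CfgBlockRef B) const override {
    std::vector<CfgBlockRef> Out;
    for (ToyBlock *S : ToyCfgTraits::unwrapRef(B)->Succs)
      Out.push_back(ToyCfgTraits::wrapRef(S));
    return Out;
  }
};

// A generic, non-template algorithm: count the blocks reachable from Entry.
std::size_t countReachable(const CfgInterface &Cfg, CfgBlockRef Entry) {
  std::set<CfgBlockRef> Seen{Entry};
  std::vector<CfgBlockRef> Worklist{Entry};
  while (!Worklist.empty()) {
    CfgBlockRef B = Worklist.back();
    Worklist.pop_back();
    for (CfgBlockRef S : Cfg.successors(B))
      if (Seen.insert(S).second)
        Worklist.push_back(S);
  }
  return Seen.size();
}
```

The trade-off mentioned in the message is visible here: `countReachable` is compiled once for every CFG type, but each successor query is a virtual call that returns a freshly built vector.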

[InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. · e346ea99
Simon Pilgrim authored Oct 20, 2020

[IR] Adds mustprogress as a LLVM IR attribute · 595c6156

Atmn Patel authored Oct 20, 2020

This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393

[IRCE] Do not transform if loop has small number of iterations · 38799975

Serguei Katkov authored Oct 16, 2020

IRCE has some overhead for runtime checks, and when the number of iterations
is small, that overhead can outweigh the benefit of the optimization.

This CL uses the BlockFrequencyInfo of the preheader and header to estimate the
number of loop iterations. If the estimate is less than irce-min-estimated-iters, we do not transform the loop.

A more complex cost model would probably be better, but for simplicity this seems to be enough.

BFI is used only with the new pass manager, and the patch tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
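
The estimate can be illustrated with a small hypothetical helper: since block frequencies are relative, header frequency divided by preheader frequency approximates the number of header executions per loop entry. `worthTransforming` and its treatment of a zero preheader frequency are assumptions, not the IRCE code; `MinEstimatedIters` plays the role of irce-min-estimated-iters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profitability check: transform only if the estimated number of
// iterations per loop entry (HeaderFreq / PreheaderFreq) is at least
// MinEstimatedIters.
bool worthTransforming(uint64_t HeaderFreq, uint64_t PreheaderFreq,
                       uint64_t MinEstimatedIters) {
  if (PreheaderFreq == 0)
    return true; // assumption: no usable estimate, so do not block the transform
  // Multiply instead of dividing so the estimate is not truncated
  // (overflow is ignored in this sketch).
  return HeaderFreq >= MinEstimatedIters * PreheaderFreq;
}
```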

[NFC] Inline assertion-only variable · 8a377f1e
Jordan Rupprecht authored Oct 19, 2020

Oct 19, 2020

[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer · e0567582

Roman Lebedev authored Oct 19, 2020

The main tricky thing here is forward-declaring the enum:
we have to specify its underlying data type.

In particular, this avoids the danger of switching over the SCEVTypes
but actually switching over an integer, and not being notified
when some case is not handled.

I have updated most of such switches to be exhaustive and not have
a default case where that is clearly the intent,
though not all of them.
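
The two points above, forward-declaring an enum with an explicit underlying type and switching over the enum rather than an integer, look like this in a standalone sketch. `ExprKind` and `kindName` are hypothetical; `unsigned short` is an arbitrary choice of underlying type for the example.

```cpp
#include <cassert>
#include <string>

// Opaque declaration: for an unscoped enum this is only allowed when the
// underlying type is spelled out, and the later definition must repeat it.
enum ExprKind : unsigned short;

// Code can already take the enum by value before its enumerators are known.
std::string kindName(ExprKind K);

enum ExprKind : unsigned short { kConstant, kAddExpr, kMulExpr };

// Switching over the enum (not an integer) with no default case lets
// -Wswitch warn if a new enumerator is added but not handled here.
std::string kindName(ExprKind K) {
  switch (K) {
  case kConstant:
    return "constant";
  case kAddExpr:
    return "add";
  case kMulExpr:
    return "mul";
  }
  return "unknown"; // only reachable for out-of-range values
}
```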
