- Sep 15, 2021
-
-
Filipp Zhinkin authored
Enabled the mul folding optimization that was previously disabled for being incorrect. To preserve correctness, the mul operand that is not compared with zero in the select's condition is now frozen.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
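The fold rewrites a select guarded by a zero check into a plain multiply; the freeze matters only for undef/poison operands, which a concrete-value sketch cannot show. A minimal Python illustration of the value-level equivalence (function names are hypothetical, not from the patch):

```python
def folded(x, y):
    # select (y == 0), 0, (x * y)  -->  x * y
    # Valid on concrete values because x * 0 == 0. In IR, the operand that
    # is not compared with zero (x here) must additionally be frozen so a
    # poison/undef x cannot leak through the branch that used to yield 0.
    return x * y

def unfolded(x, y):
    return 0 if y == 0 else x * y

# The two agree on all concrete inputs.
assert all(folded(x, y) == unfolded(x, y)
           for x in range(-8, 9) for y in range(-8, 9))
```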
-
Sanjay Patel authored
-
Simon Pilgrim authored
Based on the worst-case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (they were based on instruction counts instead of actual throughput).
-
Martin Storsjö authored
This codepath hadn't been exercised in a build with asserts before. Differential Revision: https://reviews.llvm.org/D109778
-
Martin Storsjö authored
This was requested in D38253, but missed back then. Differential Revision: https://reviews.llvm.org/D109046
-
Nico Weber authored
-
Nicolas Vasilache authored
Making the late transformations opt-in results in less surprising behavior when composing multiple calls to the codegen strategy. Differential Revision: https://reviews.llvm.org/D109820
-
Nicolas Vasilache authored
AliasInfo can now use union-find for a much more efficient implementation. This brings no functional changes but large performance gains on more complex examples. Differential Revision: https://reviews.llvm.org/D109819
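Union-find with path compression and union by size gives near-constant-time merging and querying of alias sets, which is where the performance gain comes from. A generic sketch of the data structure (illustrative only, not the actual AliasInfo code):

```python
class UnionFind:
    """Disjoint sets with path compression and union by size."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach smaller tree under larger
        self.size[ra] += self.size[rb]

    def same_set(self, a, b):
        return self.find(a) == self.find(b)

# Merging alias sets for three SSA values (names are just labels here).
uf = UnionFind()
uf.union("%a", "%b")
uf.union("%b", "%c")
assert uf.same_set("%a", "%c")
assert not uf.same_set("%a", "%d")
```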
-
David Green authored
In some situations under Thumb1, we could get stuck in an infinite loop recombining the same instruction. This puts a limit on that, no longer combining SUBC with SUBE repeatedly.
-
Florian Hahn authored
Add a set of test cases where redundant stores may be removable, depending on whether a local allocation gets captured before performing a load.
-
David Green authored
This extends the reduction logic in the vectorizer to handle intrinsic versions of min and max, both the floating point variants already created by instcombine under fastmath and the integer variants from D98152. As a bonus this allows us to match a chain of min or max operations into a single reduction, similar to how add/mul/etc work. Differential Revision: https://reviews.llvm.org/D109645
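Because min and max are associative and commutative, a chain like min(min(min(a, b), c), d) is equivalent to a single reduction over all the operands, which is what lets the vectorizer match such chains. A value-level sketch in Python (illustrative only):

```python
from functools import reduce

values = [7, -3, 12, 0, 5]

# Chain form, as a frontend or instcombine might emit it ...
chained = min(min(min(min(values[0], values[1]),
                      values[2]), values[3]), values[4])

# ... is equivalent to a single reduction over the whole list,
# analogous to how add/mul reduction chains are matched.
reduced = reduce(min, values)
assert chained == reduced == -3
```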
-
Simon Pilgrim authored
When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.
-
Simon Atanasyan authored
Identified in D109359.
-
Justas Janickas authored
Adds support for a feature macro `__opencl_c_images` in C++ for OpenCL 2021 enabling a respective optional core feature from OpenCL 3.0. This change aims to achieve compatibility between C++ for OpenCL 2021 and OpenCL 3.0. Differential Revision: https://reviews.llvm.org/D109002
-
Cullen Rhodes authored
Identified in D109359. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109755
-
David Green authored
-
Matthias Springer authored
E.g.:

```
%2 = memref.alloc() {alignment = 128 : i64} : memref<256x256xf32>
%3 = memref.alloc() {alignment = 128 : i64} : memref<256x256xf32>
// ... (%3 is not written to)
linalg.copy(%3, %2) : memref<256x256xf32>, memref<256x256xf32>
vector.transfer_write %11, %2[%c0, %c0] {in_bounds = [true, true]} : vector<256x256xf32>, memref<256x256xf32>
```

Avoid copies of %3 if %3 came directly from an InitTensorOp.

Differential Revision: https://reviews.llvm.org/D109742
-
Florian Hahn authored
This is a first step towards addressing the last remaining limitation of the VPlan version of sinkScalarOperands: the legacy version can partially sink operands. For example, if a GEP has uniform users outside the sink target block, then the legacy version will sink all scalar GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by detecting such cases and duplicating the sink candidate. All users outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate a replicate recipe, they will share the same underlying IR value and both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
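A hedged sketch of the duplication step described above (the names and data structures are illustrative, not the VPlan API): users outside the sink target are rewritten to a uniform clone, while the original candidate is sunk into the target block.

```python
def sink_with_duplication(candidate, sink_target, users):
    """Sink `candidate` into `sink_target`; users in other blocks get a clone.

    `candidate` is a dict with 'name' and 'block'; `users` maps a user id to
    the block it lives in. Returns (sunk candidate, clone or None, and a map
    from user to the value it now uses). Illustrative model only.
    """
    outside = [u for u, blk in users.items() if blk != sink_target]
    clone = None
    if outside:
        # Keep a uniform copy in the original block for outside users.
        clone = {"name": candidate["name"] + ".clone",
                 "block": candidate["block"]}
    rewritten = {u: (clone if blk != sink_target else candidate)
                 for u, blk in users.items()}
    candidate["block"] = sink_target  # sink the original in place
    return candidate, clone, rewritten

cand = {"name": "%gep", "block": "header"}
sunk, clone, uses = sink_with_duplication(
    cand, "sink.bb", {"u1": "sink.bb", "u2": "exit"})
assert sunk["block"] == "sink.bb"
assert clone is not None and uses["u2"] is clone and uses["u1"] is sunk
```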
-
Xiang1 Zhang authored
[X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm. Differential Revision: https://reviews.llvm.org/D109739
-
Alex Zinenko authored
Create a new document that explain both stages of the process in a single place, merge and deduplicate the content from the two previous documents. Also extend the documentation to account for the recent changes in pass structure due to standard dialect splitting and translation being more flexible. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D109605
-
Tobias Gysi authored
Update the doc due to recent path changes and point to a helper script.
-
Amara Emerson authored
(G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD X, Y), C)

Improves CTMark -Os on AArch64:

```
Program             before    after    diff
sqlite3             286932   287024    0.0%
kc                  432512   432508   -0.0%
SPASS               412788   412764   -0.0%
pairlocalalign      249460   249416   -0.0%
bullet              475740   475512   -0.0%
7zip-benchmark      568864   568356   -0.1%
consumer-typeset    419088   418648   -0.1%
tramp3d-v4          367628   367224   -0.1%
clamscan            383184   382732   -0.1%
lencod              430028   429284   -0.2%
Geomean difference                    -0.1%
```

Differential Revision: https://reviews.llvm.org/D109528
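The reassociation relies on addition being associative and commutative: (x + c) + y and (x + y) + c compute the same address, but hoisting the constant c outward lets it fold into addressing modes or be shared between accesses. A plain-integer sketch (ignoring the pointer/offset typing and wrap flags that GlobalISel must still respect):

```python
def before(x, c, y):
    return (x + c) + y   # G_PTR_ADD (G_PTR_ADD x, c), y

def after(x, c, y):
    return (x + y) + c   # G_PTR_ADD (G_PTR_ADD x, y), c

# Same address for any base x, constant offset c, and variable offset y.
assert all(before(x, c, y) == after(x, c, y)
           for x in range(0, 64, 7)
           for c in (-16, 4, 32)
           for y in range(-8, 9))
```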
-
Markus Lavin authored
Added '-print-pipeline-passes' printing of parameters for those passes declared with the *_WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside *_WITH_PARAMS, as in a few cases there appear to be additional parameters that are not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def):

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
-
serge-sans-paille authored
This check should ensure we don't reproduce the problem fixed by 02df443d. More accurately, it checks every llvm::Any::TypeId symbol in libLLVM-x.so and makes sure they have weak linkage and are not local to the library; a local copy would lead to duplicate definitions if another weak version of the symbol were defined in another linked library. Differential Revision: https://reviews.llvm.org/D109252
-
Esme-Yi authored
Summary: This patch implements parsing sections for obj2yaml on AIX. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D98003
-
Hongtao Yu authored
Invalid frame addresses exist in call stack samples due to bad unwinding. This can happen with frame-pointer-based unwinding when callee functions do not have the frame pointer chain set up. It isn't common when the program is built with frame pointer omission disabled, but can still happen with third-party static libs built with the frame pointer omitted. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109638
-
Mehdi Amini authored
This reverts commit 81f8ad17. This seems to break the shared libs build (linaro-flang-aarch64-sharedlibs bot) with:

undefined reference to `Fortran::semantics::IsCoarray(Fortran::semantics::Symbol const&)` (from tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/tools.cpp.o) when linking lib/libFortranEvaluate.so.14git
-
Mehdi Amini authored
This seems in line with the intent and how we build tools around it. Update the description for the flag accordingly. Also use an injected thread pool in MLIROptMain; now we will create threads up-front and reuse them across split buffers. Differential Revision: https://reviews.llvm.org/D109802
-
cwz920716 authored
Both copy/alloc ops are using memref dialect after this change. Reviewed By: silvas, mehdi_amini Differential Revision: https://reviews.llvm.org/D109480
-
LLVM GN Syncbot authored
-
Nico Weber authored
This reverts commit 49992c04. The test is still failing on Windows, see comments on https://reviews.llvm.org/D108893
-
Philip Reames authored
-
Matt Arsenault authored
The fmul is a canonicalizing operation and fneg is not, so this would break denormals that need flushing and also would not quiet signaling NaNs. Fold to fsub instead, which is also canonicalizing.
-
Hongtao Yu authored
Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D109531
-
Thomas Lively authored
Rather than depending on the hex dump from obj2yaml. Now the test shows the expected function body in a human readable format. Differential Revision: https://reviews.llvm.org/D109730
-
Matt Arsenault authored
This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly.

The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll, which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.
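A hedged sketch of the adjustment (the function and parameter names are made up; the real allocator queries the target for its reserved registers): the heuristic switch should compare the estimated live range length against the registers that can actually be allocated, not the raw class size.

```python
def use_short_range_heuristic(live_range_len, regs_in_class, reserved_regs):
    """Pick the heuristic by comparing range length to *available* registers.

    Using `regs_in_class` alone overestimates availability on targets like
    AMDGPU that dynamically reserve registers to satisfy occupancy
    constraints. Illustrative model only, not the actual allocator code.
    """
    available = regs_in_class - reserved_regs
    return live_range_len < available

# With 102 of 256 registers dynamically reserved, a live range of length
# 180 no longer looks comfortably allocatable.
assert use_short_range_heuristic(180, 256, 0)
assert not use_short_range_heuristic(180, 256, 102)
```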
-
Fangrui Song authored
The last user has been removed from llvm-zorg for Android.
-
Matthias Springer authored
Do not generate FillOps when these would be entirely overwritten. Differential Revision: https://reviews.llvm.org/D109741
-
Matt Arsenault authored
ConstantOffsetExtractor::Find was infinitely recursing on the add referencing itself.
-
Matt Arsenault authored
-