Commits · fe0197e194a64f950602fb50736b6648a9e5b2a9 · Lorenzo Albano / LLVM bpEVL

Oct 07, 2020

[InstCombine] Add checks for and(logicalshift(zext(x),undef),y) cases · fe0197e1
Simon Pilgrim authored Oct 07, 2020
```
Prep work before some cleanup in narrowMaskedBinOp
```
fe0197e1
Add REQUIRES: x86-registered-target to test as it was failing on build bots without x86. · ea274be7
Douglas Yung authored Oct 07, 2020
```
This should fix the failure on http://lab.llvm.org:8011/#/builders/91/builds/30
```
ea274be7

[test][MC] Use %python in llvm/test/MC/COFF/bigobj.py · dd2f79ed

Edd Dawson authored Oct 07, 2020

... instead of the one on the $PATH.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88986

dd2f79ed

[LAA] Use DL to get element size for bound computation. · a73166a4

Florian Hahn authored Oct 07, 2020

Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.

This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.

By using DataLayout to get the size of the element we can also correctly
handle pointer element types.

Note the changes to the existing test, which seems to also use the wrong
offset for the end.

Fixes PR47751.
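
The difference described above can be illustrated with a toy model in Python. This is not LLVM's API: the tuple type encoding, the 64-bit pointer width, and the `end_bound` formula are assumptions made only for illustration. `scalar_size_in_bits` mimics the reported behavior (0 for pointers), while `store_size_in_bytes` consults an assumed data layout.

```python
# Toy model, not LLVM code. Types are ("int", bits) or ("ptr",).
POINTER_SIZE_IN_BITS = 64  # assumed data layout: 64-bit pointers


def scalar_size_in_bits(ty):
    """Mimics the behavior described above: pointer types report 0 bits."""
    return ty[1] if ty[0] == "int" else 0


def store_size_in_bytes(ty, pointer_bits=POINTER_SIZE_IN_BITS):
    """Asks the (assumed) data layout, so pointers get a real size."""
    bits = ty[1] if ty[0] == "int" else pointer_bits
    return (bits + 7) // 8


def end_bound(start, last_offset, elt_size):
    """One past the last byte touched by the final access."""
    return start + last_offset + elt_size
```

With a pointer element type, the scalar-size route adds 0 bytes to the end bound, whereas the layout-based route adds the full pointer size.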

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D88953

a73166a4

[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. · 322d0afd

Amara Emerson authored Oct 02, 2020

This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787

322d0afd

[WebAssembly] Rename Emscripten EH functions · 3bba91f6

Heejin Ahn authored Sep 28, 2020

Renaming for some Emscripten EH functions has so far been done in
wasm-emscripten-finalize tool in Binaryen. But recently we decided to
make a compilation/linking path that does not rely on
wasm-emscripten-finalize for modifications, so here we move that
functionality to LLVM.

Invoke wrappers are generated in the LowerEmscriptenEHSjLj pass, but because
final wasm types are not available in that IR pass, we need to rename them
at the end of the pipeline.

This patch also removes uses of `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass, replacing that with `emscripten_longjmp`.
`emscripten_longjmp_jmpbuf` is lowered to `emscripten_longjmp`, but
previously we generated calls to `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass because it takes `jmp_buf*` instead of `i32`.
But we were able to use `ptrtoint` to make it use `emscripten_longjmp`
directly here.

Addresses:
https://github.com/WebAssembly/binaryen/issues/3043
https://github.com/WebAssembly/binaryen/issues/3081

Companions:
https://github.com/WebAssembly/binaryen/pull/3191
https://github.com/emscripten-core/emscripten/pull/12399

Reviewed By: dschuff, tlively, sbc100

Differential Revision: https://reviews.llvm.org/D88697

3bba91f6

[MemCpyOpt] Add additional callslot test cases (NFC) · 7a01fc5a
Nikita Popov authored Oct 06, 2020
```
For cases where the destination is captured.
```
7a01fc5a
[NFC][InstCombine] Autogenerate a few tests being affected by upcoming patch · bef27e50
Roman Lebedev authored Oct 07, 2020

bef27e50
[Tests] Precommit test showing gap around load forwarding of vectors in instcombine · 14d5ee63
Philip Reames authored Oct 07, 2020

14d5ee63

BPF: add AdjustOpt IR pass to generate verifier friendly codes · ddf1864a

Yonghong Song authored Aug 06, 2020

Add an IR phase right before main module optimization.
This is to modify IR to restrict certain downward optimizations
in order to generate verifier friendly code.
  > prevent certain instcombine optimizations, handling both
    in-block/cross-block instcombines.
  > avoid speculative code motion if the variable used in
    condition is also used in the later blocks.

Internally, a bpf IR builtin
  result = __builtin_bpf_passthrough(seq_num, result)
is used to enforce ordering. This builtin is only used
during target independent IR optimizations and it will
be removed at the beginning of target dependent IR
optimizations.

For example, removing the following workaround,
  --- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
  +++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
  @@ -47,7 +47,7 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
          /* a workaround to prevent compiler from generating
           * codes verifier cannot handle yet.
           */
  -       volatile int ret;
  +       int ret;
this patch is able to generate code which passes the verifier.

To disable optimization, users need to use "opt" command like below:
  clang -target bpf -O2 -S -emit-llvm -Xclang -disable-llvm-passes test.c
  // disable icmp serialization
  opt -O2 -bpf-disable-serialize-icmp test.ll | llvm-dis > t.ll
  // disable avoid-speculation
  opt -O2 -bpf-disable-avoid-speculation test.ll | llvm-dis > t.ll
  llc t.ll

Differential Revision: https://reviews.llvm.org/D85570

ddf1864a

[AMDGPU] Support disassembly for AMDGPU kernel descriptors · 528057c1

Ronak Chauhan authored Oct 07, 2020

Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713

528057c1

[SVE] Lower fixed length VECREDUCE_OR operation · 333b2ab6
Cameron McInally authored Oct 07, 2020
```
Differential Revision: https://reviews.llvm.org/D88847
```
333b2ab6
[AMDGPU] Use @LINE for error checking in gfx10.3 assembler tests · fc819b69
Jay Foad authored Oct 07, 2020

fc819b69

[llvm-readelf] - Implement --addrsig option. · 55a60af2

Georgii Rymar authored Oct 05, 2020

We have `--addrsig` implemented for `llvm-readobj`.
Usually it is convenient to use a single tool for dumping,
so it seems we might want to implement `--addrsig` for `llvm-readelf` too.

I've selected a simple output format which is a bit similar to one,
used for dumping of the symbol table. It looks like:

```
Address-significant symbols section '.llvm_addrsig' contains 2 entries:
   Num: Name
     1: foo
     2: bar
```
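
As a rough illustration of that layout, the Python sketch below reproduces the sample output. It is not the tool's implementation: the function name `format_addrsig` is hypothetical, and the six-character right-aligned number column is inferred from the two-entry sample only.

```python
def format_addrsig(section_name, symbols):
    """Render a symbol list in the layout shown in the sample above.

    Assumption: the index column is right-aligned to six characters,
    which matches "   Num" and "     1" in the sample.
    """
    lines = [
        f"Address-significant symbols section '{section_name}' "
        f"contains {len(symbols)} entries:",
        "   Num: Name",
    ]
    lines += [f"{i:>6}: {name}" for i, name in enumerate(symbols, start=1)]
    return "\n".join(lines)
```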

Differential revision: https://reviews.llvm.org/D88835

55a60af2

[AMDGPU][MC] Improved diagnostics for instructions with missing features · 4a7e7620
Dmitry Preobrazhensky authored Oct 07, 2020
```
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D88887
```
4a7e7620

InstCombine: Negator: don't rely on complexity sorting already being performed (PR47752) · fed0f890

Roman Lebedev authored Oct 07, 2020

In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.

So if we only need to negate a single operand,
we should ensure ourselves that we try constant operand first.
Do that by re-doing the complexity sorting ourselves,
when we actually care about it.
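
The idea can be sketched with a toy model in Python. This is not InstCombine code: the operand encoding, the two-level `complexity` ranking, and the use of a multiply as the example operation are assumptions for illustration.

```python
# Toy model, not InstCombine code. Operands are ("const", v) or ("var", name).
def complexity(operand):
    """Hypothetical ranking: constants are the least complex operands."""
    return 0 if operand[0] == "const" else 1


def sorted_operands(lhs, rhs, commutative=True):
    """Put the more complex operand first so a constant ends up last,
    regardless of whether canonicalization already ran."""
    if commutative and complexity(lhs) < complexity(rhs):
        lhs, rhs = rhs, lhs
    return lhs, rhs


def negate_one_operand(lhs, rhs):
    """For a multiply, negating either operand negates the result.
    Try the constant (now guaranteed last) because it is free to negate."""
    lhs, rhs = sorted_operands(lhs, rhs)
    if rhs[0] == "const":
        return lhs, ("const", -rhs[1])
    return None  # a fuller version would try the other operand next
```

Because the sort is redone locally, the result is the same whether the constant arrives as the left or the right operand.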

Fixes https://bugs.llvm.org/show_bug.cgi?id=47752

fed0f890

[AMDGPU] Implement hardware bug workaround for image instructions · f71f5f39

Rodrigo Dominguez authored Apr 03, 2020

Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81172

f71f5f39

[InstCombine] Tweak funnel by constant tests for better shl/lshr commutation coverage · dce03e30
Simon Pilgrim authored Oct 07, 2020

dce03e30
[ARM] Regenerate vldlane tests · 6625892d
Simon Pilgrim authored Oct 07, 2020
```
To help make the diffs in D88569 clearer
```
6625892d
[LAA] Add test for PR47751, which currently uses wrong bounds. · 20cfd5fa
Florian Hahn authored Oct 07, 2020

20cfd5fa

[SDag] SimplifyDemandedBits: simplify to FP constant if all bits known · 1aa8e6a5

Jay Foad authored Sep 30, 2020

We were already doing this for integer constants. This patch implements
the same thing for floating point constants.
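
A minimal Python sketch of the underlying idea, assuming known bits are tracked as two masks (`known_zero`, `known_one`) over a 32-bit value. The function names are hypothetical and this is not the SelectionDAG implementation.

```python
import struct


def constant_if_all_bits_known(known_zero, known_one, width=32):
    """Return the constant bit pattern if every bit is known, else None."""
    mask = (1 << width) - 1
    if known_zero & known_one:
        raise ValueError("a bit cannot be known zero and known one")
    if (known_zero | known_one) & mask != mask:
        return None  # at least one bit is still unknown
    return known_one & mask


def as_float32(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For instance, if the known-one mask is `0x3F800000` and every other bit is known zero, the value can be replaced by the float constant with that bit pattern.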

Differential Revision: https://reviews.llvm.org/D88570

1aa8e6a5

[Test] Add one more test where we can avoid creating trunc · 85a6f8fc
Max Kazantsev authored Oct 07, 2020

85a6f8fc

[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try... · 7fa503ef

Roman Lebedev authored Oct 07, 2020

[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try getTypePartition() before falling back to largest integral use type (PR47592)

And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on
allocated type), but it works fine as a temporary measure.

I've measured, and @ `-O3` as of vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% less inttoptr
and -1.05% less ptrtoint (at the end of middle-end opt pipeline)

See https://bugs.llvm.org/show_bug.cgi?id=47592

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88788

7fa503ef

BPF: avoid duplicated globals for CORE relocations · edd71db3

Yonghong Song authored Oct 06, 2020

This patch fixes two issues related to relocation globals.
In LLVM, if a global, e.g. with name "g", is created and
conflicts with another global of the same name, LLVM will
rename the new global, e.g. to "g.2". Since a relocation
global's name has special meaning, we do not want LLVM to
change it, so internally we have logic to check whether
duplication happens. If it does, we just reuse the previous
global.

The first bug is related to non-btf-id relocation
(BPFAbstractMemberAccess.cpp). Commit 54d9f743
("BPF: move AbstractMemberAccess and PreserveDIType passes
to EP_EarlyAsPossible") changed ModulePass to FunctionPass,
i.e., handling one function at a time. But still just
one BPFAbstractMemberAccess object was created, so module
level de-duplication was still possible. Commit 40251fee
("[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer
pipeline") made a change to create a BPFAbstractMemberAccess
object per function, so module level de-duplication is not
possible any more without going through all module globals.
This patch simply makes the map which holds reloc globals
a class static, so it is available to all
BPFAbstractMemberAccess objects for different functions.

The second bug is related to btf-id relocation
(BPFPreserveDIType.cpp). Before commit 54d9f743, the pass
was a ModulePass, so a local variable, incremented for
each instance, worked fine. But after commit 54d9f743,
the pass became a FunctionPass. A local variable won't work
properly since different functions will start with the same
initial value. Fix the issue by making the local count variable
static, so it will be truly unique across the whole module
compilation.
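
Both fixes rely on state shared across pass objects. The Python sketch below models that with class-level attributes, which play the role of C++ static members here. The class and method names are hypothetical, and this is not the BPF backend code.

```python
class PerFunctionPass:
    """Toy stand-in for a pass object that is created once per function."""

    # Class-level state is shared by every instance, like a C++ static member.
    reloc_globals = {}
    next_id = 0

    def get_or_create_global(self, name):
        """Reuse an existing global of this name instead of creating a duplicate."""
        if name not in PerFunctionPass.reloc_globals:
            PerFunctionPass.reloc_globals[name] = object()
        return PerFunctionPass.reloc_globals[name]

    def fresh_id(self):
        """Hand out ids that stay unique across all pass instances."""
        PerFunctionPass.next_id += 1
        return PerFunctionPass.next_id
```

Had `reloc_globals` and `next_id` been set per instance in `__init__`, each new pass object would start from an empty map and the same initial counter, which is the failure mode the message describes.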

Differential Revision: https://reviews.llvm.org/D88942

edd71db3

[Test] Add test showing that we can avoid inserting trunc/zext · 0c009e09
Max Kazantsev authored Oct 07, 2020

0c009e09

[PowerPC] implement target hook getTgtMemIntrinsic · f0560870

Chen Zheng authored Sep 27, 2020

This patch lets passes recognize PowerPC-related memory intrinsics.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D88373

f0560870

[CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR · d2c61d2b

Bill Wendling authored Oct 06, 2020

Tail duplication of a block with an INLINEASM_BR may result in a PHI
node on the indirect branch. This is okay, but it also introduces a copy
for that PHI node *after* the INLINEASM_BR, which is not okay.

See: https://github.com/ClangBuiltLinux/linux/issues/1125

Differential Revision: https://reviews.llvm.org/D88823

d2c61d2b

[Attributor] Use smarter way to determine alignment of GEPs · 7993d611
Johannes Doerfert authored Sep 12, 2020
```
Use same logic existing in other places to deal with base case GEPs.

Add the original Attributor talk example.
```
7993d611

[Attributor] Ignore read accesses to constant memory · c4cfe7a4

Johannes Doerfert authored Sep 09, 2020

The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
The first step is constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.

c4cfe7a4

[Attributor] Give up early on AANoReturn::initialize · 3f540c05

Johannes Doerfert authored Sep 07, 2020

If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".

This has two kinds of consequences:
  - We have fewer iterations in many tests.
  - We have fewer deductions based on "known information" (since we ask
    earlier, point 1, and therefore assumed information is not "known"
    yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.

3f540c05

Oct 06, 2020

[AMDGPU] Fix remaining kernel descriptor test · bf5c1d92
Scott Linder authored Oct 06, 2020
```
Follow up on e4a9e4ef to fix a test I missed in the original patch.
Committed as obvious.
```
bf5c1d92

[AMDGPU] Emit correct kernel descriptor on big-endian hosts · e4a9e4ef

Scott Linder authored Oct 05, 2020

Previously we wrote multi-byte values out as-is from host memory. Use
the `emitIntN` helpers in `MCStreamer` to produce a valid descriptor
irrespective of the host endianness.
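
The contrast can be sketched in Python. This is an illustration, not the `MCStreamer` API: `emit_int` and `emit_host_order` are hypothetical names, and a little-endian target encoding is assumed.

```python
import struct
import sys


def emit_int(value, size):
    """Serialize an unsigned integer as little-endian bytes on any host."""
    return value.to_bytes(size, byteorder="little")


def emit_host_order(value):
    """Roughly what writing a 32-bit value as-is from host memory produces:
    the bytes follow the host's native order."""
    return struct.pack("=I", value)


def host_is_little_endian():
    return sys.byteorder == "little"
```

On a little-endian host the two functions agree, which is why the problem only shows up on big-endian hosts.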

Reviewed By: arsenm, rochauha

Differential Revision: https://reviews.llvm.org/D88858

e4a9e4ef

[MemCpyOpt] Use dereferenceable pointer helper · 616f5450

Nikita Popov authored Oct 04, 2020

The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805

616f5450

[MemCpyOpt] Check for throwing calls during call slot optimization · 6b441ca5

Nikita Popov authored Oct 04, 2020

When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.
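
A minimal Python sketch of the check, assuming instructions are modeled as dictionaries with an optional `may_throw` flag. The names are hypothetical, and, like the patch, the sketch ignores accesses from other threads.

```python
def may_throw(instr):
    """Toy predicate: an instruction is a dict with an optional flag."""
    return instr.get("may_throw", False)


def call_slot_is_safe(instructions, call_index, copy_index, dest_is_local):
    """Allow the optimization only if no instruction strictly between the
    call and the copy may throw, unless the destination is local."""
    if dest_is_local:
        return True  # an early write to a local is not visible to the caller
    between = instructions[call_index + 1:copy_index]
    return not any(may_throw(instr) for instr in between)
```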

Differential Revision: https://reviews.llvm.org/D88799

6b441ca5

[X86] .code16: temporarily set Mode32Bit when matching an instruction with the data32 prefix · 43c7dc52

Fangrui Song authored Oct 06, 2020

PR47632

This allows MC to match `data32 ...` as one instruction instead of two (data32 without insn + insn).

The compatibility with GNU as improves: `data32 ljmp` will be matched as ljmpl.
`data32 lgdt 4(%eax)` will be matched as `lgdtl` (prefixes: 0x67 0x66, instead
of 0x66 0x67).

GNU as supports many other `data32 *w` as `*l`. We currently just hard code
`data32 callw` and `data32 ljmpw`. Generalizing the suffix replacement is
tricky and requires thinking about the "bwlq" suffix-appending rules in MatchAndEmitATTInstruction.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88772

43c7dc52

[SimplifyLibCalls] Optimize mempcpy_chk to mempcpy · 86429c4e
Dávid Bolvanský authored Oct 05, 2020

86429c4e

[BPF][NewPM] Make BPFTargetMachine properly adjust NPM optimizer pipeline · 40251fee

Arthur Eubanks authored Oct 05, 2020

This involves porting BPFAbstractMemberAccess and BPFPreserveDIType to
NPM, then adding them to BPFTargetMachine::registerPassBuilderCallbacks
(the NPM equivalent of adjustPassManager()).

Reviewed By: yonghong-song, asbirlea

Differential Revision: https://reviews.llvm.org/D88855

40251fee

[test][InstCombine][NewPM] Fix InstCombine tests under NPM · 8df17b4d

Arthur Eubanks authored Sep 23, 2020

Some of these depended on analyses being present that aren't provided
automatically in NPM.

early_dce_clobbers_callgraph.ll was previously inlining a noinline function?

cast-call-combine.ll relied on the legacy always-inline pass being a
CGSCC pass and getting rerun.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88187

8df17b4d

[test][NewPM] Make dead-uses.ll work under NPM · 61d4b342

Arthur Eubanks authored Sep 22, 2020

This one is weird...

globals-aa needs to be already computed at licm, or else a function pass
can't run a module analysis and won't have access to globals-aa.
But the globals-aa result is impacted by instcombine in a way that
affects what the test is expecting. If globals-aa is computed before
instcombine, it is cached and globals-aa used in licm won't contain the
necessary info provided by instcombine.
Another catch is that if we don't invalidate AAManager, it will use the
cached AAManager that instcombine requested, which may not contain
globals-aa. So we have to invalidate<aa> so that licm can recompute
an AAManager with the globals-aa created by the require<globals-aa>.
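
The caching effect can be modeled with a toy analysis cache in Python. This is not the new pass manager's API: `AnalysisCache` and its methods are hypothetical and only show why a cached result stays stale until it is invalidated.

```python
class AnalysisCache:
    """Toy cache: a result is computed once and reused until invalidated."""

    def __init__(self):
        self._results = {}

    def get(self, name, compute):
        """Return the cached result, computing it only on first request."""
        if name not in self._results:
            self._results[name] = compute()
        return self._results[name]

    def invalidate(self, name):
        """Drop a cached result so the next request recomputes it."""
        self._results.pop(name, None)
```

If the underlying IR changes after the first `get`, later requests still see the old result. Only an explicit `invalidate` forces a recomputation that picks up the new information.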

This is essentially the problem described in https://reviews.llvm.org/D84259.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88118

61d4b342

[Attributor][FIX] Move assertion to make it not trivially fail · 4a7a9884

Johannes Doerfert authored Sep 09, 2020

The idea of this assertion was to check the simplified value before we
assign it, not after, which caused this to trivially fail all the time.

4a7a9884