Commits · 5efa78985bf5cbba1c4346ba41a16435fc516446 · Roger Ferrer / llvm-epi

Mar 18, 2022

[SLP] Fix lookahead operand reordering for splat loads. · 5efa7898

Vasileios Porpodas authored Mar 09, 2022

Splat loads are inexpensive on X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated to splat loads.

Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
 %ld = load double, double* %gep
 %ins1 = insertelement <2 x double> poison, double %ld, i32 0
 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1

- But it can also be a load, an insert and a shuffle:
 %ld = load double, double* %gep
 %ins = insertelement <2 x double> poison, double %ld, i32 0
 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer

Because of this some of the lit tests contain more IR instructions.
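
For illustration only, scalar source of the following shape is one place where such a splat load could plausibly arise: both lanes multiply by the same loaded value. The function name is hypothetical, and whether SLP actually vectorizes this exact code is an assumption, not something taken from the patch.

```cpp
#include <cassert>

// Hypothetical example: both output lanes reuse the single value b[0].
// If the two multiplies become one 2-lane operation, b[0] must be broadcast
// into both lanes, i.e. it turns into a splat of a single load.
void scaleByFirst(double *dst, const double *a, const double *b) {
  const double s = b[0]; // one scalar load feeding both lanes
  dst[0] = a[0] * s;
  dst[1] = a[1] * s;
}

// Small usage check with exactly representable values.
void checkScaleByFirst() {
  const double a[2] = {1.5, 2.0};
  const double b[2] = {4.0, 100.0};
  double dst[2] = {0.0, 0.0};
  scaleByFirst(dst, a, b);
  assert(dst[0] == 6.0 && dst[1] == 8.0);
}
```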

Differential Revision: https://reviews.llvm.org/D121354

5efa7898

[SLP][NFC] This adds a test for a follow-up patch that fixes a look-ahead operand reordering issue · b051c836
Vasileios Porpodas authored Mar 09, 2022
```
Differential Revision: https://reviews.llvm.org/D121353
```
b051c836
Use llvm::append_range instead of push_back loops where applicable. NFCI. · 5d2ce766
Benjamin Kramer authored Mar 18, 2022
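
A minimal sketch of the kind of rewrite the title describes. `appendRange` below is a local stand-in with the same shape as `llvm::append_range`, defined here only so the example compiles without LLVM headers; the surrounding function names are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Local stand-in for llvm::append_range (an assumption for this sketch).
template <typename Container, typename Range>
void appendRange(Container &C, Range &&R) {
  C.insert(C.end(), std::begin(R), std::end(R));
}

// Before: element-by-element push_back loop.
std::vector<int> concatWithLoop(std::vector<int> Dst,
                                const std::vector<int> &Src) {
  for (int V : Src)
    Dst.push_back(V);
  return Dst;
}

// After: a single range append.
std::vector<int> concatWithAppend(std::vector<int> Dst,
                                  const std::vector<int> &Src) {
  appendRange(Dst, Src);
  return Dst;
}

// Both forms should produce the same contents.
void checkConcat() {
  const std::vector<int> Src = {3, 4};
  assert(concatWithLoop({1, 2}, Src) == concatWithAppend({1, 2}, Src));
}
```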

5d2ce766
Revert "Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics""" · 964398cc
Paul Kirth authored Mar 18, 2022
```
This reverts commit 6cf560d6.
```
964398cc
[gn build] (manually) port 6316129e · 5f4a334d
Nico Weber authored Mar 17, 2022

5f4a334d
Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics"" · 6cf560d6
Paul Kirth authored Mar 18, 2022
```
I mistakenly reverted my commit, so I'm relanding it.

This reverts commit 10866a1d.
```
6cf560d6
Revert "[misexpect] Re-implement MisExpect Diagnostics" · 10866a1d
Paul Kirth authored Mar 17, 2022
```
This reverts commit e7749d47.
```
10866a1d

[misexpect] Re-implement MisExpect Diagnostics · e7749d47

Paul Kirth authored Mar 09, 2022

Reimplements MisExpect diagnostics from D66324 to reconstruct its
original checking methodology only using MD_prof branch_weights
metadata.

New checks rely on 2 invariants:

1) For frontend instrumentation, MD_prof branch_weights will always be
   populated before llvm.expect intrinsics are lowered.

2) For IR and sample profiling, llvm.expect intrinsics will always be
   lowered before branch_weights are populated from the IR profiles.

These invariants allow the checking to assume how the existing branch
weights are populated depending on the profiling method used, and emit
the correct diagnostics. If these invariants are ever invalidated, the
MisExpect related checks would need to be updated, potentially by
re-introducing MD_misexpect metadata, and ensuring it always will be
transformed the same way as branch_weights in other optimization passes.

Frontend based profiling is now enabled without using LLVM Args, by
introducing a new CodeGen option, and checking if the -Wmisexpect flag
has been passed on the command line.
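
As a rough illustration of what the diagnostic concerns, the hypothetical function below annotates a branch as likely taken using the GCC/Clang builtin `__builtin_expect`. If profile data showed the branch was rarely taken, that mismatch is the kind of situation a MisExpect check is meant to report. The function name is made up, and the diagnostic text and thresholds are not shown here.

```cpp
#include <cassert>

// Hypothetical function: the annotation claims X > 0 is the common case.
int classifySketch(int X) {
  if (__builtin_expect(X > 0, 1))
    return 1; // annotated as the likely path
  return 0;   // annotated as the unlikely path
}

// The annotation only affects optimization hints, not the result.
void checkClassifySketch() {
  assert(classifySketch(5) == 1);
  assert(classifySketch(-5) == 0);
}
```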

Differential Revision: https://reviews.llvm.org/D115907

e7749d47

[BPF] handle unsigned icmp ops in BPFAdjustOpt pass · 2e94d8e6

Yonghong Song authored Mar 16, 2022

When investigating an issue with bcc tool inject.py, I found
a verifier failure with latest clang. The portion of code
can be illustrated as below:
  struct pid_struct {
    u64 curr_call;
    u64 conds_met;
    u64 stack[2];
  };
  struct pid_struct *bpf_map_lookup_elem();
  int foo() {
    struct pid_struct *p = bpf_map_lookup_elem();
    if (!p) return 0;
    p->curr_call--;
    if (p->conds_met < 1 || p->conds_met >= 3)
        return 0;
    if (p->stack[p->conds_met - 1] == p->curr_call)
        p->conds_met--;
    ...
  }

The verifier failure looks like:
  ...
  8: (79) r1 = *(u64 *)(r0 +0)
   R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R10=fp0 fp-8=mmmm????
  9: (07) r1 += -1
  10: (7b) *(u64 *)(r0 +0) = r1
   R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm????
  11: (79) r2 = *(u64 *)(r0 +8)
   R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm????
  12: (bf) r3 = r2
  13: (07) r3 += -3
  14: (b7) r4 = -2
  15: (2d) if r4 > r3 goto pc+13
   R0=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1=inv(id=0) R2=inv(id=2)
   R3=inv(id=0,umin_value=18446744073709551614,var_off=(0xffffffff00000000; 0xffffffff))
   R4=inv-2 R10=fp0 fp-8=mmmm????
  16: (07) r2 += -1
  17: (bf) r3 = r2
  18: (67) r3 <<= 3
  19: (bf) r4 = r0
  20: (0f) r4 += r3
  math between map_value pointer and register with unbounded min value is not allowed

Here the compiler optimized "p->conds_met < 1 || p->conds_met >= 3" to
  r2 = p->conds_met
  r3 = r2
  r3 += -3
  r4 = -2
  if (r3 < r4) return 0
  r2 += -1
  r3 = r2
  ...
In the above, r3 is initially equal to r2, but it is modified and then used by the comparison.
Later on r2 is used again, without the bounds established for r3, which caused the verification failure.

BPF backend has a pass, AdjustOpt, to prevent such transformation, but only
focused on signed integers since typical bpf helper returns signed integers.
To fix this case, let us handle unsigned integers as well.
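
The folded comparison can be checked in isolation. The sketch below, with hypothetical function names, compares the source-level condition with the single unsigned comparison from the listing above. The two agree in value; the problem described here is only that the verifier's bounds end up attached to the temporary rather than to the register used later.

```cpp
#include <cassert>
#include <cstdint>

// Source-level check from the example above.
bool outOfRangeSource(uint64_t CondsMet) {
  return CondsMet < 1 || CondsMet >= 3;
}

// Folded form: r3 = r2 - 3; r4 = -2; if (r3 < r4) return 0.
// All arithmetic is unsigned 64-bit, so the subtraction wraps around.
bool outOfRangeFolded(uint64_t CondsMet) {
  const uint64_t R3 = CondsMet - 3;              // wraps for CondsMet < 3
  const uint64_t R4 = static_cast<uint64_t>(-2); // 0xfffffffffffffffe
  return R3 < R4;
}

// The two forms agree on small values and on the largest value.
void checkFoldedMatchesSource() {
  for (uint64_t V = 0; V < 8; ++V)
    assert(outOfRangeSource(V) == outOfRangeFolded(V));
  assert(outOfRangeSource(UINT64_MAX) == outOfRangeFolded(UINT64_MAX));
}
```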

Differential Revision: https://reviews.llvm.org/D121937

2e94d8e6

Mar 17, 2022

[docs] Fix codeblock. · 6c4931e7
Alina Sbirlea authored Mar 17, 2022

6c4931e7

Revert "[MLIR][Presburger] introduce SetCoalescer" · 71302b67

Mehdi Amini authored Mar 17, 2022

This reverts commit dad80e97.

The build is broken with some configurations (gcc-5 and gcc-8):

mlir/lib/Analysis/Presburger/PresburgerRelation.cpp:402:32: error: qualified name does not name a class before '{' token
class presburger::SetCoalescer {

71302b67

[AMDGPU] Add 2 gfx940 mfma tests. NFC. · 275b0c5a
Stanislav Mekhanoshin authored Mar 17, 2022

275b0c5a
[Attributor] Remove more non-deterministic behavior and debug output · 4308fdf8
Johannes Doerfert authored Mar 17, 2022

4308fdf8
[OpenMP][FIX] Initialize member to avoid undefined value in debug output · 59a6b668
Johannes Doerfert authored Mar 17, 2022

59a6b668

[Attributor][FIX] Remove reference into map that might dangle · 88ea86c3

Johannes Doerfert authored Mar 17, 2022

The reference was taken and the map was modified after. This can (and
did) lead to dangling pointers and all sorts of problems afterwards.
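
A minimal sketch of this class of bug and one way to avoid it. The actual map and code in the patch are not shown here; `FlatMap` is a hypothetical vector-backed container, chosen because inserting into it may relocate existing elements.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical flat map: inserting a new key may reallocate the vector and
// invalidate references to existing values.
struct FlatMap {
  std::vector<std::pair<int, int>> Entries;

  int &getOrInsert(int Key) {
    for (auto &E : Entries)
      if (E.first == Key)
        return E.second;
    Entries.emplace_back(Key, 0);
    return Entries.back().second;
  }
};

// Risky shape (shown as a comment, not executed):
//   int &Ref = M.getOrInsert(A);
//   M.getOrInsert(B) = 1; // may reallocate
//   Ref += 1;             // Ref may dangle here
//
// Safer shape: finish modifying the map, then look the entry up again.
int bumpAfterInsert(FlatMap &M, int A, int B) {
  M.getOrInsert(A);              // make sure A exists
  M.getOrInsert(B) = 1;          // modification that may move storage
  int &Fresh = M.getOrInsert(A); // re-acquire the reference afterwards
  Fresh += 1;
  return Fresh;
}

void checkBumpAfterInsert() {
  FlatMap M;
  assert(bumpAfterInsert(M, 1, 2) == 1);
}
```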

88ea86c3

[AlwaysInliner] Emit inline remark only when successful · f6b5142a

Ellis Hoag authored Mar 17, 2022

Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`.

Reviewed By: kyulee, nikic

Differential Revision: https://reviews.llvm.org/D121946

f6b5142a

[docs] Add details to MemorySSA docs. · 187a5f23

Alina Sbirlea authored Mar 15, 2022

Add more details to the docs regarding optimized accesses for Uses and Defs.
Include incoming changes from https://reviews.llvm.org/D121381.

Differential Revision: https://reviews.llvm.org/D121740

187a5f23

[WebAssembly] Add end-to-end codegen tests for wasm_simd128.h · 7062094b

Thomas Lively authored Mar 17, 2022

Add a test checking that each SIMD intrinsic produces the expected instruction.
Since this test spans both clang and LLVM, place it in a new
intrinsic-header-tests subdirectory of cross-project-tests.

This revives D101684 now that cross-project-tests exists. In practice, the tests
of lowering from wasm_simd128.h to LLVM IR were not as useful as this end-to-end
test.

Updates the version check of gdb in cross-project-tests/lit.cfg.py so that
unexpected version formats do not prevent the new tests from running.

Depends on D121661.

Differential Revision: https://reviews.llvm.org/D121662

7062094b

Add a cmake flag to turn `llvm_unreachable()` into builtin_trap() when assertions are disabled · 6316129e
Mehdi Amini authored Mar 17, 2022
```
Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D121750
```
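
A sketch of the idea, not LLVM's actual definition: a macro that either traps or tells the optimizer the point is unreachable, selected at configure time. `SKETCH_UNREACHABLE` and `SKETCH_UNREACHABLE_TRAP` are hypothetical names; `__builtin_trap` and `__builtin_unreachable` are GCC/Clang builtins.

```cpp
#include <cassert>

// Hypothetical configuration switch standing in for the cmake flag.
#if defined(SKETCH_UNREACHABLE_TRAP)
#define SKETCH_UNREACHABLE() __builtin_trap()        // crash deterministically
#else
#define SKETCH_UNREACHABLE() __builtin_unreachable() // optimizer hint, UB if hit
#endif

enum class Color { Red, Green };

int colorIndex(Color C) {
  switch (C) {
  case Color::Red:
    return 0;
  case Color::Green:
    return 1;
  }
  SKETCH_UNREACHABLE(); // only reached with an invalid enum value
}

void checkColorIndex() {
  assert(colorIndex(Color::Red) == 0);
  assert(colorIndex(Color::Green) == 1);
}
```

Trapping trades some optimization opportunity for a predictable crash when supposedly unreachable code is executed.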
6316129e
[lldb] Migrate ProcessGDBRemote to ReportWarning · 74b45f91
Jonas Devlieghere authored Mar 17, 2022

74b45f91

[ObjCARC] Fix non-determinism · ddb85f34

Kyungwoo Lee authored Mar 17, 2022

We often hit the following assertion non-deterministically with large IR:
```
Assertion `notDifferentParent(LocA.Ptr, LocB.Ptr) && "BasicAliasAnalysis doesn't support interprocedural queries."
```
Looking at the comment in https://reviews.llvm.org/D87806, it appears that the pass is actually a module pass under the new PM, while under the legacy PM it still runs as a function pass.
The fix aligns the new PM with the legacy PM behavior by initializing ObjCARCContract for each function.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D121949

ddb85f34

[libc++] [test] Add ranges_robust_against_copying_*.pass.cpp · 3e02c8e2

Nikolas Klauser authored Mar 17, 2022

This tests the same QoI issue as the existing STL Classic test,
but for the Ranges algorithms. Also, do the same thing for all
the algorithms that take projections.

I found a few missing algorithms and added them to the existing test, too. `std::find_first_of` currently fails; I should look at why that is (and in particular, what is it doing weird that //makes// it inconsistent with the entire rest of libc++?).

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D121265

3e02c8e2

[libc++] Install psutil on CI builders · ce3feebd
Louis Dionne authored Mar 17, 2022
```
This will make it possible to add a timeout when running the tests.
```
ce3feebd

AMDGPU: Use the implicit kernargs for code object version 5 · dd5895cc

Changpeng Fang authored Mar 17, 2022

Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265

dd5895cc

[libc++] Add missing <cstddef> include · 2c9995c1
Louis Dionne authored Mar 17, 2022

2c9995c1

[lld][WebAssembly] Fix crash accessing non-live __tls_base symbol · a04a5077

Sam Clegg authored Mar 16, 2022

In programs that don't otherwise depend on `__tls_base` it won't
be marked as live.  However this symbol is used internally in
a couple of places, so we need to mark it as live explicitly in
those places.

Fixes: #54386

Differential Revision: https://reviews.llvm.org/D121931

a04a5077

[IndVars] Add a new test affected by 62f86d4f · 523c572c
Eli Friedman authored Mar 17, 2022

523c572c

[IROutliner] Make sure that loop debug info is stripped. · f7d90ad5

Andrew Litteken authored Mar 13, 2022

As pointed out in https://github.com/llvm/llvm-project/issues/54155#issuecomment-1057465479, there was a crash when loop info was being outlined. It was not being properly stripped and adjusted, so would point to the wrong location. This uses similar logic found in the CodeExtractor to adjust the loop debug info.

Reviewer: fhahn, paquette

Differential Revision: https://reviews.llvm.org/D120869

f7d90ad5

[flang] Add array constructor lowering tests · 518a837e

Valentin Clement authored Mar 17, 2022

This patch adds some tests for the lowering of
array constructors.

This patch is part of the upstreaming effort from fir-dev branch.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D121945



Co-authored-by: mleair <leairmark@gmail.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>

518a837e

[AMDGPU] New MFMA names for existing instructions · d9ac55fa

Stanislav Mekhanoshin authored Mar 15, 2022

Old names are supported as aliases.
_1k MFMA got new opcodes.

Differential Revision: https://reviews.llvm.org/D121741

d9ac55fa

[AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp · e0b9364b
Stanislav Mekhanoshin authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D120849
```
e0b9364b

[VFS] Add print/dump to the whole FileSystem hierarchy · 41255241

Ben Barham authored Mar 09, 2022

For now most are implemented by printing out the name of the filesystem,
but this can be expanded in the future. Only `OverlayFileSystem` and
`RedirectingFileSystem` are properly implemented in this patch.
  - `OverlayFileSystem`: Prints each filesystem in the order that any
    operations are actually run on them. Optionally prints recursively.
  - `RedirectingFileSystem`: Prints out all mappings, as well as the
    `ExternalFS`. Most of this was already implemented other than the
    handling for the `DirectoryRemap` case and to actually print out the
    mapping.

Each FS should implement `printImpl` rather than `print`, where the
latter just forwards to the former. This is to avoid spreading the
default arguments through to the subclasses (where we may miss updating
in the future).
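
The `print`/`printImpl` split can be sketched as follows. The class names and signatures are simplified assumptions (for example, `std::ostream` instead of `raw_ostream`), not the actual VFS interface.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Simplified, hypothetical hierarchy illustrating the pattern.
class FileSystemSketch {
public:
  virtual ~FileSystemSketch() = default;

  // Non-virtual entry point: the default argument lives here only.
  void print(std::ostream &OS, unsigned IndentLevel = 0) const {
    printImpl(OS, IndentLevel);
  }

protected:
  // Subclasses override this and never restate the default.
  virtual void printImpl(std::ostream &OS, unsigned IndentLevel) const {
    OS << std::string(IndentLevel * 2, ' ') << "FileSystemSketch\n";
  }
};

class OverlaySketch : public FileSystemSketch {
protected:
  void printImpl(std::ostream &OS, unsigned IndentLevel) const override {
    OS << std::string(IndentLevel * 2, ' ') << "OverlaySketch\n";
  }
};

std::string printToString(const FileSystemSketch &FS) {
  std::ostringstream OS;
  FS.print(OS); // uses the base-class default IndentLevel = 0
  return OS.str();
}

void checkPrintSketch() {
  OverlaySketch O;
  assert(printToString(O) == "OverlaySketch\n");
}
```

Because the default argument is declared in one place, changing it later cannot leave a subclass with a stale copy.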

Differential Revision: https://reviews.llvm.org/D121421

41255241

[OpenMP][FIX] Make metadata and attribute check lines less detailed · b4cc3b1d
Johannes Doerfert authored Mar 17, 2022
```
The update_cc script should really do this automatically :(
```
b4cc3b1d

[MLIR][Presburger] introduce SetCoalescer · dad80e97

Michel Weber authored Mar 17, 2022

This patch refactors the current coalesce implementation. It introduces
the `SetCoalescer`, a class in which all coalescing functionality lives.
The main advantage over the old design is the fact that the vectors of
constraints do not have to be passed around, but are implemented as
private fields of the SetCoalescer. This will become especially
important once more inequality types are introduced.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D121364

dad80e97

[dsymutil] Store possible Swift reflection sections in an array · d80210fc
Benjamin Kramer authored Mar 17, 2022
```
No need for an unordered_map of enum, which is also broken in GCC before
6.1. No functionality change intended.
```
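
A minimal sketch of the data-structure choice: a fixed-size array indexed by the enum value instead of a hash map keyed by the enum. The enum and helper names are hypothetical; the real section kinds in dsymutil are not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical enum with a trailing count used as the array size.
enum SectionKind : std::size_t { FieldMD, AssocTy, Builtin, NumKinds };

// Indexing by the enum value needs no hashing of the enum type at all.
using SectionTable = std::array<std::string, NumKinds>;

void setSection(SectionTable &T, SectionKind K, std::string Name) {
  T[K] = std::move(Name);
}

const std::string &getSection(const SectionTable &T, SectionKind K) {
  return T[K];
}

void checkSectionTable() {
  SectionTable T{};
  setSection(T, AssocTy, "assocty");
  assert(getSection(T, AssocTy) == "assocty");
  assert(getSection(T, FieldMD).empty()); // unset slots stay empty
}
```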
d80210fc
[mlir] Move InterfaceMap::InterfaceMap to the cpp file · 548757ba
Benjamin Kramer authored Mar 17, 2022
```
So we don't end up with a copy of std::sort in every dialect definition.
NFCI.
```
548757ba
[mlir] Use array_pod_sort for sorting stats/counters. · ba8e336a
Benjamin Kramer authored Mar 17, 2022
```
This isn't performance sensitive and array_pod_sort is a lot smaller.
NFCI.
```
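
For context, `array_pod_sort` is a `qsort`-based helper, so it avoids instantiating a sort template per element type. The sketch below uses `std::qsort` directly as a stand-in; the function names are hypothetical and this is not the MLIR code.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort-style comparator: returns negative, zero, or positive.
static int compareCounters(const void *L, const void *R) {
  const int A = *static_cast<const int *>(L);
  const int B = *static_cast<const int *>(R);
  return (A > B) - (A < B);
}

// One shared, non-templated sort routine handles the array.
void sortCounters(std::vector<int> &V) {
  if (!V.empty())
    std::qsort(V.data(), V.size(), sizeof(int), compareCounters);
}

void checkSortCounters() {
  std::vector<int> V = {3, 1, 2};
  sortCounters(V);
  assert(V[0] == 1 && V[1] == 2 && V[2] == 3);
}
```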
ba8e336a
[AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940 · 522b2599
Stanislav Mekhanoshin authored Mar 16, 2022
```
Differential Revision: https://reviews.llvm.org/D121843
```
522b2599
[gn build] Port 22570bac · f1859011
LLVM GN Syncbot authored Mar 17, 2022

f1859011

Precommit test for D121483: · ae6db2f3

Kevin P. Neal authored Mar 17, 2022

[FPEnv][InstSimplify] Teach CannotBeNegativeZero() about constrained intrinsics.

ae6db2f3