Commits · fa822a2ee52f8243d29eb035d7002a9ab40788a0 · Lorenzo Albano / LLVM bpEVL

Sep 20, 2021

[DebugInfo] Add test for dumping DW_AT_defaulted · fa822a2e
Paul Robinson authored Sep 20, 2021

fa822a2e

Revert "Fix CLANG_ENABLE_STATIC_ANALYZER=OFF building all analyzer source" · 91978345

Nico Weber authored Sep 20, 2021

This reverts commit 6d7b3d6b.
Breaks running cmake with `-DCLANG_ENABLE_STATIC_ANALYZER=OFF`
without turning off CLANG_TIDY_ENABLE_STATIC_ANALYZER.
See comments on https://reviews.llvm.org/D109611 for details.

91978345

[cmake] Put check from D110016 behind (default-on) flag · 55f0b337
Nico Weber authored Sep 20, 2021
```
See discussion on https://reviews.llvm.org/D110016 for details.
```
55f0b337
[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA. · a95ba810
Craig Topper authored Sep 20, 2021
```
If either of the multiplicands is a splat, we can sink it to use
vfmacc.vf or similar.
```
a95ba810
[RISCV] Add test cases for missed opportunity to use vfmacc.vf. NFC · 792101ff
Craig Topper authored Sep 20, 2021
```
This is another case of a splat being in another basic block
preventing SelectionDAG from optimizing it.
```
792101ff

[libc++] [P0919] Some belated review on D87171. · d5db71d1

Arthur O'Dwyer authored Aug 31, 2021

- Simplify the structure of the new tests.
- Test const containers as well as non-const containers,
    since it's easy to do so.
- Remove redundant enable-iffing of helper structs' member functions.
    (They're not instantiated unless they're called, and who would call them?)
- Fix indentation and use more consistent SFINAE method in <unordered_map>.
- Add _LIBCPP_INLINE_VISIBILITY on some swap functions.

Differential Revision: https://reviews.llvm.org/D109011

d5db71d1

[libc++] [LIBCXX-DEBUG-FIXME] Constexpr char_traits::copy mustn't compare unrelated pointers. · df81bb71

Arthur O'Dwyer authored Apr 20, 2021

Now that __builtin_is_constant_evaluated() is present on all supported
compilers, we can use it to skip the UB-inducing assert in cases where
the computation might be happening at constexpr time.

Differential Revision: https://reviews.llvm.org/D101674

df81bb71

[lldb][NFC] Remove outdated FIXME · c4a406bb
Alex Langford authored Sep 20, 2021

c4a406bb

[gn build] Don't pass -Wl,-z,defs for sanitizer builds · b64fdaa8

Arthur Eubanks authored Sep 20, 2021

-Wl,-z,defs doesn't work with sanitizers.
See https://clang.llvm.org/docs/AddressSanitizer.html

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D110086

b64fdaa8

[IR] Add helper to convert offset to GEP indices · dd022656

Nikita Popov authored Sep 19, 2021

We implement logic to convert a byte offset into a sequence of GEP
indices for that offset in a number of places. This patch adds a
DataLayout::getGEPIndicesForOffset() method, which implements the
core logic. I've updated SROA, ConstantFolding and InstCombine to
use it, and there are a few more places where it looks relevant.
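
The offset-to-indices conversion described above can be sketched on a toy type model. This is only an illustration of the idea, not the signature or behavior of the actual DataLayout::getGEPIndicesForOffset() method: the `Int`, `Array` and `Struct` classes and the `gep_indices_for_offset` helper are hypothetical, structs are assumed to be packed (no padding), and all types are assumed to have non-zero size.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Int:
    size: int  # size in bytes


@dataclass(frozen=True)
class Array:
    elem: "Type"
    count: int


@dataclass(frozen=True)
class Struct:
    fields: Tuple["Type", ...]  # assumption: packed layout, no padding


Type = Union[Int, Array, Struct]


def size_of(ty: Type) -> int:
    if isinstance(ty, Int):
        return ty.size
    if isinstance(ty, Array):
        return size_of(ty.elem) * ty.count
    return sum(size_of(f) for f in ty.fields)


def gep_indices_for_offset(ty: Type, offset: int) -> Tuple[List[int], int]:
    """Return (indices, leftover byte offset) for a byte offset into ty."""
    # The leading index steps over whole objects of the outermost type.
    total = size_of(ty)
    indices = [offset // total]
    offset %= total
    while True:
        if isinstance(ty, Array):
            elem_size = size_of(ty.elem)
            indices.append(offset // elem_size)
            offset %= elem_size
            ty = ty.elem
        elif isinstance(ty, Struct):
            start = 0
            for i, field in enumerate(ty.fields):
                field_size = size_of(field)
                if offset < start + field_size:
                    indices.append(i)
                    offset -= start
                    ty = field
                    break
                start += field_size
        else:
            # Scalar reached: whatever is left cannot be expressed as an index.
            break
    return indices, offset


example = Struct((Int(4), Array(Int(2), 4)))
print(gep_indices_for_offset(example, 8))
```

A leftover offset of zero means the byte offset lands exactly on an element; a non-zero leftover means it points into the middle of a scalar.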

Differential Revision: https://reviews.llvm.org/D110043

dd022656

[mlir][MemRef] Compute unused dimensions of rank-reducing subviews using strides as well. · 4cf9bf6c

MaheshRavishankar authored Sep 20, 2021

For `memref.subview` operations, when there is more than one
unit dimension, the strides need to be used to figure out which of
the unit-dims are actually dropped.

Differential Revision: https://reviews.llvm.org/D109418

4cf9bf6c

[OpenMP][host runtime] Fix indirect lock table race condition · 1e45cd75

Peyton, Jonathan L authored Sep 13, 2021

The indirect lock table can exhibit a race condition while locks are
being initialized and set/unset. This occurs if the lock table is
resized by one thread (during an omp_init_lock) and accessed (during an
omp_set|unset_lock) by another thread.

The runtime/test/lock/omp_init_lock.c test exposed this issue and
will fail if run enough times.

This patch restructures the lock table so pointer/iterator validity is
always kept. Instead of reallocating a single table to a larger size, the
lock table begins preallocated to accommodate 8K locks. Each row of the
table is allocated as needed with each row allowing 1K locks. If the 8K
limit is reached for the initial table, then another table, capable of
holding double the number of locks, is allocated and linked
as the next table. The indices stored in the user's locks take this
linked structure into account when finding the lock within the table.
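
The table layout described above can be sketched as follows. This is a single-threaded illustration of the indexing scheme only, not the runtime's actual data structure: the names `LockTable`, `IndirectLocks`, `ROW_SIZE` and `INITIAL_ROWS` are hypothetical, and all synchronization is omitted.

```python
ROW_SIZE = 1024      # locks per row (1K)
INITIAL_ROWS = 8     # first table holds 8 rows, i.e. 8K locks


class LockTable:
    def __init__(self, nrows):
        self.rows = [None] * nrows  # rows are allocated on demand
        self.next = None            # link to the next, larger table


class IndirectLocks:
    def __init__(self):
        self.head = LockTable(INITIAL_ROWS)
        self.count = 0

    def _locate(self, index, create=False):
        """Walk the linked tables and return (row, column) for a lock index."""
        table = self.head
        while True:
            capacity = len(table.rows) * ROW_SIZE
            if index < capacity:
                row, col = divmod(index, ROW_SIZE)
                if table.rows[row] is None:
                    if not create:
                        raise IndexError("lock index not allocated")
                    table.rows[row] = [None] * ROW_SIZE
                return table.rows[row], col
            index -= capacity
            if table.next is None:
                if not create:
                    raise IndexError("lock index not allocated")
                # Existing rows are never moved; a table twice as large is linked in.
                table.next = LockTable(2 * len(table.rows))
            table = table.next

    def allocate(self, lock):
        index = self.count
        row, col = self._locate(index, create=True)
        row[col] = lock
        self.count += 1
        return index

    def get(self, index):
        row, col = self._locate(index)
        return row[col]
```

Because growth only appends rows or links a new table, a row that has been handed out is never reallocated, which is the property the patch relies on to keep existing references valid.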

Differential Revision: https://reviews.llvm.org/D109725

1e45cd75

Fix bad merge that removed switch case · 01b097af

Geoffrey Martin-Noble authored Sep 20, 2021

When https://reviews.llvm.org/D109520 was landed, it reverted the addition of this switch
case added in https://reviews.llvm.org/D109293. This caused `-Wswitch` failures (and
presumably broke the functionality added in the latter patch).

01b097af

Diagnose -Wunused-value based on CFG reachability · 63e0d038

Yuanfang Chen authored Jun 23, 2021

While at it, add the diagnostic message "left operand of comma operator has no effect" (used by GCC) for the comma operator.

This also makes Clang diagnose in the constant evaluation context, which aligns with GCC/MSVC behavior. (https://godbolt.org/z/7zxb8Tx96)

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D103938

63e0d038

[mlir][Linalg] Add ConvolutionOpInterface. · 0b33890f

MaheshRavishankar authored Sep 20, 2021

Add an interface that allows grouping together all convolution and
pooling ops within Linalg named ops. The interface currently checks that
- the indexing map used for input/image access is valid
- the filter and output are accessed using projected permutations
- all loops are characterizable as one iterating over
  - batch dimension,
  - output image dimensions,
  - filter convolved dimensions,
  - output channel dimensions,
  - input channel dimensions,
  - depth multiplier (for depthwise convolutions)

Differential Revision: https://reviews.llvm.org/D109793

0b33890f

Revert "[CMake] Add debuginfo-tests to LLVM_ALL_PROJECTS after D110016" · 6cd382bf
Fangrui Song authored Sep 20, 2021
```
This reverts commit 4b80f012.

debuginfo-tests has been renamed to cross-project-tests.
```
6cd382bf

Revert code change of D63497 & D74399 for riscv64-*-linux GCC detection · a0772719

Fangrui Song authored Sep 20, 2021



This partially reverts commits 1fc2a47f and 9816e726.

See D109727. Replacing config.guess with {gcc,clang} -dumpmachine
can avoid the riscv64-{redhat,suse}-linux GCC detection.

Acked-by: Luís Marques <luismarques@lowrisc.org>

a0772719

Eliminate _LIBCPP_EQUAL_DELETE in favor of `=delete`. · d7d70601

Arthur O'Dwyer authored Sep 16, 2021

All supported compilers have supported `=delete` as an extension
in C++03 mode for many years at this point.

Differential Revision: https://reviews.llvm.org/D109942

d7d70601

[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv. · 04ab6c85
Craig Topper authored Sep 20, 2021

04ab6c85

[RISCV] Add test cases showing failure to use .vf vector operations when splat... · 890027b3

Craig Topper authored Sep 20, 2021

[RISCV] Add test cases showing failure to use .vf vector operations when splat is in another basic block. NFC

We should have CGP copy the splats into the same basic block as the
FP operation so that SelectionDAG can fold them.

890027b3

[lldb][crashlog] Avoid specifying arch for image when a UUID is present · e31b2d7d

Vedant Kumar authored Sep 17, 2021

When adding an image to a target for crashlog purposes, avoid specifying
the architecture of the image.

This has the effect of making SBTarget::AddModule infer the ArchSpec for
the image based on the SBTarget's architecture, which LLDB puts serious
effort into calculating correctly (in TargetList::CreateTargetInternal).

The status quo is that LLDB randomly guesses the ArchSpec for a module
if its architecture is specified, via:

```
  SBTarget::AddModule -> Platform::GetAugmentedArchSpec -> Platform::IsCompatibleArchitecture ->
GetSupportedArchitectureAtIndex -> {ARM,x86}GetSupportedArchitectureAtIndex
```

... which means that the same crashlog can fail to load on an Apple
Silicon Mac (due to the random guess of arm64e-apple-macosx for the
module's ArchSpec not being compatible with the SBTarget's (correct)
ArchSpec), while loading just fine on an Intel Mac.

I'm not sure how to add a test for this (it doesn't look like there's
test coverage of this path in-tree). It seems like it would be pretty
complicated to regression test: the host LLDB would need to be built for
arm64e, we'd need a hand-crafted arm64e iOS crashlog, and we'd need a
binary with an iOS deployment target. I'm open to other / simpler
options.

rdar://82679400

Differential Revision: https://reviews.llvm.org/D110013

e31b2d7d

[NFC][OpenMP] Fix metadirective test on SystemZ · 3679d200
cchen authored Sep 20, 2021

3679d200
Revert "[MLIR][SCF] Add for-to-while loop transformation pass" · 5edd79fc
Mehdi Amini authored Sep 20, 2021
```
This reverts commit 644b55d5.

The added test is failing the bots.
```
5edd79fc
Temporarily XFAIL MLIR test that fails the LLVM verifier after 8700f2bd · f18f1ab4
Mehdi Amini authored Sep 20, 2021

f18f1ab4

Add use_default_shell_env = True to ctx.actions.run · f4b5d597

Alexander Grund authored Sep 20, 2021

When building a tool in a non-standard environment (e.g. custom
compiler path -> LD_LIBRARY_PATH set) then
`use_default_shell_env = True` is required to run that tool in the same
environment or otherwise the build will fail due to missing symbols.
See https://github.com/google/jax/issues/7842 for this issue and
https://github.com/tensorflow/tensorflow/pull/44549 for related fix in
TF.

Reviewed By: GMNGeoffrey

Differential Revision: https://reviews.llvm.org/D109873

f4b5d597

[lld] Remove timers.ll because inconsistent timers behavior causes the test to fail sometimes · 6e994a83
Amy Huang authored Sep 20, 2021
```
See https://reviews.llvm.org/D109904
```
6e994a83

[ELF] Add --why-extract= to query why archive members/lazy object files are extracted · a954bb18

Fangrui Song authored Sep 20, 2021

Similar to D69607 but for archive member extraction unrelated to GC. This patch adds --why-extract=.

Prior art:

GNU ld -M prints
```
Archive member included to satisfy reference by file (symbol)

a.a(a.o)                      main.o (a)
b.a(b.o)                      (b())
```

-M is mainly for input section/symbol assignment <-> output section mapping
(often huge output) and the information may appear ad-hoc.

Apple ld64
```
__Z1bv forced load of b.a(b.o)
_a forced load of a.a(a.o)
```

It doesn't say the reference file.

Arm's proprietary linker
```
Selecting member vsnprintf.o(c_wfu.l) to define vsnprintf.
...
Loading member vsnprintf.o from c_wfu.l.
              definition:  vsnprintf
              reference :  _printf_a
```

---

--why-extract= gives the user the full data (which is much shorter than GNU ld
-Map). It is easy to track a chain of references to one archive member with a
one-liner, e.g.

```
% ld.lld main.o a_b.a b_c.a c.a -o /dev/null --why-extract=- | tee stdout
reference       extracted       symbol
main.o  a_b.a(a_b.o)    a
a_b.a(a_b.o)    b_c.a(b_c.o)    b()
b_c.a(b_c.o)    c.a(c.o)        c()

% ruby -ane 'BEGIN{p={}}; p[$F[1]]=[$F[0],$F[2]] if $.>1; END{x="c.a(c.o)"; while y=p[x]; puts "#{y[0]} extracts #{x} to resolve #{y[1]}"; x=y[0] end}' stdout
b_c.a(b_c.o) extracts c.a(c.o) to resolve c()
a_b.a(a_b.o) extracts b_c.a(b_c.o) to resolve b()
main.o extracts a_b.a(a_b.o) to resolve a
```
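
For readers who prefer not to use Ruby, the same chain walk can be sketched in Python. The `extraction_chain` helper is hypothetical and, like the one-liner above, assumes a header line followed by whitespace-separated `reference`, `extracted` and `symbol` columns.

```python
def extraction_chain(report, target):
    """Follow the references that led to `target` being extracted."""
    parents = {}
    for line in report.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        reference, extracted, symbol = fields
        parents[extracted] = (reference, symbol.strip())

    chain, seen = [], set()
    while target in parents and target not in seen:
        seen.add(target)  # guard against malformed, cyclic input
        reference, symbol = parents[target]
        chain.append(f"{reference} extracts {target} to resolve {symbol}")
        target = reference
    return chain


REPORT = (
    "reference\textracted\tsymbol\n"
    "main.o\ta_b.a(a_b.o)\ta\n"
    "a_b.a(a_b.o)\tb_c.a(b_c.o)\tb()\n"
    "b_c.a(b_c.o)\tc.a(c.o)\tc()\n"
)

for step in extraction_chain(REPORT, "c.a(c.o)"):
    print(step)
```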

Archive member extraction happens before --gc-sections, so this may not be a live path
under --gc-sections, but I think it is a good approximation in practice.

* Specifying a file avoids output interleaving with --verbose.
* Required `=` prevents accidental overwrite of an input if the user forgets `=`. (Most compiler drivers' long options accept `=` but not ` `)

Differential Revision: https://reviews.llvm.org/D109572

a954bb18

[Verifier] Try to fix MSVC build · ecd52a5b

Nikita Popov authored Sep 20, 2021

Some buildbots fail with:

> C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\lib\IR\Verifier.cpp(4352): error C2678: binary '==': no operator found which takes a left-hand operand of type 'const llvm::MDOperand' (or there is no acceptable conversion)

Possibly the explicit MDOperand to Metadata* conversion will help?

ecd52a5b

[MCA] Fix a warning · f3cfec9c

Kazu Hirata authored Sep 20, 2021

This patch fixes the warning

  InstructionTables.cpp:27:56: error: loop variable 'Resource' of type
  'const std::pair<const uint64_t, ResourceUsage> &' (aka 'const
  pair<const unsigned long, llvm::mca::ResourceUsage> &') binds to a
  temporary constructed from type 'const std::pair<unsigned long,
  llvm::mca::ResourceUsage> &' [-Werror,-Wrange-loop-construct]

Note that Resource is declared as:

   SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;

without "const" for uint64_t.

f3cfec9c

[gn build] Port d85e347a · 93604c97
LLVM GN Syncbot authored Sep 20, 2021

93604c97

[RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter. · d85e347a

Craig Topper authored Sep 20, 2021

For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.

This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
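
The folding step can be illustrated with plain integer arithmetic. This is a sketch of the underlying identity only, not the pass itself: it assumes the index arithmetic is a single splat add followed by a splat multiply, and the function names are hypothetical.

```python
def gather_indices(vl, iteration, add, mul):
    """Indices as the vectorizer produces them: arithmetic on a vector IV."""
    # Vector IV starts at <0, 1, 2, ...> and is bumped by a splat of vl each iteration.
    vector_iv = [lane + iteration * vl for lane in range(vl)]
    return [(i + add) * mul for i in vector_iv]


def strided_form(vl, iteration, add, mul):
    """Fold the splat arithmetic into a scalar start, step and stride."""
    start = add * mul   # arithmetic folded into the start value
    step = vl * mul     # arithmetic folded into the per-iteration step
    stride = mul        # distance between consecutive lanes
    scalar_iv = start + iteration * step
    return scalar_iv, stride


def strided_indices(vl, iteration, add, mul):
    """Indices reconstructed from the scalar base and the stride."""
    base, stride = strided_form(vl, iteration, add, mul)
    return [base + lane * stride for lane in range(vl)]


print(gather_indices(4, 2, 3, 5))
print(strided_indices(4, 2, 3, 5))
```

Both functions produce the same indices because `(lane + iteration * vl + add) * mul` expands to `add * mul + iteration * vl * mul + lane * mul`, so the gather can be replaced by a strided access from a scalar base.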

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107790

d85e347a

[ELF] Don't fall back to .text for e_entry · d001ab82

Fangrui Song authored Sep 20, 2021

We have the rule to simulate
(https://sourceware.org/binutils/docs/ld/Entry-Point.html),
but the behavior is questionable
(https://sourceware.org/pipermail/binutils/2021-September/117929.html).

gold doesn't fall back to .text.
The behavior is unlikely to be relied upon by projects (there is even a warning for
executable links), so let's just delete this fallback path.

Reviewed By: jhenderson, peter.smith

Differential Revision: https://reviews.llvm.org/D110014

d001ab82

[Verifier] Verify scoped noalias metadata · 8700f2bd

Nikita Popov authored Sep 16, 2021

Verify that !noalias, !alias.scope and llvm.experimental.noalias.scope
arguments have the format specified in
https://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata.
I've fixed up a lot of broken metadata used by tests in advance.
Especially using a scope instead of the expected scope list is a
commonly made mistake.

Differential Revision: https://reviews.llvm.org/D110026

8700f2bd

[lldb] Extract adding symbols for UUID/File/Frame (NFC) · a89bfc61

Jonas Devlieghere authored Sep 20, 2021

This moves the logic for adding symbols based on UUID, file and frame
into little helper functions. This is in preparation for D110011.

Differential revision: https://reviews.llvm.org/D110010

a89bfc61

[lldb] Fix whitespace in CommandObjectTarget (NFC) · fe4b8467
Jonas Devlieghere authored Sep 17, 2021

fe4b8467

[DSE] Add additional tests to cover review comments. · 963d3a22

Florian Hahn authored Sep 20, 2021

Adds additional tests following comments from D109844.

Also removes unused in.ptr arguments and places in the call tests that
used loads instead of a getval call.

963d3a22

[mlir][linalg] Add IndexOp support to fusion on tensors. · 7be28d82

Tobias Gysi authored Sep 20, 2021

This revision depends on https://reviews.llvm.org/D109761 and https://reviews.llvm.org/D109766.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D109774

7be28d82

[MLIR][SCF] Add for-to-while loop transformation pass · 644b55d5

Morten Borup Petersen authored Aug 20, 2021

This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and induction variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
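
The shape of the rewrite can be sketched in ordinary Python rather than MLIR. This is an analogy, not the pass's output: `run_for` and `run_as_while` are hypothetical helpers, a single iter_arg `acc` is assumed, and the step is assumed to be positive.

```python
def run_for(lb, ub, step, init, body):
    """The original form: a counted loop carrying one iter_arg."""
    acc = init
    for iv in range(lb, ub, step):
        acc = body(iv, acc)
    return acc


def run_as_while(lb, ub, step, init, body):
    """The rewritten form: the IV becomes an ordinary loop-carried value."""
    iv, acc = lb, init  # loop-carried values: IV plus the iter_args
    while True:
        # 'before' region: only the loop condition is computed here.
        if not iv < ub:
            break
        # 'after' region: the loop body, then the IV increment; both values
        # are yielded back to the next evaluation of the condition.
        acc = body(iv, acc)
        iv = iv + step
    return acc


def add(iv, acc):
    return acc + iv


print(run_for(0, 10, 3, 0, add), run_as_while(0, 10, 3, 0, add))
```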

This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).

Differential Revision: https://reviews.llvm.org/D108454

644b55d5

[mlir][linalg] Fix typo (NFC). · 09100c75
Tobias Gysi authored Sep 20, 2021

09100c75

[SLP]Improve graph reordering. · bc69dd62

Alexey Bataev authored Aug 03, 2021

Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First, all
reordering is done in-place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the subgraph,
not the whole graph.
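
The counting part of the first step can be sketched as follows. This is a strong simplification and not the SLP vectorizer's code: nodes are modeled as hypothetical `(vectorization_factor, order)` pairs, and an empty order is assumed to mean "already in identity order", so no rotation is recorded for it.

```python
from collections import Counter


def most_used_orders(nodes):
    """Pick the most frequent order per vectorization factor, largest VF first."""
    by_vf = {}
    for vf, order in nodes:
        by_vf.setdefault(vf, Counter())[order] += 1

    chosen = {}
    for vf in sorted(by_vf, reverse=True):
        order, _count = by_vf[vf].most_common(1)[0]
        if order:  # assumption: an empty order needs no rotation
            chosen[vf] = order
    return chosen


nodes = [
    (4, (1, 0, 3, 2)),
    (4, (1, 0, 3, 2)),
    (4, ()),
    (2, ()),
    (2, ()),
    (2, (1, 0)),
]
print(most_used_orders(nodes))
```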

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered, and their users too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most of the other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra instruction scheduling
anymore), and we may vectorize more code for the large basic blocks again
because of the saved scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020

bc69dd62