- Sep 22, 2021
-
Joseph Tremoulet authored
-
Sander de Smalen authored
Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110163
-
Sander de Smalen authored
This code seems untested and is likely obsolete, because this case should already be handled by the code that legalizes the result type of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110061
-
Tim Northover authored
Like normal atomicrmw operations, at -O0 the simple register-allocator can insert spills into the LL/SC loop if it's expanded and visible when regalloc runs. This can cause the operation to never succeed by repeatedly clearing the monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.
-
Andrew Ng authored
Restore the checking of addresses in ICF test which was testing the behaviour of ICF with regards to different alignments of otherwise identical sections. Also make the test more robust to layout changes. Differential Revision: https://reviews.llvm.org/D110090
-
Alexey Bataev authored
transformation.
-
Stefan Gränitz authored
At first, lli only supported lazy mode for ORC. Greedy mode was added with e1579894 and is the default setting now. JITLoaderGDB tests don't rely on laziness, so we can switch them to greedy and remove some complexity.
-
Sander de Smalen authored
This is required to codegen something like: <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %subvec, i64 %idx) where the output vector is legal, but the input vector needs promoting. It implements this by performing the whole operation on the promoted type, and then truncating the result. Reviewed By: david-arm, craig.topper Differential Revision: https://reviews.llvm.org/D110059
-
LLVM GN Syncbot authored
-
Nico Weber authored
-
Florian Hahn authored
IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in Clang, like:

  using matrix_t = double __attribute__((matrix_type(15, 15)));

  void foo(unsigned i, matrix_t &A, matrix_t &B) {
    for (unsigned j = 0; j < 4; ++j)
      for (unsigned k = 0; k < i; k++)
        B[k][j] -= A[k][j] * B[i][j];
  }

https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496
-
Sanjay Patel authored
This reverts commit 2f6b0731. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.
-
Sanjay Patel authored
This reverts commit 52832cd9. The motivating commit 2f6b0731 caused several bots to hit an infinite loop at stage 2, so that needs to be reverted too while figuring out how to fix that.
-
Florian Hahn authored
The matrix extension requires the indices of a matrix subscript expression to be valid; it is UB otherwise. extract/insertelement produce poison if the index is invalid, which prevents the optimizer from scalarizing load/extract pairs for example, which causes very suboptimal code to be generated when using matrix subscript expressions with variable indices for large matrices. This patch updates IRGen to emit assumes for index expressions to convey the information that the index must be valid. This also adjusts the order in which operations are emitted slightly, so indices & assumes are added before the load of the matrix value. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D102478
-
Martin Storsjö authored
Based on suggestions by Eric Youngdale. This fixes https://llvm.org/PR51673. Differential Revision: https://reviews.llvm.org/D109777
-
David Green authored
This allows VMOVL in tail predicated loops so long as the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple sizes (such as MVE_VMOVLs8bh, which extends from i8 to i16), the largest size was chosen. The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706
-
Raphael Isemann authored
This regressed in D110181 and apparently the header intentionally requires DEBUG_TYPE to be defined by the including file. Just exclude the header from the module to unbreak the build.
-
Yi Kong authored
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates malformed SELECT IR if C is a vector while Idx is scalar: SELECT VecC, ScalarIdx, 0. We could splat Idx to a vector, but that defeats the purpose of the optimisation. Don't apply the folding rule in this case. This fixes a regression from commit d561b6fb.
-
Florian Mayer authored
Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D109816
-
Sander de Smalen authored
Most of the code wasn't yet scalable-safe, although conceptually most of it just works for scalable vectors. This change makes the algorithm work on ElementCount, where appropriate, and leaves the fixed-width-only code to use `getFixedNumElements`. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110058
-
Simon Pilgrim authored
As we're checking the cost debug analysis, these should match the original IR line - so we shouldn't have any variable naming issues. I'm investigating v4i32 mul -> PMADDDW cost handling (for PR47437) and these CHECK lines were proving tricky to keep track of.
-
Florian Hahn authored
This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171
-
Sander de Smalen authored
Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D110063
-
Jay Foad authored
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3-only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156
-
Jay Foad authored
Differential Revision: https://reviews.llvm.org/D109881
-
David Green authored
-
Dmitry Vyukov authored
Write uptime in real time seconds for every mem profile record. Uptime is useful to make more sense out of the profile, compare random lines, etc. Depends on D110153. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110154
-
Dmitry Vyukov authored
We do query it every 100ms now. (GetRSS was fixed to not be dead slow IIRC) Depends on D110152. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110153
-
Dmitry Vyukov authored
BackgroundThread function is quite large, move mem profile initialization into a separate function. Depends on D110151. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110152
-
Dmitry Vyukov authored
We allocate things from the internal allocator, it's useful to know how much it consumes. Depends on D110150. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110151
-
Dmitry Vyukov authored
We currently query the number of threads before reading /proc/self/smaps. But reading /proc/self/smaps can take lots of time for huge processes, and it's retried several times with different buffer sizes. Overall it can take tens of seconds. This can make the number of threads significantly inconsistent with the rest of the stats. So query it after reading /proc/self/smaps. Depends on D110149. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110150
-
Dmitry Vyukov authored
Include info about MBlock/SyncObj memory consumption in the memory profile. Depends on D110148. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D110149
-
Dmitry Vyukov authored
We account the low and high ranges, but forgot about the mid range. Account the mid range as well. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110148
-
Sebastian Neubauer authored
Make the update_llc_test_checks script test independent of llc behavior by using cat with static files to simulate llc output. This allows changing llc without breaking the script test case. The update script is executed in a temporary directory, so the llc-generated assembly files are copied there. %T is deprecated, but it allows copying a file with a predictable filename. Differential Revision: https://reviews.llvm.org/D110143
-
Balázs Kéri authored
Import of Attr objects was incomplete in ASTImporter. This change introduces support for a generic way of importing an attribute. As a usage example, import of the attribute AssertCapability is added to ASTImporter. Updating the old attribute import code and adding new attributes or extending the generic functions (if needed) is future work. Reviewed By: steakhal, martong Differential Revision: https://reviews.llvm.org/D109608
-
Florian Hahn authored
InstCombine's worklist can be re-used by other passes like VectorCombine. Move it to llvm/Transform/Utils and rename it to InstructionWorklist. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110181
-
Diana Picus authored
When compiling the runtime with a version of clang-cl newer than 12, we define CMPLXF as __builtin_complex, which returns a float _Complex type. This errors out in contexts where the result of CMPLXF is expected to be a float_Complex_t. This is defined as _Fcomplex whenever _MSC_VER is defined (and as float _Complex otherwise). This patch defines float_Complex_t & friends as _Fcomplex only when we're using "true" MSVC, and not just clang-pretending-to-be-MSVC. This should only affect clang-cl >= 12. Differential Revision: https://reviews.llvm.org/D110139
-
Jonas Devlieghere authored
Currently you can ask the target symbols add command to locate the debug symbols for the current frame. This patch adds an option to do that for the whole call stack. Differential revision: https://reviews.llvm.org/D110011
-
Dmitry Vyukov authored
Don't test for presence of the trace mapping, it will be removed soon. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D110194
-
Dmitry Vyukov authored
ScopedInterceptor::Enable/DisableIgnores is only used for some special cases. Move them out of the common interceptor handling. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D110157
-