- Jul 14, 2021
-
-
Louis Dionne authored
This commit reverts 5099e015 and 77396bbc, which broke the build in various ways. I'm reverting until I can investigate, since that change appears to be way more subtle than it seemed.
-
Roman Lebedev authored
-
Nikita Popov authored
While it is nice to have separate methods in the public AttributeSet API, we can fetch the type from the internal AttributeSetNode using a generic API for all type attribute kinds.
-
Roman Lebedev authored
-
David Green authored
For i64 reductions we currently try to convert add(VMLALV(X, Y), B) to VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we have an add of an existing VMLALVA, this patch pushes the add up above the VMLALVA so that it may potentially be simplified further, for example by being folded into another VMLALV. Differential Revision: https://reviews.llvm.org/D105686
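A toy model of the two reduction nodes (plain Python, not the actual ISel nodes) shows why the add can be hoisted without changing the result:

```python
# VMLALV is a multiply-accumulate reduction; VMLALVA is the same
# reduction with an extra scalar accumulator input.
def vmlalv(x, y):
    return sum(a * b for a, b in zip(x, y))

def vmlalva(acc, x, y):
    return acc + vmlalv(x, y)

x, y = [1, 2, 3, 4], [5, 6, 7, 8]
b, c = 100, 10
# add(VMLALVA(B, X, Y), C) can be rewritten as VMLALVA(add(B, C), X, Y),
# exposing the hoisted add(B, C) to further simplification.
assert vmlalva(b, x, y) + c == vmlalva(b + c, x, y)
```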
-
Vitaly Buka authored
Differential Revision: https://reviews.llvm.org/D105954
-
Mehdi Amini authored
It was an alias for a long time.
-
Nikita Popov authored
A couple of attributes had explicit checks for incompatibility with pointer types. However, this is already handled generically by the typeIncompatible() check, so we can drop these after adding SwiftError to typeIncompatible(). The previous implementation of that check printed all attributes that are incompatible with a given type, even though most of those attributes aren't actually used. This had the annoying result that the error message changed every time a new attribute was added to the list. Improve this by explicitly finding which attribute isn't compatible and printing just that.
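A hypothetical sketch of the verifier change (names invented for illustration, not the LLVM API): rather than printing the whole incompatibility list for a type, find and report only the attribute that is actually present and incompatible, so the message stays stable when new attributes join the list.

```python
# Illustrative only: a made-up set of attributes incompatible with
# non-pointer types, standing in for typeIncompatible().
INCOMPATIBLE_WITH_NON_POINTER = {"byval", "nest", "noalias", "swifterror"}

def find_incompatible(attrs, is_pointer):
    if is_pointer:
        return None
    present = attrs & INCOMPATIBLE_WITH_NON_POINTER
    # Report just one offending attribute instead of the full list.
    return min(present) if present else None

assert find_incompatible({"zeroext", "swifterror"}, is_pointer=False) == "swifterror"
assert find_incompatible({"zeroext"}, is_pointer=False) is None
```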
-
Saleem Abdulrasool authored
The emission was corrected for the swift_async calling convention but the demangling support was not. This repairs the demangling support as well.
-
Eli Friedman authored
This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850
-
Thomas Lively authored
Replace the experimental clang builtin and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50433. Differential Revision: https://reviews.llvm.org/D105950
-
Louis Dionne authored
-
Thomas Lively authored
The data layout strings do not have any effect on llc tests and will become misleadingly out of date as we continue to update the canonical data layout, so remove them from the tests. Differential Revision: https://reviews.llvm.org/D105842
-
Fangrui Song authored
The ELF specification says "The link editor honors the common definition and ignores the weak ones." GNU ld and our Symbol::compare follow this, but the --fortran-common code (D86142) made a mistake on the precedence. Fixes https://bugs.llvm.org/show_bug.cgi?id=51082 Reviewed By: peter.smith, sfertile Differential Revision: https://reviews.llvm.org/D105945
-
David Green authored
MVE does not have a VMLALV instruction that can perform v16i8 -> i64 reductions, like it does for v8i16 -> i64 and v4i32 -> i64 reductions. That means that the pattern to create them will be split up by type legalization, creating a lot of instructions. This extends the patterns for matching i64 reductions a little to handle the v16i8 -> i64 case. We need to turn them into a pair of v8i16 -> i64 VMLALVs that each perform half of the reduction and are summed together (so the latter is a VMLALVA). The order of the lanes does not matter for the reduction, so we generate an MVEEXT for the extension, which will either be folded into an extending load or can be optimized to a VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be improved in a later patch. Differential Revision: https://reviews.llvm.org/D105680
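A toy model (plain Python, not the ISel patterns) of splitting the v16i8 -> i64 multiply-accumulate reduction into two v8i16 -> i64 halves, where the second half accumulates onto the first via a VMLALVA:

```python
def vmlalv(x, y):
    return sum(a * b for a, b in zip(x, y))

def vmlalva(acc, x, y):  # VMLALV with an accumulator input
    return acc + vmlalv(x, y)

x = list(range(16))
y = list(range(16, 32))
# Lane order does not matter for a sum, so the full reduction equals one
# VMLALV on the first half plus a VMLALVA on the second half.
assert vmlalv(x, y) == vmlalva(vmlalv(x[:8], y[:8]), x[8:], y[8:])
```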
-
Sanjay Patel authored
This set of folds was added recently with c7b658ae, 0c400e89, and 40b752d2, and I noted that this wasn't likely to fire in code derived from C/C++ source because of nsw in particular. But I didn't notice that I had placed the code above the no-wrap block of transforms. This is likely the cause of regressions noted from the previous commit because, as shown in the test diffs, we may have transformed into a compare with an arbitrary constant rather than a simpler signbit test.
-
Sanjay Patel authored
-
Sander de Smalen authored
This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output:
```
t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
    dst[i] = sinf(src[i]);
                  ^
t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
    dst[i] = sinf(src[i]);
             ^
t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
    dst[i] = sinf(src[i]);
           ^
```
Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806
-
Matt Arsenault authored
-
Sander de Smalen authored
At the moment, <vscale x 1 x eltty> is not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is that the type is supposed to be a type that can be legalized. It partially is, although the support for these types needs some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882
-
Aaron Ballman authored
The anonymous and non-anonymous bit-field diagnostics are easily combined into one diagnostic. However, the diagnostic was missing a "the" that is present in the almost-identically worded warn_bitfield_width_exceeds_type_width diagnostic, hence the changes to test cases.
-
Jay Foad authored
This prevents breaking the indentation that shows the structure of the pass managers. Differential Revision: https://reviews.llvm.org/D105891
-
Alexey Bataev authored
The cost of the InsertSubvector shuffle kind is not complete and may end up as just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827
-
Louis Dionne authored
We might as well use the various XXX_TARGET_TRIPLE variables directly.
-
Yitzhak Mandelbaum authored
When the end loc of the specified range is a split token, `makeFileCharRange` does not process it correctly. This patch adds proper support for split tokens. Differential Revision: https://reviews.llvm.org/D105365
-
Peixin Qiao authored
The following semantic check is removed in OpenMP Version 5.0:
```
Taskloop simd construct restrictions:
No reduction clause can be specified.
```
Also fix several typos. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D105874
-
Jinsong Ji authored
$ is used as the PC for PowerPC inline asm. ELF uses it; enable it for AIX XCOFF as well. Reviewed By: #powerpc, amyk, nemanjai Differential Revision: https://reviews.llvm.org/D105956
-
Matthias Springer authored
Differential Revision: https://reviews.llvm.org/D105607
-
oToToT authored
The CMake community wiki has been moved to the [[ https://gitlab.kitware.com/cmake/community/wikis/home | Kitware GitLab Instance ]]. Also, the original anchor for the `Information how to set up various cross compiling toolchains` section might not work as expected: the original content is now collapsed, so the browser won't navigate to the right section directly. Hence, I think it is better to provide the section name instead of `this section` with a link, to help readers find the right section by themselves. Reviewed By: void Differential Revision: https://reviews.llvm.org/D104996
-
Tim Northover authored
If we try to create a new GlobalVariable on each iteration, the Module will detect the name collision and "helpfully" rename later iterations by appending ".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try to call it. So instead check whether there's already a global with the right name in the module and use that if so.
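A toy model of the bug and the fix (plain Python standing in for the LLVM Module API): creating a symbol whose name already exists gets it auto-renamed with a ".1" suffix, so the correct pattern is to look the name up first and reuse it.

```python
class Module:
    """Mimics the auto-renaming behaviour described above."""
    def __init__(self):
        self.globals = {}

    def create_global(self, name):
        unique, n = name, 0
        # On a name collision, append ".1", ".2", ... like the real Module.
        while unique in self.globals:
            n += 1
            unique = f"{name}.{n}"
        self.globals[unique] = unique
        return unique

    def get_or_create_global(self, name):
        # The fix: reuse an existing global with the right name if present.
        return self.globals.get(name) or self.create_global(name)

m = Module()
assert m.create_global("___udivsi3") == "___udivsi3"
assert m.create_global("___udivsi3") == "___udivsi3.1"       # the bug
assert m.get_or_create_global("___udivsi3") == "___udivsi3"  # the fix
```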
-
Sanjay Patel authored
This has been a work-in-progress for a long time...we finally have all of the pieces in place to handle vectorization of compare code as shown in: https://llvm.org/PR41312 To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction: https://alive2.llvm.org/ce/z/NH54Ah We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases. Differential Revision: https://reviews.llvm.org/D105730
-
Gabor Marton authored
This proved to be very useful during debugging. Differential Revision: https://reviews.llvm.org/D103967
-
Alexander Shaposhnikov authored
Make use of ArgList::getLastArgValue. NFC. Test plan: make check-lld-macho Differential revision: https://reviews.llvm.org/D105452
-
Djordje Todorovic authored
This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before LiveDebugValues) a variable assignment, but this gets clobbered for function parameters during SelectionDAG, since it generates new DBG_VALUEs after COPY instructions even though the parameter has no new assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, then a COPY followed by DBG_VALUE $virt_reg, and after the virtregrewrite $virt_reg gets rewritten into $regX, we'd end up having a redundant DBG_VALUE. This breaks the definition of the DBG_VALUE, since some analysis passes might be built on top of that premise, and this patch tries to fix the MIR with respect to that. This first patch performs a backward scan, trying to detect a sequence of consecutive DBG_VALUEs and removing all DBG_VALUEs describing one variable except the last one. For example:
```
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
```
In this case, we can remove (1). Combined with the forward scan that will be introduced in the next patch (from this stack), and judging by the statistics, RemoveRedundantDebugValues removes 15032 instructions using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279
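The backward scan described above can be sketched in a few lines of Python (a simplified model, not the MIR API): walk a consecutive run of DBG_VALUEs from the end and keep only the last DBG_VALUE per variable.

```python
def remove_redundant(run):
    """Keep only the last DBG_VALUE for each variable in a consecutive run.

    `run` is a list of (register, variable) pairs, in program order.
    """
    seen = set()
    kept = []
    for reg, var in reversed(run):
        if var not in seen:  # last occurrence of this variable wins
            seen.add(var)
            kept.append((reg, var))
    return list(reversed(kept))

run = [("$edi", "var1"), ("$esi", "var2"), ("$edi", "var1")]
# (1) is removed; (2) and (3) survive, matching the example above.
assert remove_redundant(run) == [("$esi", "var2"), ("$edi", "var1")]
```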
-
Simon Pilgrim authored
[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':
```
define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}
```
This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:
```
define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}
```
Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests Differential Revision: https://reviews.llvm.org/D105901
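Modelling a gep as plain pointer arithmetic (a Python sketch, not the IR semantics in full), the fallthrough fold preserves the result because selecting the base address is the same as selecting index 0:

```python
ELT_SIZE = 16  # <4 x i32> occupies 16 bytes

def gep(ptr, idx):
    return ptr + idx * ELT_SIZE

def before(ptr, idx, cond):   # select C, Ptr, (gep Ptr, Idx)
    return ptr if cond else gep(ptr, idx)

def after(ptr, idx, cond):    # gep Ptr, (select C, 0, Idx)
    return gep(ptr, 0 if cond else idx)

for cond in (True, False):
    assert before(0x1000, 3, cond) == after(0x1000, 3, cond)
```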
-
Chuanqi Xu authored
-
Bruce Mitchener authored
Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D103744
-
Simon Pilgrim authored
[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantics of the CVTTPS2SI instruction.
We know that CVTTPS2SI returns 0x80000000 for out-of-range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend, using the following logic:
```
small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))
```
Even on targets where PBLENDVPS/PBLENDVB exist, it is often a latency-2, low-throughput instruction, so this logic is applied there too (in particular for AVX2 as well). It furthermore gets rid of one high-latency floating point comparison in the previous lowering. @TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded). Original patch by @TomHender (Tom Hender) Differential Revision: https://reviews.llvm.org/D89697
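The lowering above can be checked with a small scalar model in Python (a sketch of the logic, not the vector lowering itself), using the fact that out-of-range conversions produce 0x80000000 and that the arithmetic right shift of `small` by 31 is an all-ones mask exactly when the signed conversion overflowed:

```python
def cvttps2si(x):
    # Signed i32 truncating conversion; out-of-range or NaN inputs
    # produce the "integer indefinite" value 0x80000000 (as i32: -2^31).
    if x != x or not (-2**31 <= x < 2**31):
        return -2**31
    return int(x)

def fp_to_uint32(x):
    small = cvttps2si(x)
    big = cvttps2si(x - 2**31)
    mask = -1 if small < 0 else 0  # ARITHMETIC_RIGHT_SHIFT(small, 31)
    return (small | (big & mask)) & 0xFFFFFFFF

for v in (0.0, 1.5, 2**31 - 1, 2**31 + 5, 4_000_000_000.0):
    assert fp_to_uint32(v) == int(v) & 0xFFFFFFFF
```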
-
LLVM GN Syncbot authored
-
Simon Pilgrim authored
Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)" Missed some BPF test changes that need addressing
-