- Nov 22, 2024
-
Jenkins CI authored
-
- Nov 21, 2024
-
-
Roger Ferrer authored
-
Paul Walker authored
This brings the printing of scalable-vector constant splats in line with their fixed-length counterparts.
-
kadir çetinkaya authored
Starting with 41e3919d, DiagnosticsEngine creation might perform IO. It was implicitly defaulting to getRealFileSystem. This patch makes the choice explicit by pushing the decision to callers: they use the ambient VFS if one is available, and fall back to `getRealFileSystem` when no VFS is available.
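The caller-side pattern can be sketched with plain standard C++ (hypothetical `FileSystem` type and function names, not the actual clang/LLVM API):

```cpp
#include <memory>
#include <string>

// Hypothetical sketch of the pattern: the callee no longer reaches for a
// global default implicitly; the caller passes a filesystem, and only the
// caller decides to fall back to the real one when no ambient VFS exists.
struct FileSystem { std::string name; };

std::shared_ptr<FileSystem> getRealFileSystem() {
    static auto real = std::make_shared<FileSystem>(FileSystem{"real"});
    return real;
}

// Caller-side decision: use the ambient VFS if present, else the real FS.
std::shared_ptr<FileSystem> chooseVFS(std::shared_ptr<FileSystem> ambient) {
    return ambient ? ambient : getRealFileSystem();
}
```

The point of the change is that this ternary now lives at each call site, where an ambient VFS may be known, rather than deep inside engine construction.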
-
Benjamin Maxwell authored
Previously, with `-fzero-call-used-regs`, clang/LLVM would incorrectly emit Neon instructions in streaming functions and in streaming-compatible functions without SVE. With this change: * in streaming functions, Z/p registers will be zeroed; * in streaming-compatible functions without SVE, D registers will be zeroed (as Neon vector instructions, including `movi v..`, are illegal there).
-
Mikhail Goncharov authored
-
Simon Pilgrim authored
Fixes #116977
-
Simon Pilgrim authored
-
Oleksandr T. authored
Fixes #63009.
-
CarolineConcatto authored
…x8 and MFloat8x16 This patch adds MFloat8 as a TypeFlag and Kind on Neon to generate the typedefs using emitNeonTypeDefs. No Clang changes are needed, because Sema and CodeGen use the builtins defined in AArch64SVEACLETypes.def.
-
Florian Hahn authored
The improvements in 63917e19 / #70796 do not check for memory barriers/unmodelled side effects, which means we may incorrectly hoist loads across memory barriers. Fix this by checking whether any machine instruction in the loop is a load-fold barrier. PR: https://github.com/llvm/llvm-project/pull/116987
-
Oliver Stannard authored
This was implementing the bf16->float conversion function using a left-shift of a signed integer, so for negative floating-point values a 1 was being shifted into the sign bit of the signed integer intermediate value. This is undefined behaviour, and was caught by UBSan. The vector versions are code-generated via Neon builtin functions, so probably don't have the same UB problem, but I've updated them anyway for consistency. Fixes #61983.
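A scalar sketch of the well-defined form of this conversion (an illustration of the fix, not the actual arm_neon.h code):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative bf16 -> float widening: place the 16 bf16 bits in the top
// half of an *unsigned* 32-bit value. Left-shifting a signed int so that a
// 1 lands in the sign bit is UB; the unsigned shift is well defined for
// negative values too.
float bf16_to_float(uint16_t bf16_bits) {
    uint32_t f32_bits = static_cast<uint32_t>(bf16_bits) << 16;
    float result;
    std::memcpy(&result, &f32_bits, sizeof(result)); // bit-cast without aliasing UB
    return result;
}
```

For example, the bf16 bit pattern `0xBF80` (-1.0) widens to the float bit pattern `0xBF800000`, which is exactly where the signed shift previously invoked UB.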
-
Roger Ferrer authored
This goes against common wisdom: live ranges are not to be extended. However, cross-bank instructions (such as a vector-splat) may benefit from it, especially on systems where spilling one register class is more expensive than the other (in EPI, spill code for vectors is much more expensive than spill code for the scalar plus a vector-splat).

This change leverages the fact that in RISC-V register allocation happens in two steps: first vectors and then scalars (with insert-vsetvli in between), so some of the rematerializations for vectors are still available.

In the case of vector-splats, when they cannot be folded into their user instructions (like in the testcase) they are typically hoisted out of the loop. This is efficient but can cause spill code (like in the testcase), so we end up with a loop that reloads, on each iteration, a vector that is basically a splat. Computing a splat is much faster than loading a vector, and also faster than having to load the scalar before the splat (if the scalar got spilled too).

But rematerialization does not happen if the live ranges are not available at the rematerialization point, and this is what happens with hoisted vector-splats: the scalar used in the splat may not be live in the loop body. So in this case we extend the live range of the scalar so it is live up to the rematerialization point. We do that only in a very restricted set of cases and have observed little disturbance to other RISC-V testcases. This is gated by a TargetInstrInfo hook which is only set to true in EPI.
-
Roger Ferrer authored
We forgot to update these tests in the last merge. Also add `nounwind` to tests that do not care about CFI directives.
-
Roger Ferrer authored
-
Jonathan Cohen authored
Currently the function walks the entire DAG to find other candidates with which to perform a post-inc store. This leads to very long compilation times on large functions. Added a MaxSteps limit to avoid this, aligned with how hasPredecessorHelper is used elsewhere in the code.
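The bounded-walk idea can be sketched generically (hypothetical names and a plain adjacency-list graph, not the actual SelectionDAG code): give the search a step budget and give up conservatively when it is exhausted.

```cpp
#include <cstddef>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: search the predecessors of `start` for `target`,
// visiting at most `MaxSteps` nodes. Returning false when the budget runs
// out is the conservative answer: the caller simply skips the optimization
// instead of walking an arbitrarily large graph.
bool hasPredecessorWithin(const std::vector<std::vector<int>>& preds,
                          int start, int target, std::size_t MaxSteps) {
    std::queue<int> worklist;
    std::unordered_set<int> visited;
    worklist.push(start);
    std::size_t steps = 0;
    while (!worklist.empty()) {
        if (++steps > MaxSteps)
            return false; // budget exhausted: give up conservatively
        int node = worklist.front();
        worklist.pop();
        if (!visited.insert(node).second)
            continue; // already explored
        for (int p : preds[node]) {
            if (p == target)
                return true;
            worklist.push(p);
        }
    }
    return false;
}
```

The trade-off is deliberate: a missed post-inc merge costs a little performance, while an unbounded walk costs compile time quadratically on large functions.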
-
abhishek-kaushik22 authored
Closes #116767
-
Ami-zhang authored
Introduce a check for `__loongarch_frecipe` macro around the FP approximation intrinsic implementation. This ensures that these intrinsics are only included when this macro is defined, providing better flexibility and control over the usage of FP approximation instructions.
-
Andrzej Warzyński authored
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass. Covers the following Ops: * `vector.maskedload`, * `vector.maskedstore`, * `vector.gather`, * `vector.scatter`. In addition: * For consistency with other tests, renamed test functions (e.g. `@masked_load_op` -> `@masked_load`) * Made some test names more descriptive, e.g. `@gather_op_2d` -> `@gather_1d_from_2d`.
-
Yingwei Zheng authored
Counterexample: 219 is a multiple of 73. But `sext i8 219 to i16 = 65499` is not. Fixes https://github.com/llvm/llvm-project/issues/116483.
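The counterexample can be checked numerically (a plain C++ sketch of the arithmetic, not the InstCombine code):

```cpp
#include <cstdint>

// Sign-extend an i8 bit pattern to i16, mirroring LLVM's `sext i8 ... to i16`.
// The bit pattern 219 is -37 as a signed i8, so sext yields -37, i.e. 65499
// when the i16 result is read back as unsigned. Divisibility (219 = 3 * 73)
// is not preserved by sext; zero extension, which yields 219 again, would
// preserve it.
int16_t sext_i8_to_i16(uint8_t bits) {
    return static_cast<int16_t>(static_cast<int8_t>(bits));
}
```

This is why the multiple-of fact proven on the narrow type cannot simply be propagated through a `sext`.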
-
wanglei authored
This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Reviewed By: heiher, SixWeining Pull Request: https://github.com/llvm/llvm-project/pull/117099
-
Boaz Brickner authored
[clang] [NFC] Remove SourceLocation() parameter from Diag.Report() calls in SourceManager, and use the equivalent Report() overload instead (#116937)
-
Jay Foad authored
Fix coding errors found by inspection and check that the swz bit still serves to prevent merging of buffer loads/stores on GFX12+.
-
Feng Zou authored
For
```
mov name@GOTTPOFF(%rip), %reg
add name@GOTTPOFF(%rip), %reg
```
add `R_X86_64_CODE_4_GOTTPOFF` = 44 if the instruction starts at 4 bytes before the relocation offset. It's similar to R_X86_64_GOTTPOFF. The linker can treat `R_X86_64_CODE_4_GOTTPOFF` as `R_X86_64_GOTTPOFF`, or convert the instructions above to
```
mov $name@tpoff, %reg
add $name@tpoff, %reg
```
if the first byte of the instruction at the relocation `offset - 4` is `0xd5` (namely, encoded w/REX2 prefix), when possible. Binutils patch: https://github.com/bminor/binutils-gdb/commit/a533c8df598b5ef99c54a13e2b137c98b34b043c Binutils mailthread: https://sourceware.org/pipermail/binutils/2023-December/131463.html ABI discussion: https://groups.google.com/g/x86-64-abi/c/ACwD-UQXVDs/m/vrgTenKyFwAJ Blog: https://kanrobert.github.io/rfc/All-about-APX-relocation
-
abhishek-kaushik22 authored
-
Lee Wei authored
This PR removes tests with `br i1 undef` under `llvm/test/Transforms/Loop*` and `llvm/test/Transforms/Lower*`.
-
Mingming Liu authored
Currently, both [TypeIdMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1356) and [TypeIdCompatibleVtableMap](https://github.com/llvm/llvm-project/blob/67a1fdb014790a38a205d28e1748634de34471dd/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1363) keep type-id as `std::string` in the combined index for LTO indexing analysis. With this change, index uses a unique-string-saver to own the string copies and two maps above can use string references to save some memory. This shows a 3% memory reduction (from 8.2GiB to 7.9GiB) in an internal binary with high indexing memory usage.
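The ownership pattern can be sketched with standard containers (an illustration only; LLVM's actual `UniqueStringSaver` is backed by a `BumpPtrAllocator`): one owner stores each distinct type-id string once, and the maps then hold cheap `std::string_view` references into that storage instead of duplicate `std::string` copies.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>

// One set owns each distinct string exactly once. std::unordered_set is
// node-based, so references to stored elements remain valid across rehashes,
// which makes handing out views into the stored strings safe.
class StringSaver {
    std::unordered_set<std::string> storage_;
public:
    // Returns a view into the single owned copy, inserting it if new.
    std::string_view save(std::string_view s) {
        return *storage_.emplace(s).first;
    }
};
```

Two maps keyed on `std::string_view` can then share one saved copy of each type-id, which is where the reported ~3% memory saving comes from.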
-
Yingwei Zheng authored
See the following case:
```
define i1 @test_logical_and_icmp_samesign(i8 %x) {
  %cmp1 = icmp ne i8 %x, 9
  %cmp2 = icmp samesign ult i8 %x, 11
  %and = select i1 %cmp1, i1 %cmp2, i1 false
  ret i1 %and
}
```
Currently we cannot convert this logical and into a bitwise and due to the `samesign` flag. But if `%cmp2` evaluates to `poison`, we can infer that `%cmp1` is either `poison` or `true` (a `samesign` violation indicates that X is negative). Therefore, `%and` still evaluates to `poison`. This patch converts a logical and into a bitwise and iff TV being poison implies that Cond is either poison or true. Likewise, we convert a logical or into a bitwise or iff FV being poison implies that Cond is either poison or false. Notes: 1. This logic is implemented in InstCombine. Not sure whether it is profitable to move it into ValueTracking and call `impliesPoison(TV/FV, Sel)` instead. 2. We only handle the case where `ValAssumedPoison` is `icmp samesign pred X, C1` and `V` is `icmp pred X, C2`. There are no suitable variants of `isImpliedCondition` to pass the fact that X is [non-]negative. Alive2: https://alive2.llvm.org/ce/z/eorFfa Motivation: fix [a major regression](https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1724#discussion_r1849663863) to unblock https://github.com/llvm/llvm-project/pull/112742.
-
Lang Hames authored
This tests that a simple C++ static initializer works as expected. Compared to the architecture-specific, assembly-level regression tests for the ORC runtime, this test is expected to catch cases where the compiler adopts some new MachO feature that the ORC runtime does not yet support (e.g. a new initializer section).
-
Timm Baeder authored
We need to check the ToType first, then the FromType. Additionally, remove qualifiers from the parent type of the field we're emitting a note for.
-
Craig Topper authored
-
donald chen authored
Disabling memrefs with a stride of 0 was intended to prevent internal aliasing, but this does not address all cases: internal aliasing can still occur when the stride is less than the shape. On the other hand, a stride of 0 can be very useful in certain scenarios. For example, on architectures that support multi-dimensional DMA, we can use memref::copy with a stride of 0 to achieve a broadcast effect. This commit removes the restriction that strides in memrefs cannot be 0.
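The broadcast effect of a zero stride can be illustrated with a plain strided copy (a sketch of the idea, not MLIR's `memref::copy`):

```cpp
#include <cstddef>
#include <vector>

// Strided copy: element i of dst reads src[i * srcStride]. With
// srcStride == 0 every destination element reads src[0], so the copy
// degenerates into a broadcast of a single value -- exactly the effect
// a multi-dimensional DMA engine can exploit.
void stridedCopy(const std::vector<int>& src, std::size_t srcStride,
                 std::vector<int>& dst) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = src[i * srcStride];
}
```

With `srcStride == 1` this is an ordinary element-wise copy; with `srcStride == 0` it fills the destination with `src[0]`.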
-
Wu Yingcong authored
`NumCHRedBranches - 1` is used later; we should add an assertion to make sure it will not underflow.
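The hazard being guarded against is plain unsigned wraparound (a generic sketch with a hypothetical helper name, not the CHR code itself):

```cpp
#include <cassert>

// If NumCHRedBranches were 0, the unsigned expression NumCHRedBranches - 1
// would wrap around to a huge value instead of going negative. Asserting
// the precondition up front turns a silent wraparound into a loud failure
// in builds with assertions enabled.
unsigned lastBranchIndex(unsigned NumCHRedBranches) {
    assert(NumCHRedBranches >= 1 && "expected at least one CHRed branch");
    return NumCHRedBranches - 1;
}
```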
-
Diego Caballero authored
When inferring the mask of a transfer operation that results in a single `i1` element, we could represent it using either `vector<i1>` or `vector<1xi1>`. To avoid type mismatches, this PR updates the mask inference logic to consistently generate `vector<1xi1>` for these cases. We can enable 0-D masks if they are needed in the future. See: https://github.com/llvm/llvm-project/issues/116197
-
Sushant Gokhale authored
Current cost modelling does not take into account the cost of materializing a constant vector. As the test shows, this results in some cases being vectorized when it may not be profitable. A future patch will try to address this issue.
-
Piyou Chen authored
Reverts llvm/llvm-project#115991 Due to build fail https://lab.llvm.org/buildbot/#/builders/66/builds/6511
-