Commits · 5fe10263ab39be96e316f37272b85a72596a7928 · Lorenzo Albano / LLVM bpEVL

Nov 30, 2020

[llvm][inliner] Reuse the inliner pass to implement 'always inliner' · 5fe10263

Mircea Trofin authored 4 years ago

Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before the full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: fewer call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567

5fe10263

[DL] Inline getAlignmentInfo() implementation (NFC) · b5f23189

Nikita Popov authored 4 years ago

Apart from getting the entry in the table (which is already a
separate function), the remaining logic is different for all
alignment types and is better combined with getAlignment().

This is a minor efficiency improvement, and should make further
improvements like using separate storage for different alignment
types simpler.

b5f23189

Creating a named struct requires only a Context and a name, but looking up a... · fe431683

Nick Lewycky authored 4 years ago

Creating a named struct requires only a Context and a name, but looking up a struct by name requires a Module. The method on Module merely accesses the LLVMContextImpl and no data from the module itself, so this patch moves getTypeByName to a static method on StructType that takes a Context and a name.

There's a small number of users of this function; they are all updated.

This updates the C API adding a new method LLVMGetTypeByName2 that takes a context and a name.

Differential Revision: https://reviews.llvm.org/D78793

fe431683

[ms] [llvm-ml] Implement the statement expansion operator · abef659a

Eric Astor authored 4 years ago

If prefaced with a %, expand text macros and macro functions in any statement.

Also, prevent expanding text macros in the message of an ECHO directive unless expanded explicitly by the statement expansion operator.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89740

abef659a

[x86] add tests for maxnum/minnum with nnan; NFC · 40dc535b
Sanjay Patel authored 4 years ago

40dc535b

[AArch64] Enable Cortex-A55 schedmodel · 630d37dc

Sjoerd Meijer authored 4 years ago

The model was committed in 4b8ade83
but not yet enabled to allow for a few fix ups. This adds a few
of these fixes, and also an LLVM MCA test to check most instructions.
While I do have plans to look into some more tuning, it's time to
enable this as it is better than using the A53 schedule.

Differential Revision: https://reviews.llvm.org/D88017

630d37dc

[CSSPGO] Disabling a pseudo probe test on non-x86 platforms. · 750049d7
Hongtao Yu authored 4 years ago
```
Disabling a pseudo probe test on non-x86 platforms since it's not fully tested there.
```
750049d7
[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option · a474657e
Paul Robinson authored 4 years ago
```
This option is not used for anything after #dc35368c (D91734).
```
a474657e

[X86] Zero-extend pointers to i64 for x86_64 · cdac34bd

Harald van Dijk authored 4 years ago

For LP64 mode, this has no effect as pointers are already 64 bits.
For ILP32 mode (x32), this extension is specified by the ABI.
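
As a rough illustration of why the choice of extension matters for x32, the sketch below models widening a 32-bit pointer value to 64 bits. The helper names are hypothetical and this is plain integer arithmetic, not LLVM code.

```python
MASK32 = 0xFFFFFFFF

def zext32_to_64(ptr32):
    """Zero-extend: the upper 32 bits of the 64-bit value are cleared."""
    return ptr32 & MASK32

def sext32_to_64(ptr32):
    """Sign-extend: bit 31 is copied into the upper 32 bits."""
    ptr32 &= MASK32
    if ptr32 & 0x80000000:
        return ptr32 | 0xFFFFFFFF00000000
    return ptr32

# An illustrative address with bit 31 set: the two extensions disagree.
high_ptr = 0x80001000
zero_extended = zext32_to_64(high_ptr)
sign_extended = sext32_to_64(high_ptr)
```

For addresses below 0x80000000 the two helpers agree, so the difference only shows up in the upper half of the 32-bit address space.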

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D91338

cdac34bd

[InstCombine][X86] Add basic addsub intrinsic SimplifyDemandedVectorElts support (PR46277) · e425d0b9

Simon Pilgrim authored 4 years ago

Pass through the demanded elts mask to the source operands.

The next step will be to add support for folding to add/sub if we only demand odd/even elements.
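
To make the planned follow-up concrete, here is a small model of the addsub lane semantics (even lanes subtract, odd lanes add). It is only a sketch with hypothetical helper names, showing why demanding only odd or only even elements would let addsub be treated as a plain add or sub.

```python
def addsub(a, b):
    """Model of x86 ADDSUB: even lanes compute a - b, odd lanes compute a + b."""
    return [x + y if i % 2 else x - y for i, (x, y) in enumerate(zip(a, b))]

def demanded(values, lanes):
    """Keep only the lanes a user actually reads."""
    return [values[i] for i in lanes]

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.25, 8.0, 16.0]

result = addsub(a, b)
plain_add = [x + y for x, y in zip(a, b)]
plain_sub = [x - y for x, y in zip(a, b)]

# With only odd lanes demanded, addsub is indistinguishable from add;
# with only even lanes demanded, it is indistinguishable from sub.
odd_matches_add = demanded(result, [1, 3]) == demanded(plain_add, [1, 3])
even_matches_sub = demanded(result, [0, 2]) == demanded(plain_sub, [0, 2])
```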

e425d0b9

[gn build] Port 64fa8cce · a4064cbf
LLVM GN Syncbot authored 4 years ago

a4064cbf

[CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation. · c083fede

Hongtao Yu authored 4 years ago

This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86502

c083fede

[CSSPGO] Pseudo probe instrumentation pass · 64fa8cce

Hongtao Yu authored 4 years ago

This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```
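
The numbering in the example above can be sketched as follows. This is a toy model, not the pass itself: the function name is hypothetical, and assigning IDs sequentially in block order starting at 1 is an assumption read off the example output.

```python
def probe_calls(guid, block_names):
    """Return one llvm.pseudoprobe call string per block.

    Assumption: probe IDs are handed out sequentially, starting at 1,
    in the order the blocks are listed.
    """
    return {
        name: f"call void @llvm.pseudoprobe(i64 {guid}, i64 {probe_id})"
        for probe_id, name in enumerate(block_names, start=1)
    }

# GUID and block names are copied from the example IR above.
calls = probe_calls(837061429793323041, ["bb0", "bb1", "bb2", "bb3"])
```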

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86499

64fa8cce

[PowerPC] Delete remnant Darwin code in PPCAsmParser · 7c4555f6

Fangrui Song authored 4 years ago

Continue the work started at D50989.
The code has been long dead since the triple has been removed (D75494).

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D91836

7c4555f6

[InstCombine][X86] Add addsub PR46277 test case · 8ca484b9
Simon Pilgrim authored 4 years ago
```
Also fix a copy+paste typo in the elts_addsub_v4f32 demanded elts test from the godbolt reference
```
8ca484b9
[VE][NFC] Update comments · 3d872cbc
Kazushi (Jam) Marukawa authored 4 years ago
```
Update comments. I forgot to update them previously when I modified the code.
```
3d872cbc

[SelectionDAGBuilder] Update signature of `getRegsAndSizes()`. · f6150aa4

Francesco Petrogalli authored 4 years ago

The mapping between registers and relative size has been updated to
use TypeSize to account for the size of scalable EVTs.

The patch is a NFCI, if not for the fact that with this change the
function `getUnderlyingArgRegs` does not raise a warning for implicit
conversion of `TypeSize` to `unsigned` when generating machine code
from the test added to the patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92096

f6150aa4

[VE] Optimize prologue/epilogue instructions about GOT · 6834b3d6

Kazushi (Jam) Marukawa authored 4 years ago

Optimize prologue/epilogue instructions by eliminating FP if a given
function uses the GOT but does not call other functions.  Previously, we
had wrong implementations taken from other architectures.  Update
regression tests as well.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92313

6834b3d6

[VE] Clean check routines of branch types · 6fe61053

Kazushi (Jam) Marukawa authored 4 years ago

Previously, these check routines accepted non-generatable instructions.
This patch cleans them up and adds asserts for those non-generatable
instructions.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92254

6fe61053

[RISCV] Combine (GORCI (GORCI x, C2), C1) -> (GORCI x, C1|C2). · bfc4f29f

Craig Topper authored 4 years ago

Unlike GREVI, GORCI stages can't be undone, but they are
redundant if done more than once.
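
The identity behind this combine can be checked with a small software model. The sketch below assumes the generalized OR-combine definition from the draft bitmanip specification for 32-bit values; `gorc32` is a hypothetical helper, not an LLVM function.

```python
MASK32 = 0xFFFFFFFF

# (shift amount, mask selecting the lower half of each pair) for each stage.
STAGES = [
    (1, 0x55555555),
    (2, 0x33333333),
    (4, 0x0F0F0F0F),
    (8, 0x00FF00FF),
    (16, 0x0000FFFF),
]

def gorc32(x, shamt):
    """Assumed GORC model: each enabled stage ORs every group with its swapped neighbour."""
    x &= MASK32
    for shift, low in STAGES:
        if shamt & shift:
            high = ~low & MASK32
            x |= ((x & low) << shift) | ((x & high) >> shift)
    return x

# Applying two GORC operations is the same as one with the OR of the controls.
x = 0x12345678
nested = gorc32(gorc32(x, 0b00101), 0b01100)
combined = gorc32(x, 0b00101 | 0b01100)
```

Because every stage only ever sets bits, repeating a stage changes nothing, which is why the control values combine with OR here.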

Differential Revision: https://reviews.llvm.org/D92295

bfc4f29f

[IR][LoopRotate] remove assertion that phi must have at least one operand · 9eb2c011

Sanjay Patel authored 4 years ago

This was suggested in D92247 - I initially committed an alternate
fix ( bfd2c216 ) to avoid the crash/assert shown in
https://llvm.org/PR48296 ,
but that was reverted because it caused msan failures on other
tests. We can try to revive that patch using the test included
here, but I do not have an immediate plan to isolate that problem.

9eb2c011

[RISCV] Custom legalize bswap/bitreverse to GREVI with Zbp extension to enable... · 76d1026b

Craig Topper authored 4 years ago

[RISCV] Custom legalize bswap/bitreverse to GREVI with Zbp extension to enable them to combine with other GREVI instructions

This enables bswap/bitreverse to combine with other GREVI patterns or each other without needing to add more special cases to the DAG combine or new DAG combines.

I've also enabled the existing GREVI combine for GREVIW so that it can pick up the i32 bswap/bitreverse on RV64 after they've been type legalized to GREVIW.
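
A small software model shows why expressing bswap/bitreverse as GREVI lets them combine with each other. It is a sketch assuming the generalized-reverse definition from the draft bitmanip specification for 32-bit values, where byte swap corresponds to control 24 and bit reverse to control 31; all helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# (shift amount, mask selecting the lower half of each pair) for each stage.
STAGES = [
    (1, 0x55555555),
    (2, 0x33333333),
    (4, 0x0F0F0F0F),
    (8, 0x00FF00FF),
    (16, 0x0000FFFF),
]

def grev32(x, shamt):
    """Assumed GREV model: each enabled stage swaps adjacent groups of bits."""
    x &= MASK32
    for shift, low in STAGES:
        if shamt & shift:
            high = ~low & MASK32
            x = ((x & low) << shift) | ((x & high) >> shift)
    return x

def bswap32(x):
    """Reference byte swap."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def bitreverse32(x):
    """Reference bit reversal."""
    return int(f"{x & MASK32:032b}"[::-1], 2)

x = 0x12345678
# Two GREV operations compose by XOR-ing their controls, so
# bitreverse(bswap(x)) collapses to a single GREV with control 24 ^ 31 == 7.
chained = bitreverse32(bswap32(x))
single = grev32(x, 24 ^ 31)
```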

Differential Revision: https://reviews.llvm.org/D92253

76d1026b

[X86] Don't emit R_X86_64_[REX_]GOTPCRELX for a GOT load with an offset · 25c8fbb3

Fangrui Song authored 4 years ago

clang may produce `movl x@GOTPCREL+4(%rip), %eax` when loading the high
32 bits of the address of a global variable in -fpic/-fpie mode.

If assembled by GNU as, the fixup emits R_X86_64_GOTPCRELX with an addend != -4.
The instruction loads from the GOT entry with an offset and thus it is incorrect
to relax the instruction.

This patch does not emit a relaxable relocation for a GOT load with an offset
because R_X86_64_[REX_]GOTPCRELX do not make sense for instructions which cannot
be relaxed. The result is good enough for LLD to work. GNU ld relaxes
mov+GOTPCREL as well, but it suppresses the relaxation if addend != -4.
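
The decision described above can be sketched as a small function. This is an illustrative model only: the function name is hypothetical, and falling back to plain `R_X86_64_GOTPCREL` for the non-relaxable case is an assumption, since the message only says that no relaxable relocation is emitted.

```python
def got_load_relocation(offset, has_rex_prefix):
    """Pick a relocation for `mov sym@GOTPCREL+offset(%rip), reg` (toy model).

    Returns (relocation name, addend).
    """
    # The 4-byte displacement is the last field of the instruction, so a
    # load with no extra offset has addend -4.
    addend = offset - 4
    if addend != -4:
        # Assumption: the non-relaxable fallback is R_X86_64_GOTPCREL.
        return "R_X86_64_GOTPCREL", addend
    if has_rex_prefix:
        return "R_X86_64_REX_GOTPCRELX", addend
    return "R_X86_64_GOTPCRELX", addend

# `movl x@GOTPCREL+4(%rip), %eax`: offset 4, no REX prefix.
with_offset = got_load_relocation(4, has_rex_prefix=False)
# `movq x@GOTPCREL(%rip), %rax`: no offset, REX.W prefix.
plain_load = got_load_relocation(0, has_rex_prefix=True)
```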

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D92114

25c8fbb3

[RISCV] Only combine (or (GREVI x, shamt), x) -> GORCI if shamt is a power of 2. · cbbd7021

Craig Topper authored 4 years ago

GORCI performs an OR between each stage. So we need to ensure only
one stage is active before doing this combine.

Initial attempts at finding a test case for this failed due to
the order things get combined. It's most likely that we'll form
one stage of GREVI then combine to GORCI before the two stages of
GREVI are able to be formed and combined with each other to form
a multi stage GREVI.
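
The power-of-2 restriction can be seen in a small software model. The sketch assumes the GREV/GORC definitions from the draft bitmanip specification for 32-bit values; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# (shift amount, mask selecting the lower half of each pair) for each stage.
STAGES = [
    (1, 0x55555555),
    (2, 0x33333333),
    (4, 0x0F0F0F0F),
    (8, 0x00FF00FF),
    (16, 0x0000FFFF),
]

def _run(x, shamt, keep_original):
    """Shared stage loop: GORC keeps the original bits, GREV does not."""
    x &= MASK32
    for shift, low in STAGES:
        if shamt & shift:
            high = ~low & MASK32
            swapped = ((x & low) << shift) | ((x & high) >> shift)
            x = (x | swapped) if keep_original else swapped
    return x

def grev32(x, shamt):
    return _run(x, shamt, keep_original=False)

def gorc32(x, shamt):
    return _run(x, shamt, keep_original=True)

x = 1
# One active stage: (or (grev x, 2), x) equals (gorc x, 2).
single_stage_ok = (grev32(x, 2) | x) == gorc32(x, 2)
# Two active stages: the OR happens only once at the end instead of after
# each stage, so the results differ (0b1001 versus 0b1111 for x == 1).
multi_stage_or = grev32(x, 3) | x
multi_stage_gorc = gorc32(x, 3)
```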

Differential Revision: https://reviews.llvm.org/D92289

cbbd7021

[ConstraintElimination] Add additional GEP decomposition tests. · 4db1de3a
Florian Hahn authored 4 years ago

4db1de3a
[X86] Add vbmi2 test coverage for vector rotations · 8fcc8c31
Simon Pilgrim authored 4 years ago
```
We should be using the funnel shift instructions for vXi16 types.
```
8fcc8c31
[IR] improve code comment/logic in removePredecessor(); NFC · 1dc38f8c
Sanjay Patel authored 4 years ago
```
This was suggested in the post-commit review of ce134da4.
```
1dc38f8c

Revert "[IR][LoopRotate] avoid leaving phi with no operands (PR48296)" · 355aee3d

Sanjay Patel authored 4 years ago

This reverts commit bfd2c216.
This appears to be causing stage2 msan failures on buildbots:
  FAIL: LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll (65872 of 71835)
  ******************** TEST 'LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll' FAILED ********************
  Script:
  --
  : 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll -simplifycfg -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll
  --
  Exit Code: 2
  Command Output (stderr):
  --
  ==87374==WARNING: MemorySanitizer: use-of-uninitialized-value
      #0 0x9de47b6 in getBasicBlockIndex /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/Instructions.h:2749:5
      #1 0x9de47b6 in simplifyCommonResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4112:23
      #2 0x9de47b6 in simplifyResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4039:12
      #3 0x9de47b6 in (anonymous namespace)::SimplifyCFGOpt::simplifyOnce(llvm::BasicBlock*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6330:16
      #4 0x9dcca13 in run /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6358:16
      #5 0x9dcca13 in llvm::simplifyCFG(llvm::BasicBlock*, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&, llvm::SmallPtrSetImpl<llvm::BasicBlock*>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6369:8
      #6 0x974643d in iterativelySimplifyCFG(

355aee3d

[IR][LoopRotate] avoid leaving phi with no operands (PR48296) · bfd2c216

Sanjay Patel authored 4 years ago

https://llvm.org/PR48296 shows an example where we delete all of the operands
of a phi without actually deleting the phi, and that is currently considered
invalid IR. The reduced test included here would crash for that reason.

A suggested follow-up is to loosen the assert to allow 0-operand phis
in unreachable blocks.

Differential Revision: https://reviews.llvm.org/D92247

bfd2c216

Add 'asserts' requirement to test/CodeGen/ARM/cortex-a57-misched-mla.mir · 234a5297
Dmitri Gribenko authored 4 years ago
```
'-debug-only=machine-scheduler' only works when asserts are enabled.
```
234a5297
[LangRef] missing link, minor fix · 8e504615
Juneyoung Lee authored 4 years ago

8e504615

[ConstantFold] Don't fold and/or i1 poison to poison (NFC) · 9c49dcc3

Juneyoung Lee authored 4 years ago

.. because it causes miscompilation when combined with select i1 -> and/or.

It is the select fold which is incorrect; but it is costly to disable the fold, so hack this one.
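
The interaction can be modelled with a toy evaluator. This is only a sketch of the poison rules as commonly described for LLVM IR, with hypothetical helper names: `select` returns the chosen arm, while a poison-propagating `and` yields poison if either operand is poison.

```python
POISON = object()  # sentinel standing in for an LLVM poison value

def select(cond, true_val, false_val):
    """select only looks at the arm that the condition picks."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val

def and_i1(a, b):
    """Poison-propagating 'and' on i1 (the behaviour the fold would compute)."""
    if a is POISON or b is POISON:
        return POISON
    return a and b

# select i1 false, i1 poison, i1 false is a well-defined false ...
as_select = select(False, POISON, False)
# ... but after rewriting it to `and i1 false, poison`, folding to poison
# turns a defined value into poison.
as_and = and_i1(False, POISON)
```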

D92270

9c49dcc3

[llvm-objdump] Require x86 target for mcpu/attr test · c3d48467

David Spickett authored 4 years ago

This fixes test failure on clang-cmake-armv7-quick bot
with change c2ead57c.

This bot only builds Arm/AArch64 targets.

c3d48467

[InstCombine][X86] Add addsub tests showing failure to simplify demandedelts (PR46277) · 9c2b2952
Simon Pilgrim authored 4 years ago

9c2b2952
Try harder to get rid of cortex-a57-misched-mla.s · 25d54abc
Hans Wennborg authored 4 years ago

25d54abc

[VE] Optimize prologue/epilogue instructions · 686988a5

Kazushi (Jam) Marukawa authored 4 years ago

Optimize the FP elimination mechanism.  This time, optimize functions that
make no calls but have fixed stack objects.  LLVM now eliminates FP in such
functions.  Also, optimize the GOT/PLT register save/restore instructions
if a given function doesn't use them.  In addition, remove the mechanism
that generates `.cfi` instructions, since it was taken from other
architectures and has not been inspected yet.  Update regression tests as well.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92251

686988a5

Try to fix bots after 112b3cb6 by removing cortex-a57-misched-mla.s · 273641fe
Hans Wennborg authored 4 years ago

273641fe

[VE] Change the behaviour of truncate · 44a679ea

Kazushi (Jam) Marukawa authored 4 years ago

Change the way i64 is truncated to i32 in I64 registers.  VE assumed
sext values previously.  Change it to zext values this time to
match the LLVM behaviour.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92226

44a679ea

[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). · fe83adb0

Florian Hahn authored 4 years ago

VPPredInstPHIRecipe is one of the recipes that was missed during the
initial conversion. This patch adjusts the recipe to also manage its
operand using VPUser.

fe83adb0

[VE] Specify vector alignments · 33eac0f2

Kazushi (Jam) Marukawa authored 4 years ago

Specify alignments for all vector types.  Update a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92256

33eac0f2