Commits · 1d6ebdfb66b9d63d34f34ec6ac7ec57eff7cd24b · Lorenzo Albano / LLVM bpEVL

Dec 03, 2020

[dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive · bd726d27

Jianzhou Zhao authored Dec 02, 2020

This is a child diff of D92261.

After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92459

bd726d27

[RISCV] Add f16 to isFMAFasterThanFMulAndFAdd now that the Zfh extension is supported · e52a91e1
Craig Topper authored Dec 02, 2020

e52a91e1

[PowerPC] Add the hw sqrt test for vector type v4f32/v2f64 · 9bf0fea3

QingShan Zhang authored Dec 03, 2020

The PowerPC ISA supports the input test for vector types v4f32 and v2f64.
Replacing the software compare with the hw test will improve performance.

Reviewed By: ChenZheng

Differential Revision: https://reviews.llvm.org/D90914

9bf0fea3

[SelectionDAG] Use is_contained (NFC) · 7a4af2a8
Kazu Hirata authored Dec 02, 2020

7a4af2a8

[RISCV] Initialize MergeBaseOffsetOptPass so it will work with print-before/after-all. · 8b403243

Craig Topper authored Dec 02, 2020

If it's not in the PassRegistry it's not recognized as
a pass when we print before/after. Happened to notice while
I was working on a new pass.

8b403243

[RISCV] Support Zfh half-precision floating-point extension. · f7bc7c29

Hsiangkai Wang authored Jul 03, 2020

Support "Zfh" extension according to
https://github.com/riscv/riscv-isa-manual/blob/zfh/src/zfh.tex

Differential Revision: https://reviews.llvm.org/D90738

f7bc7c29

Small improvements to Intrinsic::getName · 80b0f74c

Xun Li authored Dec 02, 2020

While I was adding a new intrinsic instruction (not overloaded), I accidentally used CreateUnaryIntrinsic to create the intrinsic, which turns out to pass the type list to getName, and ended up naming the intrinsic function with a type suffix, which led to weird bugs later on. It took me a long time to debug.
It seems a good idea to add an assertion in getName so that it fails if types are passed but it's not an overloaded function.
Also, the overloaded version of getName is less efficient because it creates an std::string. We should avoid calling it if we know that there are no types provided.

Differential Revision: https://reviews.llvm.org/D92523

80b0f74c
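
A minimal Python sketch of the check described in 80b0f74c, not LLVM's C++ code. `get_name` and its `overloaded` flag are hypothetical stand-ins for `Intrinsic::getName` and the intrinsic's overload property. The sketch only illustrates why asserting on a non-empty type list catches the mistake early, and why the no-types path can return the base name without building a suffixed string.

```python
def get_name(base_name, overloaded, types=()):
    """Model of an intrinsic name lookup (hypothetical helper, not LLVM's API)."""
    # Passing types for a non-overloaded intrinsic is a caller bug: fail fast
    # instead of silently producing a suffixed, wrong function name.
    assert overloaded or not types, (
        "type list passed for a non-overloaded intrinsic")
    if not types:
        # Fast path: nothing to append, return the base name unchanged.
        return base_name
    # Overloaded case: append one ".<type>" suffix per overloaded type.
    return base_name + "".join("." + t for t in types)
```

Calling `get_name("llvm.ctpop", True, ["i32"])` yields a suffixed name, while passing a type list for a non-overloaded name trips the assertion.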

[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. · bab72dd5

Mircea Trofin authored Nov 19, 2020

Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341

bab72dd5

Dec 02, 2020

[ConstraintElimination] Make sure arguments of std::pow match. · 2304528b

Florian Hahn authored Dec 02, 2020

This should fix a build failure on some systems, e.g. solaris11-sparcv9
http://lab.llvm.org:8014/#/builders/22

2304528b

[X86] Add TLS_(base_)addrX32 for X32 mode · c9be4ef1

Harald van Dijk authored Dec 02, 2020

LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and
TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit
TLS addresses in 64-bit mode, which were not yet handled. This adds
TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use
tls32(base)addr rather than tls64(base)addr, and then restricts
TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92346

c9be4ef1
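
The restriction described in c9be4ef1 amounts to a small decision table. The Python sketch below is an illustration only: `tls_addr_pseudo` and its parameters are hypothetical names, and the real selection happens through predicates in the X86 backend rather than a function like this.

```python
def tls_addr_pseudo(is_64bit_mode, is_lp64, base=False):
    """Pick the TLS pseudo-instruction name for a target mode (illustrative)."""
    stem = "TLS_base_addr" if base else "TLS_addr"
    if not is_64bit_mode:
        # 32-bit mode: 32-bit TLS addresses.
        return stem + "32"
    # 64-bit mode: LP64 uses 64-bit addresses, ILP32 (x32) uses 32-bit ones.
    return stem + ("64" if is_lp64 else "X32")
```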

Use PC-relative address for x32 TLS address · 18ce6123

H.J. Lu authored Jun 03, 2014

Since x32 supports PC-relative address, it shouldn't use EBX for TLS
address.  Instead of checking N.getValueType(), we should check
Subtarget->is32Bit().  This fixes PR 22676.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D16474

18ce6123

[CSSPGO] Pseudo probes for function calls. · 24d4291c

Hongtao Yu authored Dec 01, 2020

An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place, leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756

24d4291c
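
A rough Python sketch of the tagging idea in 24d4291c: the lowest 3 bits of the 32-bit discriminator mark it as a pseudo probe, and the remaining 29 bits carry the probe ID and type. The field positions and widths below (`ID_SHIFT`, `ID_BITS`, `TYPE_SHIFT`, `TYPE_BITS`) are assumptions chosen for illustration, and the function names are hypothetical. The commit message does not spell out the exact layout.

```python
TAG_MASK = 0x7                   # lowest 3 bits, all set for a pseudo probe
ID_SHIFT, ID_BITS = 3, 16        # hypothetical position/width of the probe ID
TYPE_SHIFT, TYPE_BITS = 19, 4    # hypothetical position/width of the probe type

def pack_probe(probe_id, probe_type):
    """Encode a callsite probe into a 32-bit discriminator value."""
    if not 0 <= probe_id < (1 << ID_BITS):
        raise ValueError("probe ID out of range")
    if not 0 <= probe_type < (1 << TYPE_BITS):
        raise ValueError("probe type out of range")
    return (probe_type << TYPE_SHIFT) | (probe_id << ID_SHIFT) | TAG_MASK

def is_probe_discriminator(disc):
    """True when the low 3 bits carry the pseudo probe tag."""
    return (disc & TAG_MASK) == TAG_MASK

def unpack_probe(disc):
    """Recover (probe_id, probe_type) from a tagged discriminator."""
    if not is_probe_discriminator(disc):
        raise ValueError("not a pseudo probe discriminator")
    probe_id = (disc >> ID_SHIFT) & ((1 << ID_BITS) - 1)
    probe_type = (disc >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    return probe_id, probe_type
```

The tag check is what lets code without access to the `-pseudo-probe-for-profiling` switch tell the two kinds of discriminator apart.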

[dfsan] Rename CachedCombinedShadow to be CachedShadow · dad5d958

Jianzhou Zhao authored Dec 02, 2020

At D92261, this type will be used to cache both combined shadow and
converted shadow values.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92458

dad5d958

[XCOFF][AIX] Alternative path in EHStreamer for platforms that do not have uleb128 support · 2c63e760

jasonliu authored Dec 02, 2020

Summary:
Not all system assemblers support the `.uleb128 label2 - label1` form.
When the target does not support this form, we have to fall back to
a manual calculation to get the offsets from the labels.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D92058

2c63e760
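
To illustrate what "manual calculation" means in 2c63e760, the Python sketch below computes a label difference itself and encodes it as ULEB128 bytes, instead of relying on the assembler to evaluate `.uleb128 label2 - label1`. It is not the EHStreamer code; `encode_label_diff` and the `label_offsets` mapping are hypothetical and assume the label offsets are already known.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F      # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)

def encode_label_diff(label_offsets, end_label, start_label):
    """Encode (end_label - start_label) given known label offsets."""
    return encode_uleb128(label_offsets[end_label] - label_offsets[start_label])
```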

[Inline] prevent inlining on stack protector mismatch · bc044a88

Nick Desaulniers authored Dec 02, 2020

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((no_stack_protector)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

SSP attributes can be ordered by strength. Weakest to strongest, they
are: ssp, sspstrong, sspreq.  Callees with differing SSP attributes may be
inlined into each other, and the strongest attribute will be applied to the
caller. (No change)

After this change:
* A callee with no SSP attributes will no longer be inlined into a
  caller with SSP attributes.
* The reverse is also true: a callee with an SSP attribute will not be
  inlined into a caller with no SSP attributes.
* The alwaysinline attribute overrides these rules.

Functions that get synthesized by the compiler may not get inlined as a
result if they are not created with the same stack protector function
attribute as their callers.

Alternative approach to https://reviews.llvm.org/D87956.

Fixes pr/47479.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: rnk, MaskRay

Differential Revision: https://reviews.llvm.org/D91816

bc044a88
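
The rules listed in bc044a88 can be modelled as a small compatibility check. This Python sketch covers only the stack protector rule and ignores every other inlining criterion. The function names are hypothetical, and checking `alwaysinline` on the callee is an assumption about where that attribute sits.

```python
# SSP attributes ordered from weakest to strongest, as in the commit message.
SSP_STRENGTH = {"ssp": 1, "sspstrong": 2, "sspreq": 3}

def strongest_ssp(attrs):
    """Return the strongest SSP attribute in attrs, or None if there is none."""
    present = [a for a in attrs if a in SSP_STRENGTH]
    return max(present, key=SSP_STRENGTH.get) if present else None

def may_inline(caller_attrs, callee_attrs):
    """Stack protector compatibility: both sides have SSP or neither does."""
    if "alwaysinline" in callee_attrs:
        return True  # alwaysinline overrides the mismatch rule
    caller_has = strongest_ssp(caller_attrs) is not None
    callee_has = strongest_ssp(callee_attrs) is not None
    return caller_has == callee_has

def caller_ssp_after_inline(caller_attrs, callee_attrs):
    """The caller ends up with the strongest SSP attribute of the pair."""
    return strongest_ssp(set(caller_attrs) | set(callee_attrs))
```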

[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX · a65d8c5d

jasonliu authored Dec 02, 2020

Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences are:
1. AIX does not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality routine, because the
   interoperability is not there yet on AIX.
3. AIX does not use eh_frame sections. Instead, it uses an eh_info
   section (compact unwind section) to store the information about
   the personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455

a65d8c5d

[JumpThreading][VectorUtils] avoid infinite loop on unreachable IR · 9d6d24c2

Sanjay Patel authored Dec 02, 2020

https://llvm.org/PR48362

It's possible that we could stub this out sooner somewhere
within JumpThreading, but I'm not sure how to do that, and
then we would still have potential danger in other callers.

I can't find a way to trigger this using 'instsimplify',
however, because that already has a bailout on unreachable
blocks.

9d6d24c2

[X86] EltsFromConsecutiveLoads - remove old FIXME comment. NFC. · f0193623

Simon Pilgrim authored Dec 02, 2020

It's unlikely an undef element in a zero vector will be any use.

f0193623

[X86] combineX86ShufflesRecursively - remove old FIXME comment. NFC. · 3900ec6f

Simon Pilgrim authored Dec 02, 2020

It's unlikely an undef element in a zero vector will be any use, and SimplifyDemandedVectorElts now calls combineX86ShufflesRecursively so it's unlikely we actually have a dependency on these specific elements.

3900ec6f

[X86] EltsFromConsecutiveLoads - pull out repeated NumLoadedElts. NFCI. · 0dab7ecc
Simon Pilgrim authored Dec 02, 2020

0dab7ecc

Remove `-Wunused-result` and `-Wpedantic` warnings from GCC. NFC. · 21d74172
Michael Liao authored Dec 02, 2020

21d74172

[LV] Epilogue Vectorization with Optimal Control Flow (Recommit) · a7e2c269

Bardia Mahjour authored Dec 02, 2020

This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.

Similar to D88819, this patch achieves epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D89566

a7e2c269
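
As a back-of-the-envelope model of a7e2c269, the Python sketch below splits a trip count between a main vector loop, a vectorized epilogue with a smaller VF, and a scalar remainder. `split_trip_count` is a hypothetical helper; it ignores interleaving and the pass's actual minimum-iteration checks, and only shows how the iterations are shared out.

```python
def split_trip_count(trip_count, main_vf, epilogue_vf):
    """Return (main vector iterations, epilogue vector iterations, scalar iterations)."""
    main_iters = trip_count // main_vf
    remaining = trip_count - main_iters * main_vf
    # The epilogue loop handles what the main loop left over, at a smaller VF.
    epilogue_iters = remaining // epilogue_vf
    # Whatever is still left runs in scalar form.
    scalar_iters = remaining - epilogue_iters * epilogue_vf
    return main_iters, epilogue_iters, scalar_iters
```

With a trip count smaller than both VFs, both vector counts are zero, which is the case where control flow should skip straight to the scalar loop.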

[SLP] use 'match' for binop/select; NFC · 56fd29e9

Sanjay Patel authored Dec 02, 2020

This might be a small improvement in readability, but the
real motivation is to make it easier to adapt the code to
deal with intrinsics like 'maxnum' and/or integer min/max.

There is potentially help in doing that with D92086, but
we might also just add specialized wrappers here to deal
with the expected patterns.

56fd29e9

[OpenMPIRBuilder] forward arguments as pointers to outlined function · 240dd924

Alex Zinenko authored Nov 26, 2020

OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on the stack and passes the pointer to the outlined
function instead. The outlined function then loads the value back and uses it as
normal.

Reviewed By: jdoerfert, llitchev

Differential Revision: https://reviews.llvm.org/D92189

240dd924

[ThinLTO] Import symver directives for imported symbols (PR48214) · 437c4653

Hans Wennborg authored Nov 30, 2020

When importing symbols from another module, also import any
corresponding symver directives.

Differential revision: https://reviews.llvm.org/D92335

437c4653

Simplify append to module inline asm string in IRLinker::run() · 45d8a784

Hans Wennborg authored Dec 02, 2020

This also removes the empty extra "module asm" that would be created,
and updates the test to reflect that while making it more explicit.

Broken out from https://reviews.llvm.org/D92335

45d8a784

[VE] Add vand, vor, and vxor intrinsic instructions · dd0159bd

Kazushi (Jam) Marukawa authored Dec 02, 2020

Add vand, vor, and vxor intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92454

dd0159bd

[SystemZ] Adding extra extended mnemonics for SystemZ target · f03c21df

Anirudh Prasad authored Dec 02, 2020

This patch consists of the addition of some common additional
extended mnemonics to the SystemZ target.

- These are jnop, jct, jctg, jas, jasl, jxh, jxhg, jxle,
  jxleg, bru, brul, br*, br*l.
- These mnemonics and the instructions they map to are
  defined here, Chapter 4 - Branching with extended
  mnemonic codes.
- Except for jnop (which is a variant of brc 0, label), every
  other mnemonic is marked as a MnemonicAlias since there is
  already a "defined" instruction with the same encoding
  and/or condition mask values.
- brc 0, label doesn't have a defined extended mnemonic, thus
  jnop is defined as an InstAlias. Furthermore, the
  applyMnemonicAliases function is called in the overridden
  parseInstruction function in SystemZAsmParser.cpp to ensure
  any mnemonic aliases are applied before any further
  processing on the instruction is done.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D92185

f03c21df

[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute · 71bd59f0

David Sherwood authored Oct 07, 2020

In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:

  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
  ...
  !2 = !{!2, !3, !4}
  !3 = !{!"llvm.loop.vectorize.width", i32 8}
  !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.

Differential Revision: https://reviews.llvm.org/D88962

71bd59f0

[llvm-readobj, libSupport] - Refine the implementation of the code that dumps build attributes. · 137a25f0

Georgii Rymar authored Nov 30, 2020

This implementation of `ELFDumper<ELFT>::printAttributes()` in llvm-readobj has issues:
1) It crashes when the content of the attribute section is empty.
2) It uses `unwrapOrError` and `reportWarning` calls, though
   ideally we want to use `reportUniqueWarning`.
3) It contains a TODO about redundant format version check.

`lib/Support/ELFAttributeParser.cpp` uses a hardcoded constant instead of the named constant.

This patch fixes all these issues.

Differential revision: https://reviews.llvm.org/D92318

137a25f0

[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 · d28624a2

Jay Foad authored Dec 01, 2020

This doesn't seem to be needed for anything.

Differential Revision: https://reviews.llvm.org/D92400

d28624a2

[PowerPC] Fix FLT_ROUNDS_ on little endian · ffa2dce5

Qiu Chaofan authored Dec 02, 2020

In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.

For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D91845

ffa2dce5
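
The +4 adjustment in ffa2dce5 follows from where the low 32-bit word of a stored 64-bit value sits in memory. The Python sketch below models that with byte strings. `low_word_offset` and `load_low_word` are hypothetical helpers, not the PowerPC lowering code.

```python
def low_word_offset(byteorder):
    """Byte offset of the low 32-bit word inside a stored 64-bit value."""
    # Big-endian stores the most significant bytes first, so the low word
    # starts 4 bytes in. Little-endian stores it at offset 0.
    return 4 if byteorder == "big" else 0

def load_low_word(value, byteorder):
    """Store a 64-bit value, then load back only its low 32-bit word."""
    raw = value.to_bytes(8, byteorder)
    off = low_word_offset(byteorder)
    return int.from_bytes(raw[off:off + 4], byteorder)
```

Applying the big-endian offset on a little-endian target would read the high word instead, which is the kind of bug the commit describes.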

[PowerPC] Promote the i1 to i64 for SINT_TO_FP/FP_TO_SINT · 47f784ac

QingShan Zhang authored Dec 02, 2020

i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.

Reviewed By: Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D92067

47f784ac
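
A small Python model of the signed int-to-FP half of 47f784ac, assuming the promotion from i1 to i64 is a sign extension. The helper names are hypothetical. Under that assumption an i1 `true` becomes the i64 value -1, so the signed conversion yields -1.0 rather than 1.0.

```python
def sext_i1_to_i64(bit):
    """Sign-extend a 1-bit value: the single bit is also the sign bit."""
    return -1 if bit & 1 else 0

def sint_to_fp_i1(bit):
    """Signed int-to-FP conversion of an i1 after promotion to i64."""
    return float(sext_i1_to_i64(bit))
```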

[LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false. · 3cb7d624

Chen Zheng authored Nov 26, 2020

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D92159

3cb7d624

[WebAssembly] Support select and block for reference types · 60653e24

Heejin Ahn authored Nov 30, 2020

This adds missing `select` instruction support and block return type
support for reference types. Also refactors WebAssemblyInstrRef.td and
rearranges tests in reference-types.s. Tests don't include `exnref`
types, because we currently don't support `exnref` for `ref.null` and
the type will be removed soon anyway.

Reviewed By: tlively, sbc100, wingo

Differential Revision: https://reviews.llvm.org/D92359

60653e24

s/instantate/instantiate/ throughout. NFCI. · e181a6ae

Arthur O'Dwyer authored Dec 01, 2020

The static_assert in "libcxx/include/memory" was the main offender here,
but then I figured I might as well `git grep -i instantat` and fix all
the instances I found. One was in user-facing HTML documentation;
the rest were in comments or tests.

e181a6ae

[NFC][PowerPC] code refactor: split IsReassociable to fma and add. · 95d6042d

Chen Zheng authored Nov 24, 2020

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92070

95d6042d

[VE] Add vcmp, vmax, and vmin intrinsic instructions · c1762bcf

Kazushi (Jam) Marukawa authored Dec 01, 2020

Add vcmp, vmax, and vmin intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92387

c1762bcf

[msan] Replace 8 by kShadowTLSAlignment · 405ea2b9

Jianzhou Zhao authored Nov 23, 2020

Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D92275

405ea2b9

[AArch64][GlobalISel] Don't write to WZR in non-flag-setting G_BRCOND case · c82f002c

Jessica Paquette authored Nov 02, 2020

We are avoiding writing to WZR just about everywhere else.

Also update the code to use MachineIRBuilder for the sake of consistency.

We also didn't have a GlobalISel testcase for this path, so add a simple one
now.

Differential Revision: https://reviews.llvm.org/D90626

c82f002c