- Dec 03, 2020
-
-
Jianzhou Zhao authored
This is a child diff of D92261. After supporting field/index-level shadow, the existing shadow with type i16 works only for primitive types. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92459
-
Craig Topper authored
-
QingShan Zhang authored
The PowerPC ISA supports the input test for vector types v4f32 and v2f64. Replacing the software compare with the hardware test improves performance. Reviewed By: ChenZheng Differential Revision: https://reviews.llvm.org/D90914
-
Kazu Hirata authored
-
Craig Topper authored
If it's not in the PassRegistry, it's not recognized as a pass when we print before/after. Happened to notice while I was working on a new pass.
-
Hsiangkai Wang authored
Support "Zfh" extension according to https://github.com/riscv/riscv-isa-manual/blob/zfh/src/zfh.tex Differential Revision: https://reviews.llvm.org/D90738
-
Xun Li authored
While I was adding a new intrinsic instruction (not overloaded), I accidentally used CreateUnaryIntrinsic to create the intrinsic, which turns out to pass the type list to getName, and ended up naming the intrinsic function with a type suffix, which led to weird bugs later on. It took me a long time to debug. It seems a good idea to add an assertion in getName so that it fails if types are passed but the function is not overloaded. Also, the overloaded version of getName is less efficient because it creates an std::string. We should avoid calling it if we know that there are no types provided. Differential Revision: https://reviews.llvm.org/D92523
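The guard described above can be illustrated with a toy model. This is only a sketch: `INTRINSICS` and `get_name` are hypothetical stand-ins written for this note, not LLVM's actual `Intrinsic::getName` API, and the table lists just two well-known intrinsics.

```python
# Hypothetical toy table: maps an intrinsic base name to "is it overloaded?".
INTRINSICS = {
    "llvm.donothing": False,  # takes no type parameters
    "llvm.ctpop": True,       # overloaded on its operand type
}


def get_name(base, types=()):
    """Return the mangled name, refusing type lists for non-overloaded intrinsics."""
    overloaded = INTRINSICS[base]
    # The idea of the new assertion: a type list only makes sense when overloaded.
    assert overloaded or not types, "types passed to a non-overloaded intrinsic"
    if not types:
        return base  # cheap path: no new string is built
    return base + "".join("." + t for t in types)
```

With this check, the mistake from the commit message (passing a type list for a non-overloaded intrinsic) fails immediately instead of silently producing a suffixed name.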
-
Mircea Trofin authored
Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341
-
- Dec 02, 2020
-
-
Florian Hahn authored
This should fix a build failure on some systems, e.g. solaris11-sparcv9 http://lab.llvm.org:8014/#/builders/22
-
Harald van Dijk authored
LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit TLS addresses in 64-bit mode, which were not yet handled. This adds TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use tls32(base)addr rather than tls64(base)addr, and then restricts TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92346
-
H.J. Lu authored
Since x32 supports PC-relative address, it shouldn't use EBX for TLS address. Instead of checking N.getValueType(), we should check Subtarget->is32Bit(). This fixes PR 22676. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D16474
-
Hongtao Yu authored
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks, which take the form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, so a separate instruction would not work. One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional effort to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place, leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field, which mainly serves AutoFDO, can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions, and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe, and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in the form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. 
Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst. To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if the lowest 3 bits are all non-zero, it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use. Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`. Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91756
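The encoding scheme above (lowest 3 bits set as a marker, upper 29 bits as payload) can be sketched as plain bit packing. The split of the 29 payload bits into a 26-bit ID and a 3-bit type is an assumption made for this illustration; the commit message does not state the actual field widths, and the function names are hypothetical.

```python
PROBE_MARKER = 0b111  # lowest 3 bits all set marks a pseudo probe discriminator
ID_BITS = 26          # assumed width, for illustration only
TYPE_BITS = 3         # assumed width; ID_BITS + TYPE_BITS == 29


def pack_probe(probe_id, probe_type):
    """Pack a probe ID and type into a 32-bit discriminator value."""
    assert 0 <= probe_id < (1 << ID_BITS)
    assert 0 <= probe_type < (1 << TYPE_BITS)
    payload = (probe_type << ID_BITS) | probe_id
    return (payload << 3) | PROBE_MARKER


def is_probe_discriminator(discriminator):
    """True when the lowest 3 bits carry the pseudo probe marker."""
    return (discriminator & 0b111) == PROBE_MARKER


def unpack_probe(discriminator):
    """Recover (probe_id, probe_type) from a packed discriminator."""
    assert is_probe_discriminator(discriminator)
    payload = discriminator >> 3
    return payload & ((1 << ID_BITS) - 1), payload >> ID_BITS
```

A consumer that does not know about the `-pseudo-probe-for-profiling` switch can call `is_probe_discriminator` first and treat any other value as a regular discriminator.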
-
Jianzhou Zhao authored
At D92261, this type will be used to cache both combined shadow and converted shadow values. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92458
-
jasonliu authored
Summary: Not all system assemblers support the `.uleb128 label2 - label1` form. When the target does not support this form, we have to fall back to calculating the offsets manually. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D92058
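For readers unfamiliar with the format, the manual fallback amounts to computing the label difference and emitting its ULEB128 bytes directly. The sketch below shows standard ULEB128 encoding; the helper names and the idea of passing label addresses as integers are illustrative assumptions, not the code from the patch.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes (7 payload bits per byte)."""
    assert value >= 0
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)


def encode_label_difference(label1_addr, label2_addr):
    """Hypothetical helper: emit `label2 - label1` as ULEB128 once both are known."""
    return encode_uleb128(label2_addr - label1_addr)
```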
-
Nick Desaulniers authored
It's common for code that manipulates the stack via inline assembly, or that has to set up its own stack canary (such as the Linux kernel), to want to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an __attribute__((no_stack_protector)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, that is not as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u SSP attributes can be ordered by strength. Weakest to strongest, they are: ssp, sspstrong, sspreq. Callees with differing SSP attributes may be inlined into each other, and the strongest attribute will be applied to the caller. (No change) After this change: * A callee with no SSP attributes will no longer be inlined into a caller with SSP attributes. * The reverse is also true: a callee with an SSP attribute will not be inlined into a caller with no SSP attributes. * The alwaysinline attribute overrides these rules. Functions that get synthesized by the compiler may not get inlined as a result if they are not created with the same stack protector function attribute as their callers. Alternative approach to https://reviews.llvm.org/D87956 . Fixes pr/47479. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: rnk, MaskRay Differential Revision: https://reviews.llvm.org/D91816
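The inlining rules listed in this message can be written down as a small decision function. This is a sketch of the stated rules only, with hypothetical names; it does not reproduce the inliner's actual code, and `None` is used here to mean "no SSP attribute".

```python
# Strength order from the commit message, weakest to strongest.
SSP_STRENGTH = {"ssp": 1, "sspstrong": 2, "sspreq": 3}


def can_inline(caller_ssp, callee_ssp, always_inline=False):
    """Apply the stated rules: SSP and non-SSP functions do not mix unless alwaysinline."""
    if always_inline:
        return True
    # Inlining is allowed only when both have an SSP attribute or neither does.
    return (caller_ssp is None) == (callee_ssp is None)


def merged_ssp(caller_ssp, callee_ssp):
    """When both carry an SSP attribute, the caller ends up with the strongest one."""
    assert caller_ssp in SSP_STRENGTH and callee_ssp in SSP_STRENGTH
    return max(caller_ssp, callee_ssp, key=SSP_STRENGTH.__getitem__)
```

For example, `can_inline(None, "sspstrong")` is false under these rules, which is the case that used to break no_stack_protector callers.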
-
jasonliu authored
Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences are: 1. AIX does not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality routine, because the interoperability is not there yet on AIX. 3. AIX does not use eh_frame sections. Instead, it uses an eh_info section (compat unwind section) to store the information about the personality routine and the LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455
-
Sanjay Patel authored
https://llvm.org/PR48362 It's possible that we could stub this out sooner somewhere within JumpThreading, but I'm not sure how to do that, and then we would still have potential danger in other callers. I can't find a way to trigger this using 'instsimplify', however, because that already has a bailout on unreachable blocks.
-
Simon Pilgrim authored
It's unlikely an undef element in a zero vector will be of any use.
-
Simon Pilgrim authored
It's unlikely an undef element in a zero vector will be of any use, and SimplifyDemandedVectorElts now calls combineX86ShufflesRecursively, so it's unlikely we actually have a dependency on these specific elements.
-
Simon Pilgrim authored
-
Michael Liao authored
-
Bardia Mahjour authored
This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However, it's able to handle more loops, and generates better control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566
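The effect of running the loop body at two vectorization factors can be pictured with a simple chunked reduction. This is a conceptual sketch only: the function name and the default VFs of 8 and 4 are assumptions for illustration, and it ignores interleaving and the minimum-iteration checks the real vectorizer emits.

```python
def chunked_sum(values, main_vf=8, epilogue_vf=4):
    """Sum `values` using a main vector loop, a narrower epilogue loop, then a scalar tail."""
    total = 0
    steps = {"main": 0, "epilogue": 0, "scalar": 0}
    i, n = 0, len(values)
    # Main loop: consumes main_vf elements per iteration.
    while i + main_vf <= n:
        total += sum(values[i:i + main_vf])
        i += main_vf
        steps["main"] += 1
    # Epilogue loop: same body, smaller VF, handles part of the remainder.
    while i + epilogue_vf <= n:
        total += sum(values[i:i + epilogue_vf])
        i += epilogue_vf
        steps["epilogue"] += 1
    # Scalar tail: whatever is left after both vector loops.
    while i < n:
        total += values[i]
        i += 1
        steps["scalar"] += 1
    return total, steps
```

With a trip count smaller than both VFs, neither vector loop runs and all the work falls to the scalar tail, which is the small-trip-count case the message mentions.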
-
Sanjay Patel authored
This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.
-
Alex Zinenko authored
OpenMPIRBuilder::createParallel outlines the body region of the parallel construct into a new function that accepts any value previously defined outside the region as a function argument. This function is called back by the OpenMP runtime function __kmpc_fork_call, which expects trailing arguments to be pointers. If the region uses a value that is not of a pointer type, e.g. a struct, the produced code would be invalid. In such cases, make createParallel emit IR that stores the value on the stack and passes the pointer to the outlined function instead. The outlined function then loads the value back and uses it as normal. Reviewed By: jdoerfert, llitchev Differential Revision: https://reviews.llvm.org/D92189
-
Hans Wennborg authored
When importing symbols from another module, also import any corresponding symver directives. Differential revision: https://reviews.llvm.org/D92335
-
Hans Wennborg authored
This also removes the empty extra "module asm" that would be created, and updates the test to reflect that while making it more explicit. Broken out from https://reviews.llvm.org/D92335
-
Kazushi (Jam) Marukawa authored
Add vand, vor, and vxor intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92454
-
Anirudh Prasad authored
This patch adds some common extended mnemonics to the SystemZ target. - These are jnop, jct, jctg, jas, jasl, jxh, jxhg, jxle, jxleg, bru, brul, br*, br*l. - These mnemonics and the instructions they map to are defined here, Chapter 4 - Branching with extended mnemonic codes. - Except for jnop (which is a variant of brc 0, label), every other mnemonic is marked as a MnemonicAlias since there is already a "defined" instruction with the same encoding and/or condition mask values. - brc 0, label doesn't have a defined extended mnemonic, thus jnop is defined as an InstAlias. Furthermore, the applyMnemonicAliases function is called in the overridden parseInstruction function in SystemZAsmParser.cpp to ensure any mnemonic aliases are applied before any further processing on the instruction is done. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D92185
-
David Sherwood authored
In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows:
    br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
    ...
    !2 = !{!2, !3, !4}
    !3 = !{!"llvm.loop.vectorize.width", i32 8}
    !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962
-
Georgii Rymar authored
This implementation of `ELFDumper<ELFT>::printAttributes()` in llvm-readobj has issues: 1) It crashes when the content of the attribute section is empty. 2) It uses `unwrapOrError` and `reportWarning` calls, though ideally we want to use `reportUniqueWarning`. 3) It contains a TODO about redundant format version check. `lib/Support/ELFAttributeParser.cpp` uses a hardcoded constant instead of the named constant. This patch fixes all these issues. Differential revision: https://reviews.llvm.org/D92318
-
Jay Foad authored
This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400
-
Qiu Chaofan authored
In the lowering of FLT_ROUNDS_, the FPSCR content is moved into an FP register and then a GPR, and then truncated to a word. For subtargets without direct move support, it is stored and then loaded. The load address needs adjustment (+4) only on big-endian targets. This patch fixes it by using generic opcodes on little-endian targets and on subtargets with direct-move. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D91845
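The reason the +4 adjustment applies only to big-endian targets can be checked with a small byte-level model of the store-then-load path. This is an illustration of the endianness arithmetic only, using Python's `struct` module; the function name is hypothetical and nothing here models the actual PowerPC lowering code.

```python
import struct


def low_word_via_memory(value64, big_endian):
    """Store a 64-bit value, then load its low 32-bit word at the endian-dependent offset."""
    order = ">" if big_endian else "<"
    buf = struct.pack(order + "Q", value64)  # the 8-byte store
    # Big-endian puts the low word in the last 4 bytes; little-endian in the first 4.
    offset = 4 if big_endian else 0
    (word,) = struct.unpack_from(order + "I", buf, offset)
    return word
```

Loading at offset 4 on a little-endian layout would return the high word instead, which is the kind of mistake the message describes.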
-
QingShan Zhang authored
i1 is the native type for PowerPC if crbits is enabled. However, we need to promote the i1 to i64 as we didn't have the pattern for i1. Reviewed By: Qiu Chao Fang Differential Revision: https://reviews.llvm.org/D92067
-
Chen Zheng authored
Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D92159
-
Heejin Ahn authored
This adds missing `select` instruction support and block return type support for reference types. Also refactors WebAssemblyInstrRef.td and rearranges tests in reference-types.s. Tests don't include `exnref` types, because we currently don't support `exnref` for `ref.null` and the type will be removed soon anyway. Reviewed By: tlively, sbc100, wingo Differential Revision: https://reviews.llvm.org/D92359
-
Arthur O'Dwyer authored
The static_assert in "libcxx/include/memory" was the main offender here, but then I figured I might as well `git grep -i instantat` and fix all the instances I found. One was in user-facing HTML documentation; the rest were in comments or tests.
-
Chen Zheng authored
Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92070
-
Kazushi (Jam) Marukawa authored
Add vcmp, vmax, and vmin intrinsic instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D92387
-
Jianzhou Zhao authored
Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D92275
-
Jessica Paquette authored
We are avoiding writing to WZR just about everywhere else. Also update the code to use MachineIRBuilder for the sake of consistency. We also didn't have a GlobalISel testcase for this path, so add a simple one now. Differential Revision: https://reviews.llvm.org/D90626
-