- Aug 09, 2018
-
-
Reid Kleckner authored
The inalloca parameter has to be the only parameter passed in memory. Changing the convention to fastcc can break that. At some point we should teach global opt how to optimize ABI attributes like inalloca and maybe byval. These attributes are mainly used to match C ABIs. They are harder for LLVM to optimize and they don't always generate the best code. Fixes PR38487 llvm-svn: 339360
-
Sanjay Patel authored
Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359
-
Michael Berg authored
Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it the flag that most closely represents the translation Reviewers: spatel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D50195 llvm-svn: 339357
-
Evandro Menezes authored
Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`, `FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`, `FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for all Exynos processors. llvm-svn: 339356
-
Evandro Menezes authored
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354
-
Andrea Di Biagio authored
This patch introduces tablegen class MCStatement. Currently, an MCStatement can be either a return statement, or a switch statement. ``` MCStatement: MCReturnStatement MCOpcodeSwitchStatement ``` A MCReturnStatement expands to a return statement, and the boolean expression associated with the return statement is described by a MCInstPredicate. An MCOpcodeSwitchStatement is a switch statement where the condition is a check on the machine opcode. It allows the definition of multiple checks, as well as a default case. More details on the grammar implemented by these two new constructs can be found in the diff for TargetInstrPredicates.td. This patch makes it easier to read the body of auto-generated TargetInstrInfo predicates. In future, I plan to reuse/extend the MCStatement grammar to describe more complex target hooks. For now, this is just a first step (mostly a minor cosmetic change to polish the new predicates framework). Differential Revision: https://reviews.llvm.org/D50457 llvm-svn: 339352
-
Bjorn Pettersson authored
Summary: The interface to get size and spill size of a register was moved from MCRegisterInfo to TargetRegisterInfo over a year ago. Afaik the old interface has bee around to give out-of-tree targets a chance to adapt to the new interface. One problem with the old MCRegisterClass::PhysRegSize was that it represented the size of a register as "size in bits" / 8. So a register had to be a multiple of eight bits wide for the size to be correct (and the byte size for the target needed to be eight bits). Reviewers: kparzysz, qcolombet Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47199 llvm-svn: 339350
-
Sanjay Patel authored
llvm-svn: 339349
-
Simon Pilgrim authored
As requested in D50392, pull the magic constant calculations out into a helper function. llvm-svn: 339346
-
Sjoerd Meijer authored
Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340
-
Simon Pilgrim authored
Exposed by D50328 Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339337
-
Simon Pilgrim authored
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses. This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool. However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move. This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit. Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339335
-
Andrew V. Tischenko authored
Differential Revision: https://reviews.llvm.org/D49861 llvm-svn: 339321
-
Jonas Hahnfeld authored
According to PTX ISA .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where - 'read's and 'write's are lowered to atomic loads and stores, and - an update of float or double types are lowered into a cmpxchg loop. (Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.) Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error. Differential Revision: https://reviews.llvm.org/D50391 llvm-svn: 339316
-
Roger Ferrer Ibanez authored
This pseudo-instruction is similar to la but uses PC-relative addressing unconditionally. This is, la is only different to lla when using -fPIC. This pseudo-instruction seems often forgotten in several specs but it is definitely mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in page 37 of the "RISC-V Reader" book but also in function macro found in gas/config/tc-riscv.c. This is a very first step towards adding PIC support for Linux in the RISC-V backend. The lla pseudo-instruction expands to a sequence of auipc + addi with a couple of pc-rel relocations where the second points to the first one. This is described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses For now, this patch only introduces support of that pseudo instruction at the assembler parser. Differential Revision: https://reviews.llvm.org/D49661 llvm-svn: 339314
-
JF Bastien authored
Summary: DenseMap's operator[] performs an insertion if the entry isn't found. The second phase of ConstantMerge isn't trying to insert anything: it's just looking to see if the first phased performed an insertion. Use find instead, avoiding insertion of every single global initializer in the map of constants. This has the side-effect of making all entries in CMap non-null (because only global declarations would have null initializers, and that would be a bug). Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D50476 llvm-svn: 339309
-
Philip Reames authored
llvm-svn: 339308
-
Paul Robinson authored
Differential Revision: https://reviews.llvm.org/D50466 llvm-svn: 339302
-
Sanjay Patel authored
isNegatibleForFree() should not matter here (as the test diffs show) because it's always a win to replace an fsub+fadd with fneg. The problem in D50195 persists because either (1) we are doing these folds in the wrong order or (2) we're missing another fold for fadd. llvm-svn: 339299
-
Sanjay Patel authored
I don't know if it's possible to expose this diff in a test, but we should always try simplifications (no new nodes created) before more complicated transforms for efficiency (similar to what we do in IR). llvm-svn: 339298
-
Petr Hosek authored
LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294
-
- Aug 08, 2018
-
-
Jonas Devlieghere authored
On Darwin we pin the DWARF line tables to version 2. Stop doing so for DWARF v5 and later. Differential revision: https://reviews.llvm.org/D49381 llvm-svn: 339288
-
Eli Friedman authored
Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283
-
Zachary Turner authored
Template manglings use a fresh back-referencing context, so we need to do the same. This fixes several existing tests which are marked as FIXME, so those are now actually run. llvm-svn: 339275
-
Ties Stuij authored
llvm-svn: 339274
-
Krzysztof Parzyszek authored
Differential Revision: https://reviews.llvm.org/D50405 llvm-svn: 339272
-
Matt Arsenault authored
I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271
-
Matt Arsenault authored
llvm-svn: 339270
-
Jonas Devlieghere authored
When reading a custom WASM section, it was possible that its name extended beyond the size of the section. This resulted in a bogus value for the section size due to the size overflowing. Fixes heap buffer overflow detected by OSS-fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8190 Differential revision: https://reviews.llvm.org/D50387 llvm-svn: 339269
-
Jonas Devlieghere authored
When using APPLE extensions, don't duplicate the compiler invocation's flags both in AT_producer and AT_APPLE_flags. Differential revision: https://reviews.llvm.org/D50453 llvm-svn: 339268
-
Sanjay Patel authored
This is a sibling to the simplify from: https://reviews.llvm.org/rL339174 llvm-svn: 339267
-
Sanjay Patel authored
This is a sibling to the simplify from: rL339171 llvm-svn: 339266
-
Simon Pilgrim authored
The isConstOrConstSplat result is only used in a ISD::matchUnaryPredicate call which can perform the equivalent iteration just as quickly. llvm-svn: 339262
-
Zaara Syeda authored
This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260
-
Ties Stuij authored
Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257
-
Alex Bradbury authored
Further improve compatibility with the GNU assembler. Differential Revision: https://reviews.llvm.org/D50217 Patch by Kito Cheng. llvm-svn: 339255
-
Simon Pilgrim authored
Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike. I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x llvm-svn: 339254
-
Alex Bradbury authored
[RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate Match the GNU assembler in supporting immediate operands for these instructions even when the reg-reg mnemonic is used. Differential Revision: https://reviews.llvm.org/D50046 Patch by Kito Cheng. llvm-svn: 339252
-
Sanjay Patel authored
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform. FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg. I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a potentially expensive fdiv or fmul. Differential Revision: https://reviews.llvm.org/D50417 llvm-svn: 339248
-
Simon Pilgrim authored
As requested in D50392, this is a minor refactor to BuildExactSDIV to stop taking the uniform constant APInt divisor and instead extract it locally. I also cleanup the operands and valuetypes to better match BuildUDiv (and BuildSDIV in the near future). llvm-svn: 339246
-