- Jun 07, 2019
-
-
Jinsong Ji authored
When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805
-
Stefan Stipanovic authored
llvm-svn: 362802
-
David Bolvansky authored
llvm-svn: 362801
-
Matt Arsenault authored
This could fail, which looked concerning. However nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true. Fixes bug 42162. llvm-svn: 362800
-
Joerg Sonnenberger authored
llvm-svn: 362799
-
Nico Weber authored
lib.exe doesn't allow creating .lib files with object files that have differing machine types. Update llvm-lib to match. The motivation is to make it possible to infer the machine type of a .lib file in lld, so that it can warn when e.g. a 32-bit .lib file is passed to a 64-bit link (PR38965). Fixes PR38782. Differential Revision: https://reviews.llvm.org/D62913 llvm-svn: 362798
-
Sanjay Patel authored
This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797
-
Nico Weber authored
llvm-svn: 362796
-
Nico Weber authored
llvm-svn: 362795
-
Nico Weber authored
llvm-svn: 362794
-
Simon Tatham authored
Change D60691 caused some knock-on failures that weren't caught by the existing tests. Firstly, selecting a CPU that should have had a restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs and no double precision) could give the unrestricted version, because `ARM::getFPUFeatures` returned a list of features including subtracted ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw away all the ones that didn't start with `+`. Secondly, the preprocessor macros didn't reliably match the actual compilation settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as if hardware FP was available, because the list of features on the cc1 command line would include things like `+vfp4`,`-vfp4d16` and clang didn't realise that one of those cancelled out the other. I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so that it returns a list that enables every FP-related feature compatible with the selected FPU and disables every feature not compatible, which is more verbose but means clang doesn't have to understand the dependency relationships between the backend features. Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all the various forms of the FP feature names, so that it won't miss cases where it should have set `HW_FP` to feed into feature test macros. That in turn caused an ordering problem when handling `-mcpu=foo+bar` together with `-mfpu=something_that_turns_off_bar`. To fix that, I've arranged that the `+bar` suffixes on the end of `-mcpu` and `-march` cause feature names to be put into a separate vector which is concatenated after the output of `getFPUFeatures`. Another side effect of all this is to fix a bug where `clang -target armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was because `HW_FP` was being set to a value including only the `FPARMV8` bit, but that feature test macro was testing only the `VFP4FPU` bit. Now `HW_FP` ends up with all the bits set, so it gives the right answer. Changes to tests included in this patch: * `arm-target-features.c`: I had to change basically all the expected results. (The Cortex-M4 test in there should function as a regression test for the accidental double-precision bug.) * `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG` everywhere so that those tests are no longer sensitive to the order of cc1 feature options on the command line. * `arm-acle-6.5.c`: been updated to expect the right answer to that FMA test. * `Preprocessor/arm-target-features.c`: added a regression test for the `mfpu=softvfp` issue. Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne Reviewed By: ostannard Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62998 llvm-svn: 362791
-
Sam Elliott authored
Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving. This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests also ensure the optimisation also fires in this case. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62900 llvm-svn: 362790
-
Valery Pykhtin authored
[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789
-
Dmitri Gribenko authored
I replaced the circular library dependency with a forward declaration, but it is only a workaround, not a real fix. llvm-svn: 362782
-
Cullen Rhodes authored
Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780
-
Cullen Rhodes authored
Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779
-
George Rimar authored
This is https://bugs.llvm.org/show_bug.cgi?id=42122. If an object file has a size less than program header's file [offset + size] (i.e. if we have overflow), llvm-objcopy crashes instead of reporting a error. The patch fixes this issue. Differential revision: https://reviews.llvm.org/D62898 llvm-svn: 362778
-
George Rimar authored
This is a refactoring follow-up for D62809 "Change how we handle implicit sections.". It allows to simplify the code. Differential revision: https://reviews.llvm.org/D62912 llvm-svn: 362777
-
Pengfei Wang authored
Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776
-
Sam Parker authored
Removed unused (in non-debug builds) variable. llvm-svn: 362775
-
Sam Parker authored
Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774
-
Dylan McKay authored
In r356860, the legalization logic for BSWAP was modified to ISD::ROTL, rather than the old ISD::{SHL, SRL, OR} nodes. This works fine on AVR for 8-bit rotations, but 16-bit rotations are currently unimplemented - they always trigger an assertion error in the AVRExpandPseudoInsts pass ("RORW unimplemented"). This patch instructions the legalizer to expand 16-bit rotations into the previous SHL, SRL, OR pattern it did previously. This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this test failure seems flaky - it passes successfully on the avr-build-01 buildbot, but fails locally on my Arch Linux install. llvm-svn: 362773
-
Michael Pozulp authored
llvm-svn: 362772
-
Michael Pozulp authored
[llvm-objdump] Print source when subsequent lines in the translation unit come from the same line in two different headers. Reviewers: grimar, rupprecht, jhenderson Reviewed By: grimar, jhenderson Subscribers: llvm-commits, jhenderson Tags: #llvm Differential Revision: https://reviews.llvm.org/D62461 llvm-svn: 362771
-
Michael Pozulp authored
Summary: Fixes Bug 41904 https://bugs.llvm.org/show_bug.cgi?id=41904 Reviewers: jhenderson, rupprecht, grimar, MaskRay Reviewed By: jhenderson, rupprecht, MaskRay Subscribers: dexonsmith, rupprecht, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62275 llvm-svn: 362768
-
Fangrui Song authored
We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because it may result in an IRELATIVE reloc that the dynamic loader will use to resolve the address at startup time. There is another problem that is not fixed by this patch: a PC relative relocation should also create a relocation with the ifunc symbol. llvm-svn: 362767
-
Michael Pozulp authored
Subscribers: mgorny, mgrang, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62992 llvm-svn: 362766
-
Michael Pozulp authored
llvm-svn: 362763
-
Fangrui Song authored
llvm-svn: 362762
-
Matt Arsenault authored
llvm-svn: 362761
-
Matt Arsenault authored
This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760
-
Nemanja Ivanovic authored
Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759
-
Matt Arsenault authored
SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754
-
David Tenty authored
As per the Developer Policy, upon obtaining commit access. llvm-svn: 362753
-
- Jun 06, 2019
-
-
Cameron McInally authored
llvm-svn: 362752
-
Alexey Lapshin authored
Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750
-
Alexander Timofeev authored
"Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749
-
Cameron McInally authored
llvm-svn: 362748
-
Craig Topper authored
This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746
-
Sanjay Patel authored
llvm-svn: 362742
-