- May 03, 2021
-
-
Anirudh Prasad authored
Previously, https://reviews.llvm.org/D101308 removed prefixes from registers when printing them out. This was especially needed for inline asm statements that use input/output operands. However, the backend's SystemZAsmParser accepts both prefixed and prefix-less registers as part of its implementation. This patch changes that by ensuring that prefixed registers are only allowed for the ATT dialect. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D101665
-
Alexey Bataev authored
Need to check whether the target allows/supports masked gathers before trying to estimate their cost; otherwise we may fail to vectorize some patterns because the cost model is too pessimistic. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297
-
Dávid Bolvanský authored
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'. Proofs: https://alive2.llvm.org/ce/z/o2dnjY Solves https://bugs.llvm.org/show_bug.cgi?id=50172 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101582
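The role of the 'ZeroIsUndef' flag can be checked by brute force. The sketch below is only an illustration in Python, not LLVM code: `cttz` and `check_narrowing` are hypothetical helper names, and the 8-bit and 32-bit widths are arbitrary choices. Zero-extension is modelled as leaving the non-negative integer unchanged while widening the type.

```python
def cttz(value: int, width: int) -> int:
    """Count trailing zeros of `value` seen as a `width`-bit integer.

    For a zero input the result is `width`, mirroring the case where
    zero is a defined input (ZeroIsUndef = false).
    """
    if value == 0:
        return width
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def check_narrowing(src_width: int = 8, dst_width: int = 32) -> bool:
    """True if cttz over the wide type equals cttz over the narrow type
    for every non-zero src_width-bit input."""
    return all(
        cttz(x, dst_width) == cttz(x, src_width)
        for x in range(1, 1 << src_width)
    )


nonzero_ok = check_narrowing()
# For x == 0 the two widths disagree, which is why the fold needs ZeroIsUndef.
zero_differs = cttz(0, 32) != cttz(0, 8)
```

For every non-zero input the two counts agree, while for zero the wide count is 32 and the narrow count is 8, so the fold is only safe when the zero case is undefined.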
-
Jon Roelofs authored
Since googlebench builds as C++11, the change there is incorrect and breaks the googlebench build when the STL implementation is strict about std::enable_if_t not being available in earlier C++ versions. Partial revert of: 1bd6123b (https://reviews.llvm.org/D74384) Differential Revision: https://reviews.llvm.org/D101583
-
Alexey Bataev authored
This reverts commit b5f64768 to fix a compiler crash revealed by buildbots.
-
Alexey Bataev authored
Need to check whether the target allows/supports masked gathers before trying to estimate their cost; otherwise we may fail to vectorize some patterns because the cost model is too pessimistic. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297
-
Konstantin Zhuravlyov authored
-
Florian Hahn authored
As we gradually move more elements of LV to VPlan, we are trying to reduce the number of places that still have to check the IR of the original loop. This patch adjusts the code that fixes cross-iteration phis so that it gets the PHIs to fix directly from the VPlan that is executed. We still need the original PHI to check for first-order recurrences, but we can get rid of that once we model that explicitly in VPlan as well. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99293
-
LLVM GN Syncbot authored
-
Abhina Sreeskantharajan authored
This patch adds the basic functions needed for controlling auto conversion on z/OS. Untagged input files are auto converted to ASCII, on the assumption that all untagged files are EBCDIC encoded. Output files are auto converted to EBCDIC IBM-1047. This change also enables conversion for stdin/stdout/stderr. For more information on how fcntl controls the codepage, see https://www.ibm.com/docs/en/zos/2.4.0?topic=descriptions-fcntl-bpx1fct-bpx4fct-control-open-file-descriptors Reviewed By: anirudhp Differential Revision: https://reviews.llvm.org/D100483
-
Sanjay Patel authored
If we don't demand high bits, then we also don't care about those high bits of a left-shift operand regardless of shift amount. I noticed the sext/trunc pattern in a motivating example. It seems like there should be a low-bits with right-shift sibling, but I haven't looked at that yet. https://alive2.llvm.org/ce/z/JuS6jc https://rise4fun.com/Alive/Trm (not sure how to use 'width' with Alive1) https://alive2.llvm.org/ce/z/gRadbF Differential Revision: https://reviews.llvm.org/D101489
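The demanded-bits claim can be checked exhaustively for a small width. This is a minimal Python sketch, not the InstCombine implementation: `shl` and `high_bits_irrelevant` are hypothetical names, the 8-bit width is an arbitrary choice, and only in-range shift amounts are considered.

```python
WIDTH = 8
FULL_MASK = (1 << WIDTH) - 1


def shl(value: int, amount: int) -> int:
    """Left shift in a WIDTH-bit integer, discarding bits shifted out."""
    return (value << amount) & FULL_MASK


def high_bits_irrelevant(demanded_low_bits: int) -> bool:
    """True if clearing the non-demanded high bits of the shift operand
    never changes the demanded low bits of the shift result."""
    demanded = (1 << demanded_low_bits) - 1
    for x in range(1 << WIDTH):
        stripped = x & demanded  # same low bits, high bits cleared
        for amount in range(WIDTH):
            if shl(x, amount) & demanded != shl(stripped, amount) & demanded:
                return False
    return True


all_ok = all(high_bits_irrelevant(k) for k in range(WIDTH + 1))
```

The check passes because bit `i` of the result comes from bit `i - amount` of the operand, which is never above a demanded bit position.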
-
David Green authored
Similarly to D101096, this makes sure that MMOs get propagated through from MVE gathers/scatters to the Machine Instructions. This allows extra scheduling freedom, not forcing the instructions to act as scheduling barriers. We create MMOs with an unknown size, specifying that they can load from anywhere in memory, similar to the masked_gather or X86 intrinsics. Differential Revision: https://reviews.llvm.org/D101219
-
Fraser Cormack authored
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101518
-
Christian Kühnel authored
As proposed by @FlashSheridan in https://reviews.llvm.org/rG7f9717b922d4
-
Sebastian Neubauer authored
SITargetLowering::LowerFormalArguments asserts that none of these features are used for graphics calling conventions, so AnnotateKernelFeatures should not add them. Differential Revision: https://reviews.llvm.org/D101534
-
Reshabh Sharma authored
Add address sanitizer instrumentation support for accesses to global and constant address spaces in AMDGPU. It strictly avoids instrumenting the stack and assumes x86 as the host. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D99071
-
Konstantin Zhuravlyov authored
This reverts commit 54aad636. Includes fix for note-amd-valid-v3.s test.
-
Sergio Perez Gonzalez authored
Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101133
-
David Green authored
We create MMOs for the VLDn/VSTn intrinsics in ARMTargetLowering::getTgtMemIntrinsic, but they do not currently make it all the way through ISel. This changes that in the various places it needs changing, making sure that the MMO is propagated through to the final instruction. This can help in scheduling, not treating the VLD2/VST2 as a scheduling barrier. Differential Revision: https://reviews.llvm.org/D101096
-
Stelios Ioannou authored
Setting the preferred function alignment to 16 for Cortex-A53/A55 improves performance in a wide range of benchmarks. This brings it in line with the Cortex-A53/A55 tuning that is used in GCC (gcc/config/aarch64/aarch64.c). Differential Revision: https://reviews.llvm.org/D101636 Change-Id: I2ce47fe7ab5e3b54f49c89038d8da4e404742de2
-
- May 02, 2021
-
-
Craig Topper authored
This allows for a much more efficient encoding for small negative numbers by storing the sign bit first and negating the rest of the bits. This was already being used for OPC_CheckInteger. For every in-tree target this affects, the table got smaller. R600GenDAGISel.inc saw the largest reduction of 7K. I did have to add a new opcode for StringIntegers used for register class ids and subregister indices, since we don't have the integer value to encode. The enum name is emitted directly into the table. Previously we assumed the enum would expand to a positive 7-bit number. We might be able to just shift that right by 1 and assume it is a positive 6-bit number, but that will need more investigation.
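A sign-first encoding of this kind can be sketched as follows. This is an illustrative Python model, not the TableGen emitter: the function names are hypothetical, and the byte layout (7 payload bits per byte, high bit as a continuation flag, values treated as 64-bit) is an assumption made for the example.

```python
def vbr_bytes(unsigned_value: int) -> bytes:
    """Variable-length encoding: 7 payload bits per byte, low bits first,
    with the high bit set on every byte except the last."""
    out = bytearray()
    while unsigned_value >= 0x80:
        out.append((unsigned_value & 0x7F) | 0x80)
        unsigned_value >>= 7
    out.append(unsigned_value)
    return bytes(out)


def sign_rotate(value: int) -> int:
    """Put the sign in bit 0 and the magnitude in the remaining bits."""
    if value >= 0:
        return value << 1
    return ((-value) << 1) | 1


def encode_plain(value: int) -> bytes:
    """Encode the value as a 64-bit two's complement pattern."""
    return vbr_bytes(value & 0xFFFFFFFFFFFFFFFF)


def encode_rotated(value: int) -> bytes:
    """Encode the sign-rotated form of the value."""
    return vbr_bytes(sign_rotate(value))


def decode_rotated(data: bytes) -> int:
    """Inverse of encode_rotated."""
    raw = 0
    for index, byte in enumerate(data):
        raw |= (byte & 0x7F) << (7 * index)
    magnitude = raw >> 1
    return -magnitude if raw & 1 else magnitude


plain_size = len(encode_plain(-1))
rotated_size = len(encode_rotated(-1))
```

Under these assumptions, -1 takes ten bytes as a raw 64-bit pattern but a single byte once the sign is rotated into the low bit, which is the kind of saving the commit message describes.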
-
Craig Topper authored
This shrinks the immediate that the isel table needs to emit for these instructions. Hoping this allows me to change OPC_EmitInteger to use a better variable length encoding for representing negative numbers, similar to what was done a few months ago for OPC_CheckInteger. The alternative encoding uses fewer bytes for negative numbers, but increases the number of bytes needed to encode 64, which was a very common number in the RISCV table due to SEW=64. By using Log2 this becomes 6 and is no longer a problem.
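The size effect on the value 64 can be illustrated with a small byte-count model. This is a hedged Python sketch, not the emitter itself: `vbr_length` and `rotate_sign` are hypothetical names, and the layout (7 payload bits per byte, sign stored in the lowest bit) is assumed for the example.

```python
def vbr_length(unsigned_value: int) -> int:
    """Bytes needed when each byte carries 7 payload bits."""
    return max(1, -(-unsigned_value.bit_length() // 7))


def rotate_sign(value: int) -> int:
    """Sign in bit 0, magnitude in the bits above it."""
    return (value << 1) if value >= 0 else ((-value << 1) | 1)


sew = 64
plain_len = vbr_length(sew)                                  # 64 fits in 7 bits
rotated_len = vbr_length(rotate_sign(sew))                   # 128 needs 8 bits
log2_rotated_len = vbr_length(rotate_sign(sew.bit_length() - 1))  # log2(64) = 6
```

In this model 64 grows from one byte to two once a sign bit is added, while its Log2 value of 6 still fits in one byte.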
-
Arthur Eubanks authored
As opposed to going through the Aliasee type. For opaque pointers, we're trying to remove uses of PointerType::getElementType(). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D101715
-
Florian Hahn authored
This patch introduces a helper to obtain an iterator range for the PHI-like recipes in a block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100101
-
Nikita Popov authored
We currently can't determine any exit counts here, because there is no "controlling exit".
-
Juneyoung Lee authored
This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled.
```
select (select a, true, b), c, false -> select a, c, false
select c, (select a, true, b), false -> select c, a, false
  if c implies that b is false (isImpliedCondition).
```
https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB
```
sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
```
https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE
See D101191 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101375
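The four folds in this commit message can be sanity-checked with a truth table. The Python sketch below is only an illustration with hypothetical helper names; it treats every operand as a plain boolean and therefore says nothing about poison or undef, which the linked Alive2 proofs cover.

```python
from itertools import product


def select(cond: bool, if_true: bool, if_false: bool) -> bool:
    """Model of a select over plain booleans."""
    return if_true if cond else if_false


def folds_hold() -> bool:
    """Check all four rewrites over every boolean assignment."""
    for a, b, c in product([False, True], repeat=3):
        # The first two folds require the precondition "c implies not b".
        if not (c and b):
            if select(select(a, True, b), c, False) != select(a, c, False):
                return False
            if select(c, select(a, True, b), False) != select(c, a, False):
                return False
        # The last two folds hold unconditionally.
        if select(select(c, a, False), True, select(not c, b, False)) != select(c, a, b):
            return False
        if select(select(not c, a, False), True, select(c, b, False)) != select(c, b, a):
            return False
    return True


all_folds_ok = folds_hold()
# Without the precondition the first fold fails for a=False, b=True, c=True.
precondition_needed = (
    select(select(False, True, True), True, False) != select(False, True, False)
)
```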
-
Juneyoung Lee authored
-
Juneyoung Lee authored
-
Arthur Eubanks authored
To reduce dependence on pointee types for opaque pointers. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D101706
-
Juneyoung Lee authored
This is an NFC change that reruns update_test_checks.py on the tests that are going to be updated in D101191.
-
Juneyoung Lee authored
This is a patch that adds ctpop intrinsics to propagatesPoison. Split from D101191.
-
Juneyoung Lee authored
This update supports the following transformation:
```
select(extract(mul_with_overflow(a, _), _), (a == 0), false)
=> and(extract(mul_with_overflow(a, _), _), (a == 0))
```
which is correct because if `a` is poison, the select's condition is also poison. This update is split from D101423.
-
LLVM GN Syncbot authored
-
Juneyoung Lee authored
As discussed in D101191, this patch adds a poison-safe folding of the overflow bit check:
```
%Op0 = icmp ne i4 %X, 0
%Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
%Op1 = extractvalue { i4, i1 } %Agg, 1
%ret = select i1 %Op0, i1 %Op1, i1 false
=>
%Y.fr = freeze %Y
%Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
%Op1 = extractvalue { i4, i1 } %Agg, 1
%ret = %Op1
```
https://alive2.llvm.org/ce/z/zgPUGT https://alive2.llvm.org/ce/z/h2gZ_6
Note that there are cases where inserting freeze is not necessary, e.g. when %Y is `noundef`. In this case, LLVM is already good because `%ret` is already successfully folded into `and`, triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K
Differential Revision: https://reviews.llvm.org/D101423
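For fully defined inputs, the reason the `icmp ne %X, 0` guard is redundant can be checked exhaustively over i4. This is a minimal Python sketch with hypothetical helper names; it models only concrete values, so the freeze of `%Y` and poison propagation are outside its scope.

```python
BITS = 4


def umul_overflows(x: int, y: int) -> bool:
    """Overflow bit of an unsigned BITS-bit multiply."""
    return x * y >= (1 << BITS)


def smul_overflows(x: int, y: int) -> bool:
    """Overflow bit of a signed BITS-bit multiply."""
    lowest = -(1 << (BITS - 1))
    highest = (1 << (BITS - 1)) - 1
    return not (lowest <= x * y <= highest)


def guard_is_redundant(overflows, values) -> bool:
    """True if (x != 0) and overflow(x, y) always equals overflow(x, y)."""
    return all(
        ((x != 0) and overflows(x, y)) == overflows(x, y)
        for x in values
        for y in values
    )


unsigned_ok = guard_is_redundant(umul_overflows, range(1 << BITS))
signed_ok = guard_is_redundant(smul_overflows, range(-(1 << (BITS - 1)), 1 << (BITS - 1)))
```

Both checks pass because a zero `x` gives a zero product, which never overflows, so the guard can only change the result when the overflow bit is already false.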
-
Juneyoung Lee authored
-
- May 01, 2021
-
-
Harald van Dijk authored
X32 uses 32-bit ELF object files with 32-bit alignment, so the .note.gnu.property section needs to be emitted as it is for X86. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101689
-
Nikita Popov authored
If V & Mask != 0, we know that at least one of the bits in Mask must be set, so the value must be >= the lowest bit in Mask.
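The lower bound can be confirmed by brute force on a small width. The sketch below is an illustration in Python with hypothetical helper names, using unsigned 8-bit values as an arbitrary choice.

```python
def lowest_set_bit(mask: int) -> int:
    """Value of the lowest set bit of a non-zero mask."""
    return mask & -mask


def bound_holds(width: int = 8) -> bool:
    """True if (v & mask) != 0 always implies v >= lowest_set_bit(mask)
    for unsigned `width`-bit values."""
    for mask in range(1, 1 << width):
        low = lowest_set_bit(mask)
        for v in range(1 << width):
            if (v & mask) != 0 and v < low:
                return False
    return True


bound_ok = bound_holds()
```

Any value that shares a bit with the mask has some bit set at or above the mask's lowest set position, so it cannot be smaller than that bit's value.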
-
Nikita Popov authored
-
Roman Lebedev authored
Introduce basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`. This is fully built from scratch, from llvm-mca measurements and documented reference materials. Nothing was copied from `znver2`/`znver1`.

I believe this is in a reasonable state of completion for inclusion, probably better than D52779 `bdver2` was :) Namely:
* uops are pretty spot-on (at least what llvm-mca can measure) {F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure) {F16422601}
* throughput is within reason {F16422607}

I haven't run many benchmarks with this, however RawSpeed benchmarks say this is beneficial: {F16603978} {F16604029}

I'll call out the obvious problems there:
* I didn't really bother with X87 instructions
* I didn't really bother with obviously-microcoded/system instructions
* There is a large discrepancy in throughput for `mr` and `rm` instructions. I'm not really sure if it's a modelling defect that needs to be fixed, or a defect of the measurements.
* Pipe distributions are probably bad :) I can't do much here until AMD allows that to be fixed by documenting the appropriate counters and updating libpfm

That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>

so how much worse this could possibly be?!

Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.

Differential Revision: https://reviews.llvm.org/D94395
-
Nikita Popov authored
This seems to be a leftover from when the BackedgeTakenInfo stored multiple exit counts with manual memory management. At some point this was switched to a simple vector, and there should be no need to micro-manage the clearing anymore. We can simply drop the loop from the map and let the destructor do its job.
-