- Sep 15, 2021
-
-
Filipp Zhinkin authored
Enabled the mul folding optimization that was previously disabled for being incorrect. To preserve correctness, the mul operand that is not compared with zero in the select's condition is now frozen.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
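The fold rewrites a select guarded by a zero check into a plain multiply; the freeze matters only for undef/poison operands, which a concrete-value sketch cannot show. A minimal Python illustration of the value-level equivalence (function names are hypothetical, not from the patch):

```python
def folded(x, y):
    # select (y == 0), 0, (x * y)  -->  x * y
    # Valid on concrete values because x * 0 == 0. In IR, the operand that
    # is not compared with zero (x here) must additionally be frozen so a
    # poison/undef x cannot leak through the branch that used to yield 0.
    return x * y

def unfolded(x, y):
    return 0 if y == 0 else x * y

# The two agree on all concrete inputs.
assert all(folded(x, y) == unfolded(x, y)
           for x in range(-8, 9) for y in range(-8, 9))
```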
-
Sanjay Patel authored
-
Simon Pilgrim authored
Based on the worst-case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (they were based on instruction counts instead of actual throughput).
-
Martin Storsjö authored
This codepath hadn't been exercised in a build with asserts before. Differential Revision: https://reviews.llvm.org/D109778
-
Martin Storsjö authored
This was requested in D38253, but missed back then. Differential Revision: https://reviews.llvm.org/D109046
-
Nico Weber authored
-
Nicolas Vasilache authored
Making the late transformations opt-in results in less surprising behavior when composing multiple calls to the codegen strategy. Differential Revision: https://reviews.llvm.org/D109820
-
Nicolas Vasilache authored
AliasInfo can now use union-find for a much more efficient implementation. This brings no functional changes but large performance gains on more complex examples. Differential Revision: https://reviews.llvm.org/D109819
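Union-find with path compression and union by size gives near-constant-time merging and querying of alias sets, which is where the performance gain comes from. A generic sketch of the data structure (illustrative only, not the actual AliasInfo code):

```python
class UnionFind:
    """Disjoint sets with path compression and union by size."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach smaller tree under larger
        self.size[ra] += self.size[rb]

    def same_set(self, a, b):
        return self.find(a) == self.find(b)

# Merging alias sets for three SSA values (names are just labels here).
uf = UnionFind()
uf.union("%a", "%b")
uf.union("%b", "%c")
assert uf.same_set("%a", "%c")
assert not uf.same_set("%a", "%d")
```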
-
David Green authored
In some situations under Thumb1, we could get stuck in an infinite loop recombining the same instruction. This puts a limit on that, no longer combining SUBC with SUBE repeatedly.
-
Florian Hahn authored
Add a set of test cases where redundant stores may be removable, depending on whether a local allocation gets captured before performing a load.
-
David Green authored
This extends the reduction logic in the vectorizer to handle intrinsic versions of min and max, both the floating point variants already created by instcombine under fastmath and the integer variants from D98152. As a bonus this allows us to match a chain of min or max operations into a single reduction, similar to how add/mul/etc work. Differential Revision: https://reviews.llvm.org/D109645
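Because min and max are associative and commutative, a chain like min(min(min(a, b), c), d) is equivalent to a single reduction over all the operands, which is what lets the vectorizer match such chains. A value-level sketch in Python (illustrative only):

```python
from functools import reduce

values = [7, -3, 12, 0, 5]

# Chain form, as a frontend or instcombine might emit it ...
chained = min(min(min(min(values[0], values[1]),
                      values[2]), values[3]), values[4])

# ... is equivalent to a single reduction over the whole list,
# analogous to how add/mul reduction chains are matched.
reduced = reduce(min, values)
assert chained == reduced == -3
```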
-
Simon Pilgrim authored
When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.
-
Simon Atanasyan authored
Identified in D109359.
-
Justas Janickas authored
Adds support for a feature macro `__opencl_c_images` in C++ for OpenCL 2021 enabling a respective optional core feature from OpenCL 3.0. This change aims to achieve compatibility between C++ for OpenCL 2021 and OpenCL 3.0. Differential Revision: https://reviews.llvm.org/D109002
-
Cullen Rhodes authored
Identified in D109359. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109755
-
David Green authored
-
Matthias Springer authored
E.g.:

```
%2 = memref.alloc() {alignment = 128 : i64} : memref<256x256xf32>
%3 = memref.alloc() {alignment = 128 : i64} : memref<256x256xf32>
// ... (%3 is not written to)
linalg.copy(%3, %2) : memref<256x256xf32>, memref<256x256xf32>
vector.transfer_write %11, %2[%c0, %c0] {in_bounds = [true, true]} : vector<256x256xf32>, memref<256x256xf32>
```

Avoid copies of %3 if %3 came directly from an InitTensorOp.

Differential Revision: https://reviews.llvm.org/D109742
-
Florian Hahn authored
This is a first step towards addressing the last remaining limitation of the VPlan version of sinkScalarOperands: the legacy version can partially sink operands. For example, if a GEP has uniform users outside the sink target block, then the legacy version will sink all scalar GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by detecting such cases and duplicating the sink candidate. All users outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate a replicate recipe, they will share the same underlying IR value and both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
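A hedged sketch of the duplication step described above (the names and data structures are illustrative, not the VPlan API): users outside the sink target are rewritten to a uniform clone, while the original candidate is sunk into the target block.

```python
def sink_with_duplication(candidate, sink_target, users):
    """Sink `candidate` into `sink_target`; users in other blocks get a clone.

    `candidate` is a dict with 'name' and 'block'; `users` maps a user id to
    the block it lives in. Returns (sunk candidate, clone or None, and a map
    from user to the value it now uses). Illustrative model only.
    """
    outside = [u for u, blk in users.items() if blk != sink_target]
    clone = None
    if outside:
        # Keep a uniform copy in the original block for outside users.
        clone = {"name": candidate["name"] + ".clone",
                 "block": candidate["block"]}
    rewritten = {u: (clone if blk != sink_target else candidate)
                 for u, blk in users.items()}
    candidate["block"] = sink_target  # sink the original in place
    return candidate, clone, rewritten

cand = {"name": "%gep", "block": "header"}
sunk, clone, uses = sink_with_duplication(
    cand, "sink.bb", {"u1": "sink.bb", "u2": "exit"})
assert sunk["block"] == "sink.bb"
assert clone is not None and uses["u2"] is clone and uses["u1"] is sunk
```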
-
Xiang1 Zhang authored
[X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm. Differential Revision: https://reviews.llvm.org/D109739
-
Alex Zinenko authored
Create a new document that explain both stages of the process in a single place, merge and deduplicate the content from the two previous documents. Also extend the documentation to account for the recent changes in pass structure due to standard dialect splitting and translation being more flexible. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D109605
-
Tobias Gysi authored
Update the doc due to recent path changes and point to a helper script.
-
Amara Emerson authored
(G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD X, Y), C)

Improves CTMark -Os on AArch64:

```
Program             before    after    diff
sqlite3             286932   287024    0.0%
kc                  432512   432508   -0.0%
SPASS               412788   412764   -0.0%
pairlocalalign      249460   249416   -0.0%
bullet              475740   475512   -0.0%
7zip-benchmark      568864   568356   -0.1%
consumer-typeset    419088   418648   -0.1%
tramp3d-v4          367628   367224   -0.1%
clamscan            383184   382732   -0.1%
lencod              430028   429284   -0.2%
Geomean difference                    -0.1%
```

Differential Revision: https://reviews.llvm.org/D109528
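The reassociation relies on addition being associative and commutative: (x + c) + y and (x + y) + c compute the same address, but hoisting the constant c outward lets it fold into addressing modes or be shared between accesses. A plain-integer sketch (ignoring the pointer/offset typing and wrap flags that GlobalISel must still respect):

```python
def before(x, c, y):
    return (x + c) + y   # G_PTR_ADD (G_PTR_ADD x, c), y

def after(x, c, y):
    return (x + y) + c   # G_PTR_ADD (G_PTR_ADD x, y), c

# Same address for any base x, constant offset c, and variable offset y.
assert all(before(x, c, y) == after(x, c, y)
           for x in range(0, 64, 7)
           for c in (-16, 4, 32)
           for y in range(-8, 9))
```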
-
Markus Lavin authored
Added '-print-pipeline-passes' printing of parameters for those passes declared with the *_WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside *_WITH_PARAMS, as in a few cases there appear to be additional parameters that are not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def):

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
-
serge-sans-paille authored
This check should ensure we don't reproduce the problem fixed by 02df443d. More accurately, it checks every llvm::Any::TypeId symbol in libLLVM-x.so and makes sure they have weak linkage and are not local to the library; a local copy would lead to duplicate definitions if another weak version of the symbol were defined in another linked library. Differential Revision: https://reviews.llvm.org/D109252
-
Esme-Yi authored
Summary: This patch implements parsing sections for obj2yaml on AIX. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D98003
-
Hongtao Yu authored
Invalid frame addresses exist in call stack samples due to bad unwinding. This can happen with frame-pointer-based unwinding when callee functions do not have the frame pointer chain set up. It isn't common when the program is built with frame pointer omission disabled, but can still happen with third-party static libs built with the frame pointer omitted. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109638
-
Mehdi Amini authored
This reverts commit 81f8ad17. This seems to break the shared libs build (linaro-flang-aarch64-sharedlibs bot) with:

undefined reference to `Fortran::semantics::IsCoarray(Fortran::semantics::Symbol const&)` (from tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/tools.cpp.o) when linking lib/libFortranEvaluate.so.14git
-
Mehdi Amini authored
This seems in line with the intent and how we build tools around it. Update the description for the flag accordingly. Also use an injected thread pool in MLIROptMain; now we will create threads up-front and reuse them across split buffers. Differential Revision: https://reviews.llvm.org/D109802
-
cwz920716 authored
Both copy/alloc ops are using memref dialect after this change. Reviewed By: silvas, mehdi_amini Differential Revision: https://reviews.llvm.org/D109480
-
LLVM GN Syncbot authored
-
Nico Weber authored
This reverts commit 49992c04. The test is still failing on Windows, see comments on https://reviews.llvm.org/D108893
-
Philip Reames authored
-
Matt Arsenault authored
The fmul is a canonicalizing operation and fneg is not, so this would break denormals that need flushing and also would not quiet signaling NaNs. Fold to fsub instead, which is also canonicalizing.
-
Hongtao Yu authored
Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D109531
-
Thomas Lively authored
Rather than depending on the hex dump from obj2yaml. Now the test shows the expected function body in a human readable format. Differential Revision: https://reviews.llvm.org/D109730
-
Matt Arsenault authored
This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly.

The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll, which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.
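A hedged sketch of the adjustment (the function and parameter names are made up; the real allocator queries the target for its reserved registers): the heuristic switch should compare the estimated live range length against the registers that can actually be allocated, not the raw class size.

```python
def use_short_range_heuristic(live_range_len, regs_in_class, reserved_regs):
    """Pick the heuristic by comparing range length to *available* registers.

    Using `regs_in_class` alone overestimates availability on targets like
    AMDGPU that dynamically reserve registers to satisfy occupancy
    constraints. Illustrative model only, not the actual allocator code.
    """
    available = regs_in_class - reserved_regs
    return live_range_len < available

# With 102 of 256 registers dynamically reserved, a live range of length
# 180 no longer looks comfortably allocatable.
assert use_short_range_heuristic(180, 256, 0)
assert not use_short_range_heuristic(180, 256, 102)
```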
-
Fangrui Song authored
The last user has been removed from llvm-zorg for Android.
-
Matthias Springer authored
Do not generate FillOps when these would be entirely overwritten. Differential Revision: https://reviews.llvm.org/D109741
-
Matt Arsenault authored
ConstantOffsetExtractor::Find was infinitely recursing on the add referencing itself.
-
Matt Arsenault authored
-