- Mar 15, 2021
-
Hongtao Yu authored
Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98439
-
- Mar 14, 2021
-
Chenguang Wang authored
The current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93927
-
Simonas Kazlauskas authored
This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check that the underlying objects are both the same and therefore this transformation wouldn't incur a provenance change, if applied. https://alive2.llvm.org/ce/z/SYF_yv Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98588
-
Luo, Yuanke authored
The load/store instructions will be transformed to AMX intrinsics in the AMX type lowering pass. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D98247
-
- Mar 13, 2021
-
Nikita Popov authored
This fixes a regression from the MemDep-based implementation: MemDep completely ignores lifetime.start intrinsics that aren't MustAlias -- this is probably unsound, but it does mean that the MemDep-based implementation successfully eliminated memcpy's from lifetime.start even if the memcpy happens at an offset, rather than the base address of the alloca. Add a special case for when the lifetime.start spans the whole alloca (which is pretty much the only kind of lifetime.start that frontends ever emit), as we don't need to figure out the exact aliasing relationship in that case: the whole alloca is dead prior to the call. If this doesn't cover all practically relevant cases, then it would be possible to make use of the recently added PartialAlias clobber offsets to make this more precise.
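A minimal LLVM IR sketch of the kind of pattern involved (my own example, not the patch's test case), assuming the usual semantics that an alloca's contents are undefined right after lifetime.start:

```llvm
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @copy_from_dead(i8* %dst) {
  %a = alloca [32 x i8]
  %base = getelementptr inbounds [32 x i8], [32 x i8]* %a, i64 0, i64 0
  ; lifetime.start spans the whole 32-byte alloca
  call void @llvm.lifetime.start.p0i8(i64 32, i8* %base)
  ; the memcpy reads at an offset into the alloca, whose contents are still
  ; undefined, so the copy can be eliminated
  %off = getelementptr inbounds [32 x i8], [32 x i8]* %a, i64 0, i64 8
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %off, i64 8, i1 false)
  ret void
}
```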
-
Sanjay Patel authored
The structure of this fold is suspect vs. most of instcombine because it creates instructions and tries to delete them immediately after. If we don't have the operand types for the icmps, then we are not behaving as assumed. And as shown in PR49475, we can inf-loop.
-
Roman Lebedev authored
The added test case crashes before this fix:
```
opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed.
```
This is fully analogous to the previous commit, with the pointer constant replaced by something non-null. The comparison here can be strength-reduced, but the second operand of the comparison happens to be identical to the constant pointer in the `catch` case of the `landingpad`. While LSRInstance::CollectLoopInvariantFixupsAndFormulae() already gave up on uses in blocks ending up with EH pads, it didn't consider this case. Eventually, `LSRInstance::AdjustInsertPositionForExpand()` will be called, but the original insertion point it will get is the user instruction itself, and it doesn't want to deal with EH pads, and asserts as much. It would seem that this basically never happens in the wild, otherwise it would have been reported already, so it seems safe to take the cautious approach and just not deal with such users.
-
Nikita Popov authored
Rather than checking for simple equality, check for MustAlias, as we do in other transforms. This catches equivalent GEPs.
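An illustrative sketch (my own example, not from the patch) of two GEPs that are distinct Values but must-alias, which a plain equality check would miss:

```llvm
define void @equivalent_geps(i8* %p) {
  ; both GEPs compute %p + 8, but they are different Value*s, so a simple
  ; equality check does not treat them as the same location; AA reports MustAlias
  %g1 = getelementptr inbounds i8, i8* %p, i64 8
  %q = bitcast i8* %p to i64*
  %g2 = getelementptr inbounds i64, i64* %q, i64 1
  store i8 0, i8* %g1
  %g2.i8 = bitcast i64* %g2 to i8*
  store i8 1, i8* %g2.i8
  ret void
}
```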
-
Nikita Popov authored
If a memset destination is overwritten by a memcpy and the sizes are exactly the same, then the memset is simply dead. We can directly drop it, instead of replacing it with a memset of zero size, which is particularly ugly for the case of a dynamic size.
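A minimal sketch of the pattern described, with a dynamic size %n (hypothetical example, not the patch's test):

```llvm
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @f(i8* %dst, i8* %src, i64 %n) {
  ; the memset writes exactly the bytes the following memcpy overwrites,
  ; so it is dead and can simply be dropped
  call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 %n, i1 false)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %n, i1 false)
  ret void
}
```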
-
- Mar 12, 2021
-
Wei Mi authored
".llvm." suffix". The recommit fixed a bug that symbols with "." at the beginning is not properly handled in the last commit. Original commit message: Currently IndirectCallPromotion simply strip everything after the first "." in LTO mode, in order to match the symbol name and the name with ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and thinlto are both enabled, the name may have both ".__uniq." suffix and ".llvm." suffix, and the current mechanism will strip them both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389
-
Nikita Popov authored
This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.
-
Florian Hahn authored
This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981
-
Sanjay Patel authored
The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.
-
Hans Wennborg authored
This broke the check-profile tests on Mac, see comment on the code review.
> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325
This reverts commit c7712087.
Also reverting the dependent follow-up commit:
Revert "[InstrProfiling] Generate runtime hook for ELF platforms"
> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a75, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061
This reverts commit 87fd09b2.
-
Serguei Katkov authored
As readnone functions they become movable and LICM can hoist them out of a loop. As a result, in LCSSA form a phi node of type token is created. Nothing is prepared for the first operand of a GCRelocate being a phi node; it is expected to be a token. The GVN test was also updated; it seems it does not do what is expected. A test for LICM is also added. This reverts commit f352463a.
-
Johannes Doerfert authored
Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125
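A hedged sketch of the idea (function name and shape are my own): a function that is both readonly and mustprogress can be inferred willreturn, since it has no side effects and cannot loop forever without producing any.

```llvm
; readonly: no side effects; mustprogress: no infinite side-effect-free loops.
; Together these allow deriving willreturn for @sum.
define i32 @sum(i32* %p, i32 %n) mustprogress readonly {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %gep = getelementptr inbounds i32, i32* %p, i32 %i
  %v = load i32, i32* %gep
  %acc.next = add i32 %acc, %v
  %i.next = add i32 %i, 1
  %done = icmp sge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %acc.next
}
```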
-
- Mar 11, 2021
-
Nikita Popov authored
Splitting this out as the change is non-trivial: The way this code handled pointer types doesn't really make sense, as GEPs can only apply an offset to the outermost pointer, but can't drill down into interior pointer types (which would require dereferencing memory). Instead give special treatment to the first (pointer) index. I've hardcoded it to zero as that's the only way the function is used right now, but handling non-zero indexes would be straightforward. The original goal here was to have an element type for CreateGEP.
-
Petr Hosek authored
When using -fprofile-list to selectively apply instrumentation only to certain files or functions, we may end up with a binary that doesn't have any counters in the case where no files were selected. However, because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the runtime would still be pulled in and incur some non-trivial overhead, especially in the case when the continuous or runtime counter relocation mode is being used. A better way would be to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. This approach was already used prior to 9a041a75, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation. Since TAPI is only used on Mach-O platforms, we could use the early emission of __llvm_profile_runtime there, and on other platforms we could change back to the earlier approach where the symbol is generated later only when needed. We can stop passing -u__llvm_profile_runtime to the linker on Linux and Fuchsia since the generated undefined symbol in each translation unit that needed it serves the same purpose. Differential Revision: https://reviews.llvm.org/D98061
-
Valery N Dmitriev authored
The associative reduction matcher in SLP begins with a select instruction, but when it reached a call to llvm.umax (or similar) via the def-use chain, the latter was also matched as the UMax kind. The routine's later code assumes the matched instruction to be a select and thus simply died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432
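A small sketch (my own, not the reproducer) of the two max forms involved: the matcher starts from the select-based pattern, and following the def-use chain it can reach the intrinsic form, which was also classified as a UMax reduction even though later code assumes a select.

```llvm
declare i32 @llvm.umax.i32(i32, i32)

define i32 @max3(i32 %a, i32 %b, i32 %c) {
  ; select-based unsigned max: the form the reduction matcher starts from
  %cmp = icmp ugt i32 %a, %b
  %max.ab = select i1 %cmp, i32 %a, i32 %b
  ; intrinsic form reached via the def-use chain; it was also matched as a
  ; UMax reduction kind, but the routine's later code expects a select
  %max.abc = call i32 @llvm.umax.i32(i32 %max.ab, i32 %c)
  ret i32 %max.abc
}
```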
-
Wenlei He authored
For CGSCC inline, we need to scale down a function's branch weights and entry counts when it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weights for the inlined clone based on the call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using a context profile that is separate from the inlinee's own profile. This change skips the inlinee profile scaling for sample loader inlining. Differential Revision: https://reviews.llvm.org/D98187
-
Hiroshi Yamauchi authored
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (e.g. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592
-
Stephen Tozer authored
Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit c0f3dfb9. Reverted due to an error on the clang-x64-windows-msvc buildbot.
-
Nikita Popov authored
Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.
-
gbtozers authored
This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722
-
Nikita Popov authored
Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop-invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop-invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in the test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264
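For context, a minimal sketch of a promotable store (hypothetical, not from the patch): the pointer is loop invariant and its must-alias set is only accessed inside the loop, so LICM can keep the value in a register and emit a single store after the loop.

```llvm
define void @accumulate(i32* noalias %sum, i32* %a, i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %gep = getelementptr inbounds i32, i32* %a, i32 %i
  %v = load i32, i32* %gep
  ; %sum is loop invariant; this load/store pair can be promoted to a phi,
  ; with one store of the final value in the exit block
  %cur = load i32, i32* %sum
  %new = add i32 %cur, %v
  store i32 %new, i32* %sum
  %i.next = add i32 %i, 1
  %done = icmp sge i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```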
-
Djordje Todorovic authored
By using the original-di check with debugify in combination with llvm/utils/llvm-original-di-preservation.py, it becomes a very user-friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546
-
Petr Hosek authored
This is no longer needed, we can add __llvm_profile_runtime directly to llvm.compiler.used or llvm.used to achieve the same effect. Differential Revision: https://reviews.llvm.org/D98325
-
Ruiling Song authored
This is useful for debugging which pointers are updated during the remapping process. Differential Revision: https://reviews.llvm.org/D95775
-
- Mar 10, 2021
-
kuterd authored
This patch makes use of the context bridges introduced in D83299 to make AAValueConstantRange call-site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744
-
Mauri Mustonen authored
Add support for widening select instructions in the VPlan native path by using a correct recipe when such instructions are encountered. This is already used by the inner loop vectorizer. Previously, select instructions were handled by the wrong recipe, resulting in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136
-
Matteo Favaro authored
The isOverwrite function is making sure to identify if two stores are fully overlapping, and ideally we would like to identify all the instances of OW_Complete as they'll yield possibly killable stores. The current implementation is incapable of spotting instances where the earlier store is offset compared to the later store, but still fully overlapped. The limitation seems to lie in the computation of the base pointers with the GetPointerBaseWithConstantOffset API that often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis is returning AliasResult::PartialAlias). The patch relies on the offsets computed and cached by BatchAAResults (available after D93529) to determine if the offset overlap is OW_Complete. Differential Revision: https://reviews.llvm.org/D97676
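A minimal sketch of the situation (my own example): the earlier store is at an offset from the later store's base pointer, yet is completely overwritten, which the offset-based check can now recognize as OW_Complete.

```llvm
define void @f([2 x i32]* %p) {
  ; earlier store writes bytes 4..7 of %p
  %hi = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 1
  store i32 1, i32* %hi
  ; later store writes bytes 0..7 of %p, fully covering the earlier store,
  ; even though the two base pointers differ (PartialAlias with known offsets)
  %whole = bitcast [2 x i32]* %p to i64*
  store i64 0, i64* %whole
  ret void
}
```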
-
Sriraman Tallam authored
D96109 was recently submitted which contains the refactored implementation of -funique-internal-linkage-names by adding the unique suffixes in clang rather than as an LLVM pass. Deleting the former implementation in this change. Differential Revision: https://reviews.llvm.org/D98234
-
gbtozers authored
This patch refactors out the salvaging of GEP and BinOp instructions into separate functions, in preparation for further changes to the salvaging of these instructions coming in another patch; there should be no functional change as a result of this refactor. Differential Revision: https://reviews.llvm.org/D92851
-
Daniil Seredkin authored
[InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in the optimizePow function. See: https://bugs.llvm.org/show_bug.cgi?id=47613 There was an extra sqrt call because shrinking emitted a new powf and at the same time optimizePow replaces the previous pow with sqrt, so we end up with two instructions in the InstCombine worklist, despite the fact that %powf is not used by anyone (it is alive because of errno). As a result we have two instructions: %powf = call fast float @powf(float %x, float 5.000000e-01) %sqrt = call fast double @sqrt(double %dx) %powf will be converted to %sqrtf on a later iteration. As a quick fix for that, I moved shrinking to the end of optimizePow so that pow is replaced with sqrt first, which avoids emitting a new shrunk powf. Differential Revision: https://reviews.llvm.org/D98235
-
Jianzhou Zhao authored
This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D98268
-
Dávid Bolvanský authored
Follow up for fhahn's D98284. Also fixes a case from PR47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98346
-
Florian Hahn authored
Currently DSE misses cases where the size is a non-const IR value, even if the sizes match. For example, this means that llvm.memcpy/llvm.memset calls are not eliminated, even if they write the same number of bytes. This patch extends isOverwrite to try to get IR values for the number of bytes written from the analyzed instructions. If the values match, alias checks are performed and the result is returned. At the moment this only covers llvm.memcpy/llvm.memset. In the future, we may enable MemoryLocation to also track variable sizes, but this simple approach should allow us to cover the important cases in DSE. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98284
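A hedged sketch of the kind of case now handled (my example, not the patch's test): two memcpys writing the same non-constant number of bytes to the same destination, where the earlier one is dead.

```llvm
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

define void @f(i8* noalias %dst, i8* noalias %a, i8* noalias %b, i64 %n) {
  ; both calls write exactly %n bytes starting at %dst; the first memcpy is
  ; dead even though %n is not a compile-time constant
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %a, i64 %n, i1 false)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %b, i64 %n, i1 false)
  ret void
}
```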
-
Wei Mi authored
Now that the -funique-internal-linkage-name flag is available, we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regressions caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from a binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression. To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing a uniq suffix. The compiler will decide whether to keep the uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for the extbinary format. For other formats, by default the compiler will keep the uniq suffix, so they will only experience transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we missed calling getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932
-
Philip Reames authored
-