- Oct 23, 2020
-
-
Arthur Eubanks authored
An alwaysinline function may not get inlined in inliner-wrapper due to the inlining order.

Previously, for the following, the inliner would first inline @a() into @b():

```llvm
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending at:

```llvm
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running the always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988
-
Han Shen authored
This reverts commit adfb5415. This is reverted because it caused a Chrome error: https://crbug.com/1140168
-
Wei Mi authored
Move some common section reader/writer code to their parent classes. SampleProfileReaderExtBinary/SampleProfileWriterExtBinary specify the typical section layout currently used by SampleFDO. Currently a lot of section reader/writer code lives in these two classes. However, as we expect to have more types of SampleFDO profiles, we would like those new profile types to share the common sections while configuring their own sections easily with minimal change. That is why I move some common code from SampleProfileReaderExtBinary/SampleProfileWriterExtBinary to SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase, so new profile classes inheriting from the base classes can reuse it. Differential Revision: https://reviews.llvm.org/D89524
-
Jessica Paquette authored
Move the code which adjusts the immediate/predicate on a G_ICMP to AArch64PostLegalizerLowering.

This:

- Reduces the number of places we need to test for optimized compares in the selector. We know that the compare should have been simplified by the time it hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization was only done after calling `tryFoldCompare`; this may allow us to hit some more TST cases).
- Simplifies the selection code in `emitIntegerCompare` significantly; we can just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after `emitIntegerCompare`.

Also add a utility header file for things that may be useful in the selector and various combiners. No need for an implementation file at this point, since it's just one constexpr function for now. I've run into a couple of cases where having one of these would be handy, so might as well add it here. There are a couple of functions in the selector that can probably be factored out into here.

Differential Revision: https://reviews.llvm.org/D89823
-
- Oct 22, 2020
-
-
Jessica Paquette authored
There are a lot of combines in AArch64PostLegalizerCombiner which exist to facilitate instruction matching in the selector (e.g. matching for G_ZIP and other shuffle vector pseudos). It still makes sense to select these instructions at -O0. Matching earlier in a combiner can reduce complexity in the selector significantly. For example, a good portion of our selection code for compares would be a lot easier to represent in a combine. This patch moves matching combines into an "AArch64PostLegalizerLowering" combiner which runs at all optimization levels. Also, while we're here, improve the documentation for the AArch64PostLegalizerCombiner, fix up the filepath in its file comment, and add an 'r' which somehow got dropped from a bunch of function names. https://reviews.llvm.org/D89820
-
Nikita Popov authored
Per asbirlea's comment, assert that only instructions, constants and arguments are passed to this API. Simply returning true would not be correct for special Value subclasses like MemoryAccess.
-
Nikita Popov authored
Visited phi blocks only need to be added for the duration of the recursive alias queries, they should not leak into following code. Once again, while this also improves analysis precision, this is mainly intended to clarify the applicability scope of VisitedPhiBBs.
-
Nikita Popov authored
We only need the VisitedPhiBBs to disambiguate comparisons of values from two different loop iterations. If we're comparing two phis from the same basic block in lock-step, the compared values will always be on the same iteration. While this also increases precision, this is mainly intended to clarify the scope of VisitedPhiBBs.
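A minimal sketch of the lock-step case this change is about (function and value names are illustrative, not taken from the patch): %p and %q are phis in the same basic block, so an alias query between them never compares values from two different loop iterations.

```llvm
define i8 @f(i8* %a, i8* %b, i64 %n) {
entry:
  br label %loop

loop:
  ; %p and %q are phis in the same basic block: they advance in
  ; lock-step, so a comparison between them always sees values from
  ; the same iteration.
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = phi i8* [ %a, %entry ], [ %p.next, %loop ]
  %q = phi i8* [ %b, %entry ], [ %q.next, %loop ]
  %p.next = getelementptr i8, i8* %p, i64 1
  %q.next = getelementptr i8, i8* %q, i64 1
  %i.next = add i64 %i, 1
  %cmp = icmp ult i64 %i.next, %n
  br i1 %cmp, label %loop, label %exit

exit:
  %v = load i8, i8* %p
  ret i8 %v
}
```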
-
Venkataramanan Kumar authored
Differential Revision: https://reviews.llvm.org/D88154
-
Vedant Kumar authored
This reverts commit 26ee8aff. It's necessary to insert a bitcast of the pointer operand of a lifetime marker if it has an opaque pointer type. rdar://70560161
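For context, a minimal sketch (illustrative, not from the patch): the lifetime intrinsics take an i8* operand, so an alloca of any other pointer type needs a bitcast before it can be passed to the marker.

```llvm
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)

define void @f() {
  %s = alloca { i32, i32 }
  ; The lifetime intrinsic takes i8*, so the non-i8* alloca must be
  ; bitcast before it can be used as the marker's pointer operand.
  %p = bitcast { i32, i32 }* %s to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %p)
  ret void
}
```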
-
Arthur Eubanks authored
Some clang tests use this. Reviewed By: akhuang Differential Revision: https://reviews.llvm.org/D89931
-
David Blaikie authored
Testing reveals that lldb and gdb have some problems with supporting DW_OP_convert - gdb with Split DWARF tries to resolve the CU-relative DIE offset relative to the skeleton DIE. lldb tries to treat the offset as absolute, which judging by the llvm-dsymutil support for DW_OP_convert, I guess works OK in MachO? (though probably llvm-dsymutil is producing invalid DWARF by resolving the relative reference to an absolute one?)

Specifically this disables DW_OP_convert usage in DWARFv5 if:

- Tuning for GDB and using Split DWARF
- Tuning for LLDB and not targeting MachO
-
Layton Kifer authored
Delete the duplicate implementation getSelectFoldableConstant and replace it with ConstantExpr::getBinOpIdentity. Differential Revision: https://reviews.llvm.org/D89839
-
Nikita Popov authored
When performing a call slot optimization to a GEP destination, it will currently usually fail, because the GEP is directly before the memcpy and as such does not dominate the call. We should move it above the call if that satisfies the domination requirement. I think that a constant-index GEP is the only useful thing to move here, as otherwise isDereferenceablePointer couldn't look through it anyway. As such I'm not trying to generalize this further. Differential Revision: https://reviews.llvm.org/D89623
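A minimal sketch of the situation (names illustrative, not taken from the patch's tests): the GEP forming the memcpy destination is created right before the memcpy, so it does not dominate the call that fills the temporary.

```llvm
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)
declare void @fill(i8* nocapture)

define void @f([16 x i8]* %out) {
  %tmp = alloca [16 x i8]
  %tmp.p = getelementptr [16 x i8], [16 x i8]* %tmp, i64 0, i64 0
  ; The callee writes into the temporary...
  call void @fill(i8* %tmp.p)
  ; ...and the result is copied to a GEP destination. %dst does not
  ; dominate the call to @fill; hoisting this constant-index GEP above
  ; the call lets call slot optimization write into %out directly.
  %dst = getelementptr [16 x i8], [16 x i8]* %out, i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %tmp.p, i64 16, i1 false)
  ret void
}
```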
-
Ettore Tiotto authored
Make member functions const where possible, use LLVM_DEBUG to print debug traces rather than a custom option, pass by reference to avoid null checking, ... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D89895
-
Vedant Kumar authored
When InstCombine removes an alloca, it erases the dbg.{addr,declare} instructions which refer to the alloca. It would be better to instead remove all debug intrinsics which describe the contents of the dead alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values before CallInst users of an alloca. The motivating example looks like:

```llvm
define void @foo(i32 %0) {
  %a = alloca i32                           ; This alloca is erased.
  store i32 %0, i32* %a
  dbg.value(i32 %0, "arg0")                 ; This dbg.value survives.
  dbg.value(i32* %a, "arg0", DW_OP_deref)
  call void @trivially_inlinable_no_op(i32* %a)
  ret void
}
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef) after inlining, making "arg0" unavailable. But we already have dbg.value descriptions of the alloca's value (from LowerDbgDeclare), so the DW_OP_deref dbg.value cannot serve its purpose of describing an initialization of the alloca by some callee. It invalidates other useful dbg.values, causing large gaps in location coverage, so we should delete it (even though doing so may cause stale dbg.values to appear, if there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an alloca. Note that it's possible to describe a variable that takes on different pointer values, e.g.:

```
void use(int *);
void t(int a, int b) {
  int *local = &a;   // dbg.value(i32* %a.addr, "local")
  local = &b;        // dbg.value(i32* undef, "local")
  use(&a);           //   (note: %b.addr is optimized out)
  local = &a;        // dbg.value(i32* %a.addr, "local")
}
```

In this example, the alloca for "b" is erased, but we need to describe the value of "local" as <unavailable> before the call to "use". This prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
-
Nikita Popov authored
Non-instruction defs like arguments, constants or global values always dominate all instructions/uses inside the function. This case currently needs to be treated separately by the caller, see https://reviews.llvm.org/D89623#inline-832818 for an example. This patch makes the dominator tree APIs accept a Value instead of an Instruction and always returns true for the non-Instruction case. A complication here is that BasicBlocks are also Values. For that reason we can't support the dominates(Value *, BasicBlock *) variant, as it would conflict with dominates(BasicBlock *, BasicBlock *), which has different semantics. For the other two APIs we assert that the passed value is not a BasicBlock. Differential Revision: https://reviews.llvm.org/D89632
-
Tim Corringham authored
Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU-specific unroll threshold value to be specified on a loop-by-loop basis. The intention is to allow more nuanced hints, e.g. specifying a low threshold value to indicate that a loop may be unrolled if cheap enough, rather than using the all-or-nothing llvm.loop.unroll.disable metadata. Differential Revision: https://reviews.llvm.org/D84779
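A hedged sketch of how such a hint might look in IR, assuming it follows the usual llvm.loop.* metadata encoding (a string name plus an i32 operand, as with llvm.loop.unroll.count; the exact spelling should be confirmed against the patch's tests):

```llvm
define void @loop(i32* %p, i32 %n) {
entry:
  br label %body

body:
  %i = phi i32 [ 0, %entry ], [ %i.next, %body ]
  %q = getelementptr i32, i32* %p, i32 %i
  store i32 %i, i32* %q
  %i.next = add i32 %i, 1
  %cmp = icmp slt i32 %i.next, %n
  ; Attach the hint to the latch branch, as with other llvm.loop.* hints.
  br i1 %cmp, label %body, label %exit, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
; Assumed encoding of the new AMDGPU-specific threshold hint.
!1 = !{!"amdgpu.loop.unroll.threshold", i32 100}
```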
-
Mircea Trofin authored
Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912
-
Arthur Eubanks authored
It was already disabled under -Oz in buildFunctionSimplificationPipeline(), but not in buildModuleOptimizationPipeline()/addPGOInstrPasses(). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D89927
-
Piotr Sobczak authored
This commit marks i16 MULH as expand in AMDGPU backend, which is necessary after the refactoring in D80485. Differential Revision: https://reviews.llvm.org/D89965
-
Evgeny Leviant authored
Differential revision: https://reviews.llvm.org/D89939
-
Simon Pilgrim authored
Reported by cppcheck
-
Simon Pilgrim authored
Avoid unnecessary copy in X86AsmParser::ParseIntelOperand
-
Jeremy Morse authored
Both FastRegAlloc and LiveDebugVariables/greedy need to cope with DBG_INSTR_REFs. Neither actually needs to take any action, other than passing DBG_INSTR_REFs through: variable location information doesn't refer to any registers at this stage. LiveDebugVariables stashes the instruction information in a tuple, then re-creates it later. This is only necessary as the register allocator doesn't expect to see any debug instructions while it's working. No equivalence classes or interval splitting is required at all! No changes are needed for the fast register allocator, as it just ignores debug instructions. The test added checks that both of them preserve DBG_INSTR_REFs. This also expands ScheduleDAGInstrs.cpp to treat DBG_INSTR_REFs the same as DBG_VALUEs when moving instructions around. The current movement of DBG_VALUEs is less than ideal, but it's not a regression to make DBG_INSTR_REFs subject to the same movement. Differential Revision: https://reviews.llvm.org/D85757
-
Matt Arsenault authored
The VGPRs used for SGPR spills need to be reserved, even if we aren't speculatively reserving one. This was broken by 117e5609.
-
Matt Arsenault authored
We don't support funclets for exception handling and I hit this when manually reducing MIR.
-
Matt Arsenault authored
If the end instruction of the scheduling region was a DBG_VALUE, the uses of the debug instruction were tracked as if they were real uses. This would then hit the deadDefHasNoUse assertion in addVRegDefDeps if the only use was the debug instruction.
-
Jon Chesterfield authored
[OpenMP] Emit calls to int64_t functions for amdgcn

Two functions, syncwarp and active_thread_mask, return lanemask_t. Currently this is assumed to be int32, which is true for nvptx. This patch makes the type target-architecture dependent. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89746
-
Simon Pilgrim authored
Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger. All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.
-
Jeremy Morse authored
This patch touches two optimizations, TwoAddressInstruction and X86's FixupLEAs pass, both of which optimize by re-creating instructions. For LEAs, various bits of arithmetic are better represented as LEAs on X86, while TwoAddressInstruction sometimes converts instrs into three address instructions if it's profitable. For debug instruction referencing, both of these require substitutions to be created -- the old instruction number must be pointed to the new instruction number, as illustrated in the added test. If this isn't done, any variable locations based on the optimized instruction are conservatively dropped. Differential Revision: https://reviews.llvm.org/D85756
-
Max Kazantsev authored
-
Max Kazantsev authored
This better reflects what this variable is about.
-
Tianqing Wang authored
For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301
-
Max Kazantsev authored
This better reflects what this logic actually does.
-
Max Kazantsev authored
This reverts commit 3fce5ea7, as it broke `make check`.
-
Sjoerd Meijer authored
This improves simplifications for the pattern `icmp (X+Y), (X+Z)` -> `icmp Y, Z` if only one of the operands has NSW set, e.g.:

```
icmp slt (x + 0), (x +nsw 1)
```

We can still safely rewrite this to:

```
icmp slt 0, 1
```

because we know that the LHS can't overflow if the RHS has NSW set and C1 < C2 && C1 >= 0, or C2 < C1 && C1 <= 0.

This simplification is useful because ScalarEvolutionExpander, which is used to generate code for SCEVs in different loop optimisers, is not always able to put back NSW flags across control-flow, thus inhibiting CFG simplifications.

Differential Revision: https://reviews.llvm.org/D89317
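A self-contained sketch of the pattern (illustrative, not taken from the patch's tests): only the RHS add carries nsw, yet the compare can still fold.

```llvm
define i1 @cmp(i32 %x) {
  ; The LHS adds 0 (no flags needed); the RHS adds 1 with nsw. Since
  ; the RHS cannot overflow and 0 <= 1, the comparison can be
  ; simplified to icmp slt i32 0, 1.
  %lhs = add i32 %x, 0
  %rhs = add nsw i32 %x, 1
  %c = icmp slt i32 %lhs, %rhs
  ret i1 %c
}
```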
-
Fangrui Song authored
findNearestCommonDominator never returns nullptr.
-
Jonas Devlieghere authored
Make these types conform to the LLVM Coding Standards: > Type names (including classes, structs, enums, typedefs, etc) should > be nouns and start with an upper-case letter.
-
Tony authored
- Make the SIMemoryLegalizer insertAcquire functions be in the same order for each target, for consistency. Differential Revision: https://reviews.llvm.org/D89880
-