- May 21, 2020
-
-
Craig Topper authored
[LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers. This will make it easier to pass the pointer info and alignment correctly to the loads/stores. While there, also make the i32 stores independent and use a token factor to join them before the load.
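For context, ExpandLegalINT_TO_FP lowers a signed i32 to f64 conversion by building a double in a stack slot: the high word holds the constant 0x43300000 (so the value is 2^52 + lo), the low word holds the integer with its sign bit flipped, and the bias is then subtracted. The sketch below is a plain C++ model of that trick, assuming IEEE-754 binary64 doubles; the function names are hypothetical. It keeps both store offsets fixed and swaps which word goes where based on endianness, which is the shape of the change described above.

```cpp
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
bool hostIsLittleEndian() {
  const uint16_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Models the signed i32 -> f64 expansion; assumes IEEE-754 binary64 doubles.
double expandSIntToFP(int32_t x) {
  // Flipping the sign bit maps [-2^31, 2^31) onto [0, 2^32).
  const uint32_t lo = static_cast<uint32_t>(x) ^ 0x80000000u;
  // A high word of 0x43300000 sets the exponent so that the 64-bit pattern
  // encodes exactly 2^52 + lo.
  const uint32_t hi = 0x43300000u;

  // Two independent 32-bit stores to an 8-byte slot. The offsets (0 and 4)
  // never change; endianness only decides which word goes to which offset.
  uint32_t slot[2];
  const bool little = hostIsLittleEndian();
  slot[0] = little ? lo : hi;
  slot[1] = little ? hi : lo;

  // Reload the slot as one double and subtract the bias 2^52 + 2^31.
  double biased = 0.0;
  std::memcpy(&biased, slot, sizeof biased);
  const double bias = 4503599627370496.0 + 2147483648.0;
  return biased - bias;
}
```

The final subtraction is exact because both operands lie in [2^52, 2^53), where every integer is representable as a double.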
-
Juneyoung Lee authored
Summary: If an induction variable is frozen and used, SCEV yields imprecise results because it doesn't say anything about frozen variables. For this reason, a performance degradation occurred after https://reviews.llvm.org/D76483 was merged: SCEV yielded imprecise results, which prevented LSR from optimizing a loop. The suggested solution here is to add a pass which canonicalizes frozen variables inside a loop. To be specific, it pushes freezes out of the loop by freezing the initial value and step values instead, and dropping nsw/nuw flags from instructions used by freeze. This solution was also mentioned at https://reviews.llvm.org/D70623 . Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert Reviewed By: fhahn Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes Tags: #llvm Differential Revision: https://reviews.llvm.org/D77523
-
Eli Friedman authored
The offsets were wrong. The result is now the same as what the compiler would generate for a function that spills lr normally. Differential Revision: https://reviews.llvm.org/D80238
-
Eli Friedman authored
If we don't know anything about the alignment of a pointer, Align(1) is still correct: all pointers are at least 1-byte aligned. Included in this patch is a bugfix for an issue discovered during this cleanup: pointers with "dereferenceable" attributes/metadata were assumed to be aligned according to the type of the pointer. This wasn't intentional, as far as I can tell, so Loads.cpp was fixed to stop making this assumption. Frontends may need to be updated. I updated clang's handling of C++ references, and added a release note for this. Differential Revision: https://reviews.llvm.org/D80072
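A minimal sketch of the reasoning, assuming alignment is tracked as a power of two in bytes: with no information about a pointer, 1 is always a valid answer, and a known base alignment combined with a constant offset yields the lowest set bit of (alignment | offset). The helper names are hypothetical; the combination rule is the same one LLVM's `MinAlign` helper computes, but this is plain C++, not the LLVM API. In this model, a `dereferenceable` attribute contributes nothing to `info`, which is the assumption the Loads.cpp fix stops making.

```cpp
#include <cstdint>
#include <optional>

// Alignment in bytes; always a power of two.
using AlignBytes = uint64_t;

// With no information about a pointer, 1 is still a correct alignment:
// every address is a multiple of 1.
AlignBytes knownAlignment(std::optional<AlignBytes> info) {
  return info.value_or(1);
}

// Alignment still guaranteed at (base + offset) when base is baseAlign-aligned:
// the lowest set bit of (baseAlign | offset).
AlignBytes alignmentAtOffset(AlignBytes baseAlign, uint64_t offset) {
  const uint64_t both = baseAlign | offset;
  return both & (~both + 1);
}

// True when address is a multiple of align.
bool isAligned(uint64_t address, AlignBytes align) {
  return address % align == 0;
}
```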
-
Francis Visoiu Mistrih authored
With the new SVE stack layout, we now need to provide a Darwin variant for all the calling conventions based on the main AAPCS CSR save order. This also changes APCS_SwiftError to have a Darwin and a non-Darwin version, assuming it could be used on other platforms these days, and restricts the AArch64_CXX_TLS calling convention to Darwin. Differential Revision: https://reviews.llvm.org/D73805
-
Stanislav Mekhanoshin authored
Even though a series of cmp/cndmask instructions can produce quite a lot of code, that is still better than a loop. In the case of doubles we would even produce two loops. Differential Revision: https://reviews.llvm.org/D80032
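Assuming, as the message suggests, that the loop being avoided is the one emitted for a dynamically indexed vector element access, the alternative is one compare and one conditional select per lane. The scalar C++ model below uses hypothetical function names and a fixed 4 x i32 vector; on the GPU each line would correspond roughly to a compare producing a lane mask and a cndmask consuming it.

```cpp
#include <array>
#include <cstdint>

// Dynamic-index extract from a 4 x i32 vector as straight-line code:
// one compare plus one select per lane, with no loop and no stack access.
uint32_t extractDynamic(const std::array<uint32_t, 4>& v, uint32_t idx) {
  uint32_t r = v[0];          // lane 0 is the fallback value
  r = (idx == 1) ? v[1] : r;  // compare idx with 1, then select
  r = (idx == 2) ? v[2] : r;  // compare idx with 2, then select
  r = (idx == 3) ? v[3] : r;  // compare idx with 3, then select
  return r;
}

// Dynamic-index insert: every lane is selected between its old value and
// the new one, based on a compare of the lane number with idx.
std::array<uint32_t, 4> insertDynamic(std::array<uint32_t, 4> v, uint32_t idx,
                                      uint32_t val) {
  v[0] = (idx == 0) ? val : v[0];
  v[1] = (idx == 1) ? val : v[1];
  v[2] = (idx == 2) ? val : v[2];
  v[3] = (idx == 3) ? val : v[3];
  return v;
}
```

Code size grows linearly with the lane count, which is the trade-off the message accepts. For 64-bit elements each lane would presumably need a select per 32-bit half, but both halves can share the same compare result.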
-
Craig Topper authored
Previously this code just used a default-constructed MachinePointerInfo. But we know the accesses are to a fixed stack object or at least somewhere on the stack. While there, fix the alignment passed to the full vector load/stores. I don't think this function is currently exercised in tree, so I don't know how to test it. I just noticed it when I removed non-constant index support in this function. Differential Revision: https://reviews.llvm.org/D80058
-
- May 20, 2020
-
-
Nico Weber authored
Demangling Itanium symbols either consumes the whole input or fails, but Microsoft symbols can be successfully demangled with just some of the input. Add an outparam that enables clients to know how much of the input was consumed, and use this flag to give llvm-undname an opt-in warning on partially consumed symbols. Differential Revision: https://reviews.llvm.org/D80173
-
Roman Lebedev authored
```
----------------------------------------
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %src
  %t1 = sub i4 0, %a
  %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
  %t3 = sub <2 x i4> %b, %t2
  ret <2 x i4> %t3
}
=>
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
  %t3 = add <2 x i4> %t2.neg, %b
  ret <2 x i4> %t3
}
Transformation seems to be correct!
```
-
Roman Lebedev authored
```
----------------------------------------
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1 = extractelement <2 x i4> %t0, i32 %y
  %t2 = sub i4 %z, %t1
  ret i4 %t2
}
=>
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1.neg = extractelement <2 x i4> %x, i32 %y
  %t2 = add i4 %t1.neg, %z
  ret i4 %t2
}
Transformation seems to be correct!
```
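Both negation rewrites above rely on negation commuting with element insertion and extraction, plus the wrapping identity b - (-t) == b + t. The sketch below brute-forces those facts for 4-bit lanes in plain C++, modeling i4 as values masked to 4 bits and trying only in-range indices. It is a sanity check under that model, not a substitute for the Alive2 results quoted above, and all helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// <2 x i4> modeled as two lanes, each kept in [0, 15].
using V2 = std::array<uint8_t, 2>;

uint8_t wrap4(int v) { return static_cast<uint8_t>(v & 0xF); }
uint8_t neg4(uint8_t v) { return wrap4(-static_cast<int>(v)); }
V2 negVec(const V2& v) { return {neg4(v[0]), neg4(v[1])}; }
V2 insertLane(V2 v, uint8_t elt, unsigned idx) { v[idx] = elt; return v; }

// Exhaustively compares the two @negate_insertelement bodies.
// Only in-range indices (0 and 1) are tried; out-of-range ones yield poison.
bool checkNegateInsertElement() {
  for (int s0 = 0; s0 < 16; ++s0)
    for (int s1 = 0; s1 < 16; ++s1)
      for (int a = 0; a < 16; ++a)
        for (int b0 = 0; b0 < 16; ++b0)
          for (int b1 = 0; b1 < 16; ++b1)
            for (unsigned x = 0; x < 2; ++x) {
              const V2 src{uint8_t(s0), uint8_t(s1)};
              const V2 b{uint8_t(b0), uint8_t(b1)};
              // Before: %t3 = sub %b, insertelement(0 - %src, 0 - %a, %x)
              const V2 t2 = insertLane(negVec(src), neg4(uint8_t(a)), x);
              const V2 before{wrap4(b[0] - t2[0]), wrap4(b[1] - t2[1])};
              // After: %t3 = add insertelement(%src, %a, %x), %b
              const V2 n = insertLane(src, uint8_t(a), x);
              const V2 after{wrap4(n[0] + b[0]), wrap4(n[1] + b[1])};
              if (before != after) return false;
            }
  return true;
}

// Exhaustively compares the two @negate_extractelement bodies.
bool checkNegateExtractElement() {
  for (int v0 = 0; v0 < 16; ++v0)
    for (int v1 = 0; v1 < 16; ++v1)
      for (int z = 0; z < 16; ++z)
        for (unsigned y = 0; y < 2; ++y) {
          const V2 vec{uint8_t(v0), uint8_t(v1)};
          // Before: %t2 = sub %z, extractelement(0 - %x, %y)
          const uint8_t before = wrap4(z - negVec(vec)[y]);
          // After: %t2 = add extractelement(%x, %y), %z
          const uint8_t after = wrap4(vec[y] + z);
          if (before != after) return false;
        }
  return true;
}
```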
-
aartbik authored
Summary: Fixes issue https://bugs.llvm.org/show_bug.cgi?id=45995 Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer Reviewed By: craig.topper Subscribers: RKSimon, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80231
-
Arthur Eubanks authored
See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer for the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like
```
struct A {
  A();
  A(A&&);
  ~A();
};

void bar() {
  foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689
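The bookkeeping described in this commit (one integer key per preallocated call, mapping to each argument's offset from the stack pointer and to the total stack adjustment) can be modeled as a small table. The sketch below is a hypothetical plain C++ stand-in for the state kept in X86MachineFunctionInfo; the type and method names are illustrative and do not match LLVM's actual class, and the sequential layout in 4-byte slots is an assumption made only for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-call record: how far the setup node moves the stack
// pointer, and where each argument lives relative to it.
struct PreallocatedCallInfo {
  uint64_t stackAdjustment = 0;
  std::vector<uint64_t> argOffsets;
};

class PreallocatedTable {
 public:
  // Call lowering: lay out the arguments and hand back the integer key.
  // Assumes a simple sequential layout in 4-byte stack slots.
  std::size_t recordCall(const std::vector<uint64_t>& argSizes) {
    PreallocatedCallInfo info;
    uint64_t offset = 0;
    for (uint64_t size : argSizes) {
      info.argOffsets.push_back(offset);
      offset += (size + 3) / 4 * 4;  // round each argument up to a slot
    }
    info.stackAdjustment = offset;
    const std::size_t key = nextKey_++;
    calls_.emplace(key, std::move(info));
    return key;
  }

  // Setup expansion: the amount to subtract from the stack pointer.
  uint64_t setupAdjustment(std::size_t key) const {
    return calls_.at(key).stackAdjustment;
  }

  // Arg expansion: the offset added to the stack pointer (the lea).
  uint64_t argOffset(std::size_t key, std::size_t argIndex) const {
    return calls_.at(key).argOffsets.at(argIndex);
  }

 private:
  std::unordered_map<std::size_t, PreallocatedCallInfo> calls_;
  std::size_t nextKey_ = 0;
};
```

Keying by a plain integer keeps the lowered nodes free of IR pointers, which matches the message's description of replacing the SrcValue with the integer index key.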
-
Arthur Eubanks authored
This reverts commit 810567dc. Some tests are unexpectedly passing
-
Hiroshi Yamauchi authored
Summary: Rename 'i' to 'I'. Factor out the optional field handling to getOptionalVal(). Split out of D79951. Reviewers: davidxl Subscribers: eraman, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80230
-
Arthur Eubanks authored
See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer for the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like
```
struct A {
  A();
  A(A&&);
  ~A();
};

void bar() {
  foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689
-
Matt Arsenault authored
This was replicating the low bits into the high bits for G_ZEXT, rather than using 0.
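For reference, G_ZEXT must fill the new high bits with zeros; copying the low bits there gives a wrong result for any nonzero input. A plain C++ illustration for a 32-to-64-bit extension, with sizes chosen only for the example and hypothetical function names; the second function models the observable result the message describes, not the actual buggy code.

```cpp
#include <cstdint>

// Correct zero extension from 32 to 64 bits: the new high half is zero.
uint64_t zext32To64(uint32_t lo) {
  return static_cast<uint64_t>(lo);
}

// Models the described bug: the low 32 bits copied into the high half.
uint64_t replicateLowBits(uint32_t lo) {
  return (static_cast<uint64_t>(lo) << 32) | lo;
}
```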
-
Pierre-vh authored
Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOverheadLoops pass so it can handle those cases. It also adds support for VCMPs before the VCTP. Differential Revision: https://reviews.llvm.org/D78206
-
Sam Parker authored
Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boilerplate code for arguments and flags into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941
-
Georgii Rymar authored
Similar to a regular section chunk, a Fill should have this property. This patch implements it. Differential revision: https://reviews.llvm.org/D80190
-
Florian Hahn authored
SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. This patch was originally committed as b8a3c34e, but broke the modules build, as LoopAccessAnalysis was using the Expander. The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537
-
Kang Zhang authored
Summary: For PowerPC, there are 3 passes that have machine verification disabled:
```
PPCTargetMachine.cpp:  addPass(&LiveVariablesID, false);
PPCTargetMachine.cpp:  addPass(createPPCEarlyReturnPass(), false);
PPCTargetMachine.cpp:  addPass(createPPCBranchSelectionPass(), false);
```
This patch enables machine verification for the above three passes. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D79840
-
Simon Pilgrim authored
Replace with forward declarations and move necessary includes down to source files. Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp
-
Jay Foad authored
This is the second attempt at landing this patch, after fixing the KeepOneInputPHIs behaviour to also keep zero input PHIs. Differential Revision: https://reviews.llvm.org/D80141
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D80256
-
QingShan Zhang authored
We have getNegatibleCost/getNegatedExpression to evaluate the cost of negating an expression and to negate it. However, while negating the expression, the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch removes getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319
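The design change amounts to reporting the cost as a by-product of building the negated expression, so the two can no longer disagree. The toy C++ sketch below shows that shape on a tiny expression type. The `Cost` enum borrows the Cheaper/Neutral/Expensive naming, but every type and function here is hypothetical and far simpler than SelectionDAG; in particular, depth limits and legality checks are not modeled.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

enum class Cost { Cheaper, Neutral, Expensive };  // Expensive unused in this toy

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

// Tiny expression tree: variables, constants, negation and subtraction.
struct Expr {
  enum Kind { Var, Const, Neg, Sub } kind = Var;
  std::string name;  // Var
  double value = 0;  // Const
  ExprPtr lhs, rhs;  // Neg uses lhs; Sub uses lhs and rhs
};

ExprPtr make(Expr e) { return std::make_shared<const Expr>(std::move(e)); }
ExprPtr var(std::string n) { Expr e; e.kind = Expr::Var; e.name = std::move(n); return make(std::move(e)); }
ExprPtr constant(double c) { Expr e; e.kind = Expr::Const; e.value = c; return make(std::move(e)); }
ExprPtr neg(ExprPtr a) { Expr e; e.kind = Expr::Neg; e.lhs = std::move(a); return make(std::move(e)); }
ExprPtr sub(ExprPtr a, ExprPtr b) { Expr e; e.kind = Expr::Sub; e.lhs = std::move(a); e.rhs = std::move(b); return make(std::move(e)); }

struct Negated {
  ExprPtr expr;
  Cost cost;
};

// One entry point: the cost describes the expression actually built, so
// there is no separate cost query that could drift out of sync.
std::optional<Negated> getNegated(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Neg:    // -(-a) -> a removes an operation
      return Negated{e->lhs, Cost::Cheaper};
    case Expr::Const:  // -(c) folds to a new constant
      return Negated{constant(-e->value), Cost::Neutral};
    case Expr::Sub:    // -(a - b) -> b - a; real FP code must also
                       // respect signed zeros, which this toy ignores
      return Negated{sub(e->rhs, e->lhs), Cost::Neutral};
    default:           // a plain variable has no free negation
      return std::nullopt;
  }
}
```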
-
Matt Arsenault authored
Relying on any MachineFunction state in the MachineFunctionInfo constructor is hazardous, because the construction time is unclear and determined by the first use. The function may be only partially constructed, which is part of why we have many of these hacky string attributes to track what we need for ABI lowering. For SelectionDAG, all stack objects are created up-front before calling convention lowering, so stack objects are visible at construction time. For GlobalISel, none of the IR function has been visited yet and the allocas haven't been added to the MachineFrameInfo yet. This should fix failing to set flat_scratch_init in GlobalISel when needed. This pass really needs to be turned into some kind of analysis, but I haven't found a nice way to use one here.
-
Matt Arsenault authored
This was looking for a compare condition, and copying the compare flags. I don't think this was ever correct outside of certain min/max patterns which aren't checked, but this probably predates select instructions having fast math flags.
-
Matt Arsenault authored
This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform. Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.
-
Matt Arsenault authored
-
Brian Cain authored
Writes to p3:0 do not produce new values, so we should bar any .new consumer from trying to use it as a producer.
-
- May 19, 2020
-
-
Matt Arsenault authored
-
Matt Arsenault authored
-
Eli Friedman authored
The handling of unwind info is broken, so disable it for now.
-
Benjamin Kramer authored
-
Lei Huang authored
Summary: Cleanup and commonize code used for spilling to the stack. Reviewers: stefanp, nemanjai, #powerpc, kamaub Reviewed By: nemanjai, #powerpc, kamaub Subscribers: kamaub, hiraditya, wuzish, shchenz, llvm-commits, kbarton Tags: #llvm, #powerpc Differential Revision: https://reviews.llvm.org/D79736
-
Thomas Lively authored
Summary: The code previously assumed the source of the bitcast in the combined pattern was a vector type, but this is not always true. This patch adds a check to avoid an assertion failure in that case. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80164
-
Thomas Lively authored
Summary: This reflects changes in the spec proposal made since basic arithmetic was first implemented. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80174
-
Jay Foad authored
Differential Revision: https://reviews.llvm.org/D80141
-
Nikita Popov authored
After D76797 the dominator tree is no longer used in LVI, so we can remove it as a pass dependency, and also get rid of the dominator tree enabling/disabling logic in JumpThreading. Apart from cleaning up the code, this also clarifies LVI cache consistency, in that the LVI cache can no longer depend on whether the DT was or wasn't enabled due to pending DT updates at any given time. Differential Revision: https://reviews.llvm.org/D76985
-