- Mar 19, 2021
-
-
Andrew Young authored
When deleting operations in DCE, the algorithm used a post-order walk of the IR to ensure that value uses were erased before value defs. Graph regions do not have the same structural invariants as SSA CFG regions, and this post-order walk could delete value defs before their uses. The problem is guaranteed to occur when there is a cycle in the use-def graph. This change stops DCE from relying on visiting the operations and blocks in any particular order; instead, we explicitly drop all uses of a value before deleting it.

Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D98919
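To illustrate the approach outside MLIR: the following Python toy (a hypothetical `Op` class, not MLIR's actual API) drops every use of the dead ops before erasing any of them, so erasure order no longer matters, even across a use-def cycle:

```python
class Op:
    """Toy op in a use-def graph; graph regions allow cycles."""

    def __init__(self, name):
        self.name = name
        self.operands = []  # ops whose results this op uses
        self.users = []     # ops using this op's results

    def add_operand(self, op):
        self.operands.append(op)
        op.users.append(self)

    def drop_all_uses(self):
        # Unlink this op from the user lists of its operands.
        for op in self.operands:
            op.users.remove(self)
        self.operands.clear()


def erase_ops(dead):
    # Phase 1: break every use edge first, so cycles are harmless.
    for op in dead:
        op.drop_all_uses()
    # Phase 2: every dead op is now use-free; erase in any order.
    for op in dead:
        assert not op.users, f"{op.name} still has users"


# A use-def cycle that would break a post-order walk:
a, b = Op("a"), Op("b")
a.add_operand(b)
b.add_operand(a)
erase_ops([a, b])
```

A post-order walk over this pair would have to erase one op while the other still uses its result; dropping all uses first sidesteps the problem.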
-
- Mar 15, 2021
-
-
Julian Gross authored
Create the memref dialect and move dialect-specific ops from the std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D98041
-
Alex Zinenko authored
This reverts commit b5d9a3c9. The commit introduced a memory error in canonicalization/operation walking that is exposed when compiled with ASAN. It leads to crashes in some "release" configurations.
-
Chris Lattner authored
We know that all ConstantLike operations have one result and no operands, so check this first before doing the trait check. This change speeds up Canonicalize on a CIRCT testcase by ~5%.

Differential Revision: https://reviews.llvm.org/D98615
-
Chris Lattner authored
Two changes:

1) Change the canonicalizer to walk the function in top-down order instead of bottom-up order. This composes well with the "top down" nature of constant folding and simplification, reducing iterations and re-evaluation of ops in simple cases.
2) Explicitly enter existing constants into the OperationFolder table before canonicalizing. Previously we would "constant fold" them and rematerialize them, wastefully recreating a bunch of constants, which led to pointless memory traffic.

Both changes together provide a 33% speedup for canonicalize on some mid-size CIRCT examples. One artifact of this change is that the constants generated in normal pattern application get inserted at the top of the function as the patterns are applied. Because of this, we get "inverted" constants more often, which is an aesthetic change to the IR but does permute some testcases.

Differential Revision: https://reviews.llvm.org/D98609
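A toy model of change 1 (illustrative Python, not the real canonicalizer worklist): folding a chain of constant-fed adds converges in one sweep when defs are visited before uses, but folds only one op per sweep when visited bottom-up:

```python
def fold_sweeps(order):
    # A chain: const 1 -> add +1 -> add +1 -> add +1.
    # Each op is ("const", value) or ("add", operand_index, addend).
    ops = [("const", 1), ("add", 0, 1), ("add", 1, 1), ("add", 2, 1)]
    sweeps = 0
    while True:
        changed = False
        idxs = range(len(ops)) if order == "topdown" else reversed(range(len(ops)))
        for i in idxs:
            op = ops[i]
            if op[0] == "add" and ops[op[1]][0] == "const":
                ops[i] = ("const", ops[op[1]][1] + op[2])  # fold to constant
                changed = True
        sweeps += 1
        if not changed:
            return sweeps  # includes one final fixpoint-check sweep


assert fold_sweeps("topdown") == 2   # one folding sweep + fixpoint check
assert fold_sweeps("bottomup") == 4  # one op folded per sweep
```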
-
- Mar 11, 2021
-
-
Julian Gross authored
Buffer hoisting moved allocs upwards even when they had dependencies within their nested regions. This patch fixes the issue.

https://bugs.llvm.org/show_bug.cgi?id=49142
Differential Revision: https://reviews.llvm.org/D98248
-
River Riddle authored
The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code and updates some outdated idioms with newer utilities. The main components of this optimization include:

* Add an overload of Block::eraseArguments that allows for O(N) erasure of a disjoint set of arguments.
* Don't process entry block arguments, given that we don't erase them at this point.
* Don't track individual operation results, given that we don't erase them. We can just track the parent operation.

Differential Revision: https://reviews.llvm.org/D98309
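The disjoint-erasure idea can be sketched generically (a Python stand-in; the actual Block::eraseArguments overload operates on block arguments): one filtering pass replaces K separate erase calls, each of which would shift the tail of the list:

```python
def erase_indices(values, indices):
    # One O(N) pass; erasing each index separately would shift the
    # tail every time, i.e. O(N * K) for K erased elements.
    dead = set(indices)
    return [v for i, v in enumerate(values) if i not in dead]


print(erase_indices(list("abcdef"), {1, 4}))  # ['a', 'c', 'd', 'f']
```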
-
- Mar 09, 2021
-
-
Lei Zhang authored
This makes it easy to compose the distribution computation with other affine computations.

Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D98171
-
- Mar 03, 2021
-
-
River Riddle authored
The current implementation of Value involves a pointer-int pair with several different kinds of owners, i.e. BlockArgumentImpl*, Operation*, TrailingOpResult*. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient:

* Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently (they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type.
* Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one.
* As noted above, most of the methods in Value need to branch on at least 3 different cases, which is inefficient, possibly error prone, and verbose.
* The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation* isn't really useful as one).

This revision greatly simplifies the implementation of Value by introducing a new ValueImpl class. This class contains all of the state shared between the various derived value classes, i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits:

* Most of the methods on Value are now branchless, and often one-liners.
* The "kind" of the value is now stored in ValueImpl instead of Value. This frees up all of Value's pointer bits, allowing users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results "inline", 6 now instead of 2, freeing up 1 word per new inline result.
* Operation result types are now stored in the result, instead of a side array. This drops the size of zero-result operations by 1 word. It also removes the memory-crushing use of TupleType for operation results (which could lead to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed restructuring ValueRange, making it simpler and one word smaller.

This revision does come with two conceptual downsides:

* Operation::getResultTypes no longer returns an ArrayRef<Type>. This conceptually makes some usages slower, as the iterator increment is slightly more complex.
* OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic.

From profiling, neither of the conceptual downsides has resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster.

Differential Revision: https://reviews.llvm.org/D97804
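The pointer-bit point is about tag packing. A schematic in plain arithmetic (standing in for LLVM's PointerIntPair, which users of Value rely on): an 8-byte-aligned address has 3 zero low bits, so once Value no longer spends one of them on its own kind, users can pack a small tag there:

```python
ALIGN = 8  # pointees aligned to 8 bytes leave 3 low address bits free


def pack(addr, tag):
    # The tag occupies the low bits that alignment guarantees are zero.
    assert addr % ALIGN == 0 and 0 <= tag < ALIGN
    return addr | tag


def unpack(packed):
    # Mask the tag back out to recover the original address.
    return packed & ~(ALIGN - 1), packed & (ALIGN - 1)


assert unpack(pack(0x1000, 5)) == (0x1000, 5)
```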
-
- Mar 02, 2021
-
-
KareemErgawy-TomTom authored
This patch continues the detensorizing implementation by detensoring internal control flow in functions. In order to detensorize functions, all non-entry block arguments are detensored and branches between such blocks are properly updated to reflect the detensored types as well. The function entry block (signature) is left intact. This continues work towards handling github/google/iree#1159.

Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97148
-
Vladislav Vinogradov authored
Just a pure method renaming. It is a preparation step for replacing "memory space as raw integer" with the more generic "memory space as attribute", which will be done in a separate commit. The `MemRefType::getMemorySpace` method will return `Attribute` and become the main API, while `getMemorySpaceAsInt` will be declared deprecated and will be replaced in all in-tree dialects (also in separate commits).

Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476
-
- Feb 27, 2021
-
-
River Riddle authored
This also exposed a bug in Dialect loading where it was not correctly identifying identifiers that had the dialect namespace as a prefix.

Differential Revision: https://reviews.llvm.org/D97431
-
- Feb 26, 2021
-
-
Vinayaka Bandishti authored
Fixes a bug in the affine fusion pipeline where an incorrect fusion was performed even though a call op that potentially modifies the memrefs under consideration exists between the source and the target. Fixes part of https://bugs.llvm.org/show_bug.cgi?id=49220.

Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D97252
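The legality rule can be sketched as follows (hypothetical op records, not the pass's dependence-graph data structures): fusion must be rejected if any op between the two loops may modify the fused memref, such as a call receiving it as an argument:

```python
def can_fuse(ops_between, memref):
    # Conservatively reject fusion if an intervening call receives the
    # memref: the callee may write to it, so reordering the loops
    # around the call would change observable behavior.
    for op in ops_between:
        if op["kind"] == "call" and memref in op["args"]:
            return False
    return True


assert can_fuse([{"kind": "call", "args": ["%other"]}], "%m")
assert not can_fuse([{"kind": "call", "args": ["%m"]}], "%m")
```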
-
- Feb 25, 2021
-
-
Tung D. Le authored
This patch handles defining ops between the source and dest loop nests, and prevents loop nests with `iter_args` from being fused. If there is any SSA value in the dest loop nest whose defining op has a dependence on the source loop nest, we cannot fuse the loop nests. If there is an `affine.for` with `iter_args`, prevent it from being fused.

Reviewed By: dcaballe, bondhugula
Differential Revision: https://reviews.llvm.org/D97030
-
- Feb 24, 2021
-
-
River Riddle authored
This avoids unnecessary async overhead in situations that won't benefit from it.
-
River Riddle authored
This prevents a bug in the pass instrumentation implementation where the main thread would end up with a different pass manager in different runs of the pass.
-
- Feb 23, 2021
-
-
River Riddle authored
llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for the inliner, where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this, and also aligns the implementation with the PassManager (which realistically should share the same implementation). This change dropped compile time on an internal benchmark by ~1 second (25%).

Differential Revision: https://reviews.llvm.org/D96086
-
Adam Straw authored
Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body. Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.

Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96821
-
- Feb 22, 2021
-
-
Vivek authored
The current implementation of the tilePerfectlyNested utility doesn't handle non-unit step sizes. We have added support to perform tiling correctly even if the step size of the loop to be tiled is non-unit. Fixes https://bugs.llvm.org/show_bug.cgi?id=49188.

Differential Revision: https://reviews.llvm.org/D97037
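The step-size arithmetic can be sketched in Python (illustrative only; the utility itself rewrites affine loop bounds): with step `s` and `t` iterations per tile, the inter-tile loop must advance by `t * s`, not by `t`, or iterations are skipped or duplicated:

```python
def tiled_iterations(lb, ub, step, tile):
    # `tile` counts iterations per tile, so the inter-tile stride in
    # index space is tile * step.
    result = []
    for ii in range(lb, ub, tile * step):                      # inter-tile loop
        for i in range(ii, min(ii + tile * step, ub), step):   # intra-tile loop
            result.append(i)
    return result


# Tiling must reproduce exactly the original iteration sequence:
assert tiled_iterations(0, 17, 3, 2) == list(range(0, 17, 3))
```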
-
Vinayaka Bandishti authored
[MLIR][affine] Prevent fusion when ops with a `Free` memory effect are present between producer and consumer

This commit fixes a bug in the affine fusion pipeline where an incorrect fusion was performed despite a dealloc op being present between a producer and a consumer. This is done by creating a node for the dealloc op in the MDG.

Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D97032
-
- Feb 21, 2021
-
-
Jacques Pienaar authored
Move over to ODS & use pass options.
-
- Feb 18, 2021
-
-
Alexander Belyaev authored
The reverted commit introduced a cyclic dependency: the MemRef dialect depends on Standard because it uses ConstantIndexOp, while Std depends on the MemRef dialect in its EDSC/Intrinsics.h. Working on a fix. This reverts commit 8aa6c376.
-
Julian Gross authored
Create the memref dialect and move several dialect-specific ops without dependencies on other ops from the std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D96425
-
- Feb 16, 2021
-
-
Adam Straw authored
Separating the AffineMapAccessInterface from the AffineRead/WriteOp interface so that dialects which extend Affine capabilities (e.g. PlaidML PXA = parallel extensions for Affine) can utilize relevant passes (e.g. MemRef normalization).

Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D96284
-
Nicolas Vasilache authored
SliceAnalysis was originally developed in the context of affine.for within mlfunc; it predates the notion of region. This revision updates it to not hardcode specific ops like scf::ForOp. When rooted at an op, the behavior of the slice computation changes as it recurses into the regions of the op. This no longer supports gathering all values transitively depending on a loop induction variable. Additional variants rooted at a Value are added to also support the existing behavior.

Differential Revision: https://reviews.llvm.org/D96702
-
- Feb 12, 2021
-
-
Alexander Belyaev authored
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D96579
-
- Feb 11, 2021
-
-
Mehdi Amini authored
Differential Revision: https://reviews.llvm.org/D96474
-
- Feb 10, 2021
-
-
Uday Bondhugula authored
Update the affine.for loop unroll utility to support iteration arguments. Fix promoteIfSingleIteration as well. Fixes PR49084: https://bugs.llvm.org/show_bug.cgi?id=49084

Differential Revision: https://reviews.llvm.org/D96383
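A scalar sketch of what unrolling with iteration arguments must preserve (plain Python, not the affine utility): the loop-carried value is threaded through every unrolled copy of the body and through the cleanup loop:

```python
def unrolled_sum(n, factor):
    acc = 0  # the loop-carried value (the iter_arg)
    i = 0
    # Unrolled main loop: each copy of the body receives the acc
    # produced by the previous copy.
    while i + factor <= n:
        for u in range(factor):
            acc = acc + (i + u)
        i += factor
    # Cleanup loop for the remaining iterations, same carried value.
    while i < n:
        acc = acc + i
        i += 1
    return acc


assert unrolled_sum(10, 4) == sum(range(10))
```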
-
- Feb 09, 2021
-
-
River Riddle authored
These properties were useful for a few things before traits had a better integration story, but don't really carry their weight well these days. Most of these properties are already checked via traits in most of the code. It is better to align the system around traits, and improve the performance/cost of traits in general.

Differential Revision: https://reviews.llvm.org/D96088
-
- Feb 06, 2021
-
-
Tung D. Le authored
This patch fixes the following bug when calling --affine-loop-fusion.

Input program:
```mlir
func @should_not_fuse_since_top_level_non_affine_non_result_users(
    %in0 : memref<32xf32>, %in1 : memref<32xf32>) {
  %c0 = constant 0 : index
  %cst_0 = constant 0.000000e+00 : f32
  affine.for %d = 0 to 32 {
    %lhs = affine.load %in0[%d] : memref<32xf32>
    %rhs = affine.load %in1[%d] : memref<32xf32>
    %add = addf %lhs, %rhs : f32
    affine.store %add, %in0[%d] : memref<32xf32>
  }
  store %cst_0, %in0[%c0] : memref<32xf32>
  affine.for %d = 0 to 32 {
    %lhs = affine.load %in0[%d] : memref<32xf32>
    %rhs = affine.load %in1[%d] : memref<32xf32>
    %add = addf %lhs, %rhs : f32
    affine.store %add, %in0[%d] : memref<32xf32>
  }
  return
}
```

Calling --affine-loop-fusion, we got an incorrect output:
```mlir
func @should_not_fuse_since_top_level_non_affine_non_result_users(%arg0: memref<32xf32>, %arg1: memref<32xf32>) {
  %c0 = constant 0 : index
  %cst = constant 0.000000e+00 : f32
  store %cst, %arg0[%c0] : memref<32xf32>
  affine.for %arg2 = 0 to 32 {
    %0 = affine.load %arg0[%arg2] : memref<32xf32>
    %1 = affine.load %arg1[%arg2] : memref<32xf32>
    %2 = addf %0, %1 : f32
    affine.store %2, %arg0[%arg2] : memref<32xf32>
    %3 = affine.load %arg0[%arg2] : memref<32xf32>
    %4 = affine.load %arg1[%arg2] : memref<32xf32>
    %5 = addf %3, %4 : f32
    affine.store %5, %arg0[%arg2] : memref<32xf32>
  }
  return
}
```

This happened because, when analyzing the source and destination nodes, affine loop fusion ignored non-result ops sandwiched between them. In other words, the MemRefDependencyGraph in affine loop fusion ignored these non-result ops. This patch solves the issue by adding these non-result ops to the MemRefDependencyGraph.

Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D95668
-
- Feb 05, 2021
-
-
River Riddle authored
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as nodiscard was always the intention from the beginning, but got lost along the way.

Differential Revision: https://reviews.llvm.org/D95841
-
- Feb 04, 2021
-
-
Alex Zinenko authored
In the dialect conversion infrastructure, source materialization applies as part of the finalization procedure to results of newly produced operations that replace previously existing values with values of a different type. However, such operations may be created to replace operations created in other patterns. At this point, it is possible that the results of the _original_ operation are still in use and have mismatching types, but the results of the _intermediate_ operation that performed the type change are not in use, leading to the absence of source materialization. For example,

```mlir
%0 = dialect.produce : !dialect.A
dialect.use %0 : !dialect.A
```

can be replaced with

```mlir
%0 = dialect.other : !dialect.A
%1 = dialect.produce : !dialect.A  // replaced, scheduled for removal
dialect.use %1 : !dialect.A
```

and then with

```mlir
%0 = dialect.final : !dialect.B
%1 = dialect.other : !dialect.A    // replaced, scheduled for removal
%2 = dialect.produce : !dialect.A  // replaced, scheduled for removal
dialect.use %2 : !dialect.A
```

in the same rewriting, but only the %1->%0 replacement is currently considered.

Change the logic in dialect conversion to look up all values that were replaced by the given value and perform source materialization if any of those values is still in use with mismatching types. This is done by computing the inverse value replacement mapping. This arguably expensive manipulation is performed only if there were some type-changing replacements. An alternative could be to consider all replaced operations and not only those that resulted in type changes, but it would harm pattern-level composability: the pattern that performed the non-type-changing replacement would have to be made aware of the type converter in order to call the materialization hook.

Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95626
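The inverse-mapping lookup can be sketched as a small graph walk (hypothetical value names and mapping shape, not the conversion driver's types): starting from a final value, collect everything it transitively replaced, so materialization can be inserted if any of those values is still in use with a mismatching type:

```python
def transitively_replaced(value, replaced_by):
    # replaced_by is the inverse replacement mapping:
    # new value -> list of values it directly replaced.
    stack, seen = [value], []
    while stack:
        v = stack.pop()
        for old in replaced_by.get(v, []):
            seen.append(old)
            stack.append(old)
    return seen


# %2 was replaced by %1, which was in turn replaced by %0:
inverse = {"%0": ["%1"], "%1": ["%2"]}
assert set(transitively_replaced("%0", inverse)) == {"%1", "%2"}
```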
-
Mehdi Amini authored
We could extend this with an interface to allow a dialect to perform a type conversion, but that would make the folder create operations, which isn't the case at the moment and isn't necessarily always desirable.

Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95991
-
- Feb 02, 2021
-
-
Alex Zinenko authored
In dialect conversion, signature conversions essentially perform block argument replacement and are added to the general value remapping. However, the replaced values were not tracked, so if a signature conversion was rolled back, the construction of operand lists for the following patterns could have obtained block arguments from the mapping and given them to the pattern, leading to use-after-free. Keep track of signature conversions similarly to normal block argument replacement, and erase such replacements from the general mapping when the conversion is rolled back.

Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95688
-
- Jan 29, 2021
-
-
Alexander Belyaev authored
Currently, for an scf.parallel (i, j, k), after the loop collapsing to 1D is done, the IVs would be traversed as for an scf.parallel (k, j, i).

Differential Revision: https://reviews.llvm.org/D95693
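The fix is about delinearization order. A sketch (plain Python; the pass emits the equivalent index arithmetic): peeling the innermost extent off the flat index first recovers (i, j, k) with i varying slowest, whereas peeling in the opposite order traverses the IVs as (k, j, i):

```python
def delinearize(flat, sizes):
    # sizes = (Ni, Nj, Nk); peel the innermost extent first.
    ivs = []
    for size in reversed(sizes):
        ivs.append(flat % size)
        flat //= size
    return tuple(reversed(ivs))  # (i, j, k)


order = [delinearize(f, (2, 3, 4)) for f in range(24)]
# Matches the original nest: i varies slowest, k fastest.
assert order == [(i, j, k) for i in range(2) for j in range(3) for k in range(4)]
```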
-
- Jan 25, 2021
-
-
Diego Caballero authored
This patch adds support for producer-consumer fusion scenarios with multiple producer stores to the AffineLoopFusion pass. The patch introduces some changes to the producer-consumer algorithm, including:

* For a given consumer loop, producer-consumer fusion iterates over its producer candidates until a fixed point is reached.
* Producer candidates are gathered beforehand for each iteration of the consumer loop and visited in reverse program order (not strictly guaranteed) to maximize the number of loops fused per iteration.

In general, these changes were needed to simplify the multi-store producer support and remove some of the workarounds that were introduced in the past to support more fusion cases under the single-store producer limitation. This patch also preserves the existing functionality of AffineLoopFusion with one minor change in behavior: producer-consumer fusion didn't fuse scenarios with escaping memrefs and multiple outgoing edges (from a single store). Multi-store producer scenarios will usually (always?) have multiple outgoing edges, so we couldn't fuse any with escaping memrefs, which would greatly limit the applicability of this new feature. Therefore, the patch enables fusion for these scenarios. Please see the modified tests for specific details.

Reviewed By: andydavis1, bondhugula
Differential Revision: https://reviews.llvm.org/D92876
-
- Jan 22, 2021
-
-
mikeurbach authored
This extracts the implementation of getType, setType, and getBody from FunctionSupport.h into the mlir::impl namespace and defines them generically in FunctionSupport.cpp. This allows them to be used elsewhere for any FunctionLike ops that use FunctionType for their type signature. Using the new helpers, FuncOpSignatureConversion is generalized to work with all such FunctionLike ops. Convenience helpers are added to configure the pattern for a given concrete FunctionLike op type.

Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95021
-
- Jan 20, 2021
-
-
Diego Caballero authored
This reverts commit 7dd19885. ASAN issue.
-
Jacques Pienaar authored
-