- May 05, 2021
- Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D101861
- Javier Setoain authored
While we figure out how best to add Standard support for scalable vectors, these instructions provide a workaround for basic arithmetic between scalable vectors.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100837
- William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101798
- Aart Bik authored
This revision migrates more code from Linalg into the new permanent home of SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
- May 04, 2021
- William S. Moses authored
See: https://reviews.llvm.org/D101710
- William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101801
- William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101710
- Tobias Gysi authored
Ensure the index operations are lowered on all linalg loop lowering paths.
Differential Revision: https://reviews.llvm.org/D101827
- Matthias Springer authored
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.
Differential Revision: https://reviews.llvm.org/D101808
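For illustration, a minimal sketch of the kind of rewrite this refers to, in the op syntax of this period (value names assumed, not from the commit):
```mlir
// A transfer_read whose permutation map is a pure broadcast reads one
// scalar and splats it across the result vector:
%v = vector.transfer_read %mem[%i], %pad
    {permutation_map = affine_map<(d0) -> (0)>}
    : memref<?xf32>, vector<8xf32>

// is lowered to a scalar load followed by a broadcast:
%s = memref.load %mem[%i] : memref<?xf32>
%v2 = vector.broadcast %s : f32 to vector<8xf32>
```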
- Eugene Zhulenev authored
This fixes a performance regression in vec-mat vectorization.
Reviewed By: asaadaldien
Differential Revision: https://reviews.llvm.org/D101795
- Emilio Cota authored
This approximation matches the one in Eigen.
```
name                      old cpu/op   new cpu/op   delta
BM_mlir_Log1p_f32/10      83.2ns ± 7%  34.8ns ± 5%  -58.19%  (p=0.000 n=84+71)
BM_mlir_Log1p_f32/100      664ns ± 4%   129ns ± 4%  -80.57%  (p=0.000 n=82+82)
BM_mlir_Log1p_f32/1k      6.75µs ± 4%  0.81µs ± 3%  -88.07%  (p=0.000 n=88+79)
BM_mlir_Log1p_f32/10k     76.5µs ± 3%   7.8µs ± 4%  -89.84%  (p=0.000 n=80+80)
BM_eigen_s_Log1p_f32/10   70.1ns ±14%  72.6ns ±14%   +3.49%  (p=0.000 n=116+112)
BM_eigen_s_Log1p_f32/100   706ns ± 9%   717ns ± 3%   +1.60%  (p=0.018 n=117+80)
BM_eigen_s_Log1p_f32/1k   8.26µs ± 1%  8.26µs ± 1%      ~    (p=0.567 n=84+86)
BM_eigen_s_Log1p_f32/10k  92.1µs ± 5%  92.6µs ± 6%   +0.60%  (p=0.047 n=115+115)
BM_eigen_v_Log1p_f32/10   31.8ns ±24%  34.9ns ±17%   +9.72%  (p=0.000 n=98+96)
BM_eigen_v_Log1p_f32/100   169ns ±10%   177ns ± 5%   +4.66%  (p=0.000 n=119+81)
BM_eigen_v_Log1p_f32/1k   1.42µs ± 4%  1.46µs ± 8%   +2.70%  (p=0.000 n=93+113)
BM_eigen_v_Log1p_f32/10k  14.4µs ± 5%  14.9µs ± 8%   +3.61%  (p=0.000 n=115+110)
```
Reviewed By: ezhulenev, ftynse
Differential Revision: https://reviews.llvm.org/D101765
- May 03, 2021
- MaheshRavishankar authored
Given the source and destination shapes, if they are static, or if the expanded/collapsed dimensions are unit-extent, it is possible to compute the reassociation maps that can be used to reshape one type into another. Add a utility method to return the reassociation maps when possible. This utility function can be used to fuse a sequence of reshape ops, given the type of the source of the producer and the final result type. This pattern supersedes a more constrained folding pattern added to the DropUnitDims pass.
Differential Revision: https://reviews.llvm.org/D101343
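A rough example of a case where the reassociation is computable from the two types alone (shapes assumed; `linalg.tensor_reshape` was the reshape op at the time):
```mlir
// Collapsing tensor<4x5x6xf32> into tensor<20x6xf32>: with static shapes,
// the reassociation {[0, 1], [2]} is inferable from the types themselves.
%r = linalg.tensor_reshape %t [affine_map<(d0, d1, d2) -> (d0, d1)>,
                               affine_map<(d0, d1, d2) -> (d2)>]
    : tensor<4x5x6xf32> into tensor<20x6xf32>
```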
- MaheshRavishankar authored
Convert subtensor and subtensor_insert operations to use their rank-reduced versions to drop unit dimensions.
Differential Revision: https://reviews.llvm.org/D101495
- thomasraoux authored
The current implementation had a bug: it relied on the target vector dimension sizes to calculate where to insert broadcasts, so if several dimensions have the same size we may insert the broadcast on the wrong dimension. The correct broadcast cannot be inferred from the types of the source and destination vectors alone. Instead, when we want to extend transfer ops, we calculate an "inverse" map of the projected permutation and insert broadcasts in place of the projected dimensions.
Differential Revision: https://reviews.llvm.org/D101738
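A sketch of the ambiguity (shapes assumed): both result dimensions below have size 4, so the source and destination types alone cannot tell which dimension is the broadcast; only the permutation map disambiguates.
```mlir
// Result dim 0 reads along d1; result dim 1 is a broadcast (the 0 expr).
%v = vector.transfer_read %mem[%i, %j], %pad
    {permutation_map = affine_map<(d0, d1) -> (d1, 0)>}
    : memref<?x?xf32>, vector<4x4xf32>
```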
- Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D101771
- Frederik Gossen authored
Add a dedicated pass `convert-linalg-tiled-loops-to-scf` to lower `linalg.tiled_loop`s.
Differential Revision: https://reviews.llvm.org/D101768
- thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101637
- thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101643
- thomasraoux authored
Move TransposeOp lowering into its own populate function, as in some cases it is better to keep it during ContractOp lowering to better canonicalize it, rather than emitting scalar insert/extract.
Differential Revision: https://reviews.llvm.org/D101647
- Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D101747
- William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101705
- May 02, 2021
- William S. Moses authored
1) Canonicalize IndexCast(SExt(x)) => IndexCast(x)
2) Provide constant folds of sign_extend and truncate
Differential Revision: https://reviews.llvm.org/D101714
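A minimal sketch of the first canonicalization (`sexti` and `index_cast` were Standard-dialect ops at the time; value names assumed):
```mlir
// Before: a sign-extension feeds an index_cast.
%0 = sexti %arg : i32 to i64
%1 = index_cast %0 : i64 to index

// After: the cast extends directly from the narrower type.
%1 = index_cast %arg : i32 to index
```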
- eopXD authored
Added canonicalization for vector_load and vector_store. The existing pattern SimplifyAffineOp can be reused to compose the maps that supply results into them. Added AffineVectorStoreOp and AffineVectorLoadOp to the static_assert of SimplifyAffineOp to allow these operations to use it. This fixes https://bugs.llvm.org/show_bug.cgi?id=50058.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101691
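A minimal sketch of the map composition (value names assumed, not from the commit):
```mlir
// Before: an affine.apply result feeds the vector load.
%idx = affine.apply affine_map<(d0) -> (d0 + 1)>(%i)
%v = affine.vector_load %mem[%idx] : memref<100xf32>, vector<8xf32>

// After: the applied map is composed into the load's own map.
%v2 = affine.vector_load %mem[%i + 1] : memref<100xf32>, vector<8xf32>
```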
- May 01, 2021
- Aart Bik authored
This revision (1) migrates the encoding from TensorDialect into the new SparseTensorDialect and (2) replaces dictionary-based storage and builders with struct-like data.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D101669
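For illustration, roughly what a struct-like encoding looked like in this era of the dialect (the field name is an assumption from that period, not taken from the commit):
```mlir
// A CSR-style annotation carried on the tensor type:
#CSR = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ]
}>
// used as: tensor<?x?xf64, #CSR>
```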
- Apr 30, 2021
- Ahmed Taei authored
Three patterns are added to convert vector.multi_reduction into a sequence of vector.reduction ops, as follows:
- Transpose the inputs so the innermost dimensions are always reductions.
- Reduce the rank of vector.multi_reduction to 2-D, with the innermost dimension as the reduction dim (the 2-D canonical form).
- Convert the 2-D canonical form into a sequence of vector.reduction ops (see the sketch after this list).
There are two things that might be worth doing in a follow-up diff:
- An scf.for (maybe optionally) around vector.reduction instead of unrolling it.
- Breaking vector.reduction down into a sequence of vector.reduction ops (e.g. a tree-based reduction) instead of relying on how downstream dialects handle it. Note: this would require passing a target vector length.
Differential Revision: https://reviews.llvm.org/D101570
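A minimal sketch of the last step, reducing the 2-D canonical form row by row (shapes and value names assumed; `vector.reduction` took a string kind at the time):
```mlir
// Reduce row 0 of a vector<4x8xf32> into element 0 of the 1-D result.
%row0 = vector.extract %v[0] : vector<4x8xf32>
%sum0 = vector.reduction "add", %row0 : vector<8xf32> into f32
%res0 = vector.insert %sum0, %acc [0] : f32 into vector<4xf32>
// ... and likewise for rows 1 through 3.
```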
- Aart Bik authored
This is the very first step toward removing the glue and clutter from linalg and replacing it with proper sparse tensor types. This revision migrates the LinalgSparseOps into the SparseTensorOps of a sparse tensor dialect. This also provides a new home for sparse tensor related transformations.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter) will follow, but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
- Apr 29, 2021
- Mehdi Amini authored
This reverts commit a6d92a97. The build with -DBUILD_SHARED_LIBS=ON is broken.
- Aart Bik authored
This is the very first step toward removing the glue and clutter from linalg and replacing it with proper sparse tensor types. This revision migrates the LinalgSparseOps into the SparseTensorOps of a sparse tensor dialect. This also provides a new home for sparse tensor related transformations.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter) will follow, but I am trying to keep the amount of changes per revision manageable.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101488
- Alex Zinenko authored
This enables expressing more complex parallel loops in the affine framework, for example in cases of tiling by sizes that don't divide loop trip counts perfectly, or inner wavefront parallelism, among others. One can't use affine.max/min and supply the resulting values to the nested loop bounds, since the results of such affine.max/min operations aren't valid symbols. Making them valid symbols isn't an option, since they would introduce selection trees into memref subscript arithmetic as an unintended and undesired consequence. Also add support for converting such loops to SCF. Drop some API from AffineParallelOp that isn't used in the core repo, since its semantics becomes ambiguous in the presence of max/min bounds. Loop normalization is currently unavailable for such loops.
Depends On D101171
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101172
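A sketch of the motivating tiling case (values %a, %b, %n assumed in scope): the inner upper bound needs a min because 32 may not divide %n evenly.
```mlir
affine.parallel (%ii) = (0) to (%n) step (32) {
  // The last tile may be partial, so clamp the inner bound with min.
  affine.parallel (%i) = (%ii) to (min(%ii + 32, %n)) {
    %v = affine.load %a[%i] : memref<?xf32>
    affine.store %v, %b[%i] : memref<?xf32>
  }
}
```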
- Alex Zinenko authored
Introduce basic support for parallelizing affine loops with reductions expressed using iteration arguments. The affine parallelism detector now has a flag to assume such reductions are parallel. The transformation handles the subset of parallel reductions that can be expressed using affine.parallel: integer/float addition and multiplication. Detecting the reduction operation is required since affine.parallel only supports a fixed set of reduction operators.
Reviewed By: chelini, kumasento, bondhugula
Differential Revision: https://reviews.llvm.org/D101171
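A sketch of the target form (values %a and %n assumed in scope): a sum reduction rewritten to affine.parallel's fixed reduction operators.
```mlir
// Sum of a 1-D memref expressed as a parallel "addf" reduction.
%sum = affine.parallel (%i) = (0) to (%n) reduce ("addf") -> f32 {
  %v = affine.load %a[%i] : memref<?xf32>
  affine.yield %v : f32
}
```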
- Lorenzo Chelini authored
- Tres Popp authored
- Tres Popp authored
FillOp now allows complex element types, and filling a properly sized buffer with a default zero complex number is implemented.
Differential Revision: https://reviews.llvm.org/D99939
- Nicolas Vasilache authored
This revision adds support for vectorizing more general linalg operations with projected permutation maps. This is achieved by eagerly broadcasting the intermediate vector to the common size of the iteration domain of the linalg op. This allows a much more natural expression of generalized vectorization but may introduce additional computations until all the proper canonicalizations are implemented.
This generalization modifies the vector.transfer_read/write permutation logic and exposes the fact that the logic employed in vector.contract was too ad-hoc. As a consequence, changes occur in the permutation/transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract to matrix intrinsics, which is required to make the corresponding tests pass.
Differential Revision: https://reviews.llvm.org/D101165
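A sketch of the kind of op this unlocks (shapes and value names assumed): the input's indexing map is a projected permutation, so vectorization eagerly broadcasts the read vector across the d1 dimension of the iteration domain.
```mlir
#map0 = affine_map<(d0, d1) -> (d0)>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
// %x is implicitly broadcast along d1 when computing the 4x8 result.
%res = linalg.generic
    {indexing_maps = [#map0, #map1],
     iterator_types = ["parallel", "parallel"]}
    ins(%x : tensor<4xf32>) outs(%init : tensor<4x8xf32>) {
  ^bb0(%a: f32, %b: f32):
    linalg.yield %a : f32
} -> tensor<4x8xf32>
```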
- Apr 28, 2021
- MaheshRavishankar authored
Canonicalizations for subtensor operations defaulted to the rank-reduced version of the operation, but the cast inserted to get back the original type would be illegal if the rank was actually reduced. Instead, make the canonicalization not reduce the rank of the operation.
Differential Revision: https://reviews.llvm.org/D101258
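For context, the two result-type choices for the same slice (shapes assumed; `subtensor` is the pre-rename form of tensor.extract_slice). A cast between the two forms changes rank, which is why defaulting to the rank-reduced variant could produce an illegal cast:
```mlir
// Rank-preserving slice: keeps the unit dimension.
%a = subtensor %t[0, 0] [1, 4] [1, 1] : tensor<8x4xf32> to tensor<1x4xf32>
// Rank-reduced slice: drops the unit dimension.
%b = subtensor %t[0, 0] [1, 4] [1, 1] : tensor<8x4xf32> to tensor<4xf32>
```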
- Alexander Belyaev authored
The current canonicalization did not remap operation results correctly and attempted to erase tiledLoop, which is incorrect if not all tensor results are folded.
- Frederik Gossen authored
This reverts commit dca53610.
- Alexander Belyaev authored
Tensor inputs, if not used in the body of TiledLoopOp, can be removed. memref::CastOp can be folded into TiledLoopOp as well.
Differential Revision: https://reviews.llvm.org/D101445
- Frederik Gossen authored
As a canonicalization, infer the resulting shape rank if possible.
Differential Revision: https://reviews.llvm.org/D101377