- May 07, 2021
-
MaheshRavishankar authored
The pattern to convert subtensor ops to their rank-reduced versions (by dropping unit-dims in the result) can also produce a zero-rank tensor. Handle that case. This also fixes an OOB access bug in the existing pattern for such cases. Differential Revision: https://reviews.llvm.org/D101949
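For illustration, a hedged sketch (hypothetical operands and shapes) of a subtensor whose rank-reduced result is zero-rank, i.e. every result dimension is a unit extent and gets dropped:
```
// All unit dimensions dropped from the result: a 0-d tensor remains.
%elem = subtensor %t[%i, %j] [1, 1] [1, 1] : tensor<8x8xf32> to tensor<f32>
```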
-
Rob Suderman authored
Nearly complete alignment to spec v0.22:
- Adds Div op
- Concat inputs now variadic
- Removes Placeholder op

Note: TF side PR https://github.com/tensorflow/tensorflow/pull/48921 deletes Concat legalizations to avoid breaking TensorFlow CI. This must be merged only after the TF PR has merged.

Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101958
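A rough sketch, in generic op syntax with made-up operands and shapes, of what the new Div op and a variadic Concat might look like:
```
// Elementwise integer divide added in this change.
%q = "tosa.div"(%a, %b) : (tensor<4xi32>, tensor<4xi32>) -> tensor<4xi32>
// Concat now takes a variadic list of inputs along the given axis.
%c = "tosa.concat"(%x, %y, %z) {axis = 0 : i64}
       : (tensor<2x3xf32>, tensor<2x3xf32>, tensor<2x3xf32>) -> tensor<6x3xf32>
```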
-
Amy Zhuang authored
Reviewed By: vinayaka-polymage Differential Revision: https://reviews.llvm.org/D101794
-
- May 06, 2021
-
Lei Zhang authored
Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D102009
-
River Riddle authored
It is currently stored in the high bits, which is disallowed on certain platforms (e.g. android). This revision switches the representation to use the low bits instead, fixing crashes/breakages on those platforms. Differential Revision: https://reviews.llvm.org/D101969
-
thomasraoux authored
-
thomasraoux authored
This exposes a lambda instead of just a boolean to control unit-dimension folding, giving the user more control to pick a good heuristic. Folding reshapes helps fusion opportunities but may generate sub-optimal generic ops. Differential Revision: https://reviews.llvm.org/D101917
-
Denys Shabalin authored
Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D101998
-
thomasraoux authored
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101955
-
Christian Sigg authored
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D101757
-
Navdeep Kumar authored
Add warp synchronous matrix-multiply accumulate ops in the GPU and NVVM dialects.

Ops added to the GPU dialect:
1. subgroup_mma_load_matrix
2. subgroup_mma_store_matrix
3. subgroup_mma_compute

Ops added to the NVVM dialect:
1. wmma.m16n16k16.load.[a,b,c].[f16,f32].row.stride
2. wmma.m16n16k16.store.d.[f16,f32].row.stride
3. wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]

Reviewed By: bondhugula, ftynse, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D95330
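A minimal sketch of how the new GPU-dialect ops compose into a 16x16x16 tile computation; operand names, types, and attributes here are illustrative and may differ from the exact syntax landed in the patch:
```
// Load the A, B, and C fragments cooperatively across the subgroup.
%A = gpu.subgroup_mma_load_matrix %srcA[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "AOp">
%B = gpu.subgroup_mma_load_matrix %srcB[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
%C = gpu.subgroup_mma_load_matrix %srcC[%c0, %c0] {leadDimension = 16 : index}
       : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">
// D = A * B + C, performed warp-synchronously.
%D = gpu.subgroup_mma_compute %A, %B, %C
       : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp">
       -> !gpu.mma_matrix<16x16xf16, "COp">
// Store the accumulator fragment back to memory.
gpu.subgroup_mma_store_matrix %D, %dst[%c0, %c0] {leadDimension = 16 : index}
       : !gpu.mma_matrix<16x16xf16, "COp">, memref<16x16xf16>
```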
-
Emilio Cota authored
Instead of just checking that we emit something. Differential Revision: https://reviews.llvm.org/D101940
-
MaheshRavishankar authored
Differential Revision: https://reviews.llvm.org/D101956
-
MaheshRavishankar authored
Fixes a minor bug which led to the element type of the output being modified when folding reshapes with a generic op. Differential Revision: https://reviews.llvm.org/D101942
-
- May 05, 2021
-
Emilio Cota authored
This approximation matches the one in Eigen.

```
name                        old cpu/op   new cpu/op   delta
BM_mlir_Expm1_f32/10        90.9ns ± 4%  52.2ns ± 4%  -42.60%  (p=0.000 n=74+87)
BM_mlir_Expm1_f32/100        837ns ± 3%   231ns ± 4%  -72.43%  (p=0.000 n=79+69)
BM_mlir_Expm1_f32/1k        8.43µs ± 3%  1.58µs ± 5%  -81.30%  (p=0.000 n=77+83)
BM_mlir_Expm1_f32/10k       83.8µs ± 3%  15.4µs ± 5%  -81.65%  (p=0.000 n=83+69)
BM_eigen_s_Expm1_f32/10     68.8ns ±17%  72.5ns ±14%   +5.40%  (p=0.000 n=118+115)
BM_eigen_s_Expm1_f32/100     694ns ±11%   717ns ± 2%   +3.34%  (p=0.000 n=120+75)
BM_eigen_s_Expm1_f32/1k     7.69µs ± 2%  7.97µs ±11%   +3.56%  (p=0.000 n=95+117)
BM_eigen_s_Expm1_f32/10k    88.0µs ± 1%  89.3µs ± 6%   +1.45%  (p=0.000 n=74+106)
BM_eigen_v_Expm1_f32/10     44.3ns ± 6%  45.0ns ± 8%   +1.45%  (p=0.018 n=81+111)
BM_eigen_v_Expm1_f32/100     351ns ± 1%   360ns ± 9%   +2.58%  (p=0.000 n=73+99)
BM_eigen_v_Expm1_f32/1k     3.31µs ± 1%  3.42µs ± 9%   +3.37%  (p=0.000 n=71+100)
BM_eigen_v_Expm1_f32/10k    33.7µs ± 8%  34.1µs ± 9%   +1.04%  (p=0.007 n=99+98)
```

Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D101852
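Background on why a dedicated expm1 approximation is worth having (standard numerics, not part of this change): for small $x$,

$$e^x - 1 = x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \approx x,$$

so computing exp(x) first and then subtracting 1 loses most of the significant bits of x to rounding, whereas a fused polynomial approximation keeps the leading term accurate.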
-
Rob Suderman authored
Implements support for undilated depthwise convolution using the existing depthwise convolution operation. Once convolutions migrate to yaml-defined versions we can rewrite for a cleaner implementation. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D101579
-
Philipp Krones authored
This untangles the MCContext and the MCObjectFileInfo. There is a circular dependency between MCContext and MCObjectFileInfo. Currently this dependency also exists during construction: you can't construct a MOFI without an MCContext, and that MCContext first has to be constructed with a dummy version of the MOFI. This removes the dependency during construction. In a perfect world, MCObjectFileInfo wouldn't depend on MCContext at all, but would only be stored in the MCContext, like other MC information. This is future work. This also shifts/adds more information to the MCContext, making it more available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo

Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
-
Javier Setoain authored
These instructions map to SVE-specific intrinsics that accept a predicate operand to support control flow in vector code. Differential Revision: https://reviews.llvm.org/D100982
-
Sergei Grechanik authored
This patch adds support for vectorizing loops with 'iter_args' implementing known reductions along the vector dimension. Compared to the non-vector-dimension case, two additional things are done during vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of the loop to prevent garbage values from being written to the accumulator.

Vectorization of reduction loops is disabled by default. To enable it, a map from loops to arrays of reduction descriptors should be explicitly passed to `vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed to the SuperVectorize pass.

Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.

Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
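An illustrative reduction loop of the kind this enables vectorizing, with the accumulator carried through `iter_args` (hypothetical buffer and bounds):
```
%zero = constant 0.0 : f32
// Sum the elements of %buf; %acc is the scalar accumulator carried by the loop.
%sum = affine.for %i = 0 to 256 iter_args(%acc = %zero) -> (f32) {
  %x = affine.load %buf[%i] : memref<256xf32>
  %next = addf %acc, %x : f32
  affine.yield %next : f32
}
```
After vectorization, the loop carries a vector accumulator instead, and the final vector is reduced to a scalar as described above.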
-
Tobias Gysi authored
The old index op handling let the new index operations point back to the producer block. As a result, after fusion some index operations in the fused block had back references to the old producer block resulting in illegal IR. The patch now relies on a block and value mapping to avoid such back references. Differential Revision: https://reviews.llvm.org/D101887
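For reference, a hypothetical generic op using index operations; after fusion these must be remapped into the fused block rather than left pointing back at the old producer block:
```
%res = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]}
    outs(%init : tensor<4x8xi64>) {
  ^bb0(%out: i64):
    // Index operations yield the current iteration coordinates.
    %i = linalg.index 0 : index
    %j = linalg.index 1 : index
    %sum = addi %i, %j : index
    %cast = index_cast %sum : index to i64
    linalg.yield %cast : i64
} -> tensor<4x8xi64>
```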
-
Uday Bondhugula authored
Using a free function verify(<Op>) is error prone. Rename it. Differential Revision: https://reviews.llvm.org/D101886
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D101861
-
Javier Setoain authored
While we figure out how to best add Standard support for scalable vectors, these instructions provide a workaround for basic arithmetic between scalable vectors. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D100837
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101798
-
Aart Bik authored
This revision migrates more code from Linalg into the new permanent home of SparseTensor. It replaces the test passes with proper compiler passes. NOTE: the actual removal of the last glue and clutter in Linalg will follow Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D101811
-
- May 04, 2021
-
River Riddle authored
We weren't properly visiting region successors when the terminator wasn't return-like, which could create incorrect results in the analysis. This revision ensures that we properly visit region successors, to avoid optimistically assuming a value is constant when it isn't. Differential Revision: https://reviews.llvm.org/D101783
-
Rob Suderman authored
All linalg.init operations must be fed into a linalg operation before subtensor. The inserted linalg.fill guarantees it executes correctly. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D101848
-
William S. Moses authored
See: https://reviews.llvm.org/D101710
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101801
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101710
-
Tobias Gysi authored
Ensure the index operations are lowered on all linalg loop lowering paths. Differential Revision: https://reviews.llvm.org/D101827
-
Adrian Kuegel authored
Differential Revision: https://reviews.llvm.org/D96776
-
Matthias Springer authored
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering. Differential Revision: https://reviews.llvm.org/D101808
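Such a scalar-read-plus-broadcast is a transfer_read whose permutation map has a constant-zero result; a hedged sketch with made-up operands:
```
// The single vector dimension maps to constant 0: one scalar is read and broadcast.
%v = vector.transfer_read %mem[%i, %j], %pad
       {permutation_map = affine_map<(d0, d1) -> (0)>}
       : memref<?x?xf32>, vector<4xf32>
```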
-
natashaknk authored
Lowers the equal and arithmetic_right_shift elementwise ops to the linalg dialect using linalg.generic. Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D101804
-
Eugene Zhulenev authored
This fixes a performance regression in vec-mat vectorization Reviewed By: asaadaldien Differential Revision: https://reviews.llvm.org/D101795
-
Emilio Cota authored
This approximation matches the one in Eigen.

```
name                        old cpu/op   new cpu/op   delta
BM_mlir_Log1p_f32/10        83.2ns ± 7%  34.8ns ± 5%  -58.19%  (p=0.000 n=84+71)
BM_mlir_Log1p_f32/100        664ns ± 4%   129ns ± 4%  -80.57%  (p=0.000 n=82+82)
BM_mlir_Log1p_f32/1k        6.75µs ± 4%  0.81µs ± 3%  -88.07%  (p=0.000 n=88+79)
BM_mlir_Log1p_f32/10k       76.5µs ± 3%   7.8µs ± 4%  -89.84%  (p=0.000 n=80+80)
BM_eigen_s_Log1p_f32/10     70.1ns ±14%  72.6ns ±14%   +3.49%  (p=0.000 n=116+112)
BM_eigen_s_Log1p_f32/100     706ns ± 9%   717ns ± 3%   +1.60%  (p=0.018 n=117+80)
BM_eigen_s_Log1p_f32/1k     8.26µs ± 1%  8.26µs ± 1%      ~    (p=0.567 n=84+86)
BM_eigen_s_Log1p_f32/10k    92.1µs ± 5%  92.6µs ± 6%   +0.60%  (p=0.047 n=115+115)
BM_eigen_v_Log1p_f32/10     31.8ns ±24%  34.9ns ±17%   +9.72%  (p=0.000 n=98+96)
BM_eigen_v_Log1p_f32/100     169ns ±10%   177ns ± 5%   +4.66%  (p=0.000 n=119+81)
BM_eigen_v_Log1p_f32/1k     1.42µs ± 4%  1.46µs ± 8%   +2.70%  (p=0.000 n=93+113)
BM_eigen_v_Log1p_f32/10k    14.4µs ± 5%  14.9µs ± 8%   +3.61%  (p=0.000 n=115+110)
```

Reviewed By: ezhulenev, ftynse
Differential Revision: https://reviews.llvm.org/D101765
-
- May 03, 2021
-
MaheshRavishankar authored
Given the source and destination shapes, if they are static, or if the expanded/collapsed dimensions are unit-extent, it is possible to compute the reassociation maps that can be used to reshape one type into another. Add a utility method to return the reassociation maps when possible. This utility function can be used to fuse a sequence of reshape ops, given the type of the source of the producer and the final result type. This pattern supersedes a more constrained folding pattern added to the DropUnitDims pass. Differential Revision: https://reviews.llvm.org/D101343
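As an illustration (a hedged sketch; the reshape-op assembly of this era may differ), collapsing a unit dimension into its neighbor uses a reassociation map that groups the two source dimensions into one result dimension:
```
// Dims d0 (unit) and d1 collapse into the first result dim; d2 maps to the second.
%r = linalg.tensor_reshape %t [affine_map<(d0, d1, d2) -> (d0, d1)>,
                               affine_map<(d0, d1, d2) -> (d2)>]
       : tensor<1x4x5xf32> into tensor<4x5xf32>
```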
-
Aart Bik authored
The test passes either way, but this is the full name of the dialect. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D101774
-
MaheshRavishankar authored
Convert subtensor and subtensor_insert operations to use their rank-reduced versions to drop unit dimensions. Differential Revision: https://reviews.llvm.org/D101495
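A before/after sketch of the rank reduction with hypothetical shapes and operands:
```
// Before: the result type keeps the leading unit dimension.
%0 = subtensor %t[4, 0, 0] [1, 4, 4] [1, 1, 1]
       : tensor<8x4x4xf32> to tensor<1x4x4xf32>
// After: the rank-reduced result drops the unit dimension.
%1 = subtensor %t[4, 0, 0] [1, 4, 4] [1, 1, 1]
       : tensor<8x4x4xf32> to tensor<4x4xf32>
```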
-