- May 07, 2021
-
MaheshRavishankar authored
The pattern to convert subtensor ops to their rank-reduced versions (by dropping unit dims in the result) can also produce a zero-rank tensor. Handle that case. This also fixes an out-of-bounds access bug in the existing pattern for such cases.
Differential Revision: https://reviews.llvm.org/D101949
-
- May 06, 2021
-
Lei Zhang authored
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D102009
-
thomasraoux authored
-
thomasraoux authored
This exposes a lambda control instead of just a boolean to control unit-dimension folding, which gives the user more control to pick a good heuristic. Folding reshapes helps fusion opportunities but may generate sub-optimal generic ops.
Differential Revision: https://reviews.llvm.org/D101917
-
thomasraoux authored
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101955
-
Christian Sigg authored
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D101757
-
Navdeep Kumar authored
Add warp-synchronous matrix-multiply accumulate ops in the GPU and NVVM dialects.

Add the following three ops to the GPU dialect:
1. subgroup_mma_load_matrix
2. subgroup_mma_store_matrix
3. subgroup_mma_compute

Add the following three ops to the NVVM dialect:
1. wmma.m16n16k16.load.[a,b,c].[f16,f32].row.stride
2. wmma.m16n16k16.store.d.[f16,f32].row.stride
3. wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]

Reviewed By: bondhugula, ftynse, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D95330
-
MaheshRavishankar authored
Fixing a minor bug which led to the element type of the output being modified when folding reshapes with a generic op.
Differential Revision: https://reviews.llvm.org/D101942
-
- May 05, 2021
-
Emilio Cota authored
This approximation matches the one in Eigen.

```
name                      old cpu/op   new cpu/op   delta
BM_mlir_Expm1_f32/10      90.9ns ± 4%  52.2ns ± 4%  -42.60%  (p=0.000 n=74+87)
BM_mlir_Expm1_f32/100      837ns ± 3%   231ns ± 4%  -72.43%  (p=0.000 n=79+69)
BM_mlir_Expm1_f32/1k      8.43µs ± 3%  1.58µs ± 5%  -81.30%  (p=0.000 n=77+83)
BM_mlir_Expm1_f32/10k     83.8µs ± 3%  15.4µs ± 5%  -81.65%  (p=0.000 n=83+69)
BM_eigen_s_Expm1_f32/10   68.8ns ±17%  72.5ns ±14%   +5.40%  (p=0.000 n=118+115)
BM_eigen_s_Expm1_f32/100   694ns ±11%   717ns ± 2%   +3.34%  (p=0.000 n=120+75)
BM_eigen_s_Expm1_f32/1k   7.69µs ± 2%  7.97µs ±11%   +3.56%  (p=0.000 n=95+117)
BM_eigen_s_Expm1_f32/10k  88.0µs ± 1%  89.3µs ± 6%   +1.45%  (p=0.000 n=74+106)
BM_eigen_v_Expm1_f32/10   44.3ns ± 6%  45.0ns ± 8%   +1.45%  (p=0.018 n=81+111)
BM_eigen_v_Expm1_f32/100   351ns ± 1%   360ns ± 9%   +2.58%  (p=0.000 n=73+99)
BM_eigen_v_Expm1_f32/1k   3.31µs ± 1%  3.42µs ± 9%   +3.37%  (p=0.000 n=71+100)
BM_eigen_v_Expm1_f32/10k  33.7µs ± 8%  34.1µs ± 9%   +1.04%  (p=0.007 n=99+98)
```

Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D101852
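The motivation for a dedicated expm1 lowering (rather than emitting `exp(x) - 1`) is numerical: for small x, `exp(x)` is extremely close to 1 and the subtraction cancels most significant digits. A minimal Python sketch of the effect (illustrative only; the actual lowering uses an Eigen-style polynomial approximation, not this code):

```python
import math

def naive_expm1(x):
    # exp(x) rounds to a value extremely close to 1.0 for tiny x, so the
    # subtraction cancels almost every significant digit of the result.
    return math.exp(x) - 1.0

x = 1e-10  # true expm1(x) = x + x**2/2 + ... barely above x itself
print(naive_expm1(x))   # suffers catastrophic cancellation
print(math.expm1(x))    # accurate to machine precision
```

The dedicated routine keeps full precision exactly where the naive form loses it, which is why libm and Eigen both ship one.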
-
Philipp Krones authored
This untangles the MCContext and the MCObjectFileInfo. There is a circular dependency between MCContext and MCObjectFileInfo. Currently this dependency also exists during construction: you can't construct a MOFI without an MCContext, without constructing the MCContext with a dummy version of that MOFI first. This removes this dependency during construction. In a perfect world, MCObjectFileInfo wouldn't depend on MCContext at all, but would only be stored in the MCContext, like other MC information. This is future work. This also shifts/adds more information to the MCContext, making it more available to the different targets, namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo

Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
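The construction-order fix can be pictured with a small sketch (hypothetical Python, not the real MC classes or API): the context is built standalone first, and the object-file info is initialized against it afterwards, so no dummy object is needed to break the cycle.

```python
# Hypothetical sketch of breaking a circular construction dependency.
class Context:
    def __init__(self, triple):
        self.triple = triple            # target info now lives in the context
        self.object_file_info = None    # filled in after construction

class ObjectFileInfo:
    def init_from(self, ctx):
        self.ctx = ctx                  # depends on a fully-built context
        ctx.object_file_info = self     # context stores it, like other MC info

ctx = Context("x86_64-unknown-linux-gnu")  # no dummy ObjectFileInfo required
mofi = ObjectFileInfo()
mofi.init_from(ctx)
```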
-
Javier Setoain authored
These instructions map to SVE-specific intrinsics that accept a predicate operand to support control flow in vector code.
Differential Revision: https://reviews.llvm.org/D100982
-
Sergei Grechanik authored
This patch adds support for vectorizing loops with 'iter_args' implementing known reductions along the vector dimension. Compared to the non-vector-dimension case, two additional things are done during vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of the loop to prevent garbage values from being written to the accumulator.

Vectorization of reduction loops is disabled by default. To enable it, a map from loops to arrays of reduction descriptors should be explicitly passed to `vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed to the SuperVectorize pass.

Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.

Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
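The masking step can be illustrated with a plain-Python sketch (hypothetical; the vector width and names are invented, and the real pass works on affine loops, not lists): when the trip count is not a multiple of the vector width, the final partial vector is padded with the reduction's neutral element so no garbage reaches the accumulator.

```python
VEC = 4          # assumed vector width
NEUTRAL = 0.0    # neutral element of the add reduction

def vectorized_sum(xs):
    acc = [NEUTRAL] * VEC                  # vector accumulator (iter_args analogue)
    for i in range(0, len(xs), VEC):
        chunk = xs[i:i + VEC]
        # Mask the final, partial vector with the neutral element so the
        # lanes past the end contribute nothing to the accumulator.
        chunk = chunk + [NEUTRAL] * (VEC - len(chunk))
        acc = [a + c for a, c in zip(acc, chunk)]
    return sum(acc)                        # final vector.reduce to a scalar

print(vectorized_sum([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```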
-
Tobias Gysi authored
The old index op handling let the new index operations point back to the producer block. As a result, after fusion some index operations in the fused block had back references to the old producer block, resulting in illegal IR. The patch now relies on a block-and-value mapping to avoid such back references.
Differential Revision: https://reviews.llvm.org/D101887
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D101861
-
Javier Setoain authored
While we figure out how to best add Standard support for scalable vectors, these instructions provide a workaround for basic arithmetic between scalable vectors.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100837
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101798
-
Aart Bik authored
This revision migrates more code from Linalg into the new permanent home of SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
-
- May 04, 2021
-
William S. Moses authored
See: https://reviews.llvm.org/D101710
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101801
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101710
-
Tobias Gysi authored
Ensure the index operations are lowered on all linalg loop lowering paths.
Differential Revision: https://reviews.llvm.org/D101827
-
Matthias Springer authored
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.
Differential Revision: https://reviews.llvm.org/D101808
-
Eugene Zhulenev authored
This fixes a performance regression in vec-mat vectorization.
Reviewed By: asaadaldien
Differential Revision: https://reviews.llvm.org/D101795
-
Emilio Cota authored
This approximation matches the one in Eigen.

```
name                      old cpu/op   new cpu/op   delta
BM_mlir_Log1p_f32/10      83.2ns ± 7%  34.8ns ± 5%  -58.19%  (p=0.000 n=84+71)
BM_mlir_Log1p_f32/100      664ns ± 4%   129ns ± 4%  -80.57%  (p=0.000 n=82+82)
BM_mlir_Log1p_f32/1k      6.75µs ± 4%  0.81µs ± 3%  -88.07%  (p=0.000 n=88+79)
BM_mlir_Log1p_f32/10k     76.5µs ± 3%   7.8µs ± 4%  -89.84%  (p=0.000 n=80+80)
BM_eigen_s_Log1p_f32/10   70.1ns ±14%  72.6ns ±14%   +3.49%  (p=0.000 n=116+112)
BM_eigen_s_Log1p_f32/100   706ns ± 9%   717ns ± 3%   +1.60%  (p=0.018 n=117+80)
BM_eigen_s_Log1p_f32/1k   8.26µs ± 1%  8.26µs ± 1%      ~    (p=0.567 n=84+86)
BM_eigen_s_Log1p_f32/10k  92.1µs ± 5%  92.6µs ± 6%   +0.60%  (p=0.047 n=115+115)
BM_eigen_v_Log1p_f32/10   31.8ns ±24%  34.9ns ±17%   +9.72%  (p=0.000 n=98+96)
BM_eigen_v_Log1p_f32/100   169ns ±10%   177ns ± 5%   +4.66%  (p=0.000 n=119+81)
BM_eigen_v_Log1p_f32/1k   1.42µs ± 4%  1.46µs ± 8%   +2.70%  (p=0.000 n=93+113)
BM_eigen_v_Log1p_f32/10k  14.4µs ± 5%  14.9µs ± 8%   +3.61%  (p=0.000 n=115+110)
```

Reviewed By: ezhulenev, ftynse
Differential Revision: https://reviews.llvm.org/D101765
-
- May 03, 2021
-
MaheshRavishankar authored
Given the source and destination shapes, if they are static, or if the expanded/collapsed dimensions are unit-extent, it is possible to compute the reassociation maps that can be used to reshape one type into another. Add a utility method to return the reassociation maps when possible. This utility function can be used to fuse a sequence of reshape ops, given the type of the source of the producer and the final result type. This pattern supersedes a more constrained folding pattern added to the DropUnitDims pass.
Differential Revision: https://reviews.llvm.org/D101343
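The core idea of such a utility can be sketched in Python (a hypothetical greedy version; it assumes static shapes and no ambiguity from interior unit dimensions, unlike the real utility): group contiguous source dimensions until their product matches each collapsed result dimension.

```python
def get_reassociation(src_shape, dst_shape):
    """Greedily group contiguous source dims whose product matches each
    collapsed result dim; returns one list of source indices per result dim."""
    groups, i = [], 0
    for d in dst_shape:
        group, prod = [], 1
        while prod != d or not group:   # take at least one dim per group
            group.append(i)
            prod *= src_shape[i]
            i += 1
        groups.append(group)
    assert i == len(src_shape), "shapes are not reassociation-compatible"
    return groups

print(get_reassociation([2, 3, 4], [6, 4]))   # [[0, 1], [2]]
```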
-
MaheshRavishankar authored
Convert subtensor and subtensor_insert operations to use their rank-reduced versions to drop unit dimensions.
Differential Revision: https://reviews.llvm.org/D101495
-
thomasraoux authored
The current implementation had a bug: it relied on the target vector dimension sizes to calculate where to insert broadcasts. If several dimensions have the same size, we may insert the broadcast on the wrong dimension. The correct broadcast cannot be inferred from the types of the source and destination vectors. Instead, when we want to extend transfer ops, we calculate an "inverse" map of the projected permutation and insert broadcasts in place of the projected dimensions.
Differential Revision: https://reviews.llvm.org/D101738
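The inverse-map idea can be sketched in a few lines of Python (hypothetical representation, not the MLIR AffineMap API): inverting the projected permutation tells us exactly which result dims are fed by a source dim; the remaining dims need a broadcast, independently of dimension sizes.

```python
def broadcast_dims(proj_perm, num_result_dims):
    """proj_perm[i] is the result dim produced by source dim i; result dims
    absent from the inverse mapping are the ones that need a broadcast."""
    inverse = {res: src for src, res in enumerate(proj_perm)}
    return [d for d in range(num_result_dims) if d not in inverse]

# Source dims feed result dims 0 and 2; result dim 1 must be broadcast,
# even if all three dims happen to have the same size.
print(broadcast_dims([0, 2], 3))  # [1]
```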
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D101771
-
Frederik Gossen authored
Add dedicated pass `convert-linalg-tiled-loops-to-scf` to lower `linalg.tiled_loop`s.
Differential Revision: https://reviews.llvm.org/D101768
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101637
-
thomasraoux authored
Differential Revision: https://reviews.llvm.org/D101643
-
thomasraoux authored
Move TransposeOp lowering into its own populate function, as in some cases it is better to keep it during ContractOp lowering to better canonicalize it, rather than emitting scalar insert/extract.
Differential Revision: https://reviews.llvm.org/D101647
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D101747
-
William S. Moses authored
Differential Revision: https://reviews.llvm.org/D101705
-
- May 02, 2021
-
William S. Moses authored
1) Canonicalize IndexCast(SExt(x)) => IndexCast(x)
2) Provide constant folds of sign_extend and truncate

Differential Revision: https://reviews.llvm.org/D101714
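The first canonicalization is a classic peephole rewrite: widening an integer before an index cast does not change the value the cast produces, so the sext can be dropped. A toy sketch on a hypothetical expression-tree representation (not the MLIR rewrite API):

```python
def canonicalize(op):
    """Bottom-up rewrite: index_cast(sext(x)) -> index_cast(x)."""
    if not isinstance(op, tuple):
        return op                     # leaf value, nothing to rewrite
    name, *args = op
    args = [canonicalize(a) for a in args]
    if name == "index_cast" and isinstance(args[0], tuple) and args[0][0] == "sext":
        return ("index_cast", args[0][1])   # drop the redundant sign-extend
    return (name, *args)

print(canonicalize(("index_cast", ("sext", "x"))))  # ('index_cast', 'x')
```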
-
eopXD authored
Added canonicalization for vector_load and vector_store. The existing pattern SimplifyAffineOp can be reused to compose the maps that supply results into them. Added AffineVectorStoreOp and AffineVectorLoadOp to the static_assert of SimplifyAffineOp to allow the operations to use it.
This fixes https://bugs.llvm.org/show_bug.cgi?id=50058.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101691
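What "composing the maps that supply results" buys can be sketched with plain functions (hypothetical; the real pass composes AffineMaps, not Python lambdas): the access's index map and the map producing its operands fold into one composed map, so the load/store needs only a single map application.

```python
def compose(f, g):
    """Function composition: (f . g)(xs) = f(g(xs))."""
    return lambda *xs: f(*g(*xs))

access_map = lambda i, j: (i + j,)     # index map used by the vector load/store
operand_map = lambda i, j: (2 * i, j)  # map that supplies the (i, j) operands
composed = compose(access_map, operand_map)
print(composed(3, 4))  # access_map(6, 4) == (10,)
```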
-
- May 01, 2021
-
Aart Bik authored
(1) migrates the encoding from TensorDialect into the new SparseTensorDialect
(2) replaces dictionary-based storage and builders with struct-like data

Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D101669
-
- Apr 30, 2021
-
Ahmed Taei authored
Three patterns are added to convert vector.multi_reduction into a sequence of vector.reduction as follows:
- Transpose the inputs so the innermost dimensions are always reductions.
- Reduce the rank of vector.multi_reduction to 2-D with the innermost reduction dim (the 2-D canonical form).
- The 2-D canonical form is converted into a sequence of vector.reduction.

There are two things that might be worth doing in a follow-up diff:
- An scf.for (maybe optionally) around vector.reduction instead of unrolling it.
- Break down the vector.reduction into a sequence of vector.reduction (e.g. tree-based reduction) instead of relying on how downstream dialects handle it. Note: this will require passing target-vector-length.

Differential Revision: https://reviews.llvm.org/D101570
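The three steps above can be sketched on a flat row-major buffer in Python (hypothetical and scalar; the real patterns rewrite vector IR, and only sum is shown here): iterate parallel dims outermost and reduction dims innermost — the transposed, 2-D canonical view — then lower each row to one reduction.

```python
import itertools

def multi_reduction_sum(flat, shape, red_dims):
    """Reduce a flat row-major tensor over red_dims, keeping the others."""
    par_dims = [d for d in range(len(shape)) if d not in red_dims]
    red_dims = sorted(red_dims)

    def linearize(idx):
        lin = 0
        for d in range(len(shape)):
            lin = lin * shape[d] + idx[d]
        return lin

    result = []
    # Steps 1+2: parallel dims outermost, reduction dims innermost — each
    # pidx selects one "row" of the 2-D canonical form.
    for pidx in itertools.product(*[range(shape[d]) for d in par_dims]):
        row = []
        for ridx in itertools.product(*[range(shape[d]) for d in red_dims]):
            idx = [0] * len(shape)
            for d, v in zip(par_dims, pidx):
                idx[d] = v
            for d, v in zip(red_dims, ridx):
                idx[d] = v
            row.append(flat[linearize(idx)])
        # Step 3: each row becomes a single reduction.
        result.append(sum(row))
    return result

# 2x2x2 tensor, reduce over dims 0 and 2, keep dim 1.
print(multi_reduction_sum(list(range(8)), [2, 2, 2], (0, 2)))  # [10, 18]
```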
-
Aart Bik authored
This is the very first step toward removing the glue and clutter from linalg and replacing it with proper sparse tensor types. This revision migrates the LinalgSparseOps into the SparseTensorOps of a sparse tensor dialect. This also provides a new home for sparse-tensor-related transformations.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter) will follow, but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
-