[mlir][ArmSME] Fix loop bounds of masked loads/stores (#78983)
Previously, masked tile loads/stores used the dimension size from the `vector.create_mask` operation directly as the upper bound of the `scf.for` over the tile slices. This was incorrect: `create_mask` allows operands greater than the size of the vector dimension, in which case the upper bound of the loop must be clamped to the number of tile slices.
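As a rough illustration of the clamping described above, here is a small Python model of the loop bound. It is not the actual lowering code: `slice_loop_bound`, `active_slices`, and their parameters are hypothetical names chosen for this sketch, and it assumes the mask operand is non-negative.

```python
def slice_loop_bound(mask_dim_size: int, num_tile_slices: int) -> int:
    """Upper bound for the loop over tile slices.

    mask_dim_size   -- the create_mask operand for the slice dimension
                       (hypothetical name; assumed non-negative here)
    num_tile_slices -- number of slices in the tile (hypothetical name)
    """
    # The mask operand may exceed the vector dimension, so clamp it.
    return min(mask_dim_size, num_tile_slices)


def active_slices(mask_dim_size: int, num_tile_slices: int) -> range:
    """Slice indices the loop visits once the bound is clamped."""
    return range(0, slice_loop_bound(mask_dim_size, num_tile_slices))


if __name__ == "__main__":
    # Mask operand larger than the tile: the bound is clamped to 4 slices.
    print(list(active_slices(7, 4)))
    # Mask operand smaller than the tile: the bound is the mask operand.
    print(list(active_slices(2, 4)))
```

Using the unclamped operand as the bound in the first case would visit slice indices 4 through 6, which do not exist in a four-slice tile.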