[mlir][ArmSME] Add support for vector.transfer_read with transpose (#67527)

This patch adds support for lowering a vector.transfer_read with a
transpose permutation map to a vertical tile load, for example:

  vector.transfer_read ... permutation_map: (d0, d1) -> (d1, d0)

is converted to:

  arm_sme.tile_load ... <vertical>

On SME the transpose can be done in-flight, rather than as a separate
operation as in the TransferReadPermutationLowering, which would do the
following:

  %0 = vector.transfer_read ...
  vector.transpose %0, [1, 0] ...

The lowering doesn't support masking yet and the transfer_read must be
in-bounds. It also intentionally doesn't handle simple loads as
transfer_write currently does, as the generic
TransferReadToVectorLoadLowering can lower these to simple vector.load
ops, which can already be lowered to ArmSME.

A subsequent patch will update the existing transfer_write lowering;
this is a separate patch as there is currently no lowering for
vector.transfer_read.
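
As an illustration of why the vertical tile load can stand in for a
load followed by a transpose, here is a minimal C++ model of the
semantics. It is a sketch only, not code from this patch: the fixed
tile size kTile and the helpers loadHorizontal, loadVertical, transpose
and makeExample are hypothetical names, and real SME tiles are scalable
rather than fixed-size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed tile size for illustration only; real SME tiles
// are scalable and sized by the streaming vector length.
constexpr std::size_t kTile = 4;
using Tile = std::array<std::array<int, kTile>, kTile>;

// Horizontal load: memory row `slice` is written to tile row `slice`.
Tile loadHorizontal(const Tile &mem) {
  Tile tile{};
  for (std::size_t slice = 0; slice < kTile; ++slice)
    for (std::size_t e = 0; e < kTile; ++e)
      tile[slice][e] = mem[slice][e];
  return tile;
}

// Vertical load: memory row `slice` is written to tile column `slice`,
// so the transpose happens as part of the load itself.
Tile loadVertical(const Tile &mem) {
  Tile tile{};
  for (std::size_t slice = 0; slice < kTile; ++slice)
    for (std::size_t e = 0; e < kTile; ++e)
      tile[e][slice] = mem[slice][e];
  return tile;
}

// Separate transpose step, modelling vector.transpose %0, [1, 0].
Tile transpose(const Tile &in) {
  Tile out{};
  for (std::size_t r = 0; r < kTile; ++r)
    for (std::size_t c = 0; c < kTile; ++c)
      out[c][r] = in[r][c];
  return out;
}

// Fills a tile-sized block of "memory" with distinct row-major values.
Tile makeExample() {
  Tile mem{};
  for (std::size_t r = 0; r < kTile; ++r)
    for (std::size_t c = 0; c < kTile; ++c)
      mem[r][c] = static_cast<int>(r * kTile + c);
  return mem;
}
```

In this model, loadVertical(mem) produces the same tile as
transpose(loadHorizontal(mem)), but with a single pass over memory.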