[AArch64] Improve codegen for get.active.lane.mask when SVE is available

When lowering the get.active.lane.mask intrinsic with a fixed-width predicate vector result, we can make use of the SVE whilelo instruction when SVE is enabled. We do this by choosing a suitable scalable VT for the whilelo instruction and then promoting its result to an integer vector, e.g. nxv16i1 -> nxv16i8. We can then extract a v16i8 subvector and truncate it back to the original return type, e.g. v16i1. This leads to a significant improvement in code quality.

Differential Revision: https://reviews.llvm.org/D116664
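To see why the whilelo -> promote -> extract -> truncate sequence preserves the intrinsic's meaning, here is a scalar C++ model. It is a sketch and does not use LLVM APIs. The function names (`activeLaneMask`, `whilelo`, `lowerViaWhilelo`) are hypothetical. The model covers only the v16i1 case mentioned above. It also assumes that promotion widens each i1 lane to an i8 holding 0 or 1; the choice of extension does not matter here, because truncation keeps only bit 0. `VScale` is the SVE vector-length multiplier and is at least 1, since SVE vectors are at least 128 bits. That guarantees an nxv16i1 predicate always has at least 16 lanes to extract from.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics of get.active.lane.mask for a fixed-width result:
// lane I is active iff Base + I < N, with the addition evaluated without
// wrapping. Rewritten as "Base < N && I < N - Base" so no overflow occurs.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t N, unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = Base < N && I < N - Base;
  return Mask;
}

// Model of an SVE whilelo producing a predicate with NumLanes lanes:
// lanes are active while the incrementing counter is below N, and once
// the comparison fails every later lane stays inactive.
std::vector<bool> whilelo(uint64_t Base, uint64_t N, unsigned NumLanes) {
  std::vector<bool> Pred(NumLanes, false);
  for (unsigned I = 0; I < NumLanes; ++I) {
    if (!(Base < N && I < N - Base))
      break;
    Pred[I] = true;
  }
  return Pred;
}

// Hypothetical model of the lowering for a v16i1 result.
std::vector<bool> lowerViaWhilelo(uint64_t Base, uint64_t N, unsigned VScale) {
  assert(VScale >= 1 && "SVE vectors are at least 128 bits");
  const unsigned ScalableLanes = VScale * 16; // lanes in nxv16i1

  // 1. whilelo with a scalable nxv16i1 result.
  std::vector<bool> Pred = whilelo(Base, N, ScalableLanes);

  // 2. Promote nxv16i1 -> nxv16i8 (each lane widened to a byte, 0 or 1).
  std::vector<uint8_t> Promoted(ScalableLanes);
  for (unsigned I = 0; I < ScalableLanes; ++I)
    Promoted[I] = Pred[I] ? 1 : 0;

  // 3. Extract the low v16i8 subvector (always in range since VScale >= 1).
  std::vector<uint8_t> Sub(Promoted.begin(), Promoted.begin() + 16);

  // 4. Truncate v16i8 -> v16i1 by keeping bit 0 of each byte.
  std::vector<bool> Result(16);
  for (unsigned I = 0; I < 16; ++I)
    Result[I] = (Sub[I] & 1) != 0;
  return Result;
}
```

The first 16 lanes of the whilelo predicate depend only on `Base` and `N`, not on `VScale`. That is why extracting the low subvector gives the same answer for any vector length. For example, `lowerViaWhilelo(10, 12, 4)` and `activeLaneMask(10, 12, 16)` both activate only lanes 0 and 1.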