[AArch64] Break non-temporal loads over 256 bits into 256-bit loads and a smaller load
Currently, non-temporal loads wider than 256 bits are broken up inefficiently. For example, `v17i32` gets broken into two 128-bit loads. It is better if we can use 256-bit loads instead.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133421
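The splitting arithmetic can be illustrated with a minimal sketch. This is not the code from the patch: the helper name `split_load_widths` and its parameters are hypothetical, and it only models the bit widths, assuming the remainder is emitted as a single smaller load.

```python
def split_load_widths(num_elts, elt_bits, chunk_bits=256):
    """Return the widths (in bits) of the loads for a vector of
    num_elts elements of elt_bits each, using as many chunk_bits-wide
    loads as fit and one smaller load for whatever is left.

    Hypothetical helper for illustration only; the real lowering in the
    AArch64 backend operates on vector types, not on raw bit counts.
    """
    total_bits = num_elts * elt_bits
    full_chunks, remainder = divmod(total_bits, chunk_bits)
    widths = [chunk_bits] * full_chunks
    if remainder:
        widths.append(remainder)
    return widths


# v17i32 is 17 * 32 = 544 bits: two 256-bit loads plus a 32-bit remainder.
print(split_load_widths(17, 32))
```

Under these assumptions, a `v17i32` load is covered by two 256-bit loads and one 32-bit load.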