[AArch64] Add more efficient bitwise vector reductions.
Improves the codegen for VECREDUCE_{AND,OR,XOR} operations on AArch64. Currently, these are fully scalarized, except if the vector is a <N x i1>. This patch improves the codegen down to O(log(N)) where N is the length of the vector for vectors whose elements are not i1, by repeatedly applying the bitwise operations to the two halves of the vector. <N x i1> bitwise reductions are handled using VECREDUCE_{UMAX,UMIN,ADD} instead. I had to update quite a few codegen tests with these changes, with a general downward trend in instruction count. Since the vector reductions already have tests, I haven't added any new tests myself. Differential Revision: https://reviews.llvm.org/D148185
Loading
Please register or sign in to comment