[AArch64] Use NEON's tbl1 for 16xi8 and 8xi8 build vector with mask.
When Clang's __builtin_shufflevector is used with a 16xi8 or 8xi8 source and a runtime mask on an AArch64 target, LLVM currently generates 16 or 8 extract+and+insert operations. This patch replaces these operations with NEON's tbl1 instruction (plus a vector AND).

Issue: https://github.com/llvm/llvm-project/issues/60515

Differential Revision: https://reviews.llvm.org/D146212
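To illustrate why the two lowerings are interchangeable, here is a minimal scalar sketch in portable C for the 16xi8 case. It is a model only, not the code from the patch: the function names (shuffle_per_lane, tbl1_model, shuffle_and_tbl, check_models_agree) are hypothetical, and the assumption is that each mask byte is reduced modulo the lane count before indexing. The AND matters because tbl returns 0 for any index outside the table instead of wrapping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16 }; /* 16xi8 case; an 8xi8 variant would use 8 lanes */

/* Model of the old lowering: one extract + and + insert per lane. */
static void shuffle_per_lane(const uint8_t src[LANES],
                             const uint8_t mask[LANES],
                             uint8_t out[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        uint8_t idx = mask[i] & (LANES - 1); /* and */
        out[i] = src[idx];                   /* extract, then insert */
    }
}

/* Scalar model of tbl1 with a 16-byte table:
 * an out-of-range index produces 0 rather than wrapping. */
static void tbl1_model(const uint8_t table[LANES],
                       const uint8_t idx[LANES],
                       uint8_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = idx[i] < LANES ? table[idx[i]] : 0;
}

/* Model of the new lowering: one vector AND, then one table lookup. */
static void shuffle_and_tbl(const uint8_t src[LANES],
                            const uint8_t mask[LANES],
                            uint8_t out[LANES]) {
    uint8_t masked[LANES];
    for (int i = 0; i < LANES; ++i)
        masked[i] = mask[i] & (LANES - 1); /* the vector AND */
    tbl1_model(src, masked, out);          /* the tbl1 */
}

/* Asserts that both models produce the same lanes for the given inputs. */
static void check_models_agree(const uint8_t src[LANES],
                               const uint8_t mask[LANES]) {
    uint8_t a[LANES], b[LANES];
    shuffle_per_lane(src, mask, a);
    shuffle_and_tbl(src, mask, b);
    assert(memcmp(a, b, LANES) == 0);
}
```

Without the AND, a mask byte such as 0x13 would select 0 in tbl1_model, whereas the per-lane model wraps it to lane 3; with the AND in place the two agree for every mask value.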