[X86][SSE] Combine v16i8 SHL by constants to multiplies
Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data. Differential Revision: https://reviews.llvm.org/D48963 llvm-svn: 336513
Loading
Please register or sign in to comment