[X86] Teach lowerShuffleAsBlend to use bit blend for v16i8/v32i8/v16i16 when avx512vl is enabled but not avx512bw.

Probably not super important since there are no real CPUs with avx512vl and not avx512bw. But vpternlog should be better than vpblendvb.

I do wonder if we should use vpternlog even with BWI. We currently use vpblendmb or vpblendmw by putting the mask into a GPR and moving it to a k-register. But I don't think we hoist the GPR to k-register copy in machine LICM. Using VPTERNLOG would require a constant pool load for the mask, but has the advantage that we're pretty good at hoisting and rematerializing those.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D83156
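For readers unfamiliar with the term, a bit blend selects each result bit from one of two vectors under a constant mask, which is a three-input bitwise function and therefore fits a single VPTERNLOG. The following is a minimal, portable sketch of that idea, not the code from this patch: the names `ternlog8`, `maskFromBlendShuffle`, `bitBlend`, and `kBitBlendImm` are hypothetical, the operand order (mask as the third input) is an assumption made for illustration, and the instruction is modelled in plain C++ rather than with intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Hypothetical software model of VPTERNLOG on one byte: for every bit
// position, the three input bits form an index (a<<2 | b<<1 | c) into the
// 8-bit immediate, and the indexed immediate bit becomes the result bit.
constexpr uint8_t ternlog8(uint8_t a, uint8_t b, uint8_t c, uint8_t imm) {
  uint8_t result = 0;
  for (int bit = 0; bit < 8; ++bit) {
    unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                   ((c >> bit) & 1u);
    result |= static_cast<uint8_t>(((imm >> idx) & 1u) << bit);
  }
  return result;
}

// Immediate for "c ? a : b", i.e. (a & c) | (b & ~c). It is derived by
// evaluating the expression on the truth-table patterns a=0xF0, b=0xCC,
// c=0xAA. Assumption: the mask is passed as the third operand.
constexpr uint8_t kBitBlendImm =
    static_cast<uint8_t>((0xF0 & 0xAA) | (0xCC & ~0xAA));
static_assert(kBitBlendImm == 0xE4, "unexpected truth table");

// Build the constant byte mask for a shuffle that is really a blend:
// lane i must take either element i of v1 (index i) or element i of v2
// (index i + 16).
Bytes16 maskFromBlendShuffle(const std::array<int, 16> &shuffle) {
  Bytes16 mask{};
  for (int i = 0; i < 16; ++i) {
    assert(shuffle[i] == i || shuffle[i] == i + 16);
    mask[i] = shuffle[i] < 16 ? 0xFF : 0x00; // all-ones selects v1
  }
  return mask;
}

// Bit blend: one ternary-logic operation per lane, no k-register needed.
Bytes16 bitBlend(const Bytes16 &v1, const Bytes16 &v2, const Bytes16 &mask) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = ternlog8(v1[i], v2[i], mask[i], kBitBlendImm);
  return out;
}
```

In this model the mask is an ordinary vector constant, which is the property the message above relies on: a constant pool load can be hoisted or rematerialized, whereas the vpblendmb/vpblendmw form needs the mask moved from a GPR into a k-register.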