[X86][AVX] Attempt to scale masked shuffles to match the root type
Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type. This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.
Loading
Please sign in to comment