AMDGPU: Custom lower vector_shuffle for v4i16/v4f16
Ordinarily it is lowered as a build_vector of each extract_vector_elt, which are in turn lowered to bitcasts and bit shifts. Very little understands the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that.

llvm-svn: 364959
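
To illustrate the idea, here is a minimal standalone sketch, not the code from this change. It models a 4 x 16-bit shuffle on plain arrays and splits the result into two 2-lane pieces. A piece whose mask entries name an aligned, consecutive pair is taken whole from one source (the analogue of extracting a v2i16 subvector); otherwise it is assembled lane by lane. The names V4, Mask4, Lowered, pairIsContiguous and lowerShuffle are hypothetical, and the sketch assumes every mask entry is in the range 0..7 (no undef lanes).

```cpp
#include <array>
#include <cstdint>

// Hypothetical model types: four 16-bit lanes, and a shuffle mask where
// indices 0..3 select from lhs and 4..7 select from rhs.
using V4 = std::array<std::uint16_t, 4>;
using Mask4 = std::array<int, 4>;

struct Lowered {
  V4 value;        // shuffled result
  int wholePieces; // number of 2-lane pieces taken as a whole subvector
};

// Read one lane from the conceptual concatenation of lhs and rhs.
std::uint16_t pickLane(const V4 &lhs, const V4 &rhs, int idx) {
  return idx < 4 ? lhs[idx] : rhs[idx - 4];
}

// True when mask[i], mask[i+1] name an aligned consecutive pair, e.g. <0,1>
// or <6,7>. An even first index also guarantees both lanes come from the
// same source vector.
bool pairIsContiguous(const Mask4 &mask, int i) {
  return mask[i] % 2 == 0 && mask[i + 1] == mask[i] + 1;
}

// Assumption: all mask entries are in 0..7.
Lowered lowerShuffle(const V4 &lhs, const V4 &rhs, const Mask4 &mask) {
  Lowered out{};
  for (int i = 0; i < 4; i += 2) {
    if (pairIsContiguous(mask, i)) {
      // Analogue of taking a 2-lane subvector directly from one source.
      ++out.wholePieces;
    }
    // In this array model both paths copy the same lanes; in a DAG the
    // non-contiguous case would need per-element extracts instead.
    out.value[i] = pickLane(lhs, rhs, mask[i]);
    out.value[i + 1] = pickLane(lhs, rhs, mask[i + 1]);
  }
  // The two pieces together play the role of the concat_vectors result.
  return out;
}
```

With the mask <0,1,6,7>, both pieces are taken whole and no per-lane extraction is needed. With <6,7,1,0>, only the first piece is.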