[X86] Rewrite `getScalarizationOverhead()`
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract the affected 128-bit lane, unless it's being fully overwritten (FIXME: do we need to be careful about legalization-induced padding that we obviously don't demand?), perform the insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on a 256-bit legal vector, and thus have two 128-bit subvectors, and are fully overwriting them both, we don't actually need to insert *both* subvectors, only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs, but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
// Note that in general, the insertion starting at the beginning of a vector
// isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF in one case I looked at, but the subvector insertion cost still advises against that.

The change for `Extract` is NFC and is for consistency only. I wanted to get rid of that weird explicit discounting of the insertion of the 0'th element, since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
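
For illustration only, the per-lane reasoning for `Insert` described above can be sketched as a toy model. This is not the code from the patch: `ToyCosts`, `toyInsertCost`, and the unit costs of `1` are hypothetical names and values made up for this sketch, whereas the real change queries the cost model for these numbers. It also assumes element widths of 8, 16, 32 or 64 bits and at most 64 elements.

```cpp
#include <cstdint>

// Hypothetical unit costs, assumed for this sketch only.
struct ToyCosts {
  unsigned InsertElt = 1;     // insert one scalar into a 128-bit lane
  unsigned ExtractSubvec = 1; // extract a 128-bit lane from a wider vector
  unsigned InsertSubvec = 1;  // insert a 128-bit lane into a wider vector
};

// Mask with the low N bits set (N <= 64).
static uint64_t lowBits(unsigned N) {
  return N >= 64 ? ~0ULL : ((1ULL << N) - 1);
}

// Toy estimate of the cost of inserting the elements selected by
// DemandedElts (bit I set means element I is written) into a vector of
// NumElts elements that are EltBits bits wide each.
unsigned toyInsertCost(uint64_t DemandedElts, unsigned NumElts,
                       unsigned EltBits, ToyCosts C = ToyCosts()) {
  const unsigned EltsPerLane = 128 / EltBits;
  const unsigned NumLanes = (NumElts + EltsPerLane - 1) / EltsPerLane;
  const uint64_t AllElts = lowBits(NumElts);
  const bool FullyOverwritten = (DemandedElts & AllElts) == AllElts;

  unsigned Cost = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane) {
    const unsigned First = Lane * EltsPerLane;
    const unsigned Left = NumElts - First;
    const unsigned Count = Left < EltsPerLane ? Left : EltsPerLane;
    const uint64_t LaneMask = lowBits(Count) << First;
    const uint64_t Demanded = DemandedElts & LaneMask;
    if (Demanded == 0)
      continue; // Lane is untouched, so it costs nothing.

    // A partially overwritten lane of a wide vector must be extracted
    // first so that its other elements are preserved.
    if (NumLanes > 1 && Demanded != LaneMask)
      Cost += C.ExtractSubvec;

    // One scalar insertion per demanded element in this lane.
    for (uint64_t Bits = Demanded; Bits != 0; Bits &= Bits - 1)
      Cost += C.InsertElt;

    // Put the lane back, except for lane 0 of a fully overwritten
    // vector, which acts as the implicitly-widened base.
    if (NumLanes > 1 && !(Lane == 0 && FullyOverwritten))
      Cost += C.InsertSubvec;
  }
  return Cost;
}
```

Under these assumed unit costs, fully overwriting an 8 x 32-bit vector pays for eight scalar insertions plus a single subvector insertion, while writing one element of the same vector pays for a lane extract, one scalar insertion, and a lane insert.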