[X86] Improve costmodel for scalar byte swaps
Currently we model i16 bswap as very high cost (`10`), which doesn't seem right, with all the others being at `1`.

Regardless of `MOVBE`, an i16 reg-reg bswap is lowered into (an extending move plus) a rotate-by-8: https://godbolt.org/z/8jrq7fMTj
I think it should at worst have a throughput of `1`.

Since i32/i64 already have a cost of `1`, `MOVBE` doesn't improve their costs any further.

BUT, `MOVBE` must have exactly one memory operand, with the other being a register. This means that if we have a bswap of a load, and the load has a single use, we'll fold the bswap into the load. Likewise, if we have a store of a bswap, and the bswap has a single use, we'll fold the bswap into the store.

So I think we should treat such a bswap as free, unless of course we know that `MOVBE` performs badly on the particular CPU.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D101924
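For illustration only, the source-level shapes discussed above might look like the following C sketch. The function names are hypothetical and not part of the patch; `__builtin_bswap32` is the GCC/Clang builtin. Whether the compiler actually emits `MOVBE` for the load and store cases depends on the target flags (e.g. `-mmovbe`) and the optimization level, so this shows the patterns, not guaranteed codegen.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reg-reg swap: no load or store to fold into, so this stays a plain bswap. */
uint32_t swap_reg(uint32_t x) { return __builtin_bswap32(x); }

/* i16 swap written as the rotate-by-8 it is lowered to. */
uint16_t swap16_rot(uint16_t x) { return (uint16_t)((x << 8) | (x >> 8)); }

/* bswap of a load whose only use is the bswap: candidate for a folded
   MOVBE load (assumption: MOVBE is enabled for the target). */
uint32_t load_swapped(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return __builtin_bswap32(v);
}

/* Store of a bswap whose only use is the store: candidate for a folded
   MOVBE store (same assumption). */
void store_swapped(void *p, uint32_t x) {
    uint32_t v = __builtin_bswap32(x);
    memcpy(p, &v, sizeof v);
}

/* The loaded value has a second use (*raw), so the single-use condition
   for folding the bswap into the load does not hold. */
uint32_t load_swapped_multi_use(const void *p, uint32_t *raw) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    *raw = v;
    return __builtin_bswap32(v);
}

/* Small self-check: the rotate form agrees with the builtin for i16. */
void check_swaps(void) {
    assert(swap16_rot(0x1234) == __builtin_bswap16(0x1234));
    assert(swap_reg(swap_reg(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

Under the reasoning in this commit, only `load_swapped` and `store_swapped` contain a bswap that would be treated as free with `MOVBE` available; the other functions would keep a cost of `1`.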