[X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 32/64 bit-wide elements with AVX512F

This models lowering to `vpermd`/`vpermq`/`vpermps`/`vpermpd`, which take a single input vector and a single index vector and are cross-lane. So far I haven't seen evidence that replication ever demands more than a single input vector per output vector.

This results in *shockingly* lower costs :)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113350
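As a rough illustration of the reasoning above, the following standalone C++ sketch builds a replication mask and counts how many 512-bit destination vectors it spans, charging one single-source permute per destination vector. It is not the code from this patch: `makeReplicationMask`, `sketchReplicationCost`, the unit cost per permute, and the fixed 512-bit vector width are all assumptions made for the example, and it ignores details such as demanded elements and type legalization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build a replication mask in which each of the VF
// source elements is repeated ReplicationFactor times in a row,
// e.g. ReplicationFactor=2, VF=4 gives {0,0,1,1,2,2,3,3}.
std::vector<int> makeReplicationMask(int ReplicationFactor, int VF) {
  std::vector<int> Mask;
  Mask.reserve(static_cast<std::size_t>(ReplicationFactor) * VF);
  for (int Elt = 0; Elt < VF; ++Elt)
    for (int Rep = 0; Rep < ReplicationFactor; ++Rep)
      Mask.push_back(Elt);
  return Mask;
}

// Hypothetical cost sketch: assume every destination vector of MaxVecBits
// bits can be produced by exactly one single-source, cross-lane permute
// (vpermd/vpermq/vpermps/vpermpd), and that each permute costs 1.
int sketchReplicationCost(int EltBits, int ReplicationFactor, int VF,
                          int MaxVecBits = 512) {
  assert((EltBits == 32 || EltBits == 64) && "sketch covers 32/64-bit only");
  int EltsPerVec = MaxVecBits / EltBits;       // 16 for i32, 8 for i64
  int DstElts = ReplicationFactor * VF;        // total replicated elements
  // Round up: a partially filled destination vector still needs a permute.
  return (DstElts + EltsPerVec - 1) / EltsPerVec;
}
```

Under these assumptions, replicating 16 `i32` elements by a factor of 3 yields 48 destination elements, which span three 512-bit vectors, so the sketch charges three permutes.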