[CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2
BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.
Loading
Please sign in to comment