Skip to content
Commit 437958d9 authored by Craig Topper's avatar Craig Topper
Browse files

[X86] Improve SMULO/UMULO codegen for vXi8 vectors.

The default expansion creates a MUL and either a MULHS/MULHU. Each
of those separately expand to sequences that use one or more
PMULLW instructions as well as additional instructions to
extend the types to vXi16. The MULHS/MULHU expansion computes the
whole 16-bit product, but only keeps the high part.

We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU
expansion, but keep both the high and low parts. And we can use
those parts to calculate the overflow.

For AVX512 we might have vXi1 overflow outputs. We can improve those by using
vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better
than truncating the high result to use vpcmpeqb. If we don't have avx512bw we
can extend up to v16i32 to use vpcmpeqd to produce a k register.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97624
parent 00c0c8c8
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment