[X86] Custom type legalize v2i32 smulo/umulo to use a single pmuldq/pmuludq.
With SSE4.1 and above we were using 3 multiply instructions. This was due to type legalization widening to v4i32 and the low half being done with pmulld while the high half used two pmuldq/pmuludq.

Instead of that, we can use a single pmuludq/pmuldq to calculate the full product at once, extract the high and low bits and compare to check for overflow.

I've restricted SMULO to SSE4.1 to get pmuldq. We can probably do a fixup to pmuludq on earlier targets, but that's for another day.

I was going through my git stash and found an early version of this patch from a year or two ago so I went ahead and finished it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130432
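
For illustration, the per-lane overflow check described above can be modeled in scalar C++. This is only a sketch of the arithmetic, not the lowering code from the patch: the names `MulResult32`, `umulo_lane` and `smulo_lane` are hypothetical, and each function stands in for one 32x32->64 lane of pmuludq or pmuldq.

```cpp
#include <cstdint>

// Hypothetical helper type: the truncated 32-bit product plus an overflow flag.
struct MulResult32 {
  uint32_t low;
  bool overflow;
};

// Unsigned lane (models one pmuludq lane): form the full 64-bit product,
// then report overflow when the high 32 bits are nonzero.
inline MulResult32 umulo_lane(uint32_t a, uint32_t b) {
  uint64_t full = static_cast<uint64_t>(a) * b;
  uint32_t low = static_cast<uint32_t>(full);
  uint32_t high = static_cast<uint32_t>(full >> 32);
  return {low, high != 0};
}

// Signed lane (models one pmuldq lane): form the full 64-bit signed product,
// then report overflow when the high 32 bits are not the sign extension
// of the low 32 bits.
inline MulResult32 smulo_lane(int32_t a, int32_t b) {
  int64_t full = static_cast<int64_t>(a) * b;
  uint64_t bits = static_cast<uint64_t>(full);
  uint32_t low = static_cast<uint32_t>(bits);
  uint32_t high = static_cast<uint32_t>(bits >> 32);
  // All ones if the low half is negative as an i32, otherwise all zeros.
  uint32_t sign_fill = (low & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return {low, high != sign_fill};
}
```

In both cases a single widening multiply yields everything needed: the low half is the smulo/umulo result and the high half feeds the overflow compare.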