Skip to content
Commit 76268ac6 authored by Benjamin Kramer's avatar Benjamin Kramer
Browse files

X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.

pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.

llvm-svn: 170985
parent b2f0a2bd
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment