Commit e2086e74 authored by Evan Cheng

Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extends and take advantage of no-stall
back-to-back vmul + vmla:
back to back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6                                                                                                                                                                             
 vmul  q0, q0, q1

This allows us to generate vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);
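
For reference, a minimal self-contained version of that snippet might look
like the sketch below (the function name mul_widen_sum and the surrounding
declarations are illustrative additions, not part of the commit). It computes
(zext high(s) + zext low(s)) * zext(c), which is exactly the pattern this
combine targets, so it should lower to the vmull/vmlal pair shown above:

    #include <arm_neon.h>

    /* Multiply every u8 lane of s by the matching u8 lane of c, widening
       to u16, and accumulate the products of the high and low halves.
       (Hypothetical helper for illustration only.) */
    uint16x8_t mul_widen_sum(uint8x16_t s, uint8x8_t c) {
        uint16x8_t f = vmull_u8(   vget_high_u8(s), c);  /* vmull */
        f            = vmlal_u8(f, vget_low_u8(s),  c);  /* vmlal */
        return f;
    }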

rdar://9197392

llvm-svn: 128444
parent 03325c4b