Commit c88f724f authored Mar 20, 2015 by Sanjay Patel

[X86] Prefer blendps over insertps codegen for one special case

With this patch, for this one exact case, we'll generate:

  blendps %xmm0, %xmm1, $1

instead of:

  insertps %xmm0, %xmm1, $0

If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.

The detailed performance data motivation for this may be found in D7866; 
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.

Differential Revision: http://reviews.llvm.org/D8332

llvm-svn: 232850

parent 03ad6161

Show whitespace changes

Inline Side-by-side

Please to comment