Commit 610a2f65 authored Jul 13, 2016 by Sanjay Patel

[x86][SSE/AVX] optimize pcmp results better (PR28484)

We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Differential Revision: http://reviews.llvm.org/D22225

llvm-svn: 275276

parent 9dfe4e7c

Show whitespace changes

Inline Side-by-side

Please to comment