Commit 51e434fc authored Jun 24, 2021 by Sjoerd Meijer

[AArch64] Custom lower <4 x i8> loads

This custom lowers <4 x i8> vector loads to a single 32-bit load followed by 2
SSHLL instructions that extend the result to e.g. a <4 x i32> vector. Previously,
constructing a <4 x i32> from such a load was expensive because 4 byte loads
and 4 moves were used. With this improvement, SLP vectorisation might, for
example, become profitable; see D103629.
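
As a rough illustration of the data flow described above, the following portable C sketch models the new lowering for the sign-extending case: one 32-bit load, then two widening steps (i8 to i16, then i16 to i32), each standing in for one SSHLL. It is not the backend code from this patch, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models the first widening step (SSHLL with shift 0),
 * sign-extending each i8 lane to i16. */
static void widen_s8_to_s16(const int8_t in[4], int16_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (int16_t)in[i];
}

/* Hypothetical helper: models the second widening step,
 * sign-extending each i16 lane to i32. */
static void widen_s16_to_s32(const int16_t in[4], int32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)in[i];
}

/* Hypothetical model of the lowered sequence: a single 32-bit load of the
 * four bytes, reinterpreted as 4 x i8 lanes, then two widening steps. */
void load_v4i8_sext_v4i32(const void *src, int32_t out[4]) {
    assert(src != NULL && out != NULL);

    uint32_t word;
    memcpy(&word, src, sizeof word);      /* one 32-bit load, not 4 byte loads */

    int8_t lanes8[4];
    memcpy(lanes8, &word, sizeof lanes8); /* view the word as 4 x i8 lanes */

    int16_t lanes16[4];
    widen_s8_to_s16(lanes8, lanes16);     /* first extend:  <4 x i8>  -> <4 x i16> */
    widen_s16_to_s32(lanes16, out);       /* second extend: <4 x i16> -> <4 x i32> */
}
```

Copying through memcpy in both directions keeps lane i equal to byte i in memory, so the model does not depend on host endianness. A zero-extending load would presumably use the unsigned counterpart of the widening step; that case is not modelled here.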

Differential Revision: https://reviews.llvm.org/D104782

parent 18d7e822
