[AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases
To achieve this, we rely on the following observation:
`uzp1` is just an `xtn` that operates on two registers.

For example, given the following register with type v2i64:

    LSB_______MSB
    x0 x1 x2 x3

Applying xtn to it, we get:

    x0 x2

This is equivalent to bitcasting it to v4i32 and then applying uzp1 to it:

    x0 x1 x2 x3
         |
        uzp1
         v
    x0 x2 <value from other register>

Using this observation, we can transform xtn into uzp1, and vice versa.

This observation only holds on little-endian targets. Big-endian targets have
a problem: uzp1 cannot be replaced by xtn, since the behavior of uzp1 differs
between little endian and big endian. To illustrate, take the following
example:

    LSB____________________MSB
    x0 x1 x2 x3

On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it
grabs x3 and x1, which doesn't match what I saw in the documentation. But
since I'm new to AArch64, take my word with a pinch of salt. This behavior
was observed in gdb; maybe there is an issue with the order in which it prints
the values? Whatever the reason is, the execution result given by qemu just
doesn't match. So I disable this on big-endian targets temporarily until we
find the crux.

Fixes #57502

Reviewed By: dmgreen, mingmingl

Co-authored-by: Mingming Liu <mingmingl@google.com>

Differential Revision: https://reviews.llvm.org/D133850
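The xtn/uzp1 equivalence described above can be checked with a small lane-level model. This is only a sketch in Python, not code from this patch: the helper names (`xtn`, `uzp1`, `bitcast_v2i64_to_v4i32`) are hypothetical, registers are modeled as plain lists of integers, and the bitcast assumes little-endian lane order, which is the only case this change enables.

```python
MASK32 = 0xFFFFFFFF

def xtn(v2i64):
    """Model of xtn v2i64 -> v2i32: keep the low 32 bits of each 64-bit lane."""
    return [x & MASK32 for x in v2i64]

def bitcast_v2i64_to_v4i32(v2i64):
    """Reinterpret two 64-bit lanes as four 32-bit lanes.

    Assumption: little-endian lane order, so the low half of each
    64-bit lane comes first.
    """
    lanes = []
    for x in v2i64:
        lanes.append(x & MASK32)          # low half  (x0, x2 in the diagram)
        lanes.append((x >> 32) & MASK32)  # high half (x1, x3 in the diagram)
    return lanes

def uzp1(vn, vm):
    """Model of uzp1: even-indexed lanes of vn, followed by those of vm."""
    return vn[0::2] + vm[0::2]

a = [0x1111111122222222, 0x3333333344444444]
b = [0x5555555566666666, 0x7777777788888888]

narrow_a = xtn(a)
narrow_b = xtn(b)
zipped = uzp1(bitcast_v2i64_to_v4i32(a), bitcast_v2i64_to_v4i32(b))

# One uzp1 over two bitcast registers yields both xtn results side by side.
assert zipped == narrow_a + narrow_b
```

In this model, the first half of `zipped` is exactly `xtn(a)` and the second half is `xtn(b)`, which matches the "x0 x2 <value from other register>" picture. The model says nothing about big-endian behavior.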