Commit 62fc58a6 authored Oct 13, 2022 by Sheng

[AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases



The improvement relies on one observation:

`uzp1` is just an `xtn` that operates on two registers

For example, given the following register of type v2i64 (each 64-bit lane
shown as two 32-bit halves):

LSB_______MSB

x0 x1	x2 x3

Applying xtn to it gives:

x0	x2

This is equivalent to bitcasting it to v4i32 and then applying uzp1:

x0	x1	x2	x3
   |
  uzp1
   v
x0	x2	<value from other register>

This observation lets us transform xtn into uzp1, and vice versa.
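The equivalence can be checked with a small lane-level model. This is only a sketch in Python, not the code from this patch: the helper names (`xtn_v2i64`, `bitcast_v2i64_to_v4i32_le`, `uzp1`) are hypothetical, and the bitcast is modeled as a little-endian register view in which each i64 lane splits into a low and a high i32 lane.

```python
MASK32 = (1 << 32) - 1

def xtn_v2i64(v):
    """Model of xtn on v2i64: keep the low 32 bits of each 64-bit lane."""
    return [x & MASK32 for x in v]

def bitcast_v2i64_to_v4i32_le(v):
    """Assumed little-endian view: each i64 lane becomes (low, high) i32 lanes."""
    out = []
    for x in v:
        out.append(x & MASK32)          # low half lands in the even lane
        out.append((x >> 32) & MASK32)  # high half lands in the odd lane
    return out

def uzp1(a, b):
    """Model of uzp1: even-indexed lanes of a, then even-indexed lanes of b."""
    return a[0::2] + b[0::2]

# Two v2i64 inputs; the low halves are the values xtn should keep.
a = [0x11111111AAAAAAAA, 0x22222222BBBBBBBB]
b = [0x33333333CCCCCCCC, 0x44444444DDDDDDDD]

narrowed = uzp1(bitcast_v2i64_to_v4i32_le(a), bitcast_v2i64_to_v4i32_le(b))

# One uzp1 over both registers matches xtn applied to each register separately.
assert narrowed[:2] == xtn_v2i64(a)
assert narrowed[2:] == xtn_v2i64(b)
```

In this model a single uzp1 over two registers yields the same lanes as narrowing each register with xtn and concatenating the results, which is the sense in which uzp1 acts as an xtn on two registers.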

This observation only holds on little-endian targets. Big-endian targets
have a problem: uzp1 cannot be replaced by xtn, because uzp1 behaves
differently on little endian and big endian.

To illustrate, consider the following example:

LSB____________________MSB

x0	x1	x2	x3

On little endian, uzp1 picks x0 and x2, which is correct; on big endian, it
picks x3 and x1, which does not match what I see in the documentation. Since
I'm new to AArch64, take this with a pinch of salt. I observed this behavior
in gdb, so the discrepancy may come from the order in which gdb prints the
values.

Whatever the reason, the execution result produced by qemu does not match.
So this transformation is disabled on big-endian targets for now, until we
find the root cause.
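As an illustration only, the sketch below shows how lane positions can shift with byte order when a v2i64 value is reinterpreted as v4i32 through memory (store as two 64-bit lanes, reload as four 32-bit lanes). Treating the cast as a store followed by a reload is an assumption made for this sketch, and the helper `reinterpret_v2i64_as_v4i32` is hypothetical. It does not establish the cause of the mismatch seen under qemu.

```python
import struct

def reinterpret_v2i64_as_v4i32(v, byte_order):
    """Store two i64 lanes to memory, then reload the bytes as four i32 lanes.

    byte_order is "<" for little endian or ">" for big endian (struct syntax).
    """
    raw = struct.pack(byte_order + "2Q", *v)
    return list(struct.unpack(byte_order + "4I", raw))

v = [0x11111111AAAAAAAA, 0x22222222BBBBBBBB]

le = reinterpret_v2i64_as_v4i32(v, "<")
be = reinterpret_v2i64_as_v4i32(v, ">")

# Little endian: the even lanes hold the low halves, so picking even lanes
# behaves like a truncation of each 64-bit lane.
assert le[0::2] == [0xAAAAAAAA, 0xBBBBBBBB]

# Big endian: the even lanes hold the high halves instead, so picking even
# lanes no longer corresponds to a truncation in this model.
assert be[0::2] == [0x11111111, 0x22222222]
```

Under this assumption, selecting the even-indexed 32-bit lanes only corresponds to truncation in the little-endian case, which is at least compatible with restricting the transformation to little-endian targets.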

Fixes #57502

Reviewed By: dmgreen, mingmingl

Co-authored-by: Mingming Liu <mingmingl@google.com>

Differential Revision: https://reviews.llvm.org/D133850

parent b13f7f9c
