[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier
This is a recommit that fixes unwanted STP generation by checking that the base register has not been modified or used elsewhere. Our initial motivating case was memcpy's with alignments > 16. The loads/stores, to which small memcpy's expand, are kept together in several places so that we get a sequence like this for a 64 bit copy: LD w0 LD w1 ST w0 ST w1 The load/store optimiser can generate a LDP/STP w0, w1 from this because the registers read/written are consecutive. In our case however, the sequence is optimised during ISel, resulting in: LD w0 ST w0 LD w0 ST w0 This instruction reordering allows reuse of registers. Since the registers are no longer consecutive (i.e. they are the same), it inhibits LDP/STP creation. The approach here is to perform renaming: LD w0 ST w0 LD w1 ST w1 to enable the folding of the stores into a STP. We do not yet generate the LDP due to a limitation in the renaming implementation, but plan to look at that in a follow-up so that we fully support this case. While this was initially motivated by certain memcpy's, this is a general approach and thus is beneficial for other cases too, as can be seen in some test changes. Differential Revision: https://reviews.llvm.org/D103597
Loading
Please register or sign in to comment