[AArch64] Add patterns for scalar FMUL, FMULX
Scalar FMUL, FMULX instructions perform better or the same compared to
indexed FMUL, FMULX.

For example, the Arm Cortex-A55 Software Optimization Guide lists the
following instructions with a throughput of 2 IPC:
 - "FP multiply" FMUL
 - "ASIMD FP multiply" FMULX

whereas it lists the following with a throughput of 1 IPC:
 - "ASIMD FP multiply, by element" FMUL, FMULX

The Arm Cortex-A510 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply"
instructions, which are listed with the same throughput as the non-ASIMD
ones.

Fixes #60817.

Differential Revision: https://reviews.llvm.org/D153207
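As an illustration only, the kind of source that can reach this choice of instruction might look like the sketch below. It assumes the affected case is a scalar multiplied by lane 0 of a vector, which the message above does not state explicitly. The type `f32x4` and the function `mul_by_lane0` are hypothetical names, and the code uses the GCC/Clang `vector_size` extension rather than `arm_neon.h` so that it builds on any target. The instruction forms in the comments show the two encodings being compared, not verified compiler output. FMULX is not shown because it is only reachable through ACLE intrinsics.

```c
/* Hypothetical example: multiply a scalar by lane 0 of a 4 x float vector.
 * Uses the GCC/Clang vector extension, so it is not tied to AArch64. */
typedef float f32x4 __attribute__((vector_size(16)));

/* On AArch64, lane 0 of a vector register is also readable as a scalar
 * S register, so this multiply can be encoded in either form:
 *
 *   fmul s0, s0, v1.s[0]    indexed ("by element") form
 *   fmul s0, s0, s1         scalar form
 *
 * Both compute the same value; only the encoding differs.
 */
float mul_by_lane0(float a, f32x4 v)
{
    return a * v[0];
}
```

Both forms give the same result for the lane-0 case, so preferring the scalar one only affects which encoding is emitted.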