[AArch64] Prefer to fold dup into fmul/fma as opposed to ld1r (9aa39481) · Lorenzo Albano / LLVM bpEVL

Commit 9aa39481 authored Mar 07, 2023 by David Green

[AArch64] Prefer to fold dup into fmul/fma as opposed to ld1r

There is a fold that creates LD1DUPpost from a dup(load) when the load can be
post-incremented. If the dup is used by a "by element" operation such as fmul
or fma, it can be better to fold the dup into the fmul instead, which produces
slightly faster code:

  ld1r { v1.4s }, [x0], #4
  fmul v0.4s, v1.4s, v0.4s
vs
  ldr s1, [x0], #4
  fmul v0.4s, v0.4s, v1.s[0]
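Both sequences compute the same values and leave x0 in the same place; the difference is only in which instructions do the work. A minimal Python model of the two sequences makes this concrete. It assumes little-endian float32 memory, treats each v register as a list of four float32 lanes, and ignores all other machine state. The function names are hypothetical, and the model says nothing about instruction cost.

```python
import struct

LANES = 4       # ".4s": four 32-bit float lanes per v register
ELEM_BYTES = 4  # the "#4" post-increment: one float32


def f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def load_f32(mem: bytes, addr: int) -> float:
    """Load one little-endian float32 from a byte buffer."""
    return struct.unpack_from("<f", mem, addr)[0]


def seq_ld1r(mem: bytes, x0: int, v0: list) -> tuple:
    # ld1r { v1.4s }, [x0], #4 : load one float and replicate it to every lane
    v1 = [load_f32(mem, x0)] * LANES
    x0 += ELEM_BYTES
    # fmul v0.4s, v1.4s, v0.4s : lane-wise vector multiply
    v0 = [f32(a * b) for a, b in zip(v1, v0)]
    return v0, x0


def seq_ldr_by_element(mem: bytes, x0: int, v0: list) -> tuple:
    # ldr s1, [x0], #4 : scalar load into lane 0 of v1
    s1 = load_f32(mem, x0)
    x0 += ELEM_BYTES
    # fmul v0.4s, v0.4s, v1.s[0] : every lane of v0 times lane 0 of v1
    v0 = [f32(a * s1) for a in v0]
    return v0, x0


if __name__ == "__main__":
    mem = struct.pack("<2f", 2.0, 3.0)
    v0 = [1.0, 2.0, 3.0, 4.0]
    print(seq_ld1r(mem, 0, v0))            # ([2.0, 4.0, 6.0, 8.0], 4)
    print(seq_ldr_by_element(mem, 0, v0))  # ([2.0, 4.0, 6.0, 8.0], 4)
```

Calling either function again with the returned x0 walks to the next float in memory, which is what the post-increment buys in a loop.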

This could also be done for integer operations such as smull/umull, so long as
the load/dup is correctly combined into the mul operation. Currently this only
operates on floating-point types.
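One way to picture the selection rule, including the current floating-point-only restriction, is as a predicate over a toy dataflow graph. This is a hypothetical sketch: `Node`, `should_form_ld1r_post`, the string opcodes, and the "single by-element user" condition are assumptions for illustration and are not the code from D145184.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical opcode names. Integer by-element ops such as "smull"/"umull"
# are deliberately absent, mirroring the floating-point-only restriction.
BY_ELEMENT_FP_OPS = {"fmul", "fma"}


@dataclass
class Node:
    """Toy dataflow node: an opcode plus the nodes that consume its result."""
    opcode: str
    users: list[Node] = field(default_factory=list)


def should_form_ld1r_post(dup: Node, load_can_post_inc: bool) -> bool:
    """Return True if dup(load) should become a post-incrementing ld1r."""
    if dup.opcode != "dup" or not load_can_post_inc:
        return False
    # Assumed condition: a dup whose only user is an FP by-element operation
    # is left alone so it can fold into fmul/fma as a lane operand instead.
    if len(dup.users) == 1 and dup.users[0].opcode in BY_ELEMENT_FP_OPS:
        return False
    return True
```

With this predicate, a dup feeding only an fmul returns False (keep ldr plus a by-element fmul), while a dup feeding smull still returns True because integer operations are not handled yet. Extending the idea to smull/umull would amount to adding them to the set once the load/dup is known to combine correctly into the multiply.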

Differential Revision: https://reviews.llvm.org/D145184

parent 912404db