Skip to content
Commit 9aa39481 authored by David Green's avatar David Green
Browse files

[AArch64] Prefer to fold dup into fmul/fma as opposed to ld1r

There is a fold to create LD1DUPpost from dup(load) that can be postinc. If the
dup is used by a "by element" operation such as fmul or fma then it can be
slightly better to fold the dup into the fmul instead, which produces slightly
fast code.

  ld1r { v1.4s }, [x0], #4
  fmul v0.4s, v1.4s, v0.4s
vs
  ldr s1, [x0], #4
  fmul v0.4s, v0.4s, v1.s[0]

This could also be done with integer operations such as smull/umull too, so
long as the load/dup gets correctly combined into the mul operation. Currently
this just operates on foating point types.

Differential Revision: https://reviews.llvm.org/D145184
parent 912404db
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment