Commit 69a3acff authored Jul 15, 2021 by Max Kazantsev

[Test] We can benefit from pipelining of ymm load/stores

This patch demonstrates a scenario in which we need to load and store a single
64-byte value, which AVX does with 2 ymm loads and 2 ymm stores. The current
codegen chooses the following sequence:

  load ymm0
  load ymm1
  store ymm1
  store ymm0

If we instead stored ymm0 before ymm1, the store of ymm0 would depend only on
the 1st load, so the 2nd load and the 1st store could execute in parallel.
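
As a rough illustration only (not the test added by this patch), a plain
64-byte copy is the kind of source that a compiler targeting AVX may lower to
two ymm loads and two ymm stores. The names Block64 and copy_block64 are
hypothetical, and whether ymm registers are actually used depends on the
compiler and flags (for example -mavx):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 64-byte value type; the name is illustrative only. */
typedef struct {
    unsigned char bytes[64];
} Block64;

/*
 * Copies one 64-byte value. With AVX enabled, a compiler may lower this
 * to two 32-byte ymm loads and two ymm stores (an assumption about
 * codegen, not a guarantee).
 *
 * Order proposed in the commit message, so that the 2nd load and the
 * 1st store can overlap:
 *
 *   load  ymm0
 *   load  ymm1
 *   store ymm0
 *   store ymm1
 */
void copy_block64(Block64 *dst, const Block64 *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(Block64));
}
```

The copy result is the same for either store order; only the scheduling of the
loads and stores differs.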

parent dfa76933
