[Test] We can benefit from pipelining of ymm loads/stores
This patch demonstrates a scenario where we need to load and store a single 64-byte value. With AVX this is done with two 32-byte ymm loads followed by two ymm stores. The current codegen chooses the following sequence:

    load ymm0
    load ymm1
    store ymm1
    store ymm0

If we instead stored ymm0 before ymm1, the first store would depend only on the first load. The second load and the first store could then execute in parallel.
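To make the dependency argument concrete, here is a toy timing model in Python. It is a sketch under loudly stated assumptions and does not model any real CPU. Instructions issue in order, one per cycle. A load's result is ready a fixed number of cycles after it issues (assumed to be 4 here). A single store port starts stores in program order, one per cycle. The names `simulate` and `LOAD_LATENCY` are hypothetical.

```python
# Toy timing model (hypothetical, NOT a real microarchitecture):
# - one instruction issues per cycle, in program order;
# - a load's destination register is ready load_latency cycles after it issues;
# - a store starts once it has issued and its source register is ready;
# - a single store port: stores start in program order, at most one per cycle.

LOAD_LATENCY = 4  # assumed latency in cycles; real values vary by CPU


def simulate(program, load_latency=LOAD_LATENCY):
    """Return (start cycle of each store in program order, total cycles)."""
    ready = {}         # register -> cycle its loaded value becomes available
    store_starts = []  # start cycle of each store, in program order
    last_store = -1    # start cycle of the previous store (-1 means none yet)
    finish = 0         # cycle by which every instruction has completed
    for issue_cycle, (op, reg) in enumerate(program):
        if op == "load":
            ready[reg] = issue_cycle + load_latency
            finish = max(finish, ready[reg])
        elif op == "store":
            # Wait for issue, for the data, and for the previous store.
            start = max(issue_cycle, ready[reg], last_store + 1)
            store_starts.append(start)
            last_store = start
            finish = max(finish, start + 1)
        else:
            raise ValueError(f"unknown op: {op}")
    return store_starts, finish


# Order emitted by the current codegen.
current = [("load", "ymm0"), ("load", "ymm1"),
           ("store", "ymm1"), ("store", "ymm0")]
# Proposed order: store ymm0 first.
proposed = [("load", "ymm0"), ("load", "ymm1"),
            ("store", "ymm0"), ("store", "ymm1")]

for name, prog in (("current", current), ("proposed", proposed)):
    starts, total = simulate(prog)
    print(f"{name:8s} store starts={starts} total={total}")
# prints:
# current  store starts=[5, 6] total=7
# proposed store starts=[4, 5] total=6
```

Under these assumptions, the current order cannot start any store until ymm1 arrives at cycle 5. The proposed order starts storing ymm0 at cycle 4, while the second load is still in flight. For any assumed load latency of 2 or more this saves one cycle. At a latency of 1 the two orders tie.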