[AMDGPU] Vectorize misaligned global loads & stores (b89236a9) · Commits · Lorenzo Albano / LLVM bpEVL

Commit b89236a9 authored Mar 01, 2023 by Jeffrey Byrnes

[AMDGPU] Vectorize misaligned global loads & stores

Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, a single wider global load / store outperforms multiple narrower ones regardless of alignment. This is especially true when combining 8-bit loads / stores, where the speedup was usually 2x across all alignments.
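To illustrate the kind of combining described above, here is a minimal host-side C++ sketch. It is an analogy only, not code from this commit or from the AMDGPU backend. The helper names `loadBytewise`, `loadWide`, and `checkAllOffsets` are hypothetical. The sketch shows that four adjacent 8-bit loads and one 32-bit load from the same address yield the same bytes at every alignment, which is why the vectorizer can legally substitute one for the other.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: four separate 8-bit loads, packed in memory order.
std::uint32_t loadBytewise(const std::uint8_t *p) {
  std::uint8_t bytes[4] = {p[0], p[1], p[2], p[3]};
  std::uint32_t v;
  std::memcpy(&v, bytes, sizeof v);
  return v;
}

// Hypothetical helper: one 32-bit load from a possibly misaligned address.
// memcpy keeps the access well-defined in C++ whatever the alignment of p.
std::uint32_t loadWide(const std::uint8_t *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Compare both forms at offsets 0..3 from a 4-byte-aligned base, so that
// aligned and misaligned addresses are both covered.
void checkAllOffsets() {
  alignas(4) std::uint8_t buf[8] = {0x10, 0x11, 0x12, 0x13,
                                    0x14, 0x15, 0x16, 0x17};
  for (int off = 0; off < 4; ++off)
    assert(loadBytewise(buf + off) == loadWide(buf + off));
}
```

The sketch only demonstrates that the two forms are equivalent. The performance claim in the commit message comes from the author's measurements on the listed GPUs and is not reproduced here.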

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1

parent 7442f863
