[mlir] [VectorOps] Improve scatter/gather CPU performance
Replaced the linearized address with the proper LLVM way of defining
a vector of base + indices in SIMD style. This yields much better code.
Some prototype results from microbenchmarking sparse matrix x vector
with 50% sparsity (about 2-3x faster):

          LINEARIZED      IMPROVED
GFLOPS    sdot  saxpy     sdot  saxpy
16x16     1.6   1.4       4.4   2.1
32x32     1.7   1.6       5.8   5.9
64x64     1.7   1.7       6.4   6.4
128x128   1.7   1.7       5.9   5.9
256x256   1.6   1.6       6.1   6.0
512x512   1.4   1.4       4.9   4.7

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D84368
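For readers unfamiliar with the operation, the following is a minimal scalar sketch of what a masked gather computes and how a sparse dot product (the sdot kernel benchmarked above) might use it. It is not the code from this change. The names kLanes, gather_ref and sparse_dot_ref, as well as the lane count of 8, are hypothetical and chosen only for illustration. In the lowered form, each lane's address is formed as base + index for all lanes at once (a vector of pointers) rather than through a linearized address; the inner loop of gather_ref mirrors that per-lane addressing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width, for illustration only.
constexpr std::size_t kLanes = 8;

// Scalar reference for a masked gather: every enabled lane i loads
// base[indices[i]]; disabled lanes keep the pass-through value.
std::array<float, kLanes> gather_ref(
    const float* base,
    const std::array<std::int32_t, kLanes>& indices,
    const std::array<bool, kLanes>& mask,
    const std::array<float, kLanes>& pass_thru) {
  std::array<float, kLanes> result = pass_thru;
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask[lane]) {
      // Per-lane address: base + indices[lane].
      result[lane] = base[indices[lane]];
    }
  }
  return result;
}

// Sparse dot product of one compressed row (values/columns, nnz entries)
// with a dense vector x, processed in chunks of kLanes. The mask disables
// the lanes past the end of the row in the final chunk.
float sparse_dot_ref(const float* values, const std::int32_t* columns,
                     std::size_t nnz, const float* x) {
  float sum = 0.0f;
  for (std::size_t start = 0; start < nnz; start += kLanes) {
    std::array<std::int32_t, kLanes> idx{};
    std::array<bool, kLanes> mask{};
    std::array<float, kLanes> zeros{};
    for (std::size_t lane = 0; lane < kLanes && start + lane < nnz; ++lane) {
      idx[lane] = columns[start + lane];
      mask[lane] = true;
    }
    const std::array<float, kLanes> gathered = gather_ref(x, idx, mask, zeros);
    for (std::size_t lane = 0; lane < kLanes && start + lane < nnz; ++lane) {
      sum += values[start + lane] * gathered[lane];
    }
  }
  return sum;
}
```

A scatter is the mirror image: each enabled lane stores its value to base[indices[lane]] instead of loading from it.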