[NVPTX] run LSR before straight-line optimizations
Summary:
Straight-line optimizations can simplify the loop body and make LSR's cost analysis more precise.

This significantly improves several Eigen3 CUDA benchmarks. With this change, EigenContractionKernel runs up to 40% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502), and EigenConvolutionKernel2D runs up to 10% faster (https://bitbucket.org/eigen/eigen/src/753ceee5f206ff7dde9f6a41a5a420749fc9406f/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605).

I have had difficulty writing small tests that benefit from this reordering, due to what appears to be an issue with LSR (being discussed at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html). See the review thread for the compilation-time impact of GVN.

Reviewers: eliben, jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11304

llvm-svn: 242982