Skip to content
Commit 2e4ecfde authored by Justin Lebar's avatar Justin Lebar
Browse files

[CUDA] Implement __ldg using intrinsics.

Summary:
Previously it was implemented as inline asm in the CUDA headers.

This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions.  This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.

Reviewers: tra, rsmith

Subscribers: jhen, cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D19990

llvm-svn: 270150
parent b926bdac
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment