Replace the uint64_t -> double conversion algorithm with one that's more efficient.
This small bit of ASM code is sufficient to do what the old algorithm did:

    movq       %rax,  %xmm0
    punpckldq  (c0),  %xmm0  // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
    subpd      (c1),  %xmm0  // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
#ifdef __SSE3__
    haddpd     %xmm0, %xmm0
#else
    pshufd     $0x4e, %xmm0, %xmm1
    addpd      %xmm1, %xmm0
#endif

It's arguably faster. One caveat: the 'haddpd' instruction isn't very fast on all processors.

<rdar://problem/7719814>

llvm-svn: 147593
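For reference, here is a minimal C sketch of the same trick (the function name and layout are illustrative, not the actual compiler-rt code). The low and high 32-bit halves of the input are planted in the mantissas of two doubles whose exponent fields encode 2^52 and 2^84 (the 0x43300000 and 0x45300000 words above, matching the punpckldq constant), the exponent biases are subtracted away exactly (the subpd step), and the two partial values are summed with a single rounding (the haddpd/addpd step):

    #include <stdint.h>

    double u64_to_double_sketch(uint64_t x) {
        union { uint64_t u; double d; } lo, hi;

        /* low 32 bits become the mantissa of a double whose value is 2^52 + lo32 */
        lo.u = ((uint64_t)0x43300000U << 32) | (uint32_t)x;
        /* high 32 bits become the mantissa of a double whose value is 2^84 + hi32 * 2^32 */
        hi.u = ((uint64_t)0x45300000U << 32) | (uint32_t)(x >> 32);

        /* both subtractions are exact; only the final addition rounds,
           so the result is the correctly rounded conversion of x */
        return (hi.d - (0x1.0p52 * 0x1.0p32)) + (lo.d - 0x1.0p52);
    }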