Replace the uint64_t -> double conversion algorithm with one that's more efficient.
This small bit of ASM code is sufficient to do what the old algorithm did:

    movq       %rax,  %xmm0
    punpckldq  (c0),  %xmm0  // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
    subpd      (c1),  %xmm0  // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
#ifdef __SSE3__
    haddpd     %xmm0, %xmm0
#else
    pshufd     $0x4e, %xmm0, %xmm1
    addpd      %xmm1, %xmm0
#endif

It's arguably faster. One caveat: the 'haddpd' instruction isn't very fast on all processors.

<rdar://problem/7719814>

llvm-svn: 147593
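For reference, here is a minimal C sketch of the same trick (the function name and layout are illustrative, not the actual compiler-rt code). The low and high 32-bit halves of the input are planted in the mantissas of two doubles whose exponent fields encode 2^52 and 2^84 (the 0x43300000 and 0x45300000 words above, matching the punpckldq constant), the exponent biases are subtracted away exactly (the subpd step), and the two partial values are summed with a single rounding (the haddpd/addpd step):

    #include <stdint.h>

    double u64_to_double_sketch(uint64_t x) {
        union { uint64_t u; double d; } lo, hi;

        /* low 32 bits become the mantissa of a double whose value is 2^52 + lo32 */
        lo.u = ((uint64_t)0x43300000U << 32) | (uint32_t)x;
        /* high 32 bits become the mantissa of a double whose value is 2^84 + hi32 * 2^32 */
        hi.u = ((uint64_t)0x45300000U << 32) | (uint32_t)(x >> 32);

        /* both subtractions are exact; only the final addition rounds,
           so the result is the correctly rounded conversion of x */
        return (hi.d - (0x1.0p52 * 0x1.0p32)) + (lo.d - 0x1.0p52);
    }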