    Evan Cheng authored · 4cf30b72
    On recent Intel microarchitectures, folding loads into some unary SSE instructions
    can be suboptimal. To be precise, we should avoid folding a load if the instruction
    only updates part of the destination register and the non-updated part is not
    needed, e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
    the partial register dependency and can improve performance, e.g.
    
    movss (%rdi), %xmm0
    cvtss2sd %xmm0, %xmm0
    
    instead of
    cvtss2sd (%rdi), %xmm0
    
    An alternative way to break the dependency is to clear the register first, e.g.
    xorps %xmm0, %xmm0
    cvtss2sd (%rdi), %xmm0
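
For context, a minimal C sketch of source code that typically lowers to the instruction discussed above. The function name `widen` is hypothetical and not part of this commit. On x86-64 with SSE2, a scalar float-to-double conversion of a value loaded from memory is normally implemented with cvtss2sd, but whether a particular compiler version folds the load into it or emits a separate movss is an assumption to check by inspecting the generated assembly.

```c
/* Hypothetical example, not taken from the commit.
 * Converts a float loaded through a pointer to double.
 * On x86-64 this is typically a cvtss2sd; the load from *p is the
 * one that may be folded into the instruction or kept separate. */
double widen(const float *p) {
    return (double)*p;
}
```

Compiling this with optimization enabled and an assembly listing (for example `-O2 -S` with clang or gcc) shows which of the forms above was chosen.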
    
    llvm-svn: 91672