    Chris Lattner authored · 84e95d00
    When we have a two-address instruction where the input cannot be clobbered
    and is already available, instead of falling back to emitting a load, fall
    back to emitting a reg-reg copy.  This generates significantly better code
    for some SSE testcases, as SSE has lots of two-address instructions and
    none of them are read/modify/write.  As one example, this change produces:
    
            pshufd %XMM5, XMMWORD PTR [%ESP + 84], 255
            xorps %XMM2, %XMM5
            cmpltps %XMM1, %XMM0
    -       movaps XMMWORD PTR [%ESP + 52], %XMM0
    -       movapd %XMM6, XMMWORD PTR [%ESP + 52]
    +       movaps %XMM6, %XMM0
            cmpltps %XMM6, XMMWORD PTR [%ESP + 68]
            movapd XMMWORD PTR [%ESP + 52], %XMM6
            movaps %XMM6, %XMM0
            cmpltps %XMM6, XMMWORD PTR [%ESP + 36]
            cmpltps %XMM3, %XMM0
    -       movaps XMMWORD PTR [%ESP + 20], %XMM0
    -       movapd %XMM7, XMMWORD PTR [%ESP + 20]
    +       movaps %XMM7, %XMM0
            cmpltps %XMM7, XMMWORD PTR [%ESP + 4]
            movapd XMMWORD PTR [%ESP + 20], %XMM7
            cmpltps %XMM4, %XMM0
    
    ... which is far better than a store followed by a load!
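
The decision described above can be sketched as a small standalone C++ model. This is an illustrative sketch only, not the actual LLVM spiller code: the names `OperandInfo`, `OperandFix`, and `chooseFix` are hypothetical, and the three flags are assumed to be facts the register allocator already knows about the operand.

```cpp
#include <cassert>

// Hypothetical model of how a two-address input operand could be satisfied.
// None of these names come from LLVM; they exist only for illustration.
enum class OperandFix {
  ReuseInPlace,    // use the register that already holds the value
  RegRegCopy,      // copy the available value into the target register
  ReloadFromStack  // load the value back from its stack slot
};

struct OperandInfo {
  bool tiedToDef;         // two-address: the instruction overwrites this input
  bool availableInReg;    // the value is already live in some physical register
  bool canClobberSource;  // that register's value is not needed afterwards
};

// Pick the cheapest legal way to feed the operand.
OperandFix chooseFix(const OperandInfo &op) {
  // Nothing is available in a register, so a load is unavoidable.
  if (!op.availableInReg)
    return OperandFix::ReloadFromStack;

  // A plain read, or a tied operand whose source may be destroyed,
  // can use the available register directly.
  if (!op.tiedToDef || op.canClobberSource)
    return OperandFix::ReuseInPlace;

  // Tied operand whose source must survive: copy it register to register
  // instead of falling back to a load (the case this commit improves).
  return OperandFix::RegRegCopy;
}
```

In the diff above, `%XMM0` is still read by later `cmpltps` instructions, so it corresponds to the `tiedToDef && availableInReg && !canClobberSource` case: under this model `chooseFix` returns `OperandFix::RegRegCopy`, matching the `movaps %XMM6, %XMM0` that replaces the store/load pair.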
    
    llvm-svn: 28001