  Apr 22, 2006
    • Don't do all the lowering stuff for 2-wide build_vectors. Also, a minor optimization for shuffle of undef. · e728efdf
      Evan Cheng authored

      llvm-svn: 27946
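      For illustration only (not from the commit above), a minimal sketch of a 2-wide build_vector of the kind that change refers to, assuming an SSE2 target; the function name is hypothetical:

      #include <emmintrin.h>

      /* The two scalar arguments become a <2 x double> build_vector
         during instruction selection. */
      __m128d make_pair(double a, double b) {
        return _mm_set_pd(b, a);  /* element 0 = a, element 1 = b */
      }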
    • Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector. · 16ef94f4
      Evan Cheng authored

      llvm-svn: 27945
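      A hypothetical illustration (not part of the commit above) of a build_vector with only two distinct elements, the pattern that can now be lowered with a {p}shuf* sequence; the exact instruction sequence chosen is not spelled out in the commit:

      #include <emmintrin.h>

      /* Only two distinct values (a and b) among the four elements. */
      __m128i two_distinct(int a, int b) {
        return _mm_set_epi32(b, a, b, a);  /* (a, b, a, b) */
      }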
    • Revamp build_vector lowering to take advantage of movss and movd instructions. · 14215c36
      Evan Cheng authored
      movd always clears the top 96 bits, and movss does so when it is loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more elements are zero, e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      compare to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement; for example, the FP variant of the above:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary; handling this will require a post-legalizer optimization.
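
      As a sketch of why those two instructions are redundant (this code is not from the commit; the function name is hypothetical): an intrinsics version of the same value, built from two scalars whose upper lanes are already zero, needs only the unpack:

      #include <xmmintrin.h>

      /* _mm_set_ss zeroes elements 1-3, so unpacking the two scalars
         already yields (a, b, 0, 0); no xorps/movlhps is needed. */
      __m128 test2(float a, float b) {
        return _mm_unpacklo_ps(_mm_set_ss(a), _mm_set_ss(b));
      }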
      
      llvm-svn: 27939
  Apr 07, 2006
    • Code clean up. · ac847268
      Evan Cheng authored

      llvm-svn: 27501
    • movlp{s|d} and movhp{s|d} support. · c995b45f
      Evan Cheng authored
      - movlp{s|d} and movhp{s|d} support.
      - Normalize shuffle nodes so that the result vector's lower-half elements come from the
        first vector and the rest come from the second vector (except for the
        exceptions :-).
      - Other minor fixes.

      llvm-svn: 27474
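      For the movlp{s|d}/movhp{s|d} support in the commit above, a minimal sketch (not from the commit; names are hypothetical) using the intrinsics that map to movlps and movhps; they replace the low and high half of a vector from memory, respectively:

      #include <xmmintrin.h>

      /* movlps loads the low two floats, movhps the high two. */
      __m128 load_halves(__m128 v, const __m64 *lo, const __m64 *hi) {
        v = _mm_loadl_pi(v, lo);  /* low half from memory  */
        v = _mm_loadh_pi(v, hi);  /* high half from memory */
        return v;
      }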