  Apr 22, 2006
    • Don't do all the lowering stuff for 2-wide build_vector's. Also, minor optimization for shuffle of undef. · e728efdf
      Evan Cheng authored
      
      llvm-svn: 27946
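      A 2-wide build_vector is simply a vector built from two scalar values. As a hypothetical source-level illustration of the case this commit special-cases (the function name and choice of intrinsic are mine, not from the commit):
      
      	#include <emmintrin.h>
      	
      	/* Building a <2 x double> from two scalars is a 2-wide
      	   build_vector in SelectionDAG terms; per the commit, it no
      	   longer goes through the full generic lowering path. */
      	__m128d make_pair(double a, double b) {
      	  return _mm_set_pd(b, a);  /* element 0 = a, element 1 = b */
      	}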
    • Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector. · 16ef94f4
      Evan Cheng authored
      
      llvm-svn: 27945
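      For illustration, a build_vector whose four elements take only two distinct values could be written like this (a hypothetical example, not taken from the commit or its tests):
      
      	#include <emmintrin.h>
      	
      	/* Only two distinct values (a and b) appear among the four
      	   elements, so per the commit the backend can materialize a
      	   and b once and replicate them with a single {p}shuf*
      	   instead of inserting all four elements individually. */
      	__m128i two_distinct(int a, int b) {
      	  return _mm_set_epi32(b, a, b, a);  /* <a, b, a, b> low-to-high */
      	}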
    • Revamp build_vector lowering to take advantage of movss and movd instructions. · 14215c36
      Evan Cheng authored
      movd always clears the top 96 bits, and movss does so when it's loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more elements are zero, e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      compared to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement, for example in the FP variant of the above:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary (see the sketch after this entry); handling this will require a post-legalizer optimization.
      
      llvm-svn: 27939
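      Since movss from memory already zeroes the upper 96 bits, and unpcklps of two such values leaves the top two lanes zero, the ideal sequence would simply drop the last two instructions. A sketch of what the post-legalizer optimization should produce (my extrapolation from the commit's reasoning, not output the commit claims to generate):
      
      	_test:
      		movss 8(%esp), %xmm1
      		movss 4(%esp), %xmm0
      		unpcklps %xmm1, %xmm0
      		ret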