  Apr 22, 2006
    • Nate Begeman authored · 4ca2ea5b
      JumpTable support! What this represents is working asm and jit support for
      x86 and ppc for 100% dense switch statements when relocations are non-PIC.
      This support will be extended and enhanced in the coming days to support
      PIC, and less dense forms of jump tables.
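      
      A 100% dense switch is one whose case values cover a contiguous range with
      no holes, so it can be lowered to an indexed jump through a table rather
      than a chain of compares and branches. A minimal C sketch of such an input
      (hypothetical function and values, not taken from the commit):
      
      /* Every value in 0..3 has a case, so the switch is 100% dense and a
         jump table can be used once the default-case bounds check is done. */
      int classify(int x) {
        switch (x) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        default: return -1;
        }
      }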
      
      llvm-svn: 27947
    • Evan Cheng authored · e728efdf
      Don't do all the lowering stuff for 2-wide build_vectors. Also, a minor optimization for shuffle of undef.
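      
      A 2-wide build_vector builds a vector from two scalar elements. A minimal
      sketch of such a case (hypothetical function, assuming SSE2 intrinsics):
      
      #include <emmintrin.h>
      
      /* _mm_set_pd(b, a) produces the 2-wide vector <a, b>, i.e. a
         2-element build_vector node. */
      __m128d make_pair(double a, double b) {
        return _mm_set_pd(b, a);
      }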
      
      llvm-svn: 27946
    • Evan Cheng authored · 16ef94f4
      Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector.
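      
      A minimal sketch of a build_vector whose lanes use only two distinct
      values, the case this change lowers with a {p}shuf* shuffle
      (hypothetical function, assuming SSE2 intrinsics):
      
      #include <emmintrin.h>
      
      /* Only two distinct values, a and b, appear across the four lanes,
         so the vector can be formed with a shuffle instead of building
         each element individually. */
      __m128i two_distinct(int a, int b) {
        return _mm_set_epi32(b, a, b, a);
      }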
      
      llvm-svn: 27945
    • Evan Cheng authored · 14215c36
      Revamp build_vector lowering to take advantage of movss and movd instructions.
      movd always clears the top 96 bits, and movss does so when it is loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more of the elements are zero. e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      compare to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement, for example in the FP variant of the above example:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary. Handling this will require a post-legalizer optimization.
      
      llvm-svn: 27939