Skip to content
  1. Aug 29, 2010
  2. Aug 28, 2010
    • Chris Lattner's avatar
      I have manually decoded the imm field of an insertps one too many · 7a05e6dc
      Chris Lattner authored
      times.  This patch causes llc and llvm-mc (which both default to
      verbose-asm) to print out comments after a few common shuffle 
      instructions which indicates the shuffle mask, e.g.:
      
      	insertps	$113, %xmm3, %xmm0     ## xmm0 = zero,xmm0[1,2],xmm3[1]
      	unpcklps	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
      	pshufd	$1, %xmm1, %xmm1        ## xmm1 = xmm1[1,0,0,0]
      
      This is carefully factored to keep the information extraction (of the
      shuffle mask) separate from the printing logic.  I plan to move the
      extraction part out somewhere else at some point for other parts of
      the x86 backend that want to introspect on the behavior of shuffles.
      
      llvm-svn: 112387
      7a05e6dc
    • Chris Lattner's avatar
      fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c
      Chris Lattner authored
      insertp[sd] $0, which is a noop.  Before:
      
      _f32:                                   ## @f32
      	pshufd	$1, %xmm1, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm2, %xmm3
      	addss	%xmm1, %xmm0
                                              ## kill: XMM0<def> XMM0<kill> XMM0<def>
      	insertps	$0, %xmm0, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      after:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movdqa	%xmm2, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      The extra movs are due to a random (poor) scheduling decision.
      
      llvm-svn: 112379
      94656b1c
    • Chris Lattner's avatar
      fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
      Chris Lattner authored
      when the top elements of a vector are undefined.  This happens all
      the time for X86-64 ABI stuff because only the low 2 elements of
      a 4 element vector are defined.  For example, on:
      
      _Complex float f32(_Complex float A, _Complex float B) {
        return A+B;
      }
      
      We used to produce (with SSE2, SSE4.1+ uses insertps):
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm1
      	movdqa	%xmm2, %xmm0
      	unpcklps	%xmm1, %xmm0
      	ret
      
      We now produce:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movaps	%xmm2, %xmm0
      	unpcklps	%xmm3, %xmm0
      	ret
      
      This implements rdar://8368414
      
      llvm-svn: 112378
      bcb6090a
Loading