    fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
    Chris Lattner authored
    when the top elements of a vector are undefined.  This happens all
    the time for X86-64 ABI stuff because only the low 2 elements of
    a 4 element vector are defined.  For example, on:
    
    _Complex float f32(_Complex float A, _Complex float B) {
      return A+B;
    }
    
    We used to produce (with SSE2; SSE4.1+ uses insertps instead):
    
    _f32:                                   ## @f32
    	movdqa	%xmm0, %xmm2
    	addss	%xmm1, %xmm2
    	pshufd	$16, %xmm2, %xmm2
    	pshufd	$1, %xmm1, %xmm1
    	pshufd	$1, %xmm0, %xmm0
    	addss	%xmm1, %xmm0
    	pshufd	$16, %xmm0, %xmm1
    	movdqa	%xmm2, %xmm0
    	unpcklps	%xmm1, %xmm0
    	ret
    
    We now produce:
    
    _f32:                                   ## @f32
    	movdqa	%xmm0, %xmm2
    	addss	%xmm1, %xmm2
    	pshufd	$1, %xmm1, %xmm1
    	pshufd	$1, %xmm0, %xmm3
    	addss	%xmm1, %xmm3
    	movaps	%xmm2, %xmm0
    	unpcklps	%xmm3, %xmm0
    	ret
    
    This implements rdar://8368414
    
    llvm-svn: 112378
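
To see why the dropped `pshufd $16` shuffles were pointless, here is a small C model of the data movement in the two listings above. It is only an illustrative sketch, not LLVM code: `vec4`, `old_sequence`, `new_sequence`, and `check_example` are hypothetical names, and the `pshufd` and `unpcklps` functions model only the lane selection of the real instructions (AT&T operand order, destination last). The model suggests that both sequences agree on the low two lanes, which are the only ones the ABI defines for a `_Complex float` result.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an XMM register holding four floats; lane 0 is the low element. */
typedef struct { float lane[4]; } vec4;

/* Model of pshufd: result lane i is source lane ((imm >> 2*i) & 3). */
vec4 pshufd(vec4 src, uint8_t imm) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = src.lane[(imm >> (2 * i)) & 3];
    return r;
}

/* Model of unpcklps src, dst: interleave the low two lanes of dst and src. */
vec4 unpcklps(vec4 dst, vec4 src) {
    vec4 r = {{ dst.lane[0], src.lane[0], dst.lane[1], src.lane[1] }};
    return r;
}

/* Old sequence: shuffle both scalar results with $16, then unpack. */
vec4 old_sequence(vec4 re, vec4 im) {
    return unpcklps(pshufd(re, 16), pshufd(im, 16));
}

/* New sequence: unpack the scalar results directly. */
vec4 new_sequence(vec4 re, vec4 im) {
    return unpcklps(re, im);
}

/* Self-check: lanes 1..3 of the inputs play the role of undefined junk. */
void check_example(void) {
    vec4 re = {{ 4.0f, 9.0f, 9.0f, 9.0f }};  /* lane 0: real part of A+B */
    vec4 im = {{ 6.0f, 7.0f, 7.0f, 7.0f }};  /* lane 0: imaginary part of A+B */
    vec4 a = old_sequence(re, im);
    vec4 b = new_sequence(re, im);
    /* The defined low two lanes are identical in both sequences. */
    assert(a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1]);
}
```

In this model the old sequence yields (re, im, re, im) while the new one yields (re, im, junk, junk); the two differ only in the upper lanes, which no caller may rely on.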