  1. Sep 02, 2010
  2. Sep 01, 2010
  3. Aug 31, 2010
  4. Aug 28, 2010
    • fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c
      Chris Lattner authored
      insertp[sd] $0, which is a noop.  Before:
      
      _f32:                                   ## @f32
      	pshufd	$1, %xmm1, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm2, %xmm3
      	addss	%xmm1, %xmm0
                                              ## kill: XMM0<def> XMM0<kill> XMM0<def>
      	insertps	$0, %xmm0, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      after:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movdqa	%xmm2, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      The extra movs are due to a random (poor) scheduling decision.
      
      llvm-svn: 112379
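
Why `insertps $0, %xmm0, %xmm0` does nothing can be sketched with a small C model of the register-to-register form of the instruction. The type `vec4` and the functions `insertps_model` and `vec4_equal` are hypothetical names introduced only for this illustration; they are not part of LLVM.

```c
#include <stdint.h>

/* Four packed single-precision lanes; lane 0 is the low element. */
typedef struct { float e[4]; } vec4;

/* Model of register-to-register insertps (illustrative, not LLVM code):
   imm[7:6] selects the source lane, imm[5:4] selects the destination
   lane, and imm[3:0] is a mask of result lanes to force to zero. */
static vec4 insertps_model(vec4 dst, vec4 src, uint8_t imm) {
    unsigned src_lane = (imm >> 6) & 3u;
    unsigned dst_lane = (imm >> 4) & 3u;
    unsigned zero_mask = imm & 0xFu;

    dst.e[dst_lane] = src.e[src_lane];
    for (unsigned i = 0; i < 4; ++i) {
        if (zero_mask & (1u << i)) {
            dst.e[i] = 0.0f;
        }
    }
    return dst;
}

/* Returns 1 when every lane of a and b compares equal, else 0. */
static int vec4_equal(vec4 a, vec4 b) {
    for (unsigned i = 0; i < 4; ++i) {
        if (a.e[i] != b.e[i]) {
            return 0;
        }
    }
    return 1;
}
```

With an immediate of 0 and the same register as source and destination, lane 0 is copied onto itself and nothing is zeroed, so the value is unchanged. An immediate of 16 copies source lane 0 into destination lane 1, which is the insert the "after" sequence keeps.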
    • fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
      Chris Lattner authored
      when the top elements of a vector are undefined.  This happens all
      the time for X86-64 ABI stuff because only the low 2 elements of
      a 4 element vector are defined.  For example, on:
      
      _Complex float f32(_Complex float A, _Complex float B) {
        return A+B;
      }
      
      We used to produce (with SSE2, SSE4.1+ uses insertps):
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm1
      	movdqa	%xmm2, %xmm0
      	unpcklps	%xmm1, %xmm0
      	ret
      
      We now produce:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movaps	%xmm2, %xmm0
      	unpcklps	%xmm3, %xmm0
      	ret
      
      This implements rdar://8368414
      
      llvm-svn: 112378
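
The "We now produce" sequence can be traced lane by lane with a rough C model. This is only a sketch: `lanes4`, `pshufd_model`, `addss_model`, `unpcklps_model`, and `complex_add_model` are hypothetical helpers made up for illustration, and the model treats the shuffled bits as floats for simplicity.

```c
#include <stdint.h>

/* Four 32-bit lanes of an XMM register; lane 0 is the low element. */
typedef struct { float e[4]; } lanes4;

/* pshufd: each 2-bit field of imm picks the source lane for one result lane. */
static lanes4 pshufd_model(lanes4 src, uint8_t imm) {
    lanes4 r;
    for (unsigned i = 0; i < 4; ++i) {
        r.e[i] = src.e[(imm >> (2 * i)) & 3u];
    }
    return r;
}

/* addss: add the low lanes; the upper three lanes of dst are kept. */
static lanes4 addss_model(lanes4 dst, lanes4 src) {
    dst.e[0] += src.e[0];
    return dst;
}

/* unpcklps: interleave the low two lanes of dst and src. */
static lanes4 unpcklps_model(lanes4 dst, lanes4 src) {
    lanes4 r = {{dst.e[0], src.e[0], dst.e[1], src.e[1]}};
    return r;
}

/* Mirrors the new instruction sequence; a models xmm0, b models xmm1,
   each holding {real, imag, undefined, undefined}. */
static lanes4 complex_add_model(lanes4 a, lanes4 b) {
    lanes4 re   = addss_model(a, b);       /* movdqa + addss -> xmm2 */
    lanes4 b_im = pshufd_model(b, 1);      /* pshufd $1, %xmm1, %xmm1 */
    lanes4 a_im = pshufd_model(a, 1);      /* pshufd $1, %xmm0, %xmm3 */
    lanes4 im   = addss_model(a_im, b_im); /* addss %xmm1, %xmm3 */
    return unpcklps_model(re, im);         /* movaps + unpcklps */
}
```

Only lanes 0 and 1 of the result matter for the `_Complex float` return value, which is why the two `pshufd $16` shuffles in the old sequence, which only rearranged lanes that `unpcklps` never needed, could be dropped.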
    • improve comments in the unpcklps generating logic, introduce · 96db6e66
      Chris Lattner authored
      a new EltStride variable instead of reusing NumElems variable
      for a non-obvious purpose.  No functionality change.
      
      llvm-svn: 112377
    • Clean up the logic of vector shuffles -> vector shifts. · a982aa24
      Bruno Cardoso Lopes authored
      Also teach this logic how to handle target-specific shuffles if
      needed; this is necessary when searching recursively for zeroed
      scalar elements in vector shuffle operands.
      
      llvm-svn: 112348
  5. Aug 27, 2010
  6. Aug 26, 2010
  7. Aug 25, 2010
  8. Aug 24, 2010
  9. Aug 23, 2010
  10. Aug 21, 2010
  11. Aug 17, 2010
    • More fixes for win64: · 231ab847
      Anton Korobeynikov authored
        - Do not clobber al during variadic calls; this is an AMD64 ABI-only feature
        - Emit wincall64, where necessary
      Patch by Cameron Esfahani!
      
      llvm-svn: 111289
  12. Aug 14, 2010
  13. Aug 13, 2010
  14. Aug 12, 2010
  15. Aug 11, 2010
    • Use ISD::ADD instead of ISD::SUB with a negated constant. This · 5531aa4d
      Dan Gohman authored
      avoids trouble if the return type of TD->getPointerSize() is
      changed to something which doesn't promote to a signed type,
      and is simpler anyway.
      
      Also, use getCopyFromReg instead of getRegister to read a
      physical register's value.
      
      llvm-svn: 110835
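
The promotion hazard the message alludes to can be illustrated in plain C. This is a sketch under assumptions: the actual return type of `getPointerSize()` is not stated here, `int` is assumed to be 32 bits wide, and the function names below are hypothetical.

```c
#include <stdint.h>

/* A uint8_t operand promotes to (signed) int, so -size is a small
   negative number and base - (-size) behaves like base + size. */
static int64_t sub_negated_u8(int64_t base, uint8_t size) {
    return base - (int64_t)(-size);
}

/* A uint32_t operand does not promote to a signed type (assuming a
   32-bit int), so -size wraps to a large positive value and the
   subtraction moves in the wrong direction. */
static int64_t sub_negated_u32(int64_t base, uint32_t size) {
    return base - (int64_t)(-size);
}

/* Adding the positive size gives the intended result for either type. */
static int64_t add_size(int64_t base, uint32_t size) {
    return base + (int64_t)size;
}
```

Writing the operation as an addition of the positive size sidesteps the question of how the negated value is promoted, which matches the "simpler anyway" remark.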
    • Add AVX matching patterns to Packed Bit Test intrinsics. · 91d61df3
      Bruno Cardoso Lopes authored
      Apply the same approach as the SSE4.1 ptest intrinsics, but
      create a new x86 node "testp", since AVX introduces
      vtest{ps}{pd} instructions which set ZF and CF depending
      on the sign-bit AND and ANDN of packed floating-point sources.
      
      This is slightly different from what "ptest" does.
      Tests are coming with the other 256-bit intrinsics tests.
      
      llvm-svn: 110744
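
The difference between the two flag computations can be sketched in C on four 32-bit lanes. The names `test_flags`, `ptest_model`, and `testps_model` are hypothetical and exist only for this illustration; the model covers the 128-bit single-precision case and assumes the usual operand roles (ZF from `src AND dst`, CF from `src AND NOT dst`).

```c
#include <stdint.h>

/* The two flags both instructions produce. */
typedef struct { int zf; int cf; } test_flags;

/* ptest-style: ZF and CF look at every bit of the operands. */
static test_flags ptest_model(const uint32_t dst[4], const uint32_t src[4]) {
    uint32_t and_bits = 0, andn_bits = 0;
    for (unsigned i = 0; i < 4; ++i) {
        and_bits  |= src[i] & dst[i];
        andn_bits |= src[i] & ~dst[i];
    }
    test_flags f = { and_bits == 0, andn_bits == 0 };
    return f;
}

/* vtestps-style: only the sign bit of each 32-bit lane is considered. */
static test_flags testps_model(const uint32_t dst[4], const uint32_t src[4]) {
    const uint32_t sign = UINT32_C(0x80000000);
    uint32_t and_bits = 0, andn_bits = 0;
    for (unsigned i = 0; i < 4; ++i) {
        and_bits  |= (src[i] & dst[i]) & sign;
        andn_bits |= (src[i] & ~dst[i]) & sign;
    }
    test_flags f = { and_bits == 0, andn_bits == 0 };
    return f;
}
```

Operands that share only non-sign bits clear ZF under the ptest-style rule but leave it set under the sign-bit rule, which is one way to see why a separate node is useful.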
  16. Aug 10, 2010
  17. Aug 06, 2010