  2. Feb 19, 2008
      - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should... · 6200c225
      Evan Cheng authored
      - When the DAG combiner folds a bit convert into a BUILD_VECTOR, it should check whether the node is essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> into <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
      - X86 now normalizes SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)), getting rid of X86ISD::S2VEC.
      
      llvm-svn: 47290
  10. Jan 24, 2008
      Significantly simplify and improve handling of FP function results on x86-32. · a91f77ea
      Chris Lattner authored
      This case returns the value in ST(0) and then has to convert it to an SSE
      register.  This causes significant codegen ugliness in some cases.  For 
      example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
      
      _bar:
      	subl	$28, %esp
      	call	L_foo$stub
      	fstpl	16(%esp)
      	movsd	16(%esp), %xmm0
      	movsd	%xmm0, 8(%esp)
      	fldl	8(%esp)
      	addl	$28, %esp
      	ret
      
      because we move the result of foo() into an XMM register, then have to
      move it back for the return of bar.
      
      Instead of hacking ever more special cases into the call-result lowering
      code, we take a much simpler approach: on x86-32, fp return is modeled as
      always returning into an f80 register, which is then truncated to f32 or
      f64 as needed.  Similarly, for a returned fp result, we model it as an
      extension to f80 followed by the return.
      
      This exposes the truncates and extensions to the dag combiner, allowing
      target-independent code to hack on them, eliminating them in this case.
      This gives us this code for the example above:
      us this code for the example above:
      
      _bar:
      	subl	$12, %esp
      	call	L_foo$stub
      	addl	$12, %esp
      	ret
      
      The nasty aspect of this is that these conversions are not legal, but we want
      the second pass of dag combiner (post-legalize) to be able to hack on them.
      To handle this, we lie to legalize and say they are legal, then custom expand
      them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
      less gross than the code it is replacing :)
      
      This also allows us to generate better code in several other cases.  For 
      example on fp-stack-ret-conv.ll, we now generate:
      
      _test:
      	subl	$12, %esp
      	call	L_foo$stub
      	fstps	8(%esp)
      	movl	16(%esp), %eax
      	cvtss2sd	8(%esp), %xmm0
      	movsd	%xmm0, (%eax)
      	addl	$12, %esp
      	ret
      
      where before we produced (incidentally, the old bad code is identical to what
      gcc produces):
      
      _test:
      	subl	$12, %esp
      	call	L_foo$stub
      	fstpl	(%esp)
      	cvtsd2ss	(%esp), %xmm0
      	cvtss2sd	%xmm0, %xmm0
      	movl	16(%esp), %eax
      	movsd	%xmm0, (%eax)
      	addl	$12, %esp
      	ret
      
      Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
      deficiency that is unrelated to this patch.
      
      llvm-svn: 46307
  21. Nov 25, 2007
      Fix a long-standing deficiency in the X86 backend: we would · 5728bdd4
      Chris Lattner authored
      sometimes emit "zero" and "all one" vectors multiple times,
      for example:
      
      _test2:
      	pcmpeqd	%mm0, %mm0
      	movq	%mm0, _M1
      	pcmpeqd	%mm0, %mm0
      	movq	%mm0, _M2
      	ret
      
      instead of:
      
      _test2:
      	pcmpeqd	%mm0, %mm0
      	movq	%mm0, _M1
      	movq	%mm0, _M2
      	ret
      
      This patch fixes this by always arranging for zero/one vectors
      to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
      an arbitrary type.  This ensures they get trivially CSE'd on the dag.
      This fix is also important for LegalizeDAGTypes, as it gets unhappy
      when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
      'i64' isn't legal.
      
      This patch makes the following changes:
      
      1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
         their canonical types.
      2) The now-dead patterns are removed from the SSE/MMX .td files.
      3) All the patterns in the .td file that referred to immAllOnesV or
         immAllZerosV in the wrong form now use *_bc to match them with a
         bitcast wrapped around them.
      4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle
         bitcast'd zero vectors, which actually simplifies the code.
      5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
         is legal, instead of generating one that is illegal and expecting
         a later legalize pass to clean it up.
      6) isZeroShuffle is generalized to handle bitcast of zeros.
      7) several other minor tweaks.
      
      This patch is definite goodness, but has the potential to cause random
      code quality regressions.  Please be on the lookout for these and let 
      me know if they happen.
      
      llvm-svn: 44310
  26. Oct 06, 2007
      Added DAG xforms, e.g. · f4b5d491
      Evan Cheng authored
      (vextract (v4f32 s2v (f32 load $addr)), 0) -> (f32 load $addr) 
      (vextract (v4i32 bc (v4f32 s2v (f32 load $addr))), 0) -> (i32 load $addr)
      Remove x86-specific patterns.
      
      llvm-svn: 42677
  30. Sep 23, 2007
      Fix PR 1681. When the X86 target uses +sse -sse2, · e36c4002
      Dale Johannesen authored
      keep f32 in SSE registers and f64 in x87.  This
      is effectively a new codegen mode.
      Change addLegalFPImmediate to permit float and
      double variants to do different things.
      Adjust callers.
      
      llvm-svn: 42246