  1. Mar 23, 2009
  2. Mar 12, 2009
  3. Mar 07, 2009
    • Arithmetic instructions don't set the EFLAGS bits OF and CF · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
      
      llvm-svn: 66318
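The reasoning in the message above can be checked with a small flag model. This is a minimal sketch, not code from the commit: the `Flags` struct and the helper names are made up for illustration. It models a 32-bit `sub` and a `test r, r` and shows that ZF (read by COND_E and COND_NE) agrees between the two, while a signed less-than condition (SF != OF) can disagree when the subtraction overflows.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the four EFLAGS bits relevant here (hypothetical type).
struct Flags {
  bool zf, sf, cf, of;
};

// Flags produced by a 32-bit `sub a, b` (result r = a - b).
Flags flagsAfterSub(uint32_t a, uint32_t b) {
  uint32_t r = a - b;
  Flags f;
  f.zf = (r == 0);
  f.sf = (r >> 31) != 0;
  f.cf = a < b;                              // unsigned borrow
  f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;   // signed overflow
  return f;
}

// Flags produced by `test r, r`: ZF and SF describe r, CF and OF are cleared.
Flags flagsAfterTest(uint32_t r) {
  Flags f;
  f.zf = (r == 0);
  f.sf = (r >> 31) != 0;
  f.cf = false;
  f.of = false;
  return f;
}

// COND_E reads only ZF; signed less-than reads SF != OF.
bool condE(const Flags &f) { return f.zf; }
bool condL(const Flags &f) { return f.sf != f.of; }
```

With a = 0x80000000 and b = 1 the subtraction overflows, so reusing its flags for a signed comparison would give a different answer than the explicit `test`, whereas the equality conditions are unaffected.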
  4. Mar 04, 2009
  5. Feb 23, 2009
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
      Document the shuffle matching logic and add some FIXMEs for further
        cleanups later.
      Add new tests covering the above.
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      ========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
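For readers unfamiliar with pshufb, its byte-selection rule can be modelled in scalar code. The sketch below is illustrative and is not the lowering code from the commit; `Bytes16`, `pshufbModel`, and `byteSwapControl` are hypothetical names. It models the 128-bit form of the instruction and builds a control vector that swaps the two bytes of every 16-bit lane, the same per-lane effect as the `rolw $8` instructions in the `_shuf1` listing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of 128-bit pshufb: each result byte is picked from src by the
// low four bits of the matching control byte, or zeroed if the control byte
// has its high bit set.
Bytes16 pshufbModel(const Bytes16 &src, const Bytes16 &control) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = (control[i] & 0x80) ? uint8_t{0} : src[control[i] & 0x0F];
  return out;
}

// Control vector that swaps the two bytes inside every 16-bit lane.
Bytes16 byteSwapControl() {
  Bytes16 c{};
  for (int i = 0; i < 16; ++i)
    c[i] = static_cast<uint8_t>(i ^ 1);
  return c;
}
```

A single control vector like this expresses an arbitrary byte permutation, which is why one pshufb can replace a long chain of pextrw/pinsrw pairs when SSSE3 is available.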
  6. Feb 07, 2009
  7. Feb 04, 2009
  8. Feb 03, 2009
  9. Jan 24, 2009
  10. Jan 17, 2009
    • Implement a special algorithm for converting uint_to_fp for i32 values on · 4d527590
      Bill Wendling authored
      X86. This code:
      
      void f() {
        uint32_t x;
        float y = (float)x;
      }
      
      used to be:
      
           movl     %eax, -8(%ebp)
           movl     [2^52 double], -4(%ebp)
           movsd    -8(%ebp), %xmm0
           subsd    [2^52 double], %xmm0
           cvtsd2ss %xmm0, %xmm0
      
      Is now:
      
         movsd        [2^52 double], %xmm0
         movsd        %xmm0, %xmm1
         movd         %ecx, %xmm2
         orps         %xmm2, %xmm1
         subsd        %xmm0, %xmm1
         cvtsd2ss     %xmm1, %xmm0
      
      This is faster on X86. Note that there's an extra copy of %xmm0 into %xmm1
      (the second movsd). That will be fixed in a later coalescer change.
      
      llvm-svn: 62404
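The "Is now" sequence relies on a bit trick that can be written out in portable code. The following is a minimal sketch of that idea, not the code the commit adds; the function and helper names are assumptions. OR-ing the 32-bit value into the low mantissa bits of the double 2^52 yields exactly 2^52 + x, and subtracting 2^52 leaves x as a double with no rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret 64 bits as an IEEE-754 double.
static double bitsToDouble(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

float u32ToFloatViaBias(uint32_t x) {
  const uint64_t kTwoPow52Bits = 0x4330000000000000ULL;  // encoding of 2^52
  // OR x into the low mantissa bits: the value is exactly 2^52 + x
  // (corresponds to the orps step).
  double biased = bitsToDouble(kTwoPow52Bits | x);
  // Remove the bias; the result is exactly x (corresponds to subsd).
  double exact = biased - bitsToDouble(kTwoPow52Bits);
  // A single rounding happens when narrowing to float (cvtsd2ss).
  return static_cast<float>(exact);
}
```

Because every 32-bit integer is exactly representable in a double, the only rounding occurs in the final narrowing, so the result should match a direct unsigned-to-float conversion under the default rounding mode.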
  11. Jan 15, 2009
  12. Jan 13, 2009
    • Use DebugInfo interface to lower dbg_* intrinsics. · 5c6e1e3b
      Devang Patel authored
      
      llvm-svn: 62127
  13. Jan 01, 2009
    • Fix PR3274: when promoting the condition of a BRCOND node, · 8feb694e
      Duncan Sands authored
      promote from i1 all the way up to the canonical SetCC type.
      In order to discover an appropriate type to use, pass
      MVT::Other to getSetCCResultType.  In order to be able to
      do this, change getSetCCResultType to take a type as an
      argument, not a value (this is also more logical).
      
      llvm-svn: 61542
  14. Dec 23, 2008
  15. Dec 18, 2008
  16. Dec 12, 2008
  17. Dec 09, 2008
  18. Dec 02, 2008
  19. Dec 01, 2008
    • Change the interface to the type legalization method · 6ed40141
      Duncan Sands authored
      ReplaceNodeResults: rather than returning a node which
      must have the same number of results as the original
      node (which means mucking around with MERGE_VALUES,
      and which is also easy to get wrong since SelectionDAG
      folding may mean you don't get the node you expect),
      return the results in a vector.
      
      llvm-svn: 60348
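The interface change described above can be pictured with toy types. This is a hedged sketch only: `ToyNode`, `ToyValue`, and both functions are hypothetical stand-ins, not LLVM's classes or signatures. It shows the vector-returning style, where the hook appends one replacement value per result and the caller can check the count directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for SelectionDAG types; not LLVM's real classes.
struct ToyValue {
  int nodeId;
  unsigned resultNo;
};
struct ToyNode {
  int id;
  unsigned numResults;
};

// Vector-returning style: append one replacement value per result of n.
void replaceNodeResultsToy(const ToyNode &n, std::vector<ToyValue> &results) {
  const int replacementId = n.id + 1000;  // made-up id for the new node
  for (unsigned r = 0; r < n.numResults; ++r)
    results.push_back(ToyValue{replacementId, r});
}

// The caller validates the result count itself instead of unpacking a
// merged multi-result node.
bool legalizeToy(const ToyNode &n, std::vector<ToyValue> &out) {
  std::vector<ToyValue> results;
  replaceNodeResultsToy(n, results);
  if (results.size() != n.numResults)
    return false;
  out = results;
  return true;
}
```

The point of the design is that a mismatch in the number of results becomes a simple size check in the caller rather than a property of whatever node folding happened to produce.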
  20. Nov 24, 2008
  21. Oct 30, 2008
  22. Oct 21, 2008
    • Add an SSE2 algorithm for uint64->f64 conversion. · 28929589
      Dale Johannesen authored
      This is the same algorithm Apple gcc uses, and it is faster. It also
      gets the extreme case in gcc.c-torture/execute/ieee/rbug.c correct,
      which we did not before; this is not sufficient to make the test
      pass, though, because there is another bug.
      
      llvm-svn: 57926
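The message does not spell out the algorithm, so the following is an assumption: a scalar sketch of one well-known bias-based approach to uint64 to double conversion, written with hypothetical names. Each 32-bit half is placed in the mantissa of a power-of-two constant (2^52 for the low half, 2^84 for the high half), the constants are subtracted exactly, and the two halves are added with a single rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret 64 bits as an IEEE-754 double.
static double doubleFromBits(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

double u64ToDoubleViaBias(uint64_t x) {
  const uint64_t kTwoPow52Bits = 0x4330000000000000ULL;  // encoding of 2^52
  const uint64_t kTwoPow84Bits = 0x4530000000000000ULL;  // encoding of 2^84
  // Low half: the bits encode exactly 2^52 + lo, so the difference is lo.
  double lo = doubleFromBits(kTwoPow52Bits | (x & 0xFFFFFFFFULL)) -
              doubleFromBits(kTwoPow52Bits);
  // High half: the bits encode exactly 2^84 + hi * 2^32.
  double hi = doubleFromBits(kTwoPow84Bits | (x >> 32)) -
              doubleFromBits(kTwoPow84Bits);
  // Both terms are exact, so the only rounding is in this addition.
  return hi + lo;
}
```

Since the final addition is the only inexact step, the result is correctly rounded under the default rounding mode, which is the property that matters for extreme inputs near 2^64.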
  23. Oct 18, 2008
    • Teach DAGCombine to fold constant offsets into GlobalAddress nodes, · 2fe6bee5
      Dan Gohman authored
      and add a TargetLowering hook for it to use to determine when this
      is legal (e.g. not in PIC mode).
      
      This allows instruction selection to emit folded constant offsets
      in more cases, such as the included testcase, eliminating the need
      for explicit arithmetic instructions.
      
      This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
      that attempted to achieve the same effect, but wasn't as effective.
      
      Also, fix handling of offsets in GlobalAddressSDNodes in several
      places, including changing GlobalAddressSDNode's offset from
      int to int64_t.
      
      The Mips, Alpha, Sparc, and CellSPU targets appear to be
      unaware of GlobalAddress offsets currently, so set the hook to
      false on those targets.
      
      llvm-svn: 57748
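The fold described above can be illustrated with a toy model. Everything here is hypothetical (the struct names, the hook name, and the rule that PIC mode alone decides legality); it is a sketch of the idea, not the DAGCombine code. An addition of a constant to a global-plus-offset node is absorbed into the node's 64-bit offset when the target allows it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a GlobalAddress node: a symbol plus a byte offset.
struct ToyGlobalAddress {
  std::string symbol;
  int64_t offset;  // 64-bit, mirroring the int -> int64_t change in the message
};

// Hypothetical target description; here folding is disallowed only in PIC mode.
struct ToyTarget {
  bool picMode;
  bool isOffsetFoldingLegalToy() const { return !picMode; }
};

// Try to fold (global + offset) + delta into one global-with-offset node.
// Returns true and updates ga when the target permits it; otherwise ga is
// left untouched and explicit arithmetic would still be needed.
bool foldAddIntoGlobal(const ToyTarget &target, ToyGlobalAddress &ga,
                       int64_t delta) {
  if (!target.isOffsetFoldingLegalToy())
    return false;
  ga.offset += delta;
  return true;
}
```

Keeping the offset as a 64-bit value lets the folded displacement exceed the 32-bit range without truncation in the node itself.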
  24. Oct 15, 2008
  25. Oct 04, 2008
  26. Oct 02, 2008
  27. Oct 01, 2008
  28. Sep 30, 2008
  29. Sep 25, 2008
  30. Sep 24, 2008
  31. Sep 23, 2008
  32. Sep 16, 2008
  33. Sep 13, 2008
    • Define CallSDNode, an SDNode subclass for use with ISD::CALL. · d3fe174c
      Dan Gohman authored
      Currently it just holds the calling convention and flags
      for isVarArgs and isTailCall.
      
      And it has several utility methods, which eliminate magic
      5+2*i and similar index computations in several places.
      
      CallSDNodes are not CSE'd. Teach UpdateNodeOperands to gracefully
      handle nodes that are not CSE'd.
      
      llvm-svn: 56183
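The benefit of the utility methods can be sketched with a toy class. This is illustrative only: `ToyCallNode` and its operand layout (five fixed operands followed by argument/flag pairs) are assumptions chosen to match the "5+2*i" arithmetic mentioned in the message, not the real CallSDNode definition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical call node. Operand layout assumed for illustration:
// [5 fixed operands, arg0, flags0, arg1, flags1, ...].
class ToyCallNode {
public:
  ToyCallNode(std::vector<int> operands, bool isVarArg, bool isTailCall)
      : operands_(std::move(operands)),
        isVarArg_(isVarArg),
        isTailCall_(isTailCall) {
    assert(operands_.size() >= kFirstArg);
    assert((operands_.size() - kFirstArg) % 2 == 0);
  }

  bool isVarArg() const { return isVarArg_; }
  bool isTailCall() const { return isTailCall_; }

  // Accessors hide the index arithmetic that callers used to repeat.
  std::size_t getNumArgs() const { return (operands_.size() - kFirstArg) / 2; }
  int getArg(std::size_t i) const {
    assert(i < getNumArgs());
    return operands_[kFirstArg + 2 * i];
  }
  int getArgFlags(std::size_t i) const {
    assert(i < getNumArgs());
    return operands_[kFirstArg + 2 * i + 1];
  }

private:
  static constexpr std::size_t kFirstArg = 5;
  std::vector<int> operands_;
  bool isVarArg_;
  bool isTailCall_;
};
```

With accessors like these, a change to the operand layout only needs to touch the class, not every site that previously computed the index by hand.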