  1. Jun 02, 2009
    • Revert 72707 and 72709, for the moment. · 5234d379
      Dale Johannesen authored
      llvm-svn: 72712
      5234d379
    • Make the implicit inputs and outputs of target-independent · 0b8ca792
      Dale Johannesen authored
      ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to)
      instead of MVT::Flag.  Remove CARRY_FALSE in favor of 0; adjust
      all target-independent code to use this format.
      
      Most targets will still produce a Flag-setting target-dependent
      version when selection is done.  X86 is converted to use i32
      instead, which means TableGen needs to produce different code
      in xxxGenDAGISel.inc.  This keys off the new supportsHasI1 bit
      in xxxInstrInfo, currently set only for X86; in principle this
      is temporary and should go away when all other targets have
      been converted.  All relevant X86 instruction patterns are
      modified to represent setting and using EFLAGS explicitly.  The
      same can be done on other targets.
      
      The immediate behavior change is that an ADC/ADD pair is no
      longer tightly coupled in the X86 scheduler; the two can be
      separated by instructions that don't clobber the flags (MOV).
      I will soon add some peephole optimizations based on using
      other instructions that set the flags to feed into ADC.
      
      llvm-svn: 72707
      0b8ca792
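
The idea of carrying the carry bit as an ordinary value, rather than as a glued Flag, can be sketched in plain C++. This is not LLVM code: `addc`, `adde`, and `add64` are hypothetical stand-ins for the ADDC/ADDE nodes, and the `bool` plays the role the commit assigns to MVT::i1.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for ADDC: returns the 32-bit sum together with the
// carry-out as an ordinary boolean value.
std::pair<uint32_t, bool> addc(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return {sum, sum < a};  // unsigned wrap-around means a carry occurred
}

// Hypothetical stand-in for ADDE: adds two words plus a carry-in of 0 or 1.
std::pair<uint32_t, bool> adde(uint32_t a, uint32_t b, bool carry_in) {
  uint64_t wide = uint64_t(a) + b + (carry_in ? 1 : 0);
  return {uint32_t(wide), (wide >> 32) != 0};
}

// 64-bit add built from two 32-bit halves. The carry is a plain data
// dependency between the two operations, not an implicit side channel.
uint64_t add64(uint64_t x, uint64_t y) {
  std::pair<uint32_t, bool> lo = addc(uint32_t(x), uint32_t(y));
  std::pair<uint32_t, bool> hi =
      adde(uint32_t(x >> 32), uint32_t(y >> 32), lo.second);
  return (uint64_t(hi.first) << 32) | lo.first;
}
```

Because the carry is just a value here, nothing forces the two halves to be adjacent, which mirrors the scheduling freedom the commit message describes.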
  2. May 28, 2009
    • Added optimization that narrow load / op / store and the 'op' is a bit... · a9cda8ab
      Evan Cheng authored
      Added an optimization that narrows a load / op / store sequence when the 'op' is a bit-twiddling instruction whose second operand is an immediate. If the bits touched by 'op' can be handled by a narrower instruction, the width of the load and store is reduced as well. This happens a lot in bitfield manipulation code.
      e.g.
      orl     $65536, 8(%rax)
      =>
      orb     $1, 10(%rax)
      
      Since narrowing is not always a win (e.g. i32 -> i16 is a loss on x86), the DAG combiner consults the target before performing the optimization.
      
      llvm-svn: 72507
      a9cda8ab
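
The arithmetic behind the `orl $65536, 8(%rax)` => `orb $1, 10(%rax)` example can be sketched as follows. This is a minimal illustration for an OR with a 32-bit immediate on a little-endian target, not the DAG combiner's actual code; `NarrowedOp` and `narrowOrImm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct NarrowedOp {
  unsigned byte_offset;  // extra displacement to add to the memory address
  uint8_t imm;           // immediate for the byte-sized operation
};

// If every set bit of a 32-bit OR immediate lies within a single byte,
// report which byte (little-endian numbering) and the 8-bit immediate to
// apply there. Returns false when the immediate is zero or spans bytes.
bool narrowOrImm(uint32_t imm, NarrowedOp& out) {
  if (imm == 0) return false;
  for (unsigned byte = 0; byte < 4; ++byte) {
    uint32_t mask = 0xFFu << (8 * byte);
    if ((imm & ~mask) == 0) {
      out.byte_offset = byte;
      out.imm = uint8_t(imm >> (8 * byte));
      return true;
    }
  }
  return false;
}
```

For 65536 (bit 16) the sketch reports byte offset 2 and immediate 1, so a displacement of 8 becomes 10. An AND would need the complementary check, since its untouched bytes are all ones rather than zero.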
  3. May 23, 2009
    • Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit · dfe4f253
      Eli Friedman authored
      systems instead of attempting to promote them to a 64-bit SINT_TO_FP or 
      FP_TO_SINT.  This is in preparation for removing the type legalization 
      code from LegalizeDAG: once type legalization is gone from LegalizeDAG, 
      it won't be able to handle the i64 operand/result correctly.
      
      This isn't quite ideal, but I don't think any other operation for any 
      target ends up in this situation, so treating this case specially seems 
      reasonable.
      
      llvm-svn: 72324
      dfe4f253
  4. Apr 29, 2009
  5. Apr 27, 2009
    • 2nd attempt, fixing SSE4.1 issues and implementing feedback from Duncan. · 8d6d4b92
      Nate Begeman authored
      PR2957
      
      ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
      mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
      as the shuffle mask.  A value of -1 represents UNDEF.
      
      In addition to eliminating the creation of illegal BUILD_VECTORS just to 
      represent shuffle masks, we are better about canonicalizing the shuffle mask,
      resulting in substantially better code for some classes of shuffles.
      
      llvm-svn: 70225
      8d6d4b92
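
The mask representation described above (an array of integers, with -1 meaning UNDEF) can be modelled in a few lines. This is a sketch over four i32 lanes, assuming the usual convention that indices 0..3 select from the first input and 4..7 from the second; `shuffle4` is a hypothetical name and this is not the SelectionDAG API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Applies a two-input shuffle of 4 x i32 lanes. Mask entries 0..3 select
// from `a`, 4..7 select from `b`, and -1 marks an undefined lane.
// This sketch writes 0 into undefined lanes; any value would be allowed.
std::array<int32_t, 4> shuffle4(const std::array<int32_t, 4>& a,
                                const std::array<int32_t, 4>& b,
                                const std::array<int, 4>& mask) {
  std::array<int32_t, 4> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    int m = mask[i];
    assert(m >= -1 && m < 8);
    if (m < 0)
      out[i] = 0;  // undef lane: the choice of value is arbitrary
    else
      out[i] = (m < 4) ? a[m] : b[m - 4];
  }
  return out;
}
```

Keeping the mask as plain integers means no BUILD_VECTOR node has to be created just to describe the permutation.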
  6. Apr 24, 2009
    • Revert 69952. Causes testsuite failures on linux x86-64. · b93db668
      Rafael Espindola authored
      llvm-svn: 69967
      b93db668
    • PR2957 · bb881d66
      Nate Begeman authored
      ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
      mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
      as the shuffle mask.  A value of -1 represents UNDEF.
      
      In addition to eliminating the creation of illegal BUILD_VECTORS just to 
      represent shuffle masks, we are better about canonicalizing the shuffle mask,
      resulting in substantially better code for some classes of shuffles.
      
      A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.
      
      llvm-svn: 69952
      bb881d66
  7. Apr 08, 2009
    • Re-apply 68552. · 3b2df10c
      Rafael Espindola authored
      Tested by bootstrapping llvm-gcc and using that to build llvm.
      
      llvm-svn: 68645
      3b2df10c
    • Implement support for modeling implicit zero-extension on x86-64 · ad3e549a
      Dan Gohman authored
      with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
      SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
      instructions), and teach the DAGCombiner to take advantage of this on
      targets which support it. This eliminates many redundant
      zero-extension operations on x86-64.
      
      This adds a new TargetLowering hook, isZExtFree. It's similar to
      isTruncateFree, except it only applies to actual definitions, and not
      no-op truncates which may not zero the high bits.
      
      Also, this adds a new optimization to SimplifyDemandedBits: transform
      operations like x+y into (zext (add (trunc x), (trunc y))) on targets
      where all the casts are no-ops. In contexts where the high part of the
      add is explicitly masked off, this allows the mask operation to be
      eliminated. Fix the DAGCombiner to avoid undoing these transformations
      to eliminate casts on targets where the casts are no-ops.
      
      Also, this adds a new two-address lowering heuristic. Since
      two-address lowering runs before coalescing, it helps to be able to
      look through copies when deciding whether commuting and/or
      three-address conversion are profitable.
      
      Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
      the case that a clobber range extended both before and beyond an
      existing live range. In that case, multiple live ranges need to be
      added. This was exposed by the new subreg coalescing code.
      
      Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
      spiller behavior it was looking for no longer occurs with the new
      instruction selection.
      
      llvm-svn: 68576
      ad3e549a
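
The SimplifyDemandedBits transformation mentioned above, x+y into (zext (add (trunc x), (trunc y))), relies on the two forms agreeing whenever the high part of the add is masked off. A minimal C++ check of that equivalence, with hypothetical function names and 64-bit/32-bit widths chosen for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Wide form: 64-bit add followed by an explicit mask of the low 32 bits.
uint64_t maskedWideAdd(uint64_t x, uint64_t y) {
  return (x + y) & 0xFFFFFFFFull;
}

// Narrow form: zext(add(trunc x, trunc y)). The zero-extension already
// clears the high bits, so no separate mask operation is needed.
uint64_t narrowAddZext(uint64_t x, uint64_t y) {
  uint32_t sum = uint32_t(x) + uint32_t(y);  // wraps modulo 2^32
  return uint64_t(sum);
}
```

The two functions return the same value for every input, because the low 32 bits of a sum depend only on the low 32 bits of the operands.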
    • Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79
      Bill Wendling authored
      builds.
      
      --- Reverse-merging (from foreign repository) r68552 into '.':
      U    test/CodeGen/X86/tls8.ll
      U    test/CodeGen/X86/tls10.ll
      U    test/CodeGen/X86/tls2.ll
      U    test/CodeGen/X86/tls6.ll
      U    lib/Target/X86/X86Instr64bit.td
      U    lib/Target/X86/X86InstrSSE.td
      U    lib/Target/X86/X86InstrInfo.td
      U    lib/Target/X86/X86RegisterInfo.cpp
      U    lib/Target/X86/X86ISelLowering.cpp
      U    lib/Target/X86/X86CodeEmitter.cpp
      U    lib/Target/X86/X86FastISel.cpp
      U    lib/Target/X86/X86InstrInfo.h
      U    lib/Target/X86/X86ISelDAGToDAG.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
      U    lib/Target/X86/X86ISelLowering.h
      U    lib/Target/X86/X86InstrInfo.cpp
      U    lib/Target/X86/X86InstrBuilder.h
      U    lib/Target/X86/X86RegisterInfo.td
      
      llvm-svn: 68560
      4aa25b79
  8. Apr 07, 2009
  9. Mar 30, 2009
  10. Mar 26, 2009
  11. Mar 23, 2009
  12. Mar 12, 2009
  13. Mar 07, 2009
    • Arithmetic instructions don't set the EFLAGS bits OF and CF · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
      
      llvm-svn: 66318
      ff659b5b
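
Why only COND_E and COND_NE are safe can be seen in a small flag model. The sketch below uses 8-bit operands for brevity and models only ZF, SF, CF, and OF; `Flags`, `flagsAfterAdd8`, and `flagsAfterTest8` are hypothetical names, not backend code.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool zf, sf, cf, of;
};

// Flags after `add a, b` on 8-bit operands.
Flags flagsAfterAdd8(uint8_t a, uint8_t b) {
  uint8_t r = uint8_t(a + b);
  Flags f;
  f.zf = (r == 0);
  f.sf = (r & 0x80) != 0;
  f.cf = (unsigned(a) + unsigned(b)) > 0xFF;   // unsigned carry-out
  f.of = ((a ^ r) & (b ^ r) & 0x80) != 0;      // signed overflow
  return f;
}

// Flags after `test r, r`: ZF and SF describe r, CF and OF are cleared.
Flags flagsAfterTest8(uint8_t r) {
  return Flags{r == 0, (r & 0x80) != 0, false, false};
}
```

For 0x7F + 1 the add sets OF while a test of the result 0x80 clears it, so a condition that reads OF or CF could change if the test were removed. ZF agrees in both cases, which is all COND_E and COND_NE consult.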
  14. Mar 04, 2009
  15. Feb 23, 2009
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
      Document the shuffle matching logic and add some FIXMEs for later further
        cleanups.
      New tests that test the above.
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      ========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
      e684da3e
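
To show why a v8i16 shuffle can be expressed as a single pshufb, here is a sketch that expands a word-level mask into a byte-level control vector, together with a reference model of pshufb on one 128-bit register. It covers the single-input case only and is not the lowering code from this commit; `wordMaskToPshufb` and `pshufbModel` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expands a single-input v8i16 shuffle mask (entries 0..7, or -1 for undef)
// into the 16-byte control vector pshufb expects. A control byte with bit 7
// set makes pshufb write zero, which is an acceptable value for an undef lane.
std::array<uint8_t, 16> wordMaskToPshufb(const std::array<int, 8>& mask) {
  std::array<uint8_t, 16> ctl{};
  for (std::size_t i = 0; i < 8; ++i) {
    int m = mask[i];
    assert(m >= -1 && m < 8);
    if (m < 0) {
      ctl[2 * i] = 0x80;
      ctl[2 * i + 1] = 0x80;
    } else {
      ctl[2 * i] = uint8_t(2 * m);          // low byte of source word m
      ctl[2 * i + 1] = uint8_t(2 * m + 1);  // high byte of source word m
    }
  }
  return ctl;
}

// Reference model of pshufb: each result byte is zero if the control byte
// has bit 7 set, otherwise the source byte selected by its low four bits.
std::array<uint8_t, 16> pshufbModel(const std::array<uint8_t, 16>& src,
                                    const std::array<uint8_t, 16>& ctl) {
  std::array<uint8_t, 16> out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (ctl[i] & 0x80) ? uint8_t(0) : src[ctl[i] & 0x0F];
  return out;
}
```

Whether the pshufb form is actually preferable depends on the uop count of the alternatives, which is the trade-off the commit message refers to.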
  16. Feb 07, 2009
  17. Feb 04, 2009
  18. Feb 03, 2009
  19. Jan 24, 2009
  20. Jan 17, 2009
    • Implement a special algorithm for converting uint_to_fp for i32 values on · 4d527590
      Bill Wendling authored
      X86. This code:
      
      void f() {
        uint32_t x;
        float y = (float)x;
      }
      
      used to be:
      
           movl     %eax, -8(%ebp)
           movl     [2^52 double], -4(%ebp)
           movsd    -8(%ebp), %xmm0
           subsd    [2^52 double], %xmm0
           cvtsd2ss %xmm0, %xmm0
      
      Is now:
      
         movsd        [2^52 double], %xmm0
         movsd        %xmm0, %xmm1
         movd         %ecx, %xmm2
         orps         %xmm2, %xmm1
         subsd        %xmm0, %xmm1
         cvtsd2ss     %xmm1, %xmm0
      
      This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That
      will be fixed in a later coalescer fix.
      
      llvm-svn: 62404
      4d527590
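
The 2^52 trick in the new instruction sequence can be reproduced in portable C++. This sketch assumes IEEE-754 binary64 doubles; `uint32ToFloat` is a hypothetical name, and the `memcpy` calls stand in for the movd/orps bit manipulation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts an unsigned 32-bit integer to float without a signed conversion.
float uint32ToFloat(uint32_t x) {
  const double two52 = 4503599627370496.0;  // 2^52, bit pattern 0x4330000000000000
  uint64_t bits;
  std::memcpy(&bits, &two52, sizeof bits);
  // The mantissa of 2^52 is all zeros and one ulp at this magnitude is 1,
  // so OR-ing x into the low bits yields exactly 2^52 + x.
  bits |= x;
  double biased;
  std::memcpy(&biased, &bits, sizeof biased);
  // The subtraction is exact in double; the only rounding happens in the
  // final narrowing to float (the cvtsd2ss step).
  return float(biased - two52);
}
```

Since the intermediate double holds x exactly, the result is rounded only once, in the same way as a direct conversion.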
  21. Jan 15, 2009
  22. Jan 13, 2009
    • Use DebugInfo interface to lower dbg_* intrinsics. · 5c6e1e3b
      Devang Patel authored
      
      llvm-svn: 62127
      5c6e1e3b
  23. Jan 01, 2009
    • Fix PR3274: when promoting the condition of a BRCOND node, · 8feb694e
      Duncan Sands authored
      promote from i1 all the way up to the canonical SetCC type.
      In order to discover an appropriate type to use, pass
      MVT::Other to getSetCCResultType.  In order to be able to
      do this, change getSetCCResultType to take a type as an
      argument, not a value (this is also more logical).
      
      llvm-svn: 61542
      8feb694e
  24. Dec 23, 2008
  25. Dec 18, 2008
  26. Dec 12, 2008
  27. Dec 09, 2008
  28. Dec 02, 2008
  29. Dec 01, 2008
    • Change the interface to the type legalization method · 6ed40141
      Duncan Sands authored
      ReplaceNodeResults: rather than returning a node which
      must have the same number of results as the original
      node (which means mucking around with MERGE_VALUES,
      and which is also easy to get wrong since SelectionDAG
      folding may mean you don't get the node you expect),
      return the results in a vector.
      
      llvm-svn: 60348
      6ed40141
  30. Nov 24, 2008
  31. Oct 30, 2008
  32. Oct 21, 2008
    • Add an SSE2 algorithm for uint64->f64 conversion. · 28929589
      Dale Johannesen authored
      This is the same algorithm Apple gcc uses, and it is faster.  It
      also gets the extreme case in gcc.c-torture/execute/ieee/rbug.c
      correct, which we didn't before; this is not sufficient to make
      the test pass, though, because there is another bug.
      
      llvm-svn: 57926
      28929589