  1. Apr 20, 2009
  2. Apr 17, 2009
  3. Apr 13, 2009
  4. Apr 10, 2009
  5. Apr 09, 2009
  6. Apr 08, 2009
    • Re-apply 68552. · 3b2df10c
      Rafael Espindola authored
      Tested by bootstrapping llvm-gcc and using that to build llvm.
      
      llvm-svn: 68645
      3b2df10c
    • Avoid a hard-coded constant. · d173f423
      Rafael Espindola authored
      llvm-svn: 68603
      d173f423
    • Implement support for modeling implicit zero-extension on x86-64 · ad3e549a
      Dan Gohman authored
      with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
      SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
      instructions), and teach the DAGCombiner to take advantage of this on
      targets which support it. This eliminates many redundant
      zero-extension operations on x86-64.
      
      This adds a new TargetLowering hook, isZExtFree. It's similar to
      isTruncateFree, except it only applies to actual definitions, and not
      no-op truncates which may not zero the high bits.
      
      Also, this adds a new optimization to SimplifyDemandedBits: transform
      operations like x+y into (zext (add (trunc x), (trunc y))) on targets
      where all the casts are no-ops. In contexts where the high part of the
      add is explicitly masked off, this allows the mask operation to be
      eliminated. Fix the DAGCombiner to avoid undoing these transformations
      to eliminate casts on targets where the casts are no-ops.
      
      Also, this adds a new two-address lowering heuristic. Since
      two-address lowering runs before coalescing, it helps to be able to
      look through copies when deciding whether commuting and/or
      three-address conversion are profitable.
      
      Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
      the case that a clobber range extended both before and beyond an
      existing live range. In that case, multiple live ranges need to be
      added. This was exposed by the new subreg coalescing code.
      
      Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
      spiller behavior it was looking for no longer occurs with the new
      instruction selection.
      
      llvm-svn: 68576
      ad3e549a
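
      A minimal C++ sketch of the isZExtFree idea, assuming a simplified, hypothetical
      signature (the real TargetLowering hook takes LLVM types/value types and differs
      in detail). On x86-64 a 32-bit operation already zeroes bits 63:32 of its
      destination register, so zero-extending an actual i32 definition to i64 is free:

          #include <cstdint>

          // Hypothetical stand-in for the TargetLowering::isZExtFree hook.
          // "Free" means the extension needs no extra instruction because the
          // narrower definition already leaves the high bits zero.
          struct SimpleX8664Lowering {
            bool isZExtFree(unsigned FromBits, unsigned ToBits) const {
              return FromBits == 32 && ToBits == 64;  // implicit zero-extension
            }
          };

          // The SimplifyDemandedBits transform, shown on plain integers: when only
          // the low 32 bits of a 64-bit add are demanded, the add can be narrowed
          // and the free zero-extension reinserted.
          uint64_t maskedAdd(uint64_t x, uint64_t y) {
            uint32_t lo = static_cast<uint32_t>(x) + static_cast<uint32_t>(y);
            return lo;  // same value as (x + y) & 0xffffffff
          }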
    • Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79
      Bill Wendling authored
      builds.
      
      --- Reverse-merging (from foreign repository) r68552 into '.':
      U    test/CodeGen/X86/tls8.ll
      U    test/CodeGen/X86/tls10.ll
      U    test/CodeGen/X86/tls2.ll
      U    test/CodeGen/X86/tls6.ll
      U    lib/Target/X86/X86Instr64bit.td
      U    lib/Target/X86/X86InstrSSE.td
      U    lib/Target/X86/X86InstrInfo.td
      U    lib/Target/X86/X86RegisterInfo.cpp
      U    lib/Target/X86/X86ISelLowering.cpp
      U    lib/Target/X86/X86CodeEmitter.cpp
      U    lib/Target/X86/X86FastISel.cpp
      U    lib/Target/X86/X86InstrInfo.h
      U    lib/Target/X86/X86ISelDAGToDAG.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
      U    lib/Target/X86/X86ISelLowering.h
      U    lib/Target/X86/X86InstrInfo.cpp
      U    lib/Target/X86/X86InstrBuilder.h
      U    lib/Target/X86/X86RegisterInfo.td
      
      llvm-svn: 68560
      4aa25b79
  7. Apr 07, 2009
  8. Apr 03, 2009
  9. Apr 02, 2009
  10. Mar 31, 2009
  11. Mar 30, 2009
  12. Mar 28, 2009
  13. Mar 27, 2009
  14. Mar 26, 2009
  15. Mar 13, 2009
    • These instructions have special lowering that may lower them to SSE · 798fd56d
      Bill Wendling authored
      instructions. Prevent that if we don't want implicit uses of SSE.
      
      llvm-svn: 66877
      798fd56d
    • Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd
      Evan Cheng authored
      Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
      
      1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
      2. The MachineConstantPool alignment field is also a log2 value.
      3. However, some places create ConstantPoolSDNodes with the actual alignment value rather than its log2. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
      4. Constant pool entry offsets are computed when the entries are created, but the asm printer groups the entries by section, so the offsets are no longer valid. The asm printer nevertheless uses them to determine the size of the padding between entries.
      5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
      6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries. This is non-deterministic.
      
      
      Solutions:
      1. The ConstantPoolSDNode alignment field is changed to keep the non-log2 value.
      2. The MachineConstantPool alignment field is also changed to keep the non-log2 value.
      3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
      4. MachineConstantPoolEntry no longer keeps an offset field; it is replaced with an alignment field. Offsets are no longer computed when constant pool entries are created; they are computed on the fly in the asm printer and the JIT.
      5. The asm printer uses a cheaper data structure to group constant pool entries.
      6. The asm printer computes entry offsets after grouping is done.
      7. The JIT code is changed to compute entry offsets on the fly.
      
      llvm-svn: 66875
      1fb8aedd
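
      A minimal sketch of the log2 mixup described in problem 3, with made-up names
      (not the actual LLVM fields): if a field is documented to hold log2(alignment)
      but a caller stores the raw byte alignment, every reader that re-exponentiates
      it sees a wildly inflated alignment.

          // Hypothetical field that, by convention, stores log2(align).
          struct PoolEntry { unsigned LogAlign; };

          unsigned byteAlignment(const PoolEntry &E) { return 1u << E.LogAlign; }

          PoolEntry Correct{4};  // caller passed log2(16): reader sees 16 bytes
          PoolEntry Buggy{16};   // caller passed 16 directly: reader sees 65536 bytes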
    • generalize the previous code to use the full generality of LEA · 99cc1337
      Chris Lattner authored
      for i32/i64 expressions (we could also do i16 on CPUs where
      i16 LEA is fast, but I didn't add this).  On the example, we now
      generate:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$42, (%eax)
      	setl	%al
      	movzbl	%al, %eax
      	leal	4(%eax,%eax,8), %eax
      	ret
      
      instead of:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	movl	$4, %ecx
      	movl	$13, %eax
      	cmovg	%ecx, %eax
      	ret
      
      llvm-svn: 66869
      99cc1337
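
      A rough source-level reconstruction of the example above (a guess from the
      generated code, not the original test file): leal 4(%eax,%eax,8) computes
      cond*9 + 4, i.e. 13 when the comparison holds and 4 otherwise.

          // Hypothetical reconstruction; the real test in the tree may differ.
          int test(int *p) {
            return (*p < 42) ? 13 : 4;  // constants 9 apart: cond*9 + 4 via LEA
          }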
    • optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d
      Chris Lattner authored
      example to:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	setg	%al
      	movzbl	%al, %eax
      	orl	$4294967294, %eax
      	ret
      
      instead of:
      
              movl    4(%esp), %eax
              cmpl    $41, (%eax)
      	movl	$4294967294, %ecx
      	movl	$4294967295, %eax
      	cmova	%ecx, %eax
      	ret
      
      which is smaller in code size and faster. rdar://6668608
      
      llvm-svn: 66868
      4be6df5d
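
      To illustrate the pattern with hypothetical functions (not the original test):
      when a select's two constant arms differ by one, the select can be lowered as a
      setcc plus an or/add with the smaller constant, as in the setg/orl sequence
      above (4294967294 is -2, so the result is -1 or -2).

          int pick(int x) {
            return (x > 41) ? -1 : -2;  // arms differ by 1: setg + or $-2
          }
          int pick2(int x) {
            return (x > 41) ? 42 : 41;  // same shape: setg + add $41
          }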
  16. Mar 12, 2009
    • Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))" · 4147f08e
      Chris Lattner authored
      related transformations out of target-specific dag combine into the
      ARM backend.  These were added by Evan in r37685 with no testcases
      and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
      
      Add some simple X86-specific (for now) DAG combines that turn things
      like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
      with the recently added cp constant select optimization, but is a
      very general xform.  For example, we now compile the second example
      in const-select.ll to:
      
      _test:
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              seta    %al
              movzbl  %al, %eax
              movl    4(%esp), %ecx
              movsbl  (%ecx,%eax,4), %eax
              ret
      
      instead of:
      
      _test:
              movl    4(%esp), %eax
              leal    4(%eax), %ecx
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              cmovbe  %eax, %ecx
              movsbl  (%ecx), %eax
              ret
      
      This passes multisource and dejagnu.
      
      llvm-svn: 66779
      4147f08e
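
      A hypothetical source-level shape for the new combine (not the actual
      const-select.ll test): when one arm of the select is 0 and the other is a power
      of two, the select becomes a zero-extended setcc shifted left, and the shifted
      value then folds into scaled-index addressing as in the movsbl (%ecx,%eax,4)
      above.

          int scaled(int x) {
            return (x > 3) ? 8 : 0;  // == ((int)(x > 3)) << 3
          }

          // The same idea feeding an address computation: the select of 0 or 4
          // becomes an index that the load can scale directly.
          signed char pick(const signed char *p, bool cond) {
            return p[cond ? 4 : 0];
          }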
    • On x86, if the only use of an i64 load is an i64 store, generate a pair of... · ef0b7cc2
      Evan Cheng authored
      On x86, if the only use of an i64 load is an i64 store, generate a pair of double load and store instructions instead.
      
      llvm-svn: 66776
      ef0b7cc2
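
      A short sketch of the kind of code this targets (hypothetical function, not a
      test from the tree): on 32-bit x86, an i64 copy whose load feeds only a store
      can use one 8-byte floating-point/SSE load-store pair instead of two 32-bit
      load/store pairs.

          #include <cstdint>

          void copy64(uint64_t *dst, const uint64_t *src) {
            *dst = *src;  // the only use of the i64 load is the i64 store
          }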
  17. Mar 11, 2009
  18. Mar 07, 2009
    • Arithmetic instructions don't set the EFLAGS OF and CF bits · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
      
      llvm-svn: 66318
      ff659b5b
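
      A worked example of the hazard, assuming x = INT_MAX: after addl $1 the result
      is 0x80000000 with OF=1 and SF=1, so the signed-less-than condition (SF != OF)
      is false; a testl on the same result clears OF, so SF != OF becomes true. ZF,
      however, is set identically by both instructions, which is why restricting the
      fold to COND_E and COND_NE is safe.

          #include <climits>

          // The wrapped value whose flags the add and the test disagree on.
          unsigned wrapped = static_cast<unsigned>(INT_MAX) + 1u;  // 0x80000000
          bool signBit = (wrapped >> 31) != 0;  // SF would be 1 after add or test
          bool isZero  = (wrapped == 0);        // ZF would be 0 after add or test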
  19. Mar 05, 2009
  20. Mar 04, 2009
  21. Feb 27, 2009
    • Refactor TLS code and add some tests. The tests and expected results are: · 000421ea
      Rafael Espindola authored
       pic |  declaration | linkage  | visibility | tests                   | TLS model
      
      !pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
       pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
      !pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
       pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
      
      !pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
       pic |  declaration | external | hidden     | X                       | local dynamic
      !pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
       pic | !declaration | external | hidden     | X                       | local dynamic
      
      !pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
       pic |  declaration | internal | default    | X                       | local dynamic
      
      The ones marked with an X have not been implemented since local dynamic is not implemented.
      
      llvm-svn: 65632
      000421ea
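
      As a C-level illustration of two rows above (the .ll tests express the same
      thing in LLVM IR), using the GNU __thread extension: an external declaration
      with default visibility is lowered, per the table, to local exec without -fpic
      and to general dynamic with -fpic.

          extern __thread int i;  // external declaration, default visibility

          int  get()      { return i; }
          int *get_addr() { return &i; }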
  22. Feb 25, 2009
  23. Feb 23, 2009
    • Only v1i16 (i.e. _m64) is returned via RAX / RDX. · 9f8fddee
      Evan Cheng authored
      llvm-svn: 65313
      9f8fddee
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it takes fewer uops.
      Document the shuffle matching logic and add some FIXMEs for further cleanups later.
      New tests that test the above.
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      ========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
      e684da3e
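
      To illustrate why pshufb wins, a small example using the SSSE3 intrinsic
      (hypothetical code, not the compiler-generated pattern): one pshufb with a
      constant control mask performs an arbitrary 16-byte shuffle, replacing the long
      pextrw/pinsrw chains and the stack traffic in the old _shuf1.

          #include <tmmintrin.h>  // SSSE3

          // Swap the two bytes of every 16-bit lane with a single pshufb; the mask
          // gives, for each destination byte, the source byte index to read.
          __m128i swap_bytes_in_words(__m128i v) {
            const __m128i mask = _mm_set_epi8(14, 15, 12, 13, 10, 11, 8, 9,
                                              6, 7, 4, 5, 2, 3, 0, 1);
            return _mm_shuffle_epi8(v, mask);
          }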