  1. May 28, 2009
    • Added optimization that narrows load / op / store when the 'op' is a bit... · a9cda8ab
      Evan Cheng authored
      Added an optimization that narrows load / op / store sequences when the 'op' is a bit twiddling instruction and its second operand is an immediate. If the bits touched by 'op' can be handled by a narrower instruction, the width of the load and store is reduced as well. This happens a lot with bitfield manipulation code.
      e.g.
      orl     $65536, 8(%rax)
      =>
      orb     $1, 10(%rax)
      
      Since narrowing is not always a win (e.g. i32 -> i16 is a loss on x86), the DAG combiner consults the target before performing the optimization.
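
The narrowing in the example above can be sketched as follows. This is a minimal illustration, not the DAG combiner's code: it assumes a little-endian target and an OR with a 32-bit immediate, and the names `NarrowedOp`, `narrowOrImmediate`, `orWide`, and `orNarrow` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a byte-wide replacement for a 32-bit
// "load / or-immediate / store" on a little-endian target.
struct NarrowedOp {
    unsigned byteOffset;  // added to the original address
    std::uint8_t imm;     // 8-bit immediate replacing the 32-bit one
};

// Returns true and fills `out` when every set bit of imm32 lies inside a
// single byte. Returns false otherwise (including imm32 == 0).
bool narrowOrImmediate(std::uint32_t imm32, NarrowedOp& out) {
    if (imm32 == 0) return false;
    for (unsigned byte = 0; byte < 4; ++byte) {
        std::uint32_t mask = 0xFFu << (8 * byte);
        if ((imm32 & ~mask) == 0) {
            out.byteOffset = byte;
            out.imm = static_cast<std::uint8_t>(imm32 >> (8 * byte));
            return true;
        }
    }
    return false;
}

// Models the wide form: a 32-bit OR applied to little-endian memory.
void orWide(std::uint8_t* mem, unsigned offset, std::uint32_t imm32) {
    assert(mem != nullptr);
    for (unsigned i = 0; i < 4; ++i)
        mem[offset + i] |= static_cast<std::uint8_t>(imm32 >> (8 * i));
}

// Models the narrow form: a single byte OR at the adjusted address.
void orNarrow(std::uint8_t* mem, unsigned offset, const NarrowedOp& op) {
    assert(mem != nullptr);
    mem[offset + op.byteOffset] |= op.imm;
}
```

For the immediate 65536 (bit 16 set) at offset 8, the sketch yields byte offset 2 and immediate 1, which corresponds to `orb $1, 10(%rax)`. Whether the narrower form is actually profitable remains a target decision, as noted above.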
      
      llvm-svn: 72507
  2. May 27, 2009
  3. May 26, 2009
  4. May 24, 2009
  5. May 23, 2009
    • Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit · dfe4f253
      Eli Friedman authored
      systems instead of attempting to promote them to a 64-bit SINT_TO_FP or 
      FP_TO_SINT.  This is in preparation for removing the type legalization 
      code from LegalizeDAG: once type legalization is gone from LegalizeDAG, 
      it won't be able to handle the i64 operand/result correctly.
      
      This isn't quite ideal, but I don't think any other operation for any 
      target ends up in this situation, so treating this case specially seems 
      reasonable.
      
      llvm-svn: 72324
  6. May 13, 2009
  7. May 08, 2009
  8. Apr 30, 2009
  9. Apr 29, 2009
  10. Apr 27, 2009
    • 2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. · 8d6d4b92
      Nate Begeman authored
      PR2957
      
      ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
      mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
      as the shuffle mask.  A value of -1 represents UNDEF.
      
      In addition to eliminating the creation of illegal BUILD_VECTORS just to 
      represent shuffle masks, we are better about canonicalizing the shuffle mask,
      resulting in substantially better code for some classes of shuffles.
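
A minimal sketch of how an integer shuffle mask with -1 for UNDEF can be interpreted. It assumes the usual two-input convention in which indices 0..N-1 select from the first vector and N..2N-1 from the second. `applyShuffle`, `kUndefLane`, and the `undefFill` parameter are hypothetical names for illustration, not part of the SelectionDAG API.

```cpp
#include <cassert>
#include <vector>

// Sentinel for an undefined lane, following the -1 convention described above.
const int kUndefLane = -1;

// Applies a shuffle mask to two equally sized input vectors.
// Mask values 0..N-1 pick from lhs, N..2N-1 pick from rhs, and -1 marks an
// undefined lane (filled with undefFill here so the result is observable).
std::vector<int> applyShuffle(const std::vector<int>& lhs,
                              const std::vector<int>& rhs,
                              const std::vector<int>& mask,
                              int undefFill) {
    assert(lhs.size() == rhs.size());
    const int n = static_cast<int>(lhs.size());
    std::vector<int> out;
    out.reserve(mask.size());
    for (int m : mask) {
        assert(m >= kUndefLane && m < 2 * n);
        if (m == kUndefLane) {
            out.push_back(undefFill);
        } else if (m < n) {
            out.push_back(lhs[m]);
        } else {
            out.push_back(rhs[m - n]);
        }
    }
    return out;
}
```

With inputs {1, 2, 3, 4} and {5, 6, 7, 8}, the mask {0, 4, -1, 7} selects lanes 1, 5, an undefined lane, and 8. Keeping the mask as plain integers means no constant nodes have to be built just to describe it.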
      
      llvm-svn: 70225
  11. Apr 24, 2009
  12. Apr 21, 2009
  13. Apr 20, 2009
  14. Apr 17, 2009
  15. Apr 13, 2009
  16. Apr 10, 2009
  17. Apr 09, 2009
  18. Apr 08, 2009
    • Re-apply 68552. · 3b2df10c
      Rafael Espindola authored
      Tested by bootstrapping llvm-gcc and using that to build llvm.
      
      llvm-svn: 68645
    • Avoid a hard coded constant. · d173f423
      Rafael Espindola authored
      llvm-svn: 68603
    • Implement support for modeling implicit zero-extension on x86-64 · ad3e549a
      Dan Gohman authored
      with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce
      SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG
      instructions), and teach the DAGCombiner to take advantage of this on
      targets which support it. This eliminates many redundant
      zero-extension operations on x86-64.
      
      This adds a new TargetLowering hook, isZExtFree. It's similar to
      isTruncateFree, except it only applies to actual definitions, and not
      no-op truncates which may not zero the high bits.
      
      Also, this adds a new optimization to SimplifyDemandedBits: transform
      operations like x+y into (zext (add (trunc x), (trunc y))) on targets
      where all the casts are no-ops. In contexts where the high part of the
      add is explicitly masked off, this allows the mask operation to be
      eliminated. Fix the DAGCombiner to avoid undoing these transformations
      to eliminate casts on targets where the casts are no-ops.
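
A minimal sketch of why the add narrowing described above preserves the result when the high part is masked off. It assumes a 64-bit add whose result is masked to its low 32 bits. The function names are hypothetical and this is plain integer arithmetic, not DAG combiner code.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: 64-bit add, then explicitly mask off the high 32 bits.
std::uint64_t addThenMask(std::uint64_t x, std::uint64_t y) {
    return (x + y) & 0xFFFFFFFFull;
}

// Narrow form: zext(add(trunc x, trunc y)). The 32-bit add wraps modulo
// 2^32, and the zero-extension already clears the high bits, so no
// separate mask operation is needed.
std::uint64_t narrowAddThenZext(std::uint64_t x, std::uint64_t y) {
    std::uint32_t lo = static_cast<std::uint32_t>(x) + static_cast<std::uint32_t>(y);
    return static_cast<std::uint64_t>(lo);
}

// Checks that both forms agree for one input pair.
void checkEquivalent(std::uint64_t x, std::uint64_t y) {
    assert(addThenMask(x, y) == narrowAddThenZext(x, y));
}
```

The two forms agree because the low 32 bits of a sum depend only on the low 32 bits of the operands. On a target where the truncate and zero-extend are no-ops, the narrow form therefore costs nothing extra and drops the mask.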
      
      Also, this adds a new two-address lowering heuristic. Since
      two-address lowering runs before coalescing, it helps to be able to
      look through copies when deciding whether commuting and/or
      three-address conversion are profitable.
      
      Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle
      the case that a clobber range extended both before and beyond an
      existing live range. In that case, multiple live ranges need to be
      added. This was exposed by the new subreg coalescing code.
      
      Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the
      spiller behavior it was looking for no longer occurs with the new
      instruction selection.
      
      llvm-svn: 68576
    • Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79
      Bill Wendling authored
      builds.
      
      --- Reverse-merging (from foreign repository) r68552 into '.':
      U    test/CodeGen/X86/tls8.ll
      U    test/CodeGen/X86/tls10.ll
      U    test/CodeGen/X86/tls2.ll
      U    test/CodeGen/X86/tls6.ll
      U    lib/Target/X86/X86Instr64bit.td
      U    lib/Target/X86/X86InstrSSE.td
      U    lib/Target/X86/X86InstrInfo.td
      U    lib/Target/X86/X86RegisterInfo.cpp
      U    lib/Target/X86/X86ISelLowering.cpp
      U    lib/Target/X86/X86CodeEmitter.cpp
      U    lib/Target/X86/X86FastISel.cpp
      U    lib/Target/X86/X86InstrInfo.h
      U    lib/Target/X86/X86ISelDAGToDAG.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
      U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
      U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
      U    lib/Target/X86/X86ISelLowering.h
      U    lib/Target/X86/X86InstrInfo.cpp
      U    lib/Target/X86/X86InstrBuilder.h
      U    lib/Target/X86/X86RegisterInfo.td
      
      llvm-svn: 68560
  19. Apr 07, 2009
  20. Apr 03, 2009
  21. Apr 02, 2009
  22. Mar 31, 2009
  23. Mar 30, 2009
  24. Mar 28, 2009
  25. Mar 27, 2009
  26. Mar 26, 2009
  27. Mar 13, 2009
    • These instructions have special lowering that may lower them to SSE · 798fd56d
      Bill Wendling authored
      instructions. Prevent that if we don't want implicit uses of SSE.
      
      llvm-svn: 66877
    • Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd
      Evan Cheng authored
      Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
      
      1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
      2. The MachineConstantPool alignment field is also a log2 value.
      3. However, some places create ConstantPoolSDNode with an alignment value rather than a log2 value. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
      4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups entries by section, so the offsets are no longer valid, yet the asm printer still uses them to determine the size of padding between entries.
      5. The asm printer uses an expensive data structure, a multimap, to track constant pool entries by section.
      6. The asm printer iterates over a SmallPtrSet when emitting constant pool entries. This is non-deterministic.
      
      
      Solutions:
      1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
      2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
      3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
      4. MachineConstantPoolEntry no longer keeps an offset field. It is replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in the asm printer and JIT.
      5. The asm printer uses a cheaper data structure to group constant pool entries.
      6. The asm printer computes entry offsets after grouping is done.
      7. Change the JIT code to compute entry offsets on the fly.
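
Computing offsets on the fly from per-entry alignments can be sketched as below. This is a minimal illustration under stated assumptions: `PoolEntry`, `alignTo`, `layoutOffsets`, and `alignmentFromLog2` are hypothetical names, alignments are byte counts that are powers of two, and the entries are assumed to be already grouped in emission order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a constant pool entry. The alignment is stored
// in bytes (the non-log2 convention), not as a log2 value.
struct PoolEntry {
    std::uint64_t size;       // size of the constant in bytes
    std::uint64_t alignment;  // required alignment in bytes, power of two
};

// Converts the old log2 convention to a byte alignment, e.g. 4 -> 16.
std::uint64_t alignmentFromLog2(unsigned log2Align) {
    assert(log2Align < 64);
    return std::uint64_t(1) << log2Align;
}

// Rounds offset up to the next multiple of alignment.
std::uint64_t alignTo(std::uint64_t offset, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Computes each entry's offset in emission order, inserting only the
// padding that the entry's own alignment requires.
std::vector<std::uint64_t> layoutOffsets(const std::vector<PoolEntry>& entries) {
    std::vector<std::uint64_t> offsets;
    offsets.reserve(entries.size());
    std::uint64_t cursor = 0;
    for (const PoolEntry& e : entries) {
        cursor = alignTo(cursor, e.alignment);
        offsets.push_back(cursor);
        cursor += e.size;
    }
    return offsets;
}
```

Because the offsets are derived from the final order, regrouping entries by section cannot leave stale offsets behind. Mixing the two conventions is what inflates alignments: a byte alignment of 4 misread as a log2 value would become 16.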
      
      llvm-svn: 66875
    • generalize the previous code to use the full generality of LEA · 99cc1337
      Chris Lattner authored
      for i32/i64 expressions (we could also do i16 on cpus where
      i16 lea is fast, but I didn't add this).  On the example, we now
      generate:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$42, (%eax)
      	setl	%al
      	movzbl	%al, %eax
      	leal	4(%eax,%eax,8), %eax
      	ret
      
      instead of:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	movl	$4, %ecx
      	movl	$13, %eax
      	cmovg	%ecx, %eax
      	ret
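
Reading the two sequences above, the selected values are 13 and 4, and their difference of 9 is what the single `leal` computes from the 0/1 flag. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the multiplier list reflects the x86 addressing form base + index*scale + displacement with scale in {1, 2, 4, 8}, not necessarily the exact check LLVM performs.

```cpp
#include <cassert>

// True when diff can be produced from a 0/1 flag by one LEA:
// index*scale gives 1, 2, 4, 8 and base + index*scale with base == index
// gives 2, 3, 5, 9.
bool isSingleLeaMultiplier(int diff) {
    switch (diff) {
    case 1: case 2: case 3: case 4: case 5: case 8: case 9:
        return true;
    default:
        return false;
    }
}

// Reference form: a plain select between the two constants.
int selectReference(int x) {
    return x < 42 ? 13 : 4;
}

// LEA-style form: materialize the condition as 0/1 (setl + movzbl), then
// compute displacement + base + index*8 with base == index.
int selectViaLea(int x) {
    assert(isSingleLeaMultiplier(13 - 4));
    int flag = (x < 42) ? 1 : 0;
    return 4 + flag + flag * 8;  // leal 4(%eax,%eax,8), %eax
}
```

Both functions return 13 for inputs below 42 and 4 otherwise. The LEA form needs no scratch register for the second constant and no conditional move.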
      
      llvm-svn: 66869
    • optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d
      Chris Lattner authored
      example to:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	setg	%al
      	movzbl	%al, %eax
      	orl	$4294967294, %eax
      	ret
      
      instead of:
      
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	movl	$4294967294, %ecx
      	movl	$4294967295, %eax
      	cmova	%ecx, %eax
      	ret
      
      which is smaller in code size and faster. rdar://6668608
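
In the new sequence above, the two possible results are 4294967294 and 4294967295, which differ only in bit 0, so the 0/1 flag from `setg` can be ORed straight into the even constant. A minimal sketch of that identity, with hypothetical function names and the condition taken from the `cmpl $41` / `setg` pair:

```cpp
#include <cassert>
#include <cstdint>

// Reference form: select between two adjacent constants.
std::uint32_t selectWithBranch(std::int32_t x) {
    return x > 41 ? 4294967295u : 4294967294u;
}

// Flag form: materialize the condition as 0/1 (setg + movzbl) and OR it
// into the even constant (orl $4294967294, %eax).
std::uint32_t selectWithSetcc(std::int32_t x) {
    std::uint32_t flag = (x > 41) ? 1u : 0u;
    return flag | 4294967294u;
}

// General case for "cond ? low + 1 : low" when low is even: bit 0 of low is
// clear, so OR and ADD of the flag give the same result.
std::uint32_t selectAdjacentViaOr(bool cond, std::uint32_t evenLow) {
    assert((evenLow & 1u) == 0);
    return evenLow | (cond ? 1u : 0u);
}
```

When the lower constant is odd, as in `cond ? 42 : 41`, the OR no longer works and adding the flag to the lower constant is one possible alternative.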
      
      llvm-svn: 66868