  1. Aug 23, 2010
  2. Aug 21, 2010
  3. Aug 17, 2010
    • More fixes for win64: · 231ab847
      Anton Korobeynikov authored
        - Do not clobber al during variadic calls; this is an AMD64 ABI-only feature
        - Emit wincall64 where necessary
      Patch by Cameron Esfahani!
      
      llvm-svn: 111289
      231ab847
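As a rough illustration of the first point: under the System V AMD64 ABI, the caller of a variadic function places in %al an upper bound on the number of vector registers used for arguments, while the Win64 convention has no such rule. The sketch below models only that difference. The function name and the "int"/"float" tagging are hypothetical and are not taken from the patch.

```python
MAX_SYSV_VECTOR_ARG_REGS = 8  # xmm0..xmm7 carry vector arguments under System V AMD64


def al_for_variadic_call(arg_kinds, abi):
    """Return the value a caller would place in %al, or None if the ABI has no such rule.

    arg_kinds is a list of "int" / "float" tags (a hypothetical encoding).
    """
    if abi == "sysv":
        n_float = sum(1 for kind in arg_kinds if kind == "float")
        # Any upper bound is acceptable; the exact count capped at 8 is the simplest one.
        return min(n_float, MAX_SYSV_VECTOR_ARG_REGS)
    if abi == "win64":
        # No %al convention here, so the register does not need to be written.
        return None
    raise ValueError("unknown ABI: %r" % (abi,))
```

For example, `al_for_variadic_call(["int", "float", "float"], "sysv")` gives 2, while the same call with `"win64"` gives `None`.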
  4. Aug 14, 2010
  5. Aug 13, 2010
  6. Aug 12, 2010
  7. Aug 11, 2010
    • Use ISD::ADD instead of ISD::SUB with a negated constant. · 5531aa4d
      Dan Gohman authored
      This avoids trouble if the return type of TD->getPointerSize() is
      changed to something which doesn't promote to a signed type,
      and is simpler anyway.
      
      Also, use getCopyFromReg instead of getRegister to read a
      physical register's value.
      
      llvm-svn: 110835
      5531aa4d
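The hazard described here can be sketched with plain integer arithmetic. The sketch assumes a 32-bit unsigned pointer-size value that is negated and then used in a 64-bit computation. These widths are an assumption for illustration, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def negate_u32(x):
    """Negate x the way a 32-bit unsigned type would: the result wraps around."""
    return (-x) & MASK32


def add_plain(base, size):
    """base + size: no negation involved, so signedness does not matter."""
    return (base + size) & MASK64


def sub_negated_signed(base, size):
    """base - (-size) where -size is a genuinely negative (signed) value."""
    return (base - (-size)) & MASK64


def sub_negated_unsigned32(base, size):
    """base - (-size) where -size wrapped in 32 bits and was then zero-extended."""
    return (base - negate_u32(size)) & MASK64
```

With `base = 0x1000` and `size = 8`, the first two helpers agree on `0x1008`, while the unsigned variant produces `0xFFFFFFFF00001008`. Using an add sidesteps the question entirely.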
    • Add AVX matching patterns to Packed Bit Test intrinsics. · 91d61df3
      Bruno Cardoso Lopes authored
      Apply the same approach of SSE4.1 ptest intrinsics but
      create a new x86 node "testp" since AVX introduces
      vtest{ps}{pd} instructions which set ZF and CF depending
      on sign bit AND and ANDN of packed floating-point sources.
      
      This is slightly different from what the "ptest" does.
      Tests coming with the other 256 intrinsics tests.
      
      llvm-svn: 110744
      91d61df3
  8. Aug 10, 2010
  9. Aug 06, 2010
  10. Aug 05, 2010
  11. Jul 30, 2010
  12. Jul 29, 2010
    • Revert r109652, and remove the offending assert in loadRegFromStackSlot instead. · ba0e124a
      Jakob Stoklund Olesen authored
      We do sometimes load from a too small stack slot when dealing with x86 arguments
      (varargs and smaller-than-32-bit args). It looks like we know what we are doing
      in those cases, so I am going to remove the assert instead of artificially
      enlarging stack slot sizes.
      
      The assert in storeRegToStackSlot stays in. We don't want to write beyond the
      bounds of a stack slot.
      
      llvm-svn: 109764
      ba0e124a
  13. Jul 28, 2010
    • Create a fixed stack object for varargs that is as large as any register. · f2234fbe
      Jakob Stoklund Olesen authored
      The size of this object isn't used for anything - technically it is of variable
      size.
      
      This avoids a false positive from the assert in
      X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.
      
      llvm-svn: 109652
      f2234fbe
    • Implement a vectorized algorithm for <16 x i8> << <16 x i8> · 53afc8f0
      Nate Begeman authored
      This is about 4x faster and smaller than the existing scalarization.
      
      llvm-svn: 109566
      53afc8f0
    • ~40% faster vector shl <4 x i32> on SSE 4.1 · 269a6da0
      Nate Begeman authored
      Larger improvements for smaller types coming in future patches.
      
      For:
      
      define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
      entry:
        %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
        %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
        ret <2 x i64> %tmp2
      }
      
      We get:
      
      _shl:                                   ## @shl
      	pslld	$23, %xmm1
      	paddd	LCPI0_0, %xmm1
      	cvttps2dq	%xmm1, %xmm1
      	pmulld	%xmm1, %xmm0
      	ret
      
      Instead of:
      
      _shl:                                   ## @shl
      	pshufd	$3, %xmm0, %xmm2
      	movd	%xmm2, %eax
      	pshufd	$3, %xmm1, %xmm2
      	movd	%xmm2, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	movd	%xmm3, %eax
      	pshufd	$1, %xmm1, %xmm3
      	movd	%xmm3, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm3
      	punpckldq	%xmm2, %xmm3
      	movd	%xmm0, %eax
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm2
      	movhlps	%xmm0, %xmm0
      	movd	%xmm0, %eax
      	movhlps	%xmm1, %xmm1
      	movd	%xmm1, %ecx
      	shll	%cl, %eax
      	movd	%eax, %xmm0
      	punpckldq	%xmm0, %xmm2
      	movdqa	%xmm2, %xmm0
      	punpckldq	%xmm3, %xmm0
      	ret
      
      llvm-svn: 109549
      269a6da0
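The shorter sequence appears to compute 2^a in each lane by placing the shift count in a float's exponent field, converting that float to an integer, and multiplying. The sketch below emulates one 32-bit lane under that reading. It assumes LCPI0_0 holds the bit pattern of 1.0f (0x3F800000), which the listing does not show, and it only covers shift counts 0..31. The function name is hypothetical.

```python
import struct

ONE_F32_BITS = 0x3F800000  # assumed contents of LCPI0_0: the bit pattern of 1.0f
MASK32 = 0xFFFFFFFF


def shl_lane(r, a):
    """Emulate one lane of pslld $23 / paddd / cvttps2dq / pmulld for 0 <= a < 32."""
    if not 0 <= a < 32:
        raise ValueError("shift count out of range for this sketch")
    bits = ((a << 23) + ONE_F32_BITS) & MASK32                  # pslld $23 ; paddd
    as_float = struct.unpack("<f", struct.pack("<I", bits))[0]  # reinterpret bits: 2.0 ** a
    if as_float >= 2.0 ** 31:
        power = 0x80000000   # cvttps2dq yields 0x80000000 when the value does not fit
    else:
        power = int(as_float)                                   # cvttps2dq (truncating)
    return (r * power) & MASK32                                 # pmulld keeps the low 32 bits
```

For instance, `shl_lane(3, 4)` gives 48. The a = 31 case still works because the out-of-range conversion result has the same bit pattern as 2^31.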
  14. Jul 26, 2010
  15. Jul 24, 2010
    • Add an ILP scheduler. · 37b740c4
      Evan Cheng authored
      This is a register pressure aware scheduler that's appropriate for
      targets without detailed instruction itineraries.
      The scheduler schedules for increased instruction level parallelism in
      low register pressure situations; it schedules to reduce register pressure
      when the register pressure becomes high.
      
      On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
      by 16%.
      
      llvm-svn: 109300
      37b740c4
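The two-mode behaviour described above can be pictured with a toy selection rule. This is a simplified illustration only, not the scheduler's actual heuristic. The tuple layout, the threshold, and the function name are all hypothetical.

```python
def pick_next(ready, live_regs, reg_limit):
    """Pick the next node from a ready list.

    ready: list of (name, critical_path_height, pressure_delta) tuples.
    While pressure is low, prefer the node on the longest critical path (more ILP);
    once pressure reaches the limit, prefer the node that frees the most registers.
    """
    if live_regs < reg_limit:
        return max(ready, key=lambda node: node[1])[0]
    return min(ready, key=lambda node: node[2])[0]
```

Given `ready = [("load", 5, 1), ("store", 1, -1)]`, the rule picks `"load"` when few registers are live and `"store"` once the limit is reached.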
  16. Jul 23, 2010
    • The only supported calling convention for X86-64 uses SSE, so we can't return floating point values if this is disabled. · f2d75670
      Dale Johannesen authored
      Detect this error for clang.
      
      With SSE1 only, f64 is a problem; it can be done, but
      neither llvm-gcc nor clang has ever generated correct
      code for it.  Since nobody noticed this I think it's
      OK to treat it as an error for now.
      
      This also handles SSE-sized vectors of floating point.
      8207686, 8204109.
      
      llvm-svn: 109201
      f2d75670
  17. Jul 22, 2010
  18. Jul 21, 2010
  19. Jul 16, 2010
    • Split -enable-finite-only-fp-math to two options: · 55f0c6b9
      Evan Cheng authored
      -enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen FP math optimizations only care whether the arguments and results of FP arithmetic can never be NaN.
      
      llvm-svn: 108465
      55f0c6b9
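One way to see why the two assumptions are worth separating: some simplifications stay valid in the presence of infinities and break only for NaN. The sketch below uses the self-comparison `x == x` as a generic illustration. It is not claimed to be one of the specific folds the code generator performs, and the function name is hypothetical.

```python
import math


def self_compare(x):
    """x == x: foldable to True whenever x is known not to be NaN."""
    return x == x
```

`self_compare(math.inf)` is `True`, so a no-NaNs guarantee alone justifies the fold, while `self_compare(math.nan)` is `False`.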
  20. Jul 15, 2010
  21. Jul 14, 2010
  22. Jul 10, 2010
  23. Jul 09, 2010
    • --- Reverse-merging r107947 into '.': · 6586e9b2
      Bob Wilson authored
      U    utils/TableGen/FastISelEmitter.cpp
      --- Reverse-merging r107943 into '.':
      U    test/CodeGen/X86/fast-isel.ll
      U    test/CodeGen/X86/fast-isel-loads.ll
      U    include/llvm/Target/TargetLowering.h
      U    include/llvm/Support/PassNameParser.h
      U    include/llvm/CodeGen/FunctionLoweringInfo.h
      U    include/llvm/CodeGen/CallingConvLower.h
      U    include/llvm/CodeGen/FastISel.h
      U    include/llvm/CodeGen/SelectionDAGISel.h
      U    lib/CodeGen/LLVMTargetMachine.cpp
      U    lib/CodeGen/CallingConvLower.cpp
      U    lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
      U    lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
      U    lib/CodeGen/SelectionDAG/FastISel.cpp
      U    lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
      U    lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
      U    lib/CodeGen/SelectionDAG/InstrEmitter.cpp
      U    lib/CodeGen/SelectionDAG/TargetLowering.cpp
      U    lib/Target/XCore/XCoreISelLowering.cpp
      U    lib/Target/XCore/XCoreISelLowering.h
      U    lib/Target/X86/X86ISelLowering.cpp
      U    lib/Target/X86/X86FastISel.cpp
      U    lib/Target/X86/X86ISelLowering.h
      
      llvm-svn: 107987
      6586e9b2
    • Fix the memoperand offsets in code generated for va_start. · 0a7d155d
      Dan Gohman authored
      llvm-svn: 107948
      0a7d155d
    • Re-apply bottom-up fast-isel, with fixes. · 0b5aa1cd
      Dan Gohman authored
      Be very careful to avoid emitting a DBG_VALUE after a terminator, or emitting
      any instructions before an EH_LABEL.
      
      llvm-svn: 107943
      0b5aa1cd
    • Change LEA to have 5 operands for its memory operand, just like all other instructions, even though a segment is not allowed. · f469307c
      Chris Lattner authored
      This resolves a bunch of gross hacks in the encoder and makes LEA more
      consistent with the rest of the instruction set.
      
      No functionality change.
      
      llvm-svn: 107934
      f469307c
    • Add some long-overdue enums to refer to the parts of the 5-operand X86 memory operand. · ec536276
      Chris Lattner authored
      
      llvm-svn: 107925
      ec536276
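As a sketch of what named indices for the five parts of a memory operand buy you, the example below models the operand as a 5-tuple and computes the address an LEA would produce. The enum member names and their order (base, scale, index, displacement, segment) are assumptions for illustration and are not copied from the patch.

```python
from enum import IntEnum


class AddrOperand(IntEnum):
    """Hypothetical names for the five slots of an x86 memory operand."""
    BASE_REG = 0
    SCALE_AMT = 1
    INDEX_REG = 2
    DISP = 3
    SEGMENT_REG = 4


def effective_address(operands, regs):
    """Compute base + scale * index + disp; the segment slot is ignored, as for LEA."""
    base = regs.get(operands[AddrOperand.BASE_REG], 0)
    index = regs.get(operands[AddrOperand.INDEX_REG], 0)
    scale = operands[AddrOperand.SCALE_AMT]
    disp = operands[AddrOperand.DISP]
    return base + scale * index + disp
```

For example, `effective_address(("rbx", 4, "rcx", 16, None), {"rbx": 0x1000, "rcx": 3})` gives `0x101C`. Indexing by name rather than by a bare 0..4 makes it harder to mix up the slots.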
  24. Jul 08, 2010