  1. Feb 23, 2009
      Fast-isel can't do TLS yet, so it should fall back to SDISel · 318d7376
      Dan Gohman authored
      if it sees TLS addresses.
      
      llvm-svn: 65341
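      
      The fall-back pattern described here is simple: the fast path declines any
      construct it cannot handle yet and hands the instruction to the slower,
      fully general selector. A minimal C++ sketch of that shape, using
      hypothetical names rather than LLVM's actual FastISel interface:
      
      	#include <iostream>
      	
      	// Hypothetical illustration (not LLVM's FastISel): a fast selector
      	// declines anything it cannot handle, and the driver falls back to
      	// the slower, fully general selector.
      	struct Instr { bool ReferencesTLS; };
      	
      	bool fastSelect(const Instr &I) {
      	  if (I.ReferencesTLS)
      	    return false;            // can't do TLS yet: decline, don't miscompile
      	  // ... emit machine code directly ...
      	  return true;
      	}
      	
      	void sdagSelect(const Instr &) {
      	  // ... fully general (slower) selection path ...
      	}
      	
      	void selectInstr(const Instr &I) {
      	  if (!fastSelect(I))
      	    sdagSelect(I);           // fall back for anything the fast path declined
      	}
      	
      	int main() {
      	  selectInstr(Instr{true});  // TLS reference: takes the fallback path
      	  selectInstr(Instr{false}); // handled on the fast path
      	  std::cout << "selected both instructions\n";
      	}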
      LoopDeletion needs to inform ScalarEvolution when a loop is deleted, · e591411f
      Dan Gohman authored
      so that ScalarEvolution doesn't hang onto a dangling Loop*, which
      could be a problem if another Loop happens to get allocated at the
      same address.
      
      llvm-svn: 65323
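      
      The hazard being avoided is generic: an analysis that caches results keyed
      by pointer can be fooled once the pointed-to object is freed and a new one
      lands at the same address. A hedged C++ sketch of the invalidation pattern,
      with made-up names rather than ScalarEvolution's real interface:
      
      	#include <iostream>
      	#include <map>
      	#include <string>
      	
      	struct Loop { /* loop structure elided */ };
      	
      	// Hypothetical analysis cache keyed by Loop* (not ScalarEvolution's API).
      	class LoopAnalysisCache {
      	  std::map<const Loop *, std::string> Results;
      	public:
      	  void record(const Loop *L, std::string R) { Results[L] = std::move(R); }
      	  bool has(const Loop *L) const { return Results.count(L) != 0; }
      	  // The transform must call this before freeing L; otherwise a new Loop
      	  // later allocated at the same address would inherit the stale entry.
      	  void forgetLoop(const Loop *L) { Results.erase(L); }
      	};
      	
      	void deleteLoop(Loop *L, LoopAnalysisCache &Cache) {
      	  Cache.forgetLoop(L);   // tell the analysis first...
      	  delete L;              // ...then free the loop
      	}
      	
      	int main() {
      	  LoopAnalysisCache Cache;
      	  Loop *L = new Loop;
      	  Cache.record(L, "backedge-taken count = 10");
      	  deleteLoop(L, Cache);
      	  Loop *M = new Loop;    // may land at the same address as the old L
      	  std::cout << (Cache.has(M) ? "stale result reused!\n" : "no stale result\n");
      	  delete M;
      	}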
      IndVarSimplify preserves ScalarEvolution. In the · 42987f52
      Dan Gohman authored
      -std-compile-opts sequence, this avoids the need for ScalarEvolution to
      be rerun before LoopDeletion.
      
      llvm-svn: 65318
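      
      The point of preserving an analysis is to spare later passes a
      recomputation. A toy, self-contained C++ sketch of the bookkeeping
      (illustrative only, not LLVM's PassManager): because the first pass
      declares ScalarEvolution preserved, the second pass finds it still valid.
      
      	#include <iostream>
      	#include <string>
      	#include <vector>
      	
      	// Toy pass pipeline (not LLVM's PassManager): an analysis is recomputed
      	// before a pass only if some earlier pass failed to preserve it.
      	struct Pass {
      	  std::string Name;
      	  bool NeedsSCEV;       // requires ScalarEvolution before running
      	  bool PreservesSCEV;   // leaves ScalarEvolution valid afterwards
      	};
      	
      	int main() {
      	  bool SCEVValid = false;
      	  std::vector<Pass> Pipeline = {
      	      {"IndVarSimplify", true, true},   // now preserves ScalarEvolution
      	      {"LoopDeletion",   true, false},
      	  };
      	
      	  for (const Pass &P : Pipeline) {
      	    if (P.NeedsSCEV && !SCEVValid) {
      	      std::cout << "recomputing ScalarEvolution before " << P.Name << "\n";
      	      SCEVValid = true;
      	    }
      	    std::cout << "running " << P.Name << "\n";
      	    if (!P.PreservesSCEV)
      	      SCEVValid = false;   // invalidated for whatever runs next
      	  }
      	}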
      Should reset DBI_Prev if DBI_Next == 0. · 3a86bcf1
      Zhou Sheng authored
      llvm-svn: 65314
      Only v1i16 (i.e. __m64) is returned via RAX / RDX. · 9f8fddee
      Evan Cheng authored
      llvm-svn: 65313
      Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it takes fewer uops.
      Document the shuffle-matching logic and add some FIXMEs for later cleanups.
      Add new tests that exercise the above.
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      ========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
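      
      For context, the kind of shuffle these lowerings target can be written
      directly with SSE/SSSE3 intrinsics. The assumed example below (not the
      actual test case) swaps the bytes within each 16-bit lane, the same
      permutation the rolw $8 sequence in _shuf1 performs; on SSSE3 it compiles
      to a single pshufb. Build with -mssse3.
      
      	#include <tmmintrin.h>   // SSSE3: _mm_shuffle_epi8 (pshufb)
      	#include <cstdint>
      	#include <cstdio>
      	
      	// Assumed illustration: byte-swap each 16-bit lane of a v8i16 vector.
      	static __m128i byteswap_words(__m128i v) {
      	  const __m128i mask = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6,
      	                                     9, 8, 11, 10, 13, 12, 15, 14);
      	  return _mm_shuffle_epi8(v, mask);   // one pshufb, no stack traffic
      	}
      	
      	int main() {
      	  __m128i v = _mm_setr_epi16(0x0102, 0x0304, 0x0506, 0x0708,
      	                             0x090A, 0x0B0C, 0x0D0E, 0x0F10);
      	  uint16_t out[8];
      	  _mm_storeu_si128(reinterpret_cast<__m128i *>(out), byteswap_words(v));
      	  for (uint16_t w : out)
      	    std::printf("%04x ", w);          // expect 0201 0403 ... 100f
      	  std::printf("\n");
      	}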
      Changed option name from inline-threshold to basic-inline-threshold because · dccfa0b2
      Mon P Wang authored
      the inline-threshold option is used by the inliner.
      
      llvm-svn: 65309
      fix some typos that Duncan noticed · d5420f09
      Chris Lattner authored
      llvm-svn: 65306
      Propagate debug loc info through prologue/epilogue. · 9ee052bc
      Bill Wendling authored
      llvm-svn: 65298
      Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR · 9d31aca6
      Scott Michel authored
      instruction. The class also consolidates the constant-splat detection code
      that's shared across the PowerPC and CellSPU backends (and might be
      useful for other backends). Also introduces SelectionDAG::getBUILD_VECTOR() for
      generating new BUILD_VECTOR nodes.
      
      llvm-svn: 65296
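      
      The splat detection being consolidated reduces to one question: is every
      element of the BUILD_VECTOR the same constant? A standalone C++ sketch of
      the simplest form of that check (illustrative only; the real
      BuildVectorSDNode logic also has to cope with undef elements and different
      splat element sizes):
      
      	#include <cstdint>
      	#include <iostream>
      	#include <vector>
      	
      	// Illustrative splat check, not the real BuildVectorSDNode interface.
      	static bool isConstantSplat(const std::vector<uint64_t> &Elts,
      	                            uint64_t &SplatValue) {
      	  if (Elts.empty())
      	    return false;
      	  for (uint64_t E : Elts)
      	    if (E != Elts[0])
      	      return false;              // mixed elements: not a splat
      	  SplatValue = Elts[0];
      	  return true;
      	}
      	
      	int main() {
      	  uint64_t V = 0;
      	  std::cout << isConstantSplat({7, 7, 7, 7}, V) << " " << V << "\n";  // 1 7
      	  std::cout << isConstantSplat({7, 7, 8, 7}, V) << "\n";              // 0
      	}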
  2. Feb 22, 2009
      Revert the part of 64623 that attempted to align the source in a · 648c5e9c
      Dan Gohman authored
      memcpy to match the alignment of the destination. It isn't necessary
      for getting loads and stores handled like the SSE loadu/storeu
      intrinsics, and it was causing a performance regression in
      MultiSource/Applications/JM/lencod.
      
      The problem appears to have been a memcpy that copies from some
      highly aligned array into an alloca; the alloca was then being
      assigned a large alignment, which required codegen to perform
      dynamic stack-pointer re-alignment, which forced the enclosing
      function to have a frame pointer, which led to increased spilling.
      
      llvm-svn: 65289
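      
      A sketch of the scenario described above, with assumed code and an assumed
      32-byte source alignment (this is not the lencod source): copying from a
      highly aligned array into an ordinary local. Raising the local's alignment
      to match the source is what forced dynamic stack realignment, a frame
      pointer, and the extra spills.
      
      	#include <cstdio>
      	#include <cstring>
      	
      	// Assumed illustration of the pattern in the commit message.
      	alignas(32) static const char Table[256] = {1, 2, 3};   // highly aligned source
      	
      	int f() {
      	  char buf[256];   // ordinary alloca; needs no special alignment of its own
      	  // Giving buf 32-byte alignment just because Table has it would require
      	  // dynamically realigning the stack in f(), which in turn pins a frame
      	  // pointer and increases register pressure -- the regression reverted here.
      	  std::memcpy(buf, Table, sizeof buf);
      	  return buf[0] + buf[255];
      	}
      	
      	int main() { std::printf("%d\n", f()); }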
      Properly parenthesize this expression, fixing a real bug in the new · f394e58a
      Dan Gohman authored
      -full-lsr code, as well as a GCC warning.
      
      llvm-svn: 65288
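      
      The class of bug being fixed is the usual precedence trap that GCC's
      -Wparentheses flags. A hypothetical example of the shape (not the actual
      -full-lsr expression):
      
      	#include <iostream>
      	
      	int main() {
      	  unsigned Flags = 0x6, Mask = 0x2, Expected = 0x2;
      	
      	  // Intended: compare the masked bits against Expected.
      	  bool wrong = Flags & Mask == Expected;    // parses as Flags & (Mask == Expected)
      	  bool right = (Flags & Mask) == Expected;  // what was meant
      	
      	  std::cout << wrong << " vs " << right << "\n";   // prints "0 vs 1"
      	}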