  13. Dec 05, 2010
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
      
      llvm-svn: 120936
      68861717
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul.  This one fell back
      to the default implementation that did a hi/lo multiply and compared the
      top half against zero.  Fix this to check the overflow flag that the 'mul'
      instruction sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this now uses the overflow flag directly,
      so it's going in the right direction.
      
      llvm-svn: 120935
      364bb0a0
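The two overflow checks contrasted in this commit message can be sketched at the source level. This is only an illustration under stated assumptions, not LLVM's lowering code: `array_bytes_hi_lo` and `array_bytes_flag` are hypothetical names, `unsigned __int128` and `__builtin_mul_overflow` are GCC/Clang extensions, and the element size of 4 comes from the `new int[count]` example above.

```cpp
#include <cstdint>

// Hypothetical helper: the old strategy. Form the full 128-bit product and
// test whether the high 64 bits are nonzero (the `testq %rdx, %rdx` above).
std::uint64_t array_bytes_hi_lo(std::uint64_t count) {
    unsigned __int128 wide = static_cast<unsigned __int128>(count) * 4u;
    std::uint64_t hi = static_cast<std::uint64_t>(wide >> 64);
    std::uint64_t lo = static_cast<std::uint64_t>(wide);
    // All-ones size on overflow, matching `movq $-1, %rdi` in the listings.
    return hi != 0 ? ~std::uint64_t{0} : lo;
}

// Hypothetical helper: the new strategy. Ask the multiply itself whether it
// overflowed, which is what reading the flag set by `mul` amounts to.
std::uint64_t array_bytes_flag(std::uint64_t count) {
    std::uint64_t bytes = 0;
    bool overflowed = __builtin_mul_overflow(count, std::uint64_t{4}, &bytes);
    return overflowed ? ~std::uint64_t{0} : bytes;
}
```

Both helpers should agree for every input; the difference lies only in how the overflow condition is obtained.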
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      llvm-svn: 120932
      116580a1
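The `cmpq $1` / `sbbq` / `notq` / `orq` sequence in the "to:" listing relies on a bit identity that can be checked in plain C++. The sketch below assumes 64-bit unsigned values; `select_ref` and `select_branchless` are hypothetical names, with `hi` standing in for the high half of the product (`%rdx`) and `lo` for the low half (`%rax`).

```cpp
#include <cstdint>

// Reference form of the select: (hi != 0) ? -1 : lo
std::uint64_t select_ref(std::uint64_t hi, std::uint64_t lo) {
    return hi != 0 ? ~std::uint64_t{0} : lo;
}

// Branchless form mirroring the cmp/sbb/not/or sequence.
std::uint64_t select_branchless(std::uint64_t hi, std::uint64_t lo) {
    // cmpq $1, hi ; sbbq r, r : all ones when hi == 0 (hi - 1 borrows), else 0
    std::uint64_t mask = std::uint64_t{0} - static_cast<std::uint64_t>(hi < 1);
    // notq r : all ones when hi != 0, else 0 (the "not to compensate")
    mask = ~mask;
    // orq lo, r : -1 when hi != 0, otherwise lo unchanged
    return mask | lo;
}
```

The `notq` is needed because here the -1 sits on the "nonzero" side of the select, so the borrow mask has the opposite polarity from what the `or` needs.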
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. Generalize
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
         to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

      2. Handle the identical pattern that happens with !=:
          (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example, for (x == 0) ? -1 : 1, we
      previously got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
      
      llvm-svn: 120929
      342e6ea5
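The rewrite in this commit can be checked with a small C++ sketch. It reads `sign_bit (x - 1)` as the borrow produced by `cmpb $1` and broadcast to all bits by `sbbl`; that reading is inferred from the assembly shown, and the function names are hypothetical.

```cpp
#include <cstdint>

// Reference form: (x == 0) ? -1 : y
std::int32_t select_eq_ref(std::uint8_t x, std::int32_t y) {
    return x == 0 ? -1 : y;
}

// Branchless form mirroring `cmpb $1, x ; sbbl r, r ; orl y, r`.
std::int32_t select_eq_branchless(std::uint8_t x, std::int32_t y) {
    // Unsigned x - 1 borrows only when x == 0, giving -1; otherwise 0.
    std::int32_t mask = -static_cast<std::int32_t>(x < 1);
    // -1 | y is -1, and 0 | y is y.
    return mask | y;
}
```

Pattern 2, `(x != 0) ? y : -1`, is the same function with the arms swapped, so the same mask applies. Because `y` only appears as an operand of the `or`, it can be an immediate, as in `orl $1, %eax` above.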