  1. Dec 05, 2010
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
      
      llvm-svn: 120936
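      A rough C++ sketch (not the commit's code; the function name is made
      up and __builtin_mul_overflow is a GCC/Clang builtin) of the
      overflow-checked size computation behind `new int[count]`:

          #include <cstddef>
          #include <new>

          // count * sizeof(int), clamped to (size_t)-1 on overflow so that
          // operator new[] (__Znam) reliably throws.
          void *test12_sketch(long count) {
              std::size_t bytes;
              if (__builtin_mul_overflow((std::size_t)count, sizeof(int), &bytes))
                  bytes = (std::size_t)-1;   // matches `movq $-1, %rdi`
              return operator new[](bytes);  // the tail call to __Znam
          }
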
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul.  This one fell back
      to the default implementation that did a hi/lo multiply and compared the
      top.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
      
      llvm-svn: 120935
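      A minimal C++ sketch of the two overflow checks being contrasted
      (illustrative function names; unsigned __int128 and
      __builtin_mul_overflow are GCC/Clang extensions):

          #include <cstdint>

          // Old lowering: form the full 128-bit product and test the high
          // half, i.e. the `testq %rdx, %rdx` above.
          bool overflow_via_high_half(uint64_t a, uint64_t b) {
              unsigned __int128 wide = (unsigned __int128)a * b;
              return (uint64_t)(wide >> 64) != 0;
          }

          // New lowering: use the flag the multiply already sets; `mulq`
          // sets OF/CF exactly when the high half is nonzero.
          bool overflow_via_flag(uint64_t a, uint64_t b) {
              uint64_t lo;
              return __builtin_mul_overflow(a, b, &lo);
          }
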
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      llvm-svn: 120932
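      The cmp/sbb/not/or sequence above is a branchless select. A C++
      sketch with an illustrative function name:

          #include <cstdint>

          // Computes (hi != 0) ? -1 : lo without cmov. `cmpq $1, %rdx`
          // sets the carry flag iff hi == 0, and `sbbq %rdi, %rdi` turns
          // that carry into an all-ones mask.
          uint64_t select_sketch(uint64_t hi, uint64_t lo) {
              uint64_t mask = (uint64_t)0 - (hi == 0); // all-ones iff hi == 0
              return ~mask | lo;                       // notq + orq
          }
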
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. generalize 
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
      to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
      
      2. Handle the identical pattern that happens with !=:
         (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example for (x == 0) ? -1 : 1, before 
      we got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
      
      llvm-svn: 120929
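      The same borrow trick in C++ for (x == 0) ? -1 : 1, with an
      illustrative function name:

          #include <cstdint>

          int32_t sel_sketch(uint8_t x) {
              // cmpb $1 / sbbl: x - 1 borrows exactly when x == 0, giving
              // an all-ones mask; orl $1 then yields -1 or 1.
              int32_t mask = -(int32_t)(x < 1);
              return mask | 1;
          }
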
    • Initialize HasPOPCNT. · 2bce78e8
      Bill Wendling authored
      llvm-svn: 120923
  2. Dec 04, 2010
    • Add patterns for the x86 popcnt instruction. · 2f489236
      Benjamin Kramer authored
      - Also adds a new POPCNT subtarget feature that is currently enabled if the target
        supports SSE4.2 (nehalem) or SSE4A (barcelona).
      
      llvm-svn: 120917
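      A minimal C++ example (assumes a POPCNT-capable target, e.g.
      -march=nehalem or -mpopcnt) of code these patterns can match:

          #include <cstdint>

          int bits_set(uint64_t v) {
              return __builtin_popcountll(v); // can lower to a single popcntq
          }
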
    • Simplify code. No functionality change. · 8ceebfaa
      Benjamin Kramer authored
      llvm-svn: 120907
    • There are two reasons why we might want to use · 1c8ac8f0
      Rafael Espindola authored
      foo = a - b
      .long foo
      instead of just
      .long a - b
      
      First, on darwin9 in 64-bit mode the assembler produces the wrong result.
      Second, if "a" is at the end of the section, all darwin assemblers (9, 10,
      and mc) will not consider a - b to be a constant, but will if the dummy
      foo is created.
      
      Split how we handle these cases. The first one is something MC should take care
      of. The second one has to be handled by the caller.
      
      llvm-svn: 120889