  1. Dec 19, 2010
    • reduce copy/paste programming with the power of for loops. · ae756e19
      Chris Lattner authored
      llvm-svn: 122187
    • X86 supports i8/i16 overflow ops (except i8 multiplies), we should · 1e8c032a
      Chris Lattner authored
      generate them.  
      
      Now we compile:
      
      define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
      entry:
        %0 = tail call { i8, i1 } @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
        %cmp = extractvalue { i8, i1 } %0, 1
        br i1 %cmp, label %if.then, label %if.end
      
      into:
      
      _X:                                     ## @X
      ## BB#0:                                ## %entry
      	subl	$12, %esp
      	movb	16(%esp), %al
      	addb	20(%esp), %al
      	jo	LBB0_2
      
      Before we were generating:
      
      _X:                                     ## @X
      ## BB#0:                                ## %entry
      	pushl	%ebp
      	movl	%esp, %ebp
      	subl	$8, %esp
      	movb	12(%ebp), %al
      	testb	%al, %al
      	setge	%cl
      	movb	8(%ebp), %dl
      	testb	%dl, %dl
      	setge	%ah
      	cmpb	%cl, %ah
      	sete	%cl
      	addb	%al, %dl
      	testb	%dl, %dl
      	setge	%al
      	cmpb	%al, %ah
      	setne	%al
      	andb	%cl, %al
      	testb	%al, %al
      	jne	LBB0_2
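
      As a reconstruction (not part of the original commit, which shows only
      the IR): C source along the following lines, using the Clang/GCC
      __builtin_add_overflow builtin, produces the @llvm.sadd.with.overflow.i8
      call above; the overflow-path return value here is hypothetical.

      unsigned char X(signed char a, signed char b) {
        signed char sum;
        // The branch on the overflow bit becomes the 'addb' + 'jo' pair above.
        if (__builtin_add_overflow(a, b, &sum))
          return 0;                   // %if.then: overflow path (hypothetical)
        return (unsigned char)sum;    // %if.end
      }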
      
      llvm-svn: 122186
  2. Dec 05, 2010
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
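
      As a sketch of what the lowering computes (a hypothetical helper using
      the Clang/GCC __builtin_mul_overflow builtin, not the actual backend
      code): 'new int[count]' multiplies count by sizeof(int) and passes -1 to
      operator new[] on overflow, which is why a single 'mulq' plus 'cmovnoq'
      suffices.

      #include <new>

      void *test12_sketch(unsigned long count) {
        unsigned long bytes;
        if (__builtin_mul_overflow(count, sizeof(int), &bytes)) // 'mulq' sets OF
          bytes = (unsigned long)-1;  // 'cmovnoq': keep the product only if no overflow
        return operator new[](bytes); // 'jmp __Znam' (tail call)
      }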
      
      llvm-svn: 120936
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
backend, they were all implemented except umul.  That one fell back
to the default implementation, which did a hi/lo multiply and compared the
top half.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
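
      To illustrate the two checks being compared (hypothetical helpers, using
      GCC/Clang's unsigned __int128 and __builtin_mul_overflow; a sketch, not
      the backend code):

      // Old default lowering: widen, multiply, and test the high 64 bits.
      bool overflows_hi_lo(unsigned long a, unsigned long b) {
        unsigned __int128 p = (unsigned __int128)a * b;
        return (unsigned long)(p >> 64) != 0;      // 'testq %rdx, %rdx'
      }

      // New lowering: 'mulq' already sets the overflow flag when the high half
      // is nonzero, so it can be consumed directly ('seto', 'jo', or 'cmovno').
      bool overflows_flag(unsigned long a, unsigned long b) {
        unsigned long lo;
        return __builtin_mul_overflow(a, b, &lo);
      }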
      
      llvm-svn: 120935
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
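
      The branch-free sequence computes the following, sketched as a
      hypothetical C++ helper (hi is the high half of the product in %rdx,
      lo the low half in %rax):

      unsigned long lo_or_minus1(unsigned long hi, unsigned long lo) {
        // 'cmpq $1, %rdx; sbbq %rdi, %rdi' materializes -1 exactly when hi == 0,
        unsigned long borrow = (hi == 0) ? (unsigned long)-1 : 0;
        // and 'notq' compensates for -1 being on the overflow side of the
        // select: the mask is 0 when hi == 0 and all-ones otherwise.
        unsigned long mask = ~borrow;
        return mask | lo;             // 'orq': lo on success, -1 on overflow
      }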
      
      llvm-svn: 120932
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. Generalize
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
      to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
      
      2. Handle the identical pattern that happens with !=:
         (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example, for (x == 0) ? -1 : 1, before
      we got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
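
      Sketched as a hypothetical C++ helper, the transformed pattern
      materializes an all-ones mask exactly when x == 0 and then ORs in y
      (here y = 1):

      int select_eq0(unsigned char x, int y) {
        int mask = (x == 0) ? -1 : 0;  // 'cmpb $1, %sil; sbbl %eax, %eax'
        return mask | y;               // 'orl $1, %eax' when y is the constant 1
      }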
      
      llvm-svn: 120929
    • Initialize HasPOPCNT. · 2bce78e8
      Bill Wendling authored
      llvm-svn: 120923
  3. Dec 04, 2010
    • Add patterns for the x86 popcnt instruction. · 2f489236
      Benjamin Kramer authored
      - Also adds a new POPCNT subtarget feature that is currently enabled if the target
        supports SSE4.2 (Nehalem) or SSE4A (Barcelona).
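
      A usage sketch (the function name is hypothetical): with the feature
      enabled, e.g. via -msse4.2 on a Nehalem-class target, this builtin
      should now select a single 'popcnt' instruction instead of a
      bit-twiddling expansion.

      int count_set_bits(unsigned x) {
        return __builtin_popcount(x);  // one 'popcntl' with +popcnt
      }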
      
      llvm-svn: 120917
    • Simplify code. No functionality change. · 8ceebfaa
      Benjamin Kramer authored
      llvm-svn: 120907
    • There are two reasons why we might want to use · 1c8ac8f0
      Rafael Espindola authored
      foo = a - b
      .long foo
      instead of just
      .long a - b
      
      First, on darwin9 in 64-bit mode the assembler produces the wrong result.  Second,
      if "a" is the end of the section, none of the darwin assemblers (9, 10, and MC)
      will consider a - b to be a constant, but all of them will if the dummy foo is created.
      
      Split how we handle these cases. The first one is something MC should take care
      of. The second one has to be handled by the caller.
      
      llvm-svn: 120889