  1. Dec 19, 2010
    • reduce copy/paste programming with the power of for loops. · ae756e19
      Chris Lattner authored
      llvm-svn: 122187
    • X86 supports i8/i16 overflow ops (except i8 multiplies); we should · 1e8c032a
      Chris Lattner authored
      generate them.  
      
      Now we compile:
      
      define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
      entry:
        %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
        %cmp = extractvalue %0 %0, 1
        br i1 %cmp, label %if.then, label %if.end
      
      into:
      
      _X:                                     ## @X
      ## BB#0:                                ## %entry
      	subl	$12, %esp
      	movb	16(%esp), %al
      	addb	20(%esp), %al
      	jo	LBB0_2
      
      Before we were generating:
      
      _X:                                     ## @X
      ## BB#0:                                ## %entry
      	pushl	%ebp
      	movl	%esp, %ebp
      	subl	$8, %esp
      	movb	12(%ebp), %al
      	testb	%al, %al
      	setge	%cl
      	movb	8(%ebp), %dl
      	testb	%dl, %dl
      	setge	%ah
      	cmpb	%cl, %ah
      	sete	%cl
      	addb	%al, %dl
      	testb	%dl, %dl
      	setge	%al
      	cmpb	%al, %ah
      	setne	%al
      	andb	%cl, %al
      	testb	%al, %al
      	jne	LBB0_2
      
      llvm-svn: 122186
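
      A hedged C++ sketch of source that would produce the @llvm.sadd.with.overflow.i8
      call above, assuming clang and the __builtin_add_overflow builtin; the original
      commit does not show the C source, so the function body here is illustrative:

      #include <cstdint>

      // Hypothetical source for @X: add two signed bytes, branch on overflow.
      uint8_t X(int8_t a, int8_t b) {
        int8_t sum;
        if (__builtin_add_overflow(a, b, &sum))   // sadd.with.overflow.i8
          return 0;                               // corresponds to %if.then
        return static_cast<uint8_t>(sum);         // corresponds to %if.end
      }
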
  2. Dec 10, 2010
    • Formalize the notion that AVX and SSE are non-overlapping extensions from the... · 8b08f523
      Nate Begeman authored
      Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view.  Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX".  Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
      
      llvm-svn: 121439
  3. Dec 05, 2010
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
      
      llvm-svn: 120936
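
      For context, a hedged C++ expansion of what 'new int[count]' amounts to here:
      the size count * sizeof(int) is computed with an overflow check, and -1 (an
      impossible allocation size) is requested on overflow, which is where the
      multiply-with-overflow and the cmov of $-1 come from.  The function name and
      the __builtin_mul_overflow spelling are illustrative, not from the commit:

      #include <cstddef>
      #include <new>

      // Roughly what the front end emits for 'new int[count]': multiply with an
      // overflow check, and request an impossible size on overflow so the
      // allocation fails instead of being silently undersized.
      void *test12_expanded(long count) {
        std::size_t size;
        bool overflow = __builtin_mul_overflow(static_cast<std::size_t>(count),
                                               sizeof(int), &size);
        return operator new[](overflow ? static_cast<std::size_t>(-1) : size);
      }
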
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul.  This one fell back
      to the default implementation that did a hi/lo multiply and compared the
      top.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
      
      llvm-svn: 120935
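
      A short C++ sketch of why the two checks agree: for a 64x64 unsigned multiply,
      the product overflows 64 bits exactly when its high half is nonzero, which is
      what mulq's overflow/carry flag reports.  The unsigned __int128 type and
      __builtin_mul_overflow are compiler extensions used here only for illustration:

      #include <cstdint>

      // Pre-patch lowering: widen, multiply, test the high half (testq %rdx,%rdx).
      bool overflows_via_high_half(uint64_t a, uint64_t b) {
        unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
        return static_cast<uint64_t>(p >> 64) != 0;
      }

      // Post-patch idea: let the multiply report overflow directly (mulq's flag).
      bool overflows_via_flag(uint64_t a, uint64_t b) {
        uint64_t lo;
        return __builtin_mul_overflow(a, b, &lo);
      }
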
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      llvm-svn: 120932
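
      A hedged C++ rendering of the branchless sequence in the "to:" listing, where
      hi stands for %rdx (the high half of the multiply) and lo for %rax; the
      function name is illustrative:

      #include <cstdint>

      // (hi != 0) ? -1 : lo, with no branch and no cmov.
      uint64_t select_allones_or_lo(uint64_t hi, uint64_t lo) {
        // cmpq $1,%rdx ; sbbq %rdi,%rdi : mask is all-ones exactly when hi == 0
        uint64_t mask = (hi == 0) ? ~0ull : 0ull;
        // notq %rdi : flip it, so mask is 0 when hi == 0 and all-ones otherwise
        mask = ~mask;
        // orq %rax,%rdi : yields lo when hi == 0, and -1 (all-ones) otherwise
        return mask | lo;
      }
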
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. Generalize
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
      to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
      
      2. Handle the identical pattern that happens with !=:
         (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example, for (x == 0) ? -1 : 1, before
      we got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
      
      llvm-svn: 120929
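
      A hedged C++ sketch of the transformed sequence for (x == 0) ? -1 : 1, matching
      the cmpb/sbbl/orl lines above; the function name and the uint8_t parameter type
      are assumptions:

      #include <cstdint>

      int32_t select_example(uint8_t x) {
        // cmpb $1,%sil ; sbbl %eax,%eax : the borrow of x - 1 is set only when
        // x == 0, so mask is all-ones exactly for x == 0 -- the (sign_bit (x - 1)).
        int32_t mask = -static_cast<int32_t>(x < 1);
        // orl $1,%eax : OR in the other select operand; -1 when x == 0, else 1.
        return mask | 1;
      }
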
  4. Nov 15, 2010
    • add target operand flags for jump table, constant pool, and block address · edb9d84d
      Chris Lattner authored
      nodes to indicate when ha16/lo16 modifiers should be used.  This lets
      us pass PowerPC/indirectbr.ll.
      
      The one annoying thing about this patch is that the MCSymbolExpr isn't
      expressive enough to represent ha16(label1-label2), which we need on
      PowerPC.  I have a terrible hack in the meantime, but this will have
      to be revisited at some point.
      
      Last major conversion item left is global variable references.
      
      llvm-svn: 119105
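
      For reference, a short sketch of the ha16/lo16 split these modifiers name: the
      consuming PowerPC instruction treats lo16 as signed, so ha16 is the high half
      adjusted by the low half's sign bit.  This is general PowerPC addressing
      background, not code from the patch:

      #include <cstdint>

      // Split v so that (uint32_t(ha16(v)) << 16) + int16_t(lo16(v)) reconstructs v.
      uint16_t lo16(uint32_t v) { return static_cast<uint16_t>(v & 0xFFFF); }
      uint16_t ha16(uint32_t v) { return static_cast<uint16_t>((v + 0x8000) >> 16); }
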