  1. Dec 10, 2010
    • Formalize the notion that AVX and SSE are non-overlapping extensions from the... · 8b08f523
      Nate Begeman authored
      Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view.  Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX".  Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
      
      llvm-svn: 121439
  2. Dec 05, 2010
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
      
      llvm-svn: 120936
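
      A minimal C++ sketch of the pattern being compiled here (the helper name is
      mine, and it assumes the GCC/Clang builtin __builtin_umull_overflow, which
      lowers to the llvm.umul.with.overflow intrinsic whose flags result this
      commit wires up):

      #include <new>

      void *checked_new_int_array(unsigned long count) {
            unsigned long bytes;
            // Overflow-checked count * sizeof(int); both the product and the
            // overflow bit come from the single 'mulq' in the asm above.
            bool overflow = __builtin_umull_overflow(count, sizeof(int), &bytes);
            if (overflow)
                  bytes = (unsigned long)-1;   // force the allocation to fail
            return ::operator new[](bytes);    // __Znam, the tail call above
      }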
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul.  This one fell back
      to the default implementation that did a hi/lo multiply and compared the
      top.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
      
      llvm-svn: 120935
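
      A hedged C++ sketch of the two lowerings being compared (function names are
      mine; unsigned __int128 and __builtin_mul_overflow are GCC/Clang extensions
      used only for illustration):

      #include <cstdint>

      // Old default: widen, multiply, and test the high half (the 'testq %rdx').
      bool umul_overflow_via_high_half(uint64_t a, uint64_t b, uint64_t *out) {
            unsigned __int128 p = (unsigned __int128)a * b;
            *out = (uint64_t)p;
            return (uint64_t)(p >> 64) != 0;
      }

      // New lowering: use the overflow flag that 'mulq' already sets.
      bool umul_overflow_via_flag(uint64_t a, uint64_t b, uint64_t *out) {
            return __builtin_mul_overflow(a, b, out);
      }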
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      llvm-svn: 120932
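
      A small C++ sketch of the new "-1 on the other arm" case (the function name
      is mine): the mask built from the borrow of x - 1 has to be inverted once,
      which is exactly the extra 'notq' in the code above.

      #include <cstdint>

      uint64_t result_or_all_ones(uint64_t hi, uint64_t lo) {
            // (hi == 0) ? lo : -1
            uint64_t mask = -(uint64_t)(hi == 0);  // cmpq $1 / sbbq: all ones iff hi == 0
            return ~mask | lo;                     // notq / orq: lo if hi == 0, else -1
      }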
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. generalize 
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
      to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
      
      2. Handle the identical pattern that happens with !=:
         (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example, for (x == 0) ? -1 : 1, before
      we got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
      
      llvm-svn: 120929
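
      A minimal C++ sketch of transformation 1 (the function name is mine): for
      unsigned x, the borrow of x - 1 is set exactly when x == 0, so 'cmp $1; sbb'
      materializes an all-ones/all-zero mask without a cmov, and 'or' folds in
      the other arm.

      #include <cstdint>

      int32_t select_minus1_or_y(uint32_t x, int32_t y) {
            // (x == 0) ? -1 : y
            int32_t mask = -(int32_t)(x == 0);  // cmpb $1 / sbbl: all ones iff x == 0
            return mask | y;                    // orl: -1 when x == 0, otherwise y
      }

      With y == 1 this is exactly the cmpb/sbbl/orl sequence shown above.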
  3. Nov 15, 2010
    • add target operand flags for jump tables, constant pool and block address · edb9d84d
      Chris Lattner authored
      nodes to indicate when ha16/lo16 modifiers should be used.  This lets
      us pass PowerPC/indirectbr.ll.
      
      The one annoying thing about this patch is that the MCSymbolExpr isn't
      expressive enough to represent ha16(label1-label2), which we need on
      PowerPC.  I have a terrible hack in the meantime, but this will have
      to be revisited at some point.
      
      Last major conversion item left is global variable references.
      
      llvm-svn: 119105
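
      A short C++ illustration of what the ha16/lo16 modifiers denote (these
      helpers are mine, not LLVM API): a 32-bit address is split into an adjusted
      high half and a low half so that an addis/lwz-style pair reassembles it.

      #include <cstdint>

      uint16_t lo16(uint32_t addr) {
            return (uint16_t)(addr & 0xFFFF);
      }

      uint16_t ha16(uint32_t addr) {
            // The +0x8000 compensates for the sign extension of lo16 when the
            // low instruction adds it back.
            return (uint16_t)((addr + 0x8000) >> 16);
      }

      // Invariant: ((uint32_t)ha16(a) << 16) + (int16_t)lo16(a) == a (mod 2^32).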