  1. Dec 05, 2010
    • Some cleanup before I start committing some incremental progress on · c7223a3e
      Cameron Zwarich authored
      StrongPHIElimination.
      
      llvm-svn: 120961
    • Making use of VFP / NEON floating point multiply-accumulate / subtraction is · 62c7b5bf
      Evan Cheng authored
      difficult on current ARM implementations for a few reasons.
      1. Even though a single vmla has a latency that is one cycle shorter than a pair
         of vmul + vadd, a RAW hazard during its first few cycles (4? on Cortex-A8) can
         cause an additional pipeline stall. So it's frequently better to simply
         codegen vmul + vadd.
      2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to
         stall for 4 cycles. We need to schedule them apart.
      3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
         RAW-dependent vmla + vmla is very bad. But this isn't ideal either:
           vmul
           vadd
           vmla
         Instead, we want to expand the second vmla:
           vmla
           vmul
           vadd
         Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
         faster.
      
      Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough
      but it isn't the optimal solution. This patch attempts to make it possible to
      use vmla / vmls in cases where it is profitable.
      
      A. Add missing isel predicates which cause vmla to be codegen'ed.
      B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
         compute a fmul and a fmla.
      C. Add additional isel checks for vmla, avoiding cases where vmla is feeding into
         fp instructions (except for the exceptional case described in #3).
      D. Add ARM hazard recognizer to model the vmla / vmls hazards.
      E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
         vmla / vmls will trigger one of the special hazards.
      
      Work in progress, only A+B are enabled.
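
      A minimal IR sketch of the (fadd (fmul)) pattern referenced in B (illustrative
      only; the function name is made up). The single-use fmul feeding an fadd is
      what isel can now match as a vmla:

        define float @mac(float %acc, float %a, float %b) {
          %m = fmul float %a, %b        ; single-use fmul ...
          %s = fadd float %acc, %m      ; ... feeding an fadd: a vmla candidate
          ret float %s
        }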
      
      llvm-svn: 120960
    • Remove the PHIElimination.h header, as it is no longer needed. · a3fb8cb3
      Cameron Zwarich authored
      llvm-svn: 120959
    • Clarify some of the differences between indexing with getelementptr and... · 7cf63ace
      Frits van Bommel authored
      Clarify some of the differences between indexing with getelementptr and indexing with insertvalue/extractvalue.
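
      A minimal sketch of the difference (illustrative only, in current IR syntax;
      the type and names are made up): getelementptr only computes an address, and
      its first index steps across the pointer operand, while extractvalue indexes
      directly into an aggregate value, so no leading index is needed:

        %pair = type { i32, float }

        define i32 @first_field(ptr %p, %pair %agg) {
          ; address computation: the leading 0 steps over the pointer operand
          %addr = getelementptr %pair, ptr %p, i64 0, i32 0
          %a = load i32, ptr %addr
          ; value indexing: index 0 names the first field directly
          %b = extractvalue %pair %agg, 0
          %r = add i32 %a, %b
          ret i32 %r
        }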
      
      llvm-svn: 120957
    • Fix PR 4170 by having ExtractValueInst::getIndexedType() reject out-of-bounds indexing. · 16ebe77b
      Frits van Bommel authored
      Also add asserts that the indices are valid in InsertValueInst::init(). ExtractValueInst already asserts when constructed with invalid indices.
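
      For example (illustrative only), { i32, float } has only fields 0 and 1, so an
      extractvalue with index 2 is out of bounds; getIndexedType() now rejects it:

        %ok  = extractvalue { i32, float } %agg, 1   ; valid: field 1 is the float
        %bad = extractvalue { i32, float } %agg, 2   ; rejected: there is no field 2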
      
      llvm-svn: 120956
    • I forgot to actually remove the FindCopyInsertPoint() declaration from · 6766c420
      Cameron Zwarich authored
      PHIElimination.h.
      
      llvm-svn: 120953
    • Remove the SplitCriticalEdge() method declaration from PHIElimination.h. At one · 8d169558
      Cameron Zwarich authored
      time, this method existed, but now PHIElimination uses the method of the same
      name on MachineBasicBlock.
      
      llvm-svn: 120952
    • Move the FindCopyInsertPoint method of PHIElimination to a new standalone · da592a9e
      Cameron Zwarich authored
      function so that it can be shared with StrongPHIElimination.
      
      llvm-svn: 120951
    • Refactor jump threading. · 76244867
      Frits van Bommel authored
      Should have no functional change other than the order of two transformations that are mutually-exclusive and the exact formatting of debug output.
      Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls.
      
      llvm-svn: 120946
    • Remove trailing whitespace. · 5e75ef4a
      Frits van Bommel authored
      llvm-svn: 120945
    • Teach SimplifyCFG to turn · 8fb69ee8
      Frits van Bommel authored
        (indirectbr (select cond, blockaddress(@fn, BlockA),
                                  blockaddress(@fn, BlockB)))
      into
        (br cond, BlockA, BlockB).
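
      In current IR syntax, the rewritten pattern looks roughly like this (function
      and label names are made up):

        define void @fn(i1 %cond) {
        entry:
          %dest = select i1 %cond, ptr blockaddress(@fn, %BlockA), ptr blockaddress(@fn, %BlockB)
          indirectbr ptr %dest, [label %BlockA, label %BlockB]
          ; SimplifyCFG now replaces the two instructions above with:
          ;   br i1 %cond, label %BlockA, label %BlockB
        BlockA:
          ret void
        BlockB:
          ret void
        }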
      
      llvm-svn: 120943
    • Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags · 68861717
      Chris Lattner authored
      result.  This allows us to compile:
      
      void *test12(long count) {
            return new int[count];
      }
      
      into:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	movq	$-1, %rdi
      	cmovnoq	%rax, %rdi
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      test12:
      	movl	$4, %ecx
      	movq	%rdi, %rax
      	mulq	%rcx
      	seto	%cl
      	testb	%cl, %cl
      	movq	$-1, %rdi
      	cmoveq	%rax, %rdi
      	jmp	__Znam
      
      Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
      which would eliminate the need for the 'movq %rdi, %rax'.
      
      llvm-svn: 120936
    • it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul.  This one fell back
      to the default implementation, which did a hi/lo multiply and compared the
      high half.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
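
      At the IR level, this is roughly the pattern being compiled (a sketch of the
      overflow-checked size computation for the new[] call; names are made up):

        declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)

        define i64 @array_alloc_size(i64 %count) {
          %res  = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %count, i64 4)
          %size = extractvalue { i64, i1 } %res, 0
          %ovf  = extractvalue { i64, i1 } %res, 1
          ; on overflow, pass an impossibly large size (-1) to the allocator
          %safe = select i1 %ovf, i64 -1, i64 %size
          ret i64 %safe
        }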
      
      llvm-svn: 120935
    • fix the rest of the linux miscompares :) · 183ddd8e
      Chris Lattner authored
      llvm-svn: 120933
    • generalize the previous check to handle -1 on either side of the · 116580a1
      Chris Lattner authored
      select, inserting a not to compensate.  Add a missing isZero check
      that I lost somehow.
      
      This improves codegen of:
      
      void *func(long count) {
            return new int[count];
      }
      
      from:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      to:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
      	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
      	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
      	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
      	jmp	__Znam                  ## TAILCALL
                                              ## encoding: [0xeb,A]
      
      llvm-svn: 120932
    • relax this to handle linux defaulting to -static. · 77a11c61
      Chris Lattner authored
      llvm-svn: 120930
    • Improve an integer select optimization in two ways: · 342e6ea5
      Chris Lattner authored
      1. generalize 
          (select (x == 0), -1, 0) -> (sign_bit (x - 1))
      to:
          (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
      
      2. Handle the identical pattern that happens with !=:
         (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
      
      cmov is often high latency and can't fold immediates or
      memory operands.  For example, for (x == 0) ? -1 : 1, before
      we got:
      
      < 	testb	%sil, %sil
      < 	movl	$-1, %ecx
      < 	movl	$1, %eax
      < 	cmovel	%ecx, %eax
      
      now we get:
      
      > 	cmpb	$1, %sil
      > 	sbbl	%eax, %eax
      > 	orl	$1, %eax
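
      In IR terms, the generalized form in (1) matches a pattern like the following
      (illustrative only), which can now lower to a cmp/sbb/or sequence instead of a
      cmov:

        define i64 @sel(i64 %x, i64 %y) {
          %cmp = icmp eq i64 %x, 0
          %r   = select i1 %cmp, i64 -1, i64 %y
          ret i64 %r
        }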
      
      llvm-svn: 120929
    • merge some tests into select.ll and make them more specific. · 0523388d
      Chris Lattner authored
      llvm-svn: 120928
    • rename test · b89b6f17
      Chris Lattner authored
      llvm-svn: 120927
    • remove two tests that aren't really testing anything. · d4f8c964
      Chris Lattner authored
      llvm-svn: 120926
    • Initialize HasPOPCNT. · 2bce78e8
      Bill Wendling authored
      llvm-svn: 120923