  1. Aug 28, 2010
    • handle the constant case of vector insertion. For something · d0214f3e
      Chris Lattner authored
      like this:
      
      struct S { float A, B, C, D; };
      
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.B;
        A.A = 42;
        return A;
      }
      
      we now generate:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	movss	LCPI1_1(%rip), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	unpcklps	%xmm2, %xmm0
      	ret
      
      instead of:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	movd	%xmm2, %eax
      	shlq	$32, %rax
      	addq	$1109917696, %rax       ## imm = 0x42280000
      	movd	%rax, %xmm0
      	ret
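
      In IR terms this extends the bitcast-to-vector rewrite from
      r112343 (the next entry below) to the case where one of the
      pieces feeding the integer is a constant.  A hand-reduced sketch
      of the pattern, with hypothetical function names and assuming
      the little-endian element order of the x86-64 target (0x42280000
      is the bit pattern of 42.0f, matching the imm in the old code):

      define <2 x float> @before(float %b) {
        %bbits = bitcast float %b to i32
        %b64 = zext i32 %bbits to i64
        %bsh = shl i64 %b64, 32
        %pack = or i64 %bsh, 1109917696          ; 0x42280000
        %vec = bitcast i64 %pack to <2 x float>
        ret <2 x float> %vec
      }

      define <2 x float> @after(float %b) {
        %v0 = insertelement <2 x float> undef, float 4.200000e+01, i32 0
        %v1 = insertelement <2 x float> %v0, float %b, i32 1
        ret <2 x float> %v1
      }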
      
      llvm-svn: 112345
    • optimize bitcasts from large integers to vector into vector · dd660104
      Chris Lattner authored
      element insertion from the pieces that feed into the vector.
      This handles a pattern that occurs frequently due to code
      generated for the x86-64 ABI.  We now compile something like
      this:
      
      struct S { float A, B, C, D; };
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.A;
        ++A.C;
        return A;
      }
      
      into all nice vector operations:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	12(%rax), %xmm3
      	pshufd	$16, %xmm2, %xmm2
      	unpcklps	%xmm2, %xmm0
      	addss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	pshufd	$16, %xmm3, %xmm2
      	unpcklps	%xmm2, %xmm1
      	ret
      
      instead of icky integer operations:
      
      _bar:                                   ## @bar
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	movd	%xmm0, %ecx
      	movl	4(%rax), %edx
      	movl	12(%rax), %esi
      	shlq	$32, %rdx
      	addq	%rcx, %rdx
      	movd	%rdx, %xmm0
      	addss	8(%rax), %xmm1
      	movd	%xmm1, %eax
      	shlq	$32, %rsi
      	addq	%rax, %rsi
      	movd	%rsi, %xmm1
      	ret
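
      The pattern being rewritten looks roughly like this at the IR
      level (a hand-reduced sketch with hypothetical function names,
      assuming little-endian element order): SRoA packs the two fields
      into an i64 with shl/or, and the new rule turns the bitcast of
      that i64 into direct element insertions:

      define <2 x float> @before(float %lo, float %hi) {
        %lobits = bitcast float %lo to i32
        %hibits = bitcast float %hi to i32
        %lo64 = zext i32 %lobits to i64
        %hi64 = zext i32 %hibits to i64
        %hish = shl i64 %hi64, 32
        %pack = or i64 %hish, %lo64
        %vec = bitcast i64 %pack to <2 x float>
        ret <2 x float> %vec
      }

      define <2 x float> @after(float %lo, float %hi) {
        %v0 = insertelement <2 x float> undef, float %lo, i32 0
        %v1 = insertelement <2 x float> %v0, float %hi, i32 1
        ret <2 x float> %v1
      }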
      
      This resolves rdar://8360454
      
      llvm-svn: 112343
    • Completely disable tail calls when fast-isel is enabled, as fast-isel · e06905d1
      Dan Gohman authored
      doesn't currently support dealing with this.
      
      llvm-svn: 112341
    • Trim a #include. · 1e06dbf8
      Dan Gohman authored
      llvm-svn: 112340
    • Fix an index calculation thinko. · fe22f1d3
      Dan Gohman authored
      llvm-svn: 112337
    • We don't need to custom-select VLDMQ and VSTMQ anymore. · 8ee93947
      Bob Wilson authored
      llvm-svn: 112336
    • Update CMake build. Add newline at end of file. · 83f9ff04
      Benjamin Kramer authored
      llvm-svn: 112332
    • When merging Thumb2 loads/stores, do not give up when the offset is one of · ca5af129
      Bob Wilson authored
      the special values that for ARM would be used with IB or DA modes.  Fall
      through and consider materializing a new base address if it would be
      profitable.
      
      llvm-svn: 112329
    • Add a prototype of a new peephole optimizing pass that uses LazyValue info to simplify PHIs and selects. · cf7f9411
      Owen Anderson authored
      This pass addresses the missed optimizations from PR2581 and PR4420.
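
      A hypothetical example of the kind of simplification LazyValue
      info enables: on a path guarded by a dominating branch, the known
      range of a value can decide a select that plain instcombine,
      which only looks at the instruction itself, cannot:

      define i32 @clamp(i32 %x) {
      entry:
        %gt = icmp sgt i32 %x, 10
        br i1 %gt, label %then, label %else
      then:
        ; here %x is known to be > 10, hence never negative,
        ; so %neg is false and the select folds to %x
        %neg = icmp slt i32 %x, 0
        %sel = select i1 %neg, i32 0, i32 %x
        ret i32 %sel
      else:
        ret i32 0
      }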
      
      llvm-svn: 112325
    • Improve the precision of getConstant(). · 38f6b7fe
      Owen Anderson authored
      llvm-svn: 112323
    • Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like · 13ce07fa
      Bob Wilson authored
      all the other LDM/STM instructions.  This fixes asm printer crashes when
      compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
      with -O0 to check this in the future.
      
      Prior to this change VLDM/VSTM used addressing mode #5, but not really.
      The offset field was used to hold a count of the number of registers being
      loaded or stored, and the AM5 opcode field was expanded to specify the IA
      or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
      was not aware of these special cases.  The crashes occurred when rewriting
      a frameindex caused the AM5 offset field to be changed so that it did not
      have a valid submode.  I don't know exactly what changed to expose this now.
      Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
      any reason to keep a count of the VLDM/VSTM registers, so we can use
      addressing mode #4 and clean things up in a lot of places.
      
      llvm-svn: 112322
    • Enhance the shift propagator to handle the case when you have: · 6c1395f6
      Chris Lattner authored
      A = shl x, 42
      ...
      B = lshr ..., 38
      
      which can be transformed into:
      A = shl x, 4
      ...
      
      iff we can prove that the would-be-shifted-in bits
      are already zero.  This eliminates two shifts in the testcase
      and allows elimination of the whole i128 chain in the real example.
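
      A concrete (hypothetical) instance: when %x comes from a zext its
      high bits are known zero, so the mismatched shl/lshr pair
      collapses to a single shift by the difference of the amounts:

      define i64 @before(i16 %v) {
        %x = zext i16 %v to i64        ; bits 16..63 of %x are zero
        %a = shl i64 %x, 42
        %b = lshr i64 %a, 38
        ret i64 %b
      }

      define i64 @after(i16 %v) {
        %x = zext i16 %v to i64
        %b = shl i64 %x, 4             ; 42 - 38 = 4
        ret i64 %b
      }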
      
      llvm-svn: 112314
    • Simplify. · f2855b14
      Devang Patel authored
      llvm-svn: 112305
    • Implement a pretty general logical shift propagation · 18d7fc8f
      Chris Lattner authored
      framework, which is good at ripping through bitfield
      operations.  This generalizes a bunch of the existing
      xforms that instcombine does, such as 
        (x << c) >> c -> and
      to handle intermediate logical nodes.  This is useful for
      ripping up the "promote to large integer" code produced by
      SRoA.
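
      For reference, the base case named above in isolation (a minimal
      sketch): with equal shift amounts the pair is exactly a mask of
      the low bits, and the framework applies the same reasoning even
      with intermediate logical nodes between the two shifts:

      define i32 @before(i32 %x) {
        %a = shl i32 %x, 8
        %b = lshr i32 %a, 8
        ret i32 %b
      }

      define i32 @after(i32 %x) {
        %b = and i32 %x, 16777215      ; 0x00FFFFFF: keep the low 24 bits
        ret i32 %b
      }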
      
      llvm-svn: 112304