  1. Aug 28, 2010
    • fixme accomplished · 112b6ee3
      Chris Lattner authored
      llvm-svn: 112386
    • fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c
      Chris Lattner authored
      insertp[sd] $0, which is a noop.  Before:
      
      _f32:                                   ## @f32
      	pshufd	$1, %xmm1, %xmm2
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm2, %xmm3
      	addss	%xmm1, %xmm0
                                              ## kill: XMM0<def> XMM0<kill> XMM0<def>
      	insertps	$0, %xmm0, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      after:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movdqa	%xmm2, %xmm0
      	insertps	$16, %xmm3, %xmm0
      	ret
      
      The extra movs are due to a random (poor) scheduling decision.
      
      llvm-svn: 112379
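      For reference, the _f32 routine above presumably comes from the same
      _Complex float test shown in the next entry, built with SSE4.1 enabled so
      that insertps is used for the element inserts:

      _Complex float f32(_Complex float A, _Complex float B) {
        return A+B;
      }

      With an immediate of $0, insertps copies element 0 of the source into
      element 0 of the destination with no zero mask, so insertps $0, %xmm0,
      %xmm0 leaves the register unchanged, which is why the "before" sequence
      wastes an instruction.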
    • fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
      Chris Lattner authored
      when the top elements of a vector are undefined.  This happens all
      the time for X86-64 ABI stuff because only the low 2 elements of
      a 4 element vector are defined.  For example, on:
      
      _Complex float f32(_Complex float A, _Complex float B) {
        return A+B;
      }
      
      We used to produce (with SSE2, SSE4.1+ uses insertps):
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm1
      	movdqa	%xmm2, %xmm0
      	unpcklps	%xmm1, %xmm0
      	ret
      
      We now produce:
      
      _f32:                                   ## @f32
      	movdqa	%xmm0, %xmm2
      	addss	%xmm1, %xmm2
      	pshufd	$1, %xmm1, %xmm1
      	pshufd	$1, %xmm0, %xmm3
      	addss	%xmm1, %xmm3
      	movaps	%xmm2, %xmm0
      	unpcklps	%xmm3, %xmm0
      	ret
      
      This implements rdar://8368414
      
      llvm-svn: 112378
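      The undefined top lanes come from the x86-64 ABI: a _Complex float is
      passed and returned packed in the low 64 bits of an XMM register, so only
      lanes 0 and 1 of the 4-element vector carry data. A rough C sketch (using
      the GCC/Clang vector extension, not from the commit) of a value where
      only the low two lanes matter:

      typedef float v4sf __attribute__((vector_size(16)));

      /* lanes 0 and 1 hold the complex value; in the ABI case the upper two
         lanes are undefined, so shuffling them around is pointless work */
      v4sf pack_complex(float re, float im) {
        v4sf v = {re, im, 0.0f, 0.0f};
        return v;
      }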
    • Update ocaml test. · 2e5c1471
      Benjamin Kramer authored
      llvm-svn: 112364
    • remove unions from LLVM IR. They are severely buggy and not · 13ee795c
      Chris Lattner authored
      being actively maintained, improved, or extended.
      
      llvm-svn: 112356
    • remove the ABCD and SSI passes. They don't have any clients that · 504e5100
      Chris Lattner authored
      I'm aware of, aren't maintained, and LVI will be replacing their value.
      nlewycky approved this on irc.
      
      llvm-svn: 112355
    • handle the constant case of vector insertion. For something · d0214f3e
      Chris Lattner authored
      like this:
      
      struct S { float A, B, C, D; };
      
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.B;
        A.A = 42;
        return A;
      }
      
      we now generate:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	pshufd	$16, %xmm2, %xmm2
      	movss	LCPI1_1(%rip), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	unpcklps	%xmm2, %xmm0
      	ret
      
      instead of:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	12(%rax), %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	unpcklps	%xmm0, %xmm1
      	addss	LCPI1_0(%rip), %xmm2
      	movd	%xmm2, %eax
      	shlq	$32, %rax
      	addq	$1109917696, %rax       ## imm = 0x42280000
      	movd	%rax, %xmm0
      	ret
      
      llvm-svn: 112345
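      A note on the constants: 0x42280000 (1109917696), which the old integer
      sequence adds into %rax, is just the IEEE-754 single-precision encoding
      of the 42.0 stored into A.A (sign 0, exponent 0x84 = 132, i.e. 2^5;
      mantissa 0x280000/0x800000 = 0.3125; 1.3125 × 32 = 42.0). The new code
      instead loads what is presumably the same 42.0 from the constant pool
      (LCPI1_1) and inserts it as a vector element, while LCPI1_0 supplies the
      1.0 used for ++A.B.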
    • optimize bitcasts from large integers to vector into vector · dd660104
      Chris Lattner authored
      element insertion from the pieces that feed into the vector.
      This handles a pattern that occurs frequently due to code
      generated for the x86-64 abi.  We now compile something like
      this:
      
      struct S { float A, B, C, D; };
      struct S g;
      struct S bar() { 
        struct S A = g;
        ++A.A;
        ++A.C;
        return A;
      }
      
      into all nice vector operations:
      
      _bar:                                   ## @bar
      ## BB#0:                                ## %entry
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	pshufd	$16, %xmm0, %xmm0
      	movss	4(%rax), %xmm2
      	movss	12(%rax), %xmm3
      	pshufd	$16, %xmm2, %xmm2
      	unpcklps	%xmm2, %xmm0
      	addss	8(%rax), %xmm1
      	pshufd	$16, %xmm1, %xmm1
      	pshufd	$16, %xmm3, %xmm2
      	unpcklps	%xmm2, %xmm1
      	ret
      
      instead of icky integer operations:
      
      _bar:                                   ## @bar
      	movq	_g@GOTPCREL(%rip), %rax
      	movss	LCPI1_0(%rip), %xmm1
      	movss	(%rax), %xmm0
      	addss	%xmm1, %xmm0
      	movd	%xmm0, %ecx
      	movl	4(%rax), %edx
      	movl	12(%rax), %esi
      	shlq	$32, %rdx
      	addq	%rcx, %rdx
      	movd	%rdx, %xmm0
      	addss	8(%rax), %xmm1
      	movd	%xmm1, %eax
      	shlq	$32, %rsi
      	addq	%rax, %rsi
      	movd	%rsi, %xmm1
      	ret
      
      This resolves rdar://8360454
      
      llvm-svn: 112343
    • Completely disable tail calls when fast-isel is enabled, as fast-isel · e06905d1
      Dan Gohman authored
      doesn't currently support dealing with this.
      
      llvm-svn: 112341
    • Add a prototype of a new peephole optimizing pass that uses LazyValue info to... · cf7f9411
      Owen Anderson authored
      Add a prototype of a new peephole optimizing pass that uses LazyValue info to simplify PHIs and selects.
      This pass addresses the missed optimizations from PR2581 and PR4420.
      
      llvm-svn: 112325
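      A hypothetical C example (not taken from PR2581 or PR4420) of the kind of
      missed optimization such a LazyValueInfo-driven peephole targets: once
      control reaches the taken branch, value-range information proves the
      inner condition, so the select collapses to its true arm.

      int clamp_small(int x) {
        if (x < 10) {
          /* here x is known to be less than 10, so (x < 100) is always true
             and the conditional expression simplifies to just x */
          return x < 100 ? x : 100;
        }
        return 10;
      }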
    • Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like · 13ce07fa
      Bob Wilson authored
      all the other LDM/STM instructions.  This fixes asm printer crashes when
      compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
      with -O0 to check this in the future.
      
      Prior to this change VLDM/VSTM used addressing mode #5, but not really.
      The offset field was used to hold a count of the number of registers being
      loaded or stored, and the AM5 opcode field was expanded to specify the IA
      or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
      was not aware of these special cases.  The crashes occurred when rewriting
      a frameindex caused the AM5 offset field to be changed so that it did not
      have a valid submode.  I don't know exactly what changed to expose this now.
      Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
      any reason to keep a count of the VLDM/VSTM registers, so we can use
      addressing mode #4 and clean things up in a lot of places.
      
      llvm-svn: 112322
    • tidy up test. · 954e9557
      Chris Lattner authored
      llvm-svn: 112321
    • no really, fix the test. · b8b7d526
      Chris Lattner authored
      llvm-svn: 112317
    • fix this test. It's not clear what it's really testing. · c8908b4c
      Chris Lattner authored
      llvm-svn: 112316
    • Enhance the shift propagator to handle the case when you have: · 6c1395f6
      Chris Lattner authored
      A = shl x, 42
      ...
      B = lshr ..., 38
      
      which can be transformed into:
      A = shl x, 4
      ...
      
      iff we can prove that the would-be-shifted-in bits
      are already zero.  This eliminates two shifts in the testcase
      and allows elimination of the whole i128 chain in the real example.
      
      llvm-svn: 112314
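      A rough C analogue (not the actual testcase) of the shl/lshr chain
      described above, of the sort the "promote to large integer" code from
      SRoA produces for __int128-sized aggregates; per the commit, the pair can
      collapse to a single shift left by 4 when the bits the lshr would shift
      back in are provably zero:

      unsigned __int128 f(unsigned __int128 x) {
        unsigned __int128 a = x << 42;
        /* ... intermediate logical operations on a ... */
        return a >> 38;
      }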
    • Implement a pretty general logical shift propagation · 18d7fc8f
      Chris Lattner authored
      framework, which is good at ripping through bitfield
      operations.  This generalizes a bunch of the existing
      xforms that instcombine does, such as 
        (x << c) >> c -> and
      to handle intermediate logical nodes.  This is useful for
      ripping up the "promote to large integer" code produced by
      SRoA.
      
      llvm-svn: 112304
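      For instance, the (x << c) >> c form mentioned above, sketched as a
      hypothetical C function (not from the commit):

      /* with unsigned (logical) shifts, (x << 8) >> 8 is equivalent to
         masking off the top byte, i.e. x & 0x00ffffff */
      unsigned low24(unsigned x) {
        return (x << 8) >> 8;
      }

      The new framework extends this so the same reasoning applies even when
      other logical operations sit between the two shifts.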
  2. Aug 26, 2010
    • optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x' · d4ebd6df
      Chris Lattner authored
      is a vector to be a vector element extraction.  This allows clang to
      compile:
      
      struct S { float A, B, C, D; };
      float foo(struct S A) { return A.A + A.B+A.C+A.D; }
      
      into:
      
      _foo:                                   ## @foo
      ## BB#0:                                ## %entry
      	movd	%xmm0, %rax
      	shrq	$32, %rax
      	movd	%eax, %xmm2
      	addss	%xmm0, %xmm2
      	movapd	%xmm1, %xmm3
      	addss	%xmm2, %xmm3
      	movd	%xmm1, %rax
      	shrq	$32, %rax
      	movd	%eax, %xmm0
      	addss	%xmm3, %xmm0
      	ret
      
      instead of:
      
      _foo:                                   ## @foo
      ## BB#0:                                ## %entry
      	movd	%xmm0, %rax
      	movd	%eax, %xmm0
      	shrq	$32, %rax
      	movd	%eax, %xmm2
      	addss	%xmm0, %xmm2
      	movd	%xmm1, %rax
      	movd	%eax, %xmm1
      	addss	%xmm2, %xmm1
      	shrq	$32, %rax
      	movd	%eax, %xmm0
      	addss	%xmm1, %xmm0
      	ret
      
      ... eliminating half of the horribleness.
      
      llvm-svn: 112227
    • filecheckize · 3c19d3d5
      Chris Lattner authored
      llvm-svn: 112225