  1. Apr 22, 2013
    • Eric Christopher's avatar
      Update comment. Whitespace. · 25e3509c
      Eric Christopher authored
      llvm-svn: 179999
      25e3509c
    • David Blaikie's avatar
      Revert "Revert "PR14606: debug info imported_module support"" · f55abeaf
      David Blaikie authored
      This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
      
      I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
      though the debug info was clearly invalid on all of them, but this ought to fix
      it.
      
      llvm-svn: 179996
      f55abeaf
    • Craig Topper's avatar
      Convert windows line endings to linux/unix line endings. · 7af39d7d
      Craig Topper authored
      llvm-svn: 179995
      7af39d7d
    • Craig Topper's avatar
      Fix indentation. No functional change. · 2172ad64
      Craig Topper authored
      llvm-svn: 179994
      2172ad64
    • Craig Topper's avatar
      Remove an unreachable 'break' following a 'return'. · b5ba3d3b
      Craig Topper authored
      llvm-svn: 179991
      b5ba3d3b
    • Jim Grosbach's avatar
      Legalize vector truncates by parts rather than just splitting. · 563983c8
      Jim Grosbach authored
      Rather than just splitting the input type and hoping for the best, apply
      a bit more cleverness. Just splitting the types until the source is
      legal often leads to an illegal result type, which is then widened,
      introducing a scalarization step that leads to truly horrible code
      generation. With the loop vectorizer, these sorts of operations are much
      more common, and so it's worth the extra effort to do them well.
      
      Add a legalization hook for the operands of a TRUNCATE node, which will
      be encountered after the result type has been legalized but while the
      operand type is still illegal. If simple splitting of both types
      ends up with the result type of each half still being legal, just
      do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
      result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
      we can get more clever with power-of-two vectors. Specifically,
      split the input type, but also widen the result element size, then
      concatenate the halves and truncate again. For example, on ARM,
      to perform a "%res = v8i8 trunc v8i32 %in" we transform to:
        %inlo = v4i32 extract_subvector %in, 0
        %inhi = v4i32 extract_subvector %in, 4
        %lo16 = v4i16 trunc v4i32 %inlo
        %hi16 = v4i16 trunc v4i32 %inhi
        %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
        %res = v8i8 trunc v8i16 %in16
      
      This allows instruction selection to generate three VMOVN instructions
      instead of a sequence of moves, stores, and loads.
      
      Update the ARMTargetTransformInfo to take this improved legalization
      into account.
      
      Consider the simplified IR:
      
      define <16 x i8> @test1(<16 x i32>* %ap) {
        %a = load <16 x i32>* %ap
        %tmp = trunc <16 x i32> %a to <16 x i8>
        ret <16 x i8> %tmp
      }
      
      define <8 x i8> @test2(<8 x i32>* %ap) {
        %a = load <8 x i32>* %ap
        %tmp = trunc <8 x i32> %a to <8 x i8>
        ret <8 x i8> %tmp
      }
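
      For test1, the <16 x i32> to <16 x i8> truncate goes through the same
      hook one level up. A rough, hand-written sketch in the notation used
      above (illustrative only, not compiler output) would be:
        %inlo = v8i32 extract_subvector %a, 0
        %inhi = v8i32 extract_subvector %a, 8
        %lo16 = v8i16 trunc v8i32 %inlo
        %hi16 = v8i16 trunc v8i32 %inhi
        %in16 = v16i16 concat_vectors v8i16 %lo16, v8i16 %hi16
        %tmp = v16i8 trunc v16i16 %in16
      where the two v8i32 -> v8i16 truncates and the final v16i16 -> v16i8
      truncate are each handled by simple splitting, which is why the improved
      test1 output below needs only four vmovn.i32 and two vmovn.i16
      instructions.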
      
      Previously, we would generate the truly hideous:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #20
      	bic	sp, sp, #7
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d24, d25}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	vld1.64	{d18, d19}, [r2:128]
      	add	r1, r0, #16
      	vmovn.i32	d22, q8
      	vld1.64	{d16, d17}, [r1:128]
      	vmovn.i32	d20, q9
      	vmovn.i32	d18, q12
      	vmov.u16	r0, d22[3]
      	strb	r0, [sp, #15]
      	vmov.u16	r0, d22[2]
      	strb	r0, [sp, #14]
      	vmov.u16	r0, d22[1]
      	strb	r0, [sp, #13]
      	vmov.u16	r0, d22[0]
      	vmovn.i32	d16, q8
      	strb	r0, [sp, #12]
      	vmov.u16	r0, d20[3]
      	strb	r0, [sp, #11]
      	vmov.u16	r0, d20[2]
      	strb	r0, [sp, #10]
      	vmov.u16	r0, d20[1]
      	strb	r0, [sp, #9]
      	vmov.u16	r0, d20[0]
      	strb	r0, [sp, #8]
      	vmov.u16	r0, d18[3]
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	vldmia	sp, {d16, d17}
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #12
      	bic	sp, sp, #7
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d20, d21}, [r0:128]
      	vmovn.i32	d18, q8
      	vmov.u16	r0, d18[3]
      	vmovn.i32	d16, q10
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	ldm	sp, {r0, r1}
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      Now, however, we generate the much more straightforward:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d20, d21}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	add	r1, r0, #16
      	vld1.64	{d18, d19}, [r2:128]
      	vld1.64	{d22, d23}, [r1:128]
      	vmovn.i32	d17, q8
      	vmovn.i32	d16, q9
      	vmovn.i32	d18, q10
      	vmovn.i32	d19, q11
      	vmovn.i16	d17, q8
      	vmovn.i16	d16, q9
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d18, d19}, [r0:128]
      	vmovn.i32	d16, q8
      	vmovn.i32	d17, q9
      	vmovn.i16	d16, q8
      	vmov	r0, r1, d16
      	bx	lr
      
      llvm-svn: 179989
      563983c8
    • Jim Grosbach's avatar
      ARM: Split out cost model vcvt testcases. · fb08e55c
      Jim Grosbach authored
      They had a separate RUN line already, so may as well be in a separate file.
      
      llvm-svn: 179988
      fb08e55c
  2. Apr 21, 2013
  3. Apr 20, 2013
    • Arnold Schwaighofer's avatar
      SimplifyCFG: If convert single conditional stores · 3546ccf4
      Arnold Schwaighofer authored
      This transformation rewrites a conditional store with a preceding
      unconditional store to the same location:
      
       a[i] = X
       may-alias with a[i] load
       if (cond)
         a[i] = Y
      
      into an unconditional store.
      
       a[i] = X
       may-alias with a[i] load
       tmp = cond ? Y : X;
       a[i] = tmp
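
      At the IR level the rewrite amounts to sinking the conditional store and
      materializing the stored value with a select. A minimal hand-written
      sketch (not taken from the patch or its tests; value names are
      illustrative):

      entry:
        store i32 %X, i32* %p
        ; (a load that may alias %p can sit here)
        br i1 %cond, label %then, label %cont
      then:
        store i32 %Y, i32* %p
        br label %cont
      cont:
        ...

      becomes:

      entry:
        store i32 %X, i32* %p
        ; (a load that may alias %p can sit here)
        %tmp = select i1 %cond, i32 %Y, i32 %X
        store i32 %tmp, i32* %p
        ...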
      
      We assume that on average the cost of a mispredicted branch is going to be
      higher than the cost of a second store to the same location, and that the
      secondary benefits of creating a bigger basic block for other optimizations to
      work on outweigh the potential case where the branch would be correctly predicted
      and the cost of executing the second store would be noticeably reflected in
      performance.
      
      hmmer's execution time improves by 30% on an iMac12,2 on ref data sets. With
      this change we are on par with gcc's performance (gcc also performs this
      transformation). There was a 1.2% performance improvement on an ARM Swift chip.
      Other tests in the test-suite+external seem to be mostly unaffected in my
      experiments:
      This optimization was triggered on 41 tests such that the executable was
      different before/after the patch. Only 1 of these tests (dealII) was
      reproducibly below 100% (by about 0.4%). Given that hmmer benefits so much I
      believe this to be a fair trade-off.
      
      I am going to watch the performance numbers across the buildbots and will revert
      this if anything unexpected comes up.
      
      llvm-svn: 179957
      3546ccf4
    • Tim Northover's avatar
      ARM: don't add FrameIndex offset for LDMIA (has no immediate) · d9d4211f
      Tim Northover authored
      Previously, when spilling 64-bit paired registers, an LDMIA with both
      a FrameIndex and an offset was produced. This kind of instruction
      shouldn't exist, and the extra operand was being confused with the
      predicate, causing aborts later on.
      
      This removes the invalid 0-offset from the instruction being
      produced.
      
      llvm-svn: 179956
      d9d4211f
    • Nuno Lopes's avatar
      recommit tests · 36e82760
      Nuno Lopes authored
      llvm-svn: 179955
      36e82760
    • Stephen Lin's avatar
      Minor renaming of tests (for consistency with an in-development patch) · 8fccb8a7
      Stephen Lin authored
      llvm-svn: 179954
      8fccb8a7
    • Tim Northover's avatar
      AArch64: remove useless comment · 56862bd6
      Tim Northover authored
      llvm-svn: 179952
      56862bd6
    • Stephen Lin's avatar
      Move 'kw_align' case to proper section, reorganize function attribute keyword... · 7577ed57
      Stephen Lin authored
      Move 'kw_align' case to proper section, reorganize function attribute keyword case statements to be consistent with r179119
      
      llvm-svn: 179948
      7577ed57
    • Tim Northover's avatar
      Remove unused ShouldFoldAtomicFences flag. · 16aba170
      Tim Northover authored
      I think it's almost impossible to fold atomic fences profitably under
      LLVM/C++11 semantics. As a result, this is now unused and just
      cluttering up the target interface.
      
      llvm-svn: 179940
      16aba170
    • Tim Northover's avatar
      a2b53390
    • Rafael Espindola's avatar
      Remove dead code. · b1a19a8a
      Rafael Espindola authored
      This is part of a future patch to use yamlio that incorrectly ended up in a
      cleanup patch.
      
      Thanks to Benjamin Kramer for reporting it.
      
      llvm-svn: 179938
      b1a19a8a