  1. May 01, 2013
  2. Apr 30, 2013
    • Hal Finkel's avatar
      LocalStackSlotAllocation improvements · 7153251a
      Hal Finkel authored
      First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created.
      
      Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway.
      
      Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and an r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll
      
      Jim has okayed this off-list.
      
      llvm-svn: 180799
      7153251a
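The quadratic-search removal above relies on the base registers being created in increasing order of local frame offset, so only the most recently created one can possibly serve a new reference. A minimal sketch of that check (illustrative names and types, not the actual LocalStackSlotAllocation code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a virtual base register: the frame offset it
// anchors and the displacement range reachable from that offset.
struct BaseReg {
  int64_t Offset;
  int64_t Range;
};

// Return the index of a usable base register, or -1 if a new one must be
// created. Because registers are allocated in order of increasing offset,
// checking only the last element replaces a per-reference linear scan
// (quadratic overall) with a constant-time check.
int findUsableBaseReg(const std::vector<BaseReg> &Regs, int64_t RefOffset) {
  if (!Regs.empty()) {
    const BaseReg &Last = Regs.back();
    if (RefOffset >= Last.Offset && RefOffset < Last.Offset + Last.Range)
      return static_cast<int>(Regs.size()) - 1;
  }
  return -1; // caller creates a new base register
}
```

The sketch also shows why the ordering matters: an earlier base register could never cover an offset that the last one cannot, so nothing is lost by skipping the scan.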
    • Bill Wendling's avatar
      Emit the TLS initialization function pointers into the correct section. · fb7e32eb
      Bill Wendling authored
      The `llvm.tls_init_funcs' (created by the front-end) holds pointers to the TLS
      initialization functions. These need to be placed into the correct section so
      that they are run before `main()'.
      
      <rdar://problem/13733006>
      
      llvm-svn: 180737
      fb7e32eb
  3. Apr 27, 2013
    • Andrew Trick's avatar
      Generalize the MachineTraceMetrics public API. · 85058af6
      Andrew Trick authored
      Naturally, we should be able to pass in extra instructions, not just
      extra blocks.
      
      llvm-svn: 180667
      85058af6
    • Eric Christopher's avatar
      Use the target triple from the target machine rather than the module · 203e12bf
      Eric Christopher authored
      to determine whether or not we're on a darwin platform for debug code
      emitting.
      
      Solves the problem of a module with no triple on the command line and
      no triple in the module using non-gdb-ok features on darwin. Fix
      up the member-pointers test to check the correct things
      cross-platform (DW_FORM_flag is a good prefix).
      
      Unfortunately no testcase, because I have no idea how to test something
      without a triple on the command line and without a triple in the module
      yet check precisely on two platforms. Ideas welcome.
      
      llvm-svn: 180660
      203e12bf
  4. Apr 26, 2013
  5. Apr 25, 2013
  6. Apr 24, 2013
    • Andrew Trick's avatar
      MI Sched: eliminate local vreg copies. · 85a1d4cb
      Andrew Trick authored
      For now, we just reschedule instructions that use the copied vregs and
      let regalloc eliminate it. I would really like to eliminate the
      copies on-the-fly during scheduling, but we need a complete
      implementation of repairIntervalsInRange() first.
      
      The general strategy is for the register coalescer to eliminate as
      many global copies as possible and shrink live ranges to be
      extended-basic-block local. The coalescer should not have to worry
      about resolving local copies (e.g. it shouldn't attempt to reorder
      instructions). The scheduler is a much better place to deal with local
      interference. The coalescer side of this equation needs work.
      
      llvm-svn: 180193
      85a1d4cb
    • Andrew Trick's avatar
      Register Coalescing: add a flag to disable rescheduling. · 608a698c
      Andrew Trick authored
      When MachineScheduler is enabled, this functionality can be
      removed. Until then, provide a way to disable it for test cases and
      designing MachineScheduler heuristics.
      
      llvm-svn: 180192
      608a698c
    • Andrew Trick's avatar
      MI Sched: regpressure tracing. · 7c791a3d
      Andrew Trick authored
      llvm-svn: 180191
      7c791a3d
    • Eric Christopher's avatar
      Formatting. · 4eb5eb5b
      Eric Christopher authored
      llvm-svn: 180186
      4eb5eb5b
  7. Apr 23, 2013
    • Owen Anderson's avatar
      DAGCombine should not aggressively fold SEXT(VSETCC(...)) into a wider VSETCC... · 2d4cca35
      Owen Anderson authored
      DAGCombine should not aggressively fold SEXT(VSETCC(...)) into a wider VSETCC without first checking the target's vector boolean contents.
      This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents.  The included change
      fixes the PowerPC tests, and was OK'd by Hal.
      
      llvm-svn: 180129
      2d4cca35
    • Stephen Lin's avatar
      Add some constraints to use of 'returned': · 6c70dc78
      Stephen Lin authored
      1) Disallow 'returned' on a parameter that is also 'sret' (no sensible semantics, as far as I can tell).
      2) Conservatively disallow tail calls through 'returned' parameters that also are 'zext' or 'sext' (for consistency with treatment of other zero-extending and sign-extending operations in tail call position detection...can be revised later to handle situations that can be determined to be safe).
      
      This is a new attribute that is not yet used, so there is no impact.
      
      llvm-svn: 180118
      6c70dc78
    • Matt Arsenault's avatar
      Remove unused DwarfSectionOffsetDirective string · 034ca0fe
      Matt Arsenault authored
      The value isn't actually used, and setting it emits a COFF specific
      directive.
      
      llvm-svn: 180064
      034ca0fe
    • Eric Christopher's avatar
      Move C++ code out of the C headers and into either C++ headers · 04d4e931
      Eric Christopher authored
      or the C++ files themselves. This enables people to use
      just a C compiler to interoperate with LLVM.
      
      llvm-svn: 180063
      04d4e931
  8. Apr 22, 2013
    • Eli Bendersky's avatar
      Optimize MachineBasicBlock::getSymbol by caching the symbol. Since the symbol · 58b04b7e
      Eli Bendersky authored
      name computation is expensive, this helps save about 25% of the time spent in
      this function.
      
      llvm-svn: 180049
      58b04b7e
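The caching described above is the standard lazy-initialization pattern: compute the expensive symbol once on first use and hand back the cached result thereafter. A minimal sketch under assumed names (not the actual MachineBasicBlock code):

```cpp
#include <string>

// Illustrative stand-in for a basic block whose symbol name is expensive
// to compute. The cached pointer starts null; getSymbol() fills it on the
// first call and every later call returns the same cached object.
class Block {
  int Number;
  mutable std::string Storage;                 // owns the computed name
  mutable const std::string *CachedSym = nullptr;

public:
  explicit Block(int N) : Number(N) {}

  const std::string &getSymbol() const {
    if (!CachedSym) {
      // Stand-in for the expensive name computation.
      Storage = "BB" + std::to_string(Number);
      CachedSym = &Storage;
    }
    return *CachedSym;
  }
};
```

This trades one pointer of storage per block for skipping the recomputation on every call, which is where the quoted ~25% saving comes from.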
    • Rafael Espindola's avatar
      Clarify that llvm.used can contain aliases. · 74f2e46e
      Rafael Espindola authored
      Also add a check for llvm.used in the verifier and simplify clients now that
      they can assume they have a ConstantArray.
      
      llvm-svn: 180019
      74f2e46e
    • Eric Christopher's avatar
      Tidy. · 44c6aa67
      Eric Christopher authored
      llvm-svn: 180000
      44c6aa67
    • Eric Christopher's avatar
      Update comment. Whitespace. · 25e3509c
      Eric Christopher authored
      llvm-svn: 179999
      25e3509c
    • David Blaikie's avatar
      Revert "Revert "PR14606: debug info imported_module support"" · f55abeaf
      David Blaikie authored
      This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
      
      I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
      though the debug info was clearly invalid on all of them, but this ought to fix
      it.
      
      llvm-svn: 179996
      f55abeaf
    • Jim Grosbach's avatar
      Legalize vector truncates by parts rather than just splitting. · 563983c8
      Jim Grosbach authored
      Rather than just splitting the input type and hoping for the best, apply
      a bit more cleverness. Just splitting the types until the source is
      legal often leads to an illegal result type, which is then widened and a
      scalarization step is introduced which leads to truly horrible code
      generation. With the loop vectorizer, these sorts of operations are much
      more common, and so it's worth extra effort to do them well.
      
      Add a legalization hook for the operands of a TRUNCATE node, which will
      be encountered after the result type has been legalized, but if the
      operand type is still illegal. If simple splitting of both types
      ends up with the result type of each half still being legal, just
      do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
      result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
      we can get more clever with power-two vectors. Specifically,
      split the input type, but also widen the result element size, then
      concatenate the halves and truncate again. For example on ARM, to
      perform a "%res = v8i8 trunc v8i32 %in" we transform to:
        %inlo = v4i32 extract_subvector %in, 0
        %inhi = v4i32 extract_subvector %in, 4
        %lo16 = v4i16 trunc v4i32 %inlo
        %hi16 = v4i16 trunc v4i32 %inhi
        %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
        %res = v8i8 trunc v8i16 %in16
      
      This allows instruction selection to generate three VMOVN instructions
      instead of a sequence of moves, stores and loads.
      
      Update the ARMTargetTransformInfo to take this improved legalization
      into account.
      
      Consider the simplified IR:
      
      define <16 x i8> @test1(<16 x i32>* %ap) {
        %a = load <16 x i32>* %ap
        %tmp = trunc <16 x i32> %a to <16 x i8>
        ret <16 x i8> %tmp
      }
      
      define <8 x i8> @test2(<8 x i32>* %ap) {
        %a = load <8 x i32>* %ap
        %tmp = trunc <8 x i32> %a to <8 x i8>
        ret <8 x i8> %tmp
      }
      
      Previously, we would generate the truly hideous:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #20
      	bic	sp, sp, #7
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d24, d25}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	vld1.64	{d18, d19}, [r2:128]
      	add	r1, r0, #16
      	vmovn.i32	d22, q8
      	vld1.64	{d16, d17}, [r1:128]
      	vmovn.i32	d20, q9
      	vmovn.i32	d18, q12
      	vmov.u16	r0, d22[3]
      	strb	r0, [sp, #15]
      	vmov.u16	r0, d22[2]
      	strb	r0, [sp, #14]
      	vmov.u16	r0, d22[1]
      	strb	r0, [sp, #13]
      	vmov.u16	r0, d22[0]
      	vmovn.i32	d16, q8
      	strb	r0, [sp, #12]
      	vmov.u16	r0, d20[3]
      	strb	r0, [sp, #11]
      	vmov.u16	r0, d20[2]
      	strb	r0, [sp, #10]
      	vmov.u16	r0, d20[1]
      	strb	r0, [sp, #9]
      	vmov.u16	r0, d20[0]
      	strb	r0, [sp, #8]
      	vmov.u16	r0, d18[3]
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	vldmia	sp, {d16, d17}
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	push	{r7}
      	mov	r7, sp
      	sub	sp, sp, #12
      	bic	sp, sp, #7
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d20, d21}, [r0:128]
      	vmovn.i32	d18, q8
      	vmov.u16	r0, d18[3]
      	vmovn.i32	d16, q10
      	strb	r0, [sp, #3]
      	vmov.u16	r0, d18[2]
      	strb	r0, [sp, #2]
      	vmov.u16	r0, d18[1]
      	strb	r0, [sp, #1]
      	vmov.u16	r0, d18[0]
      	strb	r0, [sp]
      	vmov.u16	r0, d16[3]
      	strb	r0, [sp, #7]
      	vmov.u16	r0, d16[2]
      	strb	r0, [sp, #6]
      	vmov.u16	r0, d16[1]
      	strb	r0, [sp, #5]
      	vmov.u16	r0, d16[0]
      	strb	r0, [sp, #4]
      	ldm	sp, {r0, r1}
      	mov	sp, r7
      	pop	{r7}
      	bx	lr
      
      Now, however, we generate the much more straightforward:
      	.syntax unified
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_test1
      	.align	2
      _test1:                                 @ @test1
      @ BB#0:
      	add	r1, r0, #48
      	add	r2, r0, #32
      	vld1.64	{d20, d21}, [r0:128]
      	vld1.64	{d16, d17}, [r1:128]
      	add	r1, r0, #16
      	vld1.64	{d18, d19}, [r2:128]
      	vld1.64	{d22, d23}, [r1:128]
      	vmovn.i32	d17, q8
      	vmovn.i32	d16, q9
      	vmovn.i32	d18, q10
      	vmovn.i32	d19, q11
      	vmovn.i16	d17, q8
      	vmovn.i16	d16, q9
      	vmov	r0, r1, d16
      	vmov	r2, r3, d17
      	bx	lr
      
      	.globl	_test2
      	.align	2
      _test2:                                 @ @test2
      @ BB#0:
      	vld1.64	{d16, d17}, [r0:128]
      	add	r0, r0, #16
      	vld1.64	{d18, d19}, [r0:128]
      	vmovn.i32	d16, q8
      	vmovn.i32	d17, q9
      	vmovn.i16	d16, q8
      	vmov	r0, r1, d16
      	bx	lr
      
      llvm-svn: 179989
      563983c8
  9. Apr 21, 2013
  10. Apr 20, 2013
  11. Apr 19, 2013