  1. Aug 20, 2014
    • [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. · 03e43f8e
      Quentin Colombet authored
      
      This is a follow-up to r215394 and r215404, which respectively introduce the
      isRegSequence property and use it for ARM.
      
      Thanks to the property introduced by the previous commits, this patch is able
      to optimize the following sequence:
      vmov	d0, r2, r3
      vmov	d1, r0, r1
      vmov	r0, s0
      vmov	r1, s2
      udiv	r0, r1, r0
      vmov	r1, s1
      vmov	r2, s3
      udiv	r1, r2, r1
      vmov.32	d16[0], r0
      vmov.32	d16[1], r1
      vmov	r0, r1, d16
      bx	lr
      
      into:
      udiv	r0, r0, r2
      udiv	r1, r1, r3
      vmov.32	d16[0], r0
      vmov.32	d16[1], r1
      vmov	r0, r1, d16
      bx	lr
      
      This patch refactors how the copy optimizations are done in the peephole
      optimizer. Prior to this patch, we had one copy-related optimization that
      replaced a copy or bitcast with a generic copy better suited to the
      register file.
      
      With this patch, the peephole optimizer features two copy-related optimizations:
      1. One for rewriting generic copies to generic copies:
      PeepholeOptimizer::optimizeCoalescableCopy.
      2. One for replacing non-generic copies with generic copies:
      PeepholeOptimizer::optimizeUncoalescableCopy.
      
      The goals of these two optimizations are slightly different: one rewrites the
      operands of the instruction (#1), while the other kills off the non-generic
      instruction and replaces it with a (sequence of) generic instruction(s).
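      
      In rough C++ terms the split looks like this (a sketch only; the two callee
      names are from this patch, the dispatching wrapper and its signature are
      assumptions):
      
        // Hypothetical dispatcher; the real code lives in
        // lib/CodeGen/PeepholeOptimizer.cpp.
        bool PeepholeOptimizer::optimizeCopyLikeInstr(MachineInstr *MI) {
          if (MI->isCopy())
            // #1: rewrite a generic copy into a better generic copy.
            return optimizeCoalescableCopy(MI);
          // #2: kill a non-generic copy-like instruction (e.g., REG_SEQUENCE,
          // INSERT_SUBREG) and rebuild its value with plain COPYs.
          return optimizeUncoalescableCopy(MI);
        }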
      
      Both optimizations rely on the ValueTracker introduced in r212100.
      
      The ValueTracker has been refactored to use the information from the
      TargetInstrInfo for non-generic instructions. As part of the refactoring, we
      switched the tracking from the index of the definition to the actual register
      (virtual or physical). This change provides better consistency with
      register-related APIs and eases the use of the TargetInstrInfo.
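      
      To illustrate the switch, a hedged sketch of what the tracker now records
      (the field names are assumptions, not the actual class layout):
      
        // Before: the tracked value was identified by a definition index.
        // After: it is identified by the register itself.
        struct TrackedValue {
          unsigned Reg;    // virtual or physical register holding the value
          unsigned SubReg; // sub-register index, or 0 if none
        };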
      
      Moreover, this patch introduces a new helper class CopyRewriter used to ease the
      rewriting of generic copies (i.e., #1).
      
      Finally, this patch adds a dead code elimination pass right after the peephole
      optimizer to get rid of dead code that may appear after rewriting.
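      
      The cleanup amounts to a standard dead-machine-instruction sweep; a rough
      sketch under an assumed helper name:
      
        for (MachineBasicBlock &MBB : MF)
          for (MachineInstr &MI : llvm::make_early_inc_range(MBB))
            if (isDeadInstr(MI, MRI)) // hypothetical: defs virtual and unused
              MI.eraseFromParent();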
      
      This is related to <rdar://problem/12702965>.
      
      Review: http://reviews.llvm.org/D4874
      llvm-svn: 216088
    • ARM: Fix codegen for rbit intrinsic · c655f0c8
      Yi Kong authored
      LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic.
      According to the ARM ARM, rbit only takes a register argument, not an
      immediate. The correct form of the instruction is rbit <Rd>, <Rm>.
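      
      For instance (using clang's __builtin_arm_rbit, which lowers to this
      intrinsic; the constant case is the one that used to miscompile):
      
        #include <cstdint>
      
        uint32_t reverse_bits(uint32_t x) {
          return __builtin_arm_rbit(x);   // rbit r0, r0
        }
      
        uint32_t reverse_const() {
          // The constant must be materialized into a register first;
          // "rbit r0, #352" does not exist.
          return __builtin_arm_rbit(352);
        }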
      
      The bug was originally introduced in r211057.
      
      Differential Revision: http://reviews.llvm.org/D4980
      
      llvm-svn: 216064
  2. Aug 19, 2014
    • Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588). · 4bf6c01c
      Juergen Ributzka authored
      Note: This was originally reverted to track down a buildbot error. This commit
      exposed a latent bug that was fixed in r215753. Therefore it is reapplied
      without any modifications.
      
      I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
      regressions.
      
      Original commit message:
      This changes the order in which FastISel tries to materialize a constant.
      Originally it would try to use a simple target-independent approach, which
      can lead to the generation of inefficient code.
      
      On X86 this would result in the use of movabsq to materialize any 64-bit
      integer constant - even for simple and small values such as 0 and 1. Some
      very funny floating-point materialization could be observed too.
      
      On AArch64 it would materialize the constant 0 in a register even though the
      architecture has an actual "zero" register.
      
      On ARM it would generate unnecessary mov instructions or not use mvn.
      
      This change simply reverses the order and always asks the target first whether
      it wants to materialize the constant. This doesn't fix all the issues
      mentioned above, but it enables the targets to implement such
      optimizations.
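      
      A minimal sketch of the new ordering (the hook name matches the FastISel
      API of the era; the generic fallback helper here is hypothetical):
      
        unsigned FastISel::materializeConstant(const Constant *C, MVT VT) {
          // Ask the target first; it can use XOR, the zero register, MVN, ...
          if (unsigned Reg = TargetMaterializeConstant(C))
            return Reg;
          // Only then fall back to the generic, target-independent path.
          return materializeGenericConstant(C, VT); // hypothetical fallback
        }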
      
      Related to <rdar://problem/17420988>.
      
      llvm-svn: 216006
  3. Aug 18, 2014
    • [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage · 12993dd9
      Oliver Stannard authored
      Externally-defined functions with weak linkage should not be
      tail-called on ARM or AArch64, as the AAELF spec requires normal calls
      to undefined weak functions to be replaced with a NOP or jump to the
      next instruction. The behaviour of branch instructions in this
      situation (as used for tail calls) is implementation-defined, so we
      cannot rely on the linker replacing the tail call with a return.
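      
      The check has roughly the following shape (a sketch; the surrounding
      context in the tail-call eligibility code is assumed):
      
        if (const auto *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
          const GlobalValue *GV = G->getGlobal();
          // An ordinary call to an undefined weak function can be replaced by
          // the linker with a NOP, but a tail-call branch has no such
          // guarantee.
          if (GV->hasExternalWeakLinkage())
            return false; // not eligible for a tail call
        }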
      
      llvm-svn: 215890
    • ARM: improve RTABI 4.2 conformance on Linux · 017bd57f
      Saleem Abdulrasool authored
      The set of functions defined in the RTABI was separated for no real reason.
      This brings us closer to proper utilisation of the functions defined by the
      RTABI.  It also lays the groundwork for correctly emitting function calls to
      AEABI functions on all AEABI-conforming platforms.
      
      The previously existing lie on the behaviour of __ldivmod and __uldivmod is
      propagated as it is beyond the scope of the change.
      
      The changes to the test are due to the fact that we now use the divmod
      functions, which return both the quotient and remainder, so we no longer need
      to invoke two functions on Linux (making the behaviour closer to EABI's).
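      
      As an illustrative example (assuming a 32-bit ARM Linux target), code that
      needs both results now compiles to a single __aeabi_ldivmod call rather
      than separate __divdi3 and __moddi3 calls:
      
        long long quot_rem(long long n, long long d, long long *rem) {
          *rem = n % d; // remainder and ...
          return n / d; // ... quotient now share one runtime call
        }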
      
      llvm-svn: 215862
  4. Aug 14, 2014
    • Revert several FastISel commits to track down a buildbot error. · 790bacf2
      Juergen Ributzka authored
      This reverts:
      r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
      r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
      r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
      r215591 "[FastISel][AArch64] Make use of the zero register when possible."
      r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
      r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
      
      llvm-svn: 215673
    • optimize vector fneg of bitcasted integer value · 35d31336
      Sanjay Patel authored
      This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
      
      This patch is very similar to a fabs patch committed at r214892.
      
      Differential Revision: http://reviews.llvm.org/D4852
      
      llvm-svn: 215646
    • [FastISel] Let the target decide first if it wants to materialize a constant. · 7cee768e
      Juergen Ributzka authored
      This changes the order in which FastISel tries to materialize a constant.
      Originally it would try to use a simple target-independent approach, which
      can lead to the generation of inefficient code.
      
      On X86 this would result in the use of movabsq to materialize any 64-bit
      integer constant - even for simple and small values such as 0 and 1. Some
      very funny floating-point materialization could be observed too.
      
      On AArch64 it would materialize the constant 0 in a register even though the
      architecture has an actual "zero" register.
      
      On ARM it would generate unnecessary mov instructions or not use mvn.
      
      This change simply reverses the order and always asks the target first whether
      it wants to materialize the constant. This doesn't fix all the issues
      mentioned above, but it enables the targets to implement such
      optimizations.
      
      Related to <rdar://problem/17420988>.
      
      llvm-svn: 215588
  5. Aug 11, 2014
    • ARM: try harder to detect non-IT eligible instructions · 27c78bf1
      Saleem Abdulrasool authored
      For many Thumb-1 register-register instructions, setting the CPSR is not
      permitted inside an IT block.  We would not correctly flag those instructions.
      The previous change to identify this scenario was insufficient as it did not
      actually catch all the instances.  The current list is formed by manual
      inspection of the ARMv6M ARM.
      
      The change to the Thumb2 IT block test is due to the fact that the new more
      stringent checking of the MIs results in the If Conversion pass being prevented
      from executing (since not all the instructions in the BB are predicable).  This
      results in code gen changes.
      
      Thanks to Tim Northover for pointing out that the previous patch was
      insufficient and hinting that the v6M ARM would be much easier to use than
      the v7 or v8!
      
      llvm-svn: 215382
    • Correct a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp. · 1f80cde8
      Sanjay Patel authored
      
      The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed.
      
      Differential Revision: http://reviews.llvm.org/D4846
      
      llvm-svn: 215377
    • ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention · 11790b2d
      Oliver Stannard authored
      By default, LLVM uses the "C" calling convention for all runtime
      library functions. The half-precision FP conversion functions use the
      soft-float calling convention, and are needed for some targets which
      use the hard-float convention by default, so must have their calling
      convention explicitly set.
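      
      A sketch of how this is expressed (these TargetLowering hooks are real;
      their exact placement in ARMISelLowering is assumed):
      
        setLibcallName(RTLIB::FPROUND_F32_F16, "__gnu_f2h_ieee");
        setLibcallName(RTLIB::FPEXT_F16_F32, "__gnu_h2f_ieee");
        // Force the soft-float AAPCS convention even on hard-float targets.
        setLibcallCallingConv(RTLIB::FPROUND_F32_F16, CallingConv::ARM_AAPCS);
        setLibcallCallingConv(RTLIB::FPEXT_F16_F32, CallingConv::ARM_AAPCS);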
      
      llvm-svn: 215348
    • ARM: correct isPredicable for MULS in Thumb mode · ed8885b4
      Saleem Abdulrasool authored
      The ARM ARM states that CPSR may not be updated by a MUL in Thumb mode.  Due to
      the ordering of Thumb2 Size Reduction and If Conversion, we would end up
      generating a Thumb MULS inside an IT block.
      
      The If Conversion pass uses the TTI isPredicable method to ensure that it can
      transform a Basic Block.  However, because we only check for IT handling on
      Thumb2 functions, we may miss some cases.  Even then, it only validates that
      the CPSR is not *live* rather than that it is not accessed.  This corrects the
      handling for that particular case, since the same restriction does not hold on
      the vast majority of the instructions.
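      
      A sketch of the shape of the fix (the exact opcode test and the elided
      surrounding checks are assumptions):
      
        bool ARMBaseInstrInfo::isPredicable(MachineInstr *MI) const {
          // Thumb MULS defines CPSR, and the ARM ARM forbids updating CPSR
          // from a MUL inside an IT block, so never let If Conversion
          // predicate it.
          if (MI->getOpcode() == ARM::tMUL)
            return false;
          // ... remaining generic predicability checks ...
          return true;
        }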
      
      This does prevent the IfConversion optimization from kicking in for certain
      cases, but generating correct code is more valuable.  Addresses PR20555.
      
      llvm-svn: 215328
  6. Aug 06, 2014
    • ARM: do not generate BLX instructions on Cortex-M CPUs. · 2a417b96
      Tim Northover authored
      Particularly on MachO, we were generating "blx _dest" instructions on M-class
      CPUs, which don't actually exist. They happen to get fixed up by the linker
      into valid "bl _dest" instructions (which is why such a massive issue has
      remained largely undetected), but we shouldn't rely on that.
      
      llvm-svn: 214959
    • ARM-MachO: materialize callee address correctly on v4t. · d4d294dd
      Tim Northover authored
      llvm-svn: 214958
    • DebugInfo: Assert that any CU for which debug_loc lists are emitted has at least one range. · fb0412f0
      David Blaikie authored
      This came up with weird debug info that had variables (and hence
      debug_locs) but was in GMLT mode (because it was missing the 13th field
      of the compile_unit metadata), so no ranges were constructed. We should
      always have at least one range for any CU with a debug_loc in it,
      because the range should cover the debug_loc.
      
      The assertion just ensures that the "!= 1" range case inside the
      subsequent loop doesn't get entered when there are no ranges at all,
      which should never happen in the first place.
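      
      Something like the following (variable names are assumptions):
      
        assert(!TheCU->getRanges().empty() &&
               "emitting debug_loc lists for a CU with no ranges");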
      
      llvm-svn: 214939
    • DebugInfo: Fix a bunch of tests that, owing to their compile_unit metadata not including a 13th field, had some subtle behavior. · cabf54a3
      David Blaikie authored
      
      Without the 13th field, the "emission kind" field defaults to 0 (which
      is not equal to either of the values of the emission kind enum (1 ==
      full debug info, 2 == line tables only)).
      
      In this particular instance, the comparison with "FullDebugInfo" was
      done when adding elements to the ranges list - so for these test cases
      no values were added to the ranges list.
      
      This got weirder when emitting debug_loc entries: the addresses should
      be relative to the range of the CU if the CU has only one range (the
      reasonable assumption being that if we're emitting debug_loc lists for a
      CU, that CU has at least one range - but due to the above situation it
      has zero), so the ranges were emitted relative to the start of the
      section rather than relative to the start of the CU's singular range.
      
      Fix these tests by accounting for the difference in the description of
      debug_loc entries (in some cases making the test ignorant to these
      differences, in others adding the extra label difference expression,
      etc) or the presence/absence of high/low_pc on the CU, and add the 13th
      field to their CUs to enable proper "full debug info" emission here.
      
      In a future commit I'll fix up a bunch of other test cases that don't
      depend so rigorously on this behavior, but still do similarly weird
      things due to the missing 13th field.
      
      llvm-svn: 214937
  7. Aug 05, 2014
    • Re-apply r214881: Fix return sequence on armv4 thumb · ef84bda5
      Jon Roelofs authored
      This reverts r214893, re-applying r214881 with the test case relaxed a bit to
      satiate the build bots.
      
      POP on armv4t cannot be used to change Thumb state (unlike later non-M-class
      architectures), therefore we need a different return sequence that uses 'bx'
      instead:
      
        POP {r3}
        ADD sp, #offset
        BX r3
      
      This patch also fixes an issue where the return value in r3 would get clobbered
      for functions that return 128 bits of data. In that case, we generate this
      sequence instead:
      
        MOV ip, r3
        POP {r3}
        ADD sp, #offset
        MOV lr, r3
        MOV r3, ip
        BX lr
      
      http://reviews.llvm.org/D4748
      
      llvm-svn: 214928
    • Improved test cases that were added with r214892. · 1954f2e9
      Sanjay Patel authored
      1. Added ':' to CHECK-LABELs
      2. Added more CHECKs
      3. Added CHECK-NEXTs
      4. Added verbose hex immediate comments to CHECKs
      
      llvm-svn: 214921
    • Revert r214881 because it broke lots of build-bots · 064eb5a1
      Jon Roelofs authored
      llvm-svn: 214893
    • Optimize vector fabs of bitcasted constant integer values. · 8e5beb6e
      Sanjay Patel authored
      Allow vector fabs operations on bitcasted constant integer values to be optimized
      in the same way that we already optimize scalar fabs.
      
      So for code like this:
      %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
      %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
      %ret = bitcast <2 x float> %fabs to i64
      
      Instead of generating something like this:
      
      movabsq (constant pool loadi of mask for sign bits)
      vmovq   (move from integer register to vector/fp register)
      vandps  (mask off sign bits)
      vmovq   (move vector/fp register back to integer return register)
      
      We should generate:
      
      mov     (put constant value in return register)
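      
      The underlying transform folds the sign-bit clear into the constant
      itself; a rough sketch in DAG-combine terms (names hedged, per-element
      iteration elided):
      
        // For each integer constant element feeding the bitcast into fabs:
        APInt Elt = C->getAPIntValue();
        Elt.clearBit(Elt.getBitWidth() - 1); // fabs == clear the sign bit
        // ... rebuild the constant vector from the cleared elements ...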
      
      I have also removed a redundant clause in the first 'if' statement:
      N0.getOperand(0).getValueType().isInteger()
      
      is the same thing as:
      IntVT.isInteger()
      
      Testcases for x86 and ARM added to existing files that deal with vector fabs.
      One existing testcase for x86 removed because it is no longer ideal.
      
      For more background, please see:
      http://reviews.llvm.org/D4770
      
      And:
      http://llvm.org/bugs/show_bug.cgi?id=20354
      
      Differential Revision: http://reviews.llvm.org/D4785
      
      llvm-svn: 214892
    • Fix return sequence on armv4 thumb · f5fad376
      Jon Roelofs authored
      POP on armv4t cannot be used to change Thumb state (unlike later non-M-class
      architectures), therefore we need a different return sequence that uses 'bx'
      instead:
      
        POP {r3}
        ADD sp, #offset
        BX r3
      
      This patch also fixes an issue where the return value in r3 would get clobbered
      for functions that return 128 bits of data. In that case, we generate this
      sequence instead:
      
        MOV ip, r3
        POP {r3}
        ADD sp, #offset
        MOV lr, r3
        MOV r3, ip
        BX lr
      
      http://reviews.llvm.org/D4748
      
      llvm-svn: 214881
    • Improve test for merged global debug info by using llvm-dwarfdump. · c74ffa9c
      David Blaikie authored
      It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of
      the global symbol being used as an address in the addressing mode, but
      this avoids the dependence on hardcoded set labels that keep changing
      (5+ commits over the last few years that each update the set label as it
      changes due to other, unrelated differences in output). This could,
      instead, have been changed to match the set name and then match the name
      in the string pool, but that would present other issues (needing to skip
      over the sets that weren't of interest, etc). Checking that the addresses
      (granted, without relocations applied - so it's not the whole story)
      match in the two variable location descriptions seems sufficient and
      fairly stable here.
      
      There are a few similar other tests with similar label dependence that
      I'll update soonish.
      
      llvm-svn: 214878
  8. Jul 31, 2014
    • Use "weak alias" instead of "alias weak" · 464fe024
      Rafael Espindola authored
      Before this patch we had
      
      @a = weak global ...
      but
      @b = alias weak ...
      
      The patch changes aliases to look more like global variables.
      
      Looking at some really old code suggests that the reason was that the old
      bison-based parser had a reduction for alias linkages and another one for
      global variable linkages. Putting the alias first avoided a reduce/reduce
      conflict.
      
      The days of the old .ll parser are long gone. The new one parses just "linkage"
      and a later check is responsible for deciding if a linkage is valid in a
      given context.
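      
      After the change the alias reads the same way as the global (the operand
      types here are illustrative):
      
      @b = weak alias i32* @a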
      
      llvm-svn: 214355