  1. Jan 19, 2014
  2. Jan 16, 2014
  3. Jan 15, 2014
    • Weiming Zhao's avatar
      PR 18466: Fix ARM Pseudo Expansion · fe26fd27
      Weiming Zhao authored
      When expanding NEON pseudo stores, the expansion may miss the implicit
      uses of sub regs, which may cause the post-RA scheduler to reorder
      instructions in a way that breaks an anti-dependency.
      
      For example:
        VST1d64QPseudo %R0<kill>, 16, %Q9_Q10, pred:14, pred:%noreg
        will be expanded to
          VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg;
      
      An instruction that defines %D20 may be scheduled before the store by
      mistake.
      
      This patch adds implicit uses for such cases. For the example above, it
      emits:
        VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg, %Q9_Q10<imp-use>
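
The effect of the implicit use can be illustrated with a minimal sketch. This is not LLVM code: `Instr`, `expandStore`, `defineD` and `hasAntiDependency` are hypothetical names, and registers are modelled as plain D-register numbers, relying only on the fact that Qn aliases D(2n) and D(2n+1), so %Q9_Q10 covers %D18 to %D21.

```cpp
#include <cassert>
#include <set>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr):
// registers are plain D-register numbers.
struct Instr {
  std::set<int> explicitUses;
  std::set<int> implicitUses;
  std::set<int> defs;
};

// D registers covered by the pair Q<n>_Q<n+1>; Qn aliases D(2n) and D(2n+1).
std::set<int> dRegsOfQPair(int firstQ) {
  assert(firstQ >= 0 && firstQ <= 14 && "Q pair must lie within Q0..Q15");
  std::set<int> regs;
  for (int d = 2 * firstQ; d < 2 * firstQ + 4; ++d)
    regs.insert(d);
  return regs;
}

// Model of the expanded store: only the first D register is an explicit
// operand; the others are visible only if implicit uses are added.
Instr expandStore(int firstQ, bool addImplicitUses) {
  Instr store;
  store.explicitUses.insert(2 * firstQ);
  if (addImplicitUses)
    store.implicitUses = dRegsOfQPair(firstQ);
  return store;
}

// An instruction that only defines D<d>.
Instr defineD(int d) {
  Instr def;
  def.defs.insert(d);
  return def;
}

// True if `later` writes a register that `earlier` reads, i.e. moving
// `later` above `earlier` would break an anti-dependency.
bool hasAntiDependency(const Instr &earlier, const Instr &later) {
  for (int d : later.defs)
    if (earlier.explicitUses.count(d) || earlier.implicitUses.count(d))
      return true;
  return false;
}
```

In this model, `hasAntiDependency(expandStore(9, false), defineD(20))` is false, so a scheduler relying on it would feel free to hoist the definition of %D20; with the implicit uses added, the same query returns true.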
      
      llvm-svn: 199282
      fe26fd27
  4. Jan 14, 2014
  5. Jan 13, 2014
  6. Jan 11, 2014
  7. Jan 10, 2014
  8. Jan 07, 2014
    • Saleem Abdulrasool's avatar
      ARM IAS: improve .eabi_attribute handling · 87ccd367
      Saleem Abdulrasool authored
      Parse tag names as well as expressions.  The former is part of the
      specification, the latter is for improved compatibility with the GNU assembler.
      Fix attribute value handling to be conformant to the specification.
      
      llvm-svn: 198662
      87ccd367
  9. Jan 06, 2014
    • Tim Northover's avatar
      ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks. · d6a729bb
      Tim Northover authored
      The ARM backend has been using most of the MachO related subtarget
      checks almost interchangeably, and since the only target it's had to
      run on has been IOS (which is all three of MachO, Darwin and IOS) it's
      worked out OK so far.
      
      But we'd like to support embedded targets under the "*-*-none-macho"
      triple, which means everything starts falling apart and inconsistent
      behaviours emerge.
      
      This patch should pick a reasonably sensible set of behaviours for the
      new triple (and any others that come along, with luck). Some choices
      were debatable (notably FP == r7 or r11), but we can revisit those
      later when deficiencies become apparent.
      
      llvm-svn: 198617
      d6a729bb
    • Tim Northover's avatar
      ARM: keep special non-AEABIness of "-darwin-eabi" triples for now · 7649ebac
      Tim Northover authored
      Longer term, we want to move users to "*-*-*-macho" for embedded work, but for
      now people are relying on the last thing we told them, which is unfortunately
      "*-*-darwin-eabi".
      
      rdar://problem/15703934
      
      llvm-svn: 198602
      7649ebac
  10. Jan 02, 2014
  11. Dec 30, 2013
  12. Dec 28, 2013
    • Andrew Trick's avatar
      New machine model for cortex-a9. Schedule for resources and latency. · 3ca67d64
      Andrew Trick authored
      Schedule more conservatively to account for stalls on floating point
      resources and latency. Use the AGU resource to model latency stalls
      since it's shared between FP and LD/ST instructions. This might not be
      completely accurate but should work well in practice.
      
      llvm-svn: 198125
      3ca67d64
  13. Dec 19, 2013
    • Josh Magee's avatar
      58fa4939
    • Rafael Espindola's avatar
      Add a triple so that this passes on OS X. · 357d013e
      Rafael Espindola authored
      I am surprised I am the first one to notice this.
      
      llvm-svn: 197689
      357d013e
    • Josh Magee's avatar
      [stackprotector] Use analysis from the StackProtector pass for stack layout in... · 22b8ba2d
      Josh Magee authored
      [stackprotector] Use analysis from the StackProtector pass for stack layout in PEI and LocalStackSlot passes.
      
      This changes the MachineFrameInfo API to use the new SSPLayoutKind information
      produced by the StackProtector pass (instead of a boolean flag) and updates a
      few pass dependencies (to preserve the SSP analysis).
      
      The stack layout follows the same approach used prior to this change - i.e.,
      only LargeArray stack objects will be placed near the canary and everything
      else will be laid out normally.  After this change, structures containing large
      arrays will also be placed near the canary - a case previously missed by the
      old implementation.
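
The layout rule described above can be sketched roughly as follows. This is only an illustration, not the MachineFrameInfo API: `StackObject`, `classify`, `layoutOrder` and the size `threshold` parameter are hypothetical, and the vector order stands in for "closest to the canary first".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the layout information a stack-protector
// analysis might attach to each stack object.
enum class LayoutKind { None, LargeArray };

struct StackObject {
  std::string name;
  // Size of the largest array in the object, including arrays nested
  // inside structures; 0 if the object contains no array.
  unsigned largestArrayBytes;
};

// Assumption: an object counts as LargeArray once its largest array
// reaches `threshold` bytes.
LayoutKind classify(const StackObject &obj, unsigned threshold) {
  assert(threshold > 0 && "threshold must be positive");
  return obj.largestArrayBytes >= threshold ? LayoutKind::LargeArray
                                            : LayoutKind::None;
}

// Returns object names in layout order: LargeArray objects first (nearest
// the canary), everything else afterwards in its original order.
std::vector<std::string> layoutOrder(std::vector<StackObject> objs,
                                     unsigned threshold) {
  std::stable_partition(objs.begin(), objs.end(),
                        [threshold](const StackObject &o) {
                          return classify(o, threshold) == LayoutKind::LargeArray;
                        });
  std::vector<std::string> names;
  for (const StackObject &o : objs)
    names.push_back(o.name);
  return names;
}

// Example: `buf` is a 64-byte array, `s` a structure holding a 32-byte array.
std::vector<std::string> exampleOrder() {
  return layoutOrder({{"i", 0}, {"buf", 64}, {"s", 32}, {"p", 0}}, 8);
}
```

Because the classification looks at the largest array anywhere in the object, the structure `s` is treated like the plain array `buf`, which is the case the commit message says was previously missed.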
      
      Out of tree targets will need to update their usage of
      MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument. 
      
      The next patch will implement the rules for sspstrong and sspreq.  The end goal
      is to support ssp-strong stack layout rules.
      
      WIP.
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2158
      
      llvm-svn: 197653
      22b8ba2d
  14. Dec 18, 2013
  15. Dec 17, 2013
    • Quentin Colombet's avatar
      Add warning capabilities in LLVM. · b4c44d23
      Quentin Colombet authored
      This reapplies r197438 and fixes the link-time circular dependency between
      IR and Support. The fix consists in moving the diagnostic support into IR.
      
      The patch adds a new LLVMContext::diagnose that can be used to communicate to
      the front-end, if any, that something of interest happened.
      The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
      The base class contains the following information:
      - The kind of the report: What this is about.
      - The severity of the report: How bad this is.
      
      This patch also adds 2 classes:
      - DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
      will be used to switch to the new diagnostic API for LLVMContext::emitError.
      - DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
      hard coded warning in PEI.
      
      This patch also features dynamic diagnostic identifiers. In other words plugins
      can use this infrastructure for their own diagnostics (for more details, see
      getNextAvailablePluginDiagnosticKind).
      
      This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
      the LLVMContext that should be set by the front-end to be able to map these
      diagnostics in its own system.
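
The shape of this design can be sketched in a self-contained way. The names below (`Diagnostic`, `StackSizeDiagnostic`, `Context`, `Handler`, `nextPluginKind`) are hypothetical and deliberately differ from the real LLVM classes; the sketch only mirrors the ideas stated above: a base class carrying kind and severity, dynamically allocated kinds for plugins, and a handler plus opaque context installed by the front-end.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class Severity { Error, Warning, Note };

// Built-in kinds, followed by the first identifier handed out to plugins.
enum Kind { KindInlineAsm, KindStackSize, KindFirstPlugin };

// Hands out a fresh kind identifier on every call (hypothetical helper).
int nextPluginKind() {
  static int next = KindFirstPlugin;
  return next++;
}

// Base class: what the report is about (kind) and how bad it is (severity).
class Diagnostic {
public:
  Diagnostic(int kind, Severity severity) : kind_(kind), severity_(severity) {}
  virtual ~Diagnostic() = default;
  int kind() const { return kind_; }
  Severity severity() const { return severity_; }
  virtual std::string message() const = 0;

private:
  int kind_;
  Severity severity_;
};

class StackSizeDiagnostic : public Diagnostic {
public:
  StackSizeDiagnostic(std::string fn, unsigned bytes)
      : Diagnostic(KindStackSize, Severity::Warning), fn_(std::move(fn)),
        bytes_(bytes) {}
  std::string message() const override {
    return "stack size of " + fn_ + " is " + std::to_string(bytes_) + " bytes";
  }

private:
  std::string fn_;
  unsigned bytes_;
};

// The front-end installs a handler and an opaque context pointer.
using Handler = void (*)(const Diagnostic &, void *context);

class Context {
public:
  void setHandler(Handler handler, void *context) {
    assert(handler && "handler must not be null");
    handler_ = handler;
    context_ = context;
  }
  // Forwards the diagnostic to the front-end, if one registered a handler.
  void diagnose(const Diagnostic &d) const {
    if (handler_)
      handler_(d, context_);
  }

private:
  Handler handler_ = nullptr;
  void *context_ = nullptr;
};

// Example front-end handler: records the text of every warning.
void collectWarnings(const Diagnostic &d, void *context) {
  if (d.severity() == Severity::Warning)
    static_cast<std::vector<std::string> *>(context)->push_back(d.message());
}

std::vector<std::string> runExample() {
  std::vector<std::string> seen;
  Context ctx;
  ctx.setHandler(collectWarnings, &seen);
  ctx.diagnose(StackSizeDiagnostic("f", 4096));
  return seen;
}
```

Passing the context as an untyped pointer keeps the reporting side independent of any particular front-end, at the cost of a cast inside the handler.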
      
      http://llvm-reviews.chandlerc.com/D2376
      <rdar://problem/15515174>
      
      llvm-svn: 197508
      b4c44d23
    • Quentin Colombet's avatar
      Add warning capabilities in LLVM. · 66673f40
      Quentin Colombet authored
      The patch adds a new LLVMContext::diagnose that can be used to communicate to
      the front-end, if any, that something of interest happened.
      The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
      The base class contains the following information:
      - The kind of the report: What this is about.
      - The severity of the report: How bad this is.
      
      This patch also adds 2 classes:
      - DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
      will be used to switch to the new diagnostic API for LLVMContext::emitError.
      - DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
      hard coded warning in PEI.
      
      This patch also features dynamic diagnostic identifiers. In other words plugins
      can use this infrastructure for their own diagnostics (for more details, see
      getNextAvailablePluginDiagnosticKind).
      
      This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
      the LLVMContext that should be set by the front-end to be able to map these
      diagnostics in its own system.
      
      http://llvm-reviews.chandlerc.com/D2376
      <rdar://problem/15515174>
      
      llvm-svn: 197438
      66673f40
  16. Dec 16, 2013
  17. Dec 13, 2013
  18. Dec 11, 2013
    • Tim Northover's avatar
      ARM: constrain register-class in fast-isel · 76fc8a4c
      Tim Northover authored
      The tests were no longer using fast-isel at all (MachO needs an "ios" rather
      than "darwin" triple at the moment and Linux needs ARM mode). Once that was
      corrected, the verifier complained about a t2ADDri created for the alloca.
      
      llvm-svn: 197046
      76fc8a4c
  19. Dec 08, 2013
    • Tim Northover's avatar
      ARM: fix folding of stack-adjustment (yet again). · a4173715
      Tim Northover authored
      When trying to eliminate an "sub sp, sp, #N" instruction by folding
      it into an existing push/pop using dummy registers, we need to account
      for the fact that this might affect precisely how "fp" gets set in the
      prologue.
      
      We were attempting this, but assuming that *whenever* we performed a
      fold it would make a difference. This is false, for example, in:
          push {r4, r7, lr}
          add fp, sp, #4
          vpush {d8}
          sub sp, sp, #8
      
      we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
      However, in that case the "add fp" instruction mustn't change, which
      we were getting wrong before.
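
The arithmetic behind this case can be sketched as below. The helper names are hypothetical, and the sketch assumes what the example suggests: "fp" is set after the GPR push and before the "vpush", and dummy registers added by a fold end up below the registers already in the list.

```cpp
#include <cassert>

// Which instruction the "sub sp, sp, #N" is folded into (hypothetical model).
enum class FoldTarget { GprPush, VPush };

// Number of dummy registers needed to absorb `bytes` of stack adjustment:
// GPRs are 4 bytes wide, D registers are 8 bytes wide.
int dummyRegsNeeded(int bytes, FoldTarget target) {
  int regSize = target == FoldTarget::VPush ? 8 : 4;
  assert(bytes >= 0 && bytes % regSize == 0 &&
         "adjustment must be a whole number of registers");
  return bytes / regSize;
}

// Immediate for "add fp, sp, #offset" after the fold. Extra registers in the
// GPR push move sp further below the saved frame pointer, so the offset
// grows; a fold into the later vpush happens after fp is set and leaves the
// offset untouched.
int fpOffsetAfterFold(int originalOffset, int foldedBytes, FoldTarget target) {
  return target == FoldTarget::GprPush ? originalOffset + foldedBytes
                                       : originalOffset;
}
```

For the sequence above, `dummyRegsNeeded(8, FoldTarget::VPush)` is 1 (the extra d7) and `fpOffsetAfterFold(4, 8, FoldTarget::VPush)` stays 4, whereas folding the same 8 bytes into the GPR push would require an offset of 12.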
      
      Should fix PR18160.
      
      llvm-svn: 196725
      a4173715
  20. Dec 06, 2013
    • Weiming Zhao's avatar
      Bug 18149: [AArch32] VSel instructions have no ARMCC field · 43d8e6cb
      Weiming Zhao authored
      The current peephole optimization for compare instructions assumes that
      an instruction that uses CPSR has an MO for the ARM Cond code. However,
      VSEL instructions (vseleq, vselge, vselgt, vselvs) have no such operand,
      nor do they support modification of the Cond code.
      
      llvm-svn: 196588
      43d8e6cb
  21. Dec 05, 2013
    • Andrew Trick's avatar
      MI-Sched: handle latency of in-order operations with the new machine model. · 880e573d
      Andrew Trick authored
      The per-operand machine model allows the target to define "unbuffered"
      processor resources. This change is a quick, cheap way to model stalls
      caused by the latency of operations that use such resources. This only
      applies when the processor's micro-op buffer size is non-zero
      (Out-of-Order). We can't precisely model in-order stalls during
      out-of-order execution, but this is an easy and effective
      heuristic. It benefits cortex-a9 scheduling when using the new
      machine model, which is not yet on by default.
      
      MI-Sched for armv7 was evaluated on Swift (and only not enabled because
      of a performance bug related to predication). However, we never
      evaluated Cortex-A9 performance on MI-Sched in its current form. This
      change adds MI-Sched functionality to reach performance goals on
      A9. The only remaining change is to allow MI-Sched to run as a PostRA
      pass.
      
      I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
      -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
      
      For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
      (min run time over 2 runs, filtering tiny changes)
      
      Speedups:
      | Benchmarks/BenchmarkGame/recursive         |  52.39% |
      | Benchmarks/VersaBench/beamformer           |  20.80% |
      | Benchmarks/Misc/pi                         |  19.97% |
      | Benchmarks/Misc/mandel-2                   |  19.95% |
      | SPEC/CFP2000/188.ammp                      |  18.72% |
      | Benchmarks/McCat/08-main/main              |  18.58% |
      | Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
      | Benchmarks/Olden/power                     |  17.11% |
      | Benchmarks/Misc-C++/mandel-text            |  16.47% |
      | Benchmarks/Misc/oourafft                   |  15.94% |
      | Benchmarks/Misc/flops-7                    |  14.99% |
      | Benchmarks/FreeBench/distray               |  14.26% |
      | SPEC/CFP2006/470.lbm                       |  14.00% |
      | mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
      | Benchmarks/SmallPT/smallpt                 |  10.36% |
      | Benchmarks/Misc-C++/Large/ray              |   8.97% |
      | Benchmarks/Misc/fp-convert                 |   8.75% |
      | Benchmarks/Olden/perimeter                 |   7.10% |
      | Benchmarks/Bullet/bullet                   |   7.03% |
      | Benchmarks/Misc/mandel                     |   6.75% |
      | Benchmarks/Olden/voronoi                   |   6.26% |
      | Benchmarks/Misc/flops-8                    |   5.77% |
      | Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
      | Benchmarks/MiBench/security-rijndael       |   5.15% |
      | Benchmarks/Misc/flops-6                    |   5.10% |
      | Benchmarks/Olden/tsp                       |   4.46% |
      | Benchmarks/MiBench/consumer-lame           |   4.28% |
      | Benchmarks/Misc/flops-5                    |   4.27% |
      | Benchmarks/mafft/pairlocalalign            |   4.19% |
      | Benchmarks/Misc/himenobmtxpa               |   4.07% |
      | Benchmarks/Misc/lowercase                  |   4.06% |
      | SPEC/CFP2006/433.milc                      |   3.99% |
      | Benchmarks/tramp3d-v4                      |   3.79% |
      | Benchmarks/FreeBench/pifft                 |   3.66% |
      | Benchmarks/Ptrdist/ks                      |   3.21% |
      | Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
      | SPEC/CINT2000/175.vpr                      |   3.12% |
      | Benchmarks/nbench                          |   2.98% |
      | SPEC/CFP2000/183.equake                    |   2.91% |
      | Benchmarks/Misc/perlin                     |   2.85% |
      | Benchmarks/Misc/flops-1                    |   2.82% |
      | Benchmarks/Misc-C++-EH/spirit              |   2.80% |
      | Benchmarks/Misc/flops-2                    |   2.77% |
      | Benchmarks/NPB-serial/is                   |   2.42% |
      | Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
      | Benchmarks/BenchmarkGame/n-body            |   2.28% |
      | Benchmarks/SciMark2-C/scimark2             |   2.27% |
      | Benchmarks/Olden/bh                        |   2.03% |
      | skidmarks10/skidmarks                      |   1.81% |
      | Benchmarks/Misc/flops                      |   1.72% |
      
      Slowdowns:
      | Benchmarks/llubenchmark/llu                | -14.14% |
      | Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
      | Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
      | Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
      | Benchmarks/Shootout/hash                   |  -2.35% |
      | Benchmarks/Prolangs-C++/ocean              |  -2.01% |
      | Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
      | Polybench/linear-algebra/kernels/3mm       |  -1.95% |
      | Benchmarks/McCat/09-vor/vor                |  -1.68% |
      
      llvm-svn: 196516
      880e573d
    • Tim Northover's avatar
      ARM: fix yet another stack-folding bug · e4def5e2
      Tim Northover authored
      We were trying to fold the stack adjustment into the wrong instruction in the
      situation where the entire basic-block was epilogue code. Really, it can only
      ever be valid to do the folding precisely where the "add sp, ..." would be
      placed so there's no need for a separate iterator to track that.
      
      Should fix PR18136.
      
      llvm-svn: 196493
      e4def5e2
  22. Dec 04, 2013
    • David Peixotto's avatar
      Add support for parsing ARM symbol variants on ELF targets · 8ad70b35
      David Peixotto authored
      ARM symbol variants are written with parens instead of @ like this:
      
        .word __GLOBAL_I_a(target1)
      
      This commit adds support for parsing these symbol variants in
      expressions. We introduce a new flag to MCAsmInfo that indicates the
      parser should use parens to parse the symbol variant. The expression
      parser is modified to look for symbol variants using parens instead
      of @ when the corresponding MCAsmInfo flag is true.
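
A minimal sketch of the two spellings, assuming a single boolean flag chooses between them. `splitSymbolVariant` is a hypothetical helper working on plain strings, not the MC expression parser, and the "sym@GOT" input in the usage note is only an illustrative spelling.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "sym(variant)" (useParens == true) or "sym@variant"
// (useParens == false) into {symbol, variant}. The variant is empty when
// the text carries none in the selected syntax.
std::pair<std::string, std::string>
splitSymbolVariant(const std::string &text, bool useParens) {
  assert(!text.empty() && "expected a symbol reference");
  if (useParens) {
    std::string::size_type open = text.find('(');
    if (open != std::string::npos && text.size() > open + 1 &&
        text.back() == ')')
      return {text.substr(0, open),
              text.substr(open + 1, text.size() - open - 2)};
  } else {
    std::string::size_type at = text.find('@');
    if (at != std::string::npos)
      return {text.substr(0, at), text.substr(at + 1)};
  }
  return {text, ""};
}
```

With the flag set, `splitSymbolVariant("__GLOBAL_I_a(target1)", true)` yields the symbol `__GLOBAL_I_a` and the variant `target1`; with the flag clear, `splitSymbolVariant("sym@GOT", false)` yields `sym` and `GOT`.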
      
      The MCAsmInfo parens flag is enabled only for ARM on ELF.
      
      By adding this flag to MCAsmInfo, we are able to get rid of
      redundant ARM-specific symbol variants and use the generic variants
      instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
      UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
      the symbol variants for ARM.
      
      To achieve this, we need to keep a handle to the MCAsmInfo in the
      MCSymbolRefExpr class that we can check when printing the symbol
      variant.
      
      Updated Tests:
        Changed case of symbol variant to match the generic kind.
        test/CodeGen/ARM/tls-models.ll
        test/CodeGen/ARM/tls1.ll
        test/CodeGen/ARM/tls2.ll
        test/CodeGen/Thumb2/tls1.ll
        test/CodeGen/Thumb2/tls2.ll
      
      PR18080
      
      llvm-svn: 196424
      8ad70b35
  23. Dec 03, 2013
  24. Dec 02, 2013
  25. Dec 01, 2013
    • Tim Northover's avatar
      ARM: fix bug in -Oz stack adjustment folding · 45479dcf
      Tim Northover authored
      Previously, we clobbered callee-saved registers when folding an "add
      sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
      a register we're going to add to the "pop" could actually be live
      outside the function before doing so and should fix the issue.
      
      This should fix PR18081.
      
      llvm-svn: 196046
      45479dcf
  26. Nov 26, 2013
  27. Nov 25, 2013
  28. Nov 23, 2013
  29. Nov 22, 2013