  1. Dec 03, 2014
    • Charlie Turner's avatar
      Emit ABI_FP_rounding attribute. · f02c9248
      Charlie Turner authored
      LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
      the user has specified this option, the Tag_ABI_FP_rounding attribute should be
      emitted with value 1. This option currently does not appear to disable
      transformations and optimizations that assume default floating point rounding
      behavior, AFAICT, but the intention should be recorded in the build attributes,
      regardless of what the compiler actually does with the intention.
      
      Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
      llvm-svn: 223218
      f02c9248
    • Charlie Turner's avatar
      Add tests for default value of Tag_ABI_FP_rounding. · 1620a69f
      Charlie Turner authored
      Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393
      llvm-svn: 223217
      1620a69f
    • Rafael Espindola's avatar
      Ask the module for its identified types. · 2fa1e43a
      Rafael Espindola authored
      When lazy reading a module, the types used in a function will not be visible to
      a TypeFinder until the body is read.
      
      This patch fixes that by asking the module for its identified struct types.
      If a materializer is present, the module asks it. If not, it uses a TypeFinder.
      
      This fixes pr21374.
      
      I will be the first to say that this is ugly, but it was the best I could find.
      
      Some of the options I looked at:
      
      * Asking the LLVMContext. This could be made to work for gold, but not currently
        for ld64. ld64 will load multiple modules into a single context before merging
        them. This causes us to see types from future merges. Unfortunately,
        MappedTypes is not just a cache when it comes to opaque types. Once the
        mapping has been made, we have to remember it for as long as the key may
        be used. This would mean moving MappedTypes to the Linker class and having
        to drop the Linker::LinkModules static methods, which are visible from C.
      
      * Adding an option to ignore function bodies in the TypeFinder. This would
        fix the PR by picking the worst result. It would work, but unfortunately
        we are currently quite dependent on the upfront type merging. I will
        try to reduce our dependency, but it is not clear that we will be able
        to get rid of it for now.
      
      The only clean solution I could think of is making the Module own the types.
      This would have other advantages, but it is a much bigger change. I will
      propose it, but it is nice to have this fixed while that is discussed.
      
      With the gold plugin, this patch takes the number of types in the LTO clang
      binary from 52817 to 49669.
      
      llvm-svn: 223215
      2fa1e43a
    • Matt Arsenault's avatar
      R600/SI: Remove i1 pseudo VALU ops · becd656c
      Matt Arsenault authored
      Select i1 logical ops directly to 64-bit SALU instructions.
      Vector i1 values are always really in SGPRs, with one
      bit for each item in the wave. This saves about 4 instructions
      when and/or/xoring any condition, and also helps write conditions
      that need to be passed in vcc.
      
      This should work correctly now that the SGPR live range
      fixing pass works. More work is needed to eliminate the VReg_1
      pseudo regclass and possibly the entire SILowerI1Copies pass.
      
      llvm-svn: 223206
      becd656c
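The idea in becd656c, that a vector of i1 values is really one scalar mask with a bit per wave item, can be illustrated with a minimal sketch. The 64-lane width and the helper names `to_lane_mask` and `lane` are assumptions made for illustration; they are not part of the R600/SI backend.

```python
WAVE_SIZE = 64  # assumed wavefront width, matching the 64-bit SALU ops


def to_lane_mask(per_lane_flags):
    """Pack one boolean per lane into a single integer mask."""
    mask = 0
    for lane_index, flag in enumerate(per_lane_flags):
        if flag:
            mask |= 1 << lane_index
    return mask


def lane(mask, lane_index):
    """Read back the condition bit for a single lane."""
    return bool((mask >> lane_index) & 1)


# Two per-lane conditions, each stored as one 64-bit scalar value.
a = to_lane_mask([i % 2 == 0 for i in range(WAVE_SIZE)])
b = to_lane_mask([i < 32 for i in range(WAVE_SIZE)])

# Combining the conditions for every lane is a single scalar AND.
both = a & b
```

Because the whole wave's condition lives in one scalar value, and/or/xor on conditions needs one scalar operation rather than a per-lane vector sequence.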
    • Tom Stellard's avatar
      StructurizeCFG: Use LoopInfo analysis for better loop detection · 1f0dded0
      Tom Stellard authored
      We were assuming that each back-edge in a region represented a unique
      loop, which is not always the case.  We need to use LoopInfo to
      correctly determine which back-edges are loops.
      
      llvm-svn: 223199
      1f0dded0
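The point of 1f0dded0, that several back-edges may belong to the same loop, can be sketched as follows. This is a toy model and not the StructurizeCFG or LoopInfo code: the CFG, the block names, and the helpers `dominators`, `back_edges` and `loops_by_header` are all hypothetical. It assumes the usual natural-loop convention that back-edges sharing a header form one loop.

```python
def dominators(cfg, entry):
    """Iteratively compute the set of dominators of every block."""
    blocks = set(cfg)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def back_edges(cfg, entry):
    """A back-edge is an edge whose target dominates its source."""
    dom = dominators(cfg, entry)
    return [(src, dst) for src in cfg for dst in cfg[src] if dst in dom[src]]


def loops_by_header(cfg, entry):
    """Group back-edges by header; each header stands for one loop."""
    loops = {}
    for src, dst in back_edges(cfg, entry):
        loops.setdefault(dst, []).append(src)
    return loops


# One loop with two latches: two back-edges, but only one loop.
cfg = {
    "entry": ["header"],
    "header": ["left", "right", "exit"],
    "left": ["header"],
    "right": ["header"],
    "exit": [],
}
edges = back_edges(cfg, "entry")
loops = loops_by_header(cfg, "entry")
```

Counting `edges` would report two loops here, while grouping by header reports one, which is the distinction the commit message draws.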
    • Tom Stellard's avatar
      R600/SI: Enable inline assembly · 36930806
      Tom Stellard authored
      We just needed to remove the assertion in
      AMDGPURegisterInfo::getFrameRegister(), which is called when
      initializing the parser for inline assembly.
      
      llvm-svn: 223197
      36930806
    • Matt Arsenault's avatar
      R600/SI: Change mubuf offsets to print as decimal · fb13b22d
      Matt Arsenault authored
      This matches SC's behavior.
      
      llvm-svn: 223194
      fb13b22d
    • Nick Lewycky's avatar
      Emit the entry block first and the exit block second, then all the blocks in... · 2e8a6219
      Nick Lewycky authored
      Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that.
      
      llvm-svn: 223193
      2e8a6219
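The block ordering described in 2e8a6219 can be sketched in a few lines. The function name `emission_order` and the block names are hypothetical, and the sketch assumes the entry and exit blocks are distinct.

```python
def emission_order(blocks, entry, exit_block):
    """Entry first, exit second, then the remaining blocks in original order."""
    middle = [b for b in blocks if b != entry and b != exit_block]
    return [entry, exit_block] + middle


order = emission_order(["entry", "loop", "body", "exit"], "entry", "exit")
```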
    • Peter Collingbourne's avatar
      Prologue support · 51d2de7b
      Peter Collingbourne authored
      Patch by Ben Gamari!
      
      This redefines the `prefix` attribute introduced previously and
      introduces a `prologue` attribute.  There are three primary use cases
      that these attributes aim to serve:
      
        1. Function prologue sigils
      
        2. Function hot-patching: Enable the user to insert `nop` operations
           at the beginning of the function which can later be safely replaced
           with a call to some instrumentation facility
      
        3. Runtime metadata: Allow a compiler to insert data for use by the
           runtime during execution. GHC is one example of a compiler that
           needs this functionality for its tables-next-to-code functionality.
      
      Previously `prefix` served cases (1) and (2) quite well by allowing the user
      to introduce arbitrary data at the entrypoint but before the function
      body. Case (3), however, was poorly handled by this approach as it
      required that prefix data was valid executable code.
      
      Here we redefine the notion of prefix data to instead be data which
      occurs immediately before the function entrypoint (i.e. the symbol
      address). Since prefix data now occurs before the function entrypoint,
      there is no need for the data to be valid code.
      
      The previous notion of prefix data now goes under the name "prologue
      data" to emphasize its duality with the function epilogue.
      
      The intention here is to handle cases (1) and (2) with prologue data and
      case (3) with prefix data.
      
      References
      ----------
      
      This idea arose out of discussions[1] with Reid Kleckner in response to a
      proposal to introduce the notion of symbol offsets to enable handling of
      case (3).
      
      [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html
      
      Test Plan: testsuite
      
      Differential Revision: http://reviews.llvm.org/D6454
      
      llvm-svn: 223189
      51d2de7b
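The layout that 51d2de7b describes, with prefix data immediately before the symbol address and prologue data at the entry point, can be modelled with a small sketch. The function `layout_function` and the byte strings are hypothetical and only illustrate the ordering; they do not reflect how LLVM emits functions.

```python
def layout_function(prefix_data, prologue_data, body):
    """Return (blob, symbol_offset) for one function.

    Prefix data sits before the symbol address, so it need not be valid code.
    Prologue data sits at the symbol address, so it is executed on entry.
    """
    blob = prefix_data + prologue_data + body
    symbol_offset = len(prefix_data)
    return blob, symbol_offset


info_table = b"\x2a\x00\x00\x00"  # hypothetical runtime metadata (case 3)
nop_sled = b"\x90\x90"            # hypothetical patchable nops, x86 encoding (case 2)
body = b"\xc3"                    # hypothetical function body: a single x86 ret

blob, symbol_offset = layout_function(info_table, nop_sled, body)
```

A runtime that knows the size of the prefix data can find it at a negative offset from the function's address, while a call to the function begins executing at the prologue data.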
    • Ahmed Bougacha's avatar
      [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80. · d65f787a
      Ahmed Bougacha authored
      The X86AsmParser intel handling was refactored in r216481, making it
      try each different memory operand size to see which one matches.
      Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
      led to an "invalid operand" error for code such as:
        movdqa [rax], xmm0
      
      llvm-svn: 223187
      d65f787a
    • Hal Finkel's avatar
      [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets · 01fa7701
      Hal Finkel authored
      We need to use the custom expansion of readcyclecounter on all 32-bit targets
      (even those with 64-bit registers). This should fix the ppc64 buildbot.
      
      llvm-svn: 223182
      01fa7701
    • Tim Northover's avatar
      AArch64: strengthen Darwin ABI alignment assumptions · 4a8ac260
      Tim Northover authored
      A global variable without an explicit alignment specified should be assumed to
      be ABI-aligned according to its type, like on other platforms. This allows us
      to use better memory operations when accessing it.
      
      rdar://18533701
      
      llvm-svn: 223180
      4a8ac260
    • Tim Northover's avatar
      AArch64: don't be too greedy when folding :lo12: accesses into mem ops. · ec7ebebe
      Tim Northover authored
      This frequently leads to cases like:
         ldr xD, [xN, :lo12:var]
         add xA, xN, :lo12:var
         ldr xD, [xA, #8]
      
      where the ADD would have been needed anyway, and the two distinct addressing
      modes can prevent the formation of an ldp. Because of how we handle ADRP
      (aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
      results in duplicated ADRP instructions (one on its own to cover the ldr, and
      one combined with the add).
      
      llvm-svn: 223172
      ec7ebebe
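As background for ec7ebebe, the split between the page address formed by ADRP and the `:lo12:` offset can be sketched numerically. The helper `split_address` and the sample address are assumptions for illustration only; 4 KiB pages are assumed.

```python
PAGE_SIZE = 4096  # ADRP works in units of 4 KiB pages


def split_address(addr):
    """Split an address into its 4 KiB page base and its low 12 bits."""
    page = addr & ~(PAGE_SIZE - 1)
    lo12 = addr & (PAGE_SIZE - 1)
    return page, lo12


var = 0x12345678  # hypothetical address of `var`
page, lo12 = split_address(var)

# Folded form: the load adds lo12 to the page base itself.
folded_address = page + lo12
# Unfolded form: an ADD materializes the address, and later accesses
# use small offsets from it, such as #8.
materialized = page + lo12
second_field = materialized + 8
```

When the materialized address is needed anyway, accessing both fields relative to it keeps the two loads in the same addressing mode, which is what makes an ldp possible.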
  2. Dec 02, 2014
    • Michael Zolotukhin's avatar
      ea8327b8
    • Michael Zolotukhin's avatar
      Apply loop-rotate to several vectorizer tests. · 540580ca
      Michael Zolotukhin authored
      Such loops shouldn't be vectorized because of their form.
      After applying loop-rotate (+simplifycfg) the tests again start to check
      what they are intended to check.
      
      llvm-svn: 223170
      540580ca
    • Simon Pilgrim's avatar
      [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets · 6b988ad8
      Simon Pilgrim authored
      4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps, causing domain switch stalls. This patch fixes this by using (v)pblendw instead.
      
      The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.
      
      Differential Revision: http://reviews.llvm.org/D6458
      
      llvm-svn: 223165
      6b988ad8
    • Hal Finkel's avatar
      [PowerPC] Implement readcyclecounter for PPC32 · bbdee936
      Hal Finkel authored
      We've long supported readcyclecounter on PPC64, but it is easier there (the
      read of the 64-bit time-base register can be accomplished via a single
      instruction). This now provides an implementation for PPC32 as well. On PPC32,
      the time-base register is still 64 bits, but can only be read 32 bits at a time
      via two separate SPRs. The ISA manual explains how to do this properly (it
      involves re-reading the upper bits and looping if the counter has wrapped while
      being read).
      
      This requires PPC to implement a custom integer splitting legalization for the
      READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
      gets turned into a pseudo-instruction, which is then expanded to the necessary
      sequence (which has three SPR reads, the comparison and the branch).
      
      Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.
      
      llvm-svn: 223161
      bbdee936
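The read sequence that bbdee936 describes (read the upper half, read the lower half, re-read the upper half, and retry if it changed) can be simulated with a minimal sketch. The `FakeTimeBase` class is a hypothetical stand-in for the two SPRs and ticks on every read so that the wrap case can be exercised; it is not the PowerPC backend code.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class FakeTimeBase:
    """Hypothetical 64-bit counter that advances by one on every 32-bit read."""

    def __init__(self, start):
        self.value = start

    def _tick(self):
        current = self.value
        self.value = (self.value + 1) & MASK64
        return current

    def read_upper(self):
        return self._tick() >> 32

    def read_lower(self):
        return self._tick() & 0xFFFFFFFF


def read_cycle_counter(tb):
    """Three reads, one comparison, and a branch back if the upper half moved."""
    while True:
        upper = tb.read_upper()
        lower = tb.read_lower()
        upper_again = tb.read_upper()
        if upper == upper_again:
            return (upper << 32) | lower


# Start just before the lower half wraps, so the first attempt is torn.
tb = FakeTimeBase(0x00000001FFFFFFFF)
result = read_cycle_counter(tb)
```

Without the re-read, the first attempt would combine an upper half of 1 with a lower half of 0, giving a value smaller than the counter ever held during the read.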
    • Lang Hames's avatar
      [AArch64][Stackmaps] Optimize stackmap shadows on AArch64. · a7395bf4
      Lang Hames authored
      Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
      non-stackmap instructions up to the next branch target towards the requested
      shadow.
      
      <rdar://problem/14959522>
      
      llvm-svn: 223156
      a7395bf4
    • Tom Stellard's avatar
      R600/SI: Move more information into SIProgramInfo struct · 4df465bd
      Tom Stellard authored
      llvm-svn: 223154
      4df465bd
    • Matt Arsenault's avatar
      R600: Cleanup some tests and add missing testcases · 6f1e96b4
      Matt Arsenault authored
      llvm-svn: 223151
      6f1e96b4
    • Daniel Sanders's avatar
      [mips] Fix passing of small structures for big-endian O32. · d134c9da
      Daniel Sanders authored
      Summary:
      Like N32/N64, they must be passed in the upper bits of the register.
      
      The new code could be merged with the existing if-statements but I've
      refrained from doing this since it will make porting the O32 implementation
      to tablegen harder later.
      
      Reviewers: vmedic
      
      Reviewed By: vmedic
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D6463
      
      llvm-svn: 223148
      d134c9da
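The phrase "upper bits of the register" in d134c9da can be illustrated with a small sketch. It assumes a 32-bit register and that the structure's bytes are left-justified with zero padding below them; the helper `pack_small_struct_big_endian` is hypothetical and is not taken from the MIPS backend.

```python
def pack_small_struct_big_endian(data, register_bytes=4):
    """Place a small struct's bytes in the most significant bytes of a register."""
    if len(data) > register_bytes:
        raise ValueError("struct does not fit in one register")
    padded = data + bytes(register_bytes - len(data))
    return int.from_bytes(padded, "big")


three_byte_struct = b"\x11\x22\x33"  # hypothetical struct contents
register_value = pack_small_struct_big_endian(three_byte_struct)
```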
    • Roman Divacky's avatar
      Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM .cpu parsing. · 7e6b5955
      Roman Divacky authored
      Previously the .cpu directive in the ARM assembler didn't switch to the new CPU and
      therefore acted as a nop. This implements a real action for .cpu and, e.g.,
      allows assembling the FreeBSD kernel with -integrated-as.
      
      llvm-svn: 223147
      7e6b5955
    • Philip Reames's avatar
      [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder · 1a1bdb22
      Philip Reames authored
      This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  
      
      With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.
      
      I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it.

      During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.

      In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principle, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.
      
      Reviewed by: atrick, ributzka
      
      llvm-svn: 223137
      1a1bdb22
    • Bruno Cardoso Lopes's avatar
      [SwitchLowering] Handle destinations on multiple phi instructions · 15520db9
      Bruno Cardoso Lopes authored
      Follow up from r222926. Also handle multiple destinations from merged
      cases on multiple and subsequent phi instructions.
      
      rdar://problem/19106978
      
      llvm-svn: 223135
      15520db9
    • Ahmed Bougacha's avatar
      [MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction. · 54b7d334
      Ahmed Bougacha authored
      Go through implicit defs of CSMI and MI, and clear the kill flags on
      their uses in all the instructions between CSMI and MI.
      We might have made some of the kill flags redundant, consider:
        subs  ... %NZCV<imp-def>        <- CSMI
        csinc ... %NZCV<imp-use,kill>   <- this kill flag isn't valid anymore
        subs  ... %NZCV<imp-def>        <- MI, to be eliminated
        csinc ... %NZCV<imp-use,kill>
      Since we eliminated MI, and reused a register imp-def'd by CSMI
      (here %NZCV), that register, if it was killed before MI, should have
      that kill flag removed, because its lifetime was extended.
      
      Also, add an exhaustive testcase for the motivating example.
      
      Reviewed by: Juergen Ributzka <juergen@apple.com>
      
      llvm-svn: 223133
      54b7d334
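The kill-flag cleanup from 54b7d334 can be sketched on a toy instruction list that mirrors the example in the commit message. The dictionary layout and the function `clear_kill_flags` are hypothetical; this is a simplified model and not the MachineCSE implementation.

```python
def clear_kill_flags(instructions, csmi_index, mi_index, reg):
    """Clear kill flags on uses of `reg` strictly between CSMI and MI."""
    for inst in instructions[csmi_index + 1:mi_index]:
        for use in inst["uses"]:
            if use["reg"] == reg:
                use["kill"] = False


instructions = [
    {"op": "subs", "defs": ["NZCV"], "uses": []},                      # CSMI
    {"op": "csinc", "defs": [], "uses": [{"reg": "NZCV", "kill": True}]},
    {"op": "subs", "defs": ["NZCV"], "uses": []},                      # MI
    {"op": "csinc", "defs": [], "uses": [{"reg": "NZCV", "kill": True}]},
]

# Extend the lifetime of NZCV defined by CSMI, then eliminate MI.
clear_kill_flags(instructions, csmi_index=0, mi_index=2, reg="NZCV")
del instructions[2]
```

After the update, only the last `csinc` still kills NZCV, which matches the extended live range of the value defined by CSMI.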
    • Tim Northover's avatar
      AArch64: make register block rules apply to vector types too. · 24ec87de
      Tim Northover authored
      The blocking code originated in ARM, which is more aggressive about casting
      types to a canonical representative before doing anything else, so I missed out
      most vector HFAs and broke the ABI. This should fix it.
      
      llvm-svn: 223126
      24ec87de
    • Tom Stellard's avatar
      794c8c0f
    • Bruno Cardoso Lopes's avatar
      [LICM] Avoid store sinking if no preheader is available · d035fbb9
      Bruno Cardoso Lopes authored
      Load instructions are inserted into loop preheaders when sinking stores
      and later removed if not used by the SSA updater. Avoid sinking if the
      loop has no preheader and avoid crashes. This fixes one more side effect
      of not handling indirectbr instructions properly on LoopSimplify.
      
      llvm-svn: 223119
      d035fbb9
    • Asiri Rathnayake's avatar
      Add support for ARM modified-immediate assembly syntax. · a0199b9a
      Asiri Rathnayake authored
      Certain ARM instructions accept 32-bit immediate operands encoded as an 8-bit
      integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly
      syntax support in LLVM allows the decoded (32-bit) immediate to be specified
      as a single immediate operand for such instructions:
      
      mov r0, #4278190080
      
      The ARMARM defines an extended assembly syntax allowing the encoding to be made
      more explicit, as in:
      
      mov r0, #255, #8 ; (same 32-bit value as above)
      
      The behaviour of the two instructions can be different w.r.t flags, which is
      documented under "Modified immediate constants" in ARMARM. This patch enables
      support for this extended syntax at the MC layer.
      
      llvm-svn: 223113
      a0199b9a
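The relationship between the two spellings in a0199b9a can be checked with a short sketch of the modified-immediate scheme as the commit message states it: an 8-bit value rotated right by an even amount from 0 to 30. The helper names `ror32` and `encode_modified_immediate` are hypothetical, and the search returns the smallest rotation that works, which is only one of the possible encodings.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_immediate(value):
    """Return (imm8, rotation) with ror32(imm8, rotation) == value, or None."""
    for rotation in range(0, 32, 2):
        # Undo the right rotation by rotating left by the same amount.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


encoding = encode_modified_immediate(4278190080)
```

Here `encoding` is `(255, 8)`, matching the `#255, #8` form shown in the commit message for the same 32-bit value.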
    • Charlie Turner's avatar
      Emit Tag_ABI_FP_denormal correctly in fast-math mode. · 15f91c52
      Charlie Turner authored
      The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
      relevance to this patch is that input denormals are flushed to zero. The way in
      which they're flushed to zero depends on the architecture,
      
        * For VFPv2, it is implementation defined as to whether the sign of zero is
          preserved.
        * For VFPv3 and above, the sign of zero is always preserved when a denormal
          is flushed to zero.
      
      When FP support has been disabled, the strategy taken by this patch is to
      assume the software support will mirror the behaviour of the hardware support
      for the target *if it existed*. That is, for architectures which can only have
      VFPv2, it is assumed the software will flush to positive zero. For later
      architectures it is assumed the software will flush to zero preserving sign.
      
      Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
      llvm-svn: 223110
      15f91c52
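The two flushing behaviours that 15f91c52 distinguishes can be sketched as follows. The sketch uses Python's double-precision floats purely for illustration, and the function `flush_denormal` is hypothetical; it models the numeric effect only, not the build attribute emission.

```python
import math
import sys


def flush_denormal(x, preserve_sign):
    """Flush a denormal input to zero, optionally keeping its sign."""
    is_denormal = x != 0.0 and abs(x) < sys.float_info.min
    if not is_denormal:
        return x
    return math.copysign(0.0, x) if preserve_sign else 0.0


tiny_negative = -5e-324  # smallest-magnitude negative double denormal

sign_preserved = flush_denormal(tiny_negative, preserve_sign=True)
positive_zero = flush_denormal(tiny_negative, preserve_sign=False)
```

Both results compare equal to zero, so the difference is only visible through the sign bit, for example via `math.copysign(1.0, result)`.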
    • Sonam Kumari's avatar
      [signext.ll] Removal Of Duplicate Test Cases · f2eacabd
      Sonam Kumari authored
      Removed the duplicate test case existing in signext.ll file.
      
      llvm-svn: 223109
      f2eacabd
    • Hal Finkel's avatar
      Simplify pointer comparisons involving memory allocation functions · afcd8dbb
      Hal Finkel authored
      System memory allocation functions, which are identified at the IR level by the
      noalias attribute on the return value, must return a pointer into a memory region
      disjoint from any other memory accessible to the caller. We can use this
      property to simplify pointer comparisons between allocated memory and local
      stack addresses and the addresses of global variables. Neither the stack nor
      global variables can overlap with the region used by the memory allocator.
      
      Fixes PR21556.
      
      llvm-svn: 223093
      afcd8dbb
  3. Dec 01, 2014
    • Philip Reames's avatar
      [Statepoints 1/4] Statepoint infrastructure for garbage collection: IR Intrinsics · 337c4bd4
      Philip Reames authored
      The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.
      
      A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact that the statepoint is assumed to clobber all memory, and the optimizer's standard semantics ensure that the relocations flow through IR optimizations correctly.
      
      This is the first patch in a small series.  This patch contains only the IR parts; the documentation and backend support will be following separately.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
      
      Reviewed by: atrick, ributzka
      
      llvm-svn: 223078
      337c4bd4
    • Jingyue Wu's avatar
      [NVPTX] Do not emit .weak symbols for NVPTX · 5b62eb9b
      Jingyue Wu authored
      Summary:
      ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
      weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
      symbols for NVPTX.
      
      Test Plan: weak-linkage.ll
      
      Reviewers: jholewinski
      
      Reviewed By: jholewinski
      
      Subscribers: majnemer, jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D6455
      
      llvm-svn: 223077
      5b62eb9b
    • Reid Kleckner's avatar
      Parse 'ghccc' in .ll files as the GHC convention (cc 10) · 35fc363c
      Reid Kleckner authored
      Previously we just used "cc 10" in the .ll files, but that isn't very
      human readable.
      
      llvm-svn: 223076
      35fc363c
    • Ahmed Bougacha's avatar
      [AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR". · d0ce058f
      Ahmed Bougacha authored
      r208210 introduced an optimization that improves the vector select
      codegen by doing the setcc on vectors directly.
      This is a problem when the setcc operands are i1s, because the
      optimization would create vectors of i1, which aren't legal.
      
      Part of PR21549.
      
      Differential Revision: http://reviews.llvm.org/D6308
      
      llvm-svn: 223075
      d0ce058f
    • Ahmed Bougacha's avatar
      [AArch64] Fix v2i8->i16 bitcast legalization. · 87946320
      Ahmed Bougacha authored
      r213378 improved f16 bitcasts, so that they go directly through subregs,
      instead of through the stack.  That code now causes an assertion failure
      for bitcasts from other 16-bit types (most importantly v2i8).
      
      Correct that by doing the custom lowering for i16 bitcasts only when the
      input is an f16.
      
      Part of PR21549.
      
      Differential Revision: http://reviews.llvm.org/D6307
      
      llvm-svn: 223074
      87946320
    • Peter Zotov's avatar
      [OCaml] Move Llvm.clone_module to its own Llvm_transform_utils module. · 0d040f66
      Peter Zotov authored
      This way most code won't link this (substantially large) library,
      if compiled statically with LLVM.
      
      llvm-svn: 223072
      0d040f66
    • Peter Zotov's avatar
      [OCaml] [cmake] Add CMake buildsystem for OCaml. · b20073c6
      Peter Zotov authored
      Closes PR15325.
      
      llvm-svn: 223071
      b20073c6
    • Ahmed Bougacha's avatar
      [MachineVerifier] Accept a MBB with a single landing pad successor. · fb6eeb74
      Ahmed Bougacha authored
      The MachineVerifier used to check that there was always exactly one
      unconditional branch to a non-landingpad (normal) successor.
      If that normal successor to an invoke BB is unreachable, it seems
      reasonable to only have one successor, the landing pad.
      On targets other than AArch64 (and on AArch64 with a different testcase),
      the branch folder turns the branch to the landing pad into a fallthrough.
      The MachineVerifier, which relies on AnalyzeBranch, is unable to check
      the condition, and doesn't complain. However, it does in this specific
      testcase, where the branch to the landing pad remained.
      Make the MachineVerifier accept it.
      
      llvm-svn: 223059
      fb6eeb74