  1. Feb 07, 2014
  2. Feb 06, 2014
    • [CodeGenPrepare] Move away sign extensions that get in the way of addressing mode. · 3a4bf040
      Quentin Colombet authored
      
      Basically the idea is to transform code like this:
      %idx = add nsw i32 %a, 1
      %sextidx = sext i32 %idx to i64
      %gep = gep i8* %myArray, i64 %sextidx
      load i8* %gep
      
      Into:
      %sexta = sext i32 %a to i64
      %idx = add nsw i64 %sexta, 1
      %gep = gep i8* %myArray, i64 %idx
      load i8* %gep
      
      That way the computation can be folded into the addressing mode.
      
      This transformation is done as part of the addressing mode matcher.
      If the matching fails (not profitable, addressing mode not legal, etc.), the
      matcher will revert the related promotions.
      
      <rdar://problem/15519855>
      
      llvm-svn: 200947
    • Track register pressure a bit more carefully (weird corner case). · 2a15637e
      Andrew Trick authored
      This solves a problem where a def machine operand has no uses but has
      not been marked dead. In this case, the initial RP analysis was being
      extra precise and determining from LiveIntervals that the register was
      actually dead. This caused us to omit the register from the RP
      tracker's block live out. That's all good, but the per-instruction
      summary still accounted for it as a valid def. This could cause an
      assertion in the tracker later when we underflow pressure.
      
      This is from a bug report on an out-of-tree target. It is not
      reproducible on well-behaved targets. I'm just making an obvious fix
      without a unit test.
      
      llvm-svn: 200941
    • Revert r200095 and r200152. It turns out when compiling with -arch armv7... · 91f205bf
      Evan Cheng authored
      Revert r200095 and r200152. It turns out when compiling with -arch armv7 -mcpu=cortex-m3, the triple would still set iOS as the OS so the hack is still needed. rdar://15984891
      
      llvm-svn: 200937
    • R600/SI: Add a MUBUF store pattern for Reg+Imm offsets · e2367945
      Tom Stellard authored
      llvm-svn: 200935
    • R600/SI: Add a MUBUF store pattern for Imm offsets · 2937cbc0
      Tom Stellard authored
      llvm-svn: 200934
    • R600/SI: Add a MUBUF load pattern for Reg+Imm offsets · 11624bc5
      Tom Stellard authored
      llvm-svn: 200933
    • R600/SI: Use immediate offsets for SMRD instructions whenever possible · 044e418f
      Tom Stellard authored
      There was a problem with the old pattern, so we were copying some
      larger immediates into registers when we could have been encoding
      them in the instruction.
      
      llvm-svn: 200932
    • Remove const_cast for STI when parsing inline asm · ea2bcb9e
      David Peixotto authored
      In a previous commit (r199818) we added a const_cast to an existing
      subtarget info instead of creating a new one so that we could reuse
      it when creating the TargetAsmParser for parsing inline assembly.
      This cast was necessary because we needed to reuse the existing STI
      to avoid generating incorrect code when the inline asm contained
      mode-switching directives (e.g. .code 16).
      
      The root cause of the failure was that there was an implicit sharing
      of the STI between the parser and the MCCodeEmitter. To fix a
      different but related issue, we now explicitly pass the STI to the
      MCCodeEmitter (see commits r200345-r200351).
      
      The const_cast is no longer necessary and we can now create a fresh
      STI for the inline asm parser to use.
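      
      As a rough sketch of what this enables (the helper name, parameters, and
      includes below are illustrative assumptions, not the actual AsmPrinter code),
      the inline asm path can now build its own subtarget info instead of
      const_cast'ing the shared one:
      
      #include "llvm/MC/MCSubtargetInfo.h"
      #include "llvm/Support/TargetRegistry.h"
      using namespace llvm;
      
      // Hypothetical helper: hand the inline asm parser its own STI copy rather
      // than a const_cast of the compilation's existing subtarget info.
      static MCSubtargetInfo *createInlineAsmSTI(const Target &TheTarget,
                                                 StringRef TripleName,
                                                 StringRef CPU,
                                                 StringRef Features) {
        return TheTarget.createMCSubtargetInfo(TripleName, CPU, Features);
      }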
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2709
      
      llvm-svn: 200929
    • X86: add costs for 64-bit vector ext/trunc & rebalance · f0e21616
      Tim Northover authored
      The most important part of this is probably adding any cost at all for
      operations like zext <8 x i8> to <8 x i32>. Before they were being
      recorded as extremely costly (24, I believe) which made LLVM fall back
      on a 4-wide vectorisation of a loop.
      
      It also rebalances the values for sext, zext and trunc. Lacking any
      other sane metric that might work across CPU microarchitectures I went
      for instructions. This seems to be in reasonable accord with the rest
      of the table (sitofp, ...) though no doubt at least one value is
      sub-optimal for some bizarre reason.
      
      Finally, separate AVX and AVX2 values are provided where appropriate.
      The CodeGen is quite different in many cases.
      
      rdar://problem/15981990
      
      llvm-svn: 200928
    • Add a -suppress-warnings option to bitcode linking. · e17f3708
      Eli Bendersky authored
      llvm-svn: 200927
    • Yet another patch to reduce compile time for small programs: · efbcf494
      Puyan Lotfi authored
      The aim in this patch is to reduce work that VirtRegRewriter needs to do when
      telling MachineRegisterInfo which physregs are in use. Up until now
      VirtRegRewriter::rewrite has been doing rewriting and populating def info and
      then proceeding to set whether a physreg is used based on this info for every
      physreg that the target provides. This can be expensive when a target has an
      unusually high number of supported physregs, and is a noticeable chunk of
      compile time for small programs on such targets.
      
      So to reduce compile time, this patch simply adds the use of a SparseSet to the
      rewrite function that is used to flag each physreg that is encountered in a
      MachineFunction. Afterward, rather than iterating over the set of all physregs
      for a given target to set the physregs used in MachineRegisterInfo, the new way
      is to iterate over the set of physregs that were actually encountered and set
      in the SparseSet. This improves compile time because the existing rewrite
      function was iterating over all MachineOperands already, and because the
      iterations afterward to setPhysRegUsed are reduced by use of the SparseSet data.
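      
      A minimal sketch of the idea (the function name, loop structure, and includes
      are illustrative assumptions, not the actual VirtRegRewriter::rewrite code,
      and the interfaces are the ones available around this time): flag each
      physreg as it is seen, then only walk that small set when updating
      MachineRegisterInfo.
      
      #include "llvm/ADT/SparseSet.h"
      #include "llvm/CodeGen/MachineFunction.h"
      #include "llvm/CodeGen/MachineInstr.h"
      #include "llvm/CodeGen/MachineRegisterInfo.h"
      #include "llvm/Target/TargetRegisterInfo.h"
      using namespace llvm;
      
      static void markEncounteredPhysRegsUsed(MachineFunction &MF,
                                              MachineRegisterInfo &MRI,
                                              const TargetRegisterInfo &TRI) {
        SparseSet<unsigned> Seen;
        Seen.setUniverse(TRI.getNumRegs());
      
        // One pass over the operands the rewriter was already visiting anyway.
        for (MachineFunction::iterator MBB = MF.begin(), MBBE = MF.end();
             MBB != MBBE; ++MBB)
          for (MachineBasicBlock::instr_iterator MI = MBB->instr_begin(),
                                                 MIE = MBB->instr_end();
               MI != MIE; ++MI)
            for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
              const MachineOperand &MO = MI->getOperand(i);
              if (MO.isReg() && TargetRegisterInfo::isPhysicalRegister(MO.getReg()))
                Seen.insert(MO.getReg()); // cheap, deduplicated insert
            }
      
        // Iterate only over the physregs that actually appeared, not all of them.
        for (SparseSet<unsigned>::const_iterator I = Seen.begin(), E = Seen.end();
             I != E; ++I)
          MRI.setPhysRegUsed(*I);
      }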
      
      llvm-svn: 200919
    • X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes · 546b57b0
      Tim Northover authored
      I believe VZEXT_MOVL means "zero all vector elements except the first" (and
      should have identical input & output types) whereas VZEXT means "zero extend
      each element of a vector (discarding higher elements if necessary)".
      
      For example:
          (v4i32 (vzext (v16i8 ...)))
      
      should zero extend the low 4 bytes of the incoming vector to 32-bits,
      discarding higher bytes.
      
      However, somewhere in the past, these two concepts had become confused, even
      leading to a nonsensical VSEXT_MOVL.
      
      This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
      -> VZEXT when it's an actual extension).
      
      rdar://problem/15981990
      
      llvm-svn: 200918
    • The following patch's purpose is to reduce compile time for compilation of small programs on targets with large register files. · 5eb10048
      Puyan Lotfi authored
      The root of the compile time
      overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
      resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
      std::vector uses the faster __platform_bzero to zero out primitive buffers when
      assign is called, while SmallVector uses an iterator.
      
      The fix for this was simply to replace the SmallVector with a dynamically
      allocated buffer and to initialize or reinitialize the buffer based on the
      total registers that the target architecture requires. The changes support
      cases where a pass manager may be reused for different targets, and note that
      the PhysRegEntries is allocated using calloc mainly for good form, and also to
      quiet tools like Valgrind (see comments for more info on this).
      
      There is an rdar to track the fact that SmallVector doesn't have platform
      specific speedup optimizations inside of it for things like this, and I'll
      create a bugzilla entry at some point soon as well.
      
      TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
      char>::assign(N, 0) with a call to calloc for N bytes which is much faster
      because SmallVector's assign uses iterators.
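      
      A standalone illustration of the two strategies (the helper names are
      assumptions for illustration only; the real change is to the buffer backing
      PhysRegEntries, not this code):
      
      #include <cstdlib>
      #include "llvm/ADT/SmallVector.h"
      using namespace llvm;
      
      static void initWithSmallVector(unsigned NumRegs) {
        SmallVector<unsigned char, 16> Entries;
        Entries.assign(NumRegs, 0); // element-by-element fill through an iterator
      }
      
      static unsigned char *initWithCalloc(unsigned NumRegs) {
        // calloc returns NumRegs zeroed bytes up front; it also keeps tools like
        // Valgrind quiet because the memory is defined from the start.
        return static_cast<unsigned char *>(std::calloc(NumRegs, 1));
      }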
      
      llvm-svn: 200917
    • This small change reduces compile time for small programs on targets that have large register files. · 12ae04bd
      Puyan Lotfi authored
      The omission of Queries.clear() is perfectly safe because
      LiveIntervalUnion::Query doesn't contain any data that needs freeing and
      because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
      holding Queries every time it is run, so there's no need to zero out the
      queries either. Not having to do this for very large numbers of physregs
      is a noticeable constant cost reduction in compilation of small programs.
      
      llvm-svn: 200913
    • Nick Lewycky · 99384949
    • Delete all of the CodeGenInstructions from CodeGenTarget destructor. · f1aab450
      Craig Topper authored
      llvm-svn: 200906
    • [PM] Fix horrible typos that somehow didn't cause a failure in a C++11 build but spectacularly changed behavior of the C++98 build. =] · d1ba2efb
      Chandler Carruth authored
      
      This shows my one problem with not having unittests -- basic API
      expectations aren't well exercised by the integration tests because they
      *happen* to not come up, even though they might later. I'll probably add
      a basic unittest to complement the integration testing later, but
      I wanted to revive the bots.
      
      llvm-svn: 200905
    • [PM] Add a new "lazy" call graph analysis pass for the new pass manager. · bf71a34e
      Chandler Carruth authored
      The primary motivation for this pass is to separate the call graph
      analysis used by the new pass manager's CGSCC pass management from the
      existing call graph analysis pass. That analysis pass is (somewhat
      unfortunately) over-constrained by the existing CallGraphSCCPassManager
      requirements. Those requirements make it *really* hard to cleanly layer
      the needed functionality for the new pass manager on top of the existing
      analysis.
      
      However, there are also a bunch of things that the pass manager would
      specifically benefit from doing differently from the existing call graph
      analysis, and this new implementation tries to address several of them:
      
      - Be lazy about scanning function definitions. The existing pass eagerly
        scans the entire module to build the initial graph. This new pass is
        significantly more lazy, and I plan to push this even further to
        maximize locality during CGSCC walks.
      - Don't use a single synthetic node to partition functions with an
        indirect call from functions whose address is taken. This node creates
        a huge choke-point which would preclude good parallelization across
        the fanout of the SCC graph when we got to the point of looking at
        such changes to LLVM.
      - Use a memory dense and lightweight representation of the call graph
        rather than value handles and tracking call instructions. This will
        require explicit update calls instead of some updates working
        transparently, but should end up being significantly more efficient.
        The explicit update calls ended up being needed in many cases for the
        existing call graph so we don't really lose anything.
      - Doesn't explicitly model SCCs and thus doesn't provide an "identity"
        for an SCC which is stable across updates. This is essential for the
        new pass manager to work correctly.
      - Only form the graph necessary for traversing all of the functions in
        an SCC friendly order. This is a much simpler graph structure and
        should be more memory dense. It does limit the ways in which it is
        appropriate to use this analysis. I wish I had a better name than
        "call graph". I've commented extensively this aspect.
      
      This is still very much a WIP, in fact it is really just the initial
      bits. But it is about the fourth version of the initial bits that I've
      implemented with each of the others running into really frustrating
      problems. This looks like it will actually work and I'd like to split the
      actual complexity across commits for the sake of my reviewers. =] The
      rest of the implementation along with lots of wiring will follow
      somewhat more rapidly now that there is a good path forward.
      
      Naturally, this doesn't impact any of the existing optimizer. This code
      is specific to the new pass manager.
      
      A bunch of thanks are deserved for the various folks that have helped
      with the design of this, especially Nick Lewycky who actually sat with
      me to go through the fundamentals of the final version here.
      
      llvm-svn: 200903
    • [PM] Back out one hunk of the patch in r200901 that was *supposed* to go in my next patch. Sorry for the breakage. · e309d376
      Chandler Carruth authored
      
      llvm-svn: 200902
    • [PM] Wire up the analysis managers in the opt driver. · c68d0824
      Chandler Carruth authored
      This isn't really necessary until we add analyses to the driver, but I have such an
      analysis ready and wanted to split this out. This is actually exercised
      by the existing tests of the new pass manager as the analysis managers
      are cross-checked and validated by the function and module managers.
      
      llvm-svn: 200901
    • [DAG] Don't pull the binary operation through the shift if the operands have opaque constants. · fa0eba6c
      Juergen Ributzka authored
      During DAGCombine visitShiftByConstant assumes that certain binary operations
      with only constant operands can always be folded successfully. This is no longer
      true when the constant is opaque. This commit fixes visitShiftByConstant by not
      performing the optimization for opaque constants. Otherwise we would end up in
      an infinite DAGCombine loop.
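      
      A hedged sketch of a guard in the spirit of this fix (the helper name and
      shape are assumptions, not the exact DAGCombiner code):
      
      #include "llvm/CodeGen/SelectionDAGNodes.h"
      using namespace llvm;
      
      // Opaque constants are deliberately kept unfolded, so pulling the binop
      // through the shift would just be undone and the combine would loop forever.
      static bool canPullBinopThroughShift(SDValue BinOp) {
        ConstantSDNode *C = dyn_cast<ConstantSDNode>(BinOp.getOperand(1));
        return C && !C->isOpaque();
      }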
      
      llvm-svn: 200900
    • Set default of inlinecold-threshold to 225. · d4612449
      Manman Ren authored
      225 is the default value of inline-threshold. This change will make sure
      we have the same inlining behavior as prior to r200886.
      
      As Chandler points out, even though we don't have code in our testing
      suite that uses the cold attribute, there are larger applications that do
      use the cold attribute.
      
      r200886 + this commit intend to keep the same behavior as prior to r200886.
      We can later on tune the inlinecold-threshold.
      
      The main purpose of r200886 is to help performance of instrumentation based
      PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
      For instrumentation based PGO, we try to increase inlining of hot functions and
      reduce inlining of cold functions by setting inlinecold-threshold.
      
      Another option suggested by Chandler is to use a boolean flag that controls
      if we should use OptSizeThreshold for cold functions. The default value
      of the boolean flag should not change the current behavior. But it gives us
      less freedom in controlling inlining of cold functions.
      
      llvm-svn: 200898
    • Update the X86 assembler for .intel_syntax to accept the << and >> bitwise operators. · d6b10713
      Kevin Enderby authored
      
      rdar://15975725
      
      llvm-svn: 200896
    • don't set HasReliableSymbolDifference for ELF. · 6a383f9a
      Rafael Espindola authored
      It is only used in MachObjectWriter.cpp. Another leftover from early days
      of ELF in MC.
      
      llvm-svn: 200895
    • doesSectionRequireSymbols is meaningless on ELF, remove. · 12f04984
      Rafael Espindola authored
      This is a nop. doesSectionRequireSymbols is only used from
      isSymbolLinkerVisible. isSymbolLinkerVisible's only use from ELF was in
      
      if (!Asm.isSymbolLinkerVisible(Symbol) && !Symbol.isUndefined())
        return false;
      
      if (Symbol.isTemporary())
        return false;
      
      If the symbol is a temporary this code returns false and it is irrelevant if
      we take the first if or not. If the symbol is not a temporary,
      Asm.isSymbolLinkerVisible returns true without ever calling
      doesSectionRequireSymbols.
      
      This was a horrible leftover from when support for ELF was first added.
      
      llvm-svn: 200894
    • Disable most IR-level transform passes on functions marked 'optnone'. · af4e64d0
      Paul Robinson authored
      Ideally only those transform passes that run at -O0 remain enabled;
      in reality we get as close as we reasonably can.
      Passes are responsible for disabling themselves; it's not the job of
      the pass manager to do it for them.
      
      llvm-svn: 200892
    • Just returning false is the default. · 4998280f
      Rafael Espindola authored
      llvm-svn: 200890
    • Pass address space to allowsUnalignedMemoryAccesses · 1b55dd9a
      Matt Arsenault authored
      llvm-svn: 200888
    • Add address space argument to allowsUnalignedMemoryAccess. · 25793a3f
      Matt Arsenault authored
      On R600, some address spaces have more strict alignment
      requirements than others.
      
      llvm-svn: 200887
  3. Feb 05, 2014
    • Inliner uses a smaller inline threshold for callees with cold attribute. · e8781b1a
      Manman Ren authored
      Added command line option inlinecold-threshold to set the threshold for inlining
      functions with the cold attribute. Listen to the cold attribute when it would
      decrease the inline threshold.
      
      llvm-svn: 200886
    • Fix layering StringRef copy using BumpPtrAllocator. · 4d6d9812
      Nick Kledzik authored
      Now to copy a string into a BumpPtrAllocator and get a StringRef to the copy:
      
         StringRef myCopy = myStr.copy(myAllocator);
         
      
      llvm-svn: 200885
    • [RegAlloc] Add a last chance recoloring mechanism when everything else failed to find a register. · 87769713
      Quentin Colombet authored
      
      The idea is to choose a color for the variable that cannot be allocated and
      recolor its interferences around. Unlike the current register allocation scheme,
      it is allowed to change the color of an already assigned (but maybe not
      splittable or spillable) live interval while propagating this change to its
      neighbors.
      In other words, there are two things that may help find an available color:
      - Already assigned variables (RS_Done) can be recolored to a different color.
      - The recoloring allows us to catch solutions that need to touch more than just
        the neighbors of the current allocated variable.
      
      E.g.,
      vA can use {R1, R2    }
      vB can use {    R2, R3}
      vC can use {R1        }
      Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
      they all interfere.
      
      vA is assigned R1
      vB is assigned R2
      vC tries to evict vA but vA is already done.
      => Regular register allocation heuristic fails.
      
      Last chance recoloring kicks in:
      vC does as if vA was evicted => vC uses R1.
      vC is marked as fixed.
      vA needs to find a color.
      None are available.
      vA cannot evict vC: vC is a fixed virtual register now.
      vA does as if vB was evicted => vA uses R2.
      vB needs to find a color.
      R3 is available.
      Recoloring => vC = R1, vA = R2, vB = R3.
      
      <rdar://problem/15947839>
      
      llvm-svn: 200883
    • [PM] Don't require analysis results to be const in the new pass manager. · eedf9fca
      Chandler Carruth authored
      I think this was just over-eagerness on my part. The analysis results
      need to often be non-const because they need to (in some cases at least)
      be updated by the transformation pass in order to remain correct. It
      also makes lazy analyses (a common case) needlessly annoying to write in
      order to make their entire state mutable.
      
      llvm-svn: 200881
    • Remove support for not using .loc directives. · b4eec1da
      Rafael Espindola authored
      Clang itself was not using this. The only way to access it was via llc.
      
      llvm-svn: 200862
    • Revert "Fix an invalid check for duplicate option categories." · 0bca63a3
      Rafael Espindola authored
      This reverts commit r200853.
      
      It was causing clang/Analysis/checker-plugins.c to crash.
      
      llvm-svn: 200858
    • [mips] Add NaCl target and forbid indexed loads and stores for it · 9725016a
      Petar Jovanovic authored
      This patch adds NaCl target for Mips. It also forbids indexed loads and
      stores if the target is NaCl.
      
      Patch by Sasa Stankovic.
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2690
      
      llvm-svn: 200855