  1. Feb 03, 2017
    • [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
      ISD::DELETED_NODE && "NodeToMatch was removed partway through
      selection"' failed. · a0d9f258
      Alexey Bataev authored
      
      NodeToMatch can be modified during matching, but the code does not
      handle this situation.
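
      The shape of the fix can be illustrated with a hedged sketch (this is
      not the actual SelectionDAGISel change): after matcher steps that may
      replace or CSE nodes, the matched node has to be re-checked before it
      is used again.

        // Illustrative sketch only; the real matcher loop is far larger.
        #include "llvm/CodeGen/SelectionDAGNodes.h"
        using namespace llvm;

        // If a replacement morphed or CSE'd NodeToMatch, bail out gracefully
        // instead of tripping the "removed partway through selection" assert.
        static bool nodeWasDeleted(const SDNode *NodeToMatch) {
          return NodeToMatch->getOpcode() == ISD::DELETED_NODE;
        }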
      
      Differential Revision: https://reviews.llvm.org/D29292
      
      llvm-svn: 294003
    • DebugInfo: ensure type and namespace names are included in pubnames/pubtypes... · a0e3c751
      David Blaikie authored
      DebugInfo: ensure type and namespace names are included in pubnames/pubtypes even when they are only present in type units
      
      While looking to add support for placing singular types (types that
      will only be emitted in one place, such as those attached to a strong
      vtable or an explicit template instantiation definition) outside type
      units (since type units have overhead), I stumbled across that change
      causing an increase in pubtypes.
      
      Turns out we were missing some types from type units if they were only
      referenced from other type units and not from the debug_info section.
      
      This fixes that, following GCC's line of describing the offset of such
      entities as the CU die (since there's no compile unit-relative offset
      that would describe such an entity - they aren't in the CU). Also like
      GCC, this change prefers to describe the type stub within the CU rather
      than the "just use the CU offset" fallback where possible. This may give
      the DWARF consumer some opportunity to find the extra info in the type
      stub - though I'm not sure GDB does anything with this currently.
      
      The size of the pubnames/pubtypes sections now match exactly with or
      without type units enabled.
      
      This nearly triples (+189%) the pubtypes section for a clang self-host
      and grows pubnames by 0.07% (without compression), for a total 8%
      increase in the debug info sections of the objects of a Split DWARF
      build when using type units.
      
      llvm-svn: 293971
    • [lto] add getLinkerOpts() · dd4ebc1d
      Bob Haarman authored
      Summary: Some compilers, including MSVC and Clang, allow linker
      options to be specified in source files. In the legacy LTO API, there
      is a getLinkerOpts() method that returns linker options for the
      bitcode module being processed. This change adds that method to the
      new API, so that the COFF linker can get the right linker options when
      using the new LTO API.
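
      As a hedged sketch of what "linker options for a bitcode module" means
      in practice, the snippet below reads the !llvm.linker.options named
      metadata; that storage location is an assumption for illustration, not
      necessarily what the legacy or new LTO API consulted at the time.

        // Sketch only: gather linker options embedded in a module; not the
        // actual lto:: implementation.
        #include "llvm/IR/Metadata.h"
        #include "llvm/IR/Module.h"
        #include <string>
        #include <vector>

        std::vector<std::string> collectLinkerOpts(const llvm::Module &M) {
          std::vector<std::string> Opts;
          const llvm::NamedMDNode *MD =
              M.getNamedMetadata("llvm.linker.options");
          if (!MD)
            return Opts;
          for (const llvm::MDNode *Entry : MD->operands())
            for (const llvm::MDOperand &Op : Entry->operands())
              if (const auto *Str = llvm::dyn_cast<llvm::MDString>(Op))
                Opts.push_back(Str->getString().str());
          return Opts;
        }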
      
      Reviewers: pcc, ruiu, mehdi_amini, tejohnson
      
      Reviewed By: pcc
      
      Differential Revision: https://reviews.llvm.org/D29207
      
      llvm-svn: 293950
  2. Feb 02, 2017
    • [CodeGen] Remove dead call-or-prologue enum from CCState · c35139ec
      Reid Kleckner authored
      This enum has been dead since Olivier Stannard re-implemented ARM byval
      handling in r202985 (2014).
      
      llvm-svn: 293943
    • [PGO] internal option cleanups · 58fcc9bd
      Xinliang David Li authored
      1. Added comments for options
      2. Added missing option cl::desc field
      3. Unified function filter option for graph viewing.
         Now PGO count/raw-counts share the same
         filter option: -view-bfi-func-name=.
      
      llvm-svn: 293938
    • [LiveRangeEdit] Don't mess up with LiveInterval when a new vreg is created. · 5725f56b
      Quentin Colombet authored
      In r283838, we added the capability of splitting unspillable
      registers. When doing so we had to make sure the split live-ranges
      were also unspillable, and we did that by marking the related
      live-ranges in the delegate method that is called when a new vreg is
      created.
      However, by accessing the live-ranges there, we also triggered their
      lazy computation (LiveIntervalAnalysis::getInterval), which is not
      what we want in general. Indeed, later code in LiveRangeEdit is going
      to build the live-ranges, and this lazy computation may mess up that
      computation, resulting in assertion failures. Namely, the
      createEmptyIntervalFrom method expects that the live-range is going to
      be empty, not computed.
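
      A minimal sketch of the pitfall, with hypothetical names (this is not
      LLVM's delegate interface or the actual fix): defer the marking rather
      than querying the lazily computed interval inside the callback.

        // Hypothetical illustration: record new vregs in the callback and
        // mark them unspillable later, instead of querying the live-range
        // there and forcing its lazy computation.
        #include <vector>

        class SplitDelegate {
          std::vector<unsigned> NewVRegs; // vregs created while splitting

        public:
          // Called when a new vreg is created during splitting.
          void onNewVReg(unsigned VReg) {
            NewVRegs.push_back(VReg); // no live-range query here
          }

          // Invoked after LiveRangeEdit has built the split live-ranges.
          template <typename MarkUnspillableFn>
          void flush(MarkUnspillableFn MarkUnspillable) {
            for (unsigned VReg : NewVRegs)
              MarkUnspillable(VReg);
            NewVRegs.clear();
          }
        };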
      
      Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and
      reporting the problem.
      
      llvm-svn: 293934
    • [PGO] make graph view internal options available for all builds · 1eb4ec6a
      Xinliang David Li authored
      Differential Revision: https://reviews.llvm.org/D29259
      
      llvm-svn: 293921
    • Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · 93f9d5ce
      Nirav Dave authored
      This reverts commit r293893 which is miscompiling lua on ARM and
      bootstrapping for x86-windows.
      
      llvm-svn: 293915
    • Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC · f3e421d6
      Amaury Sechet authored
      llvm-svn: 293903
    • In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 4442667f
      Nirav Dave authored
          Recommiting after fixing X86 inc/dec chain bug.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          a simplified store-merging search and chain alias analysis which
          only checks for parallel stores through the chain subgraph. This
          is cleaner, as it separates the handling of non-interfering
          loads/stores from the store-merging logic.
      
          When merging stores, we search up the chain through a single load,
          and find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
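
          A simplified sketch of this kind of chain walk (illustrative only,
          not the actual DAGCombiner code; real code also tracks visited
          nodes and checks for interfering memory operations):

            #include "llvm/ADT/SmallVector.h"
            #include "llvm/CodeGen/SelectionDAG.h"
            #include "llvm/CodeGen/SelectionDAGNodes.h"
            using namespace llvm;

            // Collect stores that are parallel on the chain: look through
            // TokenFactors and single loads, recording every store reached.
            static void collectChainStores(SDValue Chain,
                                           SmallVectorImpl<StoreSDNode *> &Stores) {
              if (Chain.getOpcode() == ISD::TokenFactor) {
                // All operands of a TokenFactor are parallel chains.
                for (const SDValue &Op : Chain->op_values())
                  collectChainStores(Op, Stores);
              } else if (auto *St = dyn_cast<StoreSDNode>(Chain.getNode())) {
                Stores.push_back(St);
              } else if (auto *Ld = dyn_cast<LoadSDNode>(Chain.getNode())) {
                // Search up the chain through a single load, as described.
                collectChainStores(Ld->getChain(), Stores);
              }
            }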
      
          This improves the quality of the output SelectionDAG and the
          output Codegen (save perhaps for some ARM cases where we correctly
          construct wider loads, but then promote them to float operations
          which require more expensive constant generation).
      
          Some minor peephole optimizations deal with the improved SubDAG
          shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes, as these are captured by data dependence.

            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} Values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit, as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 293893
    • RegisterCoalescer: Cleanup joinReservedPhysReg(); NFC · 9dc3b5ff
      Matthias Braun authored
      - Factor out a common subexpression
      - Add some helpful comments
      - Fix printing of a register in a debug message
      
      llvm-svn: 293856
    • Remove an assertion that doesn't hold when mixing -g and -gmlt through · 5362216c
      Paul Robinson authored
      LTO.  Replace it with a related assertion, ensuring that abstract
      variables appear only in abstract scopes.
      Part of PR31437.
      
      Differential Revision: http://reviews.llvm.org/D29430
      
      llvm-svn: 293841
  3. Feb 01, 2017
    • Change debug-info-for-profiling from a TargetOption to a function attribute. · 0944a8c2
      Dehao Chen authored
      Summary: LTO requires the debug-info-for-profiling to be a function attribute.
      
      Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl
      
      Reviewed By: mehdi_amini, dblaikie, aprantl
      
      Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini
      
      Differential Revision: https://reviews.llvm.org/D29203
      
      llvm-svn: 293833
    • Remove an assertion that doesn't hold when mixing -g and -gmlt through · a380e613
      Paul Robinson authored
      LTO.  Part of PR31437.
      
      Differential Revision: http://reviews.llvm.org/D29310
      
      llvm-svn: 293818
    • [ImplicitNullCheck] Extend canReorder scope · 08da2e28
      Sanjoy Das authored
      Summary:
      This change allows re-ordering of two instructions whose uses
      overlap.
      
      Patch by Serguei Katkov!
      
      Reviewers: reames, sanjoy
      
      Reviewed By: sanjoy
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29120
      
      llvm-svn: 293775
    • [legalizetypes] Push fp16 -> fp32 extension node to worklist. · 7a5ec55f
      Florian Hahn authored
      Summary:
      This way, the type legalization machinery will take care of registering
      the result of this node properly.
      
      This patch fixes all fp16 test cases that fail with expensive checks
      enabled (CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll,
      CodeGen/X86/cvt16.ll, CodeGen/X86/soft-fp.ll).
      
      
      Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28195
      
      llvm-svn: 293765
    • [CodeGen] Move MacroFusion to the target · 94edf029
      Evandro Menezes authored
      This patch moves the class for scheduling adjacent instructions,
      MacroFusion, to the target.
      
      In AArch64, it also expands the fusion to all instruction pairs in a
      scheduling block, beyond just among the predecessors of the branch at
      the end.
      
      Differential revision: https://reviews.llvm.org/D28489
      
      llvm-svn: 293737
    • [ImplicitNullCheck] NFC isSuitableMemoryOp cleanup · 15e50b51
      Sanjoy Das authored
      Summary:
      The isSuitableMemoryOp method is responsible for verifying that an
      instruction is a candidate for use in an implicit null check.
      Additionally, it checks that the base register has not been re-defined
      earlier. If the base has been re-defined, it just returns false and
      the lookup continues, even though no subsequent instruction can pass
      this check either. This results in redundant further operations.

      So when we find that the base register has been re-defined, we just
      stop.
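
      A schematic sketch of the control-flow change, with hypothetical names
      (the real isSuitableMemoryOp operates on MachineInstr and register
      operands): return a three-way result so the caller can tell "keep
      scanning" apart from "stop scanning entirely".

        // Hypothetical illustration: distinguish "this instruction is
        // unsuitable, keep looking" from "the base register was re-defined,
        // so no later instruction can be suitable either; stop the scan".
        enum class Suitability { Suitable, Unsuitable, Impossible };

        Suitability classify(bool IsCandidateMemOp, bool BaseRegRedefined) {
          if (BaseRegRedefined)
            return Suitability::Impossible;          // caller stops scanning
          return IsCandidateMemOp ? Suitability::Suitable
                                  : Suitability::Unsuitable; // keep scanning
        }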
      
      Patch by Serguei Katkov!
      
      Reviewers: reames, sanjoy
      
      Reviewed By: sanjoy
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29119
      
      llvm-svn: 293736
    • Fix regalloc assignment of overlapping registers · 70c245e9
      Stanislav Mekhanoshin authored
      SplitEditor::defFromParent() can create a register copy.
      If the register is a tuple of other registers and not all lanes are
      used, the copy is nevertheless done on the full tuple. Later, the
      register unit for an unused lane will be considered free and another
      overlapping register tuple can be assigned to a different value, even
      though the first register is live at that point. That is because
      interference only looks at liveness info, while a full register copy
      clobbers all lanes, even unused ones.

      This patch fixes the copy to only cover the used lanes.
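
      A hedged sketch of the idea with illustrative names (the real change
      is in SplitEditor::defFromParent and LLVM represents lanes with
      LaneBitmask): restrict the copy to the lanes that actually carry live
      values instead of copying the full tuple.

        // Illustrative only: plan sub-register copies for the used lanes,
        // so register units of unused lanes are not clobbered and stay
        // genuinely free for other values.
        #include <cstdint>
        #include <vector>

        struct SubRegCopy { unsigned SubIdx; };

        std::vector<SubRegCopy>
        planCopies(uint64_t UsedLanes,
                   const std::vector<uint64_t> &LaneMaskOfSubIdx) {
          std::vector<SubRegCopy> Copies;
          for (unsigned SubIdx = 0; SubIdx != LaneMaskOfSubIdx.size(); ++SubIdx)
            if (LaneMaskOfSubIdx[SubIdx] & UsedLanes) // lane is live: copy it
              Copies.push_back({SubIdx});
          return Copies;
        }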
      
      Differential Revision: https://reviews.llvm.org/D29105
      
      llvm-svn: 293728
    • CodeGen: Allow small copyable blocks to "break" the CFG. · b15c0667
      Kyle Butt authored
      When choosing the best successor for a block, we would ordinarily
      prefer a block that preserves the CFG unless there is a strong
      probability in the other direction. For small blocks that can be
      duplicated we now skip that requirement as well, subject to some
      simple frequency calculations.
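
      A hedged sketch of the kind of frequency check involved (illustrative
      names and a deliberately simplified cost model, not the actual
      MachineBlockPlacement heuristic):

        #include <cstdint>

        // Allow a small, duplicable successor to "break" the CFG-preserving
        // choice only when the edge into it is hotter than the alternative.
        bool preferDuplicableSuccessor(uint64_t FreqEdgeToDupCandidate,
                                       uint64_t FreqEdgeToCFGPreserving,
                                       unsigned CandidateSizeInInstrs,
                                       unsigned TailDupSizeLimit) {
          if (CandidateSizeInInstrs > TailDupSizeLimit)
            return false;                        // too big to duplicate
          return FreqEdgeToDupCandidate > FreqEdgeToCFGPreserving;
        }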
      
      Differential Revision: https://reviews.llvm.org/D28583
      
      llvm-svn: 293716
  4. Jan 31, 2017
    • GlobalISel: the translation of an invoke must branch to the good block. · c6bfa481
      Tim Northover authored
      Otherwise bad things happen if the basic block order isn't trivial after an
      invoke.
      
      llvm-svn: 293679
    • InterleaveAccessPass: Avoid constructing invalid shuffle masks · 01fa9622
      Matthias Braun authored
      Fix a bug where we would construct shufflevector instructions addressing
      invalid elements.
      
      Differential Revision: https://reviews.llvm.org/D29313
      
      llvm-svn: 293673
    • GlobalISel: merge invoke and call translation paths. · 293f7435
      Tim Northover authored
      Well, sort of. But the lower-level code that invoke used to be using completely
      botched the handling of varargs functions, which hopefully won't be possible if
      they're using the same code.
      
      llvm-svn: 293670
    • [X86] Implement -mfentry · a7c041d1
      Nirav Dave authored
      Summary: Insert calls to __fentry__ at function entry.
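
      For illustration (schematic, not verified compiler output): unlike the
      older mcount-style instrumentation, the __fentry__ call is emitted
      before the prologue, and the symbol itself is expected to come from a
      profiling runtime, typically written in assembly so that it preserves
      all registers.

        // Schematic expectation for a function built with -mfentry:
        //
        //   foo:
        //     callq __fentry__     # inserted at function entry
        //     pushq %rbp           # normal prologue follows
        //     ...
        extern "C" void __fentry__(); // provided by the profiling runtime

        int foo(int x) { return x + 1; }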
      
      Reviewers: hfinkel, craig.topper
      
      Subscribers: mgorny, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28000
      
      llvm-svn: 293648
    • [DAGCombine] require UnsafeFPMath for re-association of addition · 8813d5d2
      Nicolai Haehnle authored
      Summary:
      The affected transforms all implicitly use associativity of addition,
      for which we usually require unsafe math to be enabled.
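
      For context, a standard floating-point fact (not taken from the patch)
      shows why re-associating fadd is gated on unsafe math:

        #include <cstdio>

        int main() {
          float a = 1e20f, b = -1e20f, c = 1.0f;
          // (a + b) + c == 1, but a + (b + c) == 0: re-association changes
          // the result, so it is only legal under unsafe-math.
          std::printf("%g %g\n", (a + b) + c, a + (b + c));
        }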
      
      The "Aggressive" flag is only meant to convey information about the
      performance of the fused ops relative to a fmul+fadd sequence.
      
      Fixes Bug 31626.
      
      Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD
      
      Subscribers: jholewinski, nemanjai, wdng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28675
      
      llvm-svn: 293635
    • [ExecutionDepsFix] Improve clearance calculation for loops · 578cf7aa
      Keno Fischer authored
      Summary:
      In revision rL278321, ExecutionDepsFix learned how to pick a better
      register for undef register reads, e.g. for instructions such as
      `vcvtsi2sdq`. While this revision improved performance on a good number
      of our benchmarks, it unfortunately also caused significant regressions
      (up to 3x) on others. This regression turned out to be caused by loops
      such as:
      
      PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
            ^                                  |
            +----------------------------------+
      
      In the previous version of the clearance calculation, we would visit
      the blocks in order, remembering for each whether there were any
      incoming backedges from blocks that we hadn't processed yet and if
      so queuing up the block to be re-processed. However, for loop structures
      such as the above, this is clearly insufficient, since the block B
      does not have any unknown backedges, so we do not see the false
      dependency from the previous iteration's Def of xmm registers in B.
      
      To fix this, we need to consider all blocks that are part of the loop
      and reprocess them once the correct clearance values are known. As
      an optimization, we also want to avoid reprocessing any later blocks
      that are not part of the loop.
      
      In summary, the iteration order is as follows:
      Before: PH A B C D A'
      Corrected (Naive): PH A B C D A' B' C' D'
      Corrected (w/ optimization): PH A B C A' B' C' D
      
      To facilitate this optimization we introduce two new counters for each
      basic block. The first counts how many of its predecessors have
      completed primary processing. The second counts how many of its
      predecessors have completed all processing (we will call such a block
      *done*). Now, the criterion for reprocessing a block is as follows:
          - All predecessors have completed primary processing.
          - For x the number of predecessors that had completed primary
            processing *at the time of primary processing of this block*,
            the number of predecessors that are done has reached x.
      
      The intuition behind this criterion is as follows:
      We need to perform primary processing on all predecessors in order to
      find out any direct defs in those predecessors. When predecessors are
      done, we also know that we have information about indirect defs (e.g.
      defs in block B that were inherited through B->C->A->B). However,
      we can't wait for all predecessors to be done, since that would
      cause cyclic dependencies. It is guaranteed, though, that all those
      predecessors that are prior to us in reverse postorder will be done
      before us. Since we iterate over the basic blocks in reverse
      postorder, the number x above is precisely the count of predecessors
      prior to us in reverse postorder.
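
      A schematic sketch of this bookkeeping, with hypothetical names rather
      than the actual ExecutionDepsFix data structures:

        struct BlockInfo {
          unsigned NumPreds = 0;
          unsigned PredsPrimaryDone = 0;  // preds that finished primary pass
          unsigned PredsAllDone = 0;      // preds that are fully *done*
          unsigned PrimaryDoneSnapshot = 0; // PredsPrimaryDone, captured when
                                            // this block ran its primary pass
          bool PrimaryProcessed = false;
        };

        // Reprocess once every predecessor has had its primary pass and the
        // number of *done* predecessors has reached the snapshot value x.
        bool shouldReprocess(const BlockInfo &BB) {
          return BB.PrimaryProcessed &&
                 BB.PredsPrimaryDone == BB.NumPreds &&
                 BB.PredsAllDone >= BB.PrimaryDoneSnapshot;
        }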
      
      Reviewers: myatsina
      Differential Revision: https://reviews.llvm.org/D28759
      
      llvm-svn: 293571