  1. Mar 01, 2017
    • Reid Kleckner's avatar
      Elide argument copies during instruction selection · f7c0980c
      Reid Kleckner authored
      Summary:
      Avoids tons of prologue boilerplate when arguments are passed in memory
      and left in memory. This can happen in a debug build or in a release
      build when an argument alloca is escaped.  This will dramatically affect
      the code size of x86 debug builds, because X86 fast isel doesn't handle
      arguments passed in memory at all. It only handles the x86_64 case of up
      to 6 basic register parameters.
      
      This is implemented by analyzing the entry block before ISel to identify
      copy elision candidates. A copy elision candidate is an argument that is
      used to fully initialize an alloca before any other possibly escaping
      uses of that alloca. If an argument is a copy elision candidate, we set
      a flag on the InputArg. If the target generates loads from a fixed
      stack object that matches the size and alignment requirements of the
      alloca, the SelectionDAG builder will delete the stack object created
      for the alloca and replace it with the fixed stack object. The load is
      left behind to satisfy any remaining uses of the argument value. The
      store is now dead and is therefore elided. The fixed stack object is
      also marked as mutable, as it may now be modified by the user, and it
      would be invalid to rematerialize the initial load from it.
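      The entry-block scan described above can be sketched abstractly. The
      following is a hypothetical, heavily simplified model (the `Inst`,
      `Kind`, and `findElisionStore` names are illustrative, not LLVM's
      actual analysis): an argument is a candidate only if the first
      instruction touching the alloca is a full-size store of that argument.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <optional>
      #include <string>
      #include <vector>

      // Simplified stand-in for an entry-block instruction; only the fields
      // needed for the candidate test are modeled.
      enum class Kind { StoreArgToAlloca, EscapingUse, Other };

      struct Inst {
        Kind kind;
        std::string alloca; // which alloca this touches ("" if none)
        std::size_t bytes;  // store size in bytes (for stores)
      };

      // Returns the index of the eliding store if the first use of `alloca`
      // in the entry block is a store of the whole argument into it;
      // otherwise the alloca may escape first and elision is unsafe.
      std::optional<std::size_t>
      findElisionStore(const std::vector<Inst> &entryBlock,
                       const std::string &alloca, std::size_t allocaBytes) {
        for (std::size_t i = 0; i < entryBlock.size(); ++i) {
          const Inst &I = entryBlock[i];
          if (I.alloca != alloca)
            continue;
          if (I.kind == Kind::StoreArgToAlloca && I.bytes == allocaBytes)
            return i;            // fully initialized first: candidate
          return std::nullopt;   // any other use first: not safe
        }
        return std::nullopt;
      }
      ```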
      
      Supersedes D28388
      
      Fixes PR26328
      
      Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
      
      Subscribers: igorb, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29668
      
      llvm-svn: 296683
      f7c0980c
    • Nirav Dave's avatar
      [DAG] Prevent Stale nodes from entering worklist · 0a4703b5
      Nirav Dave authored
      Add check that deleted nodes do not get added to worklist. This can
      occur when a node's operand is simplified to an existing node.
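      The guard can be sketched as follows; this is a hypothetical model of
      the worklist check, with integer ids standing in for SDNode pointers
      rather than the actual DAGCombiner types:

      ```cpp
      #include <cassert>
      #include <deque>
      #include <unordered_set>

      // Combiner-style worklist that refuses to enqueue deleted nodes.
      struct Worklist {
        std::deque<int> queue;
        std::unordered_set<int> deleted;

        void markDeleted(int node) { deleted.insert(node); }

        // The fix: check for deletion before adding. Without this,
        // simplifying an operand to an existing node could re-enqueue a
        // node that has already been freed.
        bool add(int node) {
          if (deleted.count(node))
            return false; // stale node: do not enqueue
          queue.push_back(node);
          return true;
        }
      };
      ```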
      
      This fixes PR32108.
      
      Reviewers: jyknight, hfinkel, chandlerc
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D30506
      
      llvm-svn: 296668
      0a4703b5
    • Artur Pilipenko's avatar
      [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine · e1b2d314
      Artur Pilipenko authored
      Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).
      
      Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
      
      Reviewed By: filcab
      
      Differential Revision: https://reviews.llvm.org/D29591
      
      llvm-svn: 296651
      e1b2d314
    • Ahmed Bougacha's avatar
      [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall. · 20b3e9a8
      Ahmed Bougacha authored
      When SDAGISel (top-down) selects a tail-call, it skips the remainder
      of the block.
      
      If, before that, FastISel (bottom-up) selected some of the (no-op) next
      few instructions, we can end up with dead instructions following the
      terminator (selected by SDAGISel).
      
      We need to erase them, as we know they aren't necessary (in addition to
      being incorrect).
      
      We already do this when FastISel falls back on the tail-call itself.
      Also remove the FastISel-emitted code if we fall back on the
      instructions between the tail-call and the return.
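      The cleanup amounts to erasing everything that follows the first
      terminator in the block. A hypothetical sketch, with plain structs
      standing in for MachineInstrs rather than LLVM's actual API:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      struct MI {
        std::string name;
        bool isTerminator;
      };

      // Erase every instruction after the first terminator (e.g. a selected
      // tail call); returns the number of dead instructions removed.
      std::size_t eraseAfterTerminator(std::vector<MI> &block) {
        for (std::size_t i = 0; i < block.size(); ++i)
          if (block[i].isTerminator) {
            std::size_t removed = block.size() - (i + 1);
            block.erase(block.begin() + static_cast<std::ptrdiff_t>(i + 1),
                        block.end());
            return removed;
          }
        return 0;
      }
      ```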
      
      llvm-svn: 296552
      20b3e9a8
  2. Feb 28, 2017
    • Sanjay Patel's avatar
      [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC · ea61ea9f
      Sanjay Patel authored
      llvm-svn: 296502
      ea61ea9f
    • Craig Topper's avatar
      [DAGISel] When checking if chain node is foldable, make sure the intermediate... · 419f145e
      Craig Topper authored
      [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results, not just the result that was used to reach the chain node.
      
      This recovers a test case that was severely broken by r296476, by making sure we don't create an ADD/ADC that loads and stores when there is also a flag dependency.
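      The tightened check can be modeled abstractly: a node may produce
      several results, and folding is only safe when the use count summed
      over all results is one, not merely when the result used to reach the
      chain node has one use. A hypothetical sketch, not the actual
      SelectionDAG API:

      ```cpp
      #include <cassert>
      #include <vector>

      // `uses[i]` counts the uses of result i of a candidate node.
      struct Node {
        std::vector<int> uses;
      };

      // The fixed predicate: single use across *all* results. The old check
      // effectively looked at only one result's use count, so a node whose
      // other result (e.g. a flag) was also used could be folded wrongly.
      bool hasSingleUseAcrossResults(const Node &N) {
        int total = 0;
        for (int u : N.uses)
          total += u;
        return total == 1;
      }
      ```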
      
      llvm-svn: 296486
      419f145e
    • Nirav Dave's avatar
      In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · f830dec3
      Nirav Dave authored
          Recommitting after fixup of the 32-bit aliasing sign offset bug in DAGCombiner.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          simplified store merging search and chain alias analysis, which only
          checks for parallel stores through the chain subgraph. This is
          cleaner, as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down from a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and the output
          codegen (save perhaps for some ARM cases where we correctly
          construct wider loads, but then promote them to float operations,
          which require more expensive constant generation).
      
          Some minor peephole optimizations were added to deal with improved
          SubDAG shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of memory operations on CopyFromReg
               nodes, as these are captured by data dependence.
      
            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit, as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
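          The peephole in item 7 above amounts to checking that every lane of
          a build_vector comes from the same source vector at consecutive
          indices, in which case it is really an extract_subvector. A
          hypothetical, simplified matcher (not the actual DAGCombiner code):

          ```cpp
          #include <cassert>
          #include <cstddef>
          #include <optional>
          #include <vector>

          // One extract_vector_elt operand of the build_vector.
          struct Extract {
            int sourceVec; // id of the vector being extracted from
            int index;     // lane index
          };

          // Returns the base index if the operands form {v[b], v[b+1], ...}
          // from a single source vector; nullopt otherwise.
          std::optional<int> matchSubvector(const std::vector<Extract> &ops) {
            if (ops.empty())
              return std::nullopt;
            for (std::size_t i = 1; i < ops.size(); ++i)
              if (ops[i].sourceVec != ops[0].sourceVec ||
                  ops[i].index != ops[0].index + static_cast<int>(i))
                return std::nullopt;
            return ops[0].index;
          }
          ```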
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 296476
      f830dec3
    • Artyom Skrobov's avatar
      No need to copy the variable [NFC] · ac567192
      Artyom Skrobov authored
      llvm-svn: 296259
      ac567192
    • Nirav Dave's avatar
      In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · beabf456
      Nirav Dave authored
          Recommitting after fixup of the 32-bit aliasing sign offset bug in DAGCombiner.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through
          simplified store merging search and chain alias analysis, which only
          checks for parallel stores through the chain subgraph. This is
          cleaner, as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down from a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and the output
          codegen (save perhaps for some ARM cases where we correctly
          construct wider loads, but then promote them to float operations,
          which require more expensive constant generation).
      
          Some minor peephole optimizations were added to deal with improved
          SubDAG shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of memory operations on CopyFromReg
               nodes, as these are captured by data dependence.
      
            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit, as
               in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 296252
      beabf456