  1. Feb 02, 2017
    • Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · 93f9d5ce
      Nirav Dave authored
      This reverts commit r293893, which is miscompiling lua on ARM and
      breaking bootstrapping for x86-windows.
      
      llvm-svn: 293915
    • Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC · f3e421d6
      Amaury Sechet authored
      llvm-svn: 293903
    • In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 4442667f
      Nirav Dave authored
          Recommitting after fixing an X86 inc/dec chain bug.
      
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through a
          simplified store-merging search and chain alias analysis which only
          checks for parallel stores through the chain subgraph. This is
          cleaner, as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and of the
          generated code (save perhaps for some ARM cases where we correctly
          construct wider loads but then promote them to float operations,
          which require more expensive constant generation).
      
          Some minor peephole optimizations deal with the improved SubDAG
          shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes, as these are captured by data dependence.
      
            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit sizes,
               as in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values, we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 293893
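      A minimal sketch (hypothetical code, not from the commit's tests) of the
      adjacent-store pattern that this chain search lets DAGCombiner merge:

        #include <cstdint>

        // Four adjacent byte stores with no intervening aliasing operations.
        // Once the chain analysis proves the stores are parallel, they can be
        // merged into one 32-bit store (0x04030201 on a little-endian target).
        void store4(uint8_t *p) {
          p[0] = 1;
          p[1] = 2;
          p[2] = 3;
          p[3] = 4;
        }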
  2. Feb 01, 2017
    • [legalizetypes] Push fp16 -> fp32 extension node to worklist. · 7a5ec55f
      Florian Hahn authored
      Summary:
      This way, the type legalization machinery will take care of registering
      the result of this node properly.
      
      This patch fixes all fp16 test cases that were failing with expensive
      checks enabled (CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll,
      CodeGen/X86/cvt16.ll, CodeGen/X86/soft-fp.ll).

      Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28195
      
      llvm-svn: 293765
  3. Jan 31, 2017
    • [DAGCombine] require UnsafeFPMath for re-association of addition · 8813d5d2
      Nicolai Haehnle authored
      Summary:
      The affected transforms all implicitly use associativity of addition,
      for which we usually require unsafe math to be enabled.
      
      The "Aggressive" flag is only meant to convey information about the
      performance of the fused ops relative to a fmul+fadd sequence.
      
      Fixes Bug 31626.
      
      Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD
      
      Subscribers: jholewinski, nemanjai, wdng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28675
      
      llvm-svn: 293635
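      A concrete illustration (not from the commit) of why this re-association
      requires unsafe math: floating-point addition is not associative.

        #include <cstdio>

        int main() {
          double a = 1e20, b = -1e20, c = 1.0;
          std::printf("%g\n", (a + b) + c); // prints 1: a and b cancel first
          std::printf("%g\n", a + (b + c)); // prints 0: c is absorbed into b
          return 0;
        }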
  4. Jan 28, 2017
    • Cleanup dump() functions. · 8c209aa8
      Matthias Braun authored
      We had various ways of defining dump() functions in LLVM. Normalize
      them (this should just consistently implement the things discussed in
      http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html).
      
      For reference:
      - Public headers should just declare the dump() method but not use
        LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
      - The definition of a dump method should look like this:
        #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
        LLVM_DUMP_METHOD void MyClass::dump() {
          // print stuff to dbgs()...
        }
        #endif
      
      llvm-svn: 293359
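      A minimal sketch of the header-side half of this convention, using a
      hypothetical MyClass (the definition side is quoted in the message above):

        // In the public header: a plain declaration, with no preprocessor
        // guards and no LLVM_DUMP_METHOD (those belong in the .cpp file).
        class MyClass {
        public:
          void dump() const;
        };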
  5. Jan 27, 2017
    • [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert(). · bb0ed3e7
      Jonas Paulsson authored
      In the case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only
      a partial number of the available vector elements), WidenVecRes_Convert()
      used to resort to scalarization.

      This patch adds handling of the (common) case where an input vector of the
      same width as the widened result vector can be found, by converting the
      node to SIGN/ZERO_EXTEND_VECTOR_INREG.
      
      Review: Eli Friedman
      llvm-svn: 293268
    • Add intrinsics for constrained floating point operations · a0a1164c
      Andrew Kaylor authored
      This commit introduces a set of experimental intrinsics intended to prevent
      optimizations that make assumptions about the rounding mode and floating point
      exception behavior.  These intrinsics will later be extended to specify
      flush-to-zero behavior.  More work is also required to model instruction
      dependencies in machine code and to generate these instructions from clang
      (when required by pragmas and/or command line options that are not currently
      supported).
      
      Differential Revision: https://reviews.llvm.org/D27028
      
      llvm-svn: 293226
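      For background, a small sketch (not from the commit) of the kind of
      assumption these intrinsics suppress: folding FP operations at compile
      time under the default rounding mode is unsound once the program changes
      the dynamic rounding direction.

        #include <cfenv>
        #include <cstdio>

        int main() {
          std::fesetround(FE_DOWNWARD);     // change the dynamic rounding mode
          volatile double x = 1.0, y = 3.0; // volatile: block constant folding
          // Evaluated here, x / y must round toward -infinity; folding it at
          // compile time would use round-to-nearest and change the last bit.
          std::printf("%.17g\n", x / y);
          return 0;
        }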
  6. Jan 26, 2017
    • Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · d32a421f
      Nirav Dave authored
      This reverts commit r293184, which is failing in LTO builds.
      
      llvm-svn: 293188
    • In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · de6516c4
      Nirav Dave authored
          * Simplify Consecutive Merge Store Candidate Search
      
          Now that address aliasing is much less conservative, push through a
          simplified store-merging search and chain alias analysis which only
          checks for parallel stores through the chain subgraph. This is
          cleaner, as it separates non-interfering loads/stores from the
          store-merging logic.
      
          When merging stores, search up the chain through a single load, and
          find all possible stores by looking down through a load and a
          TokenFactor to all stores visited.
      
          This improves the quality of the output SelectionDAG and of the
          generated code (save perhaps for some ARM cases where we correctly
          construct wider loads but then promote them to float operations,
          which require more expensive constant generation).
      
          Some minor peephole optimizations deal with the improved SubDAG
          shapes (listed below).
      
          Additional Minor Changes:
      
            1. Finishes removing unused AliasLoad code
      
            2. Unifies the chain aggregation in the merged stores across code
               paths
      
            3. Re-add the Store node to the worklist after calling
               SimplifyDemandedBits.
      
            4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
               arbitrary, but seems sufficient to not cause regressions in
               tests.
      
            5. Remove Chain dependencies of Memory operations on CopyFromReg
               nodes, as these are captured by data dependence.
      
            6. Forward load-store values through TokenFactors containing
               {CopyToReg,CopyFromReg} values.
      
            7. Peephole to convert buildvector of extract_vector_elt to
               extract_subvector if possible (see
               CodeGen/AArch64/store-merge.ll)
      
            8. Store merging for the ARM target is restricted to 32-bit sizes,
               as in some contexts invalid 64-bit operations are being
               generated. This can be removed once appropriate checks are
               added.
      
          This finishes the change Matt Arsenault started in r246307 and
          jyknight's original patch.
      
          Many tests required some changes as memory operations are now
          reorderable, improving load-store forwarding. One test in
          particular is worth noting:
      
            CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
            forwarding converts a load-store pair into a parallel store and
            a memory-realized bitcast of the same value. However, because we
            lose the sharing of the explicit and implicit store values, we
            must create another local store. A similar transformation
            happens before SelectionDAG as well.
      
          Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
      
      llvm-svn: 293184
  7. Jan 25, 2017
    • SDag: fix how initial loads are formed when splitting vector ops. · 470f070b
      Tim Northover authored
      Later code expects the vector loads produced to be directly
      concatenable, which means we shouldn't pad any of the loads except the
      last one with UNDEF.
      
      llvm-svn: 293088
    • Fix buildbot failures introduced by 293036 · bc934524
      Artur Pilipenko authored
      Fix unused variable, specify types explicitly to make VC compiler happy.
      
      llvm-svn: 293039
    • [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. · 41c0005a
      Artur Pilipenko authored
      The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
      http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html
      
      This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not of the whole value. It simplifies the implementation and allows us to stop the traversal earlier if we know that the result won't be used.
      
      From the original commit:
      
      Match a pattern where a wide scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load, or a load and a bswap, if the target supports it.
      
      Assuming little endian target:
        i8 *a = ...
        i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
      =>
        i32 val = *((i32)a)
      
        i8 *a = ...
        i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
      =>
        i32 val = BSWAP(*((i32)a))
      
      This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because, in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations.
      
      Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
        i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
      
      Matching the pattern above is easier at the SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at the IR level would have involved looking through geps/zexts/adds while examining the addresses.
      
      The general scheme is to match OR expressions by recursively calculating the origin of the individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match the little- or big-endian encoding of a wider value. If so, and if a load of the wider type (plus a bswap if needed) is allowed by the target, generate the load and bswap.
      
      Reviewed By: RKSimon, filcab, chandlerc 
      
      Differential Revision: https://reviews.llvm.org/D27861
      
      llvm-svn: 293036
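      A compilable rendering (function name load_be32 is mine) of the
      big-endian idiom from the message; after this combine, a little-endian
      target should reduce the body to one 32-bit load plus a byte swap:

        #include <cstdint>

        uint32_t load_be32(const uint8_t *a) {
          return (uint32_t(a[0]) << 24) | (uint32_t(a[1]) << 16) |
                 (uint32_t(a[2]) << 8)  |  uint32_t(a[3]);
        }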
    • DAG: Recognize no-signed-zeros-fp-math attribute · 732a5315
      Matt Arsenault authored
      clang already emits this with -cl-no-signed-zeros, but codegen
      doesn't do anything with it. Treat it like the other fast math
      attributes, and change one place to use it.
      
      llvm-svn: 293024
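      For background, a small example (not from the commit) of a fold that
      signed zeros forbid: x + 0.0 -> x is unsound for x = -0.0 under strict
      IEEE semantics, which is exactly what this attribute waives.

        #include <cstdio>

        int main() {
          volatile double x = -0.0;      // volatile: block constant folding
          double y = x + 0.0;            // IEEE: -0.0 + 0.0 == +0.0
          std::printf("%g %g\n", x, y);  // prints "-0 0"
          return 0;
        }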
    • DAGCombiner: Allow negating ConstantFP after legalize · 8a27aee6
      Matt Arsenault authored
      llvm-svn: 293019
  8. Jan 24, 2017
    • [SelectionDAG] Handle inverted conditions when splitting into multiple branches. · 92a286ae
      Geoff Berry authored
      Summary:
      When conditional branches with complex conditions are split into
      multiple branches in SelectionDAGBuilder::FindMergedConditions, also
      handle inverted conditions.  These may sometimes appear without having
      been optimized by InstCombine when CodeGenPrepare decides to sink and
      duplicate cmp instructions, causing them to have only one use.  This
      problem can be exacerbated by e.g. GVNHoist hiding more cmps from
      InstCombine by combining equivalent cmps from different blocks.
      
      For example, codegen X & !(Y | Z) as:
          jmp_if_X TmpBB
          jmp FBB
        TmpBB:
          jmp_if_notY Tmp2BB
          jmp FBB
        Tmp2BB:
          jmp_if_notZ TBB
          jmp FBB
      
      Reviewers: bogner, MatzeB, qcolombet
      
      Subscribers: llvm-commits, hiraditya, mcrosier, sebpop
      
      Differential Revision: https://reviews.llvm.org/D28380
      
      llvm-svn: 292944
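      A sketch of the source-level shape behind that example (function and
      parameter names are hypothetical):

        // X && !(Y || Z) is split into the three conditional branches shown
        // above: jmp_if_X, then jmp_if_notY, then jmp_if_notZ.
        bool select_path(bool X, bool Y, bool Z) {
          if (X && !(Y || Z))
            return true;  // TBB
          return false;   // FBB
        }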
    • [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR · ff272ad4
      Craig Topper authored
      Summary:
      This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTORS when we can reuse one of the inputs to the concat. These seem like simple non-target-specific optimizations.
      
      For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.
      
      Ultimately, my motivation here is to simplify extractSubvector and remove the custom lowering for EXTRACT_SUBVECTOR, since we don't do anything there but handle undef and BUILD_VECTOR optimizations, and those should be DAG combines.
      
      Reviewers: RKSimon, delena
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D29000
      
      llvm-svn: 292876
    • [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC) · d21529fa
      David L. Jones authored
      Summary:
      The LibFunc::Func enum holds enumerators named for libc functions.
      Unfortunately, there are real situations, including libc implementations, where
      function names are actually macros (musl uses "#define fopen64 fopen", for
      example; any other transitively visible macro would have similar effects).
      
      Strictly speaking, a conforming C++ Standard Library should provide any such
      macros as functions instead (via <cstdio>). However, there are some "library"
      functions which are not part of the standard, and thus not subject to this
      rule (fopen64, for example). So, in order to be both portable and consistent,
      the enum should not use the bare function names.
      
      The old enum naming used a namespace LibFunc and an enum Func, with bare
      enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
      with "LibFunc_". (Unfortunately, a scoped enum is not sufficient to override
      macros.)
      
      There are additional changes required in clang.
      
      Reviewers: rsmith
      
      Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D28476
      
      llvm-svn: 292848
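      A short sketch of the hazard being avoided (hypothetical enum fragment,
      not the actual TargetLibraryInfo code):

        // If a libc header does `#define fopen64 fopen`, a bare enumerator
        // named fopen64 would be rewritten by the preprocessor and collide
        // with the fopen enumerator. Prefixed names are immune:
        enum LibFunc {
          LibFunc_fopen,
          LibFunc_fopen64, // safe even when fopen64 itself is a macro
          NumLibFuncs
        };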
  9. Jan 19, 2017
    • [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293) · fb32eea1
      Simon Pilgrim authored
      This patch improves the knownbits logic for unsigned integer min/max opcodes.
      
      For UMIN we know that the result will have at least the maximum of the inputs' known leading zero bits; similarly for UMAX with the inputs' known leading one bits.
      
      This is particularly useful for simplifying clamping patterns, e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible, and for that we need to confirm that the top bit is not set.
      
      Differential Revision: https://reviews.llvm.org/D28853
      
      llvm-svn: 292528
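      A sketch of the clamping pattern in question (hypothetical code): after
      the unsigned min, the known leading zeros guarantee the top bit is clear,
      so the signed conversion is safe.

        #include <cstdint>

        float clamp_to_float(uint32_t x) {
          // UMIN(x, 255): at least 24 leading zero bits are now known.
          uint32_t clamped = x < 255u ? x : 255u;
          // Top bit known clear, so sitofp (e.g. SSE cvtsi2ss) is valid.
          return static_cast<float>(clamped);
        }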
    • [DAG] Don't increase SDNodeOrder for dbg.value/declare. · 2074e749
      Mikael Holmen authored
      Summary:
      The SDNodeOrder is saved in the IROrder field in the SDNode, and this
      field may affect scheduling. Thus, letting dbg.value/declare increase
      the order numbers may in turn affect scheduling.
      
      Because of this change we also need to update the code deciding when
      dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.
      
      Dbg values now have the same order as the SDNode they are connected to,
      not the following order numbers.
      
      Test cases provided by Florian Hahn.
      
      Reviewers: bogner, aprantl, sunfish, atrick
      
      Reviewed By: atrick
      
      Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB
      
      Differential Revision: https://reviews.llvm.org/D25318
      
      llvm-svn: 292485
  10. Jan 17, 2017
    • Revert "[TLI] Robustize SDAG proto checking by merging it into TLI." · 9e5a085c
      Ahmed Bougacha authored
      This reverts commit r292189, as it causes issues on SystemZ bots.
      
      llvm-svn: 292191
    • [TLI] Robustize SDAG proto checking by merging it into TLI. · c018efd6
      Ahmed Bougacha authored
      SelectionDAGBuilder recognizes libfuncs using some homegrown
      parameter type-checking.
      
      Use TLI instead, removing another heap of redundant code.
      
      This isn't strictly NFC, as the SDAG code was too lax.
      Concretely, this means changes are required to two tests:
      - calling a non-variadic function via a variadic prototype isn't OK;
        it just happens to work on x86_64 (but not on, e.g., aarch64).
      - mempcpy has a size_t parameter; the SDAG code accepts any integer
        type, which meant using i32 on x86_64 worked.
      
      I don't think it's worth supporting either of these (IMO) broken
      testcases.  Instead, fix them to be more correct.
      
      llvm-svn: 292189
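      A sketch of the second case (declaration per the usual GNU signature,
      not taken from the commit's tests): the stricter checking expects
      mempcpy's third parameter to be size_t.

        #include <cstddef>

        // The third parameter must be size_t: on an LP64 target such as
        // x86_64, a declaration taking a 32-bit integer no longer matches
        // the libcall prototype and is not recognized.
        extern "C" void *mempcpy(void *dst, const void *src, size_t n);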