Commits · a0d9f2582b7c31e604f4dc82fd5eae10d33aae7e · Lorenzo Albano / LLVM bpEVL

Feb 03, 2017

[SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() != · a0d9f258

Alexey Bataev authored Feb 03, 2017

ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292

llvm-svn: 294003

a0d9f258

Feb 02, 2017

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · 93f9d5ce
Nirav Dave authored Feb 02, 2017
```
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
```
93f9d5ce
Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC · f3e421d6
Amaury Sechet authored Feb 02, 2017
```
llvm-svn: 293903
```
f3e421d6

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · 4442667f

Nirav Dave authored Feb 02, 2017

    Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893

4442667f

Feb 01, 2017

[legalizetypes] Push fp16 -> fp32 extension node to worklist. · 7a5ec55f

Florian Hahn authored Feb 01, 2017

Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765

7a5ec55f

Jan 31, 2017

[DAGCombine] require UnsafeFPMath for re-association of addition · 8813d5d2

Nicolai Haehnle authored Jan 31, 2017

Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635

8813d5d2

Jan 30, 2017
- Use SelectionDAG::getBuildVector helper function where possible. NFCI. · ffe2535c
  Simon Pilgrim authored Jan 30, 2017
  
  llvm-svn: 293532
  ffe2535c
- SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced · 8f520a73
  Justin Bogner authored Jan 30, 2017
  
  Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue. Fixes llvm.org/PR31710 llvm-svn: 293522
  8f520a73
- Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI. · 0a5ab5c4
  Simon Pilgrim authored Jan 30, 2017
  
  llvm-svn: 293520
  0a5ab5c4
- DAG: Fold fneg into compare with constant into the constant · 32e6bfa2
  Matt Arsenault authored Jan 30, 2017
  
  fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512
  32e6bfa2
- DAG: Constant fold fp16_to_fp/fp16_to_fp · 0c687390
  Matt Arsenault authored Jan 30, 2017
  
  This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499
  0c687390
Jan 29, 2017

[SelectionDAG] Make SDNode::getConstantOperandVal an inline method. · 135da1fa
Craig Topper authored Jan 29, 2017
```
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
```
135da1fa

[DAGCombiner] Use unsigned for a constant vector index instead of APInt. · 4753736a

Craig Topper authored Jan 29, 2017

The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414

4753736a

[DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR. · d1573090

Craig Topper authored Jan 29, 2017

The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413

d1573090

[DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying... · 24cdbe8f
Craig Topper authored Jan 29, 2017
```
[DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.

llvm-svn: 293412
```
24cdbe8f

Jan 28, 2017

Cleanup dump() functions. · 8c209aa8

Matthias Braun authored Jan 28, 2017

We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359

8c209aa8

Jan 27, 2017

[DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert(). · bb0ed3e7

Jonas Paulsson authored Jan 27, 2017

In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268

bb0ed3e7

Add intrinsics for constrained floating point operations · a0a1164c

Andrew Kaylor authored Jan 26, 2017

This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226

a0a1164c

Jan 26, 2017

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · d32a421f
Nirav Dave authored Jan 26, 2017
```
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
```
d32a421f

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · de6516c4

Nirav Dave authored Jan 26, 2017

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184

de6516c4

[DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors. · 001aad7d
Craig Topper authored Jan 26, 2017
```
llvm-svn: 293152
```
001aad7d

Jan 25, 2017

SDag: fix how initial loads are formed when splitting vector ops. · 470f070b

Tim Northover authored Jan 25, 2017

Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088

470f070b

Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC · ee9aa3ff
Krzysztof Parzyszek authored Jan 25, 2017
```
llvm-svn: 293077
```
ee9aa3ff
Fix buildbot failures introduced by 293036 · bc934524
Artur Pilipenko authored Jan 25, 2017
```
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
```
bc934524

[DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. · 41c0005a

Artur Pilipenko authored Jan 25, 2017

The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036

41c0005a

DAG: Recognize no-signed-zeros-fp-math attribute · 732a5315

Matt Arsenault authored Jan 25, 2017

clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024

732a5315

DAGCombiner: Allow negating ConstantFP after legalize · 8a27aee6
Matt Arsenault authored Jan 25, 2017
```
llvm-svn: 293019
```
8a27aee6

Jan 24, 2017

[SelectionDAG] Handle inverted conditions when splitting into multiple branches. · 92a286ae

Geoff Berry authored Jan 24, 2017

Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions.  These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use.  This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.

For example codegen X & !(Y | Z) as:
    jmp_if_X TmpBB
    jmp FBB
  TmpBB:
    jmp_if_notY Tmp2BB
    jmp FBB
  Tmp2BB:
    jmp_if_notZ TBB
    jmp FBB

Reviewers: bogner, MatzeB, qcolombet

Subscribers: llvm-commits, hiraditya, mcrosier, sebpop

Differential Revision: https://reviews.llvm.org/D28380

llvm-svn: 292944

92a286ae

[SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR · ff272ad4

Craig Topper authored Jan 24, 2017

Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.

For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.

Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.

Reviewers: RKSimon, delena

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29000

llvm-svn: 292876

ff272ad4

[Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC) · d21529fa

David L. Jones authored Jan 23, 2017

Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848

d21529fa

Jan 23, 2017
- DAG: Don't fold vector extract into load if target doesn't want to · 4e305c6c
  Matt Arsenault authored Jan 23, 2017
  
  Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842
  4e305c6c
- DAG: Allow legalization of fcanonicalize vector types · 70306613
  Matt Arsenault authored Jan 23, 2017
  
  llvm-svn: 292814
  70306613
Jan 19, 2017

[SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293) · fb32eea1

Simon Pilgrim authored Jan 19, 2017

This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

llvm-svn: 292528

fb32eea1

[DAG] Don't increase SDNodeOrder for dbg.value/declare. · 2074e749

Mikael Holmen authored Jan 19, 2017

Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

llvm-svn: 292485

2074e749

Jan 18, 2017
- DAG: Consider nnan in isKnownNeverNaN · f411071d
  Matt Arsenault authored Jan 18, 2017
  
  llvm-svn: 292328
  f411071d
Jan 17, 2017

Revert "[TLI] Robustize SDAG proto checking by merging it into TLI." · 9e5a085c
Ahmed Bougacha authored Jan 17, 2017
```
This reverts commit r292189, as it causes issues on SystemZ bots.

llvm-svn: 292191
```
9e5a085c

[TLI] Robustize SDAG proto checking by merging it into TLI. · c018efd6

Ahmed Bougacha authored Jan 17, 2017

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.

I don't think it's worth supporting either of these (IMO) broken
testcases.  Instead, fix them to be more correct.

llvm-svn: 292189

c018efd6

Jan 16, 2017
- [SelectionDAG] Add knownbits support for BITREVERSE · 3e91519a
  Simon Pilgrim authored Jan 16, 2017
  
  llvm-svn: 292130
  3e91519a
- [SelectionDAG] Add support for BITREVERSE constant folding · db73dbcc
  Simon Pilgrim authored Jan 16, 2017
  
  We were relying on constant folding of the legalized instructions to do what constant folding we had previously llvm-svn: 292114
  db73dbcc
Jan 13, 2017
- Remove unused lambda captures. NFC · 17d266bc
  Malcolm Parsons authored Jan 13, 2017
  
  llvm-svn: 291916
  17d266bc