Commits · b884716f6a8dd32d10c6a9dee69364717045888b · Roger Ferrer / llvm-epi

Jan 27, 2017

CMake is funky on detecting Intel 17 as GCC compatible. · e1864d06

Yichao Yu authored Jan 26, 2017

Summary: This adds a fallback in case the Intel compiler is not detected correctly.

Reviewers: chapuni

Reviewed By: chapuni

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D27610

llvm-svn: 293230

e1864d06

[ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). · e6cf4374
Eugene Zelenko authored Jan 26, 2017
```
llvm-svn: 293229
```
e6cf4374

GlobalISel: support debug intrinsics. · 09aac4ad

Tim Northover authored Jan 26, 2017

The translation scheme is mostly cribbed from FastISel, and it's not entirely
convincing semantically. But it does seem to work in the common cases and allow
variables to be printed so it can't be all wrong.

llvm-svn: 293228

09aac4ad

Revert a couple of InstCombine/Guard checkins · 7516192a

Sanjoy Das authored Jan 26, 2017

This change reverts:

r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"

They miscompile cases like:

```
declare void @llvm.experimental.guard(i1, ...)

define void @test_guard_not_or(i1 %A, i1 %B) {
  %C = or i1 %A, %B
  %D = xor i1 %C, true
  call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
  ret void
}
```

because they do not transfer the `i32 20, i32 30` parameters to newly
created guard instructions.

llvm-svn: 293227

7516192a

Add intrinsics for constrained floating point operations · a0a1164c

Andrew Kaylor authored Jan 26, 2017

This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226

a0a1164c

[PM] Enable the main loop pass pipelines with everything but · 79b733bc

Chandler Carruth authored Jan 26, 2017

loop-unswitch in the main pipelines for the new PM.

All of these now work, and Clang built using this pipeline can build the
test suite and SPEC without hitting any asserts or ASan failures.

There are still some bugs hiding though -- 7 tests regress with the new
PM. I'm going to be investigating these, but it seems worthwhile to at
least get the pipelines in place so that others can play with them, and
they aren't completely broken.

Differential Revision: https://reviews.llvm.org/D29113

llvm-svn: 293225

79b733bc

[obj2yaml] Produce correct output for invalid relocations. · 44f1281f

Davide Italiano authored Jan 26, 2017

R_X86_64_NONE can be emitted without a symbol associated (well,
in theory it should never be emitted in an ABI-compliant relocatable
object). So, if there's no symbol associated to a reloc, emit one
with an empty name, instead of crashing.

Ack'ed by Michael Spencer offline.

PR: 31768
llvm-svn: 293224

44f1281f

[Hexagon] Require IPO library in Hexagon build · d6c8e3c9
Krzysztof Parzyszek authored Jan 26, 2017
```
This should unbreak the Hexagon build bots.

llvm-svn: 293221
```
d6c8e3c9

Jan 26, 2017

NewGVN: Fix bug exposed by PR31761 · 1ea5f324

Daniel Berlin authored Jan 26, 2017

Summary:
This does not actually fix the testcase in PR31761 (discussion is
ongoing on the testcase), but does fix a bug it exposes, where stores
were not properly clobbering loads.

We accomplish this by unifying the memory equivalence infrastructure
back into the normal congruence infrastructure, and then properly
destroying congruence classes when memory state leaders disappear.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29195

llvm-svn: 293216

1ea5f324

[InstCombine] fold (X >>u C) << C --> X & (-1 << C) · 50753f02

Sanjay Patel authored Jan 26, 2017

We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().

Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().

That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.
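
The two shift/mask identities named above can be checked directly on 32-bit unsigned values. The following is only a sketch: the function names are hypothetical and nothing here is InstCombine code, it just evaluates both sides of each fold for every shift amount below the bit width.

```cpp
#include <cassert>
#include <cstdint>

// (X >>u C) << C: clears the low C bits of X.
constexpr std::uint32_t shiftRightThenLeft(std::uint32_t X, unsigned C) {
  return (X >> C) << C;
}

// X & (-1 << C): the masked form on the right-hand side of the fold.
constexpr std::uint32_t maskLowBits(std::uint32_t X, unsigned C) {
  return X & (~std::uint32_t{0} << C);
}

// (X << C) >>u C: clears the high C bits of X.
constexpr std::uint32_t shiftLeftThenRight(std::uint32_t X, unsigned C) {
  return (X << C) >> C;
}

// X & (-1 >>u C): the masked form of the second fold.
constexpr std::uint32_t maskHighBits(std::uint32_t X, unsigned C) {
  return X & (~std::uint32_t{0} >> C);
}

// Returns true when both identities hold for X at every shift amount
// strictly below the bit width (larger shifts are undefined in C++).
inline bool foldsAgree(std::uint32_t X) {
  for (unsigned C = 0; C < 32; ++C) {
    if (shiftRightThenLeft(X, C) != maskLowBits(X, C))
      return false;
    if (shiftLeftThenRight(X, C) != maskHighBits(X, C))
      return false;
  }
  return true;
}

// Usage example: one arbitrary bit pattern.
inline void checkFolds() { assert(foldsAgree(0xDEADBEEFu)); }
```

Neither side depends on how many users the inner shift has, which is consistent with dropping the one-use restriction.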

llvm-svn: 293215

50753f02

[GlobalISel] Remove duplicate function using variadic templates. NFC. · b67a3cef

Ahmed Bougacha authored Jan 26, 2017

I think the initial version of r293172 was trying:
  std::forward<Args...>(args)...
which doesn't compile.  This seems like the correct way:
  std::forward<Args>(args)...
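
As a standalone illustration of the second form, here is a minimal sketch of a variadic factory. `Operand` and `makeOwned` are hypothetical names, not the GlobalISel code. The pattern `std::forward<Args>(args)` is expanded once per argument, so each argument is forwarded with its own deduced type; writing `Args...` inside the angle brackets would instead pass the whole pack as template arguments to a single `std::forward`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type, used only for illustration.
struct Operand {
  std::string Name;
  int Index;
  Operand(std::string N, int I) : Name(std::move(N)), Index(I) {}
};

// Args and args are expanded in lockstep, giving
// std::forward<A0>(a0), std::forward<A1>(a1), ...
template <typename T, typename... Args>
std::unique_ptr<T> makeOwned(Args &&...args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Usage example: the temporary string is moved into the new Operand.
inline void example() {
  auto Op = makeOwned<Operand>(std::string("src"), 1);
  assert(Op->Name == "src" && Op->Index == 1);
}
```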

llvm-svn: 293214

b67a3cef

[Hexagon] Add Hexagon-specific loop idiom recognition pass · c8b94386
Krzysztof Parzyszek authored Jan 26, 2017
```
llvm-svn: 293213
```
c8b94386
NewGVN: Add algorithm overview · db3c7be0
Daniel Berlin authored Jan 26, 2017
```
llvm-svn: 293212
```
db3c7be0
[InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with splat vectors · b0d96d32
Sanjay Patel authored Jan 26, 2017
```
llvm-svn: 293208
```
b0d96d32

[Doc][LangRef] Fix typo-ish error in description of Masked Gather · b26530cd

Zvi Rackover authored Jan 26, 2017

Summary: Fix the example of equivalent expansion for when mask is all ones.

Reviewers: delena

Reviewed By: delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29179

llvm-svn: 293206

b26530cd

[InstCombine] add tests for shift-shift folds; NFC · 0ca3f64c
Sanjay Patel authored Jan 26, 2017
```
llvm-svn: 293205
```
0ca3f64c

[AArch64] Refine Kryo Machine Model · b73d2962

Balaram Makam authored Jan 26, 2017

Summary: Refine floating point SQRT and DIV with accurate latency information.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D29191

llvm-svn: 293204

b73d2962

[IfConversion] Use reverse_iterator to simplify. NFC · c4614b3e
Kyle Butt authored Jan 26, 2017
```
This simplifies skipping debug instructions and shrinking ranges.

llvm-svn: 293202
```
c4614b3e

[PPC] cleanup of mayLoad/mayStore flags and memory operands. · 3c8c385a

Sean Fertile authored Jan 26, 2017

1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
   instructions.
2) Updated the flags on a number of intrinsics indicating that they write
   memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
   propagate their memory operand.

Review: https://reviews.llvm.org/D28818
llvm-svn: 293200

3c8c385a

NewGVN: Fix output of pr31578 testcase now that we mark unreachable blocks as unreachable · 66e3a3d0
Daniel Berlin authored Jan 26, 2017
```
llvm-svn: 293198
```
66e3a3d0
NewGVN: Make unreachable blocks be marked with unreachable · 2b83492e
Daniel Berlin authored Jan 26, 2017
```
llvm-svn: 293196
```
2b83492e

Replace addEarlyAsPossiblePasses callback with adjustPassManager · 81598117

Stanislav Mekhanoshin authored Jan 26, 2017

This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189

81598117

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." · d32a421f
Nirav Dave authored Jan 26, 2017
```
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
```
d32a421f

[XRay][Arm32] Reduce the portion of the stub and implement more staging for tail calls - in LLVM · e09ba748

Serge Rogatch authored Jan 26, 2017

Summary:
This patch provides more staging for tail calls in XRay Arm32. When the logging part of XRay is ready for tail calls, supporting them in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2.
Coupled patch:
- https://reviews.llvm.org/D28674

Reviewers: dberris, rengolin

Reviewed By: dberris

Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris

Differential Revision: https://reviews.llvm.org/D28673

llvm-svn: 293185

e09ba748

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. · de6516c4

Nirav Dave authored Jan 26, 2017

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as it separates non-interfering loads/stores from the store-merging
    logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations which appear
    to require more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward load-store values through tokenfactors containing
         {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184

de6516c4

Use shouldAssumeDSOLocal in classifyGlobalReference. · 82149a1a

Rafael Espindola authored Jan 26, 2017

And teach shouldAssumeDSOLocal that ppc has no copy relocations.

The resulting code handles a few more cases than before. For example, it
knows that a weak symbol can be resolved to another .o file, but it
will still be in the main executable.

llvm-svn: 293180

82149a1a

[X86][SSE] Add support for combining ANDNP byte masks with target shuffles · 027bb453
Simon Pilgrim authored Jan 26, 2017
```
llvm-svn: 293178
```
027bb453

[SCEV] Introduce add operation inlining limit · b09dac59

Daniil Fukalov authored Jan 26, 2017

Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is introduced with default value 500.
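
To illustrate the idea of capping operand inlining, here is a rough sketch over a hypothetical expression tree. `Expr`, `makeAdd`, and `flattenAddOperands` are invented for this example and are not the ScalarEvolution implementation; the exact condition used in getAddExpr() may differ. A nested add is flattened into the parent's operand list only while the list stays within the threshold, otherwise it is kept as a single operand.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression node: a leaf, or an add over child operands.
struct Expr {
  bool IsAdd = false;
  std::vector<std::shared_ptr<Expr>> Ops;
};

using ExprPtr = std::shared_ptr<Expr>;

inline ExprPtr makeLeaf() { return std::make_shared<Expr>(); }

inline ExprPtr makeAdd(std::vector<ExprPtr> Ops) {
  auto E = std::make_shared<Expr>();
  E->IsAdd = true;
  E->Ops = std::move(Ops);
  return E;
}

// Builds the operand list for a new add. A nested add is inlined only if
// doing so keeps the list within Threshold; otherwise it stays opaque.
inline std::vector<ExprPtr> flattenAddOperands(const std::vector<ExprPtr> &Ops,
                                               std::size_t Threshold) {
  std::vector<ExprPtr> Result;
  for (const ExprPtr &Op : Ops) {
    if (Op->IsAdd && Result.size() + Op->Ops.size() <= Threshold)
      Result.insert(Result.end(), Op->Ops.begin(), Op->Ops.end());
    else
      Result.push_back(Op);
  }
  return Result;
}

// Usage example: with a generous threshold the inner add is flattened.
inline void example() {
  ExprPtr Inner = makeAdd({makeLeaf(), makeLeaf()});
  assert(flattenAddOperands({Inner, makeLeaf()}, 500).size() == 3);
}
```

With a small threshold the same call returns the nested add untouched, which bounds the work done per call at the cost of a less canonical operand list.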

Reviewers: sanjoy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28812

llvm-svn: 293176

b09dac59

[X86][SSE] Pull out target shuffle resolve code into helper. NFCI. · 3057fd53

Simon Pilgrim authored Jan 26, 2017

Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.

llvm-svn: 293175

3057fd53

Remove a '#if 0' that wasn't intended for commit in r293173. · f69fe686

Daniel Sanders authored Jan 26, 2017

The '#if 0' contained the code I had intended to use but clang
rejects it (possibly incorrectly).

llvm-svn: 293174

f69fe686

Attempt to fix windows buildbots after r293172. · b2224311
Daniel Sanders authored Jan 26, 2017
```
llvm-svn: 293173
```
b2224311

[globalisel] Re-factor ISel matchers into a hierarchy. NFC · dc662ff0

Daniel Sanders authored Jan 26, 2017

Summary:
This should make it possible to easily add everything needed to import all
the existing SelectionDAG rules. It should also serve the likely
kinds of GlobalISel rules (some of which are not currently representable
in SelectionDAG) once we've nailed down the tablegen definition for that.

The hierarchy is as follows:
  MatcherRule - A matching rule. Currently used to emit C++ ISel code but will
  |             also be used to emit test cases and tablegen definitions in the
  |             near future.
  |- Instruction(s) - Represents the instruction to be matched.
     |- Instruction Predicate(s) - Test the opcode, arithmetic flags, etc. of an
     |                             instruction.
     \- Operand(s) - Represents a particular operand of the instruction. In the
        |            future, there may be subclasses to test the same predicates
        |            on multiple operands (including for variadic instructions).
        \ Operand Predicate(s) - Test the type, register bank, etc. of an operand.
                                 This is where the ComplexPattern equivalent
                                 will be represented. It's also where
                                 nested-instruction matching will live as a
                                 predicate that follows the DefUse chain to the
                                 Def and tests a MatcherRule from that position.
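
The shape of this hierarchy can be sketched with plain structs. Every type, field, and the check strings below are hypothetical stand-ins, not the classes added by the patch; the sketch only shows a rule owning instruction matchers, which own instruction predicates and operand matchers, with an explicit emit function rather than `operator<<`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// All names here are hypothetical and only mirror the tree described above.
struct OperandPredicate {
  std::string Check; // e.g. a type or register-bank test
};

struct OperandMatcher {
  unsigned Index;
  std::vector<OperandPredicate> Predicates;
};

struct InstructionPredicate {
  std::string Check; // e.g. an opcode test
};

struct InstructionMatcher {
  std::vector<InstructionPredicate> Predicates;
  std::vector<OperandMatcher> Operands;
};

struct MatcherRule {
  std::vector<InstructionMatcher> Insns;

  // Walks the hierarchy and joins every check into one parenthesized
  // condition, without trying to indent or pretty-print it.
  std::string emitCondition() const {
    std::string Out;
    auto Append = [&Out](const std::string &Check) {
      if (!Out.empty())
        Out += " && ";
      Out += "(" + Check + ")";
    };
    for (const InstructionMatcher &Insn : Insns) {
      for (const InstructionPredicate &P : Insn.Predicates)
        Append(P.Check);
      for (const OperandMatcher &Op : Insn.Operands)
        for (const OperandPredicate &P : Op.Predicates)
          Append("op" + std::to_string(Op.Index) + "." + P.Check);
    }
    return Out;
  }
};

// Usage example: a rule with no matchers emits an empty condition.
inline void example() { assert(MatcherRule{}.emitCondition().empty()); }
```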

Support for multiple instruction matchers in a rule has been retained from
the existing code but has been adjusted to assert when it is used.
Previously it would silently drop all but the first instruction matcher.

The tablegen-erated file is not functionally changed but has more
parentheses and no longer attempts to format the if-statements since
keeping track of the indentation is tricky in the presence of the matcher
hierarchy. It would be nice to have CMake's tablegen() run the output
through clang-format (when available) so we don't have to complicate
TableGen with pretty-printing.

It's also worth mentioning that this hierarchy will also be able to emit
TableGen definitions and test cases in the near future. This is the reason
for favouring explicit emit*() calls rather than the << operator.

Reviewers: aditya_nandakumar, rovka, t.p.northover, qcolombet, ab

Reviewed By: ab

Subscribers: igorb, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28942

llvm-svn: 293172

dc662ff0

[AMDGPU] Fix typo in GCNSchedStrategy · 75d1de90
Valery Pykhtin authored Jan 26, 2017
```
Differential revision: https://reviews.llvm.org/D28980

llvm-svn: 293171
```
75d1de90
Revert "[mips] N64 static relocation model support" · 5b67a4f7
Simon Dardis authored Jan 26, 2017
```
This reverts commit r293164. There are multiple tests failing.

llvm-svn: 293170
```
5b67a4f7

[LV] Fix an issue where forming LCSSA in the place that we did would · 6f4ed077

Chandler Carruth authored Jan 26, 2017

change the set of uniform instructions in the loop causing an assert
failure.

The problem is that the legalization checking also builds data
structures mapping various facts about the loop body. The immediate
cause was the set of uniform instructions. If these then change when
LCSSA is formed, the data structures would already have been built and
become stale. The included test case triggered an assert in loop
vectorize that was reduced out of the new PM's pipeline.

The solution is to form LCSSA early enough that no information is cached
across the changes made. The only really obvious position is outside of
the main logic to vectorize the loop. This also has the advantage of
removing one case where forming LCSSA could mutate the loop but we
wouldn't track that as a "Changed" state.

If it is significantly advantageous to do some legalization checking
prior to this, we can do a more careful positioning but it seemed best
to just back off to a safe position first.

llvm-svn: 293168

6f4ed077

[mips] N64 static relocation model support · 09e65efd

Simon Dardis authored Jan 26, 2017

This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293164

09e65efd

[ARM] GlobalISel: Load i1, i8 and i16 args from stack · 278c722e

Diana Picus authored Jan 26, 2017

Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.

When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.
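
The size rule described here can be written as a one-line helper. This is a sketch with hypothetical names (`ArgExt`, `stackLoadSizeInBytes`), not the code in the ARM call lowering; it assumes the argument width is given in bits and rounds sub-byte types such as i1 up to one byte.

```cpp
#include <cassert>

// Hypothetical stand-in for the ABI extension flags on an argument.
enum class ArgExt { None, ZExt, SExt };

// With an extension flag the full 4-byte stack slot is loaded; without one
// the load keeps the argument's own size, rounded up to whole bytes.
constexpr unsigned stackLoadSizeInBytes(unsigned ArgBits, ArgExt Ext) {
  return Ext != ArgExt::None ? 4u : (ArgBits + 7u) / 8u;
}

// Usage example: an unextended i16 keeps its 2-byte load.
inline void example() {
  assert(stackLoadSizeInBytes(16, ArgExt::None) == 2);
}
```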

Differential Revision: https://reviews.llvm.org/D27803

llvm-svn: 293163

278c722e

[SLP] Add one more reduction operation for extra argument test to make · 7a7510ea
Alexey Bataev authored Jan 26, 2017
```
it vectorizable.

llvm-svn: 293162
```
7a7510ea

[PM] Use PoisoningVH correctly when merely deleting entries in a map · 41421df0

Chandler Carruth authored Jan 26, 2017

with it.

This code was dereferencing the PoisoningVH which isn't allowed once it
is poisoned. But the code itself really doesn't need to access the
pointer, it is just doing the safe stuff of clearing out data structures
keyed on the pointer value.

Change the code to use iterators to erase directly from a DenseMap. This
is also substantially more efficient as it avoids lots of hashing and
lookups to do the erasure. DenseMap supports erasing behind the
iterator, which is fairly easy to implement.
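
The erase-through-iterators pattern looks roughly like the sketch below. It uses `std::unordered_map` as a stand-in, since the exact `llvm::DenseMap` interface is not shown here, and `eraseByValue` is a hypothetical helper. The point it illustrates is that entries are removed through the iterator already in hand, so the key is never looked up or hashed again.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Erases every entry whose mapped value equals Dead in a single pass.
// std::unordered_map::erase(iterator) returns the iterator following the
// erased element, so the loop continues without a second lookup by key.
template <typename Key>
std::size_t eraseByValue(std::unordered_map<Key, int> &Map, int Dead) {
  std::size_t Erased = 0;
  for (auto It = Map.begin(); It != Map.end();) {
    if (It->second == Dead) {
      It = Map.erase(It);
      ++Erased;
    } else {
      ++It;
    }
  }
  return Erased;
}

// Usage example: nothing to erase from an empty map.
inline void example() {
  std::unordered_map<int, int> Empty;
  assert(eraseByValue(Empty, 0) == 0);
}
```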

Sadly, I don't have a test case here. I'm not even close and I don't
know that I ever will be. The issue is that several of the tricky
aspects of fixing this only show up when you cause the stack's
SmallVector to be in *EXACTLY* the right location. I only ever got
a reproduction for those with Clang, and only with *exactly* the right
command line flags. Any adjustment, even to seemingly unrelated flags,
would make partial and half-way solutions magically start to "work". In
good news, all of this was caught with the LLVM test suite. Also, there
is no *specific* code here that is untested, just that the old pattern
of code won't immediately fail on any test case I've managed to
contrive.

llvm-svn: 293160

41421df0

Chapter3/KaleidoscopeJIT.h: Fix a warning. [-Wunused-lambda-capture] · 949d54eb
NAKAMURA Takumi authored Jan 26, 2017
```
"this", aka class members, is not referred to in the body.

llvm-svn: 293159
```
949d54eb