Commits · 93d5d3b5dbf9c01b60c41136482668e38e5a3806 · Roger Ferrer / llvm-epi

Sep 10, 2015

Add a way to skip the Go bindings tests even when Go is configured in · 93d5d3b5

Chandler Carruth authored Sep 10, 2015

CMake.

The Go bindings tests in an unoptimized build take over 30 seconds for
me, making it the slowest test in 'check-llvm' by a factor of two.

I've only rigged this up fully to the CMake build. If someone is
interested in rigging it up to the autoconf build, they're welcome to do
so.

llvm-svn: 247243

93d5d3b5

[ScalarEvolution] Fix PR24757. · f3132d3b

Sanjoy Das authored Sep 10, 2015

Summary:
PR24757 was caused by some incorrect math in
`ScalarEvolution::HowFarToZero` -- the smallest unsigned solution for X
in

  2^N * A = 2^N * X

is not necessarily A.
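
To make the claim concrete, here is a minimal sketch that brute-forces the smallest solution. It assumes 8-bit wraparound arithmetic, which the summary above does not state; the helper name `smallestSolution` and the sample values are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Brute-force the smallest unsigned 8-bit X with (A << N) == (X << N),
// where both shifts wrap modulo 2^8. Hypothetical helper, illustration only.
inline uint8_t smallestSolution(uint8_t A, unsigned N) {
  assert(N < 8 && "shift amount must be smaller than the bit width");
  const uint8_t Target = static_cast<uint8_t>(A << N);
  for (unsigned X = 0; X < 256; ++X)
    if (static_cast<uint8_t>(X << N) == Target)
      return static_cast<uint8_t>(X);
  return A; // Not reached: X == A always satisfies the equation.
}
```

For A = 0x13 and N = 4 this returns 3 rather than 0x13, because the top N bits of X are shifted out and therefore do not constrain the solution.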

Reviewers: atrick, majnemer, meheff

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12721

llvm-svn: 247242

f3132d3b

[LPM] Simplify this code and fix a compile error for compilers that · 87275186

Chandler Carruth authored Sep 10, 2015

don't correctly implement the scoping rules of C++11 range based for
loops. This kind of aliasing isn't a good idea anyways (and wasn't
really intended).

llvm-svn: 247241

87275186

[LPM] Use a map from analysis ID to immutable passes in the legacy pass · b1e3a9ae

Chandler Carruth authored Sep 10, 2015

manager to avoid a slow linear scan of every immutable pass and on every
attempt to find an analysis pass.

This speeds up 'check-llvm' on an unoptimized build for me by 15%, YMMV.
It should also help (a tiny bit) other folks that are really
bottlenecked on repeated runs of tiny pass pipelines across small IR
files.
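
A rough sketch of the idea, not the legacy pass manager's actual code: `AnalysisID`, `ImmutablePass` and `ImmutablePassRegistry` below are simplified stand-ins invented for illustration. It contrasts the linear scan with a lookup keyed by analysis ID.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Stand-ins for illustration; these are not the legacy pass manager's types.
using AnalysisID = const void *;
struct ImmutablePass {
  AnalysisID ID;
};

class ImmutablePassRegistry {
  std::vector<ImmutablePass *> Passes;                  // registration order
  std::unordered_map<AnalysisID, ImmutablePass *> ByID; // fast lookup

public:
  void add(ImmutablePass *P) {
    assert(P && "cannot register a null pass");
    Passes.push_back(P);
    ByID.emplace(P->ID, P); // keeps the first pass registered for an ID
  }

  // Before: every query walks all immutable passes.
  ImmutablePass *findLinear(AnalysisID ID) const {
    for (ImmutablePass *P : Passes)
      if (P->ID == ID)
        return P;
    return nullptr;
  }

  // After: expected constant-time lookup per query.
  ImmutablePass *findMapped(AnalysisID ID) const {
    auto It = ByID.find(ID);
    return It == ByID.end() ? nullptr : It->second;
  }
};
```

Both lookups return the same pass; only the cost per query differs, which matters when the query is repeated for every analysis request.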

llvm-svn: 247240

b1e3a9ae

Enable the shrink wrapping optimization for PPC64. · d3b904d4

Kit Barton authored Sep 10, 2015

The changes in this patch are as follows:
1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function
2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function
3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run:
Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line

A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64.

Phabricator review: http://reviews.llvm.org/D11817

llvm-svn: 247237

d3b904d4

[AArch64] Match FI+offset in STNP addressing mode. · 05541459

Ahmed Bougacha authored Sep 10, 2015

First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.

Follow-up to r247234.

llvm-svn: 247236

05541459

[MC] Convert all the remaining tests from macho-dump to llvm-readobj. · ddedd725

Davide Italiano authored Sep 10, 2015

This sort-of deprecates macho-dump. It may still take a little while
to garbage collect it, but at least there's no real usage of it in
the tree anymore. New tests should always rely on llvm-readobj or
llvm-objdump.

llvm-svn: 247235

ddedd725

[AArch64] Match base+offset in STNP addressing mode. · c0ac38d5
Ahmed Bougacha authored Sep 10, 2015
```
Followup to r247231.

llvm-svn: 247234
```
c0ac38d5

Make EmitRecord() accept ArrayRef and raw array (NFC) · 8d461164

Mehdi Amini authored Sep 10, 2015

After r247186, a vector is no longer needed as the push_front for
the code is removed.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247232

8d461164

[AArch64] Support selecting STNP. · b8886b51

Ahmed Bougacha authored Sep 10, 2015

We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Instead, we can guarantee emitting STNP by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.

Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.
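
For reference, a sketch of what "signed scaled 7-bit immediate" means for a byte offset. The helper name `isLegalScaledImm7` is hypothetical and this is not the matching code from the patch, which for now only matches the base.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if ByteOffset can be encoded as a signed 7-bit immediate that
// is scaled by AccessSize. Hypothetical helper, not taken from the patch.
inline bool isLegalScaledImm7(int64_t ByteOffset, int64_t AccessSize) {
  assert(AccessSize > 0 && "access size must be positive");
  if (ByteOffset % AccessSize != 0)
    return false; // Not a multiple of the scale.
  const int64_t Scaled = ByteOffset / AccessSize;
  return Scaled >= -64 && Scaled <= 63; // Range of a signed 7-bit field.
}
```

With 8-byte accesses this accepts multiples of 8 from -512 to 504 and rejects everything else.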

Part of PR24086.

llvm-svn: 247231

b8886b51

AMDGPU/SI: Fix more cases of losing exec operands · 80f766a0
Matt Arsenault authored Sep 10, 2015
```
llvm-svn: 247230
```
80f766a0

AMDGPU/SI: Fix creating v_mov_b32s without exec uses · ad46e0c1

Matt Arsenault authored Sep 10, 2015

This will be caught by existing tests with a
verifier check to be added in a future commit.

llvm-svn: 247229

ad46e0c1

Revert r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes" · d2799a96
Hans Wennborg authored Sep 10, 2015
```
This caused build breakages, e.g.
http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/24926

llvm-svn: 247226
```
d2799a96
[CodeGen] Make x86 nontemporal store patfrags generic. NFC. · 37bffd83
Ahmed Bougacha authored Sep 10, 2015
```
To be used by other targets.

llvm-svn: 247225
```
37bffd83
[RewriteStatepointsForGC] Minor refactor to use shared implementation [NFC] · 953817b6
Philip Reames authored Sep 10, 2015
```
llvm-svn: 247223
```
953817b6

[RewriteStatepointsForGC] Strengthen a confusingly weak assertion [NFC] · b4e55f39

Philip Reames authored Sep 10, 2015

The assertion was weaker than it should be and gave the impression we're growing the number of base defining values being considered during the fixed point iteration. That's not true. The tighter form of the assert is useful documentation.

llvm-svn: 247221

b4e55f39

[RewriteStatepointsForGC] One last bit of naming [NFCI] · c8ded462
Philip Reames authored Sep 10, 2015
```
llvm-svn: 247220
```
c8ded462

[WinEH] Add codegen support for cleanuppad and cleanupret · 78783912

Reid Kleckner authored Sep 10, 2015

All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.

This small example now compiles and executes successfully on win32:
  extern "C" int printf(const char *, ...) noexcept;
  struct Dtor {
    ~Dtor() { printf("~Dtor\n"); }
  };
  void has_cleanup() {
    Dtor o;
    throw 42;
  }
  int main() {
    try {
      has_cleanup();
    } catch (int) {
      printf("caught it\n");
    }
  }

Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.

llvm-svn: 247219

78783912

[RewriteStatepointsForGC] Further style/naming fixup [NFCI] · 34d7a749
Philip Reames authored Sep 10, 2015
```
llvm-svn: 247217
```
34d7a749
Fix Clang-tidy misc-use-override warnings, other minor fixes · 6fa09455
Hans Wennborg authored Sep 10, 2015
```
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D12740

llvm-svn: 247216
```
6fa09455

Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC) · c7aa5ca8

Mehdi Amini authored Sep 10, 2015

This reapplies commit r247178 after post-commit review from D. Blaikie
in a way that makes it compatible with the existing API.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247215

c7aa5ca8

Add makeArrayRef() overload for ArrayRef input (no-op/identity) NFC · defa5465

Mehdi Amini authored Sep 10, 2015

The purpose is to allow templated wrapper to work with either
ArrayRef or any convertible operation:

template<typename Container>
void wrapper(const Container &Arr) {
  impl(makeArrayRef(Arr));
}

with Container being a std::vector, a SmallVector, or an ArrayRef.
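
A self-contained sketch of the same pattern, buildable without LLVM headers. `ArrayRefLite`, `makeArrayRefLite`, `sumImpl` and `sumWrapper` are stand-in names invented here; only the shape of the identity overload mirrors the change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for llvm::ArrayRef: a non-owning (pointer, length) view.
template <typename T> struct ArrayRefLite {
  const T *Data;
  std::size_t Size;
};

// Overload for an owning container.
template <typename T>
ArrayRefLite<T> makeArrayRefLite(const std::vector<T> &Vec) {
  return {Vec.data(), Vec.size()};
}

// Identity overload: a view passes through unchanged.
template <typename T>
ArrayRefLite<T> makeArrayRefLite(const ArrayRefLite<T> &Ref) {
  return Ref;
}

inline int sumImpl(ArrayRefLite<int> Arr) {
  assert((Arr.Data || Arr.Size == 0) && "non-empty view needs storage");
  int Sum = 0;
  for (std::size_t I = 0; I != Arr.Size; ++I)
    Sum += Arr.Data[I];
  return Sum;
}

// Generic wrapper that accepts either a container or a view.
template <typename Container> int sumWrapper(const Container &Arr) {
  return sumImpl(makeArrayRefLite(Arr));
}
```

Without the identity overload, `sumWrapper` would fail to compile when handed a view, since no `makeArrayRefLite` overload would match.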

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247214

defa5465

[RewriteStatepointsForGC] More naming cleanup [NFCI] · 7540e3a4
Philip Reames authored Sep 10, 2015
```
llvm-svn: 247213
```
7540e3a4

[RewriteStatepointsForGC] Code cleanup [NFC] · ece70b80

Philip Reames authored Sep 09, 2015

Factor out common code related to naming values, fix a small style issue.  More to follow in separate changes.

llvm-svn: 247211

ece70b80

[RewriteStatepointsForGC] Extend base pointer inference to handle insertelement · 6628713f

Philip Reames authored Sep 09, 2015

This change is simply enhancing the existing inference algorithm to handle insertelement instructions by conservatively inserting a new instruction to propagate the vector of associated base pointers. In the process, I'm ripping out the peephole optimizations which mostly helped cover the fact this hadn't been done.

Note that most of the newly inserted nodes will be nearly immediately removed by the post insertion optimization pass introduced in 246718. Arguably, we should be trying harder to avoid the malloc traffic here, but I'd rather get the code correct, then worry about compile time.

Unlike previous extensions of the algorithm to handle more cases, I discovered the existing code was causing miscompiles in some cases. In particular, we had an implicit assumption that the peephole covered *all* insert element instructions, so if we had a value directly based on an insert element the peephole didn't cover, we proceeded as if it were a base anyways. Not good. I believe we had the same issue with shufflevector which is why I adjusted the predicate for them as well.

Differential Revision: http://reviews.llvm.org/D12583

llvm-svn: 247210

6628713f

[RewriteStatepointsForGC] Make base pointer inference deterministic · 15d5563c

Philip Reames authored Sep 09, 2015

Previously, the base pointer algorithm wasn't deterministic. The core fixed point was (of course), but we were inserting new nodes and optimizing them in an order which was unspecified and variable. We'd somewhat hacked around this for testing by sorting by value name, but that doesn't solve the general determinism problem.

Instead, we can use the order of traversal over the def/use graph to give us a single consistent ordering. Today, this is a DFS order, but the exact order doesn't matter provided it's deterministic for a given input.
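
A minimal sketch of the ordering idea, assuming a toy graph type. `Node`, `collectInVisitOrder` and `deterministicOrder` are hypothetical names, not the pass's data structures.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR value with an ordered operand list.
struct Node {
  int Name;
  std::vector<Node *> Operands;
};

// Depth-first walk that appends each node the first time it is reached.
// The pointer-keyed set is used for membership only and is never iterated,
// so the output depends only on the graph and the operand order.
inline void collectInVisitOrder(Node *N, std::unordered_set<Node *> &Seen,
                                std::vector<Node *> &Order) {
  assert(N && "graph must not contain null nodes");
  if (!Seen.insert(N).second)
    return;
  Order.push_back(N);
  for (Node *Op : N->Operands)
    collectInVisitOrder(Op, Seen, Order);
}

inline std::vector<Node *> deterministicOrder(Node *Root) {
  std::unordered_set<Node *> Seen;
  std::vector<Node *> Order;
  collectInVisitOrder(Root, Seen, Order);
  return Order;
}
```

Processing new nodes in the returned order, rather than iterating a pointer-keyed container, gives the same sequence on every run for the same input.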

(Q: It is safe to rely on a deterministic order of operands, right?)

Note that this only fixes the determinism within a single inference step. The inference step is currently invoked many times in a non-deterministic order. That's a future change in the sequence. :)

Differential Revision: http://reviews.llvm.org/D12640

llvm-svn: 247208

15d5563c

LowerBitSets: Fix non-determinism bug. · 1cbc91ec

Peter Collingbourne authored Sep 09, 2015

Visit disjoint sets in a deterministic order based on the maximum BitSetNM
index, otherwise the order in which we visit them will depend on pointer
comparisons. This was being exposed by MSan.
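
Sketched in isolation, with each disjoint set reduced to a list of indices. `IndexSet`, `maxIndex` and `sortDeterministically` are illustrative names, not the pass's own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in: each disjoint set is a list of member indices.
using IndexSet = std::vector<unsigned>;

inline unsigned maxIndex(const IndexSet &S) {
  assert(!S.empty() && "disjoint sets are expected to be non-empty");
  return *std::max_element(S.begin(), S.end());
}

// Order sets by their largest member index. Because the sets are disjoint,
// no two sets share a maximum, so the resulting order is total and does not
// depend on where the sets happen to live in memory.
inline void sortDeterministically(std::vector<const IndexSet *> &Sets) {
  std::sort(Sets.begin(), Sets.end(),
            [](const IndexSet *L, const IndexSet *R) {
              return maxIndex(*L) < maxIndex(*R);
            });
}
```

Comparing the pointers themselves would also yield a valid order, but one that can change from run to run with the allocator's behavior.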

llvm-svn: 247201

1cbc91ec

Sep 09, 2015

[SEH] Emit 32-bit SEH tables for the new EH IR · 94b704c4

Reid Kleckner authored Sep 09, 2015

The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.

The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.

llvm-svn: 247192

94b704c4

[WebAssembly] Update target datalayout strings. · 5e066842
Dan Gohman authored Sep 09, 2015
```
llvm-svn: 247187
```
5e066842

Change EmitRecordWithAbbrevImpl to take Optional record code. NFC. · 0f251a1c

Teresa Johnson authored Sep 09, 2015

This change enables EmitRecord to pass the supplied record Code to
EmitRecordWithAbbrevImpl, rather than insert it into the Vals array.
It is an enabler for changing EmitRecord to take an ArrayRef<uintty> instead
of a SmallVectorImpl<uintty>&.

Patch suggested by Duncan P. N. Exon Smith, modified by myself a bit to get
correct assertion checking.

llvm-svn: 247186

0f251a1c

ScalarEvolution assume hanging bugfix · 0dde00d2
Piotr Padlewski authored Sep 09, 2015
```
http://reviews.llvm.org/D12719

llvm-svn: 247184
```
0dde00d2
Revert "Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC)" · c9a85abc
Mehdi Amini authored Sep 09, 2015
```
This reverts commit r247178.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247182
```
c9a85abc
Revert trunc(lshr (sext A), Cst) to ashr A, Cst · d34dbf07
David Majnemer authored Sep 09, 2015
```
This reverts commit r246997, it introduced a regression (PR24763).

llvm-svn: 247180
```
d34dbf07
Bitcode Writer: EmitRecordWith* takes an ArrayRef instead of a SmallVector (NFC) · 7d2bf53e
Mehdi Amini authored Sep 09, 2015
```
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247178
```
7d2bf53e

Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4... · db7ea86b

Renato Golin authored Sep 09, 2015

Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding."

This reverts commit r247149, as it was breaking numerous buildbots of varied architectures.

llvm-svn: 247177

db7ea86b

allow unpredictable metadata on switch statements · 66dcafc3
Sanjay Patel authored Sep 09, 2015
```
llvm-svn: 247174
```
66dcafc3

Save LaneMask with livein registers · d9da1627

Matthias Braun authored Sep 09, 2015

With subregister liveness enabled we can detect the case where only
parts of a register are live in; this is expressed as a 32-bit lanemask.
The current code only keeps registers in the live-in list and therefore
enumerates all subregisters affected by the lanemask. This turned out to
be too conservative, as a subregister may also cover additional parts
of the lanemask which are not live. Expressing a given lanemask by
enumerating a minimum set of subregisters is computationally expensive,
so the best solution is to simply change the live-in list to store the
lanemasks as well. This will reduce memory usage for targets using
subregister liveness and slightly increase it for other targets.
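
As a rough illustration of the data-structure change (not the actual MachineBasicBlock code), a live-in list that stores a lanemask next to each register might look like the sketch below. `LiveInEntry` and `LiveInList` are made-up names, and defaulting to an all-ones mask for targets without subregister liveness is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a live-in entry: a physical register plus the
// lanes of it that are actually live on entry to the block.
struct LiveInEntry {
  unsigned PhysReg;
  uint32_t LaneMask;
};

class LiveInList {
  std::vector<LiveInEntry> Entries;

public:
  // Record that the given lanes of PhysReg are live in, merging with any
  // lanes already recorded for the same register.
  void add(unsigned PhysReg, uint32_t LaneMask = ~0u) {
    assert(LaneMask != 0 && "a live-in entry needs at least one live lane");
    for (LiveInEntry &E : Entries)
      if (E.PhysReg == PhysReg) {
        E.LaneMask |= LaneMask;
        return;
      }
    Entries.push_back({PhysReg, LaneMask});
  }

  // Lanes of PhysReg that are live in; 0 if the register is not live in.
  uint32_t liveLanes(unsigned PhysReg) const {
    for (const LiveInEntry &E : Entries)
      if (E.PhysReg == PhysReg)
        return E.LaneMask;
    return 0;
  }
};
```

One entry per register replaces a list of enumerated subregisters, and the mask records exactly the live lanes instead of a conservative superset.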

Differential Revision: http://reviews.llvm.org/D12442

llvm-svn: 247171

d9da1627

VirtRegMap: Improve addMBBLiveIns() using SlotIndex::MBBIndexIterator; NFC · cc580058

Matthias Braun authored Sep 09, 2015

Now that we have an explicit iterator over the idx2MBBMap in SlotIndexes,
we can use the fact that segments and the idx2MBBMap are sorted by
SlotIndex position, so we can advance both simultaneously instead of
starting from the beginning for each segment.
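
The simultaneous walk can be sketched as a two-cursor sweep. This is a simplification with plain integers in place of SlotIndex values; `Segment` and `blocksCoveredAtStart` are hypothetical names, and treating a block as covered whenever its start lies inside a segment is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Segment {
  unsigned Start; // inclusive
  unsigned End;   // exclusive
};

// Both inputs must be sorted by position and Segments must not overlap.
// Returns the positions in BlockStarts whose start index lies in a segment.
inline std::vector<std::size_t>
blocksCoveredAtStart(const std::vector<Segment> &Segments,
                     const std::vector<unsigned> &BlockStarts) {
  std::vector<std::size_t> Result;
  std::size_t B = 0; // Shared cursor: never rewinds between segments.
  for (const Segment &S : Segments) {
    assert(S.Start < S.End && "segments must be non-empty");
    while (B < BlockStarts.size() && BlockStarts[B] < S.Start)
      ++B; // Skip blocks that begin before this segment.
    for (; B < BlockStarts.size() && BlockStarts[B] < S.End; ++B)
      Result.push_back(B); // Block start lies inside [Start, End).
  }
  return Result;
}
```

Because the block cursor only moves forward, the whole sweep is linear in the number of segments plus the number of blocks.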

This complicates the code for the subregister case somewhat but should
be more efficient and has the advantage that we get the final lanemask
for each block immediately which will be important for a subsequent
change.

Removes the now unused SlotIndexes::findMBBLiveIns function.

Differential Revision: http://reviews.llvm.org/D12443

llvm-svn: 247170

cc580058

[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible · 7b560d40

Chandler Carruth authored Sep 09, 2015

with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal: this logic should be
  hoisted into FunctionAAResults, as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.
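
To illustrate the two core ideas named above, here is a toy sketch of a CRTP base with stub defaults plus a type-erasing aggregation. Every name in it (`ResultBase`, `AggregatedResults`, `IdentityResult`, `KnowsNothingResult`) and the pointer-only query signature are invented for this example and do not reflect the real interfaces.

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum class AliasResult { NoAlias, MayAlias, MustAlias };

// CRTP base: supplies conservative defaults so a concrete result only
// overrides the queries it can actually answer.
template <typename Derived> struct ResultBase {
  AliasResult alias(const void *, const void *) {
    return AliasResult::MayAlias;
  }
  // Narrow query expressed via the more general one on the derived class.
  bool isNoAlias(const void *A, const void *B) {
    return static_cast<Derived *>(this)->alias(A, B) == AliasResult::NoAlias;
  }
};

// Type-erasing aggregation over any number of result objects.
class AggregatedResults {
  struct Concept {
    virtual ~Concept() = default;
    virtual AliasResult alias(const void *A, const void *B) = 0;
  };
  template <typename ResultT> struct Model final : Concept {
    explicit Model(ResultT &R) : Result(R) {}
    AliasResult alias(const void *A, const void *B) override {
      return Result.alias(A, B);
    }
    ResultT &Result;
  };
  std::vector<std::unique_ptr<Concept>> Results;

public:
  template <typename ResultT> void addResult(ResultT &R) {
    Results.push_back(std::unique_ptr<Concept>(new Model<ResultT>(R)));
  }
  // Walk one query across all results; the first definite answer wins.
  AliasResult alias(const void *A, const void *B) {
    for (auto &R : Results) {
      AliasResult AR = R->alias(A, B);
      if (AR != AliasResult::MayAlias)
        return AR;
    }
    return AliasResult::MayAlias;
  }
};

// A toy result that only knows identical pointers must alias.
struct IdentityResult : ResultBase<IdentityResult> {
  AliasResult alias(const void *A, const void *B) {
    return A == B ? AliasResult::MustAlias : AliasResult::MayAlias;
  }
};

// A toy result that overrides nothing and inherits every default.
struct KnowsNothingResult : ResultBase<KnowsNothingResult> {};
```

The result types share no virtual base; the aggregation wraps each one behind a private concept, which is what lets unrelated analyses be composed directly.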

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exceptions to this rule are LoopPass instances, which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loopholes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167

7b560d40

MachineVerifier: Check that SlotIndex MBBIndexList is sorted. · 80595460

Matthias Braun authored Sep 09, 2015

This introduces a check that the MBBIndexList is sorted as proposed in
http://reviews.llvm.org/D12443 but split up into a separate commit.

llvm-svn: 247166

80595460