- May 12, 2017
-
-
Teresa Johnson authored
Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844
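A minimal sketch of the idea, assuming a hypothetical helper getCallCount and an IsSampleProfile flag (neither is from the patch; the real logic sits around ProfileSummaryInfo::getProfileCount, which the message mentions): consult the call's !prof metadata only under sample PGO, and rely on BFI-derived counts under instrumentation PGO.

```cpp
#include "llvm/ADT/Optional.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper, not the patch itself: under sample PGO the per-call
// !prof metadata carries information the block counts may lack, so it is
// consulted first; under instrumentation PGO the BFI-derived count is trusted
// instead of possibly stale value-profile metadata on the call.
static Optional<uint64_t> getCallCount(const CallInst *CI,
                                       BlockFrequencyInfo *BFI,
                                       bool IsSampleProfile) {
  uint64_t Total;
  if (IsSampleProfile && CI->extractProfTotalWeight(Total))
    return Total;
  return BFI->getBlockProfileCount(CI->getParent());
}
```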
-
Rafael Espindola authored
llvm-svn: 302843
-
Richard Smith authored
When we parse a redefinition of an entity for which we have a hidden existing declaration, make it visible in the current module instead of mapping the current source location to its containing module. llvm-svn: 302842
-
Eric Fiselier authored
llvm-svn: 302841
-
Adrian Prantl authored
The AST merges NamespaceDecls, but for module debug info it is important to put a namespace decl (or rather its children) into the correct (sub-)module, so we need to use the parent module of the decl that triggered this namespace to be serialized as a second key when looking up DINamespace nodes. rdar://problem/29339538 llvm-svn: 302840
-
Michael Kruse authored
As with the scalar operand of the initial StoreInst, also use input accesses when searching for new opportunities after mapping a PHI write. The same rationale applies here: After LICM has been applied, the promoted value will either be an instruction in the same statement (in which case we fall back to trying every scalar access of the statement), or in another statement such that there will be such an input access. In the latter case other scalars cannot have originated from the same register promotion, at least not by LICM. This mostly helps to decrease compilation time and makes debugging easier by not pursuing unpromising routes. In some circumstances, it may change the compiler's output. llvm-svn: 302839
-
Michael Kruse authored
Prior to this patch, we used VirtualUse to determine the input access of an llvm::Value in a statement. The input access is the READ MemoryAccess that makes a value available in that statement, which can either be a READ of a MemoryKind::Value or the MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input access to heuristically find a candidate to map without searching all possible values. This might modify the behaviour in that PHI accesses were previously not considered input accesses. This was unintentionally lost when "VirtualUse" was extracted from the "Known Knowledge" patch. llvm-svn: 302838
-
Michael Kruse authored
llvm-svn: 302837
-
Michael Kruse authored
When removing a MemoryAccess, also remove it from maps pointing to it. This was already done for InstructionToAccess, but not yet for ValueReads, ValueWrites and PHIWrites as those were only used during the ScopBuilder phase. Keeping them updated allows us to use them later as well. llvm-svn: 302836
-
Reid Kleckner authored
Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and then uses floating point, that's a bug in their code, not a bug in the compiler. This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern. Differential Revision: https://reviews.llvm.org/D27522 llvm-svn: 302835
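A hedged sketch of the pattern the commit moves toward (simplified; the real check sits in the X86 calling-convention lowering, and the helper name here is made up): emit a diagnostic that blames the user's function rather than calling report_fatal_error.

```cpp
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical helper: report "unsupported" against the offending function so
// the user sees their own code blamed, instead of a crash message asking them
// to file a compiler bug.
static void reportNoSSE(const Function &F) {
  F.getContext().diagnose(DiagnosticInfoUnsupported(
      F, "SSE register return with SSE disabled", DebugLoc(), DS_Error));
}
```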
-
Guozhi Wei authored
[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to the Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified, so the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and FDO are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834
-
Sean Callanan authored
Templates can end in parameter packs, like this: template <class... T> struct MyStruct { /*...*/ }; LLDB does not currently support these parameter packs; it does not emit them into the template argument list at all. This causes problems when you specialize, e.g.: template <> struct MyStruct<int> { /*...*/ }; template <> struct MyStruct<int, int> : MyStruct<int> { /*...*/ }; LLDB generates two template specializations, each with no template arguments, and then when they are imported by the ASTImporter into a parser's AST context we get a single specialization that inherits from itself, causing Clang's record layout mechanism to smash its stack. This patch fixes the problem for classes and adds tests. The tests for functions fail because Clang's ASTImporter can't import them at the moment, so I've xfailed that test. Differential Revision: https://reviews.llvm.org/D33025 llvm-svn: 302833
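The situation the message describes, spelled out as a compilable sketch (illustrative; not copied from the LLDB test sources added by the patch):

```cpp
// A template ending in a parameter pack.
template <class... T> struct MyStruct { /*...*/ };

// Explicit specializations with one and two arguments. If the debugger drops
// the pack from the template argument lists, both of these look like
// "MyStruct<>", and the importer can merge them into one record that appears
// to inherit from itself, which is what crashed Clang's record layout.
template <> struct MyStruct<int> { /*...*/ };
template <> struct MyStruct<int, int> : MyStruct<int> { /*...*/ };
```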
-
Rafael Espindola authored
llvm-svn: 302832
-
- May 11, 2017
-
-
Aditya Nandakumar authored
https://reviews.llvm.org/D33085 llvm-svn: 302831
-
Kostya Kortchinsky authored
Summary: The reasoning behind this change is twofold: - the current combined allocator (sanitizer_allocator_combined.h) implements features that are not relevant for Scudo, making some code redundant, and some restrictions not pertinent (alignments for example). This forced us to do some weird things between the frontend and our secondary to make things work; - we have enough information to be able to know if a chunk will be serviced by the Primary or Secondary, allowing us to avoid extraneous calls to functions such as `PointerIsMine` or `CanAllocate`. As a result, the new scudo-specific combined allocator is very straightforward, and allows us to remove some now unnecessary code both in the frontend and the secondary. Unused functions have been left in as unimplemented for now. It turns out to also be a sizeable performance gain (3% faster in some Android memory_replay benchmarks, doing some more on other platforms). Reviewers: alekseyshl, kcc, dvyukov Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33007 llvm-svn: 302830
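A minimal sketch of the routing point (hypothetical names and shapes, not the Scudo sources): once the frontend has decided from size and alignment whether the Primary will service a chunk, the combined layer can be a thin dispatcher with no PointerIsMine / CanAllocate round-trips.

```cpp
typedef unsigned long uptr; // stand-in for sanitizer_common's uptr

// Hypothetical, simplified combined allocator: the FromPrimary decision is
// made by the caller, so no classification of the pointer or size is repeated
// here.
template <class PrimaryT, class SecondaryT>
class CombinedAllocator {
public:
  void *Allocate(uptr Size, uptr Alignment, bool FromPrimary) {
    if (FromPrimary)
      return Primary.Allocate(Size, Alignment);
    return Secondary.Allocate(Size, Alignment);
  }
  void Deallocate(void *Ptr, bool FromPrimary) {
    if (FromPrimary)
      Primary.Deallocate(Ptr);
    else
      Secondary.Deallocate(Ptr);
  }

private:
  PrimaryT Primary;
  SecondaryT Secondary;
};
```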
-
Easwaran Raman authored
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829
-
Rafael Espindola authored
llvm-svn: 302828
-
Reid Kleckner authored
Use the same switch technique to eliminate virtual successor accessors from TerminatorInst. Extracted from D31261. NFC llvm-svn: 302827
-
Rafael Espindola authored
llvm-svn: 302826
-
Richard Smith authored
It's failing due to Hexagon calling convention lowering being broken (empty structs are not passed even if they have nontrivial destructors / copy ctors). llvm-svn: 302825
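For context, a hypothetical example (not the disabled test itself) of the kind of type involved: a class with no data members that must still be passed explicitly because its copy constructor and destructor are nontrivial.

```cpp
// Illustrative only: "empty" but with nontrivial special members, so the
// calling convention must pass a real object for the callee to copy/destroy.
struct Token {
  Token(const Token &); // nontrivial copy constructor
  ~Token();             // nontrivial destructor
};

void callee(Token t);                // must actually receive 't'
void caller(Token &t) { callee(t); } // dropping the argument is the miscompile
```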
-
Martell Malone authored
Reviewers: EricWF Differential Revision: https://reviews.llvm.org/D33098 llvm-svn: 302824
-
Reid Kleckner authored
The erase/remove from parent methods now use a switch table to remove themselves from their appropriate parent ilist. The copyAttributesFrom method is now completely non-virtual, since we only ever copy attributes from a global of the appropriate type. Pre-requisite to de-virtualizing Value to save a vptr (https://reviews.llvm.org/D31261). NFC llvm-svn: 302823
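A hedged illustration of the switch technique (simplified and with fewer cases than the real GlobalValue code; a sketch, not the patch):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Instead of a virtual eraseFromParent() on every subclass, dispatch once on
// the value ID and unlink from the matching list in the parent Module.
static void eraseGlobalFromParent(GlobalValue *GV) {
  Module *M = GV->getParent();
  switch (GV->getValueID()) {
  case Value::FunctionVal:
    M->getFunctionList().erase(cast<Function>(GV)->getIterator());
    break;
  case Value::GlobalVariableVal:
    M->getGlobalList().erase(cast<GlobalVariable>(GV)->getIterator());
    break;
  case Value::GlobalAliasVal:
    M->getAliasList().erase(cast<GlobalAlias>(GV)->getIterator());
    break;
  default:
    llvm_unreachable("unhandled global value kind in this sketch");
  }
}
```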
-
Chad Rosier authored
Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822
-
Davide Italiano authored
llvm-svn: 302821
-
Vadzim Dambrouski authored
Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820
-
Davide Italiano authored
The testcase in PR32984 shows a non-linear compile-time increase after a change that made the LoopUnroll pass more aggressive (increasing the threshold). My profiling shows all the time of PHI elimination goes to llvm::LiveVariables::addNewBlock. This is because we keep Defs/Kills registers in a SmallSet, whose find(const T &V) is O(N). Switching to a DenseSet reduces the time spent in the pass from 297 seconds to 97 seconds. Profiling still shows a lot of time is spent iterating the data structure, so I guess there's room for improvement. Dan tells me GCC uses real set operations for live registers and it takes no time on this testcase. Matthias points out we might want to switch all this to LiveIntervalAnalysis, so it's not entirely clear whether a rewrite is worth it. Differential Revision: https://reviews.llvm.org/D33088 llvm-svn: 302819
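An illustrative toy program (plain STL containers, not LLVM's SmallSet/DenseSet) showing why the container choice matters at this scale: N membership queries over N live registers cost on the order of N^2 comparisons with a linear-scan set, but only ~N hash lookups with a hashed set.

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
  const unsigned N = 50000;
  std::vector<unsigned> Linear;        // stand-in for the old linear-scan set
  std::unordered_set<unsigned> Hashed; // stand-in for a hashed set
  for (unsigned Reg = 0; Reg != N; ++Reg) {
    Linear.push_back(Reg);
    Hashed.insert(Reg);
  }
  unsigned Hits = 0;
  for (unsigned Reg = 0; Reg != N; ++Reg) {
    // O(N) scan per query -- this is where the quadratic time comes from.
    for (unsigned R : Linear)
      if (R == Reg) { ++Hits; break; }
    Hits += Hashed.count(Reg);         // amortized O(1) per query
  }
  std::printf("hits: %u\n", Hits);
}
```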
-
Richard Smith authored
llvm-svn: 302818
-
Richard Smith authored
In list-initialization, run cleanups for the default argument after each iteration of the initialization loop. We previously ran the destructor for any temporary only once, at the end of the complete loop, rather than once per iteration! Re-commit of r302750, reverted in r302776. llvm-svn: 302817
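A hypothetical reproducer of the behaviour being fixed (not the original test case): the temporary created for the default argument should be destroyed after each array element is initialized, not once after the whole loop.

```cpp
#include <cstdio>

struct Guard {
  Guard() { std::puts("ctor"); }
  ~Guard() { std::puts("dtor"); }
};

struct Elem {
  Elem(const Guard & = Guard()) {}
};

int main() {
  // Expect ctor/dtor pairs to interleave, one pair per element; the bug ran
  // all three constructions first and the destructors only at the end.
  Elem arr[3] = {};
  (void)arr;
}
```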
-
Craig Topper authored
llvm-svn: 302816
-
Craig Topper authored
llvm-svn: 302815
-
Matt Arsenault authored
We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814
-
Matt Arsenault authored
This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
-
Stanislav Mekhanoshin authored
Earlier fix D32572 introduced a bug where live-ins were calculated for the basic block instead of the scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812
-
Adam Nemet authored
The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock, but I vaguely remember missing a few cases, for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. To get a sense of how far the tree spans, I've included the size of the tree in the remark. This is not perfect of course, but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
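A hedged sketch of what such a remark emission looks like (names, header, and message text approximated from the description, not copied from the patch):

```cpp
#include "llvm/Analysis/OptimizationDiagnosticInfo.h"
#include "llvm/IR/DiagnosticInfo.h"
using namespace llvm;

// Emit the remark once getTreeCost() has decided vectorization is profitable,
// anchored at the first instruction of the root bundle.
static void reportVectorized(OptimizationRemarkEmitter &ORE,
                             Instruction *RootInst, int Cost,
                             unsigned TreeSize) {
  ORE.emit(OptimizationRemark("slp-vectorizer", "VectorizedList", RootInst)
           << "SLP vectorized with cost " << ore::NV("Cost", Cost)
           << " and with tree size " << ore::NV("TreeSize", TreeSize));
}
```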
-
Nemanja Ivanovic authored
This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810
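A hedged illustration of the kind of GPR-only equality sequence this enables on PowerPC (the exact sequences the patch selects may differ); the C++ below simply emulates it:

```cpp
#include <cstdio>

// Classic GPR-only sequence for 32-bit equality, result zero-extended:
//   xor    r3, rA, rB    ; r3 == 0 iff rA == rB
//   cntlzw r3, r3        ; count leading zeros: 32 iff r3 == 0, else <= 31
//   srwi   r3, r3, 5     ; 32 >> 5 == 1, anything smaller >> 5 == 0
static unsigned eqInGPR(unsigned A, unsigned B) {
  unsigned X = A ^ B;
  unsigned LZ = X ? (unsigned)__builtin_clz(X) : 32u; // emulates cntlzw
  return LZ >> 5; // 1 if A == B, 0 otherwise
}

int main() { std::printf("%u %u\n", eqInGPR(3, 3), eqInGPR(3, 4)); }
```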
-
Adrian Prantl authored
rdar://problem/27876262 llvm-svn: 302809
-
Simon Pilgrim authored
llvm-svn: 302808
-
Pierre Gousseau authored
[asan] Test 'strndup_oob_test.cc' added in r302781 fails on the clang-cmake-thumbv7-a15-full-sh bot. Marking as unsupported on armv7l-unknown-linux-gnueabihf, same as strdup_oob_test.cc llvm-svn: 302807
-
Hans Wennborg authored
llvm-svn: 302806
-
Michael Kruse authored
After DeLICM, it is possible to have two writes of the same value to the same location in the same statement when DeLICM determined that those writes do not conflict (i.e. they write the same value). Teach -polly-simplify to remove one of the writes. Such double writes interfere with the pattern matching of matrix-multiplication kernels and also seem not to be optimized away by LLVM. The algorithm is simple, has O(n^2) behaviour (n = max number of MemoryAccesses in a statement) and only matches the most obvious cases, but that seems to be enough to pattern-match Boost ublas gemm. Not handled cases include: - StoreInst instructions (a.k.a. explicit writes), since the value might be loaded or overwritten between the two stores. - PHINode, especially LCSSA, when the PHI value matches another's. - Partial writes (in preparation) llvm-svn: 302805
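A rough sketch of the O(n^2) scan described above, with a toy Access type standing in for polly::MemoryAccess and a hypothetical redundancy test:

```cpp
#include <list>

struct Access {   // stand-in for polly::MemoryAccess
  int Location;   // simplified "where it writes"
  int Value;      // simplified "what it writes"
};

// Hypothetical test: two writes are redundant if they store the same value to
// the same location.
static bool isRedundantPair(const Access &A, const Access &B) {
  return A.Location == B.Location && A.Value == B.Value;
}

// Within one statement, keep the first of each redundant pair of writes and
// drop the later ones; every pair is inspected, hence O(n^2).
static void removeRedundantWrites(std::list<Access> &Accesses) {
  for (auto I = Accesses.begin(); I != Accesses.end(); ++I) {
    auto J = std::next(I);
    while (J != Accesses.end()) {
      if (isRedundantPair(*I, *J))
        J = Accesses.erase(J);
      else
        ++J;
    }
  }
}
```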
-