- May 12, 2017
-
-
Teresa Johnson authored
Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844
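A minimal sketch of the idea, assuming a hypothetical helper getCallCount and an IsSampleProfile flag (neither is from the patch; the real logic sits around ProfileSummaryInfo::getProfileCount, which the message mentions): consult the call's !prof metadata only under sample PGO, and rely on BFI-derived counts under instrumentation PGO.

```cpp
#include "llvm/ADT/Optional.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper, not the patch itself: under sample PGO the per-call
// !prof metadata carries information the block counts may lack, so it is
// consulted first; under instrumentation PGO the BFI-derived count is trusted
// instead of possibly stale value-profile metadata on the call.
static Optional<uint64_t> getCallCount(const CallInst *CI,
                                       BlockFrequencyInfo *BFI,
                                       bool IsSampleProfile) {
  uint64_t Total;
  if (IsSampleProfile && CI->extractProfTotalWeight(Total))
    return Total;
  return BFI->getBlockProfileCount(CI->getParent());
}
```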
-
Rafael Espindola authored
llvm-svn: 302843
-
Richard Smith authored
When we parse a redefinition of an entity for which we have a hidden existing declaration, make it visible in the current module instead of mapping the current source location to its containing module. llvm-svn: 302842
-
Eric Fiselier authored
llvm-svn: 302841
-
Adrian Prantl authored
The AST merges NamespaceDecls, but for module debug info it is important to put a namespace decl (or rather its children) into the correct (sub-)module, so we need to use the parent module of the decl that triggered this namespace to be serialized as a second key when looking up DINamespace nodes. rdar://problem/29339538 llvm-svn: 302840
-
Michael Kruse authored
As with the scalar operand of the initial StoreInst, also use input accesses when searching for new opportunities after mapping a PHI write. The same rationale applies here: After LICM has been applied, the promoted value will either be an instruction in the same statement (in which case we fall back to trying every scalar access of the statement), or in another statement such that there will be such an input access. In the latter case other scalars cannot have originated from the same register promotion, at least not by LICM. This mostly helps to decrease compilation time and makes debugging easier by not pursuing unpromising routes. In some circumstances, it may change the compiler's output. llvm-svn: 302839
-
Michael Kruse authored
Prior to this patch, we used VirtualUse to determine the input access of an llvm::Value in a statement. The input access is the READ MemoryAccess that makes a value available in that statement, which can either be a READ of a MemoryKind::Value or the MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input access to heuristically find a candidate to map without searching all possible values. This might modify the behaviour in that PHI accesses were previously not considered input accesses. This was unintentionally lost when "VirtualUse" was extracted from the "Known Knowledge" patch. llvm-svn: 302838
-
Michael Kruse authored
llvm-svn: 302837
-
Michael Kruse authored
When removing a MemoryAccess, also remove it from maps pointing to it. This was already done for InstructionToAccess, but not yet for ValueReads, ValueWrites and PHIWrites as those were only used during the ScopBuilder phase. Keeping them updated allows us to use them later as well. llvm-svn: 302836
-
Reid Kleckner authored
Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and then uses floating point, that's a bug in their code, not a bug in the compiler. This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern. Differential Revision: https://reviews.llvm.org/D27522 llvm-svn: 302835
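A hedged sketch of the pattern the commit moves toward (simplified; the real check sits in the X86 calling-convention lowering, and the helper name here is made up): emit a diagnostic that blames the user's function rather than calling report_fatal_error.

```cpp
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical helper: report "unsupported" against the offending function so
// the user sees their own code blamed, instead of a crash message asking them
// to file a compiler bug.
static void reportNoSSE(const Function &F) {
  F.getContext().diagnose(DiagnosticInfoUnsupported(
      F, "SSE register return with SSE disabled", DebugLoc(), DS_Error));
}
```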
-
Guozhi Wei authored
[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to the Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified, so the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and FDO are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834
-
Sean Callanan authored
Templates can end in parameter packs, like this: template <class... T> struct MyStruct { /*...*/ }; LLDB does not currently support these parameter packs; it does not emit them into the template argument list at all. This causes problems when you specialize, e.g.: template <> struct MyStruct<int> { /*...*/ }; template <> struct MyStruct<int, int> : MyStruct<int> { /*...*/ }; LLDB generates two template specializations, each with no template arguments, and then when they are imported by the ASTImporter into a parser's AST context we get a single specialization that inherits from itself, causing Clang's record layout mechanism to smash its stack. This patch fixes the problem for classes and adds tests. The tests for functions fail because Clang's ASTImporter can't import them at the moment, so I've xfailed that test. Differential Revision: https://reviews.llvm.org/D33025 llvm-svn: 302833
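The situation the message describes, spelled out as a compilable sketch (illustrative; not copied from the LLDB test sources added by the patch):

```cpp
// A template ending in a parameter pack.
template <class... T> struct MyStruct { /*...*/ };

// Explicit specializations with one and two arguments. If the debugger drops
// the pack from the template argument lists, both of these look like
// "MyStruct<>", and the importer can merge them into one record that appears
// to inherit from itself, which is what crashed Clang's record layout.
template <> struct MyStruct<int> { /*...*/ };
template <> struct MyStruct<int, int> : MyStruct<int> { /*...*/ };
```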
-
Rafael Espindola authored
llvm-svn: 302832
-
- May 11, 2017
-
-
Aditya Nandakumar authored
https://reviews.llvm.org/D33085 llvm-svn: 302831
-
Kostya Kortchinsky authored
Summary: The reasoning behind this change is twofold: - the current combined allocator (sanitizer_allocator_combined.h) implements features that are not relevant for Scudo, making some code redundant, and some restrictions not pertinent (alignments for example). This forced us to do some weird things between the frontend and our secondary to make things work; - we have enough information to be able to know if a chunk will be serviced by the Primary or Secondary, allowing us to avoid extraneous calls to functions such as `PointerIsMine` or `CanAllocate`. As a result, the new scudo-specific combined allocator is very straightforward, and allows us to remove some now unnecessary code both in the frontend and the secondary. Unused functions have been left in as unimplemented for now. It turns out to also be a sizeable performance gain (3% faster in some Android memory_replay benchmarks, doing some more on other platforms). Reviewers: alekseyshl, kcc, dvyukov Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33007 llvm-svn: 302830
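A minimal sketch of the routing point (hypothetical names and shapes, not the Scudo sources): once the frontend has decided from size and alignment whether the Primary will service a chunk, the combined layer can be a thin dispatcher with no PointerIsMine / CanAllocate round-trips.

```cpp
typedef unsigned long uptr; // stand-in for sanitizer_common's uptr

// Hypothetical, simplified combined allocator: the FromPrimary decision is
// made by the caller, so no classification of the pointer or size is repeated
// here.
template <class PrimaryT, class SecondaryT>
class CombinedAllocator {
public:
  void *Allocate(uptr Size, uptr Alignment, bool FromPrimary) {
    if (FromPrimary)
      return Primary.Allocate(Size, Alignment);
    return Secondary.Allocate(Size, Alignment);
  }
  void Deallocate(void *Ptr, bool FromPrimary) {
    if (FromPrimary)
      Primary.Deallocate(Ptr);
    else
      Secondary.Deallocate(Ptr);
  }

private:
  PrimaryT Primary;
  SecondaryT Secondary;
};
```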
-
Easwaran Raman authored
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829
-
Rafael Espindola authored
llvm-svn: 302828
-
Reid Kleckner authored
Use the same switch technique to eliminate virtual successor accessors from TerminatorInst. Extracted from D31261. NFC llvm-svn: 302827
-
Rafael Espindola authored
llvm-svn: 302826
-
Richard Smith authored
It's failing due to Hexagon calling convention lowering being broken (empty structs are not passed even if they have nontrivial destructors / copy ctors). llvm-svn: 302825
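For context, a hypothetical example (not the disabled test itself) of the kind of type involved: a class with no data members that must still be passed explicitly because its copy constructor and destructor are nontrivial.

```cpp
// Illustrative only: "empty" but with nontrivial special members, so the
// calling convention must pass a real object for the callee to copy/destroy.
struct Token {
  Token(const Token &); // nontrivial copy constructor
  ~Token();             // nontrivial destructor
};

void callee(Token t);                // must actually receive 't'
void caller(Token &t) { callee(t); } // dropping the argument is the miscompile
```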
-
Martell Malone authored
Reviewers: EricWF Differential Revision: https://reviews.llvm.org/D33098 llvm-svn: 302824
-
Reid Kleckner authored
The erase/remove from parent methods now use a switch table to remove themselves from their appropriate parent ilist. The copyAttributesFrom method is now completely non-virtual, since we only ever copy attributes from a global of the appropriate type. Pre-requisite to de-virtualizing Value to save a vptr (https://reviews.llvm.org/D31261). NFC llvm-svn: 302823
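A hedged illustration of the switch technique (simplified and with fewer cases than the real GlobalValue code; a sketch, not the patch):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Instead of a virtual eraseFromParent() on every subclass, dispatch once on
// the value ID and unlink from the matching list in the parent Module.
static void eraseGlobalFromParent(GlobalValue *GV) {
  Module *M = GV->getParent();
  switch (GV->getValueID()) {
  case Value::FunctionVal:
    M->getFunctionList().erase(cast<Function>(GV)->getIterator());
    break;
  case Value::GlobalVariableVal:
    M->getGlobalList().erase(cast<GlobalVariable>(GV)->getIterator());
    break;
  case Value::GlobalAliasVal:
    M->getAliasList().erase(cast<GlobalAlias>(GV)->getIterator());
    break;
  default:
    llvm_unreachable("unhandled global value kind in this sketch");
  }
}
```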
-
Chad Rosier authored
Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822
-
Davide Italiano authored
llvm-svn: 302821
-
Vadzim Dambrouski authored
Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820
-
Davide Italiano authored
The testcase in PR32984 shows a non-linear compile-time increase after a change that made the LoopUnroll pass more aggressive (increasing the threshold). My profiling shows all the time of PHI elimination goes to llvm::LiveVariables::addNewBlock. This is because we keep Defs/Kills registers in a SmallSet, whose find(const T &V) is O(N). Switching to a DenseSet reduces the time spent in the pass from 297 seconds to 97 seconds. Profiling still shows a lot of time is spent iterating the data structure, so I guess there's room for improvement. Dan tells me GCC uses real set operations for live registers and it takes no time on this testcase. Matthias points out we might want to switch all this to LiveIntervalAnalysis, so it's not entirely clear whether a rewrite is worth it. Differential Revision: https://reviews.llvm.org/D33088 llvm-svn: 302819
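An illustrative toy program (plain STL containers, not LLVM's SmallSet/DenseSet) showing why the container choice matters at this scale: N membership queries over N live registers cost on the order of N^2 comparisons with a linear-scan set, but only ~N hash lookups with a hashed set.

```cpp
#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
  const unsigned N = 50000;
  std::vector<unsigned> Linear;        // stand-in for the old linear-scan set
  std::unordered_set<unsigned> Hashed; // stand-in for a hashed set
  for (unsigned Reg = 0; Reg != N; ++Reg) {
    Linear.push_back(Reg);
    Hashed.insert(Reg);
  }
  unsigned Hits = 0;
  for (unsigned Reg = 0; Reg != N; ++Reg) {
    // O(N) scan per query -- this is where the quadratic time comes from.
    for (unsigned R : Linear)
      if (R == Reg) { ++Hits; break; }
    Hits += Hashed.count(Reg);         // amortized O(1) per query
  }
  std::printf("hits: %u\n", Hits);
}
```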
-
Richard Smith authored
llvm-svn: 302818
-
Richard Smith authored
In list-initialization, run cleanups for the default argument after each iteration of the initialization loop. We previously ran the destructor for any temporary only once, at the end of the complete loop, rather than once per iteration! Re-commit of r302750, reverted in r302776. llvm-svn: 302817
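A hypothetical reproducer of the behaviour being fixed (not the original test case): the temporary created for the default argument should be destroyed after each array element is initialized, not once after the whole loop.

```cpp
#include <cstdio>

struct Guard {
  Guard() { std::puts("ctor"); }
  ~Guard() { std::puts("dtor"); }
};

struct Elem {
  Elem(const Guard & = Guard()) {}
};

int main() {
  // Expect ctor/dtor pairs to interleave, one pair per element; the bug ran
  // all three constructions first and the destructors only at the end.
  Elem arr[3] = {};
  (void)arr;
}
```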
-
Craig Topper authored
llvm-svn: 302816
-
Craig Topper authored
llvm-svn: 302815
-
Matt Arsenault authored
We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814
-
Matt Arsenault authored
This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
-
Stanislav Mekhanoshin authored
Earlier fix D32572 introduced a bug where live-ins were calculated for the basic block instead of the scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812
-
Adam Nemet authored
The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock, but I vaguely remember missing a few cases, for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. To get a sense of how far the tree spans, I've included the size of the tree in the remark. This is not perfect of course, but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
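A hedged sketch of what such a remark emission looks like (names, header, and message text approximated from the description, not copied from the patch):

```cpp
#include "llvm/Analysis/OptimizationDiagnosticInfo.h"
#include "llvm/IR/DiagnosticInfo.h"
using namespace llvm;

// Emit the remark once getTreeCost() has decided vectorization is profitable,
// anchored at the first instruction of the root bundle.
static void reportVectorized(OptimizationRemarkEmitter &ORE,
                             Instruction *RootInst, int Cost,
                             unsigned TreeSize) {
  ORE.emit(OptimizationRemark("slp-vectorizer", "VectorizedList", RootInst)
           << "SLP vectorized with cost " << ore::NV("Cost", Cost)
           << " and with tree size " << ore::NV("TreeSize", TreeSize));
}
```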
-
Nemanja Ivanovic authored
This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810
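A hedged illustration of the kind of GPR-only equality sequence this enables on PowerPC (the exact sequences the patch selects may differ); the C++ below simply emulates it:

```cpp
#include <cstdio>

// Classic GPR-only sequence for 32-bit equality, result zero-extended:
//   xor    r3, rA, rB    ; r3 == 0 iff rA == rB
//   cntlzw r3, r3        ; count leading zeros: 32 iff r3 == 0, else <= 31
//   srwi   r3, r3, 5     ; 32 >> 5 == 1, anything smaller >> 5 == 0
static unsigned eqInGPR(unsigned A, unsigned B) {
  unsigned X = A ^ B;
  unsigned LZ = X ? (unsigned)__builtin_clz(X) : 32u; // emulates cntlzw
  return LZ >> 5; // 1 if A == B, 0 otherwise
}

int main() { std::printf("%u %u\n", eqInGPR(3, 3), eqInGPR(3, 4)); }
```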
-
Adrian Prantl authored
rdar://problem/27876262 llvm-svn: 302809
-
Simon Pilgrim authored
llvm-svn: 302808
-
Pierre Gousseau authored
[asan] Test 'strndup_oob_test.cc' added in r302781 fails on the clang-cmake-thumbv7-a15-full-sh bot. Marking as unsupported on armv7l-unknown-linux-gnueabihf, same as strdup_oob_test.cc llvm-svn: 302807
-
Hans Wennborg authored
llvm-svn: 302806
-
Michael Kruse authored
After DeLICM, it is possible to have two writes of the same value to the same location in the same statement when DeLICM determined that those writes do not conflict (i.e. they write the same value). Teach -polly-simplify to remove one of the writes. Such double writes interfere with the pattern matching of matrix-multiplication kernels and also seem not to be optimized away by LLVM. The algorithm is simple, has O(n^2) behaviour (n = max number of MemoryAccesses in a statement) and only matches the most obvious cases, but that seems to be enough to pattern-match Boost ublas gemm. Not handled cases include: - StoreInst instructions (a.k.a. explicit writes), since the value might be loaded or overwritten between the two stores. - PHINode, especially LCSSA, when the PHI value matches another's. - Partial writes (in preparation) llvm-svn: 302805
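A rough sketch of the O(n^2) scan described above, with a toy Access type standing in for polly::MemoryAccess and a hypothetical redundancy test:

```cpp
#include <list>

struct Access {   // stand-in for polly::MemoryAccess
  int Location;   // simplified "where it writes"
  int Value;      // simplified "what it writes"
};

// Hypothetical test: two writes are redundant if they store the same value to
// the same location.
static bool isRedundantPair(const Access &A, const Access &B) {
  return A.Location == B.Location && A.Value == B.Value;
}

// Within one statement, keep the first of each redundant pair of writes and
// drop the later ones; every pair is inspected, hence O(n^2).
static void removeRedundantWrites(std::list<Access> &Accesses) {
  for (auto I = Accesses.begin(); I != Accesses.end(); ++I) {
    auto J = std::next(I);
    while (J != Accesses.end()) {
      if (isRedundantPair(*I, *J))
        J = Accesses.erase(J);
      else
        ++J;
    }
  }
}
```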
-