- Nov 16, 2018
-
-
Adrian Prantl authored
This fixes PR39669. https://bugs.llvm.org/show_bug.cgi?id=39669 llvm-svn: 347065
-
Eugene Leviant authored
An attempt to recommit r346584 after failure on OSX build bot. Fixed cache key computation in ThinLTOCodeGenerator and added test case llvm-svn: 347033
-
- Nov 15, 2018
-
-
Xin Tong authored
Summary: Load sample profile in LTO link step. ThinLTO calls populateModulePassManager to load the profile Reviewers: tejohnson, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54564 llvm-svn: 346971
-
- Nov 13, 2018
-
-
Steven Wu authored
This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2. llvm-svn: 346768
-
- Nov 11, 2018
-
-
Florian Hahn authored
Reviewers: kuhar, chandlerc, NutshellySima, brzycki Reviewed By: NutshellySima, brzycki Differential Revision: https://reviews.llvm.org/D54317 llvm-svn: 346618
-
- Nov 10, 2018
-
-
Eugene Leviant authored
This patch allows internalising globals if all accesses to them (from live functions) are from non-volatile load instructions Differential revision: https://reviews.llvm.org/D49362 llvm-svn: 346584
-
- Nov 09, 2018
-
-
Florian Hahn authored
After D45330, Dominators are required for IPSCCP and can be preserved. This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis. Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima Reviewed By: chandlerc, kuhar, NutshellySima Differential Revision: https://reviews.llvm.org/D47259 llvm-svn: 346486
-
- Nov 08, 2018
-
-
Pirama Arumuga Nainar authored
Summary: This fixes PR 37422 In ELF, non-weak symbols can also be non-prevailing. In this particular PR, the __llvm_profile_* symbols are non-prevailing but weren't getting dropped - causing multiply-defined errors with lld. Also add a test, strong_non_prevailing.ll, to ensure that multiple copies of a strong symbol are dropped. To fix the test regressions exposed by this fix, - do not mark prevailing copies for symbols with 'appending' linkage. There's no one prevailing copy for such symbols. - fix the prevailing version in dead-strip-fulllto.ll - explicitly pass exported symbols to llvm-lto in fumcimport.ll and funcimport_var.ll Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D54125 llvm-svn: 346436
-
whitequark authored
Summary: MergeFunctions currently tries to process strong functions before weak functions, because weak functions can simply call strong functions, while a strong/weak function cannot call a weak function (a backing strong function is needed). This patch additionally tries to process external functions before local functions, because we definitely have to keep the external function, but may be able to drop the local one (and definitely can if it is also unnamed_addr). Unfortunately, this exposes an existing bug in the implementation: The FnTree and FNodesInTree structures can currently go out of sync in the case where two weak functions are merged, because the function in FnTree/FNodesInTree is RAUWed. This leaves it behind in FnTree (this is intended, as it is the strong backing function which should be used for further merges), while it is replaced in FNodesInTree (this is not intended). This is fixed by switching FNodesInTree from using a ValueMap to using a DenseMap of AssertingVH. This exposes another minor issue: Currently FNodesInTree is not cleared after MergeFunctions finishes running. Currently, this is potentially dangerous (e.g. if something else wants to RAUW a function with a non-function), but at the very least it is unnecessary/inefficient. After the change to use AssertingVH it becomes more problematic, because there are certainly passes that remove functions. This issue is fixed by clearing FNodesInTree at the end of the pass. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53271 llvm-svn: 346386
-
whitequark authored
Summary: For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities. Fix this by calling removeUsers() prior to RAUW. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53262 llvm-svn: 346385
-
- Nov 06, 2018
-
-
Teresa Johnson authored
Summary: The NotEligibleToImport flag on the GlobalValueSummary was set if it isn't legal to import (e.g. because it references unpromotable locals) and when it can't be inlined (in which case importing is pointless). I split out the inlinable piece into a separate flag on the FunctionSummary (doesn't make sense for aliases or global variables), because in the future we may want to import for reasons other than inlining. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53345 llvm-svn: 346261
-
- Nov 05, 2018
-
-
Vedant Kumar authored
Using TargetTransformInfo allows the splitting pass to factor in the code size cost of instructions as it decides whether or not outlining is profitable. This did not regress the overall amount of outlining seen on the handful of internal frameworks I tested. Thanks to Jun Bum Lim for suggesting this! Differential Revision: https://reviews.llvm.org/D53835 llvm-svn: 346108
-
- Oct 31, 2018
-
-
Matthias Braun authored
This is modeled after C++17 std::empty(). Differential Revision: https://reviews.llvm.org/D53909 llvm-svn: 345679
-
- Oct 29, 2018
-
-
Vedant Kumar authored
It can be profitable to outline single-block cold regions because they may be large. Allow outlining single-block regions if they have over some threshold of non-debug, non-terminator instructions. I chose 3 as the threshold after experimenting with several internal frameworks. In practice, reducing the threshold further did not give much improvement, whereas increasing it resulted in substantial regressions. Differential Revision: https://reviews.llvm.org/D53824 llvm-svn: 345524
-
- Oct 25, 2018
-
-
Vedant Kumar authored
The current splitting algorithm works in three stages: 1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks *outside* of the set of hot blocks and start outlining. While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified: - An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo. - It isn't necessary to limit outlining to single-exit regions. This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach. Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out. Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested. Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions. Differential Revision: https://reviews.llvm.org/D53627 llvm-svn: 345209
-
- Oct 24, 2018
-
-
Teresa Johnson authored
Summary: The current default of appending "_"+entry block label to the new extracted cold function breaks demangling. Change the deliminator from "_" to "." to enable demangling. Because the header block label will be empty for release compile code, use "extracted" after the "." when the label is empty. Additionally, add a mechanism for the client to pass in an alternate suffix applied after the ".", and have the hot cold split pass use "cold."+Count, where the Count is currently 1 but can be used to uniquely number multiple cold functions split out from the same function with D53588. Reviewers: sebpop, hiraditya Subscribers: llvm-commits, erik.pilkington Differential Revision: https://reviews.llvm.org/D53534 llvm-svn: 345178
-
Wei Mi authored
in the same round of SCC update. In https://reviews.llvm.org/rL309784, inline history is added to prevent infinite inlining across multiple run of inliner and SCC update, but the history will only be kept when new SCC is actually generated during SCC update. We found a case that SCC can be split and then merge into itself in the same round of SCC update, so the same SCC will be pop out from UR.CWorklist and then added back immediately, without any new SCC generated, that is why the existing patch cannot catch the infinite inline case. What the patch does is even if no new SCC is generated, if only the current SCC appears in UR.CWorklist again, then keep the inline history. Differential Revision: https://reviews.llvm.org/D52915 llvm-svn: 345103
-
- Oct 23, 2018
-
-
Vedant Kumar authored
Outlined code is cold by assumption, so it makes sense to optimize it for minimal code size rather than performance. After r344869 moved the splitting pass to the end of the IR pipeline, this does not result in much of a code size reduction. This is probably because a comparatively small number backend transforms make use of the MinSize hint. Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of 919 bytes of TEXT reduction. I didn't measure a significant performance impact. Differential Revision: https://reviews.llvm.org/D53518 llvm-svn: 345072
-
Jordan Rupprecht authored
Summary: TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass. Fixes PR37959. The test case here is the simplified .ll for: ``` static int foo; int bar() { foo = 5; return foo; } ``` Reviewers: dblaikie, gbedwell, aprantl Reviewed By: dblaikie Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D53531 llvm-svn: 345046
-
- Oct 22, 2018
-
-
Teresa Johnson authored
Summary: Emit optimization remark on successful hot cold split. Reviewers: sebpop, hiraditya Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53512 llvm-svn: 344938
-
- Oct 21, 2018
-
-
Aditya Kumar authored
Summary: In the new+old pass manager, hot cold splitting was schedule too early. Thanks to Vedant for pointing this out. Reviewers: sebpop, vsk Reviewed By: sebpop, vsk Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D53437 llvm-svn: 344869
-
- Oct 18, 2018
-
-
Chandler Carruth authored
llvm-svn: 344714
-
- Oct 17, 2018
-
-
Teresa Johnson authored
Summary: Previously we could only get the number of imported functions and variables from the backend. This adds stats to the thin link where the importing is decided. Reviewers: wmi Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D53337 llvm-svn: 344658
-
- Oct 16, 2018
-
-
David Bolvansky authored
Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53338 llvm-svn: 344645
-
Lang Hames authored
r344558 added an assignment to a TerminatorInst* from BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an Instruction* rather than a TerminatorInst* since r344504 so this fails to compile. Changing the variable to an Instruction* should get the bots building again. llvm-svn: 344566
-
- Oct 15, 2018
-
-
Sebastian Pop authored
Make the code of blockEndsInUnreachable to match the function blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have added a note to make sure the code of this function will not be modified unless the back-end version is also modified. An early return before outlining has been added to avoid outlining the full function body when the first block in the function is marked cold. The static analysis of cold code has been amended to avoid marking the whole function as cold by back-propagation because the back-propagation would mark blocks with return statements as cold. The patch adds debug statements to help discover these problems. Differential Revision: https://reviews.llvm.org/D52904 llvm-svn: 344558
-
Chandler Carruth authored
by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502
-
- Oct 13, 2018
-
-
David Bolvansky authored
Summary: Fixes PR39177 Reviewers: spatel, jbuening Reviewed By: jbuening Subscribers: jbuening, llvm-commits Differential Revision: https://reviews.llvm.org/D53129 llvm-svn: 344454
-
- Oct 12, 2018
-
-
Eugene Leviant authored
Differential revision: https://reviews.llvm.org/D53139 llvm-svn: 344325
-
- Oct 11, 2018
-
-
Richard Smith authored
This can be used to preserve profiling information across codebase changes that have widespread impact on mangled names, but across which most profiling data should still be usable. For example, when switching from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI, or even from a 32-bit to a 64-bit build. The user can provide a remapping file specifying parts of mangled names that should be treated as equivalent (eg, std::__1 should be treated as equivalent to std::__cxx11), and profile data will be treated as applying to a particular function if its name is equivalent to the name of a function in the profile data under the provided equivalences. See the documentation change for a description of how this is configured. Remapping is supported for both sample-based profiling and instruction profiling. We do not support remapping indirect branch target information, but all other profile data should be remapped appropriately. Support is only added for the new pass manager. If someone wants to also add support for this for the old pass manager, doing so should be straightforward. This is the LLVM side of Clang r344199. Reviewers: davidxl, tejohnson, dlj, erik.pilkington Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51249 llvm-svn: 344200
-
- Oct 10, 2018
-
-
George Burgess IV authored
Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186
-
- Oct 08, 2018
-
-
Xin Tong authored
Summary: If we have a symbol with (linkonce|weak)_odr linkage, we do not want to dead strip it even it is not prevailing. IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF object is prevailing and the ones in the IR objects are not. Stripping them will prevent us from doing optimizations with them. By not dead stripping them, We will convert these symbols to available_externally linkage as a result of non-prevailing and eventually dropping them after inlining. I modified cache-prevailing.ll to use linkonce linkage as it is testing whether cache prevailing bit is effective or not, not we should treat linkonce_odr alive or not Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D52893 llvm-svn: 343970
-
- Oct 03, 2018
-
-
Aditya Kumar authored
Differential Revision: https://reviews.llvm.org/D52704 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: sebpop llvm-svn: 343663
-
Aditya Kumar authored
Modified the testcases to use both pass managers Use single commandline flag for both pass managers. Differential Revision: https://reviews.llvm.org/D52708 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: tejohnson, brzycki llvm-svn: 343662
-
- Oct 01, 2018
-
-
Eric Christopher authored
This reverts commit r342387 as it's showing significant performance regressions in a number of benchmarks. Followed up with the committer and original thread with an example and will get performance numbers before recommitting. llvm-svn: 343522
-
Florian Hahn authored
llvm-svn: 343450
-
- Sep 28, 2018
-
-
whitequark authored
This reverts commit c4baf7c2f06ff5459c4f5998ce980346e72bff97. Broke the bots, and should really be in Transforms/Coroutines instead. llvm-svn: 343337
-
whitequark authored
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: mehdi_amini, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D51642 llvm-svn: 343336
-
Florian Hahn authored
llvm-svn: 343310
-
Florian Hahn authored
This patch turns LoopInterchange into a loop pass. It now only considers top-level loops and tries to move the innermost loop to the optimal position within the loop nest. By only looking at top-level loops, we might miss a few opportunities the function pass would get (e.g. if we have a loop nest of 3 loops, in the function pass we might process loops at level 1 and 2 and move the inner most loop to level 1, and then we process loops at levels 0, 1, 2 and interchange again, because we now have a different inner loop). But I think it would be better to handle such cases by picking the best inner loop from the start and avoid re-visiting the same loops again. The biggest advantage of it being a function pass is that it interacts nicely with the other loop passes. Without this patch, there are some performance regressions on AArch64 with loop interchanging enabled, where no loops were interchanged, but we missed out on some other loop optimizations. It also removes the SimplifyCFG run. We are just changing branches, so the CFG should not be more complicated, besides the additional 'unique' preheaders this pass might create. Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00 Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D51702 llvm-svn: 343308
-