- Oct 19, 2016
-
-
Dehao Chen authored
llvm-svn: 284544
-
- Oct 18, 2016
-
-
Dehao Chen authored
Summary: The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass. The performance impact on speccpu2006 on Intel sandybridge machines: spec/2006/fp/C++/444.namd 25.3 +0.26% spec/2006/fp/C++/447.dealII 45.96 -0.10% spec/2006/fp/C++/450.soplex 41.97 +1.49% spec/2006/fp/C++/453.povray 36.83 -0.96% spec/2006/fp/C/433.milc 23.81 +0.32% spec/2006/fp/C/470.lbm 41.17 +0.34% spec/2006/fp/C/482.sphinx3 48.13 +0.69% spec/2006/int/C++/471.omnetpp 22.45 +3.25% spec/2006/int/C++/473.astar 21.35 -2.06% spec/2006/int/C++/483.xalancbmk 36.02 -2.39% spec/2006/int/C/400.perlbench 33.7 -0.17% spec/2006/int/C/401.bzip2 22.9 +0.52% spec/2006/int/C/403.gcc 32.42 -0.54% spec/2006/int/C/429.mcf 39.59 +0.19% spec/2006/int/C/445.gobmk 26.98 -0.00% spec/2006/int/C/456.hmmer 24.52 -0.18% spec/2006/int/C/458.sjeng 28.26 +0.02% spec/2006/int/C/462.libquantum 55.44 +3.74% spec/2006/int/C/464.h264ref 46.67 -0.39% geometric mean +0.20% Manually checked 473 and 471 to verify the diff is in the noise range. Reviewers: rengolin, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24818 llvm-svn: 284541
-
- Aug 25, 2016
-
-
Eugene Zelenko authored
Differential revision: https://reviews.llvm.org/D23861 llvm-svn: 279695
-
- Jul 15, 2016
-
-
Jacques Pienaar authored
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564
-
- Jul 01, 2016
-
-
Duncan P. N. Exon Smith authored
Use MachineInstr& instead of MachineInstr* in MachineSinker to help avoid implicit conversions from iterator to pointer. llvm-svn: 274303
-
- Jun 30, 2016
-
-
Duncan P. N. Exon Smith authored
This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
-
- Apr 23, 2016
-
-
Andrew Kaylor authored
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231
-
- Apr 22, 2016
-
-
Vedant Kumar authored
This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115
-
- Apr 21, 2016
-
-
Quentin Colombet authored
splitting edges. MachineBasicBlock::SplitCriticalEdges will crash if a nullptr would have been passed for the Pass argument. Do not allow that by turning this argument into a reference. The alternative would have been to make the Pass a truly optional argument, but although this is easy to do, I was afraid users using it like this would not be aware the livness information, dominator tree and such would silently be broken. llvm-svn: 267052
-
Andrew Kaylor authored
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations. The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used. The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way. Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267022
-
- Mar 30, 2016
-
-
Fiona Glaser authored
Some targets may disagree on what they want sunk or not sunk, so make this a target hook instead of hardcoded. llvm-svn: 264799
-
- Mar 09, 2016
-
-
Chad Rosier authored
http://reviews.llvm.org/D17967 llvm-svn: 263021
-
- Feb 18, 2016
-
-
Richard Trieu authored
Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270
-
- Jan 20, 2016
-
-
Sanjoy Das authored
Summary: This teaches MachineSink to not sink instructions that might break the implicit null check optimization that runs later. This should not affect frontends that do not use implicit null checks. Reviewers: aadg, reames, hfinkel, atrick Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D14632 llvm-svn: 258254
-
- Oct 09, 2015
-
-
Owen Anderson authored
This covers the common case of operations that cannot be sunk. Operations that cannot be hoisted should already be handled properly via the safe-to-speculate rules and mechanisms. llvm-svn: 249865
-
- Sep 09, 2015
-
-
Chandler Carruth authored
with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are *within* a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167
-
- Aug 28, 2015
-
-
Reid Kleckner authored
We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues. llvm-svn: 246235
-
- Jun 16, 2015
-
-
Arnaud A. de Grandmaison authored
The successors cache is now a local variable, making it more visible that it is only valid for the MBB being processed. llvm-svn: 239807
-
- Jun 15, 2015
-
-
Arnaud A. de Grandmaison authored
This patch fixes a compilation time issue, when MachineSink faces PHIs with a huge number of operands. This can happen for example in goto table based interpreters, where some basic blocks can have several of those PHIs, each one with several hundreds operands. MachineSink was spending a significant time re-building and re-sorting the list of successors of the current MachineBasicBlock. The computing and sorting of the current MachineBasicBlock successors is now cached. llvm-svn: 239720
-
- Jun 01, 2015
-
-
Owen Anderson authored
restricted. No test because no in-tree target currently has convergent MachineInstr's. llvm-svn: 238763
-
- May 19, 2015
-
-
Matthias Braun authored
llvm-svn: 237726
-
- May 16, 2015
-
-
Matthias Braun authored
Currently whenever we sink any instruction, we do clearKillFlags for every use of every use operand for that instruction, apparently there are a lot of duplication, therefore compile time penalties. This patch collect all the interested registers first, do clearKillFlags for it all together at once at the end, so we only need to do clearKillFlags once for one register, duplication is avoided. Patch by Lawrence Hu! Differential Revision: http://reviews.llvm.org/D9719 llvm-svn: 237510
-
- May 08, 2015
-
-
Pete Cooper authored
The test here was sinking the AND here to a lower BB: %vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8 TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8 which meant that vreg8 was read after it was killed. This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND. llvm-svn: 236886
-
Pete Cooper authored
llvm-svn: 236885
-
- Dec 04, 2014
-
-
Patrik Hagglund authored
According to a previous FIXME comment we now not only look at MBB successors, but also handle code sinking past them: x = computation if () {} else {} use x The instruction could be sunk over the whole diamond for the if/then/else (or loop, etc), allowing it to be sunk into other blocks after that. Modified test added in r204522, due to one spill less present. Minor fixes in comments. Patch provided by Jonas Paulsson. Reviewed by Hal Finkel. llvm-svn: 223350
-
- Nov 19, 2014
-
-
David Blaikie authored
This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
-
- Oct 15, 2014
-
-
Jingyue Wu authored
Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source. Thanks Alexey Volkov for reporting PR21115! Test Plan: Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86. Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread. Reviewers: Jiangning, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5633 llvm-svn: 219773
-
- Oct 14, 2014
-
-
Eric Christopher authored
cached subtarget and not the TargetMachine. llvm-svn: 219668
-
- Oct 01, 2014
-
-
Jingyue Wu authored
Reported by Alexey Volkov in PR21115 llvm-svn: 218771
-
- Sep 26, 2014
-
-
Bruno Cardoso Lopes authored
Machine Sink uses loop depth information to select between successors BBs to sink machine instructions into, where BBs within smaller loop depths are preferable. This patch adds support for choosing between successors by using profile information from BlockFrequencyInfo instead, whenever the information is available. Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5% execution speedup in average on x86-64 darwin. <rdar://problem/18021659> llvm-svn: 218472
-
- Sep 09, 2014
-
-
Patrik Hagglund authored
This solves the problem of having a kill flag inside a loop with a definition of the register prior to the loop: %vreg368<def> ... Inside loop: %vreg520<def> = COPY %vreg368 %vreg568<def,tied1> = add %vreg341<tied0>, %vreg520<kill> => was coalesced into => %vreg568<def,tied1> = add %vreg341<tied0>, %vreg368<kill> MachineVerifier then complained: *** Bad machine code: Virtual register killed in block, but needed live out. *** The kill flag for %vreg368 is incorrect, and is cleared by this patch. This is similar to the clearing done at the end of MachineSinking::SinkInstruction(). Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet and Juergen Ributzka. llvm-svn: 217427
-
- Sep 04, 2014
-
-
Juergen Ributzka authored
This reverts commit r216803, because it might have broken the buildbot. The issue is tracked in PR20842. llvm-svn: 217120
-
- Sep 01, 2014
-
-
Jingyue Wu authored
Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. Test Plan: Added a NVPTX codegen test to verify that our change is in effect. It also shows the unnecessary register pressure caused by over-sinking. Updated affected tests in AArch64 and X86. Reviewers: eliben, meheff, Jiangning Reviewed By: Jiangning Subscribers: jholewinski, aemerson, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4814 llvm-svn: 216862
-
- Aug 30, 2014
-
-
Juergen Ributzka authored
When sinking an instruction it might be moved past the original last use of one of its operands. This last use has the kill flag set and the verifier will obviously complain about this. Before Machine Sinking (AArch64): %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... After Machine Sinking: %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> This fix clears all the kill flags in all instruction that use the same operands as the instruction that is being sunk. This fixes rdar://problem/18180996. llvm-svn: 216803
-
- Aug 12, 2014
-
-
Quentin Colombet authored
as long as possible. ** Context ** Each time the dominance information is modified, the dominator tree analysis switches in a slow query mode. After a few queries without any modification on the dominator tree, it performs an expensive update of its internal structure to provide fast queries again. ** Problem ** Prior to this patch, the MachineSink pass was splitting the critical edges on demand while relying heavy on the dominator tree information. In some cases, this leads to pathological behavior where: - We end up in the slow query mode right after splitting an edge. - We update the dominance information. - We break the dominance information again, thus ending up in the slow query mode and so on. ** Proposed Solution ** To mitigate this effect, this patch postpones all the splitting of the edges at the end of each iteration of the main loop. The benefits are: - The dominance information is valid for the life time of an iteration. - This simplifies the code as we do not have to special treat instructions that are sunk on critical edges. Indeed, the related block will be available through the next iteration. The downside is that when edges splitting is required, this incurs an additional iteration of the main loop compared to the previous scheme. ** Performance ** Thanks to this patch, the motivating example compiles in 6+ minutes instead of 10+ minutes. No test case added as the motivating example as nothing special but being huge! I have measured only noise for both the compile time and the runtime on the llvm test-suite + SPECs with Os and O3. Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also uses the dominance information and therefore, hits this problem. A subsequent patch will address that. <rdar://problem/17894619> llvm-svn: 215410
-
- Aug 04, 2014
-
-
Eric Christopher authored
information and update all callers. No functional change. llvm-svn: 214781
-
- Jul 29, 2014
-
-
Jiangning Liu authored
llvm-svn: 214158
-
- Apr 22, 2014
-
-
Chandler Carruth authored
define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
-
- Apr 14, 2014
-
-
Craig Topper authored
[C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. llvm-svn: 206142
-
- Mar 31, 2014
-
-
Paul Robinson authored
pass normally runs at optimization level None, or is part of the register allocation pipeline. llvm-svn: 205228
-