- Jan 26, 2017
-
-
Sean Fertile authored
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store instructions. 2) Updated the flags on a number of intrinsics indicating that they write memory. 3) Added SDNPMemOperand flags for some target dependent SDNodes so that they propagate their memory operand Review: https://reviews.llvm.org/D28818 llvm-svn: 293200
-
Stanislav Mekhanoshin authored
This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
-
Nirav Dave authored
This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188
-
Serge Rogatch authored
Summary: This patch provides more staging for tail calls in XRay Arm32 . When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2. Coupled patch: - https://reviews.llvm.org/D28674 Reviewers: dberris, rengolin Reviewed By: dberris Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris Differential Revision: https://reviews.llvm.org/D28673 llvm-svn: 293185
-
Nirav Dave authored
* Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184
-
Rafael Espindola authored
And teach shouldAssumeDSOLocal that ppc has no copy relocations. The resulting code handle a few more case than before. For example, it knows that a weak symbol can be resolved to another .o file, but it will still be in the main executable. llvm-svn: 293180
-
Simon Pilgrim authored
llvm-svn: 293178
-
Simon Pilgrim authored
Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit. llvm-svn: 293175
-
Valery Pykhtin authored
Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171
-
Simon Dardis authored
This reverts commit r293164. There are multiple tests failing. llvm-svn: 293170
-
Simon Dardis authored
This patch makes one change to GOT handling and two changes to N64's relocation model handling. Furthermore, the jumptable encodings have been corrected for static N64. Big GOT handling is now done via a new SDNode MipsGotHi - this node is unconditionally lowered to an lui instruction. The first change to N64's relocation handling is the lifting of the restriction that N64 always uses PIC. Now it is possible to target static environments. The second change adds support for 64 bit symbols and enables them by default. Previously N64 had patterns for sym32 mode only. In this mode all symbols are assumed to have 32 bit addresses. sym32 mode support is selectable with attribute 'sym32'. A follow on patch for clang will add the necessary frontend parameter. This partially resolves PR/23485. Thanks to Brooks Davis for reporting the issue! Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D23652 llvm-svn: 293164
-
Diana Picus authored
Add support for loading i1, i8 and i16 arguments from the stack, with or without the ABI extension flags. When the ABI extension flags are present, we load a 4-byte value, otherwise we preserve the size of the load and let the instruction selector replace it with a LDRB/LDRH. This generates the same thing as DAGISel. Differential Revision: https://reviews.llvm.org/D27803 llvm-svn: 293163
-
Craig Topper authored
[AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it. llvm-svn: 293157
-
Craig Topper authored
[X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand. llvm-svn: 293156
-
Jonas Paulsson authored
Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
-
Matt Arsenault authored
llvm-svn: 293127
-
- Jan 25, 2017
-
-
Daniel Jasper authored
This reverts commit r292680. It is causing significantly worse performance and test timeouts in our internal builds. I have already routed reproduction instructions your way. llvm-svn: 293092
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D29115 llvm-svn: 293083
-
Matt Arsenault authored
According to the documentation this is supposed to be -1 if indirect calls are not supported. llvm-svn: 293081
-
Serge Rogatch authored
Summary: This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch. Related: https://reviews.llvm.org/D28948 (compiler-rt) Reviewers: dberris, rengolin Reviewed By: dberris Subscribers: llvm-commits, iid_iunknown, aemerson Differential Revision: https://reviews.llvm.org/D28947 llvm-svn: 293080
-
Krzysztof Parzyszek authored
llvm-svn: 293077
-
Matthias Braun authored
Change getReservedRegs() to not mark a register as reserved and then revert that decision in some cases. Motivated by the discussion in https://reviews.llvm.org/D29056 llvm-svn: 293073
-
Chad Rosier authored
llvm-svn: 293063
-
Martin Bohme authored
Summary: Lifetime extension wasn't triggered on the result of BuildMI because the reference was non-const. However, instead of adding a const, I've removed the reference entirely as RVO should kick in anyway. Reviewers: rovka, bkramer Reviewed By: bkramer Subscribers: aemerson, rengolin, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D29124 llvm-svn: 293059
-
Mohammed Agabaria authored
Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040
-
Diana Picus authored
Add support for: * i1 add * i1 function arguments, if passed through registers * i1 returns, with ABI signext/zeroext Differential Revision: https://reviews.llvm.org/D27706 llvm-svn: 293035
-
Diana Picus authored
At the moment, this means supporting the signext/zeroext attribute on the return type of the function. For function arguments, signext/zeroext should be handled by the caller, so there's nothing for us to do until we start lowering calls. Note that this does not include support for other extensions (i8 to i16), those will be added later. Differential Revision: https://reviews.llvm.org/D27705 llvm-svn: 293034
-
Coby Tayree authored
Enable the next form (intel style): "mov <reg64>, <largeImm>" which is should be available, where <largeImm> stands for immediates which exceed the range of a singed 32bit integer Differential Revision: https://reviews.llvm.org/D28988 llvm-svn: 293030
-
Diana Picus authored
Thumb is not supported yet, so bail out early. llvm-svn: 293029
-
Matt Arsenault authored
llvm-svn: 293028
-
Matt Arsenault authored
clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024
-
Matt Arsenault authored
Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016
-
Ahmed Bougacha authored
This surprisingly isn't NFC because there are patterns to select GPR sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to SUB if NZCV is dead. From ISel's perspective, both are fine. llvm-svn: 293010
-
Tom Stellard authored
Summary: This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1]. Patch By: Dave Airlie Reviewers: nhaehnle, arsenm, tstellarAMD Reviewed By: arsenm Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D25428 llvm-svn: 293000
-
Eugene Zelenko authored
llvm-svn: 292996
-
Eugene Zelenko authored
llvm-svn: 292988
-
- Jan 24, 2017
-
-
Matt Arsenault authored
The sequence like this: v_cmpx_le_f32_e32 vcc, 0, v0 s_branch BB0_30 s_cbranch_execnz BB0_30 ; BB#29: exp null off, off, off, off done vm s_endpgm BB0_30: ; %endif110 is likely wrong. The s_branch instruction will unconditionally jump to BB0_30 and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and let skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019 Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 292985
-
Eugene Zelenko authored
llvm-svn: 292983
-
Matt Arsenault authored
This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
-
Changpeng Fang authored
Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966
-