- Mar 23, 2017
-
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D30969 llvm-svn: 298558
-
- Mar 22, 2017
-
-
Konstantin Zhuravlyov authored
- These are not required for low level runtime Differential Revision: https://reviews.llvm.org/D29949 llvm-svn: 298556
-
Konstantin Zhuravlyov authored
- Rename runtime metadata -> code object metadata - Make metadata not flow - Switch enums to use ScalarEnumerationTraits - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc - Introduce in-memory representation for attributes - Code object metadata streamer - Create metadata for isa and printf during EmitStartOfAsmFile - Create metadata for kernel during EmitFunctionBodyStart - Finalize and emit metadata to .note during EmitEndOfAsmFile - Other minor improvements/bug fixes Differential Revision: https://reviews.llvm.org/D29948 llvm-svn: 298552
-
- Mar 21, 2017
-
-
Matt Arsenault authored
This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
-
Matthias Braun authored
Fix two problems related to r298025: - SplitKit would create duplicate VNIs in some cases leading to crashs when hoisting copies. - VirtRegMap could fail expanding copies at the beginning of a basic block. This fixes http://llvm.org/PR32353 llvm-svn: 298448
-
Matt Arsenault authored
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444
-
George Burgess IV authored
This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430
-
Marek Olsak authored
Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397
-
Marek Olsak authored
Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396
-
Matt Arsenault authored
Fold these to undef during lowering so users get eliminated. llvm-svn: 298387
-
Matt Arsenault authored
llvm-svn: 298386
-
Matt Arsenault authored
Fixes not eliminating store when intrinsic is lowered to undef. llvm-svn: 298385
-
Valery Pykhtin authored
Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
-
Sam Kolton authored
Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365
-
- Mar 20, 2017
-
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281
-
Konstantin Zhuravlyov authored
This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239
-
- Mar 19, 2017
-
-
Ahmed Bougacha authored
Folding instructions when selecting can cause them to become dead. Don't select these dead instructions (if they don't have other side effects, and don't define physical registers). Preserve existing tests by adding COPYs. In some tests, the G_CONSTANT vregs never get constrained to a class: the only use of the vreg was folded into another instruction, so the G_CONSTANT, now dead, never gets selected. llvm-svn: 298224
-
- Mar 18, 2017
-
-
Stanislav Mekhanoshin authored
This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172
-
- Mar 17, 2017
-
-
Matt Arsenault authored
If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header. llvm-svn: 298121
-
Matthias Braun authored
- This fixes a bug where subregister incompatible with the vregs register class where used. - Implement the case where multiple copies are necessary to cover a given lanemask. Differential Revision: https://reviews.llvm.org/D30438 llvm-svn: 298025
-
- Mar 16, 2017
-
-
Stanislav Mekhanoshin authored
We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958
-
Matt Arsenault authored
llvm-svn: 297913
-
- Mar 15, 2017
-
-
Matt Arsenault authored
llvm-svn: 297903
-
Matt Arsenault authored
computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
-
- Mar 14, 2017
-
-
Nirav Dave authored
Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695
-
- Mar 13, 2017
-
-
Matt Arsenault authored
llvm-svn: 297658
-
- Mar 11, 2017
-
-
Matt Arsenault authored
llvm-svn: 297557
-
Matt Arsenault authored
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556
-
Stanislav Mekhanoshin authored
This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
-
- Mar 09, 2017
-
-
Matt Arsenault authored
llvm-svn: 297354
-
- Mar 08, 2017
-
-
Matt Arsenault authored
If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251
-
Matt Arsenault authored
When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248
-
Changpeng Fang authored
Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 llvm-svn: 297243
-
- Mar 07, 2017
-
-
Konstantin Zhuravlyov authored
It breaks line tables because the patch is not complete, working on a complete one at the moment This reverts commit r294031. llvm-svn: 297118
-
- Mar 06, 2017
-
-
Jan Vesely authored
also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 llvm-svn: 297060
-
- Mar 03, 2017
-
-
Chandler Carruth authored
This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862
-
- Mar 02, 2017
-
-
Tobias Grosser authored
This commit also relied on r296812, which I just reverted. We should probably apply it again, after the r296812 has been discussed and been reapplied in some variant. llvm-svn: 296820
-
Matthias Braun authored
Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the apropriate subregister range resulting in unnecessarily conservative results. llvm-svn: 296722
-
- Mar 01, 2017
-
-
Artur Pilipenko authored
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651
-
Matt Arsenault authored
Modify the test so that it is still testing something closer to what it was intended to originally. I think the original intent was to test the situation where there was a branch on execz and then unconditional branch required relaxing.With the change in r296539, there was no longer and execz branch. Change the test so that there is now an execz branch inserted. There is no longer an unconditional branch after the execz branch, so this might need to be tricked in some other way to keep that there. llvm-svn: 296574
-