- Feb 15, 2017
-
-
Stanislav Mekhanoshin authored
This patch reverts a region's scheduling to the original untouched state if occupancy has decreased. In addition, it switches to using the TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing the limits that were just passed. We are going to stay with the best schedule, so we no longer need to tolerate worsened scheduling. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
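The revert-on-regression idea described above can be sketched in isolation. This is a toy illustration, not the GCN scheduler code: `Schedule`, `scheduleOrRevert`, and the two callbacks are hypothetical names, and occupancy is reduced to a single number computed by a caller-supplied function.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a region's instruction order.
using Schedule = std::vector<int>;

// Try a new schedule for a region, but keep the original order if the
// candidate achieves lower occupancy than the original did.
Schedule scheduleOrRevert(
    const Schedule &Original,
    const std::function<Schedule(const Schedule &)> &Reschedule,
    const std::function<unsigned(const Schedule &)> &Occupancy) {
  Schedule Candidate = Reschedule(Original);
  if (Occupancy(Candidate) < Occupancy(Original))
    return Original; // occupancy dropped: revert to the untouched state
  return Candidate;  // occupancy is the same or better: keep the new order
}
```

Passing the occupancy measure as a callback keeps the sketch independent of how pressure limits are actually obtained.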
-
Stanislav Mekhanoshin authored
This patch corrects the maximum workgroups per CU when workgroups are big (more than 128 workitems). This value contributes to the occupancy calculation with respect to LDS size. Differential Revision: https://reviews.llvm.org/D29974 llvm-svn: 295134
-
- Feb 14, 2017
-
-
Alexander Timofeev authored
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907. llvm-svn: 295054
-
Eugene Zelenko authored
Same changes in files affected by reduced MC headers dependencies. llvm-svn: 295009
-
- Feb 12, 2017
-
-
NAKAMURA Takumi authored
AMDGPU::expandMemIntrinsicUses(): Fix an uninitialized variable. This function returned true or undef. llvm-svn: 294895
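The class of bug fixed here can be shown with a minimal sketch. The function `expandUsesSketch` and its input are hypothetical stand-ins, not the LLVM source: the point is only that the result flag starts from a defined value, so the function returns `false` rather than an indeterminate value when nothing is rewritten.

```cpp
#include <vector>

// Hypothetical pass-style helper that reports whether it changed anything.
// Without the "= false" initializer, the return value would be indeterminate
// whenever the loop never sets Changed.
bool expandUsesSketch(std::vector<int> &Uses) {
  bool Changed = false; // the fix: always start from a defined value
  for (int &U : Uses) {
    if (U < 0) { // pretend that negative entries need expanding
      U = -U;
      Changed = true;
    }
  }
  return Changed;
}
```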
-
- Feb 10, 2017
-
-
Matt Arsenault authored
llvm-svn: 294694
-
Wei Ding authored
Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
-
Stanislav Mekhanoshin authored
This change returns empty PSet list for M0 register. Otherwise its PSet as defined by tablegen is SReg_32. This results in incorrect register pressure calculation every time an instruction uses M0. Such uses count as SReg_32 PSet and inadequately increase pressure on SGPRs. Differential Revision: https://reviews.llvm.org/D29798 llvm-svn: 294691
-
- Feb 09, 2017
-
-
Matt Arsenault authored
llvm-svn: 294635
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D29741 llvm-svn: 294627
-
Daniel Berlin authored
llvm-svn: 294621
-
Daniel Berlin authored
GraphTraits: Add range versions of graph traits functions (graph_nodes, graph_children, inverse_graph_nodes, inverse_graph_children). Summary: Convert all obvious node_begin/node_end and child_begin/child_end pairs to range based for. Sending for review in case someone has a good idea how to make graph_children able to be inferred. It looks like it would require changing GraphTraits to be two argument or something. I presume inference does not happen because it would have to check every GraphTraits in the world to see if the noderef types matched. Note: This change was 3-staged with clang as well, which uses Dominators/etc from LLVM. Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith Subscribers: arsenm, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D29767 llvm-svn: 294620
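To illustrate what a range version of a begin/end traits pair buys, here is a self-contained toy. None of these types are LLVM's: `ToyNode`, `ToyRange`, `ToyGraphTraits`, and `toy_children` are hypothetical names that only mimic the shape of the pattern. The explicit template argument on `toy_children` mirrors the remark above that the traits type is not inferred.

```cpp
#include <vector>

// Toy graph node; NOT an LLVM type.
struct ToyNode {
  int Id;
  std::vector<ToyNode *> Succs;
};

// Minimal iterator-pair wrapper so a begin/end pair works in a range-based for.
template <typename IterT> struct ToyRange {
  IterT B, E;
  IterT begin() const { return B; }
  IterT end() const { return E; }
};

// Toy traits written in the child_begin/child_end style.
struct ToyGraphTraits {
  using ChildIteratorType = std::vector<ToyNode *>::const_iterator;
  static ChildIteratorType child_begin(const ToyNode *N) {
    return N->Succs.begin();
  }
  static ChildIteratorType child_end(const ToyNode *N) {
    return N->Succs.end();
  }
};

// Range version: wraps the traits' begin/end pair into one object.
template <typename Traits>
ToyRange<typename Traits::ChildIteratorType> toy_children(const ToyNode *N) {
  return {Traits::child_begin(N), Traits::child_end(N)};
}

// Example client: a range-based for replaces the explicit iterator loop.
int sumChildIds(const ToyNode *N) {
  int Sum = 0;
  for (const ToyNode *C : toy_children<ToyGraphTraits>(N))
    Sum += C->Id;
  return Sum;
}
```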
-
- Feb 08, 2017
-
-
Stanislav Mekhanoshin authored
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in SIRegisterInfo. This makes the standard converge scheduler behave almost the same as GCNScheduler, sometimes slightly better, sometimes a bit worse. In general it is also possible to switch GCNScheduler to use these callbacks instead of getMaxWaves(), which also makes GCNScheduler slightly better on some tests and slightly worse on others. A big win is the behavior with the converge scheduler. Note that these are used not only by scheduling, but also in places like MachineLICM. Differential Revision: https://reviews.llvm.org/D29700 llvm-svn: 294518
-
Konstantin Zhuravlyov authored
llvm-svn: 294454
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449
-
Konstantin Zhuravlyov authored
llvm-svn: 294445
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D29633 llvm-svn: 294441
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D29318 llvm-svn: 294440
-
Matt Arsenault authored
llvm-svn: 294408
-
- Feb 07, 2017
-
-
Alexander Timofeev authored
lane masks. Differential revision: https://reviews.llvm.org/D29442 llvm-svn: 294324
-
Matt Arsenault authored
llvm-svn: 294281
-
Yaxun Liu authored
For amdgcn target Clang generates addrspacecast to represent null pointers in private and local address spaces. In LLVM codegen, the static variable initializer is lowered by virtual function AsmPrinter::lowerConstant which is target generic. Since addrspacecast is target specific, AsmPrinter::lowerConst This patch overrides AsmPrinter::lowerConstant with AMDGPUAsmPrinter::lowerConstant, which is able to lower the target-specific addrspacecast in the null pointer representation so that -1 is co Differential Revision: https://reviews.llvm.org/D29284 llvm-svn: 294265
-
Stanislav Mekhanoshin authored
There is a typo in the debug output: top and bottom candidates are switched. Differential Revision: https://reviews.llvm.org/D29608 llvm-svn: 294257
-
- Feb 04, 2017
-
-
Eugene Zelenko authored
This is preparation to reduce MCExpr.h dependencies. llvm-svn: 294067
-
- Feb 03, 2017
-
-
Matt Arsenault authored
Use typedef, remove unnecessary enum, line wraps. llvm-svn: 294039
-
Stanislav Mekhanoshin authored
This has quite a positive performance impact according to measurements. Before the previous fixes that limit the optimization, this threshold was too high and blew up compile time and scratch usage, but now this is gone and we can bump the threshold. Differential Revision: https://reviews.llvm.org/D29505 llvm-svn: 294032
-
Matt Arsenault authored
llvm-svn: 294031
-
Matt Arsenault authored
This won't be eliminated, so this will just bloat code if/when these are ever used/supported. llvm-svn: 294030
-
Artem Tamazov authored
Issue occurs when assembling "ds_ordered_count v0, v0 gds". llvm-svn: 294004
-
Stanislav Mekhanoshin authored
Exit loop analysis early if suitable private access found. Do not account for GEPs which are invariant to loop induction variable. Do not account for Allocas which are too big to fit into register file anyway. Add option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 llvm-svn: 293991
-
Matt Arsenault authored
llvm-svn: 293972
-
Matt Arsenault authored
llvm-svn: 293968
-
Matt Arsenault authored
In multi-use cases this can save a few instructions. llvm-svn: 293962
-
- Feb 02, 2017
-
-
Nirav Dave authored
This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
-
Nirav Dave authored
Recommitting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding.
One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893
-
Matt Arsenault authored
The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
-
- Feb 01, 2017
-
-
Stanislav Mekhanoshin authored
Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers have to be adjusted for bigger workgroups. For example, a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each group of 64 workitems is still limited by the same number, but 4 subgroups of 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition, the change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
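The arithmetic in this message can be worked through with a small sketch. The names `wavesPerWorkGroup` and `scaleOccupancy` are hypothetical, the wavefront size of 64 comes from the text above, and the cap at a caller-supplied `MaxWaves` is an added assumption for illustration rather than something the message states.

```cpp
#include <algorithm>
#include <cassert>

// Wavefront size used in the message above (64 workitems per wave).
constexpr unsigned WavefrontSize = 64;

// Number of waves one workgroup occupies, rounded up.
unsigned wavesPerWorkGroup(unsigned WorkGroupSize) {
  assert(WorkGroupSize > 0 && "workgroup must not be empty");
  return (WorkGroupSize + WavefrontSize - 1) / WavefrontSize;
}

// OccupancyFor64 is the occupancy a 64-workitem workgroup would get for a
// given LDS use. Larger workgroups multiply it by their wave count.
// Assumption: the result is capped at MaxWaves, a hypothetical hardware limit.
unsigned scaleOccupancy(unsigned OccupancyFor64, unsigned WorkGroupSize,
                        unsigned MaxWaves) {
  return std::min(OccupancyFor64 * wavesPerWorkGroup(WorkGroupSize), MaxWaves);
}
```

For the 256-workitem case in the message, `wavesPerWorkGroup(256)` is 4, so an occupancy of 2 computed for 64 workitems becomes 8.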
-
Matt Arsenault authored
llvm-svn: 293809
-
Matt Arsenault authored
These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul. nsw/nuw may be needed for some combines to clean up messes when intermediate sext_inregs are introduced later. Tested valid combinations with alive. llvm-svn: 293776
-
Matt Arsenault authored
Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
-