- Nov 18, 2016
-
-
Tom Stellard authored
Summary: The 32-bit instructions don't zero the high 16 bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342
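As a rough illustration of the summary above, the following Python sketch models a 16-bit add carried out in a 32-bit register. The helpers `add_u32` and `add_u16` are hypothetical names that only mimic the described behaviour; they are not AMDGPU instructions or LLVM APIs.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

def add_u32(a, b):
    """Model a 32-bit add: any carry out of bit 15 stays in bits 16..31."""
    return (a + b) & MASK32

def add_u16(a, b):
    """Model a 16-bit add that zeroes the high 16 bits of the result."""
    return (a + b) & MASK16

a, b = 0xFFFF, 0x0001
wide = add_u32(a, b)       # high half is now non-zero
narrow = add_u16(a, b)     # high half is zero by construction
zext_wide = wide & MASK16  # explicit zero-extension is still required
```

In this model the 32-bit result only matches the 16-bit one after an explicit mask, which is why a zero-extend cannot be assumed to come for free with the 32-bit form.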
-
Nicolai Haehnle authored
Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339
-
Matt Arsenault authored
There are still crashes on non-MVT types in other places. llvm-svn: 287310
-
- Nov 17, 2016
-
-
Konstantin Zhuravlyov authored
This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233
-
Konstantin Zhuravlyov authored
llvm-svn: 287204
-
Konstantin Zhuravlyov authored
llvm-svn: 287201
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199
-
- Nov 16, 2016
-
-
Matt Arsenault authored
This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146
-
Tom Stellard authored
Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers holding immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131
-
Konstantin Zhuravlyov authored
- Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074
-
Jan Vesely authored
wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056
-
- Nov 15, 2016
-
-
Tom Stellard authored
Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035
-
Matt Arsenault authored
Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021
-
Matt Arsenault authored
Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018
-
Stanislav Mekhanoshin authored
The wave barrier is a discardable barrier. Its main purpose is to carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave reach a convergence point simultaneously under SIMT, so no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007
-
Matt Arsenault authored
llvm-svn: 286931
-
Matt Arsenault authored
llvm-svn: 286912
-
- Nov 14, 2016
-
-
Changpeng Fang authored
Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode when the data type is V2F32 but only one channel is active, even though such a case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860
-
- Nov 13, 2016
-
-
Matt Arsenault authored
This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
-
- Nov 12, 2016
-
-
Tom Stellard authored
Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687
-
Tom Stellard authored
Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction, the defining instruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676
-
- Nov 11, 2016
-
-
Matthias Braun authored
addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for call/barrier instructions and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero, to avoid problems with the fused nodes now lingering in the pending list. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544
-
Stanislav Mekhanoshin authored
This reverts commit r286171; it breaks the piglit test fs-discard-exit-2. llvm-svn: 286530
-
- Nov 10, 2016
-
-
Yaxun Liu authored
Currently runtime metadata is emitted as an ELF section named .AMDGPU.runtime_metadata. However, there is a standard way to convey vendor-specific information about how to run an ELF binary, called a vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets the AMDGPU backend emit runtime metadata as a note element in the .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502
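For readers unfamiliar with note elements, here is a minimal Python sketch of the generic ELF note record layout: three 32-bit words (name size, descriptor size, type) followed by the name and the descriptor, each padded to a 4-byte boundary. The vendor name "AMD", the type value 8, the payload bytes, and the little-endian encoding are assumptions made for illustration only; they are not taken from this patch.

```python
import struct

def pad4(data: bytes) -> bytes:
    """Pad with zero bytes up to the next 4-byte boundary."""
    return data + b"\x00" * (-len(data) % 4)

def build_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Pack one note record: namesz, descsz, type, then padded name and desc."""
    name_bytes = name.encode("ascii") + b"\x00"  # namesz counts the terminator
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    return header + pad4(name_bytes) + pad4(desc)

def parse_note(blob: bytes):
    """Unpack a single note record produced by build_note."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name = blob[12:12 + namesz - 1].decode("ascii")
    desc_start = 12 + namesz + (-namesz % 4)
    return name, note_type, blob[desc_start:desc_start + descsz]

# Hypothetical vendor name, type, and payload.
note = build_note("AMD", 8, b"hello")
```

A consumer can walk the notes in the section, match on the vendor name and type, and ignore records it does not recognize.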
-
Tom Stellard authored
Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
-
- Nov 08, 2016
-
-
Stanislav Mekhanoshin authored
Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed the BE to report that we have many condition registers. That way the IR LICM pass will hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done, a condition is calculated in one block and used in another. The current behavior is to store a workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue, forward propagation of a v_cmp 64-bit result to a user is implemented. An additional side effect is that we may consume fewer VGPRs at the cost of more SGPRs if multiple conditions need to be held, and that is a clear win in most cases. llvm-svn: 286171
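The store-and-restore round trip described above can be sketched with a toy Python model of a 64-lane wave. The functions `v_cndmask` and `v_cmp_ne_zero` are simplified stand-ins named after the instructions and are not the hardware semantics; in particular, the model assumes all lanes are active and ignores the exec mask, which is a significant simplification.

```python
WAVE_SIZE = 64

def v_cndmask(mask, if_false, if_true):
    """Per lane, pick if_true when the lane's mask bit is set, else if_false."""
    return [if_true if (mask >> lane) & 1 else if_false
            for lane in range(WAVE_SIZE)]

def v_cmp_ne_zero(vgpr):
    """Build a 64-bit lane mask with bit i set when lane i holds a non-zero value."""
    mask = 0
    for lane, value in enumerate(vgpr):
        if value != 0:
            mask |= 1 << lane
    return mask

cond = 0xF0F0_0000_0000_00FF      # condition computed in one block
stored = v_cndmask(cond, 0, 1)    # materialized per lane in a "VGPR"
restored = v_cmp_ne_zero(stored)  # re-derived in the using block
```

Under these assumptions `restored` equals `cond`, so the user could read the original 64-bit result directly instead of going through the per-lane copy.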
-
- Nov 07, 2016
-
-
Matt Arsenault authored
The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134
-
- Nov 04, 2016
-
-
Tom Stellard authored
This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
-
- Nov 03, 2016
-
-
Tom Stellard authored
Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
-
Alexander Timofeev authored
This change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm stopped as soon as any memory access was encountered between loads that were expected to be merged together, although a Read-After-Read conflict cannot affect execution correctness. Improves the performance of manually loop-unrolled hcBLAS CGEMM kernels by 44%. An improvement is also expected on any massive sequence of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919
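The difference between the old and new search can be sketched with a toy Python model. This is not the code of the actual pass: the `Op` record, the `find_merge_partner` helper, and the "offsets differ by one element" merge criterion are all hypothetical simplifications chosen for illustration.

```python
from collections import namedtuple

# kind is "load" or "store"; offset is an element index into LDS.
Op = namedtuple("Op", ["kind", "offset"])

def find_merge_partner(ops, start, allow_reads_between):
    """Return the index of a later load adjacent to ops[start], or None."""
    first = ops[start]
    for i in range(start + 1, len(ops)):
        op = ops[i]
        if op.kind == "load" and abs(op.offset - first.offset) == 1:
            return i
        if op.kind == "store":
            return None  # a write in between may conflict, so give up
        if not allow_reads_between:
            return None  # old behaviour: any memory access ends the search
    return None

reads_only = [Op("load", 0), Op("load", 8), Op("load", 1)]
with_store = [Op("load", 0), Op("store", 8), Op("load", 1)]
```

With `allow_reads_between=True` the search steps over the unrelated load in `reads_only` and pairs indices 0 and 2, while an intervening store still ends the search in both modes.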
-
- Nov 02, 2016
-
-
Matt Arsenault authored
Some of these are already fixed or tested somewhere else. llvm-svn: 285840
-
Matt Arsenault authored
llvm-svn: 285828
-
Matt Arsenault authored
This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765
-
- Nov 01, 2016
-
-
Matt Arsenault authored
This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is when inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762
-
Konstantin Zhuravlyov authored
This will prevent the following regressions when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
-
Tom Stellard authored
I wanted to implement this as a target-independent expansion; however, when targets say they want to expand FP_TO_FP16, what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target-independent expansion would be to add logic to each target's TargetLowering construction to mark these nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex, as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704
-
Valery Pykhtin authored
Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684
-
- Oct 29, 2016
-
-
Matt Arsenault authored
I'm guessing at how it is supposed to be printed. llvm-svn: 285490
-
- Oct 28, 2016
-
-
Matt Arsenault authored
Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463
-