- Nov 04, 2021
-
-
Johannes Doerfert authored
Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D113111
-
Johannes Doerfert authored
Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D113110
-
Johannes Doerfert authored
Minimize the `impl` interface and clean up some uses of mapping functions. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112154
-
- Nov 03, 2021
-
-
Johannes Doerfert authored
Before we had aligned barriers, `__kmpc_barrier_simple_spmd` was OK to use in the custom state machine. Now that SPMD barriers are assumed to be aligned, we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893
-
Johannes Doerfert authored
When we pick thread 0 to initialize state but thread N is going to be the "main thread" in generic mode, we require extra synchronization. Instead, we should pick the main thread to initialize state in generic mode and any thread in SPMD mode. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112874
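A minimal sketch of the selection rule described above, not the runtime's actual code. The helper name `initializesState` and its parameters are hypothetical, and the main thread's ID is passed in because the source does not show how the runtime derives it.

```cpp
#include <cassert>

// Hypothetical helper: decides whether thread `Tid` initializes the shared
// state. `MainTid` is the ID of the generic-mode main thread; it is an input
// here because the source does not say how the runtime computes it.
bool initializesState(unsigned Tid, unsigned MainTid, bool IsSPMD) {
  // SPMD mode: any single thread can initialize. Thread 0 is an arbitrary
  // choice made for this sketch.
  if (IsSPMD)
    return Tid == 0;
  // Generic mode: the main thread initializes, so no extra synchronization
  // between an initializing thread and the main thread is needed.
  return Tid == MainTid;
}
```

With this rule, the thread that writes the initial state in generic mode is the same thread that reads it first.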
-
- Oct 30, 2021
-
-
Shilei Tian authored
The synchronization at the end of a parallel region does not guarantee that all threads have exited the scope. As a result, the assertions right after it might be hit, and the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may not hold either. We can either add a synchronization right after the parallel region or remove the assertions and assumptions. Here we choose the first option, as those assertions and assumptions can help optimizations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112861
-
Kazu Hirata authored
-
- Oct 29, 2021
-
-
Joseph Huber authored
Summary: A previous patch changed the check to use `!expr`, but because this is a macro expansion, the negation applied only to the left side of an expression.
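The pitfall can be reproduced in a few lines. The macro names below are hypothetical and are not the runtime's macros. Parenthesizing the argument is the usual remedy and is assumed here; the source does not say how the patch fixed it.

```cpp
#include <cassert>

// Hypothetical macros for illustration only.
// Unparenthesized: `!` binds to the first operand of the expanded expression.
#define FAILS_UNPARENTHESIZED(expr) (!expr)
// Parenthesized: `!` applies to the whole expression.
#define FAILS_PARENTHESIZED(expr) (!(expr))

// With A = true and B = false, `A && B` is false, so a check on it fails.
// The unparenthesized macro expands to `(!A && B)`, which is false and
// therefore does not report the failure.
bool unparenthesizedReportsFailure(bool A, bool B) {
  return FAILS_UNPARENTHESIZED(A && B);
}

bool parenthesizedReportsFailure(bool A, bool B) {
  return FAILS_PARENTHESIZED(A && B);
}
```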
-
Joseph Huber authored
This patch changes the `assert_assume` function used for internal assumptions in the device runtime to use a more standard formatting for the assumption message. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112842
-
Joseph Huber authored
A common problem is the device running out of global heap memory and crashing due to a nullptr dereference when using the data sharing stack. This explicitly checks that a nullptr was not returned by malloc when debugging field 1 is enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112005
-
Joseph Huber authored
This patch adds support for using function tracing features to track the execution of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0, so there isn't much output when using a generic region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112002
-
- Oct 28, 2021
-
-
Jon Chesterfield authored
Passes the same tests as the current deviceRTL. Includes the cmake change from D111987. CI shows a different set of passes and failures than local runs, so this is committed without the tests enabled by default while that difference is debugged. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
Jon Chesterfield authored
-
Jon Chesterfield authored
More tests failed on CI than failed locally when writing this patch. This reverts commit 33427fdb.
-
Jon Chesterfield authored
-
Jon Chesterfield authored
Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
Johannes Doerfert authored
We do not generate _serialized_parallel calls in device mode, so there is no need for an external API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112145
-
Johannes Doerfert authored
Exiting a data environment resets all values, so it is wrong to adjust them afterwards. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112144
-
Johannes Doerfert authored
We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112153
-
Johannes Doerfert authored
The OpenMP thread ID is not the hardware thread ID if we have nesting. We need to ask the runtime properly to ensure correct results. Note that the loop interface is going to change soon so we do not adjust it now but simply ignore the extra argument. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111950
-
Johannes Doerfert authored
The team size could/should be an ICV but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D111949
-
Johannes Doerfert authored
The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111946
-
- Oct 26, 2021
-
-
Jon Chesterfield authored
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987
-
Ron Lieberman authored
Add support for `ROCR_VISIBLE_DEVICES`, similar in name and purpose to `CUDA_VISIBLE_DEVICES`. Differential Revision: https://reviews.llvm.org/D112503
-
- Oct 25, 2021
-
-
Georgios Rokos authored
offsets as two separate entities to the plugins.
-
Shilei Tian authored
Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D112475
-
- Oct 23, 2021
-
-
Kazu Hirata authored
-
Jon Chesterfield authored
Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225
-
- Oct 21, 2021
-
-
Jon Chesterfield authored
Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983
-
- Oct 19, 2021
-
-
Joseph Huber authored
The plugin currently uses a macro to check if this is a debug build before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds, so the debug kind should always be available. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083
-
Jon Chesterfield authored
Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999
-
- Oct 18, 2021
-
-
Jon Chesterfield authored
Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995
-
- Oct 16, 2021
-
-
Shilei Tian authored
D110279 introduced a bug to the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in parallel region by `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. It is based on the assumption that: - In SPMD mode, parallel level is initialized to 1. - In generic mode, parallel level is initialized to 0. - `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode, 0 otherwise. Because the return value type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there was an implicit cast from `bool` to `int8_t`. We can make sure it is either 0 or 1 since C++14. In D110279, the return value is the result of an `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905
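The failure mode can be sketched in isolation. The function names and the mode bit below are hypothetical; the bit value 2 is chosen to match the "2 in SPMD mode" result mentioned above. Normalizing through `bool` is one way to restore the 0-or-1 contract and is an assumption about the shape of the fix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mode bit; the `and` result is 2 in SPMD mode, as described.
constexpr int8_t ModeSPMDBit = 0x2;

// Buggy shape: returns the raw result of the bitwise and (0 or 2).
int8_t isSPMDRaw(int8_t Mode) { return Mode & ModeSPMDBit; }

// Normalized shape: converting to bool first yields exactly 0 or 1.
int8_t isSPMDNormalized(int8_t Mode) {
  return static_cast<bool>(Mode & ModeSPMDBit);
}

// The nesting check: parallel level starts at 1 in SPMD mode and at 0 in
// generic mode, so a level above the mode value means a nested region.
bool alreadyInParallel(int ParallelLevel, int8_t IsSPMD) {
  return ParallelLevel > IsSPMD;
}
```

In SPMD mode with one nested level (parallel level 2), the raw value compares `2 > 2` and misses the nesting, while the normalized value compares `2 > 1` and detects it.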
-
- Oct 09, 2021
-
-
Ron Lieberman authored
-
Joseph Huber authored
This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function that returns the number of threads. This was missing in the new runtime and was used by the AMDGPU plugin, which prevented it from using the new runtime. This patch also unifies the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475
-
Joseph Huber authored
Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381
-
- Oct 08, 2021
-
-
Joseph Huber authored
This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is `llvm_omp_get_dynamic_shared`. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957
-
Shilei Tian authored
It must be a copy mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407
-
Shilei Tian authored
For NVPTX, `printf` can be used with just a function declaration. For AMDGCN, a function definition is added, but it simply returns. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109728
-
Johannes Doerfert authored
We need to synchronize the threads *before* we destroy the RAII objects that hold the old values and not after to avoid threads executing the parallel region but seeing an inconsistent state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111369
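The ordering can be illustrated with a single-threaded sketch that records events. `ValueRAII`, `barrierStub`, and `runParallelRegion` are hypothetical stand-ins rather than the runtime's types, and the stub only logs where a real team-wide barrier would sit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log used to make the ordering observable in this sketch.
std::vector<std::string> Events;

// Hypothetical stand-in for a team-wide barrier; it only records that it ran.
void barrierStub() { Events.push_back("barrier"); }

// Hypothetical RAII helper: overrides a value and restores the old one when
// the scope ends.
struct ValueRAII {
  int &Ref;
  int Old;
  ValueRAII(int &R, int NewVal) : Ref(R), Old(R) {
    Ref = NewVal;
    Events.push_back("set");
  }
  ~ValueRAII() {
    Ref = Old;
    Events.push_back("restore");
  }
};

// Returns the value seen inside the region.
int runParallelRegion(int &TeamSize) {
  int Observed = 0;
  {
    ValueRAII Guard(TeamSize, 8);
    Observed = TeamSize; // region body sees the overridden value
    barrierStub();       // synchronize while the guard is still alive
  }                      // guard destructor restores the old value here
  return Observed;
}
```

Placing the barrier inside the scope means the restore runs only after every thread has passed the synchronization point, so no thread still executing the region observes the restored value.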
-