- Jan 21, 2022
-
-
Joseph Huber authored
This patch changes the visibility for all construct in the new device RTL to be hidden by default. This is done after the changes introduced in D117806 changed the visibility from being hidden by default for all device compilations. This asserts that the visibility for the device runtime library will be hidden except for the internal environment variable. This is done to aid optimization and linking of the device library. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117807
-
- Jan 20, 2022
-
-
Joseph Huber authored
The OpenMP offloading libraries are built with fixed triples and linked in during compile time. This would cause un-helpful errors if the user passed in the wrong expansion of the triple used for the bitcode library. because we only support these triples for OpenMP offloading we can normalize them to the full verion used in the bitcode library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D117634
-
- Jan 19, 2022
-
-
Joseph Huber authored
After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visiblity that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these varialbes static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526
-
- Jan 18, 2022
-
-
Joseph Huber authored
Reverting to investigate break on AMDGPU. This reverts commit 0203ff19.
-
Joseph Huber authored
After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visiblity that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these varialbes static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526
-
- Jan 17, 2022
-
-
Joseph Huber authored
This patch adds the `cold` attribute to the keepAlive functions in the RTL. This dummy function exists to keep certain RTL calls alive without them being optimized out, but it is never called and can be declared cold. This also helps some erroneous remarks being given on this function because it has weak linkage and cannot be made internal. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D117513
-
- Jan 13, 2022
-
-
Jon Chesterfield authored
Fixes github issues/52910 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117230
-
Joseph Huber authored
This patch adds the `weak` identifier to the openmp device environment variable. The changes introduced in https://reviews.llvm.org/D117211 result in multiply defined symbols. Because the symbol is potentially included multiple times for each offloading file we will get symbol colisions, and because it needs to have external visiblity it should be weak. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117231
-
Jon Chesterfield authored
D97446 changed the behaviour of 'used'. Compensate. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D117211
-
- Dec 27, 2021
-
-
Joseph Huber authored
This patch changes the default aligntment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increaesd. This reduces the amount of space availible in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime. Depends on D115888 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115971
-
- Dec 09, 2021
-
-
Joseph Huber authored
The problem with the old scheme is that we would need to keep track of the "next region" and reset the num_threads value after it. The new RT doesn't do it and an assertion is triggered. The old RT doesn't do it either, I haven't tested it but I assume a num_threads clause might impact multiple parallel regions "accidentally". Further, in SPMD mode num_threads was simply ignored, for some reason beyond me. In any case, parallel_51 is designed to take the clause value directly, so let's do that instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D113623
-
- Nov 30, 2021
-
-
Jon Chesterfield authored
-
- Nov 16, 2021
-
-
Joseph Huber authored
The RAII class used for debugging RTL entry used a shared variable to keep track of the current depth. This used a global initializer, which isn't supported on AMDGPU. This patch removes the initializer and instead sets it to zero when the state is initialized in the runtime. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D113963
-
- Nov 12, 2021
-
-
Joel E. Denny authored
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602
-
- Nov 10, 2021
-
-
Jon Chesterfield authored
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char*, void* arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680
-
- Nov 09, 2021
-
-
Atmn Patel authored
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421
-
Atmn Patel authored
This reverts commit 81a7cad2.
-
Atmn Patel authored
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421
-
- Nov 08, 2021
-
-
Jon Chesterfield authored
This reverts commit db81d8f6.
-
Jon Chesterfield authored
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char*, void* arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. The exact set of changes to check-openmp probably needs revision before commit Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680
-
- Nov 04, 2021
-
-
Johannes Doerfert authored
Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D113111
-
Johannes Doerfert authored
Minimize the `impl` interface and clean up some uses of mapping functions. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112154
-
- Nov 03, 2021
-
-
Johannes Doerfert authored
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893
-
Johannes Doerfert authored
When we pick state 0 to initialize state but thread N is going to be the "main thread", in generic mode, we would require extra synchronization. Instead, we should pick the main thread to initialize state in generic mode and any thread in SPMD mode. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112874
-
- Oct 30, 2021
-
-
Shilei Tian authored
The synchronization at the end of parallel region cannot make sure all threads exit the scope. As a result, the assertions right after it might be hit, and further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may not hold as well. We either add a synchronization right after the parallel region, or remove the assertions and assuptions. Here we choose the first one as those assertions and assumptions can help optimizations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112861
-
- Oct 29, 2021
-
-
Joseph Huber authored
Summary: A previous patch changed the check and mistakenly only did `!expr` when this is a macro expansion and could only apply to the left side of an expression.
-
Joseph Huber authored
This patch changes the `assert_assume` function used for internal assumptions in the device runtime to use a more standard formatting for the assumption message. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112842
-
Joseph Huber authored
A common problem is the device running out of global heap memory and crashing due to a nullptr dereference when using the data sharing stack. This explicitly checks that a nullptr was not returned by malloc when debugging field 1 is enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112005
-
Joseph Huber authored
This patch adds support for using function tracing features to track the executino of runtime functions in the device runtime library. This is enabled by first compiling the new runtime with `-fopenmp-target-debug=3` and running with `LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and thread 0 so there isn't much output when using a generic region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112002
-
- Oct 28, 2021
-
-
Jon Chesterfield authored
Passes same tests as the current deviceRTL. Includes cmake change from D111987. CI is showing a different set of pass/fails to local, committing this without the tests enabled by default while debugging that difference. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
Jon Chesterfield authored
-
Jon Chesterfield authored
- more tests failing on CI than failed locally when writing this patch This reverts commit 33427fdb.
-
Jon Chesterfield authored
-
Jon Chesterfield authored
Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
Johannes Doerfert authored
We do not generate _serialized_parallel calls in device mode, no need for an external API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112145
-
Johannes Doerfert authored
Exiting a data environment will reset all values, it is wrong to adjust them afterwards. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112144
-
Johannes Doerfert authored
We will later use the fact that a barrier is aligned to reason about thread divergence. For now we introduce the assumption and some more documentation. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112153
-
Johannes Doerfert authored
The OpenMP thread ID is not the hardware thread ID if we have nesting. We need to ask the runtime properly to ensure correct results. Note that the loop interface is going to change soon so we do not adjust it now but simply ignore the extra argument. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111950
-
Johannes Doerfert authored
The team size could/should be an ICV but since we know it is either 1 or a value we can leave it in the team state for now. However, we still need to determine if the current level is nested before we use it. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D111949
-
Johannes Doerfert authored
The first thread state in the new GPU runtime doesn't have a previous one and we should not dereference the nullptr placeholder. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111946
-