- Oct 26, 2021
Jon Chesterfield authored
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987
-
Ron Lieberman authored
Add support for ROCR_VISIBLE_DEVICES, similar in name and purpose to CUDA_VISIBLE_DEVICES. Differential Revision: https://reviews.llvm.org/D112503
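The following is a minimal sketch of how a comma-separated visible-devices value could be turned into device ordinals, in the spirit of CUDA_VISIBLE_DEVICES. The helper `parseVisibleDevices` is hypothetical and is not the plugin's code. The sketch assumes the variable holds plain non-negative integer ordinals and ignores any other forms the ROCm runtime may accept.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a value such as "0,2,3" into device ordinals.
// An unset (nullptr) or empty value yields an empty vector ("no restriction").
// Tokens that are not plain non-negative integers are skipped in this sketch.
std::vector<int> parseVisibleDevices(const char *Value) {
  std::vector<int> Devices;
  if (Value == nullptr)
    return Devices;
  std::stringstream Stream(Value);
  std::string Token;
  while (std::getline(Stream, Token, ',')) {
    if (Token.empty() ||
        Token.find_first_not_of("0123456789") != std::string::npos)
      continue; // malformed entry
    Devices.push_back(std::atoi(Token.c_str()));
  }
  return Devices;
}
```

A caller would pass `std::getenv("ROCR_VISIBLE_DEVICES")` and treat an empty result as "all devices visible".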
-
- Oct 25, 2021
Georgios Rokos authored
offsets as two separate entities to the plugins.
-
Shilei Tian authored
Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D112475
-
AndreyChurbanov authored
Declarations of the 5.1 atomic entries were added under "#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h, but the definitions of the functions in kmp_atomic.cpp lacked the architecture guard. As a result, mangled symbols were available on non-x86 architectures. The patch eliminates these unexpected symbols from the library. Differential Revision: https://reviews.llvm.org/D112261
-
Vladimir Inđić authored
The __ompt_get_task_info_internal function is adapted to determine thread_num correctly during the execution of multiple nested serialized parallel regions enclosed by a regular parallel region.
Consider a program that contains a parallel region R1 executed by two threads. Let the worker thread T of R1 execute a serialized parallel region R2 that encloses another serialized parallel region R3. Note that T is the master thread of both R2 and R3.
Assume that __ompt_get_task_info_internal is called with the argument "ancestor_level == 1" during the execution of R3. The function should determine the "thread_num" of T inside the team of R2, whose implicit task is at level 1 in the hierarchy of active tasks. Since T is the master thread of R2, "thread_num" is expected to be 0. After the while loop finishes, the following holds: "lwt != NULL", "prev_lwt == NULL", and "prev_team" represents the team information of the innermost serialized parallel region R3. This results in executing the assignment "thread_num = prev_team->t.t_master_tid". Note that "prev_team->t.t_master_tid" was initialized when R2 was created and represents the "thread_num" of T inside R1, which encloses R2. Since T is the worker thread of R1, "thread_num" takes the value 1, which is a contradiction.
This patch proposes to use "lwt" instead of "prev_lwt" when determining "thread_num". If "lwt" exists, the task at the requested level belongs to a serialized parallel region. Since a serialized parallel region is executed by only one thread, "thread_num" takes the value 0.
Similarly, assume that __ompt_get_task_info_internal is called with the argument "ancestor_level == 2" during the execution of R3. The function should determine the "thread_num" of T inside the team of R1. Since T is the worker inside R1, "thread_num" is expected to be 1. After the loop finishes, the following holds: "lwt == NULL", "prev_lwt != NULL", and "prev_team" represents the team information of the innermost serialized parallel region R3. This leads to executing the assignment "thread_num = 0", which is a contradiction. Ignoring "prev_lwt" leads to executing the assignment "thread_num = prev_team->t.t_master_tid" instead, which, as explained above, yields the value 1. The "prev_lwt" variable is therefore unnecessary and has been removed.
This patch introduces a test case that implements the OpenMP program described above. Differential Revision: https://reviews.llvm.org/D110699
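The rule the patch establishes can be illustrated with a deliberately simplified model. This is not the runtime's code. The `Region` struct, its fields, and `threadNumAtLevel` are hypothetical. The model assumes that each region records whether it is serialized and the thread number of its master in the enclosing team, which mirrors the role of `t_master_tid` described above.

```cpp
#include <vector>

// Hypothetical, simplified view of one thread's enclosing regions,
// ordered innermost first (index 0 corresponds to ancestor_level 0).
struct Region {
  bool Serialized; // executed by a single thread
  int MasterTid;   // thread number of this region's master in the parent team
};

// Thread number of the calling thread inside the team of the region at
// AncestorLevel. CurrentTid is its number in the innermost region.
int threadNumAtLevel(const std::vector<Region> &Regions, int AncestorLevel,
                     int CurrentTid) {
  const Region &Target = Regions[AncestorLevel];
  if (Target.Serialized)
    return 0; // a serialized region has exactly one thread
  if (AncestorLevel == 0)
    return CurrentTid;
  // The calling thread is the master of the region nested directly inside
  // Target, so its number in Target's team is that child's MasterTid.
  return Regions[AncestorLevel - 1].MasterTid;
}
```

For the R1/R2/R3 scenario, the list is {R3: serialized, R2: serialized with master 1, R1: regular}. Level 1 gives 0 and level 2 gives 1, which matches the expectations stated in the commit message.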
-
Vladimir Inđić authored
__kmp_fork_call sets the enter_frame of the active task (th_current_task) before a new parallel region begins. After the region finishes, the enter_frame is cleared. The old implementation of __kmpc_fork_call didn't clear the enter_frame of the active task. The enter_frame of the active task was also initialized incorrectly. Consider the following two OpenMP programs.
The first program: Let R1 be a serialized parallel region that encloses another serialized parallel region R2. Assume that the thread executing R2 is going to create a new serialized parallel region R3 by executing __kmpc_fork_call. This thread is responsible for setting the enter_frame of R2's implicit task. Note that at this moment the information about R2's implicit task is present in master_th->th.th_current_task, while lwt represents the information about R1's implicit task. The old implementation used lwt and reset the enter_frame of R1's implicit task instead of R2's implicit task. The new implementation uses master_th->th.th_current_task instead.
The second program: Consider an OpenMP program that contains a parallel region R1 which encloses an explicit task T. Assume that a thread creates another parallel region R2 while executing T. __kmpc_fork_call is responsible for creating R2 and setting the enter_frame of T, whose information is present in master_th->th.th_current_task. The old implementation tried to set the frame of parent_team->t.t_implicit_task_taskdata[tid], which corresponds to the implicit task of R1, instead of T. Differential Revision: https://reviews.llvm.org/D112419
-
Joachim Protze authored
As discussed in D108488, testing invariants of omp_get_wtime would be more reliable than testing the duration of sleep, since the return from sleep might be delayed by system load. Alternatively, or in addition, we could compare the time measured by omp_get_wtime with the time measured by C++11 chrono (for portability?). Differential Revision: https://reviews.llvm.org/D112458
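The following is a minimal sketch of the invariant-based approach. It uses `std::chrono::steady_clock` as a stand-in for `omp_get_wtime` so that it runs without an OpenMP runtime, and the helper names are hypothetical. Only lower bounds are checked: time must not go backwards, and a sleep must last at least as long as requested. No upper bound is asserted, because a loaded system may delay the return from sleep.

```cpp
#include <chrono>
#include <thread>

// Stand-in for omp_get_wtime(): seconds on a monotonic clock.
double wtimeSeconds() {
  using namespace std::chrono;
  return duration<double>(steady_clock::now().time_since_epoch()).count();
}

// Returns true if the timer satisfies the load-independent invariants.
bool checkWtimeInvariants(double SleepSeconds) {
  // Small slack for clock granularity and floating-point rounding.
  constexpr double Tolerance = 1e-6;
  double Start = wtimeSeconds();
  std::this_thread::sleep_for(std::chrono::duration<double>(SleepSeconds));
  double End = wtimeSeconds();
  bool Monotonic = End >= Start;
  // Lower bound only: a sleep may take longer than requested, never shorter.
  bool SleptLongEnough = (End - Start) + Tolerance >= SleepSeconds;
  return Monotonic && SleptLongEnough;
}
```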
-
Joachim Protze authored
The CHECK: line in the test had no effect, because the test does not pipe to FileCheck. Since the test only checks for a single value, encode the result in the return value of the test.
-
Joachim Protze authored
For some tests with target-related functionality, icc 18/19 tries to link libioffload_target.so.5, which fails because of missing COI symbols.
-
Joachim Protze authored
Also mark the test as unsupported by intel-21, because the test does not terminate
-
Joachim Protze authored
Where possible, change the code to declare the variable before the loop. Where that is not possible, explicitly request -std=c99 (this could be limited to specific compilers like icc).
-
Joachim Protze authored
-
- Oct 23, 2021
Kazu Hirata authored
-
Jon Chesterfield authored
Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225
-
- Oct 22, 2021
Vladimir Inđić authored
The KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing task frame support. This patch introduces a fix that properly sets the exit_frame of the innermost implicit task corresponding to the parallel sections construct, as well as the enter_frame of the task that encloses that implicit task. The patch also introduces a simple test case, sections_serialized.c, that contains a serialized parallel sections construct and validates that these task frames are set correctly. Differential Revision: https://reviews.llvm.org/D112205
-
- Oct 21, 2021
AndreyChurbanov authored
-
Jon Chesterfield authored
Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and the amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983
-
- Oct 20, 2021
Nawrin Sultana authored
This patch adds GOMP_alloc and GOMP_free functions of LIBGOMP. Differential revision: https://reviews.llvm.org/D111673
-
- Oct 19, 2021
Joseph Huber authored
The plugin currently uses a macro to check whether this is a debug build before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds, so the debug kind should always be available. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083
-
Jon Chesterfield authored
Subset of D111993. Fix typos, rename read to load. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111999
-
- Oct 18, 2021
AndreyChurbanov authored
This patch allows simplifying the compiler implementation of the "taskwait nowait" construct. "taskwait nowait" is semantically equivalent to an empty task. Instead of creating an empty routine as the task entry, the compiler can simply pass a NULL pointer to the runtime. The runtime then does all the dependence work and returns, because there is no task routine. Differential Revision: https://reviews.llvm.org/D112015
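The runtime-side idea can be sketched as follows. `TaskRoutine`, `DependenceGraph`, and `submitTask` are hypothetical simplifications and are not libomp entry points. Dependence bookkeeping happens in every case, and a null routine means there is no body left to execute, which is the behavior of an empty task.

```cpp
// Hypothetical task entry type, mirroring a "void (*)(void *)" routine.
using TaskRoutine = void (*)(void *);

// Hypothetical stand-in for the runtime's dependence tracking.
struct DependenceGraph {
  int RegisteredTasks = 0;
};

// Registers the task's dependences, then runs the routine if there is one.
// Returns true if a routine was executed.
bool submitTask(DependenceGraph &Graph, TaskRoutine Routine, void *Data) {
  ++Graph.RegisteredTasks; // dependence work is done in every case
  if (Routine == nullptr)
    return false; // "taskwait nowait": nothing left to execute
  Routine(Data);
  return true;
}
```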
-
Jon Chesterfield authored
Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111995
-
@vladaindjic authored
__ompt_get_task_info_internal is now able to determine the right value of the “thread_num” argument during the execution of an explicit task.
During the execution of a while loop that iterates over the hierarchy of ancestor tasks, the “prev_team” variable was always set to the “team” variable at the beginning of each loop iteration. Assume that the program contains a parallel region which encloses an explicit task executed by a worker thread of the region. Also assume that the tool inquires the “thread_num” of the worker thread for the implicit task that corresponds to the region (the task at “ancestor_level == 1”) and expects to receive a value “thread_num > 0”. After the loop finishes, the “team” and “prev_team” variables are equal and point to the team information of the parallel region. “thread_num” is set to “prev_team->t.t_master_tid”, which is equal to “team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since the master thread of the region is the initial master thread of the program. This leads to a contradiction.
To prevent this, the “prev_team” variable is set to the “team” variable only when the loop, having already encountered an implicit task (the “taskdata” variable contains information about an implicit task), continues iterating over that implicit task’s ancestors, if any. After the loop finishes, the “prev_team” variable might be NULL. This means that the task at the requested “ancestor_level” belongs to the innermost parallel region, so “thread_num” is determined by calling “__kmp_get_tid”.
To show that this patch works, the test case “explicit_task_thread_num.c” is provided. It contains the example program described earlier in this summary. Differential Revision: https://reviews.llvm.org/D110473
-
Joachim Protze authored
Older intel compilers miss the privatization of nested loop variables for doacross loops. Declaring the variable in the loop makes the test more robust.
-
Joachim Protze authored
With the Intel 19 compiler, the teams tests fail to link because the compiler tries to link liboffload.
-
- Oct 16, 2021
Shilei Tian authored
D110279 introduced a bug into the device runtime. In `__kmpc_parallel_51`, we detect whether we are already in a parallel region by checking `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`. This check relies on three assumptions: (1) in SPMD mode, the parallel level is initialized to 1; (2) in generic mode, the parallel level is initialized to 0; and (3) `__kmpc_is_spmd_exec_mode` returns `1` in SPMD mode and 0 otherwise. Because the return type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there was an implicit conversion from `bool` to `int8_t`, which guarantees that the value is either 0 or 1. In D110279, the return value became the result of a bitwise `and` operation, which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`. Reviewed By: carlo.bertolli, dpalermo Differential Revision: https://reviews.llvm.org/D111905
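The failure mode can be reproduced in isolation. The code below is a hypothetical stand-in, not the device runtime's implementation. It assumes an execution-mode bit flag whose SPMD bit has the value 2, which is consistent with the commit's statement that the `and` yields 2 in SPMD mode. Returning the masked value directly leaks 2 into the `int8_t` result. Comparing against zero first produces a `bool`, which converts to exactly 0 or 1.

```cpp
#include <cstdint>

// Hypothetical execution-mode flag; the SPMD bit is assumed to be 2.
constexpr std::uint32_t ExecModeSPMD = 2;

// Buggy variant: the result of the bitwise AND (0 or 2) is returned as is.
std::int8_t isSPMDBuggy(std::uint32_t Mode) { return Mode & ExecModeSPMD; }

// Fixed variant: the comparison yields a bool, which converts to 0 or 1.
std::int8_t isSPMDFixed(std::uint32_t Mode) {
  return (Mode & ExecModeSPMD) != 0;
}

// The nesting check described above: the parallel level starts at 1 in
// SPMD mode and at 0 in generic mode.
bool alreadyInParallel(std::uint32_t ParallelLevel, std::int8_t IsSPMD) {
  return ParallelLevel > static_cast<std::uint32_t>(IsSPMD);
}
```

Assume each nested parallel region raises the level by one. Then a nested region in SPMD mode (level 2) is detected by the fixed variant as 2 > 1. With the buggy variant the check becomes 2 > 2, and the nesting is missed.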
-
- Oct 15, 2021
Joachim Protze authored
The execution order of the tasks is not fixed, so there is no ordering for the write accesses. Enforce the ordering that is expected in the check.
-
- Oct 14, 2021
Peyton, Jonathan L authored
Detect the different core types through CPUID.1A and show them to the user through the KMP_AFFINITY=verbose mechanism. Provide __kmp_is_hybrid_cpu() so that future runtime optimizations can tell whether the runtime is running on a hybrid system. Differential Revision: https://reviews.llvm.org/D110435
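The following sketch decodes the core type that CPUID leaf 0x1A reports. According to Intel's documentation, bits 31:24 of EAX hold the core type (0x20 for Intel Atom, 0x40 for Intel Core) and bits 23:0 hold the native model ID. The sketch only decodes a given EAX value and leaves out issuing the CPUID instruction itself. The enum and function names are hypothetical and are not the runtime's.

```cpp
#include <cstdint>

// Hypothetical classification of a logical processor's core type.
enum class CoreType { Unknown, Atom, Core };

// Decode the core type from EAX of CPUID leaf 0x1A (bits 31:24).
CoreType decodeCoreType(std::uint32_t Eax) {
  switch (Eax >> 24) {
  case 0x20:
    return CoreType::Atom;
  case 0x40:
    return CoreType::Core;
  default:
    return CoreType::Unknown;
  }
}

// Native model ID from bits 23:0 of the same register.
std::uint32_t decodeNativeModelId(std::uint32_t Eax) {
  return Eax & 0x00FFFFFFu;
}
```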
-
Peyton, Jonathan L authored
-
Peyton, Jonathan L authored
This patch implements teams affinity on the host. The default is spread. A user can specify spread, close, or primary using the KMP_TEAMS_PROC_BIND environment variable. Unlike OMP_PROC_BIND, KMP_TEAMS_PROC_BIND takes a single value rather than a list of values. The values follow the same semantics that the OpenMP specification defines for parallel regions, except that T is the number of teams in a league instead of the number of threads in a parallel region. Differential Revision: https://reviews.llvm.org/D109921
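To illustrate the three policies, the following simplified sketch maps team i of a league of T teams onto P places, starting from place Start. It covers only the case T <= P, and for spread it assumes that T divides P. The OpenMP specification also defines uneven subpartitions and the T > P case, and both are omitted here. `placeForTeam` is hypothetical and is not the runtime's implementation.

```cpp
#include <stdexcept>

enum class TeamsProcBind { Primary, Close, Spread };

// Place index for team TeamIdx (0-based) in a league of NumTeams teams
// over NumPlaces places, starting from place Start.
int placeForTeam(TeamsProcBind Bind, int TeamIdx, int NumTeams, int NumPlaces,
                 int Start) {
  if (NumTeams < 1 || NumTeams > NumPlaces)
    throw std::invalid_argument("sketch covers 1 <= NumTeams <= NumPlaces");
  switch (Bind) {
  case TeamsProcBind::Primary:
    return Start; // every team shares the starting place
  case TeamsProcBind::Close:
    return (Start + TeamIdx) % NumPlaces; // consecutive places
  case TeamsProcBind::Spread:
    if (NumPlaces % NumTeams != 0)
      throw std::invalid_argument("sketch assumes NumTeams divides NumPlaces");
    // One subpartition of NumPlaces / NumTeams places per team; each team
    // is placed on the first place of its subpartition.
    return (Start + TeamIdx * (NumPlaces / NumTeams)) % NumPlaces;
  }
  return Start; // not reached for valid enum values
}
```

For example, with KMP_TEAMS_PROC_BIND=spread, 2 teams, and 8 places, the sketch puts team 1 on place 4. With close, it puts team 1 on place 1.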
-
- Oct 13, 2021
AndreyChurbanov authored
Added functions that implement "atomic compare". Although clang does not use library interfaces to implement OpenMP atomics, the functions are added for consistency. Also added the missing functions for 80-bit floating-point min/max atomics. Differential Revision: https://reviews.llvm.org/D110109
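For reference, the simplest OpenMP 5.1 "atomic compare" form, `if (x == e) { x = d; }` performed atomically, can be expressed with `std::atomic::compare_exchange_strong`. This only illustrates the semantics and is not one of the library entry points the patch adds. The function name `atomicCompareUpdate` is hypothetical.

```cpp
#include <atomic>

// Atomically perform: if (X == E) { X = D; }. Returns true if X was updated.
// Old receives the value X held before the operation (useful for "capture").
template <typename T>
bool atomicCompareUpdate(std::atomic<T> &X, T E, T D, T &Old) {
  T Expected = E;
  bool Updated = X.compare_exchange_strong(Expected, D);
  // On failure, compare_exchange_strong stores the current value of X in
  // Expected; on success Expected still equals E, which was X's old value.
  Old = Expected;
  return Updated;
}
```

For floating-point types, `compare_exchange_strong` compares object representations instead of using `==`. The two can therefore disagree for values such as `-0.0` and NaN.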
-
AndreyChurbanov authored
Replaced storing the ittnotify domain array index in the location info structure (which is now read-only) with storing (location info address + ittnotify domain + team size) in a hash map. Replaced the __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with the __kmp_itt_barrier_domains hash map, and the __kmp_itt_region_domains and __kmp_itt_region_team_size arrays with the __kmp_itt_region_domains hash map. Basic functionality did not change (or at least the intent was not to change it). The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644. Differential Revision: https://reviews.llvm.org/D111580
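The general technique can be sketched with `std::unordered_map`: per-call-site data lives in a side table keyed by the address of a read-only location structure, instead of an index written into that structure. The `Location` and `RegionDomainInfo` types and the `RegionDomainMap` class are hypothetical. The runtime uses its own hash map and ittnotify domain handles.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical read-only, compiler-emitted location descriptor.
struct Location {
  const char *PSource;
};

// Hypothetical per-call-site data that no longer needs a writable field.
struct RegionDomainInfo {
  int DomainId;
  int TeamSize;
};

class RegionDomainMap {
public:
  // Return the entry for Loc, creating it from the given values if absent.
  RegionDomainInfo getOrCreate(const Location *Loc, int TeamSize,
                               int NewDomainId) {
    std::lock_guard<std::mutex> Guard(Lock);
    auto It = Map.find(Loc);
    if (It == Map.end())
      It = Map.emplace(Loc, RegionDomainInfo{NewDomainId, TeamSize}).first;
    return It->second;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> Guard(Lock);
    return Map.size();
  }

private:
  mutable std::mutex Lock;
  std::unordered_map<const Location *, RegionDomainInfo> Map;
};
```

Keying on the address keeps the location structure untouched, so it can stay in read-only memory.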
-
AndreyChurbanov authored
Replaced a macro with a global variable of the corresponding type. Differential Revision: https://reviews.llvm.org/D111562
-
- Oct 11, 2021
AndreyChurbanov authored
Aligned allocation routines added. Fortran interfaces added for all allocation routines. Differential Revision: https://reviews.llvm.org/D110923
-
- Oct 09, 2021
Ron Lieberman authored
-
Joseph Huber authored
This patch adds support for the `__kmpc_get_hardware_num_threads_in_block` function, which returns the number of threads. The function was missing from the new runtime but is used by the AMDGPU plugin, which prevented the plugin from using the new runtime. This patch also unifies the interface for getting the thread numbers in the frontend. Originally authored by jdoerfert. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D111475
-
Joseph Huber authored
Until we hit the first barrier we should not call `mapping::isSPMDMode` with all threads. Instead, we now have (and use during initialization) a `mapping::isMainThreadInGenericMode` overload that takes the known SPMD-mode state and one that queries it. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D111381
-
- Oct 08, 2021
Joseph Huber authored
This patch adds an external interface to access the dynamic shared memory buffer in the device runtime. The function introduced is ``llvm_omp_get_dynamic_shared``. This includes a host-side definition that only returns a null pointer so that it can be used when host-fallback is enabled without crashing. Support for dynamic shared memory was also ported to the old device runtime. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110957
-
Shilei Tian authored
It must have been a copy-paste mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407
-