Commits · e42e5785ad0c38465d45fe8d292c8e3cf2e260ab · Lorenzo Albano / LLVM bpEVL

Oct 26, 2021

[libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu · e42e5785

Jon Chesterfield authored Oct 26, 2021

Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx.

NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111987

[openmp][lit] Add support to OpenMP lit.cfg for ROCR_VISIBLE_DEVICES env-var · be03ef3e

Ron Lieberman authored Oct 26, 2021

Add support for ROCR_VISIBLE_DEVICES, which is similar in name and purpose
to CUDA_VISIBLE_DEVICES.

Differential Revision: https://reviews.llvm.org/D112503

Oct 25, 2021

[libomptarget][NFC] Add comment explaining why we pass argument bases and offsets as two separate entities to the plugins · 2feafa2e

Georgios Rokos authored Oct 25, 2021

[OpenMP][Offloading] Only get trip count if team construct · 2a30c03c

Shilei Tian authored Oct 25, 2021

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D112475

[OpenMP] libomp: disable definitions of 5.1 atomics for non-x86 arch. · e38a1deb

AndreyChurbanov authored Oct 25, 2021

Declarations of 5.1 atomic entries were added under
"#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h,
but the definitions of the functions in kmp_atomic.cpp lacked the architecture guard.
As a result, mangled symbols were available on non-x86 architectures.
The patch eliminates these unexpected symbols from the library.

Differential Revision: https://reviews.llvm.org/D112261

[OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions · f41d0854

Vladimir Inđić authored Oct 25, 2021

__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.

Consider a program that contains a parallel region R1 executed
by two threads. Let the worker thread T of region R1 execute a serialized
parallel region R2 that encloses another serialized parallel region R3.
Note that thread T is the master thread of both R2 and R3.

Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since thread T is the master thread of
region R2, one should expect that "thread_num" takes the value 0.
After the while loop finishes, the following holds: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of thread T inside
region R1, which encloses R2. Since thread T is a worker thread
of region R1, "thread_num" takes the value 1, which is a contradiction.

This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.

Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since thread T is a worker thread of region R1,
one should expect that "thread_num" takes the value 1. After the loop finishes,
the following holds: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.

Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.

The "prev_lwt" variable is therefore unnecessary and has been removed.

This patch introduces a test case that implements the OpenMP program
described earlier in the summary.
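The rule the patch settles on can be illustrated with a toy model. This is a hypothetical sketch, not libomp code. `region_info`, `thread_num_at_level` and their fields are invented for illustration. The model covers only the cases discussed above:

- The requested level is a serialized region: "thread_num" is 0.
- It is the innermost regular region: the thread's own tid is used.
- It is an outer regular region: the child region's recorded master tid is used (cf. `t_master_tid`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the region hierarchy seen from the innermost task.
 * regions[0] is the innermost region, regions[n - 1] the outermost one. */
typedef struct {
  bool serialized; /* executed by a single thread (an "lwt" in libomp terms) */
  int master_tid;  /* thread_num, in the enclosing team, of the thread that
                      created this region (cf. t_master_tid) */
} region_info;

/* Returns the thread_num of the current thread in the region at
 * ancestor_level, or -1 if that level does not exist. */
static int thread_num_at_level(const region_info *regions, size_t n,
                               size_t ancestor_level, int current_tid) {
  if (ancestor_level >= n)
    return -1;
  /* A serialized region has a team of one thread, so thread_num is 0. */
  if (regions[ancestor_level].serialized)
    return 0;
  /* Innermost regular region: use the thread's own tid. */
  if (ancestor_level == 0)
    return current_tid;
  /* The child region recorded which thread of this team created it. */
  return regions[ancestor_level - 1].master_tid;
}
```

For the program above (R3 and R2 serialized, R2 created by worker 1 of R1), levels 0, 1 and 2 yield 0, 0 and 1.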

Differential Revision: https://reviews.llvm.org/D110699

[OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call · f2410bfb

Vladimir Inđić authored Oct 25, 2021

__kmp_fork_call sets the enter_frame of the active task (th_current_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.

The old implementation of __kmpc_fork_call didn’t clear the enter_frame of the
active task.

Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.

The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that the thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible for setting the enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.

The second program: Consider an OpenMP program that contains a parallel region
R1 which encloses an explicit task T. Assume that the thread creates another
parallel region R2 during the execution of T. __kmpc_fork_call is
responsible for creating R2 and setting the enter_frame of T, whose information
is present inside master_th->th.th_current_task.
The old implementation tried to set the frame of
parent_team->t.t_implicit_task_taskdata[tid], which corresponds to the implicit
task of R1 instead of T.
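The intended set/clear pairing can be sketched as a simplified, hypothetical model. `task`, `thread_state`, `fork_call`, `region_fn` and `observe_frame` are invented names, and only the enter_frame bookkeeping described above is modelled. Two points matter here:

- The frame is recorded on the thread's current task, which may be an explicit task or the implicit task of a nested serialized region.
- The frame is cleared again after the region ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: only the field relevant to this fix. */
typedef struct task {
  void *enter_frame; /* set while this task is forking a nested region */
} task;

typedef struct {
  task *current_task; /* plays the role of master_th->th.th_current_task */
} thread_state;

typedef void (*region_fn)(thread_state *th, void *arg);

/* Hypothetical fork: record the enter frame on the *current* task (not on
 * an lwt or the parent team's implicit task), run the region, then clear it. */
static void fork_call(thread_state *th, void *frame_addr, region_fn body,
                      void *arg) {
  task *encountering = th->current_task;
  encountering->enter_frame = frame_addr;
  body(th, arg);
  encountering->enter_frame = NULL; /* the old __kmpc_fork_call skipped this */
}

/* Example region body: records the frame a tool would see during the region. */
typedef struct {
  task *encountering;
  void *seen_frame;
} observation;

static void observe_frame(thread_state *th, void *arg) {
  (void)th;
  observation *obs = arg;
  obs->seen_frame = obs->encountering->enter_frame;
}
```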

Differential Revision: https://reviews.llvm.org/D112419

[OpenMP][Tests] Test omp_get_wtime for invariants · 73682279

Joachim Protze authored Oct 25, 2021

As discussed in D108488, testing for invariants of omp_get_wtime would be more
reliable than testing for the duration of a sleep, as the return from sleep might be
delayed due to system load.

Alternatively/in addition, we could compare the time measured by omp_get_wtime
to time measured with C++11 chrono (for portability?).
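Invariant-based checks might look like the sketch below. It uses POSIX `clock_gettime(CLOCK_MONOTONIC)` as a stand-in so it builds without an OpenMP runtime; an OpenMP test would call `omp_get_wtime()` instead. The helper names and the two invariants chosen are assumptions, not the contents of D112458:

- Successive readings never decrease.
- A sleep of d seconds measures at least d seconds. Only this lower bound is checked, since the return from sleep may be delayed arbitrarily under load.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for omp_get_wtime(): seconds from a fixed origin. */
static double wtime(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Invariant 1: successive readings never go backwards. */
static bool wtime_is_monotonic(int samples) {
  double prev = wtime();
  for (int i = 0; i < samples; ++i) {
    double now = wtime();
    if (now < prev)
      return false;
    prev = now;
  }
  return true;
}

/* Invariant 2: a sleep of `millis` ms measures at least that long. No upper
 * bound is checked, because the return from sleep may be delayed. */
static bool wtime_lower_bound_holds(long millis) {
  struct timespec req = {millis / 1000, (millis % 1000) * 1000000L};
  double start = wtime();
  /* If interrupted by a signal, nanosleep stores the remaining time in req. */
  while (nanosleep(&req, &req) != 0 && errno == EINTR) {
  }
  double elapsed = wtime() - start;
  return elapsed >= (double)millis / 1000.0 - 1e-6; /* tolerate rounding */
}
```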

Differential Revision: https://reviews.llvm.org/D112458

[OpenMP][Tests][NFC] Actually check for test outcome · 3f229f42

Joachim Protze authored Oct 25, 2021

The CHECK: line in the test had no effect, because the test does not
pipe to FileCheck. Since the test only checks for a single value,
encode the result in the return value of the test.

[OpenMP][Tests][NFC] Mark tests trying to link COI as unsupported · 047890bc

Joachim Protze authored Oct 25, 2021

For some tests with target-related functionality icc 18/19 tries to link
libioffload_target.so.5, which fails because of missing COI symbols.

[OpenMP][Tests][NFC] Replace atomic increment by reduction · d7fdd236

Joachim Protze authored Oct 25, 2021

Also mark the test as unsupported by intel-21, because the test does
not terminate.

[OpenMP][Tools][NFC] Fix C99-style declaration of iteration variables · 38f78dd2

Joachim Protze authored Oct 25, 2021

Where possible, declare the variable before the loop.
Where not possible, explicitly request -std=c99 (this could be limited to
specific compilers like icc).
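For illustration, the change described above looks like this. `sum_first_n` is a made-up function, not one of the modified tests.

```c
#include <assert.h>

/* Hypothetical example. C89-compatible form: the iteration variable is
 * declared before the loop instead of the C99 form `for (int i = 0; ...)`. */
static int sum_first_n(int n) {
  int i;
  int sum = 0;
  for (i = 0; i < n; i++)
    sum += i;
  return sum;
}
```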

[OpenMP][Tools][NFC] Pass intel license ENV to lit · d29a7d23

Joachim Protze authored Oct 25, 2021

Oct 23, 2021

Ensure newlines at the end of files (NFC) · d8e4170b

Kazu Hirata authored Oct 23, 2021

[libomptarget] Run GPU offloading tests on both new and old runtime · bf6f955f

Jon Chesterfield authored Oct 22, 2021

Implemented by patching python config instead of modifying all
the tests, so that -generic and XFAIL work as usual. The expectation is that
this will be reverted once the old runtime is deleted.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D112225

Oct 22, 2021

[OpenMP][OMPT][GOMP] task frame support in KMP_API_NAME_GOMP_PARALLEL_SECTIONS · ba02586f

Vladimir Inđić authored Oct 21, 2021

The KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing task frame support.
This patch introduces a fix that properly sets the exit_frame of
the innermost implicit task that corresponds to the parallel sections construct,
as well as the enter_frame of the task that encloses that implicit task.

This patch also introduces a simple test case, sections_serialized.c, that contains
a serialized parallel sections construct and validates whether these
task frames are set correctly.

Differential Revision: https://reviews.llvm.org/D112205

Oct 21, 2021

[OpenMP][NFC] skip atomic tests for non-x86 arch · 52f4922e

AndreyChurbanov authored Oct 21, 2021

[libomptarget][DeviceRTL] Generalise and simplify cmakelists · a602c2b5

Jon Chesterfield authored Oct 21, 2021

Step towards building the DeviceRTL for amdgpu.

Mostly replaces cuda-specific toolchain finding logic with the
generic logic currently found in the amdgpu deviceRTL cmake. Also
deletes dead code and changes the default to build on systems
without cuda installed, as the library doesn't use cuda and the
amdgpu-only systems generally won't have cuda installed.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111983

Oct 20, 2021

[OpenMP] Add GOMP allocator functions · 99d1ce4a

Nawrin Sultana authored Oct 12, 2021

This patch adds GOMP_alloc and GOMP_free functions of LIBGOMP.

Differential revision: https://reviews.llvm.org/D111673

Oct 19, 2021

[OpenMP] Remove macro guards for device debugging · b1ce4549

Joseph Huber authored Oct 19, 2021

The plugin currently uses a macro to check whether this is a debug build
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds, and debugging support should always be available.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112083

[libomptarget] Refactor DeviceRTL prior to AMDGPU bringup · 7272982e

Jon Chesterfield authored Oct 19, 2021

Subset of D111993. Fix typos, rename read to load.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111999

Oct 18, 2021

[OpenMP] libomp: add check of task function pointer for NULL. · 63f8099e

AndreyChurbanov authored Oct 18, 2021

This patch makes it possible to simplify the compiler implementation of the
"taskwait nowait" construct. "taskwait nowait" is semantically equivalent to an
empty task. Instead of creating an empty routine as a task entry, the compiler can
just pass a NULL pointer to the runtime. The runtime then does all the work with
dependences and returns because the task routine is absent.
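A minimal model of the runtime-side check follows, assuming a simplified task descriptor. `task_desc`, `task_entry_fn`, `run_task` and `count_calls` are hypothetical names; the real libomp entry points and dependence handling are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task entry signature: (global thread id, task data). */
typedef int (*task_entry_fn)(int gtid, void *task_data);

/* Hypothetical, heavily simplified task descriptor. */
typedef struct {
  task_entry_fn routine; /* NULL for "taskwait nowait" */
  void *data;
  bool dependences_done; /* set once dependence handling has run */
} task_desc;

static void run_task(int gtid, task_desc *t) {
  /* Only call the body if the compiler supplied one. */
  if (t->routine != NULL)
    t->routine(gtid, t->data);
  /* Dependence handling happens whether or not there was a body. */
  t->dependences_done = true;
}

/* Example task body for the usage test: counts its invocations. */
static int count_calls(int gtid, void *task_data) {
  (void)gtid;
  ++*(int *)task_data;
  return 0;
}
```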

Differential Revision: https://reviews.llvm.org/D112015

[libomptarget] Pass OMP_TARGET_OFFLOAD env variable through to tests · 251b1e7c

Jon Chesterfield authored Oct 18, 2021

Useful for OMP_TARGET_OFFLOAD=MANDATORY when testing

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111995

[OpenMP][OMPT] thread_num determination for programs with explicit tasks · 59a994e8

@vladaindjic authored Oct 18, 2021

__ompt_get_task_info_internal is now able to determine the right value of the
“thread_num” argument during the execution of an explicit task.

During the execution of a while loop that iterates over the ancestor tasks
hierarchy, the “prev_team” variable was always set to “team” variable at the
beginning of each loop iteration.

Assume that the program contains a parallel region which encloses an explicit
task executed by the worker thread of the region. Also assume that the tool
queries the “thread_num” of a worker thread for the implicit task that
corresponds to the region (task at “ancestor_level == 1”) and expects to
receive the value of “thread_num > 0”.
After the loop finishes, both “team” and “prev_team” variables are equal and
point to the team information of the parallel region.
The “thread_num” is set to “prev_team->t.t_master_tid”, which is equal to
“team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since
the master thread of the region is the initial master thread of the program.
This leads to a contradiction.

To prevent this, “prev_team” variable is set to “team” variable only at the
time when the loop that has already encountered the implicit task (“taskdata”
variable contains the information about an implicit task) continues iterating
over the implicit task’s ancestors, if any.

After the mentioned loop finishes, the “prev_team” variable might be equal to
NULL. This means that the task at requested “ancestor_level” belongs to the
innermost parallel region, so the “thread_num” will be determined by calling
the “__kmp_get_tid”.

To verify that this patch works, the test case “explicit_task_thread_num.c” is
provided.
It contains the program described earlier in the summary.

Differential Revision: https://reviews.llvm.org/D110473

[OpenMP][Tests][NFC] Work around ICC bug · c93fb143

Joachim Protze authored Oct 18, 2021

Older intel compilers miss the privatization of nested loop variables for
doacross loops. Declaring the variable in the loop makes the test more
robust.

[OpenMP][Tests][NFC] Flagging OMPT tests as XFAIL for Intel compilers · 59186882

Joachim Protze authored Oct 18, 2021

With the Intel 19 compiler, the teams tests fail to link while trying to link
liboffload.

Oct 16, 2021

[OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode` · 2c941fa2

Shilei Tian authored Oct 16, 2021

D110279 introduced a bug into the device runtime. In `__kmpc_parallel_51`, we detect
whether we are already in a parallel region by checking `__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode()`.
This is based on the following assumptions:
- In SPMD mode, the parallel level is initialized to 1.
- In generic mode, the parallel level is initialized to 0.
- `__kmpc_is_spmd_exec_mode` returns `1` for SPMD mode and `0` otherwise.

Because the return type of `__kmpc_is_spmd_exec_mode` is `int8_t`, there
used to be an implicit conversion from `bool` to `int8_t`, so we could be sure the value was
either 0 or 1 (since C++14). In D110279, the return value became the result of an `and` operation,
which is 2 in SPMD mode. This breaks the assumption in `__kmpc_parallel_51`.
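The pitfall can be reproduced in plain C. The flag values (`EXEC_MODE_SPMD = 2`) are an assumption made for illustration; any flag bit other than bit 0 triggers the same problem. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { EXEC_MODE_GENERIC = 1, EXEC_MODE_SPMD = 2 }; /* assumed flag bits */

/* Buggy (hypothetical): returns the masked bit itself, i.e. 2 in SPMD mode. */
static int8_t is_spmd_buggy(int8_t mode) { return mode & EXEC_MODE_SPMD; }

/* Fixed (hypothetical): normalize to 0/1 through a comparison. */
static int8_t is_spmd_fixed(int8_t mode) {
  return (int8_t)((mode & EXEC_MODE_SPMD) != 0);
}

/* The nesting check described above, with the parallel level passed in:
 * in SPMD mode the level starts at 1, in generic mode at 0. */
static bool already_in_parallel(int parallel_level, int8_t is_spmd) {
  return parallel_level > is_spmd;
}
```

With the buggy version, a nested region at parallel level 2 in SPMD mode compares `2 > 2` and is not recognized as nested.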

Reviewed By: carlo.bertolli, dpalermo

Differential Revision: https://reviews.llvm.org/D111905

Oct 15, 2021

[OpenMP][Tools][NFC] Make an Archer test more robust · 26b675d6

Joachim Protze authored Oct 15, 2021

The execution order of the tasks is not fixed, so there is no ordering
for the write accesses. Enforce the ordering that is expected in the check.

Oct 14, 2021

[OpenMP][host runtime] Add initial hybrid CPU support · acb3b187

Peyton, Jonathan L authored Oct 14, 2021

Detect, through CPUID.1A, the different core types and show them to the user
through the KMP_AFFINITY=verbose mechanism. Offer __kmp_is_hybrid_cpu() so that
future runtime optimizations can tell whether they are running on a hybrid system.
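To illustrate the CPUID fields involved, this sketch assumes the layout documented in the Intel SDM:

- CPUID.(EAX=07H,ECX=0):EDX bit 15 flags a hybrid part.
- CPUID leaf 1AH returns the core type of the calling logical processor in EAX bits 31:24 (20H for Intel Atom, 40H for Intel Core).

Decoding is shown on raw register values so it compiles on any platform. `decode_core_type` and `is_hybrid_from_edx` are hypothetical helpers, not libomp functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
  CORE_TYPE_UNKNOWN = 0,
  CORE_TYPE_ATOM = 0x20,
  CORE_TYPE_CORE = 0x40
} core_type;

/* Hypothetical helper: CPUID.(EAX=07H,ECX=0):EDX[15] is the hybrid flag. */
static bool is_hybrid_from_edx(uint32_t leaf7_edx) {
  return (leaf7_edx >> 15) & 1u;
}

/* Hypothetical helper: CPUID.1AH:EAX[31:24] holds the calling CPU's type. */
static core_type decode_core_type(uint32_t leaf1a_eax) {
  switch (leaf1a_eax >> 24) {
  case 0x20:
    return CORE_TYPE_ATOM;
  case 0x40:
    return CORE_TYPE_CORE;
  default:
    return CORE_TYPE_UNKNOWN;
  }
}
```

On x86 with GCC or Clang, the raw values could be read with `__get_cpuid_count` from `<cpuid.h>`. Because leaf 1AH describes only the calling CPU, a runtime would need to query it on each logical processor.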

Differential Revision: https://reviews.llvm.org/D110435

[OpenMP][host runtime] small fixup of RTM CPUID bit check · b840d3ab

Peyton, Jonathan L authored Oct 14, 2021

[OpenMP][host runtime] Add support for teams affinity · 50b68a3d

Peyton, Jonathan L authored Sep 15, 2021

This patch implements teams affinity on the host.
The default is spread. A user can specify spread, close, or
primary using the KMP_TEAMS_PROC_BIND environment variable. Unlike
OMP_PROC_BIND, KMP_TEAMS_PROC_BIND takes a single value, not a
list of values. The values follow the same semantics as in the OpenMP
specification for parallel regions, except that T is the number of teams in
a league instead of the number of threads in a parallel region.
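The three policies can be sketched as a mapping from team number to place number. The sketch assumes P places and T teams with T <= P, and ignores the T > P case and finer subpartition details of the specification. `place_for_team` and `teams_proc_bind` are hypothetical; this is not the libomp implementation.

```c
#include <assert.h>

typedef enum { TEAMS_SPREAD, TEAMS_CLOSE, TEAMS_PRIMARY } teams_proc_bind;

/* Hypothetical simplification: place (0..num_places-1) for team `team` out of
 * `num_teams`, assuming num_teams <= num_places and the primary thread on
 * place `primary_place`. */
static int place_for_team(teams_proc_bind policy, int team, int num_teams,
                          int num_places, int primary_place) {
  assert(num_teams > 0 && num_teams <= num_places);
  assert(team >= 0 && team < num_teams);
  switch (policy) {
  case TEAMS_SPREAD:
    /* First place of the team's subpartition of ~P/T consecutive places. */
    return (primary_place + team * num_places / num_teams) % num_places;
  case TEAMS_CLOSE:
    /* Consecutive places, starting at the primary thread's place. */
    return (primary_place + team) % num_places;
  case TEAMS_PRIMARY:
  default:
    /* Every team shares the primary thread's place. */
    return primary_place;
  }
}
```

For example, with 8 places and 4 teams, spread puts team 1 on place 2 and close puts it on place 1.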

Differential Revision: https://reviews.llvm.org/D109921

Oct 13, 2021

[OpenMP] libomp: add atomic functions for new OpenMP 5.1 atomics. · 621d7a75

AndreyChurbanov authored Oct 13, 2021

Added functions that implement "atomic compare".
Though clang does not use library interfaces to implement OpenMP atomics,
the functions are added for consistency.
Also added the missing functions for 80-bit floating-point min/max atomics.
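The semantics of two OpenMP 5.1 `atomic compare` forms can be expressed with C11 compare-and-swap:

- The equality form, `if (x == e) { x = d; }`, is a single CAS.
- A min update, `if (e < x) { x = e; }`, needs a retry loop.

This is only a sketch of the semantics. The function names are hypothetical, `double` stands in for the 80-bit type, and the actual per-type libomp entry points are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: if (x == e) { x = d; }, performed atomically.
 * Returns whether x was updated. */
static bool atomic_compare_eq_int(_Atomic int *x, int e, int d) {
  return atomic_compare_exchange_strong(x, &e, d);
}

/* Hypothetical helper: if (e < x) { x = e; }, an atomic min via a CAS loop. */
static void atomic_min_double(_Atomic double *x, double e) {
  double old = atomic_load(x);
  while (e < old && !atomic_compare_exchange_weak(x, &old, e)) {
    /* the failed CAS refreshed `old`; the loop re-checks the condition */
  }
}
```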

Differential Revision: https://reviews.llvm.org/D110109

[OpenMP] libomp: fix ittnotify usage. · 6e98ec9b

AndreyChurbanov authored Oct 13, 2021

Replaced storing the ittnotify domain array index in the
location info structure (which is now read-only) with storing
(location info address + ittnotify domain + team size) in a hash map.
Replaced the __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with
a __kmp_itt_barrier_domains hash map, and the __kmp_itt_region_domains and
__kmp_itt_region_team_size arrays with a __kmp_itt_region_domains hash map.
Basic functionality did not change (at least, the intent was not to change it).

The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644.
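The idea of keeping per-location data in a side table, instead of writing into the read-only location structure, can be sketched with a small open-addressing hash map keyed by the location address. All names are hypothetical and the domain is an opaque pointer. The real libomp map differs; among other things it must be safe under concurrent access, which this sketch is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAPACITY 64 /* power of two; no resizing in this sketch */

/* Hypothetical entry: key is the address of the read-only location info. */
typedef struct {
  const void *loc; /* NULL marks an empty slot, so NULL is not a valid key */
  void *domain;    /* opaque ittnotify domain handle */
  int team_size;
} loc_entry;

typedef struct {
  loc_entry slots[MAP_CAPACITY];
} loc_map;

static size_t loc_hash(const void *loc) {
  uintptr_t p = (uintptr_t)loc;
  return (size_t)((p >> 4) ^ (p >> 12)) & (MAP_CAPACITY - 1);
}

/* Returns the entry for loc, claiming an empty slot if needed; NULL if full. */
static loc_entry *loc_map_get(loc_map *m, const void *loc) {
  size_t i = loc_hash(loc);
  for (size_t probes = 0; probes < MAP_CAPACITY; ++probes) {
    loc_entry *e = &m->slots[i];
    if (e->loc == loc)
      return e;
    if (e->loc == NULL) {
      e->loc = loc; /* new entry: domain and team_size start zeroed */
      return e;
    }
    i = (i + 1) & (MAP_CAPACITY - 1); /* linear probing */
  }
  return NULL;
}
```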

Differential Revision: https://reviews.llvm.org/D111580

[OpenMP] libomp: fix warning on comparison of integer expressions of different signedness · 5e58b63b

AndreyChurbanov authored Oct 13, 2021

Replaced macro with a global variable of the corresponding type.

Differential Revision: https://reviews.llvm.org/D111562

Oct 11, 2021

[OpenMP] libomp: add OpenMP 5.1 memory allocation routines. · f5c0c917

AndreyChurbanov authored Oct 11, 2021

Aligned allocation routines added.
Fortran interfaces added for all allocation routines.

Differential Revision: https://reviews.llvm.org/D110923

Oct 09, 2021

[libomptarget][amdgpu][NFC] tweak a comment · d022f39d

Ron Lieberman authored Oct 09, 2021

[OpenMP] Add RTL function for getting number of threads in block. · bad44d5f

Joseph Huber authored Oct 08, 2021

This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function, which returns the
number of threads in the block. This function was missing in the new runtime
and was used by the AMDGPU plugin, which prevented the plugin from using the
new runtime. This patch also unifies the interface for getting the thread
numbers in the frontend.

Originally authored by jdoerfert.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D111475

[OpenMP] Avoid calling `isSPMDMode` during RT initialization · 85ad5663

Joseph Huber authored Oct 08, 2021

Until we hit the first barrier we should not call `mapping::isSPMDMode`
with all threads. Instead, we now have (and use during initialization) a
`mapping::isMainThreadInGenericMode` overload that takes the known
SPMD-mode state and one that queries it.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111381

Oct 08, 2021

[Libomptarget] Add an external interface to dynamic shared memory · 208f9005

Joseph Huber authored Oct 01, 2021

This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
``llvm_omp_get_dynamic_shared``. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.
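The sketch below shows how a null-returning host fallback lets code be written once for both paths. It assumes a signature like `void *llvm_omp_get_dynamic_shared(void)`. To stay self-contained, it defines a local stand-in, `host_get_dynamic_shared`, that behaves like the host-side definition described above. `scratch_or_fallback` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the host-side llvm_omp_get_dynamic_shared():
 * there is no dynamic shared memory on the host, so it returns NULL. */
static void *host_get_dynamic_shared(void) { return NULL; }

/* Hypothetical helper: use dynamic shared memory when available, otherwise a
 * caller-provided buffer, so host fallback does not dereference NULL. */
static void *scratch_or_fallback(void *(*get_shared)(void), void *fallback) {
  void *shared = get_shared();
  return shared != NULL ? shared : fallback;
}
```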

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110957

[OpenMP][NVPTX] Fix an error in configuring #teams and #threads · c060c634

Shilei Tian authored Oct 08, 2021

It must have been a copy-paste mistake.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111407