- Nov 14, 2016
Jonathan Peyton authored
Make the developer timers use the partitioning scheme, which also required removing some redundant developer timers in favor of the already existing normal timers. Move per-thread stats initialization to just after global thread ID assignment, which is as early as possible. Also put all global stats initialization code in __kmp_stats_init() and all global stats destruction code in __kmp_stats_fini(). Differential Revision: https://reviews.llvm.org/D26361 llvm-svn: 286892
- Nov 07, 2016
Jonas Hahnfeld authored
This patch allows ThreadSanitizer (Tsan) to verify OpenMP programs, so that Tsan reports no false positives when verifying an OpenMP program. The patch introduces annotations within the OpenMP runtime module that provide information about thread synchronization to the Tsan runtime. To enable Tsan support when building the runtime, turn on the TSAN_SUPPORT option with the following CMake flag: -DLIBOMP_TSAN_SUPPORT=TRUE. The annotations will be enabled in the main shared library (the same mechanism as OMPT). Patch by Simone Atzeni and Joachim Protze! Differential Revision: https://reviews.llvm.org/D13072 llvm-svn: 286115
- Oct 07, 2016
Jonathan Peyton authored
This change removes/disables unnecessary code when the monitor thread is not used. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D25102 llvm-svn: 283577
- Jun 14, 2016
Jonathan Peyton authored
OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with 45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that 41 is deprecated and that 45 should be used instead. llvm-svn: 272687
- May 20, 2016
Jonathan Peyton authored
This patch doesn't affect D19878's context, so D19878 still applies cleanly. llvm-svn: 270252
- May 05, 2016
Jonathan Peyton authored
This change replaces the current timers with ones that partition time properly. The current timers are nested, so that if a new timer, B, starts while the current timer, A, is already timing, A's time will include B's. To eliminate this problem, the partitioned timers are designed to stop the current timer (A), let the new timer run (B), and when the new timer is finished, restart the previously running timer (A). With this partitioning of time, all of a thread's timers sum up to the OMP_worker_thread_life time and can now easily show the percentage of time a thread spends in different parts of the runtime or user code. There is also a new state variable associated with each thread which tracks where it is executing a task. This corresponds to the OMP_task_* timers; e.g., if time is spent in OMP_task_taskwait, then that thread executed tasks inside a #pragma omp taskwait construct. Most of the changes switch the macros to the new PARTITIONED_* macros, add the new partitionedTimers class and its methods, and add the new state logic. Differential Revision: http://reviews.llvm.org/D19229 llvm-svn: 268640
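The partitioning rule described above (starting B pauses A, finishing B resumes A) can be sketched as a stack of timers where each tick is charged only to the innermost running timer. This is a minimal illustration, not the runtime's partitionedTimers class: the class name `PartitionedTimerStack` is hypothetical, and time is passed in as explicit tick values so the behaviour is deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of partitioned timing. Time is passed in explicitly as
// tick values so the example is deterministic; a real runtime would read a clock.
class PartitionedTimerStack {
public:
    // Start `name` at tick `now`, pausing (and charging) the running timer.
    void push(const std::string& name, uint64_t now) {
        if (!stack_.empty()) charge(now);
        stack_.push_back(name);
        last_ = now;
    }
    // Stop the running timer at `now` and resume the one beneath it.
    void pop(uint64_t now) {
        assert(!stack_.empty() && "pop() without a matching push()");
        charge(now);
        stack_.pop_back();
        last_ = now;
    }
    // Total ticks charged to `name` while it was the innermost running timer.
    uint64_t total(const std::string& name) const {
        auto it = totals_.find(name);
        return it == totals_.end() ? 0 : it->second;
    }

private:
    // Charge the interval since the last switch to the innermost timer only.
    void charge(uint64_t now) { totals_[stack_.back()] += now - last_; }

    std::vector<std::string> stack_;
    std::map<std::string, uint64_t> totals_;
    uint64_t last_ = 0;
};
```

For example, `push("OMP_worker_thread_life", 0)`, `push("OMP_task_taskwait", 10)`, `pop(25)`, `pop(40)` charges 25 ticks to the life timer and 15 to the taskwait timer, which sum to the 40 ticks of the thread's life; a nested timer would instead have reported all 40 for the outer timer.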
- Apr 14, 2016
Jonathan Peyton authored
Fix the ittnotify barrier imbalance time when tasks exist. In the current implementation, task execution time is included in the aggregated time on a barrier. This fix measures the task execution time and corrects the arrive time by subtracting the task execution time. Since __kmp_invoke_task() can also be called outside a barrier, the field th.th_bar_arrive_time is used to check whether the function was called at the barrier (th.th_bar_arrive_time != 0). For this check to work, th_bar_arrive_time is set to zero right after its value is used on the barrier. Differential Revision: http://reviews.llvm.org/D19030 llvm-svn: 266332
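One way to read this correction is sketched below, under the assumption that the barrier wait is measured as release time minus arrive time: shifting the arrive time later by each task's duration removes that task time from the reported wait. Only the field name `th_bar_arrive_time` and the zero-sentinel convention come from the commit; the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical per-thread record; only the field name comes from the commit.
struct ThreadInfo {
    uint64_t th_bar_arrive_time = 0;  // 0 means "not waiting at a barrier"
};

// Run after a task executes between task_start and task_end (tick values).
// When the thread is at a barrier, shift its arrive time later by the task's
// duration, so the task's run time is no longer counted as barrier wait.
void adjust_for_task(ThreadInfo& th, uint64_t task_start, uint64_t task_end) {
    if (th.th_bar_arrive_time != 0)
        th.th_bar_arrive_time += task_end - task_start;
}

// Run at the barrier: return the adjusted wait time and reset the sentinel,
// so tasks executed later, outside any barrier, are not adjusted.
uint64_t consume_barrier_wait(ThreadInfo& th, uint64_t release_time) {
    uint64_t wait = release_time - th.th_bar_arrive_time;
    th.th_bar_arrive_time = 0;
    return wait;
}
```

A thread that arrives at tick 100, runs a 20-tick task while waiting, and is released at tick 150 reports a wait of 30 ticks rather than 50.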
- Jan 27, 2016
Jonathan Peyton authored
Removing references to non-existent functions, fixing typos. llvm-svn: 258985
- Nov 12, 2015
Jonathan Peyton authored
Trace when a thread is waiting at the join phase for oncore children. llvm-svn: 252954
- Nov 09, 2015
Jonathan Peyton authored
1) When the number of threads in a team increases, the new threads need to have all their barrier struct fields initialized. We were missing the parent_bar and team fields.
2) For non-fork/join barriers, __kmp_task_team_setup is now done before the gather. The setup prepares the task_team that all threads will switch to after the barrier, so it must be done before any other thread makes that switch.
3) Remove an unneeded assignment of tt_found_tasks in the task team free function.
Differential Revision: http://reviews.llvm.org/D14456 llvm-svn: 252486
- Nov 04, 2015
Jonathan Peyton authored
This is a refactoring of the task_team code that more elegantly handles the two task_team case. Two task_teams per team are kept in use for the lifetime of the team. Thus no reference counting is needed. Differential Revision: http://reviews.llvm.org/D13993 llvm-svn: 252082
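The two-task-team scheme can be sketched as a fixed pair of slots selected by a per-thread parity bit that flips at each barrier: the slot for the next phase is prepared while the current one is still in use, and since both slots live as long as the team, nothing needs reference counting. All names here (`TaskTeam`, `Team`, `ThreadState`, `setup_next_task_team`) are hypothetical and are not the runtime's actual structures.

```cpp
#include <array>

// Hypothetical sketch of the two-task-team scheme. The team allocates both
// slots once; each thread's parity bit picks the slot in use and flips at
// every barrier, so no reference counting or mid-run freeing is needed.
struct TaskTeam {
    int phase = -1;  // which barrier phase this slot was prepared for
};

struct Team {
    std::array<TaskTeam, 2> task_teams;  // lives as long as the team
};

struct ThreadState {
    int task_state = 0;  // parity bit: index of the slot currently in use
    TaskTeam& current(Team& team) const { return team.task_teams[task_state]; }
    void flip_after_barrier() { task_state ^= 1; }
};

// Before the barrier: prepare the slot that threads will switch to afterwards,
// leaving the slot still in use untouched.
void setup_next_task_team(Team& team, int current_state, int next_phase) {
    team.task_teams[1 - current_state].phase = next_phase;
}
```

The design choice is that setup only ever writes the inactive slot, which is why it can safely run before the gather while other threads are still using the active one.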
- Oct 08, 2015
Jonathan Peyton authored
llvm-svn: 249725
Jonathan Peyton authored
These changes improve the wait/release mechanism for threads spinning in barriers that are handling tasks while spinning, by providing feedback to the barriers about any task stealing that occurs. Differential Revision: http://reviews.llvm.org/D13353 llvm-svn: 249711
- Sep 21, 2015
Jonathan Peyton authored
Prior to this change, OMPT had a status flag ompt_status, which could take several values. This was due to an earlier OMPT design that had several levels of enablement (ready, disabled, tracking state, tracking callbacks). The current OMPT design has OMPT support either on or off. This revision replaces ompt_status with a boolean flag ompt_enabled, which simplifies the runtime logic for OMPT. Patch by John Mellor-Crummey Differential Revision: http://reviews.llvm.org/D12999 llvm-svn: 248189
Jonathan Peyton authored
This change adds guards in places where they were missing, to enable the OpenMP 3.0 build. Patch by Diego Caballero and Johnny Peyton Mailing List: http://lists.llvm.org/pipermail/openmp-dev/2015-September/000935.html llvm-svn: 248178
- Sep 18, 2015
Jonathan Peyton authored
An ifdef for OMPT_TRACE needs to be OMPT_BLAME so that both instances of a callback are controlled by the same ifdef. Patch by John Mellor-Crummey Differential Revision: http://reviews.llvm.org/D12911 llvm-svn: 248001
- Sep 10, 2015
Jonathan Peyton authored
The fix is to make the b_arrived flag 64-bit in both structures, kmp_balign_team_t and kmp_balign_t. Otherwise, when the flag in kmp_balign_team_t wraps past UINT_MAX, the library hangs. Differential Revision: http://reviews.llvm.org/D12563 llvm-svn: 247320
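A minimal illustration of why mismatched flag widths can hang: a 32-bit copy of a barrier counter wraps modulo 2^32, while a 64-bit copy keeps counting, so an equality check between them never succeeds after the wrap. The bump of 4 per barrier and the helper names `next_barrier` and `can_release` are assumptions made for this example, not the runtime's code.

```cpp
#include <cstdint>

// Hypothetical illustration of mixed-width barrier flags; the bump size of 4
// per barrier is an assumption for the example.
constexpr uint64_t kBarrierBump = 4;

// Advance a flag by one barrier in the flag's own width, so a 32-bit flag
// wraps modulo 2^32 while a 64-bit flag keeps counting.
template <typename Flag>
Flag next_barrier(Flag f) { return static_cast<Flag>(f + kBarrierBump); }

// A waiter completes only when the team's value equals the value it expects.
bool can_release(uint64_t team_flag, uint64_t expected) { return team_flag == expected; }
```

Starting both copies at `UINT32_MAX - 3`, one more barrier moves the 64-bit team flag to 2^32 but wraps the 32-bit copy to 0, so a waiter comparing the two spins forever; making both copies 64-bit restores the match.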
- Aug 26, 2015
Jonathan Peyton authored
This change just removes the variables created solely for KMP_DEBUG_ASSERT statements and puts the definition of the removed variables inside the KMP_DEBUG_ASSERT statements. llvm-svn: 246065
- Aug 12, 2015
Jonathan Peyton authored
There was a missing implicit task init for the ICV PUSH case in hierarchical barrier. llvm-svn: 244807
- Aug 11, 2015
Jonathan Peyton authored
This removes some statistics counters and timers which were not used, adds new counters and timers for some language features that were not monitored previously, and separates the counters and timers into those which are of interest for investigating user code and those which are only of interest to the developers of the runtime itself. The runtime developer statistics are now only collected if the additional #define KMP_DEVELOPER_STATS is set.
Additional user statistics which are now collected include:
* Count of nested parallelism (omp parallel inside a parallel region)
* Count of omp distribute occurrences
* Count of omp teams occurrences
* Counts of task-related statistics (taskyield, task execution, task cancellation, task steal)
* Values passed to omp_set_num_threads
* Time spent in omp single and omp master
None of this affects code compiled without stats gathering enabled, which is the normal library build mode. This also fixes the CMake build by linking to the standard C++ library when building the stats library, as it is a requirement there. The normal library does not have this requirement and its link phase is left alone. Differential Revision: http://reviews.llvm.org/D11759 llvm-svn: 244677
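The user/developer split can be pictured as two counting macros, where the developer one compiles to nothing unless `KMP_DEVELOPER_STATS` is defined. This is a hedged sketch: the `KMP_DEVELOPER_STATS` guard comes from the commit, but the counter names, the `g_sketch_counters` array, and the `SKETCH_COUNT_*` macros are hypothetical and do not reflect the runtime's actual stats macros.

```cpp
#include <cstdint>

// Hypothetical counter set; names are illustrative, not the runtime's.
enum SketchCounter {
    USER_nested_parallel,
    USER_set_num_threads,
    DEV_internal_spin,
    N_SKETCH_COUNTERS
};

static uint64_t g_sketch_counters[N_SKETCH_COUNTERS] = {};

// User-facing statistics: always recorded in a stats-enabled build.
#define SKETCH_COUNT_USER(c) (++g_sketch_counters[(c)])

// Developer-only statistics: compiled out unless KMP_DEVELOPER_STATS is defined.
#ifdef KMP_DEVELOPER_STATS
#define SKETCH_COUNT_DEV(c) (++g_sketch_counters[(c)])
#else
#define SKETCH_COUNT_DEV(c) ((void)0)
#endif
```

Built without `KMP_DEVELOPER_STATS`, a `SKETCH_COUNT_DEV(...)` call site costs nothing at run time, while user counters keep working.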
- Jul 09, 2015
Jonathan Peyton authored
A while back, we made an initial change where dangerous C API functions were replaced with macros that translated the dangerous API function calls to safer function calls e.g., sprintf() replaced with KMP_SPRINTF() which translates to sprintf_s() on Windows. Currently, the only operating system where this is applicable is Windows. Unix-like systems are still using the dangerous API e.g., KMP_SPRINTF() translates to sprintf(). Our own testing showed no performance differences. Differential Revision: http://reviews.llvm.org/D9918 llvm-svn: 241833
Jonathan Peyton authored
These changes enable external debuggers to conveniently interface with the LLVM OpenMP Library. Structures are added which describe the important internal structures of the OpenMP Library e.g., teams, threads, etc. This feature is turned on by default (CMake variable LIBOMP_USE_DEBUGGER) and can be turned off with -DLIBOMP_USE_DEBUGGER=off. Differential Revision: http://reviews.llvm.org/D10038 llvm-svn: 241832
- Jul 01, 2015
Jonathan Peyton authored
The OMPT status is never equal to ompt_status_track. ompt_status_track = 0x2 and ompt_status_track_callback = 0x6 share a bit, so that we can check for tracing and callbacks with the same status. Patch by Tim Cramer Differential Revision: http://reviews.llvm.org/D10863 llvm-svn: 241167
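The difference between comparing the status for equality and testing the shared bit can be sketched with the two values quoted in the commit message. The constant and function names below are illustrative, not the runtime's enum.

```cpp
#include <cstdint>

// Values quoted in the commit message; the constant names are illustrative.
constexpr uint32_t kStatusTrack = 0x2;
constexpr uint32_t kStatusTrackCallback = 0x6;  // 0x4 | 0x2: shares the track bit

// Equality only recognises the plain "track" value, never track_callback.
bool tracking_by_equality(uint32_t status) { return status == kStatusTrack; }

// Masking with the shared bit recognises both track and track_callback.
bool tracking_by_bit(uint32_t status) { return (status & kStatusTrack) != 0; }
```

With the status set to 0x6, the equality check fails while the bit test succeeds, which is why a bit test is the right way to ask "is tracing on?".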
- Jun 29, 2015
Jonathan Peyton authored
Fix OMPT support for barriers so that state changes occur even if OMPT_TRACE is turned off. These state changes are needed by performance tools that use callbacks for either ompt_event_wait_barrier_begin or ompt_event_wait_barrier_end. Change the ifdef flag for the callbacks ompt_event_wait_barrier_begin and ompt_event_wait_barrier_end to OMPT_BLAME rather than OMPT_TRACE; they were misclassified. Without this patch, when the runtime is compiled with LIBOMP_OMPT_SUPPORT=true, LIBOMP_OMPT_BLAME=true, and LIBOMP_OMPT_TRACE=false, and a callback is registered for either ompt_event_wait_barrier_begin or ompt_event_wait_barrier_end, an assertion will trip. Fix the scoping of one OMPT_TRACE ifdef, which should not have surrounded an update of an OMPT state. Add a missing initialization of an OMPT task id for an implicit task. Patch by John Mellor-Crummey Differential Revision: http://reviews.llvm.org/D10759 llvm-svn: 240970
- Jun 08, 2015
Jonathan Peyton authored
As an ongoing effort to sanitize the openmp code, these changes move variables under already existing macro guards. Patch by Jack Howarth llvm-svn: 239331
Jonathan Peyton authored
Some variables are convenient to keep around even if they aren't really used in a release build. This is often seen in DEBUG guarded code where the variable is only used in a DEBUG build. Patch by Jack Howarth llvm-svn: 239326
- May 07, 2015
Andrey Churbanov authored
llvm-svn: 236753
- May 06, 2015
Andrey Churbanov authored
D9302.partial2: cleanup of ittnotify checks that eliminates redundant notifications in the case of nested regions. llvm-svn: 236631
Andrey Churbanov authored
llvm-svn: 236623
- Apr 29, 2015
Andrey Churbanov authored
These are the actual changes in the runtime to issue OMPT-related functions. All of them are surrounded by #if OMPT_SUPPORT and can be disabled (which is the default). llvm-svn: 236122
- Mar 10, 2015
Andrey Churbanov authored
llvm-svn: 231776
- Feb 10, 2015
Andrey Churbanov authored
llvm-svn: 228718
- Jan 27, 2015
Andrey Churbanov authored
llvm-svn: 227207
Andrey Churbanov authored
Fixed the implementation of the teams construct in case it contains parallel regions with different numbers of threads. llvm-svn: 227198
- Jan 13, 2015
Andrey Churbanov authored
This patch enables the use of KMP_AFFINITY=balanced on non-MIC architectures. The restriction is that, on non-MIC architectures, balanced affinity only works on one-package machines. llvm-svn: 225794
Andrey Churbanov authored
llvm-svn: 225793
- Oct 07, 2014
Jim Cownie authored
understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

Apologies over, what do we have here?

GCC 4.9 compatibility
---------------------
* We have implemented the new entry points used by code compiled by GCC 4.9 to implement the same functionality as in GCC 4.8. Therefore code compiled with GCC 4.9 that used to work will continue to do so. However, there are some other new entry points (associated with task cancellation) which are not implemented. Therefore user code compiled by GCC 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entry points, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used.)

--- new parallel entry points ---
New entry points that aren't OpenMP 4.0 related. These are implemented fully:
GOMP_parallel_loop_dynamic()
GOMP_parallel_loop_guided()
GOMP_parallel_loop_runtime()
GOMP_parallel_loop_static()
GOMP_parallel_sections()
GOMP_parallel()

--- cancellation entry points ---
Currently, these only give a runtime error if OMP_CANCELLATION is true, because our plain barriers don't check for cancellation while waiting:
GOMP_barrier_cancel()
GOMP_cancel()
GOMP_cancellation_point()
GOMP_loop_end_cancel()
GOMP_sections_end_cancel()

--- taskgroup entry points ---
These are implemented fully.
GOMP_taskgroup_start()
GOMP_taskgroup_end()

--- target entry points ---
These are empty (as they are in libgomp):
GOMP_target()
GOMP_target_data()
GOMP_target_end_data()
GOMP_target_update()
GOMP_teams()

Improvements in Barriers and Fork/Join
--------------------------------------
* Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
* Wait/release code is now templated and in its own file; suspend/resume code is also templated.
* There's a new, hierarchical, barrier, which exploits the cache hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

Statistics Collection Code
--------------------------
* New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, but the lightweight timing code uses the x86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather:
1) for timing code paths inside the runtime;
2) for gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

Nested Hot Teams
----------------
* The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL environment variable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

Improved Intel(r) VTune(Tm) Amplifier support
---------------------------------------------
* The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP-specific analyses of load imbalance.
Support for OpenMP Composite Statements
---------------------------------------
* Implement new entry points required by some of the OpenMP 4.1 composite statements.

Improved ifdefs
---------------
* More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

ScaleMP* contribution
---------------------
Stack padding to improve performance in their environment, where cross-node coherency is managed at the page level.

Redesign of wait and release code
---------------------------------
The code is simplified and performance improved.

Bug Fixes
---------
* Fixes for Windows multiple processor groups.
* Fix Fortran module build on Linux: offload attribute added.
* Fix entry names for the distribute-parallel-loop construct to be consistent with the compiler codegen.
* Fix an inconsistent error message for the KMP_PLACE_THREADS environment variable.

llvm-svn: 219214
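To illustrate what "templated wait/release code" can mean, here is a minimal sketch of one spin-wait/release pair parameterized on the flag's integer type, so 32-bit and 64-bit flags share a single implementation. The class `SpinFlag` is hypothetical; the runtime's actual flag classes, back-off, and sleep logic are not shown here.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical sketch: one wait/release implementation templated on the flag's
// integer type, so 32-bit and 64-bit flags share the same code.
template <typename FlagT>
class SpinFlag {
public:
    explicit SpinFlag(FlagT release_value) : release_value_(release_value) {}

    // Waiter side: spin (yielding the CPU) until the releasing value appears.
    void wait() const {
        while (!is_released())
            std::this_thread::yield();
    }
    // Releaser side: publish the value the waiter is checking for.
    void release() { flag_.store(release_value_, std::memory_order_release); }

    bool is_released() const {
        return flag_.load(std::memory_order_acquire) == release_value_;
    }

private:
    std::atomic<FlagT> flag_{0};
    FlagT release_value_;
};
```

In a real barrier, `wait()` and `release()` run on different threads; the acquire/release pairing makes writes done before `release()` visible to the thread returning from `wait()`. The sketch assumes a nonzero release value, since the flag starts at 0.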