- Nov 04, 2015
-
-
Jonathan Peyton authored
llvm-svn: 252084
-
Jonathan Peyton authored
This is a refactoring of the task_team code that more elegantly handles the two task_team case. Two task_teams per team are kept in use for the lifetime of the team. Thus no reference counting is needed. Differential Revision: http://reviews.llvm.org/D13993 llvm-svn: 252082
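A minimal sketch of the two-slot idea, not the runtime's actual structures. All names here (demo_team, demo_thread, demo_toggle_task_team) are hypothetical, and the assumption that a thread switches slots at a barrier is ours rather than something stated in the commit message. The point is only that both slots live exactly as long as the team, so neither needs a reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real runtime structures are far larger. */
typedef struct demo_task_team { int pending_tasks; } demo_task_team;

typedef struct demo_team {
    demo_task_team task_team[2]; /* both slots live as long as the team */
} demo_team;

typedef struct demo_thread {
    demo_team *team;
    unsigned task_state; /* 0 or 1: which slot this thread currently uses */
} demo_thread;

/* Return the task_team slot the thread is currently using. */
demo_task_team *demo_current_task_team(demo_thread *th) {
    assert(th != NULL && th->team != NULL);
    return &th->team->task_team[th->task_state];
}

/* Switch the thread to the other slot (assumed to happen at a barrier). */
void demo_toggle_task_team(demo_thread *th) {
    assert(th != NULL);
    th->task_state = 1u - th->task_state;
}
```

Because a thread only ever alternates between two fixed slots, there is no point at which a slot has to be allocated or freed while the team is alive.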
-
- Oct 19, 2015
-
-
Dimitry Andric authored
warnings similar to the following:

    runtime/src/kmp_global.c:117:35: warning: implicit conversion from 'unsigned long' to 'int' changes value from 18446744073709551615 to -1 [-Wconstant-conversion]
    int __kmp_sys_max_nth = KMP_MAX_NTH;
    ~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~
    runtime/src/kmp.h:849:34: note: expanded from macro 'KMP_MAX_NTH'
    # define KMP_MAX_NTH PTHREAD_THREADS_MAX
    ^~~~~~~~~~~~~~~~~~~

Clamp KMP_MAX_NTH to INT_MAX to avoid these warnings. Also use INT_MAX whenever PTHREAD_THREADS_MAX is not defined at all.

Differential Revision: http://reviews.llvm.org/D13827 llvm-svn: 250708
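The clamp described in this commit can be sketched with the preprocessor as follows. This is an illustration only: DEMO_THREADS_MAX and DEMO_MAX_NTH are hypothetical stand-ins for PTHREAD_THREADS_MAX and KMP_MAX_NTH, and the value is the one quoted in the warning.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for PTHREAD_THREADS_MAX, using the value
   quoted in the compiler warning. */
#define DEMO_THREADS_MAX 18446744073709551615ULL

/* Use the platform limit only if it fits in an int; otherwise, or when
   no limit is defined at all, fall back to INT_MAX. */
#if defined(DEMO_THREADS_MAX) && (DEMO_THREADS_MAX < INT_MAX)
#define DEMO_MAX_NTH DEMO_THREADS_MAX
#else
#define DEMO_MAX_NTH INT_MAX
#endif

int demo_sys_max_nth(void) {
    int max_nth = DEMO_MAX_NTH; /* always representable as an int */
    assert(max_nth > 0);
    return max_nth;
}
```

Doing the comparison in `#if` keeps the decision at compile time, so the initializer never sees a constant that is too large for `int`.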
-
- Oct 09, 2015
-
-
Jonathan Peyton authored
Because __kmp_task_init_ompt is called for every initial task in each thread and always generated task ids, this was a significant performance issue on larger systems, even with no tool attached. Since the initialization interface changed to ompt_tool, we already know at this point whether a tool is attached and OMPT is enabled, and can rely on that. Patch by Jonas Hahnfeld. Differential Revision: http://reviews.llvm.org/D13494 llvm-svn: 249855
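A minimal sketch of the guard this message describes, under the assumption that "tool attached" is available as a simple flag. Every name below (demo_tool_enabled, demo_task_init, demo_task_info) is hypothetical; the runtime's real OMPT state and id generation differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: set once at startup if a tool attached. */
static int demo_tool_enabled = 0;
static uint64_t demo_next_task_id = 1;

typedef struct demo_task_info { uint64_t task_id; } demo_task_info;

void demo_set_tool_enabled(int enabled) { demo_tool_enabled = enabled; }

/* Initialize per-task tool data; only pay for an id when a tool is attached. */
void demo_task_init(demo_task_info *info) {
    assert(info != NULL);
    if (demo_tool_enabled) {
        /* A real multi-threaded runtime would need an atomic increment here. */
        info->task_id = demo_next_task_id++;
    } else {
        info->task_id = 0; /* 0 means "no id assigned" in this sketch */
    }
}
```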
-
- Oct 08, 2015
-
-
Jonathan Peyton authored
Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.

Some limitations:
* The number of sockets and the optional offset should be specified first (before other parameters).
* The letter designation is mandatory for sockets, and then for the other parameters as well.
* If the number of cores is specified first, the number of sockets defaults to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
* If the number of threads per core is specified first, the number of sockets and cores per socket default to all sockets and all cores per socket respectively.
* The number of cores per socket cannot be specified before sockets or after threads per core.
* The number of threads per core can be specified before or after core-offset (the old syntax required it to be before core-offset).
* The parameter delimiter can be: empty, comma, or lower-case x.
* Spaces are allowed around numbers, around letters, and around the delimiter.

Approximate shorthand specification:

    KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"

Differential Revision: http://reviews.llvm.org/D13175 llvm-svn: 249708
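To make the shorthand concrete, here is a deliberately simplified parser sketch. It is not the runtime's parser: demo_place and demo_parse_place_threads are hypothetical names, and the sketch ignores the ordering rules and the old letter-less syntax. It only shows the number-plus-letter items, the three delimiters, and the tolerance for spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result type; 0 means "not specified, use the default". */
typedef struct demo_place {
    long sockets, socket_offset;
    long cores, core_offset;
    long threads;
} demo_place;

static const char *demo_skip_spaces(const char *p) {
    while (*p == ' ') ++p;
    return p;
}

/* Parse "<number><letter>" items separated by nothing, ',' or 'x'.
   Returns 1 on success, 0 on a syntax error. Ordering rules and the
   old letter-less syntax are deliberately not handled. */
int demo_parse_place_threads(const char *s, demo_place *out) {
    demo_place r = {0, 0, 0, 0, 0};
    char last = 0; /* last level seen ('s' or 'c'), so 'o' knows its owner */
    const char *p;
    assert(s != NULL && out != NULL);
    p = demo_skip_spaces(s);
    while (*p != '\0') {
        char *end;
        long n;
        if (!isdigit((unsigned char)*p)) return 0;
        n = strtol(p, &end, 10);
        p = demo_skip_spaces(end);
        switch (tolower((unsigned char)*p)) {
        case 's': r.sockets = n; last = 's'; break;
        case 'c': r.cores = n; last = 'c'; break;
        case 't': r.threads = n; break; /* keeps 'last', so a core offset may follow */
        case 'o':
            if (last == 's') r.socket_offset = n;
            else if (last == 'c') r.core_offset = n;
            else return 0; /* offset with nothing to attach to */
            break;
        default: return 0; /* the letter designation is mandatory */
        }
        p = demo_skip_spaces(p + 1);
        if (*p == ',' || *p == 'x') p = demo_skip_spaces(p + 1);
    }
    *out = r;
    return 1;
}
```

In this sketch, a string such as "2s,1o,4c 2t" yields two sockets with socket offset one, four cores per socket, and two threads per core.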
-
- Sep 21, 2015
-
-
Joerg Sonnenberger authored
llvm-svn: 248204
-
Joerg Sonnenberger authored
simplify conditional. llvm-svn: 248199
-
- Sep 10, 2015
-
-
Jonathan Peyton authored
Some of this is improvement to code suggested by Hal Finkel. Four changes here:
1. Cleanup of hierarchy code to handle all hierarchy cases whether affinity is available or not
2. Separated this and other classes and common functions out to a header file
3. Added a destructor-like fini function for the hierarchy (and call in __kmp_cleanup)
4. Remove some redundant code that is hopefully no longer needed

Differential Revision: http://reviews.llvm.org/D12449 llvm-svn: 247326
-
Jonathan Peyton authored
The fix is to make the b_arrived flag 64-bit in both structures, kmp_balign_team_t and kmp_balign_t. Otherwise, when the flag in kmp_balign_team_t wraps past UINT_MAX, the library hangs. Differential Revision: http://reviews.llvm.org/D12563 llvm-svn: 247320
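The wraparound can be illustrated with a small sketch. The names and the bump step are hypothetical, and the exact mechanism of the hang is our assumption: a waiter holding a 64-bit target never sees a wrapped 32-bit counter reach it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bump step; the real barrier code uses its own constants. */
#define DEMO_BARRIER_BUMP 4u

/* Advance a 32-bit arrival counter: wraps to a small value past UINT32_MAX. */
uint32_t demo_bump32(uint32_t arrived) {
    return arrived + DEMO_BARRIER_BUMP;
}

/* Advance a 64-bit arrival counter: does not wrap at realistic barrier counts. */
uint64_t demo_bump64(uint64_t arrived) {
    assert(arrived <= UINT64_MAX - DEMO_BARRIER_BUMP);
    return arrived + DEMO_BARRIER_BUMP;
}

/* The waiter's check: has the counter reached the expected value yet? */
int demo_reached(uint64_t arrived, uint64_t target) {
    return arrived >= target;
}
```

Starting both counters just below UINT32_MAX, one bump takes the 64-bit counter to the target while the 32-bit counter wraps to a small number and the check stays false.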
-
- Aug 31, 2015
-
-
Jonathan Peyton authored
Conditionally include the fork_context parameter to __kmp_join_call() only if OMPT_SUPPORT=1 Differential Revision: http://reviews.llvm.org/D12495 llvm-svn: 246460
-
- Aug 28, 2015
-
-
Jonathan Peyton authored
Currently, the libomp CMake build system uses a Perl script to configure files (tools/expand-vars.pl). This patch replaces the use of the Perl script by using CMake's configure_file() function. The major changes include:
1. *.var has every $KMP_* variable changed to @LIBOMP_*@
2. kmp_config.h.cmake is a new file which contains all the feature macros and #cmakedefine lines
3. Most of the -D lines have been moved from LibompDefinitions.cmake, but some OS-specific macros (e.g., _GNU_SOURCE) remain.
4. All expand-vars.pl related logic is removed from the CMake files.

One important note about this change is that it breaks the old Perl+Makefile build system because it can't create kmp_config.h properly.

Differential Review: http://reviews.llvm.org/D12211 llvm-svn: 246314
-
- Aug 13, 2015
-
-
Jonathan Peyton authored
This macro and the small amount of code along with it are unused and can be removed. The macro is never defined in any build script or source file. llvm-svn: 244899
-
- Jul 21, 2015
-
-
Jonathan Peyton authored
This patch makes it possible for a performance tool that uses call stack unwinding to map implementation-level call stacks from master and worker threads into a unified global view. There are several components to this patch.

include/*/ompt.h.var
* Add a new enumeration type that indicates whether the code for a master task for a parallel region is invoked by the user program or the runtime system.
* Change the signature for OMPT parallel begin/end callbacks to indicate whether the master task will be invoked by the program or the runtime system. This enables a performance tool using call stack unwinding to handle these two cases differently. For this case, a profiler that uses call stack unwinding needs to know that the call path prefix for the master task may differ from those available within the begin/end callbacks if the program invokes the master.

kmp.h
* Change the signature for __kmp_join_call to take an additional parameter indicating the fork_context type. This is needed to supply the OMPT parallel end callback with information about whether the compiler or the runtime invoked the master task for a parallel region.

kmp_csupport.c
* Ensure that the OMPT task frame field reenter_runtime_frame is properly set and cleared before and after calls to fork and join threads for a parallel region.
* Adjust the code for the new signature for __kmp_join_call.
* Adjust the OMPT parallel begin callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.

kmp_gsupport.c
* Apply all of the analogous changes described for kmp_csupport.c for the GOMP interface.
* Add OMPT support for the GOMP combined parallel region + loop API to maintain the OMPT task frame field reenter_runtime_frame.

kmp_runtime.c
* Use the new information passed by __kmp_join_call to adjust the OMPT parallel end callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.

ompt_internal.h
* Use the flavor of the parallel region API (GNU or Intel) to determine who invokes the master task.

Differential Revision: http://reviews.llvm.org/D11259 llvm-svn: 242817
-
- Jul 09, 2015
-
-
Jonathan Peyton authored
These changes enable external debuggers to conveniently interface with the LLVM OpenMP Library. Structures are added which describe the important internal structures of the OpenMP Library e.g., teams, threads, etc. This feature is turned on by default (CMake variable LIBOMP_USE_DEBUGGER) and can be turned off with -DLIBOMP_USE_DEBUGGER=off. Differential Revision: http://reviews.llvm.org/D10038 llvm-svn: 241832
-
- Jun 04, 2015
-
-
Jonathan Peyton authored
This change makes kmp_bstate.old_tid a signed integer instead of an unsigned integer. It also defines two new macros, KMP_NSEC_PER_SEC and KMP_USEC_PER_SEC, which let us take control of the sign (we want them to be longs). Also, in kmp_wait_release.h, the byteref() function's return type is changed from char to unsigned char. llvm-svn: 239057
-
- Jun 03, 2015
-
-
Jonathan Peyton authored
Some old references to RML and IOMP which aren't used anywhere are deleted. http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000664.html Patch by Jack Howarth and Jonathan Peyton llvm-svn: 238878
-
- Jun 02, 2015
-
-
Jonathan Peyton authored
Getting rid of more iomp references. http://lists.cs.uiuc.edu/pipermail/openmp-dev/2015-June/000659.html llvm-svn: 238847
-
- May 07, 2015
-
-
Andrey Churbanov authored
llvm-svn: 236753
-
- Apr 29, 2015
-
-
Andrey Churbanov authored
These are the actual changes in the runtime to issue OMPT-related functions. All of them are surrounded by #if OMPT_SUPPORT and can be disabled (which is the default). llvm-svn: 236122
-
Andrey Churbanov authored
llvm-svn: 236117
-
- Mar 10, 2015
-
-
Andrey Churbanov authored
llvm-svn: 231778
-
Andrey Churbanov authored
llvm-svn: 231776
-
Andrey Churbanov authored
llvm-svn: 231775
-
Andrey Churbanov authored
llvm-svn: 231773
-
- Feb 20, 2015
-
-
Andrey Churbanov authored
llvm-svn: 230033
-
Andrey Churbanov authored
llvm-svn: 230032
-
Andrey Churbanov authored
llvm-svn: 230029
-
- Feb 10, 2015
-
-
Andrey Churbanov authored
llvm-svn: 228718
-
- Jan 29, 2015
-
-
Andrey Churbanov authored
llvm-svn: 227467
-
- Jan 27, 2015
-
-
Andrey Churbanov authored
llvm-svn: 227207
-
Andrey Churbanov authored
Removes some unused variables (__kmp_ht_*) and changes __kmp_ncores and __kmp_nThreadsPerCore to static globals within kmp_affinity.cpp. llvm-svn: 227201
-
Andrey Churbanov authored
Replaces KMP_OS_WINDOWS && KMP_ARCH_X86_64 or any combination of those two options with the feature macro KMP_GROUP_AFFINITY. llvm-svn: 227199
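The pattern behind this change can be sketched as follows. The DEMO_ macros are hypothetical stand-ins for KMP_OS_WINDOWS, KMP_ARCH_X86_64 and KMP_GROUP_AFFINITY, and the returned values are placeholders; only the shape (one feature macro derived once from platform macros, then tested everywhere else) reflects the commit message.

```c
#include <assert.h>

/* Hypothetical platform macros standing in for KMP_OS_WINDOWS
   and KMP_ARCH_X86_64. */
#if defined(_WIN32)
#define DEMO_OS_WINDOWS 1
#else
#define DEMO_OS_WINDOWS 0
#endif

#if defined(__x86_64__) || defined(_M_X64)
#define DEMO_ARCH_X86_64 1
#else
#define DEMO_ARCH_X86_64 0
#endif

/* One feature macro, derived in a single place from the platform macros. */
#define DEMO_GROUP_AFFINITY (DEMO_OS_WINDOWS && DEMO_ARCH_X86_64)

/* Callers ask "does this platform have the feature?" rather than
   "which platform is this?". */
int demo_max_proc_groups(void) {
    int groups;
#if DEMO_GROUP_AFFINITY
    groups = 4; /* placeholder value for the sketch */
#else
    groups = 1; /* no processor groups: everything is in one group */
#endif
    assert(groups >= 1);
    return groups;
}
```

If the feature later becomes available on another platform, only the definition of the feature macro has to change.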
-
- Jan 13, 2015
-
-
Andrey Churbanov authored
This patch enables the use of KMP_AFFINITY=balanced on non-MIC architectures. The restriction on non-MIC architectures is that balanced affinity only works on single-package machines. llvm-svn: 225794
-
Andrey Churbanov authored
llvm-svn: 225792
-
- Oct 07, 2014
-
-
Jim Cownie authored
understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future.

Apologies over, what do we have here?

GCC 4.9 compatibility
---------------------
* We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used.)

--- new parallel entry points ---
New entry points that aren't OpenMP 4.0 related. These are implemented fully:
    GOMP_parallel_loop_dynamic()
    GOMP_parallel_loop_guided()
    GOMP_parallel_loop_runtime()
    GOMP_parallel_loop_static()
    GOMP_parallel_sections()
    GOMP_parallel()

--- cancellation entry points ---
Currently, these only give a runtime error if OMP_CANCELLATION is true, because our plain barriers don't check for cancellation while waiting:
    GOMP_barrier_cancel()
    GOMP_cancel()
    GOMP_cancellation_point()
    GOMP_loop_end_cancel()
    GOMP_sections_end_cancel()

--- taskgroup entry points ---
These are implemented fully:
    GOMP_taskgroup_start()
    GOMP_taskgroup_end()

--- target entry points ---
These are empty (as they are in libgomp):
    GOMP_target()
    GOMP_target_data()
    GOMP_target_end_data()
    GOMP_target_update()
    GOMP_teams()

Improvements in Barriers and Fork/Join
--------------------------------------
* Barrier and fork/join code is now in its own file (which makes it easier to understand and modify).
* Wait/release code is now templated and in its own file; suspend/resume code is also templated.
* There's a new, hierarchical barrier, which exploits the cache hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance.

***BEWARE*** the new source files have *not* been added to the legacy CMake build system. If you want to use that, fixes will be required.

Statistics Collection Code
--------------------------
* New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, but the lightweight timing code uses the X86 rdtsc instruction, so it will require changes for other architectures. The intent of this code is not for users to tune their codes but rather
  1) For timing code-paths inside the runtime
  2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used.

Nested Hot Teams
----------------
* The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL environment variable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all).

Improved Intel(r) VTune(TM) Amplifier support
---------------------------------------------
* The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP-specific analyses of load imbalance.

Support for OpenMP Composite Statements
---------------------------------------
* Implement new entrypoints required by some of the OpenMP 4.1 composite statements.

Improved ifdefs
---------------
* More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting.

ScaleMP* contribution
---------------------
Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level.

Redesign of wait and release code
---------------------------------
The code is simplified and performance improved.

Bug Fixes
---------
* Fixes for Windows multiple processor groups.
* Fix Fortran module build on Linux: offload attribute added.
* Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
* Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.

llvm-svn: 219214
-
- Aug 07, 2014
-
-
Jim Cownie authored
llvm-svn: 215093
-
- Mar 02, 2014
-
-
Alp Toker authored
The feature was previously guarded with KMP_OS_LINUX || KMP_OS_WINDOWS but can now be enabled/disabled independently to simplify porting. Completes the work started in r202478. llvm-svn: 202613
-
- Feb 28, 2014
-
-
Alp Toker authored
Port the OpenMP runtime to FreeBSD along with associated build system changes. Also begin to generalize affinity capabilities so they aren't tied explicitly to Windows and Linux. The port builds with stock clang and gmake and has no additional runtime dependencies. All but a handful of the validation suite tests are now passing on FreeBSD 10 x86_64. llvm-svn: 202478
-
- Feb 24, 2014
-
-
Alp Toker authored
llvm-svn: 202018
-
- Dec 23, 2013
-
-
Jim Cownie authored
This release aligns with Intel(r) Composer XE 2013 SP1 Product Update 2

New features
* The library can now be built with clang (though with some limitations since clang does not support 128 bit floats)
* Support for VTune analysis of load imbalance
* Code contribution from Steven Noonan to build the runtime for ARM* architecture processors
* First implementation of runtime API for OpenMP cancellation

Bug Fixes
* Fixed hang on Windows (only) when using KMP_BLOCKTIME=0

llvm-svn: 197914
-