Skip to content
  1. Jun 14, 2016
    • Jonathan Peyton's avatar
      Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be
      Jonathan Peyton authored
      OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
      45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
      41 is deprecated and to use 45 instead.
      
      llvm-svn: 272687
      df6818be
  2. May 31, 2016
  3. May 20, 2016
  4. May 05, 2016
    • Jonathan Peyton's avatar
      [STATS] Use partitioned timer scheme · 11dc82fa
      Jonathan Peyton authored
      This change removes the current timers with ones that partition time properly.
      The current timers are nested, so that if a new timer, B, starts when the
      current timer, A, is already timing, A's time will include B's. To eliminate
      this problem, the partitioned timers are designed to stop the current timer (A),
      let the new timer run (B), and when the new timer is finished, restart the
      previously running timer (A). With this partitioning of time, a threads' timers
      all sum up to the OMP_worker_thread_life time and can now easily show the
      percentage of time a thread is spending in different parts of the runtime or
      user code.
      
      There is also a new state variable associated with each thread which tells where
      it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
      time is spent in OMP_task_taskwait, then that thread executed tasks inside a
      #pragma omp taskwait construct.
      
      The changes are mostly changing the MACROs to use the new PARITIONED_* macros,
      the new partitionedTimers class and its methods, and new state logic.
      
      Differential Revision: http://reviews.llvm.org/D19229
      
      llvm-svn: 268640
      11dc82fa
  5. Apr 18, 2016
    • Jonathan Peyton's avatar
      Fix trip count calculation for parallel loops in runtime · 5235a1b6
      Jonathan Peyton authored
      The trip count calculation was incorrect for loops with large bounds. For example,
      for(int i=-2,000,000,000; i < 2,000,000,000; i+=50000000), the trip count
      calculation had overflow (trying to calculate 2,000,000,000 + 2,000,000,000 with
      signed integers) and wasn't giving the right value. This patch fixes this error
      in the runtime by using unsigned integers instead. There is still a bug in the
      clang compiler component because it warns that there is overflow in the
      test case file when there isn't. This error isn't there for the Intel Compiler.
      So for now, the test case is designated as XFAIL.
      
      Differential Revision: http://reviews.llvm.org/D19078
      
      llvm-svn: 266677
      5235a1b6
  6. Feb 25, 2016
    • Jonathan Peyton's avatar
      dd new OpenMP 4.5 schedule clause modifiers (monotonic/non-monotonic) feature · ea0fe1df
      Jonathan Peyton authored
      The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
      setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
      Macros are added to probe if monotonic or non-monotonic is specified
      (SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
      and also to to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS)
      
      Currently, nothing is done with the modifiers.
      
      Also, this patch adds some comments on the use of the enumerations in at least
       one place where it is subtle.
      
      Differential Revision: http://reviews.llvm.org/D17406
      
      llvm-svn: 261906
      ea0fe1df
  7. Oct 09, 2015
  8. Sep 21, 2015
  9. Aug 11, 2015
    • Jonathan Peyton's avatar
      Tidy statistics collection · 45be4500
      Jonathan Peyton authored
      This removes some statistics counters and timers which were not used,
      adds new counters and timers for some language features that were not
      monitored previously and separates the counters and timers into those
      which are of interest for investigating user code and those which are
      only of interest to the developer of the runtime itself.
      The runtime developer statistics are now ony collected if the
      additional #define KMP_DEVELOPER_STATS is set.
      
      Additional user statistics which are now collected include:
      * Count of nested parallelism (omp parallel inside a parallel region)
      * Count of omp distribute occurrences
      * Count of omp teams occurrences
      * Counts of task related statistics (taskyield, task execution, task
        cancellation, task steal)
      * Values passed to omp_set_numtheads
      * Time spent in omp single and omp master
      
      None of this affects code compiled without stats gathering enabled,
      which is the normal library build mode.
      
      This also fixes the CMake build by linking to the standard c++ library
      when building the stats library as it is a requirement.  The normal library
      does not have this requirement and its link phase is left alone.
      
      Differential Revision: http://reviews.llvm.org/D11759
      
      llvm-svn: 244677
      45be4500
  10. May 23, 2015
  11. May 06, 2015
  12. Apr 29, 2015
  13. Jan 27, 2015
  14. Oct 07, 2014
    • Jim Cownie's avatar
      I apologise in advance for the size of this check-in. At Intel we do · 4cc4bb4c
      Jim Cownie authored
      understand that this is not friendly, and are working to change our
      internal code-development to make it easier to make development
      features available more frequently and in finer (more functional)
      chunks. Unfortunately we haven't got that in place yet, and unpicking
      this into multiple separate check-ins would be non-trivial, so please
      bear with me on this one. We should be better in the future.
      
      Apologies over, what do we have here?
      
      GGC 4.9 compatibility
      --------------------
      * We have implemented the new entrypoints used by code compiled by GCC
      4.9 to implement the same functionality in gcc 4.8. Therefore code
      compiled with gcc 4.9 that used to work will continue to do so.
      However, there are some other new entrypoints (associated with task
      cancellation) which are not implemented. Therefore user code compiled
      by gcc 4.9 that uses these new features will not link against the LLVM
      runtime. (It remains unclear how to handle those entrypoints, since
      the GCC interface has potentially unpleasant performance implications
      for join barriers even when cancellation is not used)
      
      --- new parallel entry points ---
      new entry points that aren't OpenMP 4.0 related
      These are implemented fully :-
            GOMP_parallel_loop_dynamic()
            GOMP_parallel_loop_guided()
            GOMP_parallel_loop_runtime()
            GOMP_parallel_loop_static()
            GOMP_parallel_sections()
            GOMP_parallel()
      
      --- cancellation entry points ---
      Currently, these only give a runtime error if OMP_CANCELLATION is true
      because our plain barriers don't check for cancellation while waiting
              GOMP_barrier_cancel()
              GOMP_cancel()
              GOMP_cancellation_point()
              GOMP_loop_end_cancel()
              GOMP_sections_end_cancel()
      
      --- taskgroup entry points ---
      These are implemented fully.
            GOMP_taskgroup_start()
            GOMP_taskgroup_end()
      
      --- target entry points ---
      These are empty (as they are in libgomp)
           GOMP_target()
           GOMP_target_data()
           GOMP_target_end_data()
           GOMP_target_update()
           GOMP_teams()
      
      Improvements in Barriers and Fork/Join
      --------------------------------------
      * Barrier and fork/join code is now in its own file (which makes it
      easier to understand and modify).
      * Wait/release code is now templated and in its own file; suspend/resume code is also templated
      * There's a new, hierarchical, barrier, which exploits the
      cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve
      fork/join and barrier performance.
      
      ***BEWARE*** the new source files have *not* been added to the legacy
      Cmake build system. If you want to use that fixes wil be required.
      
      Statistics Collection Code
      --------------------------
      * New code has been added to collect application statistics (if this
      is enabled at library compile time; by default it is not). The
      statistics code itself is generally useful, the lightweight timing
      code uses the X86 rdtsc instruction, so will require changes for other
      architectures.
      The intent of this code is not for users to tune their codes but
      rather 
      1) For timing code-paths inside the runtime
      2) For gathering general properties of OpenMP codes to focus attention
      on which OpenMP features are most used. 
      
      Nested Hot Teams
      ----------------
      * The runtime now maintains more state to reduce the overhead of
      creating and destroying inner parallel teams. This improves the
      performance of code that repeatedly uses nested parallelism with the
      same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL
      envirable to a depth to enable this (and, of course, OMP_NESTED=true
      to enable nested parallelism at all).
      
      Improved Intel(r) VTune(Tm) Amplifier support
      ---------------------------------------------
      * The runtime provides additional information to Vtune via the
      itt_notify interface to allow it to display better OpenMP specific
      analyses of load-imbalance.
      
      Support for OpenMP Composite Statements
      ---------------------------------------
      * Implement new entrypoints required by some of the OpenMP 4.1
      composite statements.
      
      Improved ifdefs
      ---------------
      * More separation of concepts ("Does this platform do X?") from
      platforms ("Are we compiling for platform Y?"), which should simplify
      future porting.
      
      
      ScaleMP* contribution
      ---------------------
      Stack padding to improve the performance in their environment where
      cross-node coherency is managed at the page level.
      
      Redesign of wait and release code
      ---------------------------------
      The code is simplified and performance improved.
      
      Bug Fixes
      ---------
          *Fixes for Windows multiple processor groups.
          *Fix Fortran module build on Linux: offload attribute added.
          *Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
          *Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.
      
      llvm-svn: 219214
      4cc4bb4c
  15. Sep 27, 2013
Loading