  1. Mar 07, 2022
    • [clang][modules] Report module maps affecting `no_undeclared_includes` modules · b45888e9
      Jan Svoboda authored
      Since D106876, PCM files don't report module maps as input files unless they contributed to the compilation.
      
      Reporting only module maps of (transitively) imported modules is not enough, though. For modules marked with `[no_undeclared_includes]`, other module maps affect the compilation by introducing anti-dependencies.
      
      This patch makes sure such module maps are being reported as input files.
      
      Depends on D120463.
      
      Reviewed By: dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D120464
    • [clang][modules] NFC: Simplify and clarify test · 242b24c1
      Jan Svoboda authored
      This patch simplifies a test that checks that only used module map files are reported as input files in PCM files.
      
      Instead of using an opaque `diff`, this patch uses `clang -module-file-info` and `FileCheck` to verify this.
      
      Reviewed By: dexonsmith
      
      Differential Revision: https://reviews.llvm.org/D120463
    • [PowerPC] Add generic fnmsub intrinsic · b2497e54
      Qiu Chaofan authored
      Currently in Clang, we have two types of builtins for the fnmsub operation:
      one for float/double vectors, which are transformed into IR operations, and
      one for float/double scalars, which generate the corresponding intrinsics.
      
      But for the vector version of the builtin, the 3-op chain may be recognized
      as expensive by some passes (like early CSE). We need some way to keep
      the fnmsub form until code generation.
      
      This patch introduces the ppc.fnmsub.* intrinsics to unify the four fnmsub
      intrinsics.
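      
      As a rough illustration of the "3 op chain" mentioned above, the sketch below is a scalar reference model of the operation written as negate, fused multiply-add, negate. The name `fnmsub_ref` and the operand order are assumptions made for this example; it is not the builtin or the new intrinsic.
      
      ```cpp
      #include <cmath>
      
      // Hypothetical scalar reference model, not the Clang builtin itself.
      // Computes -(a * b - c) as the three-operation chain
      // fneg(fma(a, b, fneg(c))).
      double fnmsub_ref(double a, double b, double c) {
        return -std::fma(a, b, -c);
      }
      ```
      
      For example, `fnmsub_ref(2.0, 3.0, 10.0)` evaluates -(2 * 3 - 10).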
      
      Reviewed By: shchenz
      
      Differential Revision: https://reviews.llvm.org/D116015
    • [OpenMPIRBuilder] Allocate temporary at the correct block in a nested parallel · 87ec6f41
      William S. Moses authored
      The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested OpenMP parallel regions (written in MLIR for ease):
      
      ```
      omp.parallel {
        %a = ...
        omp.parallel {
          use(%a)
        }
      }
      ```
      
      As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this
      allocation to the inner parallel. For example, we would want to get something like the following:
      
      ```
      omp.parallel {
        %a = ...
        %tmp = alloc
        store %tmp[] = %a
        kmpc_fork(outlined, %tmp)
      }
      ```
      
      However, this is not what currently happens for nested parallel regions. In the OpenMPIRBuilder specifically,
      the entirety of the function (at the LLVM level) is currently inlined, with blocks marking the start and end of each
      region:
      
      ```
      entry:
        ...
      
      parallel1:
        %a = ...
        ...
      
      parallel2:
        use(%a)
        ...
      
      endparallel2:
        ...
      
      endparallel1:
        ...
      ```
      
      When the allocation is inserted, it is presently inserted into the parent of the entire function (e.g. entry) rather than the parent
      allocation scope of the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1.
      
      This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example.
      
      This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues.
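      
      For reference, a source-level shape that leads to this situation might look like the sketch below: a value defined in an outer parallel region is used in a nested one. The function name and the thread counts are assumptions for illustration; the code is not taken from the linked issue. Without OpenMP enabled the pragmas are ignored and the function still runs serially.
      
      ```cpp
      // Hypothetical example: returns true if every thread of the inner region
      // observed the value defined in the enclosing outer region.
      bool inner_sees_outer_value(int n) {
        int mismatches = 0;
      #pragma omp parallel num_threads(2)
        {
          int a = n * 2;  // defined in the outer region, one copy per outer thread
      #pragma omp parallel num_threads(2)
          {
            // 'a' is used inside the nested region, so it has to be passed to
            // the outlined inner function through a temporary.
            if (a != n * 2) {
      #pragma omp atomic
              ++mismatches;
            }
          }
        }
        return mismatches == 0;
      }
      ```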
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D121061
  2. Mar 05, 2022
  3. Mar 04, 2022
  4. Mar 03, 2022
  5. Mar 02, 2022
  6. Mar 01, 2022
    • [HWASAN] erase lifetime intrinsics if tag is outside. · 1d730d80
      Florian Mayer authored
      Reviewed By: eugenis
      
      Differential Revision: https://reviews.llvm.org/D120437
    • [NVPTX] Add ex2.approx.f16/f16x2 support · 510fd283
      Nicolas Miller authored
      NOTE: this is a follow-up commit with the missing clang-side changes.
      
      This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2
      instruction.
      
      These two variants were added in PTX 7.0, and are supported by sm_75 and above.
      
      Note that this isn't wired to the exp2 LLVM intrinsic because the ex2
      instruction is only available in its approx variant.
      
      Running ptxas on the assembly generated by the test f16-ex2.ll works as
      expected.
      
      Differential Revision: https://reviews.llvm.org/D119157
    • [NVPTX] Add more FMA intrinsics/builtins · a8951823
      Jakub Chlanda authored
      NOTE: this is a follow-up commit with the missing clang-side changes.
      
      This patch adds builtins/intrinsics for the following variants of FMA:
      
      - f16, f16x2
        - rn
        - rn_ftz
        - rn_sat
        - rn_ftz_sat
        - rn_relu
        - rn_ftz_relu
      - bf16, bf16x2
        - rn
        - rn_relu
      
      ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly.
      
      Differential Revision: https://reviews.llvm.org/D118977
    • [NVPTX] Expose float tys min, max, abs, neg as builtins · 7a6d692b
      Jakub Chlanda authored
      Adds support for the following builtins:
      
      abs, neg:
      - .bf16
      - .bf16x2
      min, max:
      - {.ftz}{.NaN}{.xorsign.abs}.f16
      - {.ftz}{.NaN}{.xorsign.abs}.f16x2
      - {.NaN}{.xorsign.abs}.bf16
      - {.NaN}{.xorsign.abs}.bf16x2
      - {.ftz}{.NaN}{.xorsign.abs}.f32
      
      Differential Revision: https://reviews.llvm.org/D117887
    • [SanitizerBounds] Add support for NoSanitizeBounds function · 17ce89fa
      Tong Zhang authored
      Currently, adding the attribute no_sanitize("bounds") does not disable
      -fsanitize=local-bounds (which is also enabled by -fsanitize=bounds). The Clang
      frontend handles -fsanitize=array-bounds, which can already be disabled by
      no_sanitize("bounds"). However, instrumentation added by the
      BoundsChecking pass in the middle-end cannot be disabled by the
      attribute.
      
      The fix is very similar to D102772, which added the ability to selectively
      disable a sanitizer pass on certain functions.
      
      In this patch, if no_sanitize("bounds") is provided, an additional
      function attribute (NoSanitizeBounds) is attached to the IR to let the
      BoundsChecking pass know we want to disable local-bounds checking. To
      support this feature, the IR is extended (similar to D102772) so that
      Clang can preserve the information and let the BoundsChecking pass know
      that bounds checking is disabled for certain functions.
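      
      A minimal sketch of how the attribute is applied at the source level follows. The macro `NO_SANITIZE_BOUNDS` and the function `sum_first` are hypothetical names introduced for this example; only the attribute spelling no_sanitize("bounds") comes from the description above.
      
      ```cpp
      #include <cstddef>
      
      // Hypothetical helper macro: expands to the attribute on compilers that
      // understand the GNU attribute syntax, and to nothing elsewhere.
      #if defined(__clang__) || defined(__GNUC__)
      #define NO_SANITIZE_BOUNDS __attribute__((no_sanitize("bounds")))
      #else
      #define NO_SANITIZE_BOUNDS
      #endif
      
      // With the attribute, bounds instrumentation is requested to be off for
      // this one function, while the rest of the translation unit is unaffected.
      NO_SANITIZE_BOUNDS int sum_first(const int *data, std::size_t count) {
        int total = 0;
        for (std::size_t i = 0; i < count; ++i)
          total += data[i];
        return total;
      }
      ```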
      
      Reviewed By: melver
      
      Differential Revision: https://reviews.llvm.org/D119816
    • [Clang] Add `-funstable` flag to enable unstable and experimental features · 3cdc1c15
      Egor Zhdan authored
      This new flag enables `__has_feature(cxx_unstable)`, which would replace libc++ macros for individual unstable/experimental features, e.g. `_LIBCPP_HAS_NO_INCOMPLETE_RANGES` or `_LIBCPP_HAS_NO_INCOMPLETE_FORMAT`.
      
      This would make it easier and more convenient to opt in to all libc++ unstable features at once.
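      
      A sketch of how code might test for the feature is shown below. The names `kUnstableFeaturesEnabled` and `library_mode` are hypothetical; only the `__has_feature(cxx_unstable)` check is taken from the description above, and the fallback macro is an assumption for compilers that lack `__has_feature`.
      
      ```cpp
      // Fallback so the check also compiles where __has_feature is unavailable.
      #ifndef __has_feature
      #define __has_feature(x) 0
      #endif
      
      #if __has_feature(cxx_unstable)
      constexpr bool kUnstableFeaturesEnabled = true;
      #else
      constexpr bool kUnstableFeaturesEnabled = false;
      #endif
      
      // Hypothetical helper that reports which mode the code was built in.
      const char *library_mode() {
        return kUnstableFeaturesEnabled ? "unstable" : "stable";
      }
      ```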
      
      Differential Revision: https://reviews.llvm.org/D120160
    • [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32) · 57aaab3b
      Kristina Bessonova authored
      The NVVM IR specification defines them with an i32 return type:
      
        declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
        declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
        ...
        The i32 return value is a 32-bit mask where bit position in mask corresponds
        to thread’s laneid.
      
      as does the PTX ISA:
      
        9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
      
        match.any.sync.type  d, a, membermask;
        match.all.sync.type  d[|p], a, membermask;
        ...
        Destination d is a 32-bit mask where bit position in mask corresponds
        to thread’s laneid.
      
      Additionally, without this patch ptxas doesn't accept the instructions produced by the NVPTX backend.
      After this patch, the output compiles with no issues.
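      
      To illustrate why the result is 32 bits wide even for 64-bit operands, here is a host-side model of the match.any behaviour quoted above for a single 32-lane warp. It is a plain C++ sketch, not device code; the function name `match_any_model` and the array representation of the warp are assumptions for this example.
      
      ```cpp
      #include <array>
      #include <cstdint>
      
      // Hypothetical host-side model: for the given lane, returns a 32-bit mask
      // with one bit per participating lane whose 64-bit value equals this lane's.
      std::uint32_t match_any_model(const std::array<std::uint64_t, 32> &values,
                                    std::uint32_t membermask, unsigned lane) {
        std::uint32_t result = 0;
        for (unsigned other = 0; other < 32; ++other) {
          bool participating = ((membermask >> other) & 1u) != 0;
          if (participating && values[other] == values[lane])
            result |= (std::uint32_t{1} << other);
        }
        return result;
      }
      ```
      
      The mask has one bit per lane, so its width depends on the warp size rather than on the width of the compared value.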
      
      Reviewed By: tra
      
      Differential Revision: https://reviews.llvm.org/D120499
    • [C++20][Modules][8/8] Amend module visibility rules for partitions. · a29f8dbb
      Iain Sandoe authored
      Implementation partitions bring two extra cases where we have
      visibility of module-private data.
      
      1) When we import a module implementation partition.
      2) When a partition implementation imports the primary module interface.
      
      We maintain a record of direct imports into the current module since
      partition decls from direct imports (but not transitive ones) are visible.
      
      The rules on decl-reachability are much more relaxed (with the standard
      giving permission for an implementation to load dependent modules and for
      the decls there to be reachable, but not visible).
      
      Differential Revision: https://reviews.llvm.org/D118599
    • [clang][analyzer] Add modeling of 'errno'. · d8a2afb2
      Balázs Kéri authored
      Add a checker to maintain the system-defined value 'errno'.
      The value is supposed to be set in the future by existing or
      new checkers that evaluate errno-modifying function calls.
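      
      As an illustration of the kind of errno-modifying call such checkers would need to evaluate, the sketch below uses the standard `strtol` function, which sets `errno` to `ERANGE` on overflow. The helper name `overflows_long` is hypothetical and the example is not part of the checker.
      
      ```cpp
      #include <cerrno>
      #include <cstdlib>
      
      // Hypothetical helper: returns true if strtol reported an out-of-range
      // value for the given decimal text through errno.
      bool overflows_long(const char *text) {
        errno = 0;  // clear any stale value before the call
        (void)std::strtol(text, nullptr, 10);
        return errno == ERANGE;
      }
      ```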
      
      Reviewed By: NoQ, steakhal
      
      Differential Revision: https://reviews.llvm.org/D120310
    • [Clang] Remove redundant init-parens in AST print · d1a59eef
      Zhihao Yuan authored
      Given a dependent `T` (maybe an undeduced `auto`),
      
      Before:
      
          new T(z)  -->  new T((z))  # changes meaning with more args
          new T{z}  -->  new T{z}
              T(z)  -->      T(z)
              T{z}  -->      T({z})  # forbidden if T is auto
      
      After:
      
          new T(z)  -->  new T(z)
          new T{z}  -->  new T{z}
              T(z)  -->      T(z)
              T{z}  -->      T{z}
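      
      The "changes meaning with more args" remark can be seen in a small non-dependent example: with two arguments, the extra parentheses turn the argument list into a single comma expression. The type `Probe` and both function names are hypothetical and exist only for this sketch.
      
      ```cpp
      // Hypothetical type that records which constructor was selected.
      struct Probe {
        int arity;
        explicit Probe(int) : arity(1) {}
        Probe(int, int) : arity(2) {}
      };
      
      // new Probe(a, b): two arguments, so the two-parameter constructor is used.
      int arity_plain(int a, int b) {
        Probe *p = new Probe(a, b);
        int n = p->arity;
        delete p;
        return n;
      }
      
      // new Probe((a, b)): the inner parentheses form one comma expression,
      // so only a single argument is passed.
      int arity_extra_parens(int a, int b) {
        Probe *p = new Probe((a, b));
        int n = p->arity;
        delete p;
        return n;
      }
      ```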
      
      Depends on D113393
      
      Reviewed By: aaron.ballman
      
      Differential Revision: https://reviews.llvm.org/D120608
    • [c++2b] Implement P0849R8 auto(x) · 136b2931
      Zhihao Yuan authored
      https://wg21.link/p0849
      
      Reviewed By: aaron.ballman, erichkeane
      
      Differential Revision: https://reviews.llvm.org/D113393
    • [OpenMPIRBuilder] Implement static-chunked workshare-loop schedules. · a66f7769
      Michael Kruse authored
      Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
      
      This patch includes the following related changes:
       * Adapt applyWorkshareLoop to triage between the schedule types, which is now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
       * Remove the chunk parameter from applyStaticWorkshareLoop, since it is ignored by the runtime. Change the value passed to the init function to 0, as without OpenMPIRBuilder.
       * Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar, as they are used by both applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
       * Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
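      
      The two nested loops described above can be sketched in plain C++ as follows. This is an illustrative model of a static schedule with an explicit chunk size, assuming chunks are handed out to threads round-robin; the function `iterations_for_thread` and its parameters are hypothetical and do not reflect the OpenMPIRBuilder API or the runtime calls it emits.
      
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <vector>
      
      // Hypothetical model: lists the logical iterations one thread executes
      // under a static schedule with the given chunk size.
      std::vector<long> iterations_for_thread(long trip_count, long chunk_size,
                                              long num_threads, long thread_id) {
        assert(chunk_size > 0 && num_threads > 0);
        std::vector<long> result;
        // Outer loop: every chunk assigned to this thread.
        for (long chunk_start = thread_id * chunk_size; chunk_start < trip_count;
             chunk_start += num_threads * chunk_size) {
          long chunk_end = std::min(chunk_start + chunk_size, trip_count);
          // Inner loop: the iterations inside one chunk.
          for (long i = chunk_start; i < chunk_end; ++i)
            result.push_back(i);
        }
        return result;
      }
      ```
      
      With 10 iterations, a chunk size of 2 and 2 threads, thread 0 gets iterations 0, 1, 4, 5, 8, 9 and thread 1 gets 2, 3, 6, 7.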
      
      Differential Revision: https://reviews.llvm.org/D114413
  7. Feb 28, 2022