  1. Mar 19, 2021
  2. Mar 18, 2021
    • [mlir][tosa] Add tosa.concat to subtensor inserts lowering · 5627564f
      Rob Suderman authored
      Includes lowering for tosa.concat, with index computation, to subtensor insert
      operations. Includes tests along two different indices.
      
      Differential Revision: https://reviews.llvm.org/D98813
      5627564f
    • Fix test case in b4a8c0eb · 80df56f7
      Yuanfang Chen authored
      80df56f7
    • [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters... · 182b831a
      Craig Topper authored
      [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
      
      Previously only all zeros BUILD_VECTOR was recognized.
      182b831a
    • [LTO][MC] Discard non-prevailing defined symbols in module-level assembly · b4a8c0eb
      Yuanfang Chen authored
      This is the alternative approach to D96931.
      
      In LTO, for each module with an inlineasm block, prepend the directive ".lto_discard <sym>, <sym>*" to the beginning of the inline
      asm. ".lto_discard" is both a module inlineasm block marker and (optionally) provides a list of symbols to be discarded.
      
      In MC, while emitting for inlineasm, discard symbol bindings and symbol
      definitions according to ".lto_discard".
      
      Reviewed By: MaskRay
      
      Differential Revision: https://reviews.llvm.org/D98762
      b4a8c0eb
    • [OpenMP] Fixed a crash in hidden helper thread · 2df65f87
      Shilei Tian authored
      It has been reported that after enabling hidden helper threads, the program
      can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root
      cause is as follows. Let's say the default `__kmp_threads_capacity` is
      `N`. If hidden helper threads are enabled, `__kmp_threads_capacity` is offset
      to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via
      a `num_threads` clause, we need to expand `__kmp_threads`. In
      `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and
      repeatedly doubles it until the new capacity meets the requirement. Let's
      assume the new requirement is `Y`. If `Y` happens to satisfy
      `(N+8)*2^X=Y`, where `X` is the number of iterations, the new capacity is not
      enough because 8 of its slots are reserved for hidden helper threads.
      
      Here is an example.
      ```
      #include <cstddef>
      #include <vector>
      
      int main(int argc, char *argv[]) {
        constexpr std::size_t N = 1344;
        std::vector<int> data(N);
      
      #pragma omp parallel for
        for (unsigned i = 0; i < N; ++i) {
          data[i] = i;
        }
      
      #pragma omp parallel for num_threads(N)
        for (unsigned i = 0; i < N; ++i) {
          data[i] += i;
        }
      
        return 0;
      }
      ```
      My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset,
      `__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion
      is hit.
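      
      The arithmetic behind this failure can be checked with a small model. This is
      only a sketch under stated assumptions, not the libomp source: the names
      `doubled_capacity`, `usable_slots`, `kHiddenHelpers`, `kInitial`, and
      `kRequested` are hypothetical, and the loop only mirrors the "double until
      the requirement is met" behaviour described above (it assumes a non-zero
      starting capacity and ignores overflow).
      
      ```cpp
      #include <cstddef>
      
      // Hypothetical model of the expansion: double `capacity` until it is at
      // least `required`. Assumes capacity > 0.
      constexpr std::size_t doubled_capacity(std::size_t capacity,
                                             std::size_t required) {
        while (capacity < required)
          capacity *= 2;
        return capacity;
      }
      
      // Slots left for regular threads once the hidden helper slots are set aside.
      constexpr std::size_t usable_slots(std::size_t capacity,
                                         std::size_t hidden_helpers) {
        return capacity - hidden_helpers;
      }
      
      constexpr std::size_t kHiddenHelpers = 8;               // default offset
      constexpr std::size_t kInitial = 160 + kHiddenHelpers;  // 168 on the 20C40T CPU
      constexpr std::size_t kRequested = 1344;                // num_threads(N)
      
      // 168 -> 336 -> 672 -> 1344: doubling stops exactly at the request...
      static_assert(doubled_capacity(kInitial, kRequested) == kRequested, "");
      // ...but 8 of those slots belong to hidden helpers, so 1336 < 1344.
      static_assert(usable_slots(doubled_capacity(kInitial, kRequested),
                                 kHiddenHelpers) < kRequested, "");
      ```
      
      Any request that is not exactly `(N+8)*2^X` overshoots during doubling; for
      example, a request of 1345 threads would grow the modelled capacity to 2688,
      which leaves room for the 8 hidden helper slots.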
      
      Reviewed By: protze.joachim
      
      Differential Revision: https://reviews.llvm.org/D98838
      2df65f87
    • [SelectionDAG] Don't pass a scalable vector to MachinePointerInfo::getWithOffset in a unit test. · 305a0bad
      Craig Topper authored
      Suppresses an implicit TypeSize to uint64_t conversion warning.
      
      We might be able to just not offset it since we're writing to a
      fixed stack object, but I wasn't sure, so I just did what
      DAGTypeLegalizer::IncrementPointer does.
      
      Reviewed By: sdesmalen
      
      Differential Revision: https://reviews.llvm.org/D98736
      305a0bad
    • [lli] Add Orc greedy mode as -jit-kind=orc · e1579894
      Stefan Gränitz authored
      In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In greedy mode, which this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and is closer to the way MCJIT works.
      e1579894
    • [mlir] Fix build failure due to 1a572f45 · 44f24f39
      thomasraoux authored
      44f24f39
    • [AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns · edd6da10
      Stanislav Mekhanoshin authored
      These are always selected as 0 anyway.
      
      Differential Revision: https://reviews.llvm.org/D98663
      edd6da10