Commit 43a95a54 authored Mar 23, 2020 by Uday Bondhugula
[MLIR] Introduce full/partial tile separation using if/else



This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently: for eg. for
unroll-and-jam, register tiling, vectorization without leading to
cleanup code, or to offload to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).

INPUT

  affine.for %i0 = 0 to %M {
    affine.for %i1 = 0 to %N {
      "foo"() : () -> ()
    }
  }

OUTPUT AFTER TILING W/O SEPARATION

  map0 = affine_map<(d0) -> (d0)>
  map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
        affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
          "foo"() : () -> ()
        }
      }
    }
  }

  OUTPUT AFTER TILING WITH SEPARATION

  map0 = affine_map<(d0) -> (d0)>
  map1 = affine_map<(d0) -> (d0 + 32)>
  map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  #set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.if #set0(%arg2, %arg3)[%M, %N] {
        // Full tile.
        affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
          affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
            "foo"() : () -> ()
          }
        }
      } else {
        // Partial tile.
        affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
          affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
            "foo"() : () -> ()
          }
        }
      }
    }
  }

The separation is tested via a cmd line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Differential Revision: https://reviews.llvm.org/D76700
parent e5a85126
Show whitespace changes
Inline Side-by-side
Please to comment