[mlir][tosa] Improve lowering to tosa.fully_connected (#73049)
The current lowering of tosa.fully_connected produces a linalg.matmul followed by a linalg.generic to add the bias. The IR looks like the following:

```mlir
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)

// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
  // add bias and product
}
```

This has two downsides:

1. The tensor.empty operations typically result in additional allocations after bufferization.
2. There is a redundant traversal of the data to add the bias to the matrix product.

This extra work can be avoided by leveraging the out-param of linalg.matmul. The new IR sequence is:

```mlir
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
```

In my experiments, this eliminates one loop and one allocation (post bufferization) from the generated code.
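The transformation relies on linalg.matmul accumulating into its out-param, so seeding the output with the broadcast bias is equivalent to computing the product first and adding the bias in a second pass. A minimal NumPy sketch of that equivalence (shapes and values are illustrative, not taken from the patch):

```python
import numpy as np

# Hypothetical small operands for illustration.
A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.arange(12, dtype=np.float32).reshape(3, 4)
bias = np.arange(4, dtype=np.float32)

# Old lowering: matmul accumulates into a zero-filled buffer,
# then a second traversal (the linalg.generic) adds the bias.
zero = np.zeros((2, 4), dtype=np.float32)
prod = zero + A @ B          # linalg.fill + linalg.matmul
old = prod + bias            # extra pass over the data

# New lowering: broadcast the bias into the output buffer first,
# then let matmul accumulate on top of it; no second traversal.
init = np.broadcast_to(bias, (2, 4)).copy()  # linalg.broadcast
new = init + A @ B           # linalg.matmul with bias-seeded out-param

assert np.array_equal(old, new)
```

Both sequences produce the same result; the new one simply folds the bias addition into the initialization of the matmul accumulator.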