Skip to content

[mlir][linalg] Enable scalable vectorization of linalg.unpack (WIP) #149293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

banach-space
Copy link
Contributor

This patch updates vectorizeAsTensorUnpackOp to support scalable vectorization by requiring user-specified vector sizes for both the read and write operations involved in linalg.unpack. Detailed rationale and an example are provided below.

Conceptually, linalg.unpack consists of the following high-level steps:

  1. Read from the source tensor.
  2. Transpose the value read in step (1).
  3. Write the value from step (2) into the destination tensor.

Currently, when vectorizing with user-provided vector sizes, only the sizes for the write operation (step 3) are required. Sizes for the read operation (step 1) are inferred from static shapes and inner tile sizes.

This logic breaks when the input shapes or tile sizes are dynamic (indeed, vectorizeUnPackOpPrecondition rejects such cases ATM and the vectorization fails). This patch addresses the issue by requiring explicit vector sizes for both the read and write sides, enabling scalable vectorization in such cases.

Example:

func.func @unpack(%in: tensor<1x1x8x?xf32>, %out: tensor<8x?xf32>) -> tensor<8x?xf32> {
  %vs = vector.vscale
  %c8 = arith.constant 8 : index
  %tile_size = arith.muli %vs, %c8 : index

  %unpack = linalg.unpack  %in
    inner_dims_pos = [0, 1]
    inner_tiles = [8, %tile_size]
    into %out : tensor<1x1x8x?xf32> -> tensor<8x?xf32>
  return %unpack : tensor<8x?xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.unpack"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 vector_sizes [1, 1, 8, [8],  8, [8]] : !transform.any_op
    //                                              \         /    \    /
    //                                              read-sizes   write-sizes
    transform.yield
  }
}

Finally, this patch also extends createReadOrMaskedRead and createWriteOrMaskedWrite to take scalable flags.

This patch updates `vectorizeAsTensorUnpackOp` to support scalable
vectorization by requiring user-specified vector sizes for both the
_read_ and _write_ operations involved in `linalg.unpack`. Detailed
rationale and an example are provided below.

Conceptually, `linalg.unpack` consists of the following high-level steps:
  1. _Read_ from the source tensor.
  2. Transpose the value read in step (1).
  3. _Write_ the value from step (2) into the destination tensor.

Currently, when vectorizing with user-provided vector sizes, only the
sizes for the _write_ operation (step 3) are required. Sizes for the
_read_ operation (step 1) are inferred from static shapes and inner tile
sizes.

This logic breaks when the input shapes or tile sizes are dynamic
(indeed, `vectorizeUnPackOpPrecondition` rejects such cases ATM and the
vectorization fails). This patch addresses the issue by requiring
explicit vector sizes for both the read and write sides, enabling
scalable vectorization in such cases.

Example:

```mlir
func.func @unpack(%in: tensor<1x1x8x?xf32>, %out: tensor<8x?xf32>) -> tensor<8x?xf32> {
  %vs = vector.vscale
  %c8 = arith.constant 8 : index
  %tile_size = arith.muli %vs, %c8 : index

  %unpack = linalg.unpack  %in
    inner_dims_pos = [0, 1]
    inner_tiles = [8, %tile_size]
    into %out : tensor<1x1x8x?xf32> -> tensor<8x?xf32>
  return %unpack : tensor<8x?xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.unpack"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 vector_sizes [1, 1, 8, [8],  8, [8]] : !transform.any_op
    //                                              \         /    \    /
    //                                              read-sizes   write-sizes
    transform.yield
  }
}
```

Finally, this patch also extends `createReadOrMaskedRead` and
`createWriteOrMaskedWrite` to take scalable flags.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant