Skip to content

[mlir][scf] Add ReturnLike to scf::InParallelOp #148237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fabianmcg
Copy link
Contributor

This patch adds the ReturnLike trait to scf::InParallelOp. This is the first step to clean the branching semantics of scf::ForallOp and scf::InParallelOp.

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-mlir

Author: Fabian Mora (fabianmcg)

Changes

This patch adds the ReturnLike trait to scf::InParallelOp. This is the first step to clean the branching semantics of scf::ForallOp and scf::InParallelOp.


Full diff: https://github.com/llvm/llvm-project/pull/148237.diff

1 Files Affected:

  • (modified) mlir/include/mlir/Dialect/SCF/IR/SCFOps.td (+1)
diff --git a/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td b/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
index 2d15544e871b3..23c987dd777e2 100644
--- a/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
+++ b/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
@@ -633,6 +633,7 @@ def ForallOp : SCF_Op<"forall", [
 def InParallelOp : SCF_Op<"forall.in_parallel", [
        Pure,
        Terminator,
+       ReturnLike,
        DeclareOpInterfaceMethods<ParallelCombiningOpInterface>,
        HasParent<"ForallOp">,
       ] # GraphRegionNoTerminator.traits> {

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-mlir-scf

Author: Fabian Mora (fabianmcg)

Changes

This patch adds the ReturnLike trait to scf::InParallelOp. This is the first step to clean the branching semantics of scf::ForallOp and scf::InParallelOp.


Full diff: https://github.com/llvm/llvm-project/pull/148237.diff

1 Files Affected:

  • (modified) mlir/include/mlir/Dialect/SCF/IR/SCFOps.td (+1)
diff --git a/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td b/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
index 2d15544e871b3..23c987dd777e2 100644
--- a/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
+++ b/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
@@ -633,6 +633,7 @@ def ForallOp : SCF_Op<"forall", [
 def InParallelOp : SCF_Op<"forall.in_parallel", [
        Pure,
        Terminator,
+       ReturnLike,
        DeclareOpInterfaceMethods<ParallelCombiningOpInterface>,
        HasParent<"ForallOp">,
       ] # GraphRegionNoTerminator.traits> {

@fabianmcg fabianmcg requested review from ftynse and joker-eph July 11, 2025 12:41
Copy link
Member

@ftynse ftynse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate more on the reasoning here? ReturnLike forwards its operands to the parent region along the control flow edge:

/// This trait indicates that a terminator operation is "return-like". This
/// means that it exits its current region and forwards its operands as "exit"
/// values to the parent region. Operations with this trait are not permitted to
/// contain successors or produce results.
. This doesn't sound applicable here as in_parallel doesn't even have operands.

@fabianmcg
Copy link
Contributor Author

fabianmcg commented Jul 13, 2025

Could you elaborate more on the reasoning here?

Sure.

Right now, scf.forall control-flow semantics are broken. scf.forall declares to be RegionBranchOpInterface but scf.in_parallel is not a RegionBranchTerminatorOpInterface, and this breaks things like dataflow. Furthermore, I'd argue this breaks the implicit contract of region control-flow, where each RegionBranchOpInterface should be matched by a RegionBranchTerminatorOpInterface.

There were 2 bad options to start fixing the control-flow semantics in this case:

  1. Remove RegionBranchOpInterface from scf.forall. This sounds like a breaking change that might break semantics for everyone. Further, in the future we would re-introduce it once the whole semantics issue is fixed.
  2. Start fixing scf.in_parallel, and try to break as few things as possible during the process to make it comply with RegionBranchTerminatorOpInterface.

I opted for 2. This patch fixes scf.in_parallel semantics for the buffer case (as the buffer case has no arguments) but not the tensor case, because as you mentioned this op doesn't forward its operands to the parent op.

The proper fix to this whole situation, (IMO) is to add a second region to scf.forall and do the in parallel computations there for tensors, then RegionBranchTerminatorOpInterface can always apply. But that's a large breaking change specially for testing, so I'm trying to figure out ways to keep the changes small, and this was my choice among a set of bad choices (I'm open to hear alternatives) as a first step, as it fixes the easy half of the problem (the buffer case).

Now, outside of this patch I would argue that control-flow semantics (outside blocks) in MLIR are somewhat broken, as this is not the first time I encounter a situation like this. But that's a larger conversation and out of scope for this patch.

@ftynse
Copy link
Member

ftynse commented Jul 13, 2025

The proper fix to this whole situation, (IMO) is to add a second region to scf.forall and do the in parallel computations there for tensors, then RegionBranchTerminatorOpInterface can always apply. But that's a large breaking change specially for testing, so I'm trying to figure out ways to keep the changes small, and this was my choice among a set of bad choices (I'm open to hear alternatives) as a first step, as it fixes the easy half of the problem (the buffer case).

This sounds like a good idea, but you'll have to forward values from one region to another. For testing, we can follow what I suggested on another thread and have a custom version of mlir-opt that accepts the previous syntax/structure and prints the new structure. Doing it with commented-out FileCheck lines is a bit tricky though.

Now, outside of this patch I would argue that control-flow semantics (outside blocks) in MLIR are somewhat broken, as this is not the first time I encounter a situation like this. But that's a larger conversation and out of scope for this patch.

This is something that can be raised to the forums. I know the modeling is quite limited, but I don't know whether is broken.

@joker-eph
Copy link
Collaborator

joker-eph commented Jul 13, 2025

While the op does not have explicit operands, the op is actually implicitly capturing SSA value, which counts as operands to me.
However it does not just forward its operands to the parent, since it relies on its body to form the actual values to yield (in a non-trivial manner).

I don't quite see what's broken in terms of control-flow here? The fact that it is a parallel fan-out?
The RegionBanchOpInterface is limited in its ability to track the actual data-flow here but that's another issue.

The proper fix to this whole situation, (IMO) is to add a second region to scf.forall and do the in parallel computations there for tensors, then RegionBranchTerminatorOpInterface can always apply.

That wouldn't help alone: you would need to materialize the reduction in the same way that XLA/HLO does with its reduce: by taking two values, writing the combiner, and yield the new value (instead of the "magic" parallel_insert_slice model existing now)

@fabianmcg
Copy link
Contributor Author

I suggested on another thread and have a custom version of mlir-opt that accepts the previous syntax/structure and prints the new structure. Doing it with commented-out FileCheck lines is a bit tricky though.

Yes, that's why I mentioned clang's Rewriter class, it would allow to keep the comments. However, since then I think it's easier to parse the comments into ops.

I don't quite see what's broken in terms of control-flow here?

I think the semantics of the implementations are broken here, eg. forall having the interface but in_parallel not having the matching interface. In the same way, I don't think SSACFG region semantics are necessarily broken, but the implementation with RegionBranch* likely is.

However it does not just forward its operands to the parent, since it relies on its body to form the actual values to yield (in a non-trivial manner).

That's another thing I considered, extending the interface and other data structures to also work with ranges of ops. However, there were two things I learned while trying to do that approach:

  1. MLIR relies heavily in control-flow operands to be actual op operands, eg. dataflow relies on getting the operand number.
  2. Mutation as it's currently prescribed by the interface is impossible for non values. This is one of the things I think it's broken in the interface, IMO mutation should be handled by the interface and we should remove the notion of operands from the interface.

Consequently, the proper fix there it's an even bigger breaking change. I'd argue this needs to happen anyways, but I don't want to propose that until we know how generic control-flow looks in MLIR, as the new system could potentially replace all control-flow and fix things in one go.

That wouldn't help alone: you would need to materialize the reduction in the same way that XLA/HLO does with its reduce: by taking two values, writing the combiner, and yield the new value (instead of the "magic" parallel_insert_slice model existing now)

Yeah, a change in parallel_insert_slice is also needed to yield a value. However, I see it as a potentially easier change, with the correct handling happening in the parallel_result region of forall.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants