Training a TFNO on Navier-Stokes, with FLOPs count #583

ML4SC · 2025-04-19T07:20:45Z

I've been working with TFNO models and recently developed a script that demonstrates model performance along with FLOPs analysis for both forward and backward passes.

I'd like to contribute to the NeuralOperator project by developing a training example and accompanying documentation that:

- Demonstrates TFNO performance on the 2D Navier-Stokes equations
- Includes FLOPs profiling for model introspection and optimization
- Trains TFNO on multiple GPUs, with ongoing work to optimize communication loops between GPUs
- Discusses strategies for efficient CPU–GPU communication during training

Please let me know if this would be a valuable addition to the project — I've opened a PR and would greatly appreciate any feedback as I iterate.

JeanKossaifi · 2025-04-20T06:23:13Z

Hi Natalie,

Thank you for reaching out, glad to hear you’ve been using our TFNO! Yes, we certainly welcome all contributions, and the ones you mention sound very valuable. Optimizing the hyperparameters of the TFNO to speed up the forward and backward passes, along with other optimizations, is particularly interesting to me, especially since we put a lot of effort into providing a simple API for the factorized forward passes.

How do you optimize the CPU-GPU communication? I assume this is for the case where the data has very high resolution and has to be streamed from disk?

Happy to set up a chat to discuss the details.

Best,
Jean

On Sat, Apr 19, 2025 at 12:21 AM Natalie Pham wrote:
Commit summary: f3984ec Training an TFNO with navier-stokes, with flops count
File changes (1 file): A examples/training/train_TFNO_NavierStoke_flops_count.py (137)

ML4SC · 2025-05-02T17:12:20Z

Hi Jean,

Thank you for your quick reply. I’ve used torch.profiler to record memory usage and kernel activity on a per-epoch basis. For CPU–GPU transfers, I’ve enabled pin_memory=True in the DataLoader, pass non_blocking=True when moving batches to the device, and set up asynchronous data loading to handle larger batches.
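A minimal sketch of that setup, assuming a generic PyTorch model and dataset rather than the PR's actual code: `pin_memory` on the `DataLoader` plus `non_blocking=True` on `.to()` for asynchronous host-to-device copies, worker processes for background loading, and one `torch.profiler.profile` context per epoch with `profile_memory=True`. The helper names (`train_one_epoch`, `profiled_training`) and the toy data are hypothetical.

```python
import torch
import torch.nn.functional as F
from torch.profiler import ProfilerActivity, profile
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, device):
    """Run one epoch and return the loss of the last batch."""
    model.train()
    loss = torch.tensor(float("nan"))
    for x, y in loader:
        # non_blocking=True lets the copy overlap with GPU work; it only
        # helps when the host tensor lives in pinned (page-locked) memory.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        loss = F.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()


def profiled_training(model, dataset, epochs=2, batch_size=8, num_workers=2):
    """Train with pinned-memory loading and one profiler trace per epoch."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    model = model.to(device)
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # >0 loads batches in background processes
        pin_memory=use_cuda,  # pinning is only useful with a CUDA device
        persistent_workers=num_workers > 0,  # keep workers alive across epochs
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)

    tables = []
    for epoch in range(epochs):
        # A fresh profiler per epoch records operator timings and memory.
        with profile(activities=activities, profile_memory=True) as prof:
            last_loss = train_one_epoch(model, loader, optimizer, device)
        # cpu_time_total exists on every PyTorch version; on GPU you may prefer
        # a device-time column (its key name differs between versions).
        tables.append(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
        print(f"epoch {epoch}: last-batch loss {last_loss:.4e}")
    return tables


if __name__ == "__main__":
    # Toy stand-in data; substitute the Navier-Stokes loaders and a TFNO.
    data = TensorDataset(torch.randn(64, 1, 32, 32), torch.randn(64, 1, 32, 32))
    net = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
    for table in profiled_training(net, data, epochs=1):
        print(table)
```

Profiling every full epoch with memory tracking adds noticeable overhead; for long epochs, `torch.profiler.schedule` can restrict recording to a few steps.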

For very high-resolution data, I’m exploring a distributed streaming approach, but I haven’t found any existing functionality for that in the NeuralOperator codebase. If I’ve overlooked something, could you point me to the relevant module or function? Otherwise, any guidance on where to start implementing distributed data streaming would be greatly appreciated.
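One possible starting point, sketched here as a hypothetical helper that is not part of the NeuralOperator codebase: an `IterableDataset` that assigns whole `.npy` shards to each (rank, DataLoader worker) pair and memory-maps them with `np.load(..., mmap_mode="r")`, so each GPU process only reads its own part of the high-resolution data from disk. The shard layout (one array of shape `(n_samples, ...)` per file) is an assumption.

```python
import sys

import numpy as np
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class ShardedNpyStream(IterableDataset):
    """Stream samples from .npy shards, split across DDP ranks and workers.

    Hypothetical helper: each shard is assumed to hold one array of shape
    (n_samples, ...) written with np.save.
    """

    def __init__(self, shard_paths):
        super().__init__()
        self.shard_paths = sorted(shard_paths)

    def _my_shards(self):
        rank, world = 0, 1
        if dist.is_available() and dist.is_initialized():
            rank, world = dist.get_rank(), dist.get_world_size()
        info = get_worker_info()
        worker_id, n_workers = (info.id, info.num_workers) if info else (0, 1)
        # Each (rank, worker) pair is one consumer; shards are dealt round-robin.
        consumer = rank * n_workers + worker_id
        return self.shard_paths[consumer::world * n_workers]

    def __iter__(self):
        for path in self._my_shards():
            # mmap_mode="r" pages data in lazily instead of loading the shard.
            arr = np.load(path, mmap_mode="r")
            for i in range(arr.shape[0]):
                # np.array copies the slice so the tensor owns its memory.
                yield torch.from_numpy(np.array(arr[i], dtype=np.float32))


if __name__ == "__main__":
    # Usage: python stream.py shard0.npy shard1.npy ...
    loader = DataLoader(
        ShardedNpyStream(sys.argv[1:]),
        batch_size=4,
        num_workers=2,
        pin_memory=torch.cuda.is_available(),
    )
    for batch in loader:
        print(batch.shape)
```

Because shards are split by count rather than by sample, ranks can end up with different numbers of batches, which can stall DDP's gradient all-reduce at the end of an epoch; equal-sized shards, dropping the remainder, or DDP's `join()` context manager avoid that.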

Thanks again for your help!

Best,
Natalie

ML4SC · 2025-05-02T17:16:25Z

Meanwhile, I’d be grateful for any feedback or suggestions you have on my TFNO example using the Navier–Stokes dataset.

JeanKossaifi · 2025-07-02T09:02:26Z

Thank you @ML4SC, the example looks good. Did you get a chance to build the docs and check the result?
The training script should probably go in scripts, though I'm not sure it is needed given the existing training script. What do you think, @dhpitt?

f3984ec Training an TFNO with navier-stokes, with flops count

ML4SC added 2 commits May 2, 2025 00:49
- b7dd8ed add profiler for activities tracking
- fda9219 add ddp training for multiple GPUs
