enable offload #612

Open · wants to merge 1 commit into main
Conversation

@wdlctc commented Jun 12, 2025

This PR introduces activation offloading support for Fourier Neural Operator (FNO) training in neuralop, reducing GPU memory consumption during the forward and backward passes.

  1. New Activation Offloading Module (neuralop/training/offload.py):

Defines enable_activation_offload_for_FNO and supporting functions.

Wraps key forward passes (FNO, FNOBlocks, SpectralConv) with torch.autograd.graph.save_on_cpu(pin_memory=True) for CPU offloading of saved activations during training.

  2. Training Script for Offloading (scripts/train_offload.py):

New training script demonstrating end-to-end training with activation offloading.

Compatible with distributed training, WandB logging, and multiresolution datasets (e.g., Darcy Flow).

Preserves the model configuration and training loop from the standard train.py, but optionally enables memory-efficient execution.
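The wrapping described in item 1 can be sketched as follows. This is a minimal, hypothetical illustration of the monkey-patching pattern, not the PR's actual code: `enable_activation_offload` and the `Identity` stand-in are invented names, and the real helpers target FNO, FNOBlocks, and SpectralConv. The sketch falls back to a no-op context so it also runs without torch installed.

```python
import functools

try:
    import torch

    def _offload_ctx():
        # Saved activations are moved to (pinned) CPU memory during the
        # forward pass and copied back for the backward pass.
        return torch.autograd.graph.save_on_cpu(pin_memory=True)
except ImportError:
    import contextlib

    def _offload_ctx():
        # No-op stand-in so the sketch is runnable without torch.
        return contextlib.nullcontext()


def enable_activation_offload(module):
    """Monkey-patch module.forward so it runs under the offload context."""
    original_forward = module.forward

    @functools.wraps(original_forward)
    def forward(*args, **kwargs):
        with _offload_ctx():
            return original_forward(*args, **kwargs)

    module.forward = forward
    return module


# Usage with a toy stand-in; a real call site would pass an FNO block.
class Identity:
    def forward(self, x):
        return x


m = enable_activation_offload(Identity())
assert m.forward(3) == 3
```

The key point is that the wrapper leaves the original forward logic untouched; only the context in which saved tensors are stored changes.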

@dhpitt (Member) left a comment


This is really cool @wdlctc , thank you for opening! Since the helper function totally reimplements SpectralConv.forward and assumes full access to the method, I think it'll be much more maintainable if it directly interfaces with the forward calls themselves instead of living as helper functions in neuralop.training.

enable_activation_offload_for_SpectralConv(conv)


def enable_activation_offload_for_SpectralConv(SpectralConv):

I wonder if this might be better as a module in neuralop.layers, or as a param to the original SpectralConv

def forward(
    self, x: torch.Tensor, output_shape: Optional[Tuple[int]] = None
):
    with torch.autograd.graph.save_on_cpu(pin_memory=True):
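One way to realize the suggestion of a constructor parameter can be sketched as follows. This is a toy stand-in, not the real SpectralConv: the class name, the `activation_offload` flag, and the pass-through forward body are all hypothetical, and the sketch runs without torch by substituting a no-op context.

```python
import contextlib

try:
    import torch

    def _save_on_cpu():
        return torch.autograd.graph.save_on_cpu(pin_memory=True)
except ImportError:
    # No-op stand-in so the sketch is runnable without torch.
    _save_on_cpu = contextlib.nullcontext


class SpectralConvSketch:
    """Toy stand-in for a layer with an opt-in activation-offload flag."""

    def __init__(self, activation_offload: bool = False):
        self.activation_offload = activation_offload

    def forward(self, x):
        # nullcontext preserves the default behavior; save_on_cpu offloads
        # saved activations to CPU when the flag is set.
        ctx = _save_on_cpu() if self.activation_offload else contextlib.nullcontext()
        with ctx:
            return x  # a real layer would do the spectral convolution here


conv = SpectralConvSketch(activation_offload=True)
assert conv.forward([1, 2]) == [1, 2]
```

Gating the context manager on a flag keeps offloading opt-in, so existing users of the layer see no behavior change by default.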
This looks like the only thing that needs to be changed, right?
