enable offload #612

Open · wants to merge 1 commit into main
Conversation

@wdlctc commented Jun 12, 2025

This PR introduces activation offloading support for Fourier Neural Operator (FNO) training in neuralop, reducing GPU memory consumption during the forward and backward passes.

  1. New Activation Offloading Module (neuralop/training/offload.py):

Defines enable_activation_offload_for_FNO and supporting functions.

Wraps key forward passes (FNO, FNOBlocks, SpectralConv) with torch.autograd.graph.save_on_cpu(pin_memory=True) for CPU offloading of saved activations during training.

  2. Training Script for Offloading (scripts/train_offload.py):

New training script demonstrating end-to-end training with activation offloading.

Compatible with distributed training, WandB logging, and multiresolution datasets (e.g., Darcy Flow).

Preserves the model configuration and training loop from the standard train.py, but optionally enables memory-efficient execution.
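The wrapping described in item 1 can be sketched as follows. This is a minimal, hypothetical illustration of the monkey-patching pattern, not the PR's actual code: `enable_activation_offload` and the `Identity` stand-in are invented names, and the real helpers target FNO, FNOBlocks, and SpectralConv. The sketch falls back to a no-op context so it also runs without torch installed.

```python
import functools

try:
    import torch

    def _offload_ctx():
        # Saved activations are moved to (pinned) CPU memory during the
        # forward pass and copied back for the backward pass.
        return torch.autograd.graph.save_on_cpu(pin_memory=True)
except ImportError:
    import contextlib

    def _offload_ctx():
        # No-op stand-in so the sketch is runnable without torch.
        return contextlib.nullcontext()


def enable_activation_offload(module):
    """Monkey-patch module.forward so it runs under the offload context."""
    original_forward = module.forward

    @functools.wraps(original_forward)
    def forward(*args, **kwargs):
        with _offload_ctx():
            return original_forward(*args, **kwargs)

    module.forward = forward
    return module


# Usage with a toy stand-in; a real call site would pass an FNO block.
class Identity:
    def forward(self, x):
        return x


m = enable_activation_offload(Identity())
assert m.forward(3) == 3
```

The key point is that the wrapper leaves the original forward logic untouched; only the context in which saved tensors are stored changes.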

@dhpitt (Member) left a comment


This is really cool @wdlctc , thank you for opening! Since the helper function totally reimplements SpectralConv.forward and assumes full access to the method, I think it'll be much more maintainable if it directly interfaces with the forward calls themselves instead of living as helper functions in neuralop.training.

enable_activation_offload_for_SpectralConv(conv)


def enable_activation_offload_for_SpectralConv(SpectralConv):

I wonder if this might be better as a module in neuralop.layers, or as a param to the original SpectralConv

def forward(
    self, x: torch.Tensor, output_shape: Optional[Tuple[int]] = None
):
    with torch.autograd.graph.save_on_cpu(pin_memory=True):
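One way to realize the suggestion of a constructor parameter can be sketched as follows. This is a toy stand-in, not the real SpectralConv: the class name, the `activation_offload` flag, and the pass-through forward body are all hypothetical, and the sketch runs without torch by substituting a no-op context.

```python
import contextlib

try:
    import torch

    def _save_on_cpu():
        return torch.autograd.graph.save_on_cpu(pin_memory=True)
except ImportError:
    # No-op stand-in so the sketch is runnable without torch.
    _save_on_cpu = contextlib.nullcontext


class SpectralConvSketch:
    """Toy stand-in for a layer with an opt-in activation-offload flag."""

    def __init__(self, activation_offload: bool = False):
        self.activation_offload = activation_offload

    def forward(self, x):
        # nullcontext preserves the default behavior; save_on_cpu offloads
        # saved activations to CPU when the flag is set.
        ctx = _save_on_cpu() if self.activation_offload else contextlib.nullcontext()
        with ctx:
            return x  # a real layer would do the spectral convolution here


conv = SpectralConvSketch(activation_offload=True)
assert conv.forward([1, 2]) == [1, 2]
```

Gating the context manager on a flag keeps offloading opt-in, so existing users of the layer see no behavior change by default.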
This looks like the only thing that needs to be changed, right?
