Frontier walltime fix. #946

Merged on Jul 15, 2025 (2 commits)

Conversation

@sbryngelson (Member) commented Jul 15, 2025

User description

The Frontier walltime is too long. Some jobs hang, and each hung job wastes compute time until the limit expires.


PR Type

Bug fix


Description

  • Reduce Frontier job walltime from just under 4 hours (03:59:00) to just under 2 hours (01:59:00)

  • Limit how much compute time a hanging job can consume before it is terminated


Changes diagram

flowchart LR
  A["Original walltime: 03:59:00"] --> B["Updated walltime: 01:59:00"]
  B --> C["Prevent hanging jobs"]

Changes walkthrough 📝

Relevant files
Bug fix
submit-bench.sh
Reduce benchmark job walltime                                                       

.github/workflows/frontier/submit-bench.sh

  • Changed SBATCH walltime from 03:59:00 to 01:59:00
+1/-1     
submit.sh
Reduce standard job walltime                                                         

.github/workflows/frontier/submit.sh

  • Changed SBATCH walltime from 03:59:00 to 01:59:00
+1/-1     
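
To make the effect of the new limit concrete, here is a minimal sketch that converts the two `-t` values quoted above into seconds and reports the difference. The helper name `walltime_to_seconds` is hypothetical and not part of the PR, and the sketch assumes the `HH:MM:SS` form used in these scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): convert an HH:MM:SS walltime,
# as used by "#SBATCH -t", into a number of seconds.
walltime_to_seconds() {
    # Split on ':' with POSIX parameter expansion.
    wt_h=${1%%:*}; wt_rest=${1#*:}
    wt_m=${wt_rest%%:*}; wt_s=${wt_rest#*:}
    # Strip one leading zero so "08" or "09" is not parsed as octal.
    wt_h=${wt_h#0}; wt_m=${wt_m#0}; wt_s=${wt_s#0}
    echo $(( ${wt_h:-0} * 3600 + ${wt_m:-0} * 60 + ${wt_s:-0} ))
}

old=$(walltime_to_seconds 03:59:00)
new=$(walltime_to_seconds 01:59:00)
echo "old limit: ${old}s, new limit: ${new}s"
echo "a hung job now holds its allocation for $(( old - new ))s less"
```

A job that hangs still runs until the limit, so the change caps the waste per hung job rather than removing the hang itself.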

  • @Copilot Copilot AI review requested due to automatic review settings July 15, 2025 18:03
    Contributor

    @Copilot Copilot AI left a comment

    Pull Request Overview

    This PR reduces the walltime allocation for Frontier HPC jobs from 4 hours to 2 hours to address job hanging issues and excessive resource consumption.

    • Reduces SLURM job walltime from 03:59:00 to 01:59:00 in both regular and benchmark submission scripts
    • Addresses job hanging issues that were consuming excessive compute time
    • Maintains consistency across both submission script variants

    Reviewed Changes

    Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

    File Description
    .github/workflows/frontier/submit.sh Reduces walltime from ~4 hours to ~2 hours for regular job submissions
    .github/workflows/frontier/submit-bench.sh Reduces walltime from ~4 hours to ~2 hours for benchmark job submissions
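
Since both scripts are meant to carry the same limit, a small check can guard against them drifting apart. The following is only a sketch: `extract_walltime` is a hypothetical helper, and the demo runs on throwaway files that mimic the `#SBATCH -t` directive rather than on the real scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the value of the first
# "#SBATCH -t" directive found in the file given as $1.
extract_walltime() {
    awk '$1 == "#SBATCH" && $2 == "-t" { print $3; exit }' "$1"
}

# Demo on two temporary files that imitate the directive quoted in this PR.
tmpdir=$(mktemp -d)
printf '#!/bin/bash\n#SBATCH -t 01:59:00   # Duration of the job\n' > "$tmpdir/submit.sh"
printf '#!/bin/bash\n#SBATCH -t 01:59:00   # Duration of the job\n' > "$tmpdir/submit-bench.sh"

a=$(extract_walltime "$tmpdir/submit.sh")
b=$(extract_walltime "$tmpdir/submit-bench.sh")
if [ "$a" = "$b" ]; then
    echo "walltimes match: $a"
else
    echo "walltime mismatch: $a vs $b" >&2
fi
rm -rf "$tmpdir"
```

Pointing the two `extract_walltime` calls at `.github/workflows/frontier/submit.sh` and `.github/workflows/frontier/submit-bench.sh` would apply the same check to the files changed here.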

    @sbryngelson sbryngelson merged commit e503fd0 into MFlowCode:master Jul 15, 2025
    12 checks passed

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Issue

    Reducing walltime from 4 hours to 2 hours may cause legitimate long-running jobs to be terminated prematurely. This could mask actual performance issues or cause false failures if jobs need more than 2 hours to complete successfully.

    #SBATCH -t 01:59:00                # Duration of the job (Ex: 15 mins)
    #SBATCH -o$job_slug.out            # Combined output and error messages file
    Possible Issue

    Same walltime reduction concern applies here - legitimate jobs requiring more than 2 hours may be killed, potentially causing CI failures or masking performance regressions.

    #SBATCH -t 01:59:00                # Duration of the job (Ex: 15 mins)
    #SBATCH -o$job_slug.out            # Combined output and error messages file
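
One way to tell a hang apart from a legitimately long job, sketched here under stated assumptions rather than taken from the PR, is to cap individual steps inside the job with the GNU coreutils `timeout` command, which exits with status 124 when the limit is hit. The wrapper name `run_with_limit` and the durations are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the PR): run a command under a time
# limit and report when the limit, rather than the command, ended it.
# Assumes GNU coreutils `timeout`, which exits with 124 on timeout.
run_with_limit() {
    rwl_limit=$1
    shift
    timeout "$rwl_limit" "$@"
    rwl_status=$?
    if [ "$rwl_status" -eq 124 ]; then
        echo "step exceeded ${rwl_limit}s and was stopped: $*" >&2
    fi
    return "$rwl_status"
}

# Demo: a 3-second sleep stands in for a hung step, limited to 1 second.
run_with_limit 1 sleep 3 || echo "step failed with status $?"
```

With a per-step limit like this, a stalled step shows up as an explicit message in the job log instead of surfacing only when the scheduler kills the whole job at the walltime.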

    @sbryngelson sbryngelson deleted the change-walltime branch July 15, 2025 18:03

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    codecov bot commented Jul 15, 2025

    Codecov Report

    All modified and coverable lines are covered by tests ✅

    Project coverage is 43.74%. Comparing base (8c7e93f) to head (d5f33fc).
    Report is 1 commit behind head on master.

    Additional details and impacted files
    @@           Coverage Diff           @@
    ##           master     #946   +/-   ##
    =======================================
      Coverage   43.74%   43.74%           
    =======================================
      Files          68       68           
      Lines       18381    18381           
      Branches     2295     2295           
    =======================================
      Hits         8040     8040           
      Misses       8949     8949           
      Partials     1392     1392           
