Generating MFC Images and Testing Them on OSPool #935
base: master
As of right now, I relied solely on [...]. The requested resource allocations are quite excessive for now and will be optimized later so that jobs do not get stuck in the queue forever.
Grab the new workflow files from master and you can start doing CI again. You may need to merge in any changes you made. lmk if you have questions.
Status Update: I hit a hurdle with SSH connectivity, whether using SSH keys (public/private) or credentials (
Edit 1: I am going to ask how to ensure each job instance runs on a distinct cluster, i.e. 5-10 instances of a single job would run on 5-10 unique clusters, increasing the potential for failures to show up.
Edit 2: The batch job specs sorta prevent concurrent job instances from landing on the same machine/cluster.
Edit 3: use
Edit 4: Requesting a distinct host OS for CPU cases and distinct GPU compatibility for GPU cases will ensure each job instance lands on a unique cluster, but the queue time is gonna be too long.
User description
Description
Concerning (#654),
This PR generates four images: CPU, CPU_Benchmark, GPU, and GPU_Benchmark. All MFC builds occur on a GitHub runner, while testing and storage of the latest images take place on OSPool. The images can be retrieved from within the CI itself: each one contains a pre-built MFC with pre-installed packages that can be accessed with simple commands.
Debugging info:
To generate an image locally:
apptainer build mfc_cpu.sif Singularity.cpu
To start a shell instance:
apptainer shell --fakeroot --writable-tmpfs mfc_cpu.sif
To execute specific commands directly:
apptainer exec --fakeroot --writable-tmpfs mfc_cpu.sif /bin/bash -c 'cd /opt/MFC && ./mfc.sh test -a'
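The local build command above can be repeated for each of the four images. Below is a minimal sketch, not the workflow's actual script: only Singularity.cpu and mfc_cpu.sif appear in this PR, so the other definition-file suffixes (cpu_bench, gpu, gpu_bench) are assumed names and should be adjusted to match the real files. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch: build every image variant locally with the same command shape
# used for the CPU image in this PR.
set -eu

# Assumed variant suffixes; only "cpu" is confirmed by the PR description.
VARIANTS="cpu cpu_bench gpu gpu_bench"

# Print the build command for one variant, e.g. "cpu".
build_cmd() {
    printf 'apptainer build mfc_%s.sif Singularity.%s\n' "$1" "$1"
}

# Print each build command; execute it only when DRY_RUN=0.
run_all() {
    for v in $VARIANTS; do
        cmd=$(build_cmd "$v")
        echo "$cmd"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            $cmd
        fi
    done
}

run_all
```

Printing first makes it easy to check the image and definition names before spending time on a real build.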
To-dos:
Note to self: the current secrets are hosted in the fork; before merging, new dedicated ones should be added to the base repo. To do so, request an access point under the "GATech_Bryngelson" project, then upload the public SSH key to https://registry.cilogon.org/. Afterwards, update the secrets, which include the private SSH key and user@host.
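For illustration, a CI step could consume the two secrets roughly as follows. This is a sketch under assumptions: the environment variable names OSPOOL_SSH_KEY and OSPOOL_USER_HOST are hypothetical (the PR does not name its secrets), and condor_q is only an example remote command. The script prints the ssh invocation rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: turn a private-key secret and a user@host secret into an ssh call.
set -eu

# Store the key text in a private temp file and print the file's path.
write_key() {
    local key_file
    key_file=$(mktemp)
    chmod 600 "$key_file"          # ssh rejects keys readable by others
    printf '%s\n' "$1" > "$key_file"
    printf '%s\n' "$key_file"
}

# Print a non-interactive ssh invocation: key file, user@host, remote command.
ssh_cmd() {
    printf 'ssh -i %s -o IdentitiesOnly=yes -o BatchMode=yes %s %s\n' "$1" "$2" "$3"
}

# Hypothetical secret names; nothing happens unless both are set.
if [ -n "${OSPOOL_SSH_KEY:-}" ] && [ -n "${OSPOOL_USER_HOST:-}" ]; then
    key=$(write_key "$OSPOOL_SSH_KEY")
    trap 'rm -f "$key"' EXIT       # do not leave the key on the runner
    ssh_cmd "$key" "$OSPOOL_USER_HOST" "condor_q"
fi
```

BatchMode=yes makes ssh fail instead of prompting for a password, which is usually what a CI job wants when a key is misconfigured.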
Refs:
NVIDIA Container
PR Type
Other
Description
Remove existing CI workflows and testing infrastructure
Add Singularity container image building workflow
Create four container definitions for CPU/GPU variants
Implement automated image building and testing on OSPool
Changes walkthrough 📝
17 files
Remove Frontier build script
Remove Frontier job submission script
Remove Frontier test script
Remove Phoenix benchmark script
Remove Phoenix benchmark submission script
Remove Phoenix job submission script
Remove Phoenix test script
Remove benchmark workflow
Remove code cleanliness workflow
Remove coverage check workflow
Remove documentation workflow
Remove formatting check workflow
Remove line count workflow
Remove source linting workflow
Remove toolchain linting workflow
Remove spell check workflow
Remove main test suite workflow
5 files
Add Singularity image building workflow
Add CPU container definition
Add CPU benchmark container definition
Add GPU container definition
Add GPU benchmark container definition