
Use Ramble modifier to fill in allocation variables #195

Merged
pearce8 merged 70 commits into LLNL:develop on Jun 6, 2024

Conversation

scheibelp
Collaborator

@scheibelp scheibelp commented Apr 3, 2024

Closes #178
Fixes #221

Given an experiment that requests resources (nodes, CPUs, GPUs, etc.) and a system description (CPUs per node, GPUs per node, etc.), this PR generates an appropriate scheduler resource request. In some cases that means deriving values the experiment did not state, such as how many nodes a given benchmark needs.
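
To make the idea above concrete, here is a minimal Python sketch of deriving a node count from an experiment's request and a system description, then emitting Slurm directives. It is not the modifier added in this PR: the function names, the dictionary keys, and the "one core per thread" packing rule are assumptions made for illustration only.

```python
import math


def fill_allocation(n_ranks, n_threads_per_proc, sys_cores_per_node, n_nodes=None):
    """Return an allocation dict, deriving n_nodes when it was not requested.

    Hypothetical helper: assumes each thread of each rank occupies one core.
    """
    if n_nodes is None:
        cores_needed = n_ranks * n_threads_per_proc
        # Round up so a partially used node is still requested; never below 1.
        n_nodes = max(1, math.ceil(cores_needed / sys_cores_per_node))
    return {"n_ranks": n_ranks, "n_nodes": n_nodes}


def slurm_directives(alloc, timeout_minutes):
    """Format the allocation as #SBATCH lines (hypothetical formatting helper)."""
    return [
        f"#SBATCH -n {alloc['n_ranks']}",
        f"#SBATCH -N {alloc['n_nodes']}",
        f"#SBATCH --time {timeout_minutes}",
    ]


if __name__ == "__main__":
    # 8 ranks x 2 threads on a made-up 36-core node fits on a single node.
    alloc = fill_allocation(n_ranks=8, n_threads_per_proc=2, sys_cores_per_node=36)
    print("\n".join(slurm_directives(alloc, timeout_minutes=120)))
```

The 36-core node is an invented value; a real system config would supply its own per-node counts, and a different scheduler would need its own formatting function.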

#178 (comment) brings up some more-interesting examples like these, and this PR is an alternative approach.

This requires a newer Ramble than what Benchpark currently uses by default (right now I'm using https://github.com/GoogleCloudPlatform/ramble/pull/452).

./bin/benchpark setup saxpy/openmp nosite-x86_64 `pwd`/test-saxpy

Remaining work:

  • Remove all experiment-specific execute_experiment.tpl files
  • Implement scheduler definition function for Sierra and Fugaku
  • Update all experiment files and all system config files (currently just experiments/saxpy/openmp and configs/nosite-x86_64 are changed to demonstrate the organization)
  • All experiments/*/*/ramble.yaml files have been translated, but further updates are needed to actually describe system resources (e.g. the number of GPUs on each node)
    • (April 22 2024) All LLNL systems are now updated with # of CPUs/GPUs per-node (in the latter case, only for systems that have them)
    • (April 23 2024) All Systems except Eiger are now updated (note that LUMI and Daint have partitions with different types of nodes, and currently the variables only describe one type)
  • (May 14 2024) Update CI to do a ramble workspace setup --dry-run of some configs and experiments: this actually runs the modifier defined here to generate batch scripts etc. with all resource requests filled in

Testing:

You can run any one of the following on any system:

./bin/benchpark setup saxpy/openmp nosite-x86_64 <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Sierra-IBM-power9-V100-Infiniband <basedir>
./bin/benchpark setup amg2023/cuda LLNL-Pascal-Penguin-broadwell-P100-OmniPath <basedir>

Each of these prints a ramble workspace setup command to run next; append --phases make_experiments to the end of it to skip the concretize/install steps.
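
As a small illustration of the step above, the following Python sketch takes the setup command as a string and appends the phase restriction before running it. The helper name and the sample command string are assumptions; the command Benchpark actually prints may carry additional arguments, which this sketch passes through unchanged.

```python
import shlex


def with_make_experiments_only(printed_command):
    """Split a printed command and append the --phases restriction.

    Hypothetical helper: it only appends arguments and does not validate
    that the input is really a `ramble workspace setup` invocation.
    """
    argv = shlex.split(printed_command)
    return argv + ["--phases", "make_experiments"]


if __name__ == "__main__":
    # Placeholder input; paste the command Benchpark printed instead.
    argv = with_make_experiments_only("ramble workspace setup")
    print(shlex.join(argv))
    # To execute it: subprocess.run(argv, check=True)
```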

Oddities:

  • (April 23 2024) LUMI/Daint nodes have different characteristics based on what partition you request. For now, I only describe one type of node. I think we can handle this in the future by creating different configs based on what partition the user wants to submit to.
  • (April 12 2024) The GROMACS execute_experiment.tpl files are slightly different than the others: they have an extra {experiment_setup}; everything sets that variable to '' though, so I don't see a problem with removing them as well
  • (EDIT: now resolved) Some values must be defined before the modifier runs, e.g. n_ranks. I've arbitrarily chosen "7" as the placeholder value for these: they must be positive integers, so I picked a number that is (a) unlikely to be chosen explicitly and (b) small, in case it percolates to actual requests.

@scheibelp scheibelp marked this pull request as draft April 3, 2024 06:25
@github-actions github-actions bot added the experiment (New or modified experiment) and configs (New or modified system config) labels Apr 3, 2024
@scheibelp
Collaborator Author

Example script generated from experiments/saxpy/openmp and configs/nosite-x86_64 (tweaked to assume slurm, for a more interesting output):

#SBATCH -n 8
#SBATCH -N 1
#SBATCH --time 120

cd <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2

rm -f "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
touch "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"
export OMP_NUM_THREADS="2";
. <benchpark-prefix>/test-saxpy-oslic-new/spack/share/spack/setup-env.sh
spack env activate <benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/software/saxpy.problem
srun -n 8 -N 1 saxpy -n 512 >> "<benchpark-prefix>/test-saxpy-oslic-new/saxpy/openmp/nosite-x86_64/workspace/experiments/saxpy/problem/saxpy_512_1_2/saxpy_512_1_2.out"

…n to avoid doing concretization/install as part of ramble workspace setup)
@github-actions github-actions bot added the ci (Involving Project CI & Unit Tests) label May 14, 2024
@pearce8 pearce8 linked an issue May 17, 2024 that may be closed by this pull request
@pearce8 pearce8 marked this pull request as ready for review June 6, 2024 15:13
@pearce8 pearce8 merged commit 4924234 into LLNL:develop Jun 6, 2024
8 checks passed
@scheibelp scheibelp mentioned this pull request Jun 25, 2024
3 tasks
Labels
ci Involving Project CI & Unit Tests configs New or modified system config experiment New or modified experiment
Development

Successfully merging this pull request may close these issues.

Set up dry-run tests for experiments in benchpark
3 participants