Use Ramble modifier to fill in allocation variables (#195)
* initial modifier

* partial work

* don't mess with locals()

* changed variable name

* Able to proceed with Ramble#452; that uncovered a str-to-int conversion issue

* remove debugging statements

* remove filled-in variable from experiment name

* intermediate work on also getting modifier to generate batch submissions

* finished up work that allows the modifier to define allocations as well

* style fix

* refactor away from context manager

* handle flux directives and timeout

* remove unused import

* add references for clarification

* n_threads is not special; also rename it to n_omp_threads_per_task

* intermediate work

* done with placeholder inference based on exceeding the max-request limit

* env_var_modification needs a mode; not 100 percent sure what that should be

* add n_cores_per_node (different from n_cores_per_rank)

* style edits

* there can now be one execute_experiment.tpl

* removal of all individual execute_experiment.tpl files

* update all system configs except Fugaku and Sierra

* update all experiments based on (a) new names and (b) logic that fills in variables

* style edit

* sierra batch/run cmd options implemented

* add fugaku batch opt generation logic

* replace variables for Sierra and Fugaku

* consolidate variable accessor logic into single class; add explanatory comment

* syntax error

* testing+fixing some issues for fugaku

* typos for sierra

* fix sierra reference errors etc. and recognition of 'queue' as variable

* style fix

* apply real values to sys_cpus_per_node/sys_gpus_per_node for LLNL systems

* the scheduler used for Sierra is called 'lsf', so use that name

* add basic alias substitution logic (omp_num_threads can be used instead of n_threads_per_proc)

* fix alias issue and add comments

* style fix

* set appropriate schedulers

* scheduler on Fugaku is called 'pjm'

* all experiments need to use the allocation modifier

* amg2023 benchmark should not be requesting any particular number of ranks/processes per node

* logic to set n_ranks based on n_gpus (if the latter is set and the former isn't); see the sketch after this list

* handle the most common case of gpu specification for Flux, not using jobspec

* add docstring

* syntax error

* style fix

* Fugaku system description

* LUMI system description

* add reference link

* Piz Daint system description

* add reference link

* partial description of Eiger/Alps

* proper detection of unset vars; fixed error w/ calculation of n_nodes from n_gpus

* Both flux and lsf want gpus_per_rank

* style fix

* more style fixes

* restore default nosite config

* missed converting input param name

* saxpy/raja-perf cuda/rocm experiments should just specify the number of gpus they want

* add CI checks to exercise the allocation modifier logic (use --dry-run to avoid doing concretization/install as part of ramble workspace setup)

* sys_cpus_per_node -> sys_cores_per_node

* intercept divide-by-zero error

* clarify we currently only support lrun and not jsrun

* style fix
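
As a reading aid, the sketch below condenses the inference these commits describe into a few lines of Python. It is not the modifier code merged in this PR: the function names, the dict interface, and the error handling are illustrative assumptions, and the real modifier additionally generates scheduler directives and handles aliases.

import math


def is_placeholder(value, max_request=1000):
    # Values like '1000001' exceed max_request and are treated as "not set".
    return int(value) > int(max_request)


def infer_allocation(v):
    # v maps variable names to strings, as in the variables.yaml files below.
    n_gpus = v.get("n_gpus")
    n_ranks = v.get("n_ranks")
    n_nodes = v.get("n_nodes")

    # "logic to set n_ranks based on n_gpus (if the latter is set and the former isn't)"
    if n_gpus and (n_ranks is None or is_placeholder(n_ranks)):
        n_ranks = int(n_gpus)

    # Fill in n_nodes from per-node system resources (sys_gpus_per_node / sys_cores_per_node).
    if n_nodes is None or is_placeholder(n_nodes):
        gpus_per_node = int(v.get("sys_gpus_per_node", 0))
        cores_per_node = int(v.get("sys_cores_per_node", 0))
        if n_gpus and gpus_per_node:
            n_nodes = math.ceil(int(n_gpus) / gpus_per_node)
        elif n_ranks and cores_per_node:
            n_nodes = math.ceil(int(n_ranks) / cores_per_node)
        else:
            # "intercept divide-by-zero error": fail clearly instead of dividing by zero
            raise ValueError("cannot infer n_nodes: per-node resources are unknown")

    return {"n_ranks": n_ranks, "n_nodes": n_nodes}


# Example: an 8-GPU request on a 4-GPU-per-node system -> 8 ranks on 2 nodes.
print(infer_allocation({"n_gpus": "8", "sys_gpus_per_node": "4",
                        "sys_cores_per_node": "44",
                        "n_ranks": "1000001", "n_nodes": "1000001"}))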

---------

Co-authored-by: pearce8 <[email protected]>
scheibelp and pearce8 authored Jun 6, 2024
1 parent 46b2f9f commit 4924234
Showing 54 changed files with 671 additions and 422 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/run.yml
@@ -90,3 +90,32 @@ jobs:
            --unsigned \
            --update-index ci-buildcache \
            $(spack find --format '/{hash}')
  allocationmodifier:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Benchpark
        uses: actions/checkout@9bb56186c3b09b4f86b1c65136769dd318469633

      - name: Add needed Python libs
        run: |
          pip install -r ./requirements.txt
      - name: Dry run amg2023/cuda on Sierra
        run: |
          ./bin/benchpark setup amg2023/cuda LLNL-Sierra-IBM-power9-V100-Infiniband workspace/
          . workspace/setup.sh
          ramble \
            --workspace-dir workspace/amg2023/cuda/LLNL-Sierra-IBM-power9-V100-Infiniband/workspace \
            --disable-progress-bar \
            --disable-logger \
            workspace setup --dry-run
      - name: Dry run amg2023/cuda on Pascal
        run: |
          ./bin/benchpark setup amg2023/cuda LLNL-Pascal-Penguin-broadwell-P100-OmniPath workspace/
          . workspace/setup.sh
          ramble \
            --workspace-dir workspace/amg2023/cuda/LLNL-Pascal-Penguin-broadwell-P100-OmniPath/workspace \
            --disable-progress-bar \
            --disable-logger \
            workspace setup --dry-run
4 changes: 4 additions & 0 deletions bin/benchpark
@@ -411,6 +411,10 @@ def benchpark_setup_handler(args):
        ramble_spack_experiment_configs_dir,
        include_fn,
    )
    os.symlink(
        source_dir / "experiments" / "universal-resources" / "execute_experiment.tpl",
        ramble_configs_dir / "execute_experiment.tpl",
    )

    spack_location = experiments_root / "spack"
    ramble_location = experiments_root / "ramble"
19 changes: 11 additions & 8 deletions configs/CSC-LUMI-HPECray-zen3-MI250X-Slingshot/variables.yaml
@@ -6,12 +6,15 @@
variables:
  gtl_flag: '' # to be overwritten by tests that need GTL
  rocm_arch: 'gfx90a'
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  cpu_partition: '#SBATCH -p small'
  gpu_partition: '#SBATCH -p small-g'
  timeout: '120'
  scheduler: "slurm"
  # This describes the LUMI-G partition: https://docs.lumi-supercomputer.eu/hardware/lumig/
  sys_cores_per_node: "64"
  sys_gpus_per_node: "8"
  sys_mem_per_node: "512"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
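
The hunk above shows the pattern applied to most system configs in this commit: the hand-written batch_* directives are dropped, and the modifier is expected to rebuild them from scheduler, timeout, and the queue/partition variables. Roughly, for a Slurm system, that regeneration amounts to something like the sketch below (the helper name and signature are assumptions; only the #SBATCH flags come from the removed lines above).

def slurm_directives(n_nodes, n_ranks, timeout_minutes, partition=None):
    # Recreates the directives that used to be hard-coded as batch_nodes,
    # batch_ranks, batch_timeout, and the *_partition entries in this file.
    lines = [
        f"#SBATCH -N {n_nodes}",
        f"#SBATCH -n {n_ranks}",
        f"#SBATCH -t {timeout_minutes}",
    ]
    if partition:
        lines.append(f"#SBATCH -p {partition}")
    return "\n".join(lines)


print(slurm_directives(2, 16, 120, partition="small-g"))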

17 changes: 11 additions & 6 deletions configs/CSCS-Daint-HPECray-haswell-P100-Infiniband/variables.yaml
@@ -4,12 +4,17 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  default_cuda_version: '11.2.0'
  cuda_arch: '60'
  enable_mps: '/usr/tcetmp/bin/enable_mps'
  timeout: '120'
  scheduler: "slurm"
  # This describes the XC50 compute nodes: https://www.cscs.ch/computers/piz-daint
  sys_cores_per_node: "12"
  sys_gpus_per_node: "1"
  sys_mem_per_node: "64"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
16 changes: 10 additions & 6 deletions configs/CSCS-Eiger-HPECray-zen2-Slingshot/variables.yaml
@@ -4,9 +4,13 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: '00:30'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  timeout: '30'
  scheduler: "slurm"
  sys_cores_per_node: "128"
  # sys_gpus_per_node unset
  # sys_mem_per_node unset
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
14 changes: 8 additions & 6 deletions configs/LLNL-Magma-Penguin-icelake-OmniPath/variables.yaml
@@ -4,9 +4,11 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  timeout: "120"
  scheduler: "slurm"
  sys_cores_per_node: "96"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
@@ -7,9 +7,12 @@ variables:
  gtl_flag: '' # to be overwritten by tests that need GTL
  cuda_arch: '60'
  default_cuda_version: '11.8.0'
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks} -G {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  timeout: "120"
  scheduler: "slurm"
  sys_cores_per_node: "36"
  sys_gpus_per_node: "2"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
16 changes: 10 additions & 6 deletions configs/LLNL-Sierra-IBM-power9-V100-Infiniband/variables.yaml
@@ -5,11 +5,15 @@

variables:
  gtl_flag: '' # to be overwritten by tests that need GTL
  batch_time: '02:00'
  mpi_command: '/usr/tcetmp/bin/lrun -n {n_ranks} -T {processes_per_node} {gtl_flag}'
  batch_submit: 'bsub -q pdebug {execute_experiment}'
  batch_nodes: '#BSUB -nnodes {n_nodes}'
  batch_ranks: ''
  batch_timeout: '#BSUB -W {batch_time}'
  default_cuda_version: '11.8.0'
  cuda_arch: '70'
  timeout: "120"
  scheduler: "lsf"
  queue: "pdebug"
  sys_cores_per_node: "44"
  sys_gpus_per_node: "4"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
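
For an LSF system such as Sierra the same idea applies, but the modifier has to emit #BSUB directives, a bsub submit line using the new queue variable, and an lrun (not jsrun, per the commit messages) invocation. A rough sketch, with the helper name and arguments assumed for illustration and the flag spellings taken from the removed lines above:

def lsf_allocation(n_nodes, n_ranks, n_ranks_per_node, timeout_minutes,
                   queue="pdebug", gtl_flag=""):
    # Recreates what this file used to hard-code as batch_nodes, batch_timeout,
    # batch_submit, and mpi_command.
    batch_directives = [
        f"#BSUB -nnodes {n_nodes}",
        f"#BSUB -W {timeout_minutes}",
    ]
    batch_submit = f"bsub -q {queue} {{execute_experiment}}"
    mpi_command = f"lrun -n {n_ranks} -T {n_ranks_per_node} {gtl_flag}".strip()
    return batch_directives, batch_submit, mpi_command


print(lsf_allocation(n_nodes=2, n_ranks=8, n_ranks_per_node=4,
                     timeout_minutes=120))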
15 changes: 9 additions & 6 deletions configs/LLNL-Tioga-HPECray-zen3-MI250X-Slingshot/variables.yaml
@@ -6,9 +6,12 @@
variables:
  gtl_flag: '' # to be overwritten by tests that need GTL
  rocm_arch: 'gfx90a'
  batch_time: '120m'
  mpi_command: 'flux run -N {n_nodes} -n {n_ranks}'
  batch_submit: 'flux batch {execute_experiment}'
  batch_nodes: '# flux: -N {n_nodes}'
  batch_ranks: '# flux: -n {n_ranks}'
  batch_timeout: '# flux: -t {batch_time}'
  timeout: "120"
  scheduler: "flux"
  sys_cores_per_node: "64"
  sys_gpus_per_node: "4"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
25 changes: 13 additions & 12 deletions configs/RCCS-Fugaku-Fujitsu-A64FX-TofuD/variables.yaml
@@ -4,15 +4,16 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: '02:00'
  mpi_command: 'mpiexec'
  batch_submit: 'pjsub {execute_experiment}'
  batch_nodes: '#PJM -L "node={n_nodes}"'
  batch_ranks: '#PJM --mpi proc={n_ranks}'
  batch_timeout: '#PJM -L "elapse={batch_time}:00" -x PJM_LLIO_GFSCACHE="/vol0002:/vol0003:/vol0004:/vol0005:/vol0006"'
  default_comp: '[email protected]'
  #default_comp: '[email protected]'
  #default_comp: '[email protected]'
  fj_comp_version: '4.10.0'
  sys_arch: 'arch=linux-rhel8-a64fx'

  default_fj_version: '4.10.0'
  default_llvm_version: '17.0.2'
  default_gnu_version: '13.2.0'
  timeout: "120"
  scheduler: "pjm"
  sys_cores_per_node: "48"
  sys_mem_per_node: "32"
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
  #sys_arch: 'arch=linux-rhel8-a64fx'
15 changes: 9 additions & 6 deletions configs/nosite-AWS_PCluster_Hpc7a-zen4-EFA/variables.yaml
@@ -4,9 +4,12 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks} --mpi=pmix --export=ALL,FI_EFA_USE_DEVICE_RDMA=1,FI_PROVIDER="efa",OMPI_MCA_mtl_base_verbose=100'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  timeout: "120"
  scheduler: "slurm"
  sys_cores_per_node: "1"
  # sys_gpus_per_node unset
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
15 changes: 9 additions & 6 deletions configs/nosite-HPECray-zen3-MI250X-Slingshot/variables.yaml
@@ -6,9 +6,12 @@
variables:
  gtl_flag: '' # to be overwritten by tests that need GTL
  rocm_arch: 'gfx90a'
  batch_time: '02:00'
  mpi_command: 'srun -N {n_nodes} -n {n_ranks}'
  batch_submit: 'sbatch {execute_experiment}'
  batch_nodes: '#SBATCH -N {n_nodes}'
  batch_ranks: '#SBATCH -n {n_ranks}'
  batch_timeout: '#SBATCH -t {batch_time}:00'
  timeout: "120"
  scheduler: "slurm"
  sys_cores_per_node: "1"
  # sys_gpus_per_node unset
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
14 changes: 8 additions & 6 deletions configs/nosite-x86_64/variables.yaml
@@ -4,9 +4,11 @@
# SPDX-License-Identifier: Apache-2.0

variables:
  batch_time: ''
  mpi_command: 'mpirun -n {n_nodes} -c {n_ranks} --oversubscribe'
  batch_submit: '{execute_experiment}'
  batch_nodes: ''
  batch_ranks: ''
  batch_timeout: ''
  scheduler: "mpi"
  sys_cores_per_node: "1"
  # sys_gpus_per_node unset
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
9 changes: 5 additions & 4 deletions experiments/amg2023/cuda/ramble.yaml
@@ -15,12 +15,14 @@ ramble:
      install: '--add --keep-stage'
      concretize: '-U -f'

  modifiers:
  - name: allocation

  applications:
    amg2023:
      workloads:
        problem1:
          variables:
            n_ranks: '{processes_per_node} * {n_nodes}'
            p: 2
            px: '{p}'
            py: '{p}'
@@ -32,11 +34,10 @@
            gtl: ['gtl', 'nogtl']
            gtlflag: ['-M"-gpu"', '']
          experiments:
            amg2023_cuda_problem1_{gtl}_{n_nodes}_{px}_{py}_{pz}_{nx}_{ny}_{nz}:
            amg2023_cuda_problem1_{gtl}_{px}_{py}_{pz}_{nx}_{ny}_{nz}:
              variables:
                env_name: amg2023
                processes_per_node: '4'
                n_nodes: '2'
                n_gpus: '8'
              zips:
                gtl_info:
                - gtl
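
With this change the CUDA experiment only states n_gpus: '8'; the allocation modifier derives the rest from the system description. A worked example of that arithmetic, assuming a Sierra-like system with 4 GPUs per node and one rank per GPU:

n_gpus = 8
sys_gpus_per_node = 4
n_ranks = n_gpus                           # one rank per GPU (assumed)
n_nodes = -(-n_gpus // sys_gpus_per_node)  # ceiling division: 2 nodes
n_ranks_per_node = n_ranks // n_nodes      # 4 ranks per node
print(n_ranks, n_nodes, n_ranks_per_node)  # 8 2 4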
13 changes: 0 additions & 13 deletions experiments/amg2023/openmp/execute_experiment.tpl

This file was deleted.

14 changes: 6 additions & 8 deletions experiments/amg2023/openmp/ramble.yaml
@@ -15,15 +15,14 @@ ramble:
      install: '--add --keep-stage'
      concretize: '-U -f'

  modifier:
  - name: allocation

  applications:
    amg2023:
      workloads:
        problem1:
          env_vars:
            set:
              OMP_NUM_THREADS: '{omp_num_threads}'
          variables:
            n_ranks: '{processes_per_node} * {n_nodes}'
            p: 2
            px: '{p}'
            py: '{p}'
@@ -32,18 +31,17 @@
            nx: '{n}'
            ny: '{n}'
            nz: '{n}'
            processes_per_node: ['8', '4']
            n_ranks_per_node: ['8', '4']
            n_nodes: ['1', '2']
            threads_per_node_core: ['4', '6', '12']
            omp_num_threads: '{threads_per_node_core} * {n_nodes}'
            n_threads_per_proc: ['4', '6', '12']
          experiments:
            amg2023_omp_problem1_{n_nodes}_{omp_num_threads}_{px}_{py}_{pz}_{nx}_{ny}_{nz}:
              variables:
                env_name: amg2023-omp
              matrices:
              - size_threads:
                - n
                - threads_per_node_core
                - n_threads_per_proc
  spack:
    concretized: true
    packages:
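
The OpenMP experiment keeps setting OMP_NUM_THREADS from {omp_num_threads} even though the variable it sweeps is now called n_threads_per_proc; the commit messages describe this as alias substitution. A small sketch of what such a lookup could look like (the ALIASES table and resolve helper are assumptions, not the merged code):

ALIASES = {"omp_num_threads": "n_threads_per_proc"}


def resolve(name, variables):
    # Return a variable's value, following one level of aliasing.
    if name in variables:
        return variables[name]
    canonical = ALIASES.get(name)
    if canonical and canonical in variables:
        return variables[canonical]
    raise KeyError(name)


print(resolve("omp_num_threads", {"n_threads_per_proc": "4"}))  # -> '4'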
13 changes: 0 additions & 13 deletions experiments/amg2023/rocm/execute_experiment.tpl

This file was deleted.
