Dispatch jobs based on internal configuration #23

dylanmcreynolds · 2024-05-14T00:44:31Z

Right now, MLEX apps must know a tremendous amount about the infrastructure that will launch jobs to run training or inference. We would like an app to be able to present the user with a particular job type (from a selection of job types) and simply run it. The job type would then be a configuration on the server that maps to the details of how to launch a job, e.g.:

run locally through conda
run on HPC facility A through SFAPI
run on HPC facility B through Globus Flows

A quick summary example:

In the segmentation app, the user selects TUNet+ and their model parameters.
The app starts a generic parent flow in Prefect.
The parent flow looks up a local configuration (from the file system, or a Prefect block?).
The parent flow looks up a configuration dictionary that defines that, at this very moment, TUNet+ is being served by NERSC over SFAPI. The configuration may also store some parameters that are very specific to running TUNet+ at NERSC over SFAPI.
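
The steps above could be sketched roughly as follows. This is only an illustration of the lookup-and-dispatch idea in plain Python, not a proposal for the actual implementation: `JobTypeConfig`, `JOB_TYPES`, `LAUNCHERS`, `list_job_types`, and `dispatch` are all hypothetical names, the launchers are stubs that only describe what they would launch, and the configuration is an in-memory dictionary standing in for whatever ends up holding it (file system, Prefect block, ...).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class JobTypeConfig:
    """Hypothetical server-side record describing how a job type is launched."""
    backend: str  # e.g. "conda", "sfapi", "globus_flows"
    params: Dict[str, Any] = field(default_factory=dict)


# Hypothetical configuration: what each job type maps to "at this very moment".
JOB_TYPES: Dict[str, JobTypeConfig] = {
    "TUNet+": JobTypeConfig(backend="sfapi", params={"facility": "NERSC"}),
}


def _stub_launcher(backend: str) -> Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]:
    """Build a placeholder launcher that only reports what it would run."""
    def launch(model_params: Dict[str, Any], infra_params: Dict[str, Any]) -> Dict[str, Any]:
        return {"backend": backend, "model_params": model_params, "infra_params": infra_params}
    return launch


# One launcher per backend; a real one would submit the job instead of describing it.
LAUNCHERS = {name: _stub_launcher(name) for name in ("conda", "sfapi", "globus_flows")}


def list_job_types() -> List[str]:
    """What an app would show the user: job type names only, no infrastructure details."""
    return sorted(JOB_TYPES)


def dispatch(job_type: str, model_params: Dict[str, Any]) -> Dict[str, Any]:
    """Generic parent step: look up the configuration, then hand off to the matching launcher."""
    config = JOB_TYPES[job_type]
    return LAUNCHERS[config.backend](model_params, config.params)
```

With this shape, the app only calls `list_job_types()` and `dispatch("TUNet+", {...})`; moving TUNet+ to another facility would mean changing the entry in `JOB_TYPES`, not the app.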

This seems pretty flexible. In some cases the user might care very much where the model is run, in which case the configuration could be named "TUNet+ at NERSC"; in other cases, we think the user will not care.

This brings up the question: what is the best way for the segmentation app to get a list of the current configurations? @Wiebke @taxe10

Wiebke · 2024-06-20T15:31:50Z

Based on an in-person discussion, we will explore using Prefect Blocks to store job configuration details concerning infrastructure. It will be up to the developers setting up Prefect to define blocks for their local infrastructure and to support the logic for switching between multiple compute resources (e.g. local vs. a supercomputer such as NERSC).

Our first iteration will need to support the current use of the flows defined in mlex_prefect_worker in mlexchange/mlex_highres_segmentation and mlexchange/mlex_latentspaceexplorer, as well as the compute infrastructure currently in use: podman, conda and mamba environments, and Slurm.
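
To make the infrastructure list above concrete, here is a minimal sketch of how one stored configuration could be turned into a launch command for each of the environments mentioned. It is an assumption-laden illustration, not how mlex_prefect_worker does it: `build_command` and the configuration keys (`type`, `env`, `image`, `partition`) are hypothetical, and only basic, well-known options of each CLI are used (`conda run -n`, `podman run --rm`, `sbatch --partition --wrap`).

```python
import shlex
from typing import Dict, List


def build_command(config: Dict[str, str], script: List[str]) -> List[str]:
    """Turn a (hypothetical) infrastructure config into an argv list for `script`."""
    kind = config["type"]
    if kind in ("conda", "mamba"):
        # Run the script inside a named conda/mamba environment.
        return [kind, "run", "-n", config["env"], *script]
    if kind == "podman":
        # Run the script inside a container image, removing the container afterwards.
        return ["podman", "run", "--rm", config["image"], *script]
    if kind == "slurm":
        # Submit the script as a wrapped one-line batch job.
        return ["sbatch", f"--partition={config['partition']}", f"--wrap={shlex.join(script)}"]
    raise ValueError(f"Unknown infrastructure type: {kind!r}")
```

For example, `build_command({"type": "conda", "env": "mlex"}, ["python", "train.py"])` yields an argv list that could be passed to `subprocess.run`; the same dictionary shape could be what a block stores.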
