Dispatch jobs based on internal configuration #23

Open
dylanmcreynolds opened this issue May 14, 2024 · 1 comment

Comments

@dylanmcreynolds
Member

dylanmcreynolds commented May 14, 2024

Right now, MLEX apps must know a great deal about the infrastructure that launches training and inference jobs. We would like an app to be able to present the user with a selection of job types and simply run the one they choose. The job type would then be a server-side configuration that maps to the details of how to launch the job, e.g.:

  • run locally through conda
  • run on HPC facility A through SFAPI
  • run on HPC facility B through Globus Flows

A quick summary example:

  1. In the segmentation app, the user selects TUNet+ and their model parameters
  2. The app starts a generic parent flow in Prefect.
  3. The parent flow looks up a local configuration (from the file system, or prefect block?)
  4. The parent flow finds a configuration dictionary specifying that, at this moment, TUNet+ is being served by NERSC over SFAPI. The configuration may also store parameters that are specific to running TUNet+ at NERSC over SFAPI.
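
The steps above can be sketched in plain Python. This is only an illustration of the lookup-and-dispatch idea, not an existing MLEX or Prefect API: `JobTypeConfig`, `JOB_TYPES`, `LAUNCHERS`, `parent_flow`, `list_job_types`, the backend names, and every parameter value are hypothetical, and the launchers only describe what they would start instead of contacting any facility.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class JobTypeConfig:
    """Server-side entry describing how one job type is currently launched."""
    backend: str                                   # key into LAUNCHERS
    backend_params: Dict[str, Any] = field(default_factory=dict)


# Hypothetical server-side configuration: job type name -> launch details.
JOB_TYPES: Dict[str, JobTypeConfig] = {
    "TUNet+": JobTypeConfig("nersc_sfapi", {"queue": "example-queue"}),
    "TUNet+ (local)": JobTypeConfig("local_conda", {"env": "example-env"}),
}


def _describe(backend: str) -> Callable[[dict, dict], dict]:
    """Build a stand-in launcher that reports what it would run."""
    def launcher(model_params: dict, backend_params: dict) -> dict:
        return {
            "backend": backend,
            "model_params": model_params,
            "backend_params": backend_params,
        }
    return launcher


# One launcher per way of running a job (all stand-ins here).
LAUNCHERS: Dict[str, Callable[[dict, dict], dict]] = {
    "local_conda": _describe("local_conda"),
    "nersc_sfapi": _describe("nersc_sfapi"),
    "globus_flows": _describe("globus_flows"),
}


def list_job_types() -> List[str]:
    """What an app could show the user: names only, no infrastructure details."""
    return sorted(JOB_TYPES)


def parent_flow(job_type: str, model_params: dict) -> dict:
    """Generic parent flow: look up the configuration, then dispatch."""
    config = JOB_TYPES[job_type]            # raises KeyError for unknown types
    launcher = LAUNCHERS[config.backend]
    return launcher(model_params, config.backend_params)


result = parent_flow("TUNet+", {"depth": 4})
```

With this shape the app only ever passes a job type name and model parameters; changing where TUNet+ runs means editing the `JOB_TYPES` entry on the server, not the app.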

This seems pretty flexible. In some cases the user might care very much where the model is run, in which case the configuration could be named TUNet+ at NERSC; in other cases, we think the user will not care.

This brings up the question: what is the best way for the segmentation app to get a list of the current configurations? @Wiebke @taxe10

@Wiebke
Member

Wiebke commented Jun 20, 2024

Based on an in-person discussion, we will explore using Prefect Blocks to store job configuration details concerning infrastructure. It will be up to the developers setting up Prefect to define blocks for their local infrastructure and to support the logic for switching between multiple compute resources (e.g. local vs. a supercomputer such as NERSC).
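
As a rough sketch of what such per-infrastructure configuration could hold, the example below uses plain dataclasses in place of real blocks so that it runs without a Prefect server. In an actual setup each class would presumably subclass Prefect's `Block` and be stored and retrieved with `save` and `load`; the class names, the `BLOCKS` registry, `load_infra`, and all values (environment name, image, partition, time limit, script name) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Infrastructure(Protocol):
    """Anything that can turn a script invocation into a launch command."""
    def command(self, script: str, args: List[str]) -> List[str]: ...


@dataclass
class CondaInfra:
    env_name: str

    def command(self, script: str, args: List[str]) -> List[str]:
        # Run the script inside a named conda environment.
        return ["conda", "run", "-n", self.env_name, "python", script, *args]


@dataclass
class PodmanInfra:
    image: str

    def command(self, script: str, args: List[str]) -> List[str]:
        # Run the script inside a container, removing it afterwards.
        return ["podman", "run", "--rm", self.image, "python", script, *args]


@dataclass
class SlurmInfra:
    partition: str
    time_limit: str
    inner: Infrastructure   # environment used on the compute node

    def command(self, script: str, args: List[str]) -> List[str]:
        # Wrap the inner command in an srun call.
        return [
            "srun",
            f"--partition={self.partition}",
            f"--time={self.time_limit}",
            *self.inner.command(script, args),
        ]


# Stand-in for saved blocks: each deployment defines its own entries.
BLOCKS: Dict[str, Infrastructure] = {
    "local": CondaInfra(env_name="example-env"),
    "hpc": SlurmInfra(
        partition="example-partition",
        time_limit="00:30:00",
        inner=PodmanInfra(image="example/image:latest"),
    ),
}


def load_infra(name: str) -> Infrastructure:
    """Stand-in for loading a block by name."""
    return BLOCKS[name]


local_cmd = load_infra("local").command("train.py", ["--epochs", "1"])
hpc_cmd = load_infra("hpc").command("train.py", ["--epochs", "1"])
```

Switching between compute resources then comes down to which name is loaded, and the flow code that builds and runs the command stays the same.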

Our first iteration will need to support the current use of the flows defined in mlex_prefect_worker in mlexchange/mlex_highres_segmentation and mlexchange/mlex_latentspaceexplorer, as well as the compute infrastructure currently in use: Podman, conda, and mamba environments, and Slurm.
