Hardening the computational backend #1730

Open
3 of 14 tasks
mguidon opened this issue Sep 23, 2024 · 0 comments
Labels: PO issue (Created by Product owners)

mguidon commented Sep 23, 2024

There are a few issues in the computational backend that block us from scaling up: jobs can get stuck in the queue, and EC2 instances are not terminated properly.

GOAL: Users can scale up to dozens of EC2 machines of different types.

In addition, the Task Manager in the sim4life application can be improved (see #1731).

This reflects the following PO requirements:

  • submitting jobs to the comp backend, monitoring jobs
  • receiving computed jobs and performing postprocessing

MartinKippenberger

  1. Labels: a:autoscaling, computational clusters
     Assignees: pcrespov, sanderegg
  2. Labels: a:director-v2, computational clusters
     Assignees: sanderegg
  3. Labels: High Priority, a:clusters-keeper, computational clusters
     Assignees: sanderegg
  4. Labels: a:clusters-keeper, computational clusters
     Assignees: sanderegg

Tasks

  1. Labels: a:director-v2
     Assignees: sanderegg
  2. Labels: a:director-v2
     Assignees: sanderegg
  3. Labels: a:autoscaling, a:infra+ops
     Assignees: sanderegg
  4. Labels: a:frontend, a:webserver
     Assignees: odeimaiz, sanderegg
  5. Labels: a:autoscaling, a:dask-service, computational clusters, t:enhancement
     Assignees: sanderegg
  6. Labels: a:autoscaling, t:enhancement
     Assignees: sanderegg
  7. Progress: 1 of 3
     Labels: a:autoscaling, a:frontend, a:webserver, computational clusters, t:enhancement
     Assignees: sanderegg
  8. Labels: a:autoscaling
     Assignees: sanderegg
  9. Labels: a:autoscaling, t:maintenance
     Assignees: matusdrobuliak66
  10. Labels: a:autoscaling, bug
     Assignees: sanderegg
mguidon self-assigned this Sep 23, 2024
mguidon added the PO issue label Sep 23, 2024
pcrespov self-assigned this Sep 26, 2024
pcrespov added this to the MartinKippenberger milestone Sep 27, 2024
pcrespov assigned sanderegg and unassigned mguidon Oct 27, 2024