JupyterHub: Developing a GA4GH TES Service Plugin for JupyterHub – individual cells #7

Open
viktoriaas opened this issue Nov 3, 2024 · 0 comments


viktoriaas commented Nov 3, 2024

Why?

Jupyter Notebook is an application for creating and sharing computational documents. JupyterHub is a way of providing notebooks to multiple users. The benefit is that users gain easy, interactive access to computational resources without needing to install anything.

The GA4GH TES (Task Execution Service) API is a standardized schema and API for describing and executing batch tasks on any underlying computational backend. The full TES specification defines its capabilities.
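To make the TES side concrete: a task is a JSON document POSTed to the `/ga4gh/tes/v1/tasks` endpoint, and the server answers with the new task's `id`. The sketch below uses only the Python standard library to build (but not send) such a request for a task with a single executor. `tes.example.org`, the `alpine:3.20` image and the helper name `build_create_task_request` are placeholders chosen for illustration.

```python
import json
import urllib.request

# Task endpoint path from the TES v1 specification; the host used below is a placeholder.
TES_TASKS_PATH = "/ga4gh/tes/v1/tasks"


def build_create_task_request(base_url: str, task: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a TES task.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + TES_TASKS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A minimal task: one executor running one command in one container image.
hello_task = {
    "name": "hello-tes",
    "executors": [
        {
            "image": "alpine:3.20",
            "command": ["echo", "hello from TES"],
        }
    ],
}

req = build_create_task_request("https://tes.example.org", hello_task)
print(req.get_method(), req.full_url)  # POST https://tes.example.org/ga4gh/tes/v1/tasks
# Sending it would be urllib.request.urlopen(req); the server replies with {"id": "..."}.
```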

The goal of this issue is to develop, or lay the foundations for, a GA4GH TES service plugin for JupyterHub that executes individual notebook cells on a TES instance.

Objective: Build a plugin or extension for JupyterHub that provides seamless access to GA4GH TES and streamlines federated task submission. The plugin will focus on executing a single cell through TES.

Scope: Focus on plugin development, installation instructions, and usage documentation so administrators can easily deploy it across ELIXIR nodes.

More information and useful links: document online

How?

This is a larger meta-issue that may (and should) prompt discussion. Here are some pointers to help:

Considerations:

  • Core Components
    • TES Client Library: You'll need a Python client library (Python being the language of Jupyter's default IPython kernel) to interact with the TES instance. This library will handle:
      • Constructing TES task requests based on notebook cell content.
      • Submitting these tasks to the TES server.
      • Monitoring task execution status.
      • Retrieving results and outputs.
    • Notebook Integration: Develop a mechanism within the Jupyter notebook environment to:
      • Identify code cells to be executed on TES (perhaps via a magic command like %%tes or a dedicated cell tag).
      • Extract code and dependencies from these cells.
      • Package them into a format suitable for TES (e.g., a Docker image).
      • Display task status and results within the notebook.
    • Cell Identification: Use a magic command (e.g., %%tes) or a cell tag to mark cells for TES execution.
    • Dependency Management: Automatically detect or allow users to specify required packages for the code in the cell.
    • Task Creation: The client library will generate a TES task definition:
      • Inputs: Code from the cell, required data, and dependency specifications.
      • Container: A Docker image containing the execution environment (with the necessary packages).
      • Command: The command that executes the code within the container.
      • Outputs: Specify where to store the results (files, object storage).
    • Submission and Monitoring: Submit the task to TES and provide visual feedback in the notebook (e.g., a progress bar, status updates).
    • Result Retrieval: Fetch outputs from TES and display them in the notebook (e.g., print output, display plots).
  • Implementation Considerations
    • Security: Securely handle authentication and authorization to the TES instance.
    • Scalability: Design for efficient execution of large notebooks with many cells and complex dependencies.
    • Usability: Provide a user-friendly interface within the notebook for TES interaction.
    • Flexibility: Let users choose among multiple TES instances and customize task parameters.
  • Tools and Technologies
    • TES Implementations: Funnel, TESK, TES on Azure
    • Python TES Client: py-tes
    • Docker: For containerization
    • Jupyter Extensions: To enhance the notebook interface
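A minimal sketch of the Task Creation and Dependency Management points, assuming the cell body is passed to TES inline through an input's `content` field and that user-declared packages are installed with `pip` when the task starts. This swaps in a different packaging strategy from the per-cell Docker image described above: it avoids building and pushing an image for every cell, at the cost of reinstalling packages on each run. The helper `cell_to_tes_task`, the `python:3.12-slim` default image and the `/tes/cell.py` path are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def cell_to_tes_task(
    cell_source: str,
    image: str = "python:3.12-slim",
    packages: Optional[List[str]] = None,
    outputs: Optional[Dict[str, str]] = None,
    cpu_cores: int = 1,
    ram_gb: float = 2.0,
) -> dict:
    """Translate one notebook cell into a TES v1 task document (as a dict).

    Hypothetical helper: the image, the in-container path and the
    pip-install step are illustrative assumptions, not project decisions.
    """
    script_path = "/tes/cell.py"  # assumed location of the cell code inside the container
    run = f"python {shlex.quote(script_path)}"
    if packages:
        # Install user-declared dependencies at task start instead of baking a new image.
        install = "pip install --quiet " + " ".join(shlex.quote(p) for p in packages)
        run = f"{install} && {run}"

    return {
        "name": "jupyter-cell",
        "inputs": [
            # TES lets small files be passed inline via "content" instead of a "url".
            {"path": script_path, "type": "FILE", "content": cell_source},
        ],
        "outputs": [
            # Maps container paths to storage URLs chosen by the user.
            {"path": path, "url": url, "type": "FILE"}
            for path, url in (outputs or {}).items()
        ],
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
        "executors": [
            {"image": image, "command": ["sh", "-c", run], "workdir": "/tes"},
        ],
    }
```

For example, `cell_to_tes_task(code, packages=["numpy"])` yields the executor command `sh -c "pip install --quiet numpy && python /tes/cell.py"`. Inline `content` is intended for small files, so larger data would still be staged through input `url`s.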

Example Workflow for individual cell execution

  1. User adds %%tes to a code cell in their Jupyter notebook.
  2. A "Run on TES" button appears next to the cell.
  3. User clicks the button.
  4. The client library packages the code, data, and dependencies into a Docker image.
  5. A TES task is created and submitted.
  6. The notebook displays the task status (e.g., "Queued," "Running," "Complete").
  7. When the task finishes, the results are fetched from TES and displayed in the notebook.
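Steps 6 and 7 amount to polling the task and reading its logs once it stops changing. A sketch, assuming a caller-supplied `get_task` function that performs `GET /ga4gh/tes/v1/tasks/{id}?view=FULL` and returns the parsed JSON (it is injected so the loop can be exercised without a server); `wait_for_task` and `collect_stdout` are hypothetical names. The state names come from the TES specification, which treats executor `stdout` as a convenience field whose content may be partial, so complete results are better fetched from the task's declared `outputs`.

```python
import time
from typing import Callable

# TES task states after which the task will not change again
# (PREEMPTED was introduced in TES 1.1).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED", "PREEMPTED"}


def wait_for_task(
    get_task: Callable[[], dict],
    on_state: Callable[[str], None] = print,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> dict:
    """Poll a TES task until it reaches a terminal state; return the last task document."""
    deadline = time.monotonic() + timeout_seconds
    last_state = None
    while True:
        task = get_task()
        state = task.get("state", "UNKNOWN")
        if state != last_state:
            on_state(state)  # report only transitions, e.g. QUEUED -> RUNNING
            last_state = state
        if state in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {state} after {timeout_seconds} s")
        time.sleep(poll_seconds)


def collect_stdout(task: dict) -> str:
    """Concatenate executor stdout from a FULL-view task document."""
    chunks = []
    for attempt in task.get("logs", []):  # one TaskLog per attempt
        for executor_log in attempt.get("logs", []):  # one ExecutorLog per executor
            chunks.append(executor_log.get("stdout", ""))
    return "".join(chunks)
```

In the notebook, `on_state` could update a status display instead of printing.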

If you want to work on this issue:

  • Assign yourself to the issue (if someone else is already assigned, first ask whether they would welcome help on the issue, or pick another one)
  • Once assigned, move your issue to the "In progress" column on the project board
  • Start working 🚀