Add docs framework for how to clone a new DANDI instance #104

Open
wants to merge 78 commits into
base: master

@aaronkanzer aaronkanzer commented Dec 1, 2023

Looking for review for now; no need to merge yet.

These documents provide a step-by-step process for anyone who would like to launch their own DANDI-like ecosystem.

For a live preview, see: https://aquamarine-profiterole-e20e84.netlify.app/

or, more specifically:

https://lincbrain.github.io/handbook/40_initialization/

jwodder commented Dec 1, 2023

@aaronkanzer Why do the instructions say to create an account on PyPI? That should only be done if you're planning to release packages on PyPI, which has nothing to do with interacting with DANDI.

kabilar commented Dec 1, 2023

> @aaronkanzer Why do the instructions say to create an account on PyPI? That should only be done if you're planning to release packages on PyPI, which has nothing to do with interacting with DANDI.

Hi @jwodder, these instructions are meant for the developers of the data archive and associated tools, especially those developing a new DANDI-like ecosystem, which we are doing for the LINC project. Since the DANDI CLI and Python API are a method of interacting with the archive, Aaron added instructions here for releasing the Python package to PyPI. Hope this helps to answer your question.

jwodder commented Dec 1, 2023

@kabilar

> Since the DANDI CLI and Python API are a method of interacting with the archive, Aaron added instructions here for releasing the Python package to PyPI.

Releasing what package? This sentence implies you'll be releasing dandi, which only four people can do, none of which are you.

kabilar commented Dec 1, 2023

> @kabilar
>
> > Since the DANDI CLI and Python API are a method of interacting with the archive, Aaron added instructions here for releasing the Python package to PyPI.
>
> Releasing what package? This sentence implies you'll be releasing dandi, which only four people can do, none of which are you.

Hi @jwodder, we are releasing a clone of the dandi client as the lincbrain client, which is used for interacting with LINC datasets. (Please disregard the current semantic version, as it will be deleted and released as 0.X.0.)

@yarikoptic

Hi @aaronkanzer, should we strive to finalize this PR in some form and merge it? That

  • would help avoid future conflicts
  • would give others immediate access to these useful docs, e.g. we will need to get the gears spinning for the EMBER deployment

etc.

@aaronkanzer

> Hi @aaronkanzer, should we strive to finalize this PR in some form and merge it? That
>
>   • would help avoid future conflicts
>   • would give others immediate access to these useful docs, e.g. we will need to get the gears spinning for the EMBER deployment
>
> etc.

Thanks @yarikoptic -- yes, it would be great to get review from someone other than me. I am currently doing some cleanup here, but review would still be helpful.

It is still very much a living document, as there are more and more things that @kabilar and I are slowly abstracting. Let me know if you'd like to Zoom to discuss how we could make the handbook useful for the EMBER deployment in the short term.

@aaronkanzer aaronkanzer changed the title from "Add docs framework for how to clone a new DANDI instance, add Initialize Vendor Accounts page" to "Add docs framework for how to clone a new DANDI instance" on Oct 25, 2024

aaronkanzer commented Oct 25, 2024

@kabilar @yarikoptic @satra @asmacdo @jwodder @waxlamp @jjnesbitt @mvandenburgh

Hi all, I'd like to start the review process (and get opinions on what is unclear/missing) for this PR. This PR is a brain-dump of essentially "how to clone DANDI" in its current state.

Here is a working doc as well on the thematic differences between DANDI and LINC (a downstream clone-turned-fork).

To make the PR more approachable, I tagged specific users at the top of the relevant pages so that no one is required to review the entire PR.

If you'd like to view the docs live, I've launched a temporary Netlify site -- https://aquamarine-profiterole-e20e84.netlify.app/59_getting_started_replicating_dandi/

Thanks all in advance.

@@ -0,0 +1,6 @@
The DANDI ecosystem includes a self-hosted Jupyter notebook service. This service is orchestrated on a Kubernetes (k8s) cluster

@aaronkanzer aaronkanzer Oct 25, 2024

@asmacdo, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/65_dandi_hub/

Member

With @kabilar's suggestions, LGTM

@@ -0,0 +1,83 @@
# Work In Progress

## Setting up your GitHub OAuth Account

@aaronkanzer aaronkanzer Oct 25, 2024

@waxlamp @jjnesbitt @mvandenburgh, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/61_dandi_authentication/

@@ -0,0 +1,448 @@
# Initialize Vendor Accounts

@aaronkanzer aaronkanzer Oct 25, 2024

@waxlamp @yarikoptic @satra @kabilar, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/60_initialize_vendors/

@asmacdo asmacdo Nov 5, 2024

Does this also need Docker Hub (or some other container registry)?

@@ -0,0 +1,48 @@
For data management (predominately `upload`, `download` and `validation` of data
Member Author

@jwodder @yarikoptic, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/62_dandi_cli/

Member

Thanks, Aaron. Just noting that given your recent developments to push lincbrain-cli changes upstream to the dandi-cli in dandi/dandi-cli#1519, we will need to update these instructions.

@@ -0,0 +1,185 @@
# Work In Progress

## Configuring Terraform
Member Author

@waxlamp @jjnesbitt @mvandenburgh, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/63_dandi_infrastructure/

@@ -0,0 +1,203 @@
This step assumes that you have completed all steps in: [Initialize Vendors](../60_initialize_vendors) & [DANDI Infrastructure](../63_dandi_infrastructure)
Member Author

@waxlamp @jjnesbitt @mvandenburgh @yarikoptic @satra @kabilar, it would be great if you could review this.

For easier reading, here is the staging preview: https://aquamarine-profiterole-e20e84.netlify.app/64_dandi_archive/

Comment on lines +1 to +2
The DANDI ecosystem includes a self-hosted Jupyter notebook service. This service is orchestrated on a Kubernetes (k8s) cluster
that provides different instance types of users to efficiently interact with data in the DANDI Archive.
Member

Suggested change
- The DANDI ecosystem includes a self-hosted Jupyter notebook service. This service is orchestrated on a Kubernetes (k8s) cluster
- that provides different instance types of users to efficiently interact with data in the DANDI Archive.
+ The DANDI ecosystem includes a self-hosted Jupyter notebook service. This service is hosted on AWS and orchestrated with a Kubernetes (k8s) cluster
+ that provides different instance types for users to efficiently interact with data in the DANDI Archive.

Comment on lines +4 to +6
[Proceed to the following README](https://github.com/dandi/dandi-hub/blob/main/README.md#dandihub) to see how you can
set up your own DANDI Hub -- **Note: it is important that your k8s cluster is in the same region
as your data**
Member

Suggested change
- [Proceed to the following README](https://github.com/dandi/dandi-hub/blob/main/README.md#dandihub) to see how you can
- set up your own DANDI Hub -- **Note: it is important that your k8s cluster is in the same region
- as your data**
+ The instructions for configuring and deploying your own JupyterHub instance are available in the [dandi-hub repository](https://github.com/dandi/dandi-hub) (see [README](https://github.com/dandi/dandi-hub/blob/main/README.md#dandihub)).
+ For example configurations that have been previously generated for the DANDI, LINC, and BICAN projects see the [envs directory](https://github.com/dandi/dandi-hub/tree/main/envs).
+ **Note: it is important that your k8s cluster is in the same region as your data.**
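
As a rough illustration of the region note above, the check could be sketched in Python. This is only a sketch under stated assumptions: the function names and the example regions are hypothetical, and in practice the two values would come from your own cluster and bucket configuration. The one S3-specific detail it relies on is that the GetBucketLocation API reports buckets in us-east-1 with an empty `LocationConstraint`, and some older eu-west-1 buckets with the legacy value `EU`.

```python
from typing import Optional


def normalize_bucket_region(location_constraint: Optional[str]) -> str:
    """Map an S3 LocationConstraint value to a region name.

    S3 reports buckets in us-east-1 with an empty (None) LocationConstraint,
    and some older eu-west-1 buckets with the legacy value "EU".
    """
    if not location_constraint:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def same_region(cluster_region: str, location_constraint: Optional[str]) -> bool:
    """Return True when the cluster and the data bucket share a region."""
    return cluster_region == normalize_bucket_region(location_constraint)


if __name__ == "__main__":
    # Hypothetical values; substitute the region of your own cluster and bucket.
    print(same_region("us-east-2", "us-east-2"))
    print(same_region("us-east-2", None))
```

Running a check like this before deploying the hub would flag a region mismatch early, before cross-region data transfer becomes a cost or latency problem.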

kabilar commented Oct 25, 2024

> Hi all, I'd like to start the review process (and get opinions on what is unclear/missing) for this PR. This PR is a brain-dump of essentially "how to clone DANDI" in its current state

Thank you, Aaron. This is great. I will be reviewing over the next week and slowly adding suggestions.


[Proceed to the following README](https://github.com/dandi/dandi-hub/blob/main/README.md#dandihub) to see how you can
set up your own DANDI Hub -- **Note: it is important that your k8s cluster is in the same region
as your data**
Member

Similar to your Google Doc, perhaps we can add some quick links to each page.

Suggested change
- as your data**
+ as your data**
+ Resources
+ 1. [Source code and instructions](https://github.com/dandi/dandi-hub)
+ 1. [DANDI Hub](https://hub.dandiarchive.org/)
+ 1. [LINC Hub](https://hub.lincbrain.org/)

Member Author

Thanks Kabi -- added.

Comment on lines 19 to 21
**Datalad (TBD)**

**git-annex (TBD)**
Member

Those are just tools -- no account is needed, and overall they rely only on the services above (GitHub) and git configuration to run.

But for completeness -- we do need a host (in DANDI's case, the drogon server) with an account under which to run all those additional "cron jobs", hence:

Suggested change
- **Datalad (TBD)**
- **git-annex (TBD)**
+ In addition, a host (a local server or a cloud instance) is needed to run additional services employing DataLad and git-annex.

Member

Are cron jobs part of the infrastructure? Or should they be considered essential?

Member

We do not have a definition of "infrastructure" that sets the boundary.

If we consider https://github.com/dandisets etc. as part of the infrastructure, then yes.

As for "essential" -- likely not, as long as they are not integrated within the dandiarchive.org web UI.

Comment on lines +446 to +448
## datalad (TBD)

## git-annex (TBD)
Member

@yarikoptic needs to finish this up, but likely elsewhere; here we should just describe the setup of the box:

Suggested change
- ## datalad (TBD)
- ## git-annex (TBD)
+ ## A host for extra services
+ Some services are not yet integrated within the main infrastructure:
+ - https://github.com/dandi/backups2datalad - to populate/update https://github.com/dandi/dandisets, https://github.com/dandisets, and https://github.com/dandizarrs/
+ - TODO: heroku logs
+ - TODO: aws s3 access stats dump
+ - TODO: con/tinuous dumps of CI logs
+ - TODO: zarr manifests generation (ATM not on drogon even)
+ - TODO: access stats analysis/plots (yet to be finished/cron deployed)

Member Author

Thanks Yarik -- updated

aaronkanzer commented Oct 30, 2024

@satra @waxlamp @jwodder @jjnesbitt @mvandenburgh @asmacdo -- just wanted to bump this. Any chance you could give it a quick read-through and share feedback?

The outcomes of this handbook will help inform what we automate/abstract into infra-as-code vs. what remains manual, so any feedback is greatly appreciated.

Comment on lines 5 to 17
**Heroku**

**AWS**

**GitHub**

**Terraform Cloud**

**Netlify**

**Sentry**

**PyPI**
Member

I think it would be good to add what each of these services provides to the infrastructure.

Member Author

Thanks Satra -- I included a brief blurb on what each service is responsible for.

style="width: 60%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>
<br/><br/>

Keep this value for further steps.
Member

It would be valuable to know what sizes of instances we have for DANDI and LINC to guide installations of other instances.

Member Author

Sure -- I can add that. They are defined in the Girder extension in the DANDI infrastructure api.tf here: https://github.com/dandi/dandi-infrastructure/blob/master/terraform/api.tf#L14-L18

On this note, has any stress-testing ever been done to evaluate whether these worker sizes are appropriate for DANDI?

style="width: 60%; height: auto; display: block; margin-left: auto; margin-right: auto;"/>
<br/><br/>

Your frontend should be able to deploy to an auto-generated URL via Netlify now! Steps for domain management and configuration are described further in the [Frontend Deployment](../64_dandi_archive/#frontend-deployment) section of the DANDI Archive setup.
Member

A question came up about how many minutes Netlify needs for a DANDI instance.

Member Author

When you say minutes in this context for Netlify, do you mean "build minutes"? (e.g. how long Netlify runners are required to run to deploy?) Or something else?
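
If "minutes" does mean build minutes, a back-of-the-envelope estimate is just deploys per month times minutes per build. A minimal sketch follows; the function name and both numbers are hypothetical and are not measurements from DANDI or LINC.

```python
def monthly_build_minutes(deploys_per_month: int, minutes_per_build: float) -> float:
    """Estimate the build minutes consumed in one month."""
    if deploys_per_month < 0 or minutes_per_build < 0:
        raise ValueError("inputs must be non-negative")
    return deploys_per_month * minutes_per_build


if __name__ == "__main__":
    # Hypothetical: 60 deploys (production plus previews), about 4 minutes each.
    print(monthly_build_minutes(60, 4.0))  # 240.0
```

Actual figures would need to come from the deploy history of an existing instance.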

Labels
None yet
Projects
Status: In Progress
6 participants