Create a "non-data" full release pipeline (ontology metadata curation only) #382

kltm · 2024-07-23T22:32:10Z

To support the joint pipeline with GOA, we want to create a high-frequency, high-success pipeline that produces all of the data products GOA needs to complete its part of the pipeline.

We want to produce:

- Ontology
- Users, groups, dbxrefs, and other metadata
- Curation tool resources
  - PAINT annotations
  - Noctua-derived annotations (standard and GO-CAM)
  - "Automated upstreams" (MGI)

These should be produced in a way that enables easy pickup and signalling for GOA.

kltm · 2024-07-23T22:32:43Z

Current discussion is looking at:

- daily
- dated/released
- signalled to GOA
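
As a rough illustration of "dated/released" plus "signalled", one option is a small manifest published alongside each dated release, which a consumer could poll and verify. This is only a sketch under assumptions: the function name `build_manifest`, the manifest fields, and the `ontology/go.obo` path are hypothetical and do not describe the actual pipeline layout.

```python
import hashlib
import json
from datetime import date


def build_manifest(release_date: date, payloads: dict[str, bytes]) -> dict:
    """Describe one dated release so a consumer can detect and verify it.

    `payloads` maps a relative file name to its content (hypothetical input).
    """
    prefix = release_date.isoformat()  # dated prefix, e.g. "2024-07-23"
    files = [
        {
            "path": f"{prefix}/{name}",
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for name, data in sorted(payloads.items())
    ]
    return {"release": prefix, "files": files}


if __name__ == "__main__":
    # Hypothetical single-file release, for illustration only.
    manifest = build_manifest(
        date(2024, 7, 23),
        {"ontology/go.obo": b"format-version: 1.2\n"},
    )
    print(json.dumps(manifest, indent=2))
```

A consumer would only need to watch for a new manifest and compare checksums before picking up the files; whether the signal is a file, a webhook, or something else is left open here.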

kltm · 2024-07-23T22:33:00Z

Tagging @pgaudet

kltm · 2024-07-23T22:47:38Z

I'm considering doing a full "raw data" release, including Zenodo and a CF endpoint. This may actually be easiest, as it mirrors what we already do: basically the first stage of snapshot plus the second stage's "publish" step. I'll want to check the size and whether Zenodo can digest it; we want this fully automated and on smooth rails. Maybe skip Zenodo, as we will still have the full release there.

kltm · 2024-08-08T19:11:18Z

Basing around "raw-data"

kltm · 2024-08-14T23:50:56Z

I'm doing some exploring of a partial run. Looking at what I have, I expect all raw upstreams and first-order products (excluding blazegraph and solr) to run about 10G. This puts us well under typical limits for Zenodo and well under our usual publications (which clock in at nearly 50G). If working weekly, this would allow us to use a monthly buffer (on its own or with an S3 lifecycle) or Zenodo as transport without incurring too much overhead or cost.
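
The buffer estimate can be sanity-checked with a little arithmetic. The sketch below takes the roughly 10G per release from the comment above; the 30-day retention window and the helper name `buffer_size_gb` are assumptions made for illustration, and rounding up gives a conservative upper bound.

```python
import math


def buffer_size_gb(release_gb: float, cadence_days: int, retention_days: int) -> float:
    """Steady-state storage if every release is kept for `retention_days`."""
    # Round up so a partially elapsed interval still counts as a retained release.
    retained_releases = math.ceil(retention_days / cadence_days)
    return retained_releases * release_gb


if __name__ == "__main__":
    # Assumed 30-day retention; 10G per release is the estimate from the comment.
    print("weekly:", buffer_size_gb(10, cadence_days=7, retention_days=30), "G")
    print("daily: ", buffer_size_gb(10, cadence_days=1, retention_days=30), "G")
```

Under these assumptions a weekly cadence retains five releases, which is in the same range as one of the usual full publications, while a daily cadence with the same retention would be several times larger.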

kltm · 2024-08-22T19:54:30Z

From talking to @pgaudet, I think I'll move raw-data.geneontology.org a little closer to where we want it to be by removing "annotations/" and "blazegraph/".

kltm · 2024-08-22T21:49:45Z

TBD, after talking to Alex: the best way to package and communicate our data for remote processing and re-ingest.

kltm added the enhancement label Jul 23, 2024

kltm self-assigned this Jul 23, 2024

kltm added this to GOC / GOA Joint Pipeline Jul 23, 2024

kltm moved this to In Progress in GOC / GOA Joint Pipeline Jul 23, 2024

kltm changed the title ~~Create a "non-data" full release pipeline (ontology and metadata only)~~ Create a "non-data" full release pipeline (ontology metadata curation only) Jul 23, 2024

kltm mentioned this issue Jul 25, 2024

Rolling ontology build should have a time stamp #384

Open

kltm mentioned this issue Aug 8, 2024

Missing GO term pombase/canto#2852

Closed

kltm added a commit that referenced this issue Aug 8, 2024

inital setup for a try at raw data pipeline; for #382

336b2be

kltm added a commit that referenced this issue Aug 8, 2024

merge-ish; for #382

d9dc3a3

kltm added a commit that referenced this issue Aug 8, 2024

doc for #382

063b8bb

kltm added a commit to geneontology/go-site that referenced this issue Aug 8, 2024

attempt to use dummy files experiment with parameters of partial pipeline; for geneontology/pipeline#382

412ee8c

kltm added a commit that referenced this issue Aug 8, 2024

shift to experimental repos for test of #382

046201a

kltm added a commit that referenced this issue Aug 9, 2024

try and paper over goa_uniprot_all issues; for #382

7249b5e

kltm added a commit to geneontology/go-site that referenced this issue Aug 11, 2024

alter the creation of the noiea version to something trivial, as the files are now the same (empty) and it seems to be tripping up for some reason; for geneontology/pipeline#382

03fccde

kltm added a commit to geneontology/go-site that referenced this issue Aug 11, 2024

make non-gzipped version of empties; for geneontology/pipeline#382

20bbb9b

kltm added a commit to geneontology/go-site that referenced this issue Aug 11, 2024

syntax whoops; for geneontology/pipeline#382

0bf85ee

kltm added a commit that referenced this issue Aug 14, 2024

add go-raw-data to watchdog, add in metadata and publish, add appropriate targets and flush for raw-data.geneontology.org; for #382

62d0596

kltm added a commit that referenced this issue Aug 15, 2024

references no longer done here; #382

0d9cac4

kltm added a commit that referenced this issue Aug 16, 2024

bucket name error; for #382

b686633

kltm added a commit that referenced this issue Aug 22, 2024

try and trim for a slightly more compact profile--no blazegraph or 'primary' annotations; for #382

7687e6b

kltm mentioned this issue Oct 2, 2024

Assemble final second stage post-GOA with derivatives-from-goa pipeline #393

Open
