First step: Import Reactome Pathways for MOD species #298

Open
2 of 3 tasks
ukemi opened this issue Oct 25, 2023 · 6 comments

@ukemi

ukemi commented Oct 25, 2023

This step provides a baseline assessment of what the orthology-projected pathways imported from Reactome look like. Decisions to be made:

  • Which species?
  • All pathways or a subset?
  • Results of the import

Pathways will be imported as development models.
ShEx and logical checks will be run on the pathways.
Evidence on pathways will be left blank and filled in, when possible, during the second step.
We will explore other ways to capture evidence.

ukemi self-assigned this Oct 25, 2023
ukemi transferred this issue from geneontology/pipeline Oct 25, 2023
@ukemi
Author

ukemi commented Oct 27, 2023

  1. Because we lost information when importing individual pathways at the NYC meeting, we would like to import the whole pathway space for each organism.
  2. Projected models will be generated for all Alliance species. We will closely review fly, worm, and mouse.
  3. @dustine32 @kltm, can the underlying database handle all of these new models? There are 1910 pathways imported from Reactome. If we import all of the Alliance organisms, the maximum will be 1910 x 8 = 15280.
  4. Set up a time frame for when we can do the first run of this. This initial import will give a high-level view of the way things stand right now.
  5. How will these models be used? Cloned or edited? We need to put together some SOPs for the use of these models. What will people use these models for? Duplication will affect over-representation results.

@kltm
Member

kltm commented Oct 27, 2023

@ukemi Re "3". It can be tested, but a ~1/3 increase would be expected to slow things down a bit. With ongoing infrastructure work and other imports going on, I think we need to clearly spell out what the overall growth is expected to be over the next year or so before giving commitments. I'd also like to understand the timeframe in which this conversation is taking place.

@deustp01
Collaborator

"1910 x 8 = 15280"

That is an upper bound. For now, three species (worm, mouse, fly) give an upper bound of 1910 x 3 = 5,730. And if we can find a way of building GO-CAMs for selected pathways with no information loss (rather than having to build a complete GO-CAM set; @dustine32 can explain this better), that may reduce the bound further and slow the growth in the number of GO-CAMs.
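The bounds quoted in this thread are simple multiplication: Reactome pathways × species. A minimal Python sketch, assuming at most one projected model per pathway per species (the function name `projected_model_upper_bound` is hypothetical, for illustration only; actual counts may be lower if some pathways do not project to a given species):

```python
# Upper bound on projected GO-CAM models, ASSUMING at most one model
# per Reactome pathway per species. Real counts may be lower.
REACTOME_PATHWAYS = 1910  # pathway count quoted in this thread


def projected_model_upper_bound(pathways: int, species: int) -> int:
    """Hypothetical helper: maximum number of projected models."""
    if pathways < 0 or species < 0:
        raise ValueError("counts must be non-negative")
    return pathways * species


if __name__ == "__main__":
    # All eight Alliance organisms vs. the three to be reviewed first.
    for label, n in [("all Alliance species", 8), ("fly, worm, mouse", 3)]:
        bound = projected_model_upper_bound(REACTOME_PATHWAYS, n)
        print(f"{label}: {bound:,}")
```

Run as a script, this prints 15,280 for all eight species and 5,730 for fly, worm, and mouse, matching the figures above.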

At the same time, if the Alliance wants integrated (i.e., GO-CAM / Noctua rather than classic atomic GO) annotation and wants to exploit shared biology between model organisms, then something like this project, with something like this demand on resources, seems necessary.

@thomaspd

thomaspd commented Nov 3, 2023

I think this should not be a priority right now. It would make much more sense to finish the conversion of the native (i.e. human) Reactome pathways first: we know there's a lot of work still to be done there, so we know all of the remaining issues will be found in the non-human projected pathways as well.

If this ticket is driven by a specific request from one or more MODs, we should have some larger meetings (not just the Reactome2GO team) to discuss how and when it would make sense to work on it.

@deustp01
Collaborator

deustp01 commented Nov 3, 2023

"I think this should not be a priority right now."

For a large-scale systematic effort, I agree. For pilot work to figure out procedures and strategies, including careful thinking about the resources to do this on a large scale, we need to be at work now. In the specific case of resources / infrastructure, with budgeting done in 5-year chunks, we are stuck planning now for usage several years from now. My own view of tactics is that it makes sense to start with a subgroup of people who are committed to pathway annotation and comfortable, or willing to become comfortable, with the GO-CAM / Noctua curation environment. That work, at that scale, is already underway and looks promising.

It makes no sense to me to tell a potential user like Steven Marygold / FlyBase that he should wait an indefinite period before using the fairly complete metabolism models we can already generate, on the grounds that the whole project must proceed as a monolith and nothing can move until the complexities of signaling and cell cycle progression are definitively sorted out.

A key goal of this initial phase is the development of templates and annotation strategies that others will be comfortable using. It has not escaped our notice that such templates might be a useful way to overcome general reluctance to adopt GO-CAM / Noctua as a human-friendly curation environment.

I'm speaking in part as the PI of the pathways2GO U24 project that will end (really end, not just pause pending possible renewal) on June 30, 2026, at which time work will be continued by the participating organizations per the language of the grant application.

We can discuss this further on Wednesday.

@thomaspd

thomaspd commented Nov 3, 2023

Yes, we can discuss on Wednesday.

If Steven's needs are driving this prioritization decision, then I think we should have that discussion with him and include a larger group. My understanding after talking to him recently was that they are currently looking for a curator for fly metabolic pathway GO-CAMs, and he will reach out when they have hired someone.

I agree that this does not have to be monolithic or indefinite in time frame, but there are remaining items to address even for metabolic pathway conversion to GO-CAM that we should finish first, to make sure we are maximally efficient with curator time, even for a pilot.


6 participants