Represent enzymatic complex enablers according to GO-CAM spec #302

Open
thomaspd opened this issue Oct 30, 2023 · 12 comments
Comments

@thomaspd
thomaspd commented Oct 30, 2023

For Reactome complexes that are controllers of enzymatic reactions, these should be aligned with GO-CAM to specify gene product IDs and not PRO IDs. To do this, we will need to handle two different cases:

  1. If Reactome specifies the catalytic subunit, the enabler of the reaction should be the catalytic subunit and not the complex. The rest of the complex should be ignored for now.
  2. If Reactome does not specify the catalytic subunit, the enabler of the reaction should be a GO protein-containing complex instance (GO:0032991) with has_part links to each protein subunit. Small molecule components of complexes should be ignored for now. So if a Reactome complex is composed of just one gene product and one or more small molecules, then it should be treated the same as case 1 above and connect the activity directly to the gene product without a protein-containing complex individual.
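The two cases above can be sketched as a small decision function. This is only an illustration of the rule as stated, not the converter's actual code: the `Component` and `Enabler` types, the `choose_enabler` name, and the IDs in the usage note are hypothetical, and stoichiometry (for example, several copies of the same gene product) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROTEIN_CONTAINING_COMPLEX = "GO:0032991"


@dataclass
class Component:
    id: str                # hypothetical identifier for a complex member
    is_gene_product: bool  # False for small molecules, which are ignored for now


@dataclass
class Enabler:
    kind: str              # "gene_product" or "complex"
    id: str                # gene product ID, or GO:0032991 for a complex instance
    has_part: List[str] = field(default_factory=list)


def choose_enabler(components: List[Component],
                   active_unit: Optional[str] = None) -> Enabler:
    # Case 1: Reactome names the catalytic subunit, so it becomes the enabler
    # and the rest of the complex is ignored.
    if active_unit is not None:
        return Enabler("gene_product", active_unit)

    # Case 2: no catalytic subunit given. Drop small molecules first.
    proteins = [c.id for c in components if c.is_gene_product]
    if not proteins:
        raise ValueError("complex has no gene product components")

    # One gene product plus small molecules is treated like case 1.
    if len(proteins) == 1:
        return Enabler("gene_product", proteins[0])

    # Otherwise: a protein-containing complex instance with has_part links.
    return Enabler("complex", PROTEIN_CONTAINING_COMPLEX, has_part=proteins)
```

For example, `choose_enabler([Component("GP:A", True), Component("CHEBI:X", False)])` returns a `gene_product` enabler for `GP:A`, while two gene product components with no active unit yield a `GO:0032991` instance with two `has_part` entries.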
@deustp01
Collaborator

Yes! We have treated case 2 as obviously true for a long time, but I'm not sure the code to enable it has ever been implemented in the GO-CAM conversion process. And when it is implemented, it should also generate a report of all the catalystActivity instances that it fixed in this way, to be fed back to Reactome to patch the Reactome annotations that are the single source of truth here. @dustine32 ?

@dustine32
Collaborator

@deustp01 Yes, we can log out the number 2 cases where the resulting "complex" only has a single has_part GP.

@thomaspd If you can find an example complex with the active unit specified that would help. I'll keep looking too.

@dustine32
Collaborator

Also, asking @huaiyumi for any examples of active subunit annotation in Reactome that I can look for in the BioPAX.

@deustp01
Collaborator

deustp01 commented Oct 31, 2023

There are 1784 catalystActivity instances in our central database whose activeUnit slot is not null. Let me figure out who to ask here to get you a table of the subset of these instances that has actually been released. We should be able to generate a table of the dbID of each instance, its physicalEntity (the complex), its activeUnit (the individual EWAS gene product), and the dbID and name of the reaction in which it occurs. Are there other attributes you'd want in the table?

@deustp01
Collaborator

deustp01 commented Nov 2, 2023

@dustine32 but meanwhile, here is a short arbitrary list of catalystActivity instances whose physicalEntity is a heteromeric complex and whose activeUnit is a protein monomer, as a starting point to begin to explore the BioPAX to see what can be done on point 2, above, and what a useful format would be for bulk processing.

https://reactome.org/content/schema/instance/browser/1806156
https://reactome.org/content/schema/instance/browser/5676051
https://reactome.org/content/schema/instance/browser/6798176
https://reactome.org/content/schema/instance/browser/1806283
https://reactome.org/content/schema/instance/browser/8868073
https://reactome.org/content/schema/instance/browser/5358378
https://reactome.org/content/schema/instance/browser/109879
https://reactome.org/content/schema/instance/browser/9836928

Each URL points to a page that lists the names and dbIDs of the heteromeric complex, the protein monomer activeUnit, and the reaction that the catalystActivity mediates.

I can also make a list of sample catalystActivity instances where the physicalEntity is a set of heteromeric complexes and the activeUnit is a set of monomers or a set of subcomplexes, as well as cases where the heteromeric complex involves both protein and non-protein (RNA, DNA, or small-molecule) subunits, if any of those are of interest.

I hope, from this test material, we can figure out what you need in a comprehensive list.

@ukemi

ukemi commented Nov 2, 2023

@dustine32 Does the catalyst activity here help? R-HSA-21271

@dustine32
Collaborator

@deustp01 @ukemi Thank you for these examples! I don't really need the full list of all activeUnits as these few helped me find where in the BioPAX I can expect to find them. An example for reaction R-HSA-1675883:

  <bp:Catalysis rdf:ID="Catalysis1698">
    <bp:controller rdf:resource="#Complex3671" />
    <bp:controlled rdf:resource="#BiochemicalReaction3397" />
    <bp:controlType rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTIVATION</bp:controlType>
    <bp:xref rdf:resource="#RelationshipXref3080" />
    <bp:xref rdf:resource="#RelationshipXref3090" />
    <bp:dataSource rdf:resource="#Provenance1" />
    <bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">activeUnit: #Protein9680</bp:comment>
  </bp:Catalysis>

Here, the activeUnit, which eventually points to PI4KB [Golgi membrane] (Homo sapiens) for complex ARF1/3:GTP:PI4KB, is embedded in a comment field. Not the greatest feeling about this placement but it'll definitely do for now!
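As a rough sketch of how the activeUnit could be pulled out of that comment field, something like the following might work. It assumes the comment text always has the form `activeUnit: #<ID>` seen in the single example above, and it wraps a trimmed copy of the fragment in an `rdf:RDF` root with the standard RDF and BioPAX Level 3 namespaces so it parses on its own. The `active_units` function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed comment format, based only on the one example in this thread.
ACTIVE_UNIT = re.compile(r"^\s*activeUnit:\s*#(\S+)")


def active_units(biopax_xml: str) -> dict:
    """Map each Catalysis rdf:ID to (controller ID, [activeUnit IDs])."""
    root = ET.fromstring(biopax_xml)
    result = {}
    for cat in root.iter(f"{{{BP}}}Catalysis"):
        cat_id = cat.get(f"{{{RDF}}}ID")
        controller = cat.find(f"{{{BP}}}controller")
        controller_id = None
        if controller is not None:
            controller_id = controller.get(f"{{{RDF}}}resource", "").lstrip("#")
        units = []
        for comment in cat.findall(f"{{{BP}}}comment"):
            match = ACTIVE_UNIT.match(comment.text or "")
            if match:
                units.append(match.group(1))
        result[cat_id] = (controller_id, units)
    return result


# Trimmed copy of the fragment above, wrapped so that it is well-formed.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Catalysis rdf:ID="Catalysis1698">
    <bp:controller rdf:resource="#Complex3671" />
    <bp:controlled rdf:resource="#BiochemicalReaction3397" />
    <bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">activeUnit: #Protein9680</bp:comment>
  </bp:Catalysis>
</rdf:RDF>"""

print(active_units(SAMPLE))
# {'Catalysis1698': ('Complex3671', ['Protein9680'])}
```

The units are collected in a list because a Catalysis element could in principle carry more than one such comment; whether the export actually does that is not something this example shows.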

@deustp01
Collaborator

deustp01 commented Nov 2, 2023

is embedded in a comment field

If I understand what you're saying correctly, yes, if you look at the instancebrowser view for the EWAS PI4KB [Golgi membrane] (Homo sapiens) its role as the activeUnit of a complex involved in catalysis is shown as a comment. But if you come at the annotation from the other direction - start with the catalystActivity instance 1-phosphatidylinositol 4-kinase activity of ARF1/3:GTP:PI4KB [Golgi membrane] then its role is shown as an attribute. Or am I misunderstanding the problem?

Also, it makes sense to me to work starting from reactions that have associated catalystActivities, systematically looking at what the physicalEntity of each catalystActivity is, and if that physicalEntity is not an EWAS or set of EWASs, then proceed further to see if it fits case 2 above.

Also also, if a by-product of this survey were a list of catalystActivities where the physicalEntity is a complex or set of complexes but the activeUnit slot is null, that list would be the starting point for re-curation to fill the empty slots. And if in each case the components of the complex could be checked in the central GO annotation file to see if any have been assigned the same GO molecular function as Reactome has assigned to the whole complex, that would make the re-curation process at Reactome much faster and more reliable. @ukemi I know we talked about something like this with Ben Good; I don't know how close he got to implementing it.

Or does this last part duplicate work you've already done to generate the tables described in #296 (which I haven't looked at yet)?
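To make the survey idea above concrete, here is a minimal sketch. The `CatalystActivity` record, the `recuration_candidates` function, and the layout of `go_annotations` (gene product ID mapped to a set of GO molecular function IDs) are all hypothetical stand-ins for whatever the real data sources provide, and the check uses exact GO term matches only, with no ontology reasoning.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class CatalystActivity:
    db_id: int
    go_mf: str               # GO molecular function assigned to the whole complex
    entity_is_complex: bool  # True if physicalEntity is a complex (or set of complexes)
    components: List[str]    # gene product IDs of the complex subunits
    active_units: List[str]  # empty when the activeUnit slot is null


def recuration_candidates(
    activities: List[CatalystActivity],
    go_annotations: Dict[str, Set[str]],
) -> List[Tuple[int, List[str]]]:
    """List (dbID, suggested subunits) for complexes with an empty activeUnit slot."""
    rows = []
    for ca in activities:
        if not ca.entity_is_complex or ca.active_units:
            continue  # not a complex, or already curated
        # Subunits already annotated with the same molecular function as the complex.
        suggested = [gp for gp in ca.components
                     if ca.go_mf in go_annotations.get(gp, set())]
        rows.append((ca.db_id, suggested))
    return rows
```

A row whose suggestion list is empty still marks a catalystActivity that needs re-curation; a row that suggests every subunit of the complex is probably not informative and would need a curator's judgment.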

@deustp01
Collaborator

deustp01 commented Nov 13, 2023

And if in each case the components of the complex could be checked in the central GO annotation file to see if any have been assigned the same GO molecular function as Reactome has assigned to the whole complex, that would make the re-curation process at Reactome much faster and more reliable. @ukemi I know we talked about something like this with Ben Good; I don't know how close he got to implementing it.

That failed - in many cases the catalystActivity of the whole complex has been assigned to all of its protein components - but perhaps re-doing it with the cleaned-up fly set of complex component functions would yield good results.

@deustp01
Collaborator

deustp01 commented Nov 13, 2023

Summarizing the discussion so far as a to-do list.

  • Identify cases where Reactome has assigned a catalystActivity to an entire heteromeric complex but experimental data identify one of the gene product components of the complex as capable of the catalystActivity. Annotate that component of the complex as its activeUnit.
  • In the case of a homomeric complex, creation of activeUnit annotations is formally redundant, but is it useful to support accurate parsing of Reactome BioPAX entries into GO-CAM models?
  • In cases where the catalystActivity is an emergent property of two or more gene product components of the complex, current GO documentation suggests that correct GO annotation is to assert that all of these components "contribute to" the catalystActivity. In Reactome, the activeUnit attribute of a catalystActivity instance can be multivalued. Could we enforce a new curation standard in Reactome, connected to a new rule for parsing BioPAX into GO-CAM, that if the activeUnit slot has a single gene product entry, that gene product "enables" the catalystActivity while if the slot has more than one value, each of those gene products "contributes to" the molecular activity?
  • To annotate regulatory subunits of multimeric complexes, Reactome allows creation of regulation instances whose physicalEntity is a complex and whose [regulatory] activeUnit is a gene product subunit of the complex, so regulatory abilities of individual subunits of the catalytic complex could be annotated just as catalytic abilities are, and emergent regulatory abilities of two or more subunits could be annotated just as emergent catalytic activities are.
  • Can we capture
  • In all of this, can the cleaned-up list of annotations of specific functions to specific components of Drosophila complexes (previous comment) be useful to predict specific functions of specific components of the homologous human complexes?
    Tag @huaiyumi @vanaukenk @sjm41 to be sure they're on this ticket
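The "enables" versus "contributes to" rule floated in the third bullet could look roughly like this. It is a sketch of a proposal that is still an open question in this thread, not an agreed standard; the function name and the relation strings are placeholders rather than the identifiers the converter would actually emit.

```python
from typing import Dict, List


def relations_for_active_units(active_units: List[str]) -> Dict[str, str]:
    """Assign a relation to each gene product in a (possibly multivalued) activeUnit slot."""
    # Drop repeated entries while keeping the original order.
    unique = list(dict.fromkeys(active_units))
    if not unique:
        return {}  # null slot: nothing to assert from activeUnit alone
    # Single value: that gene product "enables" the activity.
    # Several values: each one "contributes to" it.
    relation = "enables" if len(unique) == 1 else "contributes_to"
    return {gp: relation for gp in unique}
```

So `relations_for_active_units(["GP:A"])` gives `{"GP:A": "enables"}`, and a slot holding `GP:A` and `GP:B` gives `contributes_to` for both.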

@deustp01
Collaborator

deustp01 commented Nov 14, 2023

@deustp01 Yes, we can log out the number 2 cases where the resulting "complex" only has a single has_part GP.

@dustine32 @ukemi just to document the current state / need, here is an active unit in the first reaction of the "carnitine biosynthesis" pathway and in the derived GO-CAM. The physical entity is a complex involving one copy of one gene product and one copy each of a couple of chemical entities. Can the GO-CAM generation script be re-done to identify the gene product and make it the enabler?

[Screenshot 2023-11-14 at 5:47:50 PM]

Or if that is hard or dangerous, can we plan to generate a list (partial is OK to start) of the number 2 cases that we can use to figure out how to bulk-edit the Reactome annotations to add the missing activeUnit annotations, so that the existing GO-CAM generation script can use them? (A practical issue here is whether David and I, as we (re)curate pathways, should add this information manually as part of our work, or leave it out because a script will soon be available to do it automatically.)

@ukemi

ukemi commented Sep 5, 2024

@dustine32 I think this proposal is directly in line with what we had talked about on the call today. I think you have already done it for when there is a single GP as the enabler, but we should do it with the complexes too. For these cases, would it be possible to use the UniProt GCRP identifiers instead of a REACTO id? We need to start weeding away the Reacto identifiers.
