GPAD 1.2 to GPAD 2.0 seems to make an error with some relations #397

Open
kltm opened this issue Oct 8, 2024 · 10 comments

Comments

@kltm
Member

kltm commented Oct 8, 2024

GPAD "1.2" input:

UniProtKB	P46934	enables	GO:0004842	PMID:17996703	ECO:0000315			20180501	WB	part_of(GO:0006511),part_of(GO:0006974),occurs_in(GO:0005634),has_input(UniProtKB:P24928),directly_negatively_regulates(GO:0001055)	model-state=production|noctua-model-id=gomodel:5323da1800000002|contributor=https://orcid.org/0000-0002-1706-4196

Produces the GPAD 2.0 output:

UniProtKB:P46934		RO:0002327	GO:0004842	PMID:17996703	ECO:0000315			2018-05-01	WB	BFO:0000050(GO:0006511),BFO:0000050(GO:0006974),BFO:0000066(GO:0005634),RO:0002233(UniProtKB:P24928),RO:0002449(GO:0001055)	model-state=production|noctua-model-id=gomodel:5323da1800000002|contributor=https://orcid.org/0000-0002-1706-4196

Of specific interest to us is that, in the extensions, it looks like directly_negatively_regulates is getting translated to RO:0002449, instead of the expected RO:0002630. Taking a quick look around the ontobio code, while I could find some hardwired stuff in rdfgen/, I didn't see anything that would obviously produce this.
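For reference, a minimal sketch of the translation we would expect for the extension column. This is not ontobio's code: `LABEL_TO_CURIE` and `translate_extensions` are hypothetical names, the mapping covers only the relations in the example row above, and only comma-separated extension units are handled.

```python
import re

# Hypothetical mapping limited to the relations in the example row above.
LABEL_TO_CURIE = {
    "part_of": "BFO:0000050",
    "occurs_in": "BFO:0000066",
    "has_input": "RO:0002233",
    "directly_negatively_regulates": "RO:0002630",  # expected, not RO:0002449
}

# One extension unit looks like relation(target), e.g. part_of(GO:0006511).
UNIT = re.compile(r"^(?P<rel>[^()]+)\((?P<target>[^()]+)\)$")


def translate_extensions(column: str) -> str:
    """Translate relation labels in a comma-separated extension column to CURIEs."""
    translated = []
    for unit in column.split(","):
        match = UNIT.match(unit.strip())
        if match is None:
            raise ValueError(f"malformed extension unit: {unit!r}")
        rel = match.group("rel")
        if rel not in LABEL_TO_CURIE:
            # Fail loudly rather than guess an ID for an unknown label.
            raise KeyError(f"no CURIE known for relation label: {rel!r}")
        translated.append(f"{LABEL_TO_CURIE[rel]}({match.group('target')})")
    return ",".join(translated)
```

Running this on the extension column of the 1.2 line should give the 2.0 extensions shown above, except that the last unit becomes RO:0002630(GO:0001055).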

My instinct is that there would either be an old hard-coded cache or the use of a stale ontology file. I'm also thinking there might be some composition logic going on with enables/.

This is possibly going to be the GO "shadow ticket" for a future ontobio ticket.

Tagging @dustine32 @mugitty @sierra-moxon , as people who probably have the most recent familiarity with the extensions transformation.

Also looping in reporters: @balhoff @vanaukenk @pgaudet

@sierra-moxon
Member

Short term, an update to the hard-coded dict in relations.py might fix this? Longer term, we may want to consider storing the mapping outside the code.
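As a rough illustration of the longer-term idea, here is a sketch of loading the mapping from an external tab-separated source instead of a dict in the code. The two-column layout (label, CURIE), the `EXAMPLE_TSV` stand-in, and the `load_relation_map` name are all assumptions for illustration, not anything that exists in ontobio.

```python
import csv
import io

# Stand-in for an external file; the label<TAB>CURIE layout is an assumption.
EXAMPLE_TSV = "directly negatively regulates\tRO:0002630\nhas input\tRO:0002233\n"


def load_relation_map(handle) -> dict:
    """Read label<TAB>CURIE rows, keying labels with underscores as GPAD 1.2 writes them."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        label, curie = row
        mapping[label.strip().replace(" ", "_")] = curie.strip()
    return mapping


# In practice the handle would be an open file rather than an in-memory string.
relations = load_relation_map(io.StringIO(EXAMPLE_TSV))
```

With the mapping kept in a data file, an update to a relation becomes a data change that can be reviewed and tested separately from the code.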

@kltm
Member Author

kltm commented Oct 8, 2024

@sierra-moxon I would agree with both of those points.

(I missed the .replace(" ", "_"), so also missed how this got into the stream.)

@dustine32 , it looks like you've been most active in there. Have you been handling that manually, or have you had some other process?

@sierra-moxon
Member

sierra-moxon commented Oct 8, 2024

@kltm - also, can you confirm your example is using the same "row"/annotation in both the 1.2 and 2.0 format? I put in a PR, and am writing a test for it.

biolink/ontobio#696

@kltm
Member Author

kltm commented Oct 8, 2024

@sierra-moxon Apologies--copy-paste mistake. I've updated the initial issue above to the correct lines: #397 (comment)

@dustine32
Contributor

@kltm Yes, I've just been manually updating this hard-coding. It would be nice to eventually read in RO and do simple label->ID lookups. I think the initial impetus for this relations.py lookup was to have some control over transforming data during MOD imports, though we could also control it in gocamgen.py here.
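A minimal sketch of what such a label->ID lookup could look like, assuming RO is available as OBO-style text with `[Typedef]` stanzas carrying `id:` and `name:` lines. The sample stanzas are hand-written for illustration (a real run would read a full RO release file, whose layout may differ), and `label_to_id` is a hypothetical helper.

```python
# Hand-written sample in OBO stanza style, not an excerpt from an RO release.
SAMPLE_OBO = """\
[Typedef]
id: RO:0002630
name: directly negatively regulates

[Typedef]
id: RO:0002233
name: has input
"""


def label_to_id(obo_text: str) -> dict:
    """Map each [Typedef] stanza's name to its id."""
    lookup = {}
    in_typedef = False
    current_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Typedef] stanzas describe relations.
            in_typedef = line == "[Typedef]"
            current_id = None
        elif in_typedef and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_typedef and line.startswith("name: ") and current_id:
            lookup[line[len("name: "):]] = current_id
    return lookup


ro_labels = label_to_id(SAMPLE_OBO)
```

The labels here contain spaces, so a caller working from GPAD 1.2 would still need to convert underscores before looking a relation up.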

I'm guessing there must've been an update to these relations in RO and @sierra-moxon's PR brings it back up-to-date.

@mugitty
Contributor

mugitty commented Oct 8, 2024

Unrelated, but FYI - Ontobio also repairs relations via GORULE61 in qc.py

@pgaudet
Contributor

pgaudet commented Oct 9, 2024

> It would be nice to eventually read in RO and do simple label->ID lookups.

The thing is, this relation is valid in RO, and the inference chain is correct. However, we have decided that we don't want to use this in GO. I had added subsets in RO to express which relations were valid for GO (annotations, extensions, ontology...) but @cmungall remarked that we shouldn't have information specific to a project in a 'general'/broader application ontology, which makes sense.

My question is, where can we put this information, since RO is not sufficient? I see a few options:

  1. We trust RO and deepen relations as appropriate (but that would be reverting many changes that we asked curators to fix)
  2. Use GOREL (which seems to be unavoidable?)
  3. Use ShEx.

There must be a documented source of truth for what's hard-coded in the code.

Thanks, Pascale

@sierra-moxon
Member

sierra-moxon commented Oct 9, 2024

For my understanding, is GOREL curated or otherwise automatically maintained as the source of truth for relations applicable in GO? From this file: https://github.com/geneontology/go-ontology/blob/master/src/ontology/extensions/gorel.obo I see that all terms in GOREL have an xref to RO. Can I use that with confidence to do this same work in the code (but use the GOREL ontology instead of the map in the code)? If that is the case, I don't think it would be terribly hard to convert.

@kltm
Member Author

kltm commented Oct 9, 2024

Okay, I want to separate this into two issues: 1) getting the immediate fix in, tracked here, which @sierra-moxon has done with the PR above; 2) a long-term solution to make sure these are in sync in an automated way. The latter conversation should now be had in #398.
