Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use functional syntax instead of OBO #121

Merged
merged 11 commits into from
Oct 25, 2024
Merged

Use functional syntax instead of OBO #121

merged 11 commits into from
Oct 25, 2024

Conversation

balhoff
Copy link
Member

@balhoff balhoff commented Oct 11, 2024

This PR is an alternative to #119, and would fix geneontology/noctua#902.

In this repo we have had a few different hacks related to prefixes and OBO format. In #119 I updated things to use the new OBO support for prefixes, but still ran into a few corner cases. Here I just updated the text output from the perl scripts to write OFN instead of OBO. This gives us much more control over the exact OWL we end up with, and has reliable handling of prefixes. Also, now the OBO file is a terminal output and doesn't need to round-tripped within this build.

I think the only remaining issues are those with TAIR IDs mentioned in #119. But I do still need to do a careful diff. @cmungall suggests taking a simple prefix-based approach to the TAIR IDs (as done here) since the IRIs aren't used currently in Noctua models anyway.

@balhoff balhoff requested review from kltm and cmungall October 11, 2024 16:27
@kltm
Copy link
Member

kltm commented Oct 11, 2024

@balhoff I want to clarify what's going on here. Is this that the intermediate is ofn, then gets converted to obo or owl at the end. This seems to be true, as I still see the "neo.obo neo.owl" targets, but want to confirm.

@cmungall
Copy link
Member

@kltm that is correct. I ran it locally works like a charm (or as much as perl can ever be charming)

@balhoff
Copy link
Member Author

balhoff commented Oct 23, 2024

@kltm I think this is good to go but diffing the two outputs (previous and new) is challenging. Some known changes:

PseudoCAP terms (clear fixed problem):

  • old: <http://purl.obolibrary.org/obo/PseudoCAP_PA0001>
  • new: <http://identifiers.org/pseudomonas/PA0001>

RefSeq terms (clear fixed problem):

  • old: <http://purl.obolibrary.org/obo/RefSeq#_NM_001001130>
  • new: <http://identifiers.org/refseq/NM_001001130>

TAIR locus terms (change due to using only simple prefix substitution rather than regex):

  • old: <http://identifiers.org/tair.locus/1005028051>
  • new: <http://identifiers.org/tair/locus:1005028051>

My understanding is that @cmungall says go ahead with the TAIR change and continue working with them on IRI structures.

@kltm
Copy link
Member

kltm commented Oct 23, 2024

@balhoff Just to clarify, would a change in IDs(!) require a scan and transition of all models that already have the old ones? Or has this been done and these have not been (widely) used?

@balhoff
Copy link
Member Author

balhoff commented Oct 23, 2024

@kltm I did some grepping:

  • PseudoCAP does not appear in Noctua models
  • RefSeq appears only in string CURIE form like <http://model.geneontology.org/MGI_MGI_1921620/f20b4303-9057-44e9-946d-be9fd4a6d60c> <http://geneontology.org/lego/evidence-with> "RefSeq:NP_065812"
    • this is okay
  • tair.locus is found, e.g., <http://identifiers.org/tair.locus/2157383>
    • this is a problem, so I will just need to implement a special case prefix mechanism for TAIR

@balhoff
Copy link
Member Author

balhoff commented Oct 25, 2024

@kltm I fixed the TAIR identifier problem. I also regenerated Makefile-gafs using a new datasets.json, which fixed my wb download issues. Additionally, I added a prefix for EcoCyc, which was a new issue uncovered after updating datasets.json. EcoCyc does not seem to appear in Noctua models. I used the URL expansion from Bioregistry, as EcoCyc does not appear to have an entry in identifiers.org.

@kltm
Copy link
Member

kltm commented Oct 25, 2024

Okay, just to tag on @vanaukenk to this conversation:

  • Change in internal data flow to bypass problems with OBO format; no change to final products
  • Prefix changes are there as new data flow requires everything to be defined
  • No identifier changes have been made to anything that is currently present in a model

@balhoff if that is correct and @vanaukenk you are okay with this from our conversation yesterday, I will merge (and then do an internal test run).

@vanaukenk
Copy link

Thanks @balhoff and @kltm
Okay with me to merge and do an internal test run.

@kltm kltm merged commit 88d583a into master Oct 25, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disappearing URS
4 participants