RFAM family <-> PDB structure ID mapping #324
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## master #324 +/- ##
==========================================
+ Coverage 40.27% 44.70% +4.43%
==========================================
Files 48 114 +66
Lines 2811 7982 +5171
==========================================
+ Hits 1132 3568 +2436
- Misses 1679 4414 +2735
Thanks for this Ramon, looks great! Let me have a think about how to integrate this more. An immediate thought is to couple this to the Re finding families, is There's also I think favouring the metadata/indices stored on the FTP server over the API might be better from a user POV (probably faster & no worries about being rate limited). We could make a wrapper for this similar to the PDBManager? There also seems to be a ton of metadata on the FTP server. I'm not sure what else could be useful to pull in 🤔
This sounds good. I feel quite silly, I completely missed these two files! Yes, downloading via the FTP server would definitely be much faster. I'll modify the script to just download these two files (and perhaps merge everything into a single dataframe?). We could then look into how to integrate this into
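As a rough illustration of the "merge everything" step, here is a minimal sketch that groups PDB IDs by RFAM family from a tab-separated table. The comment above does not say which two files are meant or how they are laid out, so the column positions, the sample rows, and the function names `parse_family_pdb_table` and `read_gzipped_table` are all assumptions made for this example. It uses only the standard library; the resulting rows could equally be loaded into a dataframe.

```python
import csv
import gzip
import io
from collections import defaultdict


def parse_family_pdb_table(text, family_col=0, pdb_col=1):
    """Group PDB IDs by RFAM family accession from tab-separated text.

    The column positions are assumptions; adjust them to the real file layout.
    """
    mapping = defaultdict(set)
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(family_col, pdb_col):
            continue  # skip blank or short lines
        family = row[family_col].strip()
        pdb_id = row[pdb_col].strip().lower()
        # Rows whose first field is not an accession (e.g. a header) are skipped.
        if family.startswith("RF") and pdb_id:
            mapping[family].add(pdb_id)
    return {family: sorted(ids) for family, ids in mapping.items()}


def read_gzipped_table(path, **kwargs):
    """Read a locally downloaded .gz table and parse it (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_family_pdb_table(handle.read(), **kwargs)


# Made-up sample rows, used only to show the expected shape of the input.
sample = (
    "rfam_acc\tpdb_id\tchain\n"
    "RF00001\t1C2X\tC\n"
    "RF00001\t1c2x\tD\n"
    "RF00005\t1EHZ\tA\n"
)
mapping = parse_family_pdb_table(sample)
print(mapping)
```

Using a set per family collapses repeated entries for the same structure (for example, several chains of one PDB entry), and lowercasing the IDs keeps the comparison case-insensitive.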
Kudos, SonarCloud Quality Gate passed!
Here's a script to retrieve a mapping between RFAM families and PDB structure IDs. How could this be integrated into the codebase?
There are two points that might need to be addressed:
max_id to specify the max ID limit (e.g. setting max_id=4236 will stop querying after RF04236).
graphein/datasets? It might be important to allow the users to re-download the data in case of updates in the RFAM database.

Pull Request Checklist
./CHANGELOG.md file (if applicable)
./graphein/tests/* directories (if applicable)
./notebooks/ (if applicable)
python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
black . and isort .
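
For reference, the max_id cutoff mentioned in the description could be sketched roughly as follows. This is not the script from this PR; the function name `rfam_accessions` and the `start` parameter are hypothetical, and the only things taken from the description are the `max_id` argument and the accession format shown there (RF04236 for 4236).

```python
def rfam_accessions(max_id, start=1):
    """Return RFAM accessions from RF{start} up to and including RF{max_id}.

    Accessions are "RF" followed by a zero-padded five-digit number.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    if max_id > 99999:
        raise ValueError("max_id does not fit the five-digit accession format")
    # An empty list is returned when max_id < start.
    return [f"RF{i:05d}" for i in range(start, max_id + 1)]


accessions = rfam_accessions(4236)
print(accessions[0], accessions[-1], len(accessions))
```

A query loop would then iterate over this list and stop after the last accession, which matches the behaviour described for max_id=4236.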