ProtDomRetriever

ProtDomRetriever is a simple Python tool that retrieves protein domain information from the InterPro database for a list of UniProtKB accessions and a set of specified InterPro entries. The script queries the InterPro application programming interface (API), extracts the position of every matching domain, and keeps the longest domain when the ranges of several entries overlap. Optionally, it also returns a FASTA file of trimmed domain sequences retrieved from UniProt. When a protein contains several domains in tandem, each one is retrieved and assigned a domain number linked to its UniProt accession.
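The overlap rule described above can be illustrated with a short sketch. This is not the code from ProtDomRetriever.py: the function names, the greedy longest-first strategy, and the `accession_N` label format are assumptions chosen for illustration. Ranges are treated as 1-based and inclusive.

```python
def select_longest_domains(ranges):
    """Keep the longest range from each group of overlapping ranges.

    `ranges` is a list of (start, end) tuples, 1-based and inclusive.
    Returns the kept ranges sorted by start position.
    Hypothetical helper: the real script may resolve overlaps differently.
    """
    kept = []
    # Visit the longest ranges first so that they win any overlap.
    for start, end in sorted(ranges, key=lambda r: r[1] - r[0], reverse=True):
        overlaps = any(start <= k_end and k_start <= end for k_start, k_end in kept)
        if not overlaps:
            kept.append((start, end))
    return sorted(kept)


def number_domains(accession, ranges):
    """Label the kept ranges accession_1, accession_2, ... in sequence order.

    The label format is an assumption made for this sketch.
    """
    kept = select_longest_domains(ranges)
    return [("{}_{}".format(accession, i), start, end)
            for i, (start, end) in enumerate(kept, start=1)]


# Two overlapping pairs of ranges collapse to two tandem domains.
example = [(10, 50), (20, 120), (200, 260), (205, 250)]
print(number_domains("P12345", example))
```

A greedy pass like this is easy to follow, but it is only one way to resolve overlaps; check the script itself for its exact behaviour.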

Created by Nicolas-Frédéric Lipp, PhD.

Features

Retrieve domain information for multiple UniProtKB accessions
Filter domains based on specified InterPro entries
Generate TSV output with domain ranges
Create FASTA files for the retrieved protein domains
User-friendly GUI for file selection
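As a rough illustration of the filtering feature, the sketch below pulls domain positions for selected InterPro entries out of an API response that has already been decoded from JSON. The payload layout (`results`, `metadata`, `proteins`, `entry_protein_locations`, `fragments`) is an assumption about the InterPro API response and should be checked against the InterPro documentation; the function name and sample values are hypothetical. The network request itself is left out.

```python
def extract_locations(payload, wanted_entries):
    """Return (entry, start, end) tuples for the requested InterPro entries.

    `payload` is a decoded JSON response; its layout is assumed, not verified.
    """
    wanted = {entry.upper() for entry in wanted_entries}
    found = []
    for result in payload.get("results", []):
        entry = result.get("metadata", {}).get("accession", "").upper()
        if entry not in wanted:
            continue  # skip entries the user did not ask for
        for protein in result.get("proteins", []):
            for location in protein.get("entry_protein_locations", []):
                for fragment in location.get("fragments", []):
                    found.append((entry, fragment["start"], fragment["end"]))
    return found


# Hand-written sample with made-up coordinates, used only to exercise the parser.
sample_payload = {
    "results": [
        {"metadata": {"accession": "IPR000648"},
         "proteins": [{"entry_protein_locations": [
             {"fragments": [{"start": 400, "end": 780}]}]}]},
        {"metadata": {"accession": "IPR001849"},
         "proteins": [{"entry_protein_locations": [
             {"fragments": [{"start": 90, "end": 190}]}]}]},
    ]
}

print(extract_locations(sample_payload, ["ipr000648"]))
```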

Requirements

Python 3.6+
Required Python packages:
- tkinter
- requests

Installation

Clone this repository: git clone https://github.com/NicoFrL/ProtDomRetriever.git
Navigate to the project directory: cd ProtDomRetriever
Install required packages: pip install -r requirements.txt

Usage

Run the script using Python:

python3 ProtDomRetriever.py

Follow the on-screen prompts to:

Select an input file containing UniProtKB accessions
Enter InterPro entries for domain filtering
Choose whether to fetch FASTA files for the protein domains
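The first two prompts can be pictured with the sketch below, which cleans up the two kinds of input. It assumes an input file with one UniProtKB accession per line and InterPro entries typed as a comma- or space-separated list; both conventions, and the function names, are assumptions rather than documented behaviour of the script. The check relies only on the standard InterPro accession format (IPR followed by six digits).

```python
import re


def parse_accessions(text):
    """Return unique accessions in first-seen order, one per input line (assumed format)."""
    accessions = []
    for line in text.splitlines():
        accession = line.strip()
        if accession and accession not in accessions:
            accessions.append(accession)
    return accessions


def parse_interpro_entries(text):
    """Split a comma- or space-separated string into validated InterPro accessions."""
    entries = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue  # ignore empty pieces left by stray separators
        token = token.upper()
        if not re.fullmatch(r"IPR\d{6}", token):
            raise ValueError("not an InterPro accession: {!r}".format(token))
        entries.append(token)
    return entries


print(parse_accessions("P12345\n\n Q9BZF1 \nP12345\n"))
print(parse_interpro_entries("ipr000648, IPR001849"))
```

Rejecting malformed entries before any request is sent keeps a typing mistake from silently producing an empty result table.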

Output

The script writes up to three main output files to a new directory:

*_result_table.tsv: A tab-separated file containing protein accessions, InterPro entries, and domain ranges
*_domain_ranges.txt: A text file listing the domain ranges for each protein
*_output_domains.fasta: A FASTA file containing the sequences of the retrieved protein domains (if FASTA retrieval is selected)
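To make the relation between a domain range and a trimmed FASTA record concrete, here is a minimal sketch. It assumes 1-based, inclusive coordinates and a header of the form `>accession_number/start-end`; the header layout, line width, and function names are illustrative assumptions, not the format written by the script.

```python
def trim_domain(sequence, start, end):
    """Return the residues from start to end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range {}-{} is outside the sequence".format(start, end))
    return sequence[start - 1:end]


def fasta_record(accession, number, start, end, sequence, width=60):
    """Format one trimmed domain as a FASTA record (hypothetical header layout)."""
    domain = trim_domain(sequence, start, end)
    # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
    lines = [domain[i:i + width] for i in range(0, len(domain), width)]
    header = ">{}_{}/{}-{}".format(accession, number, start, end)
    return header + "\n" + "\n".join(lines) + "\n"


# Made-up ten-residue sequence; residues 3 to 6 form the "domain".
print(fasta_record("P12345", 1, 3, 6, "MKTAYIAKQR"), end="")
```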

Examples

Two example datasets are provided in the examples directory:

ORP dataset (example1)
Spectrin dataset (example2)

Each example includes input files, suggested InterPro entries, and sample output files.

Manual Sequence Retrieval

The contents of the output file *_domain_ranges.txt can also be submitted to the UniProt ID mapping tool at https://www.uniprot.org/id-mapping (mapping UniProtKB AC/ID to UniProtKB) to retrieve the sequences manually, for instance as a comprehensive Excel file.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you encounter any problems or have any questions, please open an issue on the GitHub repository.

Author

Nicolas-Frédéric Lipp, PhD
https://github.com/NicoFrL

Development Notes

This project was developed with the assistance of AI language models, which provided guidance on code structure, best practices, and documentation. The core algorithm and scientific approach were designed and implemented by the author on the basis of InterPro documentation.
