Protein2SMILES Transformer: de novo, protein-specific drug discovery using the Transformer neural network architecture from the "Attention Is All You Need" paper, to generate novel drug candidates for specific proteins.
Protein2SMILES Transformer is a de novo drug discovery approach that generates SMILES strings, a text-based representation of molecules, for specific protein targets. The model is trained on a large set of molecules and protein sequences collected from BindingDB, and can generate new molecules that are optimized for binding to a target protein.
All datasets used in this project are available in my Google Drive. You can access them using the following links:
Protein2SMILES Transformer requires the following dependencies:
- Python 3.7 or later
- PyTorch 1.13.1
- Torchtext 0.14.1
- NumPy 1.22.4
- PyQt5 5.15.9
- rdkit 2022.9.5
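To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch: the PyPI distribution names (for example `torch` for PyTorch) are assumptions, `check_requirements` is a hypothetical helper that is not part of this repository, and `importlib.metadata` itself needs Python 3.8 or later.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above. The PyPI
# distribution names used as keys are assumptions.
REQUIRED = {
    "torch": "1.13.1",
    "torchtext": "0.14.1",
    "numpy": "1.22.4",
    "PyQt5": "5.15.9",
    "rdkit": "2022.9.5",
}


def check_requirements(required=REQUIRED):
    """Return {name: (wanted, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_requirements().items():
        # Ignore local build suffixes such as "+cu117" when comparing.
        found = installed.split("+")[0] if installed else None
        status = "ok" if found == wanted else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed or 'not installed'} [{status}]")
```

Running the script prints one line per dependency, so a mismatch can be spotted before running predict.py.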
To use Protein2SMILES Transformer, follow these steps:
- Clone this repository to your local machine using git clone:
$ git clone https://github.com/atilmohamine/protein2smiles-transformer.git
- Install the required dependencies by running the following command:
$ pip install -r requirements.txt
- Run the predict.py script with the desired protein sequence as input. For example:
$ python predict.py --input MGLSDGEWQLVLNVWGKVEGARQPL
This will generate a SMILES string that is optimized for binding to the specified protein.
The main arguments for prediction are as follows:
Argument | Description | Default | Type |
---|---|---|---|
--input | Input Protein | none (required) | string |
--vis | Molecule Visualization | True | boolean |
--max | Maximum length of the generated sequence | 150 | integer |
--pad | Padding token | 1 | integer |
--sos | SOS token | 2 | integer |
--eos | EOS token | 3 | integer |
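For readers who want to script around these options, the table can be mirrored with `argparse` roughly as below. This is a sketch under assumptions, not the actual parser in predict.py: `build_parser` and `str2bool` are hypothetical names, and the real script may handle `--vis` differently.

```python
import argparse


def str2bool(value):
    """Parse true/false-style strings.

    Plain type=bool would treat any non-empty string, including
    "False", as True, so the conversion is done explicitly.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser whose options and defaults follow the table above."""
    parser = argparse.ArgumentParser(
        description="Generate a SMILES string for a protein sequence")
    parser.add_argument("--input", type=str, required=True,
                        help="input protein sequence")
    parser.add_argument("--vis", type=str2bool, default=True,
                        help="visualize the generated molecule")
    parser.add_argument("--max", type=int, default=150,
                        help="maximum length of the generated sequence")
    parser.add_argument("--pad", type=int, default=1, help="padding token")
    parser.add_argument("--sos", type=int, default=2, help="SOS token")
    parser.add_argument("--eos", type=int, default=3, help="EOS token")
    return parser


if __name__ == "__main__":
    # Example argument list; a real script would call parse_args() directly.
    args = build_parser().parse_args(["--input", "MGLSDGEWQLVLNVWGKVEGARQPL"])
    print(args)
```

Under this sketch, visualization would be disabled with `--vis False`, and a shorter output requested with, for example, `--max 80`.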
- The output SMILES string can be used for further analysis, such as molecular docking or structure-based drug design.
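Before passing a generated string to downstream tools, it can be worth running a cheap structural pre-check. The sketch below uses only the standard library and looks at bracket, branch, and ring-closure pairing; `quick_smiles_check` is a hypothetical helper, not part of this repository, and it does not check chemistry. A full parse, for example with RDKit's `Chem.MolFromSmiles` (which returns `None` for strings it cannot parse), remains the authoritative test.

```python
import string

DIGITS = set(string.digits)


def quick_smiles_check(smiles):
    """Return True if no obvious structural error is found.

    Checks only that [...] atoms are closed, branch parentheses are
    balanced, and every ring-closure label is used an even number of
    times. It does NOT validate atoms, valence, or aromaticity.
    """
    if not smiles:
        return False
    depth = 0            # nesting depth of branch parentheses
    in_bracket = False   # True while inside a [...] atom
    open_rings = set()   # ring labels seen an odd number of times so far
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges.
            if ch == "[":
                return False
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not set(label) <= DIGITS:
                return False
            open_rings ^= {int(label)}
            i += 2
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not in_bracket and not open_rings


if __name__ == "__main__":
    for candidate in ["CC(=O)Oc1ccccc1C(=O)O", "C1CC", "CC(=O"]:
        print(candidate, quick_smiles_check(candidate))
```

Strings that fail this check can be discarded early; strings that pass still need a real parser before docking or other structure-based analysis.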
If you find this project useful in your research, please consider citing our paper:
@article{AmineFadila2023,
author = {Atil Mohamed El Amine and Atil Fadila},
title = {Transformer neural network for protein-specific drug discovery and validation using QSAR},
journal = {Journal of Proteins and Proteomics},
year = {2023},
doi = {10.1007/s42485-023-00124-6}
}
Protein2SMILES Transformer is released under the MIT License.