TM-Tools

Python bindings for the TM-align algorithm and code developed by Zhang et al. for protein structure comparison.

Installation

You can install the released version of the package directly from PyPI by running

    pip install tmtools

Pre-built wheels are available for Linux, macOS, and Windows, for Python 3.6 and up.

Usage

The function tmtools.tm_align takes two NumPy arrays of residue coordinates (each with shape (N, 3)) and the two corresponding sequences of one-letter residue codes. It performs the alignment and returns the optimal rotation matrix and translation, along with the TM-scores (normalized by the length of each chain) and the RMSD:

>>> import numpy as np
>>> from tmtools import tm_align
>>>
>>> coords1 = np.array(
...     [[1.2, 3.4, 1.5],
...      [4.0, 2.8, 3.7],
...      [1.2, 4.2, 4.3],
...      [0.0, 1.0, 2.0]])
>>> coords2 = np.array(
...     [[2.3, 7.4, 1.5],
...      [4.0, 2.9, -1.7],
...      [1.2, 4.2, 4.3]])
>>>
>>> seq1 = "AYLP"
>>> seq2 = "ARN"
>>>
>>> res = tm_align(coords1, coords2, seq1, seq2)
>>> res.t
array([ 2.94676159,  5.55265245, -1.75151383])
>>> res.u
array([[ 0.40393231,  0.04161396, -0.91384187],
       [-0.59535733,  0.77040999, -0.22807475],
       [ 0.69454181,  0.63618922,  0.33596866]])
>>> res.tm_norm_chain1
0.3105833326322145
>>> res.tm_norm_chain2
0.414111110176286
>>> res.rmsd
0.39002811082975875
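
The returned rotation and translation can be used to superimpose the first structure onto the second. The following is a minimal sketch using only NumPy, with the values of res.u and res.t copied from the output above. It assumes the TM-align convention x' = u · x + t, mapping chain 1 onto chain 2; the variable name coords1_on_2 is chosen here for illustration only.

```python
import numpy as np

# Input coordinates from the example above.
coords1 = np.array(
    [[1.2, 3.4, 1.5],
     [4.0, 2.8, 3.7],
     [1.2, 4.2, 4.3],
     [0.0, 1.0, 2.0]])
coords2 = np.array(
    [[2.3, 7.4, 1.5],
     [4.0, 2.9, -1.7],
     [1.2, 4.2, 4.3]])

# Translation and rotation as printed above (res.t and res.u).
t = np.array([2.94676159, 5.55265245, -1.75151383])
u = np.array(
    [[0.40393231, 0.04161396, -0.91384187],
     [-0.59535733, 0.77040999, -0.22807475],
     [0.69454181, 0.63618922, 0.33596866]])

# Assumed convention: x' = u @ x + t moves chain 1 onto chain 2.
# For an (N, 3) array of row vectors this becomes coords @ u.T + t.
coords1_on_2 = coords1 @ u.T + t

# Distance between the first residue of chain 1 (after superposition)
# and the first residue of chain 2.
print(np.linalg.norm(coords1_on_2[0] - coords2[0]))
```

With these toy coordinates, the first residue of chain 1 should land roughly 0.4 units from the first residue of chain 2, in line with the reported RMSD.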

If you already have some PDB files, you can use the functions from tmtools.io to retrieve the coordinate and sequence data. These functions rely on BioPython, which is not installed by default to keep dependencies lightweight. To use them, install BioPython first (pip install biopython). Then run:

>>> from tmtools.io import get_structure, get_residue_data
>>> from tmtools.testing import get_pdb_path
>>> s = get_structure(get_pdb_path("2gtl"))
>>> s
<Structure id=2gtl>
>>> chain = next(s.get_chains())
>>> coords, seq = get_residue_data(chain)
>>> seq
'DCCSYEDRREIRHIWDDVWSSSFTDRRVAIVRAVFDDLFKHYPTSKALFERVKIDEPESGEFKSHLVRVANGLKLLINLLDDTLVLQSHLGHLADQHIQRKGVTKEYFRGIGEAFARVLPQVLSCFNVDAWNRCFHRLVARIAKDLP'
>>> coords.shape
(147, 3)

Development mode

To build the package from scratch, e.g. because you want to contribute to it, clone this repository, and then from the root of the repository, run

    pip install -e . -v

This requires a C++ compiler with support for C++14.

This project uses ruff as a code formatter and linter. Ruff runs automatically via GitHub Actions on new commits. Please consider also running it locally (preferably via a pre-commit hook) to notice and fix any errors early on.

Running the tests

The test suite uses the standard Python unittest framework. To run the test suite, run the following command (from the root of the repository, with the development environment activated):

    python -m unittest discover -v .

When adding to the test suite, please adhere to the given/when/then pattern. You can refer to the existing tests for an example.
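
As an illustration of the given/when/then layout, here is a minimal sketch of a unittest test case. The helper apply_transform and the test names are hypothetical and not part of tmtools; the existing tests in the repository remain the authoritative reference for the project's style.

```python
import unittest

import numpy as np


def apply_transform(coords, u, t):
    """Hypothetical helper (not part of tmtools): rotate, then translate."""
    return coords @ u.T + t


class TestApplyTransform(unittest.TestCase):
    def test_identity_rotation_only_translates(self):
        # Given a set of coordinates, an identity rotation, and a translation
        coords = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        u = np.eye(3)
        t = np.array([1.0, 0.0, -1.0])

        # When the transformation is applied
        moved = apply_transform(coords, u, t)

        # Then every point is shifted by the translation vector
        np.testing.assert_allclose(moved, coords + t)
```

Saved in a file whose name matches test*.py, a test like this would be picked up by the discover command shown above.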

Credits

This package arose out of a personal desire to better understand both the TM-score algorithm and the pybind11 library to interface with C++ code. At this point in time it contains no original research code.

If you use the package for research, you should cite the original TM-score papers:

  • Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 57, 702-710 (2004).
  • J. Xu, Y. Zhang, How significant is a protein structure similarity with TM-score=0.5?, Bioinformatics, 26, 889-895 (2010).

License

The original TM-align software (version 20210224, released under the MIT license) is bundled with this repository (src/extern/TMalign.cpp). Some small tweaks had to be made to compile the code on macOS and to embed it as a library. These modifications are also released under the MIT license.

The rest of the codebase is released under the GPL v3 license.