PDBE-6103 (#7)
* x-axis lowered, vlines extended

* Python version bump and version bump

* Residue range parsable into dendro fname

A residue range (for example, UniProt positions) can now be passed to the application. When given, the range is included in the file name of the dendrogram, if one is rendered. If the residue IDs are not specified, the previous default behaviour is unchanged.

* Trial with 3.10.10

* Bump Python version to 3.10.14

* File name typo in dendrogram

* Updated docs and removed old examples
Joseph-Ellaway authored May 16, 2024
1 parent 31a2621 commit c2e12ed
Showing 18 changed files with 102 additions and 1,004 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -11,7 +11,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: [3.10.12]
python-version: [3.10.14]

steps:
- uses: actions/checkout@v2
@@ -56,7 +56,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.10.12]
python-version: [3.10.14]
needs: build

steps:
21 changes: 14 additions & 7 deletions README.md
@@ -13,7 +13,7 @@ For instructions on importing `protein-cluster-conformers` into your own Python c
`protein-cluster-conformers` requires >=Python3.10 to run. Initialise virtual environment and install dependencies with:

```shell
$ cd contact_map_difference
$ cd protein-cluster-conformers
$ python3.10 -m venv cluster_venv
$ source cluster_venv/bin/activate
$ python -m pip install -r requirements.txt
@@ -31,6 +31,7 @@ $ python find_conformers.py [-h] [-v] -u UNIPROT -m MMCIF [MMCIF ...]
[-g PATH_DENDROGRAM [PATH_DENDROGRAM ...]]
[-w PATH_SWARM [PATH_SWARM ...]] [-o PATH_HISTOGRAM]
[-a PATH_ALPHA_FOLD]
[-0 FIRST_RESIDUE_POSITION] [-1 LAST_RESIDUE_POSITION]
```
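The synopsis above gains two digit-named options, `-0` and `-1`. A minimal sketch of how such flags could be declared with `argparse` follows; it is not the repository's parser. The long names `--uniprot` and `--mmcif`, the function `parse_cli`, and the `append` handling of repeated `-m` groups are assumptions made for illustration. Only `--first_residue_position` and `--last_residue_position` are taken from the help text in this commit.

```python
import argparse


def parse_cli(argv):
    """Parse a reduced, illustrative subset of the command line."""
    parser = argparse.ArgumentParser(prog="find_conformers.py")
    # Assumed long option names; the README synopsis only shows the short forms.
    parser.add_argument("-u", "--uniprot", required=True)
    # Each -m takes a file path followed by one or more chain IDs (assumed layout).
    parser.add_argument("-m", "--mmcif", nargs="+", action="append", required=True)
    # Digit-named short flags, as listed in the README help text.
    parser.add_argument("-0", "--first_residue_position", type=int, default=None)
    parser.add_argument("-1", "--last_residue_position", type=int, default=None)
    return parser.parse_args(argv)


args = parse_cli(
    ["-u", "O34926", "-m", "3nc3_updated.cif", "A", "B", "-0", "1", "-1", "405"]
)
print(args.first_residue_position, args.last_residue_position)
```

One side effect worth knowing: once a parser defines options that look like negative numbers, `argparse` treats every argument of that shape as an option, so a value such as `-1` can no longer be supplied as a plain argument.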
The following parameters can be passed:
@@ -58,6 +59,10 @@ optional arguments:
Path to save histograms of distance difference maps
-a PATH_ALPHA_FOLD, --path_alpha_fold PATH_ALPHA_FOLD
Path to save AlphaFold Database structure
-0 FIRST_RESIDUE_POSITION, --first_residue_position FIRST_RESIDUE_POSITION
First residue position in (UniProt) sequence
-1 LAST_RESIDUE_POSITION, --last_residue_position LAST_RESIDUE_POSITION
Last residue position in (UniProt) sequence

```
@@ -148,7 +153,7 @@ The resulting plots are saved in PNG format (to save render time). E.g:
**Example**: O34926
```shell
$ python run_find_clusters.py -u "O34926" \
$ python find_clusters.py -u "O34926" \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
@@ -163,22 +168,24 @@ $ python run_find_clusters.py -u "O34926" \
From the clustering results, a dendrogram can be rendered to show the relationships between all clustered chains. To save a dendrogram of the hierarchical clustering results, run:
```shell
$ python run_find_clusters.py -u "A12345" \
$ python find_clusters.py -u "A12345" \
-m /path/to/structure_1.cif [chains] \
-m ... \
-g /path/to/save/dendrogram/ [png svg]
```
where either a `png` or `svg` file type is saved. E.g.
<img src="./benchmark_data/figures/P14902_agglomerative_dendrogram.png" alt="Dendrogram of clustered UniProt:P14902 chains, via UPGMA agglomerative clustering" height="350"/>
<img src="./benchmark_data/figures/O34926_1_405_agglomerative_dendrogram.png" alt="Dendrogram of clustered UniProt:O34926 chains, via UPGMA agglomerative clustering" width="400"/>
<img src="./benchmark_data/figures/P15291_122_398_agglomerative_dendrogram.png" alt="Dendrogram of clustered UniProt:P15291 chains, via UPGMA agglomerative clustering" width="400"/>
<br>
**Example**: O34926
```shell
$ python run_find_clusters.py -u "O34926" \
$ python find_clusters.py -u "O34926" \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc6_updated.cif A B \
@@ -193,7 +200,7 @@ $ python run_find_clusters.py -u "O34926" \
The scores generated between pairwise structure comparisons can be plotted as a swarm plot by passing the `-w` flag:
```shell
$ python run_find_clusters.py -u "A12345" \
$ python find_clusters.py -u "A12345" \
-m /path/to/structure_1.cif [chains] \
-m ... \
-w /path/to/save/swarm_plot/ [png svg]
@@ -269,7 +276,7 @@ $ ./run_O34926.sh
**Example #2:** P15291
``` shell
python3 run_find_conformers.py -u "P15291" \
python3 find_conformers.py -u "P15291" \
-m benchmark_data/examples/P15291/P15291_updated_mmcif/2fy7_updated.cif A \
-m benchmark_data/examples/P15291/P15291_updated_mmcif/2fya_updated.cif A \
-m benchmark_data/examples/P15291/P15291_updated_mmcif/2fyb_updated.cif A \
2 changes: 1 addition & 1 deletion __version__.py
@@ -1 +1 @@
__version__ = "1.1.0"
__version__ = "1.2.0"
41 changes: 32 additions & 9 deletions cluster_conformers/cluster_chains.py
@@ -105,7 +105,8 @@ def plot_dendrogram(
unp: str, axis, linkage_matrix: ndarray = None, cutoff: float = None, **kwargs
) -> "tuple(Figure, Axes)":
"""
Create linkage matrix from SKLearn model and plot the dendrogram of nodes.
Create linkage matrix from SKLearn model and plot the dendrogram of nodes. Draws
onto the Matplotlib axis passed in, modifying it in place.
:param unp: UniProt accession
:type unp: str
@@ -123,26 +124,48 @@ def plot_dendrogram(
"""

# Plot the corresponding dendrogram
dn = dendrogram(linkage_matrix, ax=axis, **kwargs)
del dn
dendrogram_plot = dendrogram(linkage_matrix, ax=axis, **kwargs)

hline_props = {
"colors": ["black"],
"linestyles": ["dashed"],
"linewidths": 1,
"alpha": 0.5,
}

# Add horizontal line where cutoff is placed
if cutoff:
max_parent = max(linkage_matrix[:, 2])
max_parent = linkage_matrix[:, 2].max()
axis_xlimits = axis.get_xlim()
axis.hlines(
y=max_parent * cutoff,
xmin=axis_xlimits[0],
xmax=axis_xlimits[1],
colors=["black"],
linestyles=["dashed"],
linewidths=1,
alpha=0.5,
**hline_props,
)

axis.set_title(f"Agglomerative clustering dendrogram: {unp}", fontweight="bold")
axis.set_title(f"Agglomerative clustering results: {unp}", fontweight="bold")
axis.set_ylabel("Score (\u212B)")

# Set the y-axis limits
rock_bottom = -max_parent * 0.025
axis.set_ylim(rock_bottom, max_parent * 1.05)

# Add vlines for each leaf below the x-axis
axis.vlines(
x=axis.get_xticks(),
ymin=rock_bottom,
ymax=0,
colors=dendrogram_plot["leaves_color_list"],
linestyles=["solid"],
linewidths=1,
# alpha=0.5,
)

axis.hlines(y=0, xmin=axis_xlimits[0], xmax=axis_xlimits[1], **hline_props)

del dendrogram_plot
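
The hunk above lowers the y-axis slightly below zero and draws a short coloured marker under each leaf. Indentation is flattened in this view, so it is not clear whether `max_parent` and `axis_xlimits` are assigned inside the `if cutoff:` block; if they are, a call without `cutoff` would raise a `NameError` at the later `set_ylim` and `hlines` calls. The sketch below is not the repository's code. It only illustrates the same geometry with the tallest merge height computed unconditionally; `dendrogram_guides` and its dictionary keys are hypothetical names, and a plain list of merge heights stands in for `linkage_matrix[:, 2]`.

```python
def dendrogram_guides(merge_heights, cutoff=None):
    """Return y-axis limits and guide-line heights for a dendrogram plot."""
    heights = list(merge_heights)
    if not heights:
        raise ValueError("at least one merge height is required")

    # Tallest merge in the tree; computed whether or not a cutoff is given.
    max_parent = max(heights)

    # Extend the axis a little below zero so leaf markers fit under the x-axis.
    rock_bottom = -max_parent * 0.025

    return {
        "ylim": (rock_bottom, max_parent * 1.05),
        "leaf_vlines": (rock_bottom, 0.0),  # (ymin, ymax) for per-leaf markers
        # Dashed horizontal line at a fraction of the tallest merge, if requested.
        "cutoff_y": max_parent * cutoff if cutoff else None,
    }


print(dendrogram_guides([1.5, 4.0, 10.0], cutoff=0.7))
print(dendrogram_guides([1.5, 4.0, 10.0]))
```

The returned values map directly onto the `axis.set_ylim`, `axis.vlines` and `axis.hlines` calls in the diff, so the plotting code only needs to branch on whether `cutoff_y` is `None`.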


def plot_swarmplot(y_data: Iterable, unp: str) -> "tuple(Figure, Axes)":
"""Creates a strip plot of non-overlapping data points for a given list of data. The
16 changes: 14 additions & 2 deletions cluster_conformers/cluster_monomers.py
@@ -672,6 +672,7 @@ def render_dendrogram(
path_save: PosixPath = None,
png: bool = False,
svg: bool = False,
unp_range: "tuple[int, int]" = None,
) -> None:
"""
Plot hierarchical dendrogram from clustering results. Must have a linkage matrix and
@@ -684,6 +685,7 @@ def render_dendrogram(
:type png: bool, optional
:param svg: Save dendrogram image in SVG format, defaults to False
:type svg: bool, optional
:param unp_range: Range of UniProt residues used for clustering
"""

# Set matplotlib global formatting
@@ -718,14 +720,24 @@ def render_dendrogram(
leaf_rotation=90,
) # p=3

# UniProt residue range specified, make modifications (optional)
if unp_range:
ax.set_title(
f"Agglomerative clustering results: {unp} ({unp_range[0]}-{unp_range[1]})",
fontweight="bold",
)
fname = f"{unp}_{unp_range[0]}_{unp_range[1]}_agglomerative_dendrogram"
else:
fname = f"{unp}_agglomerative_dendrogram"

# Save file
io_utils.save_figure(
path_save,
save_fname=f"{unp}_agglomerative_dendrogram",
save_fname=fname,
png=png,
svg=svg,
)

# plt.clf()
plt.close(fig=fig)

else:
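
The file-name rule added in this hunk can be isolated into a small function, which makes it easy to check against the figure names referenced in the README. This is a sketch only: `dendrogram_fname` is a hypothetical helper and does not exist in the repository, where the same logic sits inline in `render_dendrogram`.

```python
def dendrogram_fname(unp, unp_range=None):
    """Build the dendrogram file stem, with or without a residue range."""
    if unp_range:
        first, last = unp_range
        # Range given: embed both positions between the accession and the suffix.
        return f"{unp}_{first}_{last}_agglomerative_dendrogram"
    # No range: fall back to the original naming scheme.
    return f"{unp}_agglomerative_dendrogram"


print(dendrogram_fname("O34926", (1, 405)))
print(dendrogram_fname("O34926"))
```

With the range `(1, 405)` the stem matches the `O34926_1_405_agglomerative_dendrogram.png` figure referenced in the updated README; the file extension is added elsewhere, by `io_utils.save_figure` in the diff.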
2 changes: 1 addition & 1 deletion cluster_conformers/utils/appearance_utils.py
@@ -11,5 +11,5 @@ def init_plot_appearance():
"""

# Figure parameters and formatting
plt.style.use("seaborn-v0_8-colorblind")
# plt.style.use("seaborn-v0_8-colorblind")
plt.style.use("fast")
4 changes: 3 additions & 1 deletion examples/run_O34926.sh
@@ -7,7 +7,7 @@ rm benchmark_data/examples/O34926/O34926_cluster_results/*

# mprof run --python

python find_conformers.py -u "O34926" \
python3 find_conformers.py -u "O34926" \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc7_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc5_updated.cif A B \
-m benchmark_data/examples/O34926/O34926_updated_mmcif/3nc3_updated.cif A B \
@@ -18,6 +18,8 @@ python find_conformers.py -u "O34926" \
-i 3nc6 \
-f \
-g benchmark_data/examples/O34926/O34926_cluster_results/ png svg \
-0 1 \
-1 405
# -v \
# -a benchmark_data/examples/O34926/O34926_alpha_fold_mmcifs

143 changes: 0 additions & 143 deletions examples/run_P00519_merged_segms.sh

This file was deleted.
