This repository contains an implementation of Phylo2Vec. It is distributed under the GNU Lesser General Public License v3.0 (LGPL).
Link to the paper: https://doi.org/10.1093/sysbio/syae030
- python>=3.9
- numba==0.56.4
- numpy==1.23.5
- biopython==1.80.0
- joblib>=1.2.0
- ete3==3.1.3
pip install phylo2vec
- We recommend setting up an isolated environment using conda, mamba, or virtualenv.
- Clone the repository and install using pip:
git clone https://github.com/Neclow/phylo2vec_dev.git
cd phylo2vec_dev
pip install -e .
- pytest==7.4.2
- six==1.16.0
After installation, you can launch the test suite from outside the source directory:
pytest phylo2vec
Warning! You might need to clear your __pycache__ folders beforehand:
rm -rf phylo2vec/__pycache__/
rm -rf phylo2vec/base/__pycache__/
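The two commands above only cover two folders. As a minimal sketch, the same cleanup can be done recursively from Python; `clear_pycache` is a hypothetical helper written for illustration and is not part of phylo2vec.

```python
import shutil
from pathlib import Path


def clear_pycache(root="."):
    """Delete every __pycache__ directory under root and return the removed paths."""
    removed = []
    # Collect the matches first so that deleting does not disturb the traversal
    for cache_dir in sorted(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

For example, `clear_pycache("phylo2vec")` would remove the cache folders of the package and all of its submodules.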
- The base module contains functions to convert a Newick string to a Phylo2Vec vector (to_vector) and vice versa (to_newick)
Example:
import numpy as np
from phylo2vec.base import to_newick, to_vector
v = np.array([0, 1, 2, 3, 4])
newick = to_newick(v) # '(0,(1,(2,(3,(4,5)6)7)8)9)10;'
v_converted = to_vector(newick) # array([0, 1, 2, 3, 4], dtype=int16)
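Not every integer vector encodes a tree. The sketch below assumes the constraint described in the paper, namely that the entry at 0-indexed position i lies in {0, ..., 2i}. `is_valid_vector` and `sample_vector` are hypothetical helpers for illustration, not functions of phylo2vec, and the library may apply its own checks.

```python
import random


def is_valid_vector(v):
    """Return True if 0 <= v[i] <= 2*i holds at every position i (assumed constraint)."""
    return all(0 <= int(x) <= 2 * i for i, x in enumerate(v))


def sample_vector(n_leaves, seed=None):
    """Draw a random vector of length n_leaves - 1 that satisfies the constraint."""
    rng = random.Random(seed)
    # Position i may take any of the 2*i + 1 values in {0, ..., 2*i}
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


print(is_valid_vector([0, 1, 2, 3, 4]))  # the vector from the example above
print(is_valid_vector([0, 3]))           # 3 exceeds the bound 2*1 at position 1
print(sample_vector(6, seed=0))
```

A sampled list can be wrapped with `np.array(...)` before being passed to `to_newick`.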
- The opt module contains methods to perform phylogenetic inference using Phylo2Vec vectors
  - TODO: include GradME from https://github.com/Neclow/GradME
Example:
from phylo2vec.opt import HillClimbingOptimizer
hc = HillClimbingOptimizer(raxml_cmd="/path/to/raxml-ng_v1.2.0_linux_x86_64/raxml-ng", verbose=True)
v_opt, taxa_dict, losses = hc.fit("/path/to/your_fasta_file.fa")
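To illustrate the general idea of hill climbing over such vectors, here is a toy sketch. It is not the implementation of `HillClimbingOptimizer`: the real optimizer takes a raxml-ng executable and a FASTA file, whereas `hill_climb` and `toy_loss` below are hypothetical and use a made-up objective. The bound 0 <= v[i] <= 2*i is assumed from the paper.

```python
def hill_climb(v_init, loss_fn, max_sweeps=10):
    """Greedy coordinate-wise hill climbing over vectors with 0 <= v[i] <= 2*i."""
    v = list(v_init)
    best = loss_fn(v)
    losses = [best]
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(v)):
            # Try every admissible value at position i, keep strict improvements
            for candidate in range(2 * i + 1):
                if candidate == v[i]:
                    continue
                proposal = v[:i] + [candidate] + v[i + 1:]
                proposal_loss = loss_fn(proposal)
                if proposal_loss < best:
                    v, best, improved = proposal, proposal_loss, True
        losses.append(best)
        if not improved:
            break  # local optimum: no single-entry change lowers the loss
    return v, losses


# Toy objective (hypothetical): number of entries that differ from a target vector
target = [0, 1, 2, 3, 4]


def toy_loss(v):
    return sum(a != b for a, b in zip(v, target))


v_opt, losses = hill_climb([0, 0, 0, 0, 0], toy_loss)
print(v_opt, losses)
```

In a real setting the loss would be a tree score such as a negative log-likelihood, which is far more expensive to evaluate than this toy objective.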
@article{phylo2vec,
title={Phylo2Vec: a vector representation for binary trees},
author={Penn, Matthew J and Scheidwasser, Neil and Khurana, Mark P and Duch{\^e}ne, David A and Donnelly, Christl A and Bhatt, Samir},
journal={arXiv preprint arXiv:2304.12693},
year={2023}
}
- Preprint repository (core functions are deprecated): https://github.com/Neclow/phylo2vec_preprint
- C++ version (deprecated): https://github.com/Neclow/phylo2vec_cpp
- GradME (phylo2vec + minimum evolution + gradient descent): https://github.com/Neclow/GradME