A digital twin software framework for optimal neoantigen-based treatments.
Workflow of NeoAgDT from sequencing data to vaccine composition. Variant/mutation calling and gene expression steps are not part of the pipeline but are shown for general understanding of data flow. NeoAgDT input data is shown as individual boxes (green) in the large box at the bottom left.
A. Mösch, F. Grazioli, P. Machart, B. Malone: "NeoAgDT: Optimization of personal neoantigen vaccine composition by digital twin simulation of a cancer cell population", Bioinformatics 2024. https://doi.org/10.1093/bioinformatics/btae205
pip install .[all]
Alternatively, we provide an Anaconda-based installation:
conda env create -f environment.yml # create environment with all dependencies
conda activate neoagdt
pip install . # install the neoagdt Python package (dependencies should already be installed from the previous step)
Python 3.9 is required.
Sample commands for the tools used to create the data from Tran et al.
Gene expression with STAR and cufflinks
STAR --genomeDir /path/to/GenomeDir --readFilesIn 1.fastq 2.fastq --outSAMtype BAM SortedByCoordinate
cufflinks -G /path/to/matching/annotation.gtf --library-type fr-firststrand Aligned.sortedByCoord.out.bam
pvacbind including NetChop and NetMHCstab
pvacbind run -b 9999999 --iedb-install-directory /path/to/iedb/ sample.fasta sample samplehlatypes all .
cd MHC_Class_I
pvacbind net_chop --method 20s --threshold 0.0 sample.all_epitopes.tsv sample.fasta sample.all_epitopes.netchop.tsv
pvacbind netmhc_stab sample.all_epitopes.tsv sample.all_epitopes.netmhcstab.tsv
To create the input data for the neoantigen digital twin, run the notebook notebooks_general/create_dt_input_files_frpm_pvacbind.ipynb
Please check the comments and notes in the notebook before running it. The input can also be generated by any other script, as long as it follows the general requirements (see the configuration file).
Configurations are in etc/.
Paths can also be absolute and do not need to be relative to the repository.
simulate-cancer-cells etc/cells-config.yaml --logging-level INFO
The vaccine design optimization supports multiprocessing on a local dask cluster, which distributes the calls to the MIP optimization.
--num-procs <NUM_PROCS>: number of parallel processes (and of CPU cores)
--num-threads-per-proc <NUM_THREADS_PER_PROC>: number of threads to allocate for each process. The total number of threads for a local cluster is therefore (args.num_procs * args.num_threads_per_proc).
optimize-vaccine-ilp etc/optimization-config.yaml --num-procs <NUM_PROCS> --num-threads-per-proc <NUM_THREADS_PER_PROC> --logging-level INFO
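The relationship between the two options can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the helper names total_threads and fits_on_machine are hypothetical and are not part of the neoagdt package.

```python
import os


def total_threads(num_procs: int, num_threads_per_proc: int) -> int:
    """Total threads of the local cluster: processes times threads per process."""
    if num_procs < 1 or num_threads_per_proc < 1:
        raise ValueError("both values must be positive integers")
    return num_procs * num_threads_per_proc


def fits_on_machine(num_procs: int, num_threads_per_proc: int, cpu_count=None) -> bool:
    """Check whether the requested thread total stays within the available CPU cores.

    cpu_count is a hypothetical override for illustration; by default the
    value reported by the operating system is used.
    """
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return total_threads(num_procs, num_threads_per_proc) <= available


# 4 processes with 2 threads each request 8 threads in total.
print(total_threads(4, 2))
```

A check like this may help when picking values for --num-procs and --num-threads-per-proc so that the local cluster does not oversubscribe the machine.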
create-bar-chart etc/bar-charts-config.yaml --logging-level INFO
evaluate-vaccine-response etc/response-likelihood-config.yaml --logging-level INFO
You can run all above-mentioned steps (simulation, optimization, evaluation and plotting) with the following command:
bash ./run-sim.sh
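As a rough sketch of what running the steps one after another looks like, the following Python snippet assembles the four commands listed above. The step order follows the sentence above (simulation, optimization, evaluation, plotting) and is an assumption; the actual contents of run-sim.sh may differ. The helper build_commands is hypothetical and not part of the package.

```python
import shlex

# Tool names and config paths are taken from the commands in this README.
# The order is an assumption based on the description of run-sim.sh.
STEPS = [
    ("simulate-cancer-cells", "etc/cells-config.yaml"),
    ("optimize-vaccine-ilp", "etc/optimization-config.yaml"),
    ("evaluate-vaccine-response", "etc/response-likelihood-config.yaml"),
    ("create-bar-chart", "etc/bar-charts-config.yaml"),
]


def build_commands(num_procs=1, num_threads_per_proc=1, logging_level="INFO"):
    """Return one argument list per pipeline step, in execution order."""
    commands = []
    for tool, config in STEPS:
        cmd = [tool, config]
        # Only the optimization step accepts the multiprocessing options.
        if tool == "optimize-vaccine-ilp":
            cmd += ["--num-procs", str(num_procs),
                    "--num-threads-per-proc", str(num_threads_per_proc)]
        cmd += ["--logging-level", logging_level]
        commands.append(cmd)
    return commands


# Print the commands instead of executing them.
for cmd in build_commands(num_procs=2, num_threads_per_proc=2):
    print(shlex.join(cmd))
```

To actually execute the steps, each argument list could be passed to subprocess.run(cmd, check=True), which stops at the first failing step.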
pip install .[test]
pytest tests
pip install .[test]
pylint neoag_dt
The documentation for this project can be built with sphinx. The necessary dependencies are installed by pip when installing either the all or docs optional dependencies.
pip install .[docs]
cd docs
make html