Metagenome read simulation of multiple synthetic communities
Straightforward simulation of metagenome data from a collection of reference bacterial and archaeal genomes.
- Can simulate Illumina, PacBio, and/or Nanopore reads
- For Illumina, synthetic long reads (read clouds) can also be simulated
- Generate communities differing in:
  - Sequencing depth
  - Richness
  - Beta diversity
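To make "richness" and "beta diversity" concrete, here is a minimal Python sketch. It computes richness as the number of distinct taxa in a community and uses a presence/absence Jaccard distance between two communities. Jaccard is only one common beta-diversity measure, chosen here for illustration; it is not necessarily how MGSIM parameterizes beta diversity internally, and the two communities below are hypothetical.

```python
from itertools import combinations


def richness(taxa):
    """Richness: the number of distinct taxa present in a community."""
    return len(set(taxa))


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance: 0 = same taxa, 1 = no shared taxa."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical communities drawn from the tutorial's three reference genomes
communities = {
    "1": ["Escherichia coli O104-H4", "Clostridium perfringens ATCC.13124"],
    "2": ["Escherichia coli O104-H4", "Methanosarcina barkeri [MS]"],
}

for name, taxa in communities.items():
    print(f"community {name}: richness = {richness(taxa)}")

# Compare every pair of communities
for (n1, t1), (n2, t2) in combinations(communities.items(), 2):
    print(f"communities {n1} vs {n2}: Jaccard distance = {jaccard_distance(t1, t2):.3f}")
```

Both communities have a richness of 2 and share one of three taxa, so their Jaccard distance is 1 - 1/3, printed as 0.667.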
The workflow:
- [optional] Download reference genomes
- Format reference genomes
  - e.g., rename contigs
- Simulate communities
- Simulate reads for each community
See environment.yml for a list of dependencies.
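You can create an environment with the dependencies via mamba (much faster than conda):
mamba env create -f environment.yml -n mgsim
conda activate mgsim
Then install MGSIM itself, either from PyPI:
pip install MGSIM
or from the MGSIM base directory of a cloned repository:
python setup.py install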
The tests require:
- conda-forge::pytest>=5.3
- conda-forge::pytest-console-scripts>=1.2
In the MGSIM base directory, use the command pytest to run all of the tests.
To run tests on a particular test file:
pytest -s --script-launch-mode=subprocess path/to/the/test/file
Example:
pytest -s --script-launch-mode=subprocess ./tests/test_Reads.py
See all subcommands:
MGSIM --list
See the options for each subcommand:
MGSIM genome_download -h
MGSIM communities -h
MGSIM reads -h
MGSIM ht_reads -h
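Create a taxon-accession table (tab-separated: the taxon names contain spaces, so the Taxon and Accession columns must be separated by a tab)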
mkdir -p tutorial
cat <<-EOF > tutorial/taxon_accession.tsv
Taxon Accession
Escherichia coli O104-H4 NC_018658.1
Clostridium perfringens ATCC.13124 NC_008261
Methanosarcina barkeri [MS] NZ_CP009528.1
EOF
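Because spaces inside the taxon names make a space-separated table ambiguous, it can be safer to generate the file programmatically. This is a minimal sketch using Python's standard csv module; write_taxon_table is a hypothetical helper, not part of MGSIM, and it only assumes that the input is a tab-separated table with Taxon and Accession columns, as in the heredoc above.

```python
import csv
from pathlib import Path

# Rows from the tutorial table; taxon names contain spaces, so a tab delimiter is required
ROWS = [
    ("Escherichia coli O104-H4", "NC_018658.1"),
    ("Clostridium perfringens ATCC.13124", "NC_008261"),
    ("Methanosarcina barkeri [MS]", "NZ_CP009528.1"),
]


def write_taxon_table(path, rows):
    """Write a tab-separated table with a 'Taxon' and an 'Accession' column."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create tutorial/ if needed
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["Taxon", "Accession"])
        writer.writerows(rows)
    return path
```

Calling write_taxon_table("tutorial/taxon_accession.tsv", ROWS) produces the same table as the heredoc, with guaranteed tab delimiters.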
Download genomes
MGSIM genome_download -d tutorial/ tutorial/taxon_accession.tsv > tutorial/genomes.tsv
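Before simulating communities, it may be worth checking that every genome listed in tutorial/genomes.tsv was actually downloaded. The sketch below assumes (not confirmed here) that the table is tab-separated with a Taxon column and a column of FASTA paths named Fasta; the column name is a parameter so you can adjust it after inspecting the header your MGSIM version writes. missing_genome_files is a hypothetical helper.

```python
import csv
from pathlib import Path


def missing_genome_files(genome_table, fasta_col="Fasta"):
    """Return (taxon, path) pairs whose FASTA file does not exist.

    ASSUMPTION: tab-separated table with a 'Taxon' column and a FASTA path
    column named `fasta_col` ('Fasta' by default). Relative paths are
    resolved against the current working directory.
    """
    missing = []
    with open(genome_table, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            fasta = Path(row[fasta_col])
            if not fasta.is_file():
                missing.append((row["Taxon"], str(fasta)))
    return missing
```

An empty list from missing_genome_files("tutorial/genomes.tsv") means every listed FASTA file is present.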
Simulate 2 communities
MGSIM communities --n-comm 2 tutorial/genomes.tsv tutorial/communities
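To sanity-check the simulated communities, you could summarize the abundance table per community. This sketch assumes (unverified here) that tutorial/communities_abund.txt is tab-separated with Community, Taxon and Perc_rel_abund columns; check the header of your own file first. summarize_communities is a hypothetical helper.

```python
import csv
from collections import defaultdict


def summarize_communities(abund_table):
    """Map each community to (number of taxa, summed relative abundance).

    ASSUMPTION: tab-separated with 'Community', 'Taxon' and 'Perc_rel_abund'
    columns, where Perc_rel_abund is a percentage.
    """
    taxa = defaultdict(set)
    total = defaultdict(float)
    with open(abund_table, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            comm = row["Community"]
            taxa[comm].add(row["Taxon"])
            total[comm] += float(row["Perc_rel_abund"])
    return {c: (len(taxa[c]), total[c]) for c in taxa}
```

If the abundances are percentages, each community's summed value should be close to 100; the taxon counts give the richness of each simulated community.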
Illumina reads
MGSIM reads tutorial/genomes.tsv --sr-seq-depth 1e5 tutorial/communities_abund.txt tutorial/illumina_reads/
PacBio reads
MGSIM reads tutorial/genomes.tsv --pb-seq-depth 1e3 tutorial/communities_abund.txt tutorial/pacbio_reads/
Nanopore reads
MGSIM reads tutorial/genomes.tsv --np-seq-depth 1e3 tutorial/communities_abund.txt tutorial/nanopore_reads/
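The three read-simulation commands differ only in the depth flag and output directory, so they can be driven from one script. This is a hypothetical helper (reads_command and run_all are not part of MGSIM): it reuses only the arguments and flags shown in the tutorial commands and calls the MGSIM CLI through subprocess, so MGSIM must be on your PATH.

```python
import subprocess

# Depth flag per sequencing technology, as used in the tutorial commands
DEPTH_FLAGS = {
    "illumina": "--sr-seq-depth",
    "pacbio": "--pb-seq-depth",
    "nanopore": "--np-seq-depth",
}


def reads_command(tech, depth, genomes="tutorial/genomes.tsv",
                  abund="tutorial/communities_abund.txt", outdir=None):
    """Build the `MGSIM reads` argument list for one technology.

    `depth` is passed through as a string (e.g. "1e5"), exactly as typed
    on the command line.
    """
    if outdir is None:
        outdir = f"tutorial/{tech}_reads/"
    return ["MGSIM", "reads", genomes, DEPTH_FLAGS[tech], depth, abund, outdir]


def run_all(depths, dry_run=True):
    """Print each command; unless dry_run, also run it and stop on the first failure."""
    for tech, depth in depths.items():
        cmd = reads_command(tech, depth)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

run_all({"illumina": "1e5", "pacbio": "1e3", "nanopore": "1e3"}) prints the three tutorial commands; pass dry_run=False to run them one after another.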
See LICENSE