LAGOON-MCL is a FAIR pipeline that uses Nextflow as its workflow manager. Its main objective is to build putative protein families using sequence similarity networks and graph clustering. To explore the resulting clusters, LAGOON-MCL uses annotations (functional, taxonomic, etc.) that are either provided by the user or obtained by the pipeline from Pfam. To take sequence exploration a step further, the ESM Metagenomic Atlas, clustered at 30% identity, can be scanned for information on the three-dimensional structure of the proteins.
- The first step builds a Sequence Similarity Network (SSN) by aligning all the sequences against each other with Diamond BLASTp. The network is then clustered with the Markov CLustering algorithm (MCL).
- The second (optional) step obtains information about the sequences (function, taxonomy, etc.). LAGOON-MCL can scan Pfam using MMseqs2.
- The third step calculates a homogeneity score for each cluster based on the sequence information. The score is calculated separately for each annotation.
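
To give a feel for what a per-cluster homogeneity score can look like, here is a minimal sketch. The formula is an assumption, since this README does not state the one LAGOON-MCL uses: the score here is simply the share of annotated sequences in a cluster that carry the cluster's most frequent label. The `homogeneity` function, the three-column input layout and the labels `famA`, `famB`, `famC` are all hypothetical.

```shell
# homogeneity: hypothetical helper, NOT part of LAGOON-MCL.
# Input (stdin): tab-separated lines "cluster_id<TAB>sequence_id<TAB>label".
# Output: "cluster_id<TAB>score", where score is the share of the cluster's
# annotated sequences carrying its most frequent label (assumed definition).
homogeneity() {
    LC_ALL=C awk -F '\t' '
        $3 != "" {
            total[$1]++                 # annotated sequences per cluster
            n = ++count[$1 SUBSEP $3]   # sequences per (cluster, label) pair
            if (n > best[$1]) best[$1] = n
        }
        END {
            for (c in total) printf "%s\t%.2f\n", c, best[c] / total[c]
        }
    ' | LC_ALL=C sort
}

# Toy input: cluster c1 has two labels (2 x famA, 1 x famB), c2 has one.
printf 'c1\ts1\tfamA\nc1\ts2\tfamA\nc1\ts3\tfamB\nc2\ts4\tfamC\n' | homogeneity
# c1	0.67
# c2	1.00
```

With this definition, a score of 1 means every annotated sequence in the cluster shares the same label. Running it once per annotation source would mirror the "one score per annotation" behaviour described above.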
- Install Nextflow
- Install Singularity
- Download the pipeline
git clone https://github.com/jroussea/lagoon-mcl.git
- Build Singularity images
singularity build --fakeroot containers/diamond/2.1.0/diamond.sif docker://quay.io/biocontainers/diamond:2.1.10--h43eeafb_0
singularity build --fakeroot containers/mcl/22.282/mcl.sif docker://quay.io/biocontainers/mcl:22.282--pl5321h031d066_2
singularity build --fakeroot containers/seqkit/2.9.0/seqkit.sif docker://quay.io/biocontainers/seqkit:2.9.0--h9ee0642_0
singularity build --fakeroot containers/mmseqs2/15.6f452/mmseqs.sif docker://quay.io/biocontainers/mmseqs2:15.6f452--pl5321h6a68c12_3
singularity build --fakeroot containers/lagoon-mcl/1.1.0/lagoon-mcl.sif docker://jroussea/lagoon-mcl:latest
- Test the pipeline
chmod +x bin/*
nextflow run main.nf -profile test,singularity
- Run your analysis
nextflow run main.nf -profile custom,singularity [-c <institute_config_file>]
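
Before launching a run, it can help to confirm that all five Singularity images were built where the commands above put them. The sketch below is one way to do that. `check_images` is a hypothetical helper and not part of LAGOON-MCL; the image paths are copied from the build commands, and the optional argument is assumed to be the root of the cloned repository.

```shell
# Image paths copied from the "Build Singularity images" commands.
IMAGES="containers/diamond/2.1.0/diamond.sif
containers/mcl/22.282/mcl.sif
containers/seqkit/2.9.0/seqkit.sif
containers/mmseqs2/15.6f452/mmseqs.sif
containers/lagoon-mcl/1.1.0/lagoon-mcl.sif"

# check_images: hypothetical helper, NOT part of LAGOON-MCL.
# Prints one "missing: <path>" line per absent image and returns
# non-zero if at least one image is missing.
check_images() {
    root="${1:-.}"      # repository root, defaults to the current directory
    missing=0
    for img in $IMAGES; do
        if [ ! -f "$root/$img" ]; then
            echo "missing: $img"
            missing=1
        fi
    done
    return "$missing"
}
```

From the repository root, `check_images && nextflow run main.nf -profile test,singularity` would then start the test run only when every image is present.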
For more information about LAGOON-MCL, please read the documentation.
LAGOON-MCL is an actively supported and developed pipeline. Please use the issue tracker to report malfunctions and the GitHub discussions for questions, comments, feature requests, etc.
LArGe cOmparative Omics Networks (LAGOON) Markov CLustering algorithm (MCL) is developed by the Atelier de BioInformatique team of the Institut de Systématique, Évolution, Biodiversité - UMR 7205 (Muséum National d'Histoire Naturelle, Paris, France).
LAGOON-MCL is a new version of LAGOON developed by Dylan Klein.
If you use LAGOON-MCL, please cite the references listed in CITATION.md.