LArGe cOmparative Omics Network - Markov CLustering

Introduction

LAGOON-MCL is a FAIR pipeline that uses Nextflow as its workflow manager. Its main objective is to build putative protein families using sequence similarity networks and graph clustering. To explore the resulting clusters, LAGOON-MCL uses annotations (functional, taxonomic, ...) that are either provided by the user or obtained by the pipeline from Pfam. To take sequence exploration a step further, the ESM Metagenomic Atlas, clustered at 30% identity, can be scanned for information on the proteins' three-dimensional structure.

  • The first step is to build a Sequence Similarity Network (SSN) by aligning all sequences against each other with Diamond BLASTp. The network is then clustered with the Markov CLustering algorithm (MCL).
  • The second [optional] step is to obtain information about the sequences (function, taxonomy, etc.). LAGOON-MCL can scan Pfam using MMseqs2.
  • The third step calculates a homogeneity score for each cluster based on the sequence information (a separate homogeneity score is calculated for each annotation).
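
To make the first step more concrete, here is a minimal sketch of how all-against-all Diamond hits could be turned into an edge list that MCL can read. It is not taken from the pipeline itself: the helper name `diamond_to_abc`, the file names, the use of the bit score as edge weight, the removal of self-hits, and the inflation value are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAGOON-MCL): convert Diamond tabular output
# (default --outfmt 6, 12 tab-separated columns) into MCL's three-column
# "abc" edge list: query, subject, weight.
# Assumptions: self-hits are dropped and the bit score (column 12) is the weight.
diamond_to_abc() {
    awk -F '\t' 'BEGIN { OFS = "\t" } $1 != $2 { print $1, $2, $12 }' "$1"
}

# Possible usage with the real tools (file names and -I value are placeholders):
#   diamond makedb --in proteins.fasta -d proteins
#   diamond blastp -d proteins -q proteins.fasta -o hits.tsv --outfmt 6
#   diamond_to_abc hits.tsv > network.abc
#   mcl network.abc --abc -I 2.0 -o clusters.txt
```

In the MCL output, each line lists the sequence identifiers of one cluster. The filters and weights that the pipeline actually applies may differ from this sketch.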

Start with LAGOON-MCL

  1. Install Nextflow

  2. Install Singularity

  3. Download the pipeline

git clone https://github.com/jroussea/lagoon-mcl.git

  4. Build Singularity images

singularity build --fakeroot containers/diamond/2.1.0/diamond.sif docker://quay.io/biocontainers/diamond:2.1.10--h43eeafb_0

singularity build --fakeroot containers/mcl/22.282/mcl.sif docker://quay.io/biocontainers/mcl:22.282--pl5321h031d066_2

singularity build --fakeroot containers/seqkit/2.9.0/seqkit.sif docker://quay.io/biocontainers/seqkit:2.9.0--h9ee0642_0

singularity build --fakeroot containers/mmseqs2/15.6f452/mmseqs.sif docker://quay.io/biocontainers/mmseqs2:15.6f452--pl5321h6a68c12_3

singularity build --fakeroot containers/lagoon-mcl/1.1.0/lagoon-mcl.sif docker://jroussea/lagoon-mcl:latest

  5. Test the pipeline

chmod +x bin/*
nextflow run main.nf -profile test,singularity

  6. Run your analysis

nextflow run main.nf -profile custom,singularity [-c <institute_config_file>]
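
Before launching a run, it may help to confirm that all five images from the build step are in place. The sketch below assumes the pipeline expects the images at exactly the paths used in the `singularity build` commands; the function name `missing_images` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of LAGOON-MCL): print every
# Singularity image from the build step that is missing under a given
# repository root (default: current directory).
missing_images() {
    root="${1:-.}"
    for image in \
        containers/diamond/2.1.0/diamond.sif \
        containers/mcl/22.282/mcl.sif \
        containers/seqkit/2.9.0/seqkit.sif \
        containers/mmseqs2/15.6f452/mmseqs.sif \
        containers/lagoon-mcl/1.1.0/lagoon-mcl.sif
    do
        # Report the relative path when the image file does not exist.
        [ -f "$root/$image" ] || printf '%s\n' "$image"
    done
}
```

Running `missing_images` from the repository root prints nothing when every image has been built; any path it prints corresponds to a build command that still needs to be run.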

Documentation

For more information about LAGOON-MCL, please read the documentation.

Contributions and Support

LAGOON-MCL is an actively supported and developed pipeline. Please use the issue tracker to report bugs and the GitHub discussions for questions, comments, feature requests, etc.

Acknowledgments

LArGe cOmparative Omics Networks (LAGOON) Markov CLustering algorithm (MCL) is developed by the Atelier de BioInformatique team of the Institut de Systématique, Évolution, Biodiversité - UMR 7205 (Muséum National d'Histoire Naturelle, Paris, France).
LAGOON-MCL is a new version of LAGOON developed by Dylan Klein.

Citations

If you use LAGOON-MCL, please cite the references listed in CITATION.md.