MaAslin2-Tutorial #5173
Hi @renu-pal, thanks for your contribution! I have taken it out of draft mode so that our tests can run and people will know they can review it :)
Co-authored-by: Saskia Hiltemann <[email protected]>
Sounds great! :)
Looks great! It would be nice to extend it a bit, show different options and some more output.
- name: 'DOI: 10.5281/zenodo.12614561'
  description: latest
  items:
  - url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content
https://zenodo.org/records/12614561/files/HMP2_metadata.tsv
should also work
Suggested change, from:
- url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content
to:
- url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv
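The rewrite in this suggestion amounts to dropping the trailing `/content` path segment from the Zenodo API file URL. A minimal Python sketch of that string transformation follows; the helper name `strip_content_suffix` is hypothetical, and the sketch only rewrites the string without checking that the resulting URL resolves.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_content_suffix(url: str) -> str:
    """Drop a trailing '/content' path segment, leaving the rest of the URL intact."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/content"):
        path = path[: -len("/content")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


api_url = "https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content"
print(strip_content_suffix(api_url))
# prints https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv
```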
Why is this resolved? It's still not correct.
I thought Ms. Hiltemann had already made exactly this update; I did not notice that you were referring to the data-library.yaml file. I am extremely sorry for that. I will make the necessary changes.
Here, made the change
63a44c4
>
{: .agenda}

# Get the data
Where does the data come from, what does it contain, and was it used in other studies? Are there other studies that suggest ideal MaAsLin2 parameters for this kind of data?
I couldn't find studies that used both the HMP2 data and the MaAsLin2 tool, so instead I mentioned studies that used MaAsLin2, along with the parameters they used. If that does not feel right, please let me know.
Isn't the main MaAsLin2 paper using HMP data? https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009442#sec002
Maybe it's enough to add that this was the initial data used to demonstrate MaAsLin2?
sounds better
Can you look into the linting issues as well?
@renu-pal do you need any help here?
Definitely, @bgruening :). Can you please go through the tutorial and let me know if you find anything wrong?
"annotation": "",
"comments": [],
"format-version": "0.1",
"name": "Workflow constructed from history 'Determining multivariable association between various meta\u2019omic features using MaAslin2",
Suggested change, from:
"name": "Workflow constructed from history 'Determining multivariable association between various meta\u2019omic features using MaAslin2",
to:
"name": "Determining multivariable association between various meta-omic features using MaAslin2",
This way?
I kept the omitted part just to differentiate between the title of the workflow file and that of the tutorial file. If it doesn't cause any issues, I believe we should not change it. What are your thoughts on this, @bgruening?
I don't see any reason why a workflow should be named 'constructed from history'.
Can you please remove the two workflows starting with `Workflow constructed from history ...` and keep only the main one with the correct title?
…rary.yaml Co-authored-by: Björn Grüning <[email protected]>
- **Sensitivity** measures how well the methods detect true signals; higher values indicate better performance.
- **False discovery rate (FDR)** measures the proportion of false positives among detected signals (lower FDR is better).
- MaAsLin2 is the clear standout for both differential abundance detection and multivariable association detection, showing high sensitivity and maintaining a low FDR.
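The two metrics defined in this snippet can be illustrated with a short Python sketch. It uses the standard definitions, sensitivity = TP / (TP + FN) and FDR = FP / (TP + FP); the counts in the example are made up purely for illustration and are not taken from the MaAsLin2 benchmark.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true signals that were detected (higher is better)."""
    total_true = true_positives + false_negatives
    return true_positives / total_true if total_true else 0.0


def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of detected signals that are false positives (lower is better)."""
    detected = true_positives + false_positives
    return false_positives / detected if detected else 0.0


# Hypothetical counts: 40 true associations found, 10 missed, 10 spurious calls.
print(sensitivity(40, 10))           # 0.8
print(false_discovery_rate(40, 10))  # 0.2
```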
Maybe we can add something like:
In this regard, MaAsLin2 can be seen as a Swiss army knife for differential analysis of microbiome data. With some text processing, various omics data types could be used as input, e.g. from these GTN tutorials:
Then check the existing GTN tutorials and add those that provide matching data:
- bla
- blumb
description: Galaxy Training Network Material
synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: The new topic
- name: The new topic | |
- name: The new topic |
?
| **Phyloseq + DESeq2** | Strong for RNA-seq and transcriptomics; integrates with Phyloseq | Lacks compositionality awareness | While DESeq2 works for microbiome data, MaAsLin2 offers more suitable options for compositional data and covariate handling. |
| **Limma-Voom** | Effective for RNA-seq and microarray data, handles low counts | Not tailored for compositional microbiome data | Limma-Voom is well-suited for gene expression, but MaAsLin2 better accounts for the unique characteristics of microbiome data. |

- ANCOM-BC and MaAsLin2 outperform general-purpose tools like DESeq2 and limma-voom when it comes to microbiome data. This is due to their handling of the compositional nature of microbiome data and the sparsity typical of microbial datasets. [PMID: 36617187](https://pubmed.ncbi.nlm.nih.gov/36617187/)
Can you please replace all links to studies with citations, as shown here:
Line 89 in 5012353
The currently available studies used Illumina sequencing, generating short reads. Longer read lengths, generated by third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), make it **easier and more practical to identify strains with fewer reads**. MinION (from Oxford Nanopore) is a portable, real-time device for ONT sequencing. Several proof-of-principle studies have shown the **utility of ONT long-read sequencing from metagenomic samples for pathogen identification** ({% cite CIUFFREDA20211497 %}). |
- **Taxonomy (or features) file**: \
This file is tab-delimited.\
Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.\
Is it? Can our wrapper work with both?
Thanks for pointing this out, @paulzierep. I cross-checked and, unfortunately, our wrapper cannot work with both. I have changed the tutorial accordingly. Sorry for the confusion.
MaAsLin2 requires the following input files:

- **Taxonomy (or features) file**: \
Suggested change, from:
- **Taxonomy (or features) file**: \
to:
- **Features file**: \
Would like to keep it more generic. Maybe add some examples at the end of the list, like: (OTU/ASV abundance table, MAGs abundance matrix, taxonomy table, gene count matrix).
Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.

The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
Suggested change, from:
The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
to:
The feature file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
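The behaviour described in this sentence (only samples present in both files are analysed) is essentially a set intersection on sample IDs. The Python sketch below illustrates that idea under stated assumptions: the function name `restrict_to_shared_samples`, the sample IDs, and the table contents are all hypothetical, and this is not MaAsLin2's own code.

```python
def restrict_to_shared_samples(features: dict, metadata: dict) -> tuple:
    """Keep only the samples (dictionary keys) that occur in both tables."""
    shared = features.keys() & metadata.keys()
    kept_features = {s: row for s, row in features.items() if s in shared}
    kept_metadata = {s: row for s, row in metadata.items() if s in shared}
    return kept_features, kept_metadata


# Hypothetical tables with samples as rows, keyed by sample ID.
features = {"S1": {"feature_A": 0.2}, "S2": {"feature_A": 0.5}, "S3": {"feature_A": 0.1}}
metadata = {"S2": {"age": 34}, "S3": {"age": 51}, "S4": {"age": 29}}

kept_features, kept_metadata = restrict_to_shared_samples(features, metadata)
print(sorted(kept_features))  # ['S2', 'S3']
print(sorted(kept_metadata))  # ['S2', 'S3']
```

Here S1 (missing from the metadata) and S4 (missing from the features) are both dropped.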
ff35d6a to fd99f28
author = {GTN community},
title = {GTN Training Materials: Collection of tutorials developed and maintained by the worldwide Galaxy community},
url = {https://training.galaxyproject.org},
urldate = {2021-03-24}
Our BibTeX parser is failing to parse the urldate key. I think we should support this in the future, but perhaps for now you can set the date in a note so that the tests don't fail, e.g.:
note = "[Online; accessed Fri Nov 05 2021]"
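The workaround above can be applied mechanically. The Python sketch below rewrites a single `urldate = {YYYY-MM-DD}` line into a `note` in the format shown in the example; the helper name `urldate_to_note` and the regular expression are assumptions made for illustration, not part of the GTN tooling.

```python
import re
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Matches a line such as: urldate = {2021-03-24}   (optional trailing comma)
URLDATE = re.compile(r"^(\s*)urldate\s*=\s*\{(\d{4})-(\d{2})-(\d{2})\}(,?)\s*$")


def urldate_to_note(line: str) -> str:
    """Rewrite a BibTeX urldate line as a note line; return other lines unchanged."""
    match = URLDATE.match(line)
    if match is None:
        return line
    indent, year, month, day, comma = match.groups()
    accessed = date(int(year), int(month), int(day))
    stamp = (f"{WEEKDAYS[accessed.weekday()]} {MONTHS[accessed.month - 1]} "
             f"{accessed.day:02d} {accessed.year}")
    return f'{indent}note = "[Online; accessed {stamp}]"{comma}'


print(urldate_to_note("urldate = {2021-03-24}"))
# prints note = "[Online; accessed Wed Mar 24 2021]"
```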
Thanks ! Will make the necessary changes :)
@renu-pal can you check the linting?
Please do not mark comments as resolved if they are not resolved, or explain otherwise why it's not needed.
Otherwise, minor changes, then we should be good to merge. Thanks !
> {: .question}
>
{: .hands_on}
Suggested change:
## Data that can be analysed with MaAsLin2
Can you add some subsections to make it better structured?
![sensitivity and false discovery rate (FDR) across different tools](https://journals.plos.org/ploscompbiol/article/figure/image?size=large&id=10.1371/journal.pcbi.1009442.g004 "Source: <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009442#pcbi-1009442-g004">sensitivity and false discovery rate (FDR) across different tools</a>"){:width="60%"}

- The above figure compares various tools for differential abundance detection (Panel A) and multivariable association detection (Panel B) in microbiome studies, based on sensitivity and false discovery rate (FDR).
This and the next 4 sentences do not need to be bullet points.
9. **analysis_method** [ Default: "LM" ] [ Options: "LM", "CPLM", "ZICP", "NEGBIN", "ZINB" ]
- The analysis method to apply.
- Options: \
1. <u>Linear Model (LM)</u>: Determines how changes in metadata are associated with changes in the taxonomy data.
Suggested change, from:
1. <u>Linear Model (LM)</u>: Determines how changes in metadata are associated with changes in the taxonomy data.
to:
1. <u>Linear Model (LM)</u>: The default linear model of MaAsLin2. By default, MaAsLin2 uses linear regression (for continuous metadata) or logistic regression (for binary metadata) as its association method.
All models should do that, but what is the difference between them?
14. **plot_scatter**: Generate scatter plots for the significant associations [ Default: TRUE ]
15. **cores**: The number of R processes to run in parallel [ Default: 1 ]

# Model types in MaAslin2
It's not clear to me what this paragraph means. You already explain the models under analysis_method, so it's not clear what points 2 and 3 mean. Can you choose these options? I would remove or rephrase it.
These are not options that can be chosen. They are broader categories of models under which the analysis methods explained above fall. For instance, CPLM, ZICP, etc. come under generalized linear models. If the user chooses a random effect, then the model works as a linear mixed model.
@paulzierep I can make it clearer in the tutorial if you want. Let me know if the above explanation makes sense to you. I can add this explanation to the analysis method section and remove the following paragraph on these models.
Then you should say that those are models chosen by MaAsLin2 depending on the data input and the parameter choice. Otherwise, it's confusing. However, if CPLM and ZICP lead to generalized linear models, then this should also be stated. You have the "When to Use:" part, but users cannot choose anything, so that would confuse me.
Understood. I will make the necessary changes. Thanks for the detailed explanation :)
In essence, uncovering associations between microbial features and metadata variables through tools like MaAsLin2 not only deepens our understanding of microbiome dynamics but also holds promise for clinical applications, personalized health strategies, and advancing the field of microbiome research.

The results obtained from MaAslin2 can further be visualized using tools like [**phyloseq**](https://training.galaxyproject.org/training-material/by-tool/interactive_tool_phyloseq.html).\
Tools like phyloseq require that you prepare your data (OTU/ASV table, metadata) in a structured format before visualizing.\
I would add here that the features need to be subset to include only the significant features. This can be done with https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fampvis2_subset_taxa%2Fampvis2_subset_taxa%2F2.8.9%2Bgalaxy0&version=latest
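Outside Galaxy, the list of significant features to subset on could be derived from a MaAsLin2 results table along the following lines. This is a sketch under assumptions: it presumes a tab-separated results table with `feature` and `qval` columns, uses 0.25 as the q-value cut-off, and runs on made-up placeholder rows; the function name `significant_features` is hypothetical and this is not the linked ampvis2 tool.

```python
import csv
import io


def significant_features(tsv_text: str, max_qval: float = 0.25) -> list:
    """Return the unique features with at least one association at or below max_qval."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        if float(row["qval"]) <= max_qval and row["feature"] not in selected:
            selected.append(row["feature"])
    return selected


# Placeholder rows for illustration only (assumed column names: feature, metadata, qval).
results = (
    "feature\tmetadata\tqval\n"
    "feature_A\tdiagnosis\t0.01\n"
    "feature_B\tage\t0.60\n"
    "feature_A\tage\t0.20\n"
    "feature_C\tdiagnosis\t0.25\n"
)
print(significant_features(results))  # ['feature_A', 'feature_C']
```

The resulting names can then be used to restrict the feature table before visualization.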
…s/Workflow-constructed-from-history-'Determining-multivariable-association-between-various-meta’omic-features-using-MaAslin2.ga
…ivariable-association-between-various-meta’omic-features-using-MaAslin2-tests.yml to Determining-multivariable-association-between-various-meta’omic-features-using-MaAslin2-tests.yml
…s/main_workflow.ga
Co-authored-by: paulzierep <[email protected]>
Tutorial draft on Maaslin2. Would love any suggestions or changes on this.