MaAslin2-Tutorial #5173

Open
wants to merge 28 commits into
base: main

@renu-pal renu-pal commented Jul 16, 2024

Tutorial draft on Maaslin2. Would love any suggestions or changes on this.

@shiltemann shiltemann marked this pull request as ready for review July 30, 2024 09:57
@shiltemann
Member

Hi @renu-pal, thanks for your contribution! I have taken it out of draft mode so that our tests can run and people will know they can review it :)

@renu-pal
Author

> Hi @renu-pal, thanks for your contribution! I have taken it out of draft mode so that our tests can run and people will know they can review it :)

Sounds great! :)

@shiltemann shiltemann requested a review from a team August 8, 2024 09:31
Collaborator

@paulzierep paulzierep left a comment

Looks great! It would be nice to extend it a bit, showing different options and some more output.

- name: 'DOI: 10.5281/zenodo.12614561'
description: latest
items:
- url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content
Collaborator

https://zenodo.org/records/12614561/files/HMP2_metadata.tsv should also work

Member

Suggested change
- url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content
- url: https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv

Collaborator

Why is this marked as resolved? It's still not correct.

Author

I thought Ms. Shiltemann's last update addressed exactly this; I did not notice that you were referring to the data-library.yaml file. I am extremely sorry for that. I will make the necessary changes.

Author

Here, made the change
63a44c4
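
For illustration only, the URL change requested in this thread amounts to dropping the trailing `/content` segment from the Zenodo API file URL. A minimal Python sketch of that rewrite follows; the helper name `strip_content_suffix` is hypothetical and not part of the PR, and the sketch does not check that the resulting URL resolves.

```python
def strip_content_suffix(url: str) -> str:
    """Drop a trailing '/content' path segment, if present (hypothetical helper)."""
    suffix = "/content"
    return url[: -len(suffix)] if url.endswith(suffix) else url


# URL taken from the data-library.yaml excerpt quoted in this thread.
old_url = "https://zenodo.org/api/records/12614561/files/HMP2_metadata.tsv/content"
new_url = strip_content_suffix(old_url)
print(new_url)
```

Applying the helper twice leaves the URL unchanged, so it is safe to run over a file that has already been partly fixed.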

>
{: .agenda}

# Get the data
Collaborator

Where does the data come from, what does it contain, and was it used in other studies? Are there other studies that suggest ideal MaAsLin2 parameters for this kind of data?

Author

I couldn't find studies that used both the HMP2 data and the MaAsLin2 tool. So instead I mentioned studies that used MaAsLin2, along with the parameters they used. If that does not feel right, please let me know.

Collaborator

Maybe it's enough to add that this was the initial data used to demonstrate MaAsLin2?

Author

sounds better

@paulzierep
Collaborator

Can you look into the linting issues as well?

@bgruening
Member

@renu-pal do you need any help here?

@renu-pal
Author

renu-pal commented Sep 8, 2024

> @renu-pal do you need any help here?

Definitely, @bgruening :) Can you please go through the tutorial and let me know if you find anything wrong?

bgruening
bgruening previously approved these changes Sep 28, 2024
"annotation": "",
"comments": [],
"format-version": "0.1",
"name": "Workflow constructed from history 'Determining multivariable association between various meta\u2019omic features using MaAslin2",
Member

Suggested change
"name": "Workflow constructed from history 'Determining multivariable association between various meta\u2019omic features using MaAslin2",
"name": "Determining multivariable association between various meta-omic features using MaAslin2",

This way?

Author

I kept the omitted part just to differentiate between the title of the workflow file and that of the tutorial file. If it doesn't cause any issues, then I believe we should not change it. What are your thoughts on this, @bgruening?

Member

I don't see any reason why a workflow should be named 'constructed from history'.

Collaborator

Can you please remove the two workflows starting with `Workflow constructed from history ...` and only keep the main one with the correct title?
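
As a rough sketch of the rename suggested earlier in this thread: a Galaxy `.ga` workflow file is JSON, so its top-level `"name"` field can be rewritten programmatically. The function name `rename_workflow` is hypothetical, and the example dictionary only reproduces the four fields quoted in the diff excerpt, not a complete workflow.

```python
import json


def rename_workflow(ga_text: str, new_name: str) -> str:
    """Return the workflow JSON with its top-level 'name' replaced (hypothetical helper)."""
    workflow = json.loads(ga_text)
    workflow["name"] = new_name
    return json.dumps(workflow, indent=4, ensure_ascii=False)


# Only the fields visible in the quoted diff; a real .ga file has many more.
excerpt = json.dumps({
    "annotation": "",
    "comments": [],
    "format-version": "0.1",
    "name": "Workflow constructed from history 'Determining multivariable "
            "association between various meta\u2019omic features using MaAslin2",
})

renamed = rename_workflow(
    excerpt,
    "Determining multivariable association between various meta-omic features using MaAslin2",
)
print(json.loads(renamed)["name"])
```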

- **Sensitivity** measures how well the methods detect true signals, higher values lead to better performance.
- **False discovery rate (FDR)** measures the proportion of false positives among detected signals (lower FDR is better).
- MaAsLin2 is the clear standout for both differential abundance detection and multivariable association detection, showing high sensitivity and maintaining a low FDR.

Collaborator

Maybe we can add something like:
In this regard, MaAsLin2 can be seen as a Swiss army knife for differential analysis of microbiome data. With some text processing, various omics data types could be used as input, e.g. from these GTN tutorials:

Then check the existing GTN tutorials and add those that provide matching data:

  • bla
  • blumb
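
For readers of the tutorial excerpt quoted above, the two metrics can be made concrete with a small sketch. It uses the standard definitions, sensitivity = TP / (TP + FN) and FDR = FP / (TP + FP); the counts in the example are invented for illustration and are not taken from the benchmark figure.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true signals that were detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of detected signals that are false positives."""
    total = true_pos + false_pos
    return false_pos / total if total else 0.0


# Hypothetical counts: 80 true associations found, 20 missed, 5 spurious hits.
sens = sensitivity(true_pos=80, false_neg=20)
fdr = false_discovery_rate(true_pos=80, false_pos=5)
print(f"sensitivity={sens:.3f}, FDR={fdr:.3f}")
```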

description: Galaxy Training Network Material
synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: The new topic
Collaborator

Suggested change
- name: The new topic
- name: The new topic

?

| **Phyloseq + DESeq2**| Strong for RNA-seq and transcriptomics; integrates with Phyloseq | Lacks compositionality awareness | While DESeq2 works for microbiome data, MaAsLin2 offers more suitable options for compositional data and covariate handling. |
| **Limma-Voom** | Effective for RNA-seq and microarray data, handles low counts | Not tailored for compositional microbiome data | Limma-Voom is well-suited for gene expression, but MaAsLin2 better accounts for the unique characteristics of microbiome data. |

- ANCOM-BC and MaAsLin2, outperform general-purpose tools like DESeq2 and limma-voom when it comes to microbiome data. This is due to their handling of the compositional nature of microbiome data and the sparsity typical of microbial datasets.[PMID: 36617187](https://pubmed.ncbi.nlm.nih.gov/36617187/)
Collaborator

Can you please replace all links to studies with citations, as shown here:

The currently available studies used Illumina sequencing, generating short reads. Longer read lengths, generated by third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), make it **easier and more practical to identify strains with fewer reads**. MinION (from Oxford Nanopore) is a portable, real-time device for ONT sequencing. Several proof-of-principle studies have shown the **utility of ONT long-read sequencing from metagenomic samples for pathogen identification** ({% cite CIUFFREDA20211497 %}).

- **Taxonomy (or features) file**: \
This file is tab-delimited.\
Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.\
Collaborator

Is it? Can our wrapper work with both?

Author

Thanks for pointing this out, @paulzierep. I cross-checked and unfortunately our wrapper cannot work with both. I have made the changes accordingly in the tutorial. Sorry for the confusion.
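
Since the wrapper expects features as columns and samples as rows, a table in the transposed layout would need to be flipped before upload. The following is a minimal sketch of that step using only the Python standard library; the function name and the feature and sample names are hypothetical, and it assumes a rectangular tab-delimited table with no quoted fields spanning lines.

```python
import csv
import io


def transpose_tsv(text: str) -> str:
    """Swap rows and columns of a rectangular tab-delimited table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))  # zip(*rows) yields the columns as tuples
    return out.getvalue()


# Hypothetical input with features as rows and samples as columns.
features_as_rows = (
    "feature\tS1\tS2\n"
    "Bacteroides\t0.4\t0.1\n"
    "Prevotella\t0.6\t0.9\n"
)
samples_as_rows = transpose_tsv(features_as_rows)
print(samples_as_rows)
```

Note that `zip` silently truncates ragged rows, so a table with missing cells should be checked before transposing.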


MaAsLin2 requires the following input files:

- **Taxonomy (or features) file**: \
Collaborator

Suggested change
- **Taxonomy (or features) file**: \
- **Features file**: \

I would like to keep it more generic. Maybe add some examples at the end of the list, like: OTU/ASV abundance table, MAGs abundance matrix, taxonomy table, gene count matrix.

Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.

The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
Collaborator

Suggested change
The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
The feature file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
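
The behaviour described in this suggestion, keeping only the samples present in both files, can be sketched as a set intersection on sample IDs. This is an illustration of the idea and not MaAsLin2's own code; the function name and the sample data are hypothetical.

```python
def keep_shared_samples(features: dict, metadata: dict) -> tuple:
    """Keep only samples whose ID appears in both tables (illustrative sketch)."""
    shared = features.keys() & metadata.keys()
    kept_features = {s: v for s, v in features.items() if s in shared}
    kept_metadata = {s: v for s, v in metadata.items() if s in shared}
    return kept_features, kept_metadata


# Hypothetical tables keyed by sample ID.
features = {"S1": [0.4, 0.6], "S2": [0.1, 0.9], "S3": [0.5, 0.5]}
metadata = {"S2": {"diagnosis": "CD"}, "S3": {"diagnosis": "nonIBD"}, "S4": {"diagnosis": "UC"}}

kept_features, kept_metadata = keep_shared_samples(features, metadata)
print(sorted(kept_features))
```

Here `S1` (features only) and `S4` (metadata only) are dropped, and the original ordering of the remaining samples is preserved.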

author = {GTN community},
title = {GTN Training Materials: Collection of tutorials developed and maintained by the worldwide Galaxy community},
url = {https://training.galaxyproject.org},
urldate = {2021-03-24}
Member

@shiltemann shiltemann Oct 7, 2024

Our BibTeX parser is failing to parse the urldate key. I think we should support this in the future, but for now perhaps you can set the date in a note so that the tests don't fail, e.g.:

note = "[Online; accessed Fri Nov 05 2021]"

Author

Thanks ! Will make the necessary changes :)
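
If several entries need this workaround, the conversion from a `urldate` field to a `note` field could be scripted. The sketch below is one possible approach, assuming the date is written as `urldate = {YYYY-MM-DD}`; the function name is hypothetical, and the note format simply mirrors the example given in the comment above.

```python
import re
from datetime import date

URLDATE = re.compile(r"urldate\s*=\s*\{(\d{4}-\d{2}-\d{2})\}")


def urldate_to_note(bibtex: str) -> str:
    """Replace 'urldate = {YYYY-MM-DD}' with an equivalent note field (sketch)."""
    def to_note(match: re.Match) -> str:
        accessed = date.fromisoformat(match.group(1))
        # Day and month names assume the default (C/English) locale.
        return 'note = "[Online; accessed {}]"'.format(accessed.strftime("%a %b %d %Y"))

    return URLDATE.sub(to_note, bibtex)


entry_field = "urldate = {2021-03-24}"
converted = urldate_to_note(entry_field)
print(converted)
```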

@paulzierep
Collaborator

@renu-pal can you check the linting?

Collaborator

@paulzierep paulzierep left a comment

Please do not mark comments as resolved if they are not resolved, or explain otherwise why it's not needed.
Otherwise, minor changes, then we should be good to merge. Thanks !

> {: .question}
>
{: .hands_on}

Collaborator

Suggested change
## Data that can be analysed with MaAsLin2

Can you add some subsections to make it better structured?


![sensitivity and false discovery rate (FDR) across different tools](https://journals.plos.org/ploscompbiol/article/figure/image?size=large&id=10.1371/journal.pcbi.1009442.g004 "Source: <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009442#pcbi-1009442-g004">sensitivity and false discovery rate (FDR) across different tools</a>"){:width="60%"}

- The above figure compares various tools for differential abundance detection (Panel A) and multivariable association detection (Panel B) in microbiome studies, based on sensitivity and false discovery rate (FDR).
Collaborator

This and the next 4 sentences do not need to be bullet points.

9. **analysis_method** [ Default: "LM" ] [ Options: "LM", "CPLM", "ZICP", "NEGBIN", "ZINB" ]
- The analysis method to apply.
- Options: \
1. <u>Linear Model (LM)</u>: Determines how changes in metadata are associated with changes in the taxonomy data.
Collaborator

Suggested change
1. <u>Linear Model (LM)</u>: Determines how changes in metadata are associated with changes in the taxonomy data.
1. <u>Linear Model (LM)</u>: The default linear model of MaAsLin2. By default, MaAsLin2 uses linear regression (for continuous metadata) or logistic regression (for binary metadata) as its association method.

All models should do that, but what is the difference between them?
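
To make the option list from the quoted tutorial excerpt concrete, here is a small sketch that checks a requested `analysis_method` against the documented values and falls back to the documented default. The helper `resolve_analysis_method` is hypothetical and is not part of MaAsLin2 or the Galaxy wrapper.

```python
# Options and default as listed in the tutorial excerpt above.
ANALYSIS_METHODS = ("LM", "CPLM", "ZICP", "NEGBIN", "ZINB")
DEFAULT_METHOD = "LM"


def resolve_analysis_method(requested=None) -> str:
    """Return a valid analysis_method, using the default when none is given."""
    if requested is None:
        return DEFAULT_METHOD
    method = requested.upper()
    if method not in ANALYSIS_METHODS:
        raise ValueError(
            f"Unknown analysis_method {requested!r}; expected one of {ANALYSIS_METHODS}"
        )
    return method


print(resolve_analysis_method())        # falls back to the default
print(resolve_analysis_method("zinb"))  # normalised to upper case
```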

14. **plot_scatter**: Generate scatter plots for the significant associations [ Default: TRUE ]
15. **cores**: The number of R processes to run in parallel [ Default: 1 ]

# Model types in MaAslin2
Collaborator

It is not clear to me what this paragraph means. You already explain the models under analysis_method, so it's not clear what points 2 and 3 mean. Can you choose these options? I would remove or rephrase it.

Author

These are not options that can be chosen. They are broader categories of models under which the analysis methods explained above fall. For instance, CPLM, ZICP, etc. come under generalized linear models. If the user chooses a random effect, then the model works as a linear mixed model.

Author

@paulzierep I can make it clearer in the tutorial if you want. Let me know if the above explanation makes sense to you. I can add this explanation to the analysis method section and remove the following paragraph on these models.

Collaborator

Then you should say that those are models chosen by MaAsLin2 depending on the data input and the parameter choice. Otherwise, it's confusing. However, if CPLM and ZICP lead to generalized linear models, then this should also be stated. You have the "When to Use:" part, but users cannot choose anything, so that would confuse me.

Author

Understood. I will make the necessary changes. Thanks for the detailed explanation :)
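
The relationship discussed in this thread, where the model category follows from the chosen parameters and is not selected directly, could be summarised as a lookup. The sketch below encodes only what is stated above (LM is a linear model, CPLM and ZICP fall under generalized linear models, and adding a random effect to LM gives a linear mixed model); every other combination is deliberately left unspecified, and the function name is hypothetical.

```python
# Methods this thread explicitly places under generalized linear models.
GLM_METHODS = {"CPLM", "ZICP"}


def model_category(analysis_method: str, has_random_effects: bool = False):
    """Map parameter choices to the broader model category, where the thread states it."""
    if analysis_method == "LM":
        return "linear mixed model" if has_random_effects else "linear model"
    if analysis_method in GLM_METHODS and not has_random_effects:
        return "generalized linear model"
    return None  # not stated in this thread; check the MaAsLin2 documentation


for method, random_effects in [("LM", False), ("LM", True), ("ZICP", False), ("ZINB", False)]:
    print(method, random_effects, model_category(method, random_effects))
```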

In essence, uncovering associations between microbial features and metadata variables through tools like MaAsLin2 not only deepens our understanding of microbiome dynamics but also holds promise for clinical applications, personalized health strategies, and advancing the field of microbiome research.

The results obtained from MaAslin2 can further be visualized using tools like [**phyloseq**](https://training.galaxyproject.org/training-material/by-tool/interactive_tool_phyloseq.html).\
Tools like phyloseq require that you prepare your data (OTU/ASV table, metadata) in a structured format before visualizing.\
Collaborator

I would add here, that the features need to be subsampled to only include significant features. This can be done with https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fiuc%2Fampvis2_subset_taxa%2Fampvis2_subset_taxa%2F2.8.9%2Bgalaxy0&version=latest
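
Outside Galaxy, the same subsetting idea can be sketched in a few lines: read the significant associations, collect their feature names, and keep only those columns of the feature table. This assumes the results table has `feature` and `qval` columns and uses a q-value cut-off of 0.25; both are assumptions to be checked against the actual MaAsLin2 output, and all names and values in the example are invented.

```python
import csv
import io


def significant_features(results_tsv: str, max_qval: float = 0.25) -> set:
    """Collect feature names whose q-value passes the threshold (assumed columns)."""
    reader = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
    return {row["feature"] for row in reader if float(row["qval"]) <= max_qval}


def subset_feature_columns(features_tsv: str, keep: set) -> str:
    """Keep the sample ID column plus the feature columns named in `keep`."""
    rows = list(csv.reader(io.StringIO(features_tsv), delimiter="\t"))
    header = rows[0]
    wanted = [0] + [i for i, name in enumerate(header) if i > 0 and name in keep]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in wanted])
    return out.getvalue()


# Hypothetical results and feature tables.
results = "feature\tmetadata\tqval\nBacteroides\tdiagnosis\t0.01\nPrevotella\tdiagnosis\t0.60\n"
table = "sample\tBacteroides\tPrevotella\nS1\t0.4\t0.6\nS2\t0.1\t0.9\n"

keep = significant_features(results)
print(subset_feature_columns(table, keep))
```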

renu-pal and others added 10 commits October 16, 2024 17:20
…s/Workflow-constructed-from-history-'Determining-multivariable-association-between-various-meta’omic-features-using-MaAslin2.ga
…ivariable-association-between-various-meta’omic-features-using-MaAslin2-tests.yml to Determining-multivariable-association-between-various-meta’omic-features-using-MaAslin2-tests.yml
4 participants