From 2a7b9c847d876c80e24f2c263cc5892e2f90df98 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 11:53:08 -0700
Subject: [PATCH 1/3] Deduplicate augur refine docs

Having a copy of the CLI reference on the FAQ page can be handy, but a
link to the actual reference page is sufficient. A similar argument can
be made for moving the guide out of the reference page.

Aligns with recent changes in "Link to new location for
filtering/subsampling docs" (8bdb0be2cd)
---
 docs/faq/faq.rst          |   2 +-
 docs/faq/refine.rst       | 134 ++++++++++++++++++++++++++++++++++++-
 docs/usage/cli/refine.rst | 135 +-------------------------------------
 3 files changed, 137 insertions(+), 134 deletions(-)
 mode change 120000 => 100644 docs/faq/refine.rst

diff --git a/docs/faq/faq.rst b/docs/faq/faq.rst
index 52edb03d2..67fe6a5b7 100644
--- a/docs/faq/faq.rst
+++ b/docs/faq/faq.rst
@@ -13,5 +13,5 @@ common questions and problems users run into.
    what-is-a-build
    metadata
    clades
-   Specifying `refine` rates
+   refine
    Creating a tree using your own tree builder
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
deleted file mode 120000
index a3513aadd..000000000
--- a/docs/faq/refine.rst
+++ /dev/null
@@ -1 +0,0 @@
-../usage/cli/refine.rst
\ No newline at end of file
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
new file mode 100644
index 000000000..d0c896f43
--- /dev/null
+++ b/docs/faq/refine.rst
@@ -0,0 +1,133 @@
+===========================
+Specifying ``refine`` rates
+===========================
+
+How we use refine in the zika tutorial
+======================================
+
+In the Zika tutorial we used the following basic rule to run the :doc:`../usage/cli/refine` command:
+
+.. code-block:: python
+
+  rule refine:
+      input:
+          tree = rules.tree.output.tree,
+          alignment = rules.align.output,
+          metadata = "data/metadata.tsv"
+      output:
+          tree = "results/tree.nwk",
+          node_data = "results/branch_lengths.json"
+      shell:
+          """
+          augur refine \
+              --tree {input.tree} \
+              --alignment {input.alignment} \
+              --metadata {input.metadata} \
+              --timetree \
+              --output-tree {output.tree} \
+              --output-node-data {output.node_data}
+          """
+
+
+This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
+The paragraphs below will detail how to exert more control on each of these steps through additional options the refine command.
+
+
+Specify the evolutionary rate
+=============================
+
+By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
+In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
+This can be done via the flag ``--clock-rate `` where the implied units are substitutions per site and year.
+In our zika example, this would look like this
+
+.. code-block:: diff
+
+   rule refine:
+       input:
+           tree = rules.tree.output.tree,
+           alignment = rules.align.output,
+           metadata = "data/metadata.tsv"
+       output:
+           tree = "results/tree.nwk",
+           node_data = "results/branch_lengths.json"
+  +    params:
+  +        clock_rate = 0.0008
+       shell:
+           """
+           augur refine \
+               --tree {input.tree} \
+               --alignment {input.alignment} \
+               --metadata {input.metadata} \
+               --timetree \
+  +            --clock-rate {params.clock_rate} \
+               --output-tree {output.tree} \
+               --output-node-data {output.node_data}
+           """
+
+
+
+Confidence intervals for divergence times
+=========================================
+
+Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
+Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
+
+.. code-block:: diff
+
+   rule refine:
+       input:
+           tree = rules.tree.output.tree,
+           alignment = rules.align.output,
+           metadata = "data/metadata.tsv"
+       output:
+           tree = "results/tree.nwk",
+           node_data = "results/branch_lengths.json"
+       params:
+           clock_rate = 0.0008,
+  +        clock_std_dev = 0.0002
+       shell:
+           """
+           augur refine \
+               --tree {input.tree} \
+               --alignment {input.alignment} \
+               --metadata {input.metadata} \
+               --timetree \
+               --date-confidence \
+  +            --clock-rate {params.clock_rate} \
+  +            --clock-std-dev {params.clock_std_dev} \
+               --output-tree {output.tree} \
+               --output-node-data {output.node_data}
+           """
+
+If run with these parameters, augur will save an confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
+
+By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
+This, however, is sometimes unstable when the temporal signal is low and can be switch off with the flag ``--no-covariance``.
+
+
+Specifying the root of the tree
+===============================
+
+By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This is robust when there is robust temporal signal.
+In other situations, you might want to specify the root explicitly, specify a rerooting mechanisms, or keep the root of the input tree.
+The latter can be achieved by passing the argument ``--keep-root``.
+To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
+
+.. code-block:: bash
+
+  --root strain1 [strain2 strain3 ...]
+
+Other available rooting mechanisms are
+
+ * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
+ * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
+ * ``oldest``: use the oldest strain as outgroup
+
+
+Polytomy resolution
+===================
+
+if the data set contains many very similar sequences, their evolutionary relationship some times remains ambiguous resulting in zero-length branches or polytomies (that is internal nodes with more than 2 children).
+Augur partially resolves those polytomies if such resolution helps the make the tree fit the temporal structure in the data.
+If this is undesired, this can be switched-off using ``--keep-polytomies``.
diff --git a/docs/usage/cli/refine.rst b/docs/usage/cli/refine.rst
index 1eb58a5a3..7deb940a3 100644
--- a/docs/usage/cli/refine.rst
+++ b/docs/usage/cli/refine.rst
@@ -14,136 +14,7 @@ augur refine
     :prog: augur
     :path: refine
 
+Guides
+======
 
-
-How we use refine in the zika tutorial
-======================================
-
-In the Zika tutorial we used the following basic rule to run the `refine` command:
-
-.. code-block:: python
-
-  rule refine:
-      input:
-          tree = rules.tree.output.tree,
-          alignment = rules.align.output,
-          metadata = "data/metadata.tsv"
-      output:
-          tree = "results/tree.nwk",
-          node_data = "results/branch_lengths.json"
-      shell:
-          """
-          augur refine \
-              --tree {input.tree} \
-              --alignment {input.alignment} \
-              --metadata {input.metadata} \
-              --timetree \
-              --output-tree {output.tree} \
-              --output-node-data {output.node_data}
-          """
-
-
-This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
-The paragraphs below will detail how to exert more control on each of these steps through additional options the refine command.
-
-
-Specify the evolutionary rate
-=============================
-
-By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
-In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
-This can be done via the flag ``--clock-rate `` where the implied units are substitutions per site and year.
-In our zika example, this would look like this
-
-.. code-block:: diff
-
-   rule refine:
-       input:
-           tree = rules.tree.output.tree,
-           alignment = rules.align.output,
-           metadata = "data/metadata.tsv"
-       output:
-           tree = "results/tree.nwk",
-           node_data = "results/branch_lengths.json"
-  +    params:
-  +        clock_rate = 0.0008
-       shell:
-           """
-           augur refine \
-               --tree {input.tree} \
-               --alignment {input.alignment} \
-               --metadata {input.metadata} \
-               --timetree \
-  +            --clock-rate {params.clock_rate} \
-               --output-tree {output.tree} \
-               --output-node-data {output.node_data}
-           """
-
-
-
-Confidence intervals for divergence times
-=========================================
-
-Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
-Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
-
-.. code-block:: diff
-
-   rule refine:
-       input:
-           tree = rules.tree.output.tree,
-           alignment = rules.align.output,
-           metadata = "data/metadata.tsv"
-       output:
-           tree = "results/tree.nwk",
-           node_data = "results/branch_lengths.json"
-       params:
-           clock_rate = 0.0008,
-  +        clock_std_dev = 0.0002
-       shell:
-           """
-           augur refine \
-               --tree {input.tree} \
-               --alignment {input.alignment} \
-               --metadata {input.metadata} \
-               --timetree \
-               --date-confidence \
-  +            --clock-rate {params.clock_rate} \
-  +            --clock-std-dev {params.clock_std_dev} \
-               --output-tree {output.tree} \
-               --output-node-data {output.node_data}
-           """
-
-If run with these parameters, augur will save an confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
-
-By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
-This, however, is sometimes unstable when the temporal signal is low and can be switch off with the flag ``--no-covariance``.
-
-
-Specifying the root of the tree
-===============================
-
-By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This is robust when there is robust temporal signal.
-In other situations, you might want to specify the root explicitly, specify a rerooting mechanisms, or keep the root of the input tree.
-The latter can be achieved by passing the argument ``--keep-root``.
-To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
-
-.. code-block:: bash
-
-  --root strain1 [strain2 strain3 ...]
-
-Other available rooting mechanisms are
-
- * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
- * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
- * ``oldest``: use the oldest strain as outgroup
-
-
-Polytomy resolution
-===================
-
-if the data set contains many very similar sequences, their evolutionary relationship some times remains ambiguous resulting in zero-length branches or polytomies (that is internal nodes with more than 2 children).
-Augur partially resolves those polytomies if such resolution helps the make the tree fit the temporal structure in the data.
-If this is undesired, this can be switched-off using ``--keep-polytomies``.
-
-
+See :doc:`../../faq/refine`.
From 035df23793c72f3f9062e7eccf3b5ad85f14c166 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 12:01:51 -0700
Subject: [PATCH 2/3] Reword FAQ titles as questions

Otherwise the section title doesn't make sense.
---
 docs/faq/clades.md           | 2 +-
 docs/faq/faq.rst             | 2 +-
 docs/faq/metadata.rst        | 4 ++--
 docs/faq/refine.rst          | 6 +++---
 docs/faq/skip_augur_tree.rst | 6 +++---
 docs/faq/what-is-a-build.md  | 2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/faq/clades.md b/docs/faq/clades.md
index 17c8f50cc..abee0a101 100644
--- a/docs/faq/clades.md
+++ b/docs/faq/clades.md
@@ -1,4 +1,4 @@
-# Labeling `clades`
+# How do I label `clades`?
 
 Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
 Augur has a command to determine the position of such clade labels and assign sequences to clades.
diff --git a/docs/faq/faq.rst b/docs/faq/faq.rst
index 67fe6a5b7..2ec1b362f 100644
--- a/docs/faq/faq.rst
+++ b/docs/faq/faq.rst
@@ -14,4 +14,4 @@ common questions and problems users run into.
    metadata
    clades
    refine
-   Creating a tree using your own tree builder
+   skip_augur_tree
diff --git a/docs/faq/metadata.rst b/docs/faq/metadata.rst
index cbf54850b..de22b4417 100644
--- a/docs/faq/metadata.rst
+++ b/docs/faq/metadata.rst
@@ -1,5 +1,5 @@
-Preparing Your Metadata
-=======================
+How do I prepare metadata?
+==========================
 
 Analyses are vastly more interesting if the sequences or samples analyzed
 have rich 'meta data' wherever possible. This metadata could
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
index d0c896f43..511dee346 100644
--- a/docs/faq/refine.rst
+++ b/docs/faq/refine.rst
@@ -1,6 +1,6 @@
-===========================
-Specifying ``refine`` rates
-===========================
+==================================
+How do I specify ``refine`` rates?
+==================================
 
 How we use refine in the zika tutorial
 ======================================
diff --git a/docs/faq/skip_augur_tree.rst b/docs/faq/skip_augur_tree.rst
index e6f58d81b..71616351b 100644
--- a/docs/faq/skip_augur_tree.rst
+++ b/docs/faq/skip_augur_tree.rst
@@ -1,6 +1,6 @@
-===========================================
-Creating a tree using your own tree builder
-===========================================
+=================================
+How do I use my own tree builder?
+=================================
 
 The `augur tree` command is a light wrapper around tree building programs such as IQ-TREE, RAxML and FastTree.
 It's possible that the functionality you want isn't available in those programs, or that it is available but that `augur tree` doesn't expose the functionality you need.
diff --git a/docs/faq/what-is-a-build.md b/docs/faq/what-is-a-build.md
index 405c67163..83c05f172 100644
--- a/docs/faq/what-is-a-build.md
+++ b/docs/faq/what-is-a-build.md
@@ -1,4 +1,4 @@
-# The concept of a 'build'
+# What is a "build"?
 
 Nextstrain's focus on providing a _real-time_ snapshot of evolving pathogen populations necessitates a reproducible analysis that can be rerun when new sequences are available.
 The individual steps necessary to repeat analysis together comprise a "build".
From be37dbb740087447588bdf631f2399d2b85c4db5 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 12:03:06 -0700
Subject: [PATCH 3/3] Update seasonal-flu link

---
 docs/faq/clades.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/faq/clades.md b/docs/faq/clades.md
index abee0a101..313b93dd5 100644
--- a/docs/faq/clades.md
+++ b/docs/faq/clades.md
@@ -1,6 +1,6 @@
 # How do I label `clades`?
 
-Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
+Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/seasonal-flu).
 Augur has a command to determine the position of such clade labels and assign sequences to clades.
 The definition of these clades are provided in a tab-delimited file (tsv) using the following format:
 ```