From 2a7b9c847d876c80e24f2c263cc5892e2f90df98 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 11:53:08 -0700
Subject: [PATCH 1/3] Deduplicate augur refine docs

Having a copy of the CLI reference on the FAQ page can be handy, but a
link to the actual reference page is sufficient. A similar argument can
be made for moving the guide out of the reference page.

Aligns with recent changes in "Link to new location for
filtering/subsampling docs" (8bdb0be2cd)
---
 docs/faq/faq.rst          |   2 +-
 docs/faq/refine.rst       | 134 ++++++++++++++++++++++++++++++++++++-
 docs/usage/cli/refine.rst | 135 +-------------------------------------
 3 files changed, 137 insertions(+), 134 deletions(-)
 mode change 120000 => 100644 docs/faq/refine.rst
diff --git a/docs/faq/faq.rst b/docs/faq/faq.rst
index 52edb03d2..67fe6a5b7 100644
--- a/docs/faq/faq.rst
+++ b/docs/faq/faq.rst
@@ -13,5 +13,5 @@ common questions and problems users run into.
    what-is-a-build
    metadata
    clades
-   Specifying `refine` rates <refine>
+   refine
    Creating a tree using your own tree builder <skip_augur_tree>
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
deleted file mode 120000
index a3513aadd..000000000
--- a/docs/faq/refine.rst
+++ /dev/null
@@ -1 +0,0 @@
-../usage/cli/refine.rst
\ No newline at end of file
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
new file mode 100644
index 000000000..d0c896f43
--- /dev/null
+++ b/docs/faq/refine.rst
@@ -0,0 +1,133 @@
+===========================
+Specifying ``refine`` rates
+===========================
+
+How we use refine in the zika tutorial
+======================================
+
+In the Zika tutorial we used the following basic rule to run the :doc:`../usage/cli/refine` command:
+
+.. code-block:: python
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+
+This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
+The sections below detail how to exert more control over each of these steps through additional options of the ``refine`` command.
+
+
+Specify the evolutionary rate
+=============================
+
+By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
+In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
+This can be done via the flag ``--clock-rate <value>``, where the implied units are substitutions per site per year.
+In our Zika example, this would look like this:
+
+.. code-block:: diff
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+    +   params:
+    +       clock_rate = 0.0008
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+    +           --clock-rate {params.clock_rate} \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+
+
+Confidence intervals for divergence times
+=========================================
+
+Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
+Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
+
+.. code-block:: diff
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+        params:
+            clock_rate = 0.0008,
+    +       clock_std_dev = 0.0002
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+    +           --date-confidence \
+                --clock-rate {params.clock_rate} \
+    +           --clock-std-dev {params.clock_std_dev} \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+If run with these parameters, augur will save a confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
+
+By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
+This, however, is sometimes unstable when the temporal signal is low and can be switched off with the flag ``--no-covariance``.
+
+
+Specifying the root of the tree
+===============================
+
+By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This works well when the temporal signal is strong.
+In other situations, you might want to specify the root explicitly, specify a rerooting mechanism, or keep the root of the input tree.
+The latter can be achieved by passing the argument ``--keep-root``.
+To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
+
+.. code-block:: bash
+
+    --root strain1 [strain2 strain3 ...]
+
+Other available rooting mechanisms are:
+
+  * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
+  * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
+  * ``oldest``: use the oldest strain as outgroup
+
+
+Polytomy resolution
+===================
+
+If the data set contains many very similar sequences, their evolutionary relationship sometimes remains ambiguous, resulting in zero-length branches or polytomies (that is, internal nodes with more than two children).
+Augur partially resolves those polytomies if such resolution helps make the tree fit the temporal structure in the data.
+If this is undesired, it can be switched off using ``--keep-polytomies``.
diff --git a/docs/usage/cli/refine.rst b/docs/usage/cli/refine.rst
index 1eb58a5a3..7deb940a3 100644
--- a/docs/usage/cli/refine.rst
+++ b/docs/usage/cli/refine.rst
@@ -14,136 +14,7 @@ augur refine
     :prog: augur
     :path: refine
 
+Guides
+======
 
-
-How we use refine in the zika tutorial
-======================================
-
-In the Zika tutorial we used the following basic rule to run the `refine` command:
-
-.. code-block:: python
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-
-This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
-The paragraphs below will detail how to exert more control on each of these steps through additional options the refine command.
-
-
-Specify the evolutionary rate
-=============================
-
-By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
-In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
-This can be done via the flag ``--clock-rate <value>`` where the implied units are substitutions per site and year.
-In our zika example, this would look like this
-
-.. code-block:: diff
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-    +    params:
-    +    	clock_rate = 0.0008
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-    +           --clock-rate {params.clock_rate} \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-
-
-Confidence intervals for divergence times
-=========================================
-
-Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
-Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
-
-.. code-block:: diff
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-        params:
-            clock_rate = 0.0008,
-    +    	clock_std_dev = 0.0002
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-                --date-confidence \
-    +            --clock-rate {params.clock_rate} \
-    +            --clock-std-dev {params.clock_std_dev} \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-If run with these parameters, augur will save an confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
-
-By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
-This, however, is sometimes unstable when the temporal signal is low and can be switch off with the flag ``--no-covariance``.
-
-
-Specifying the root of the tree
-===============================
-
-By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This is robust when there is robust temporal signal.
-In other situations, you might want to specify the root explicitly, specify a rerooting mechanisms, or keep the root of the input tree.
-The latter can be achieved by passing the argument ``--keep-root``.
-To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
-
-.. code-block:: bash
-
-    --root strain1 [strain2 strain3 ...]
-
-Other available rooting mechanisms are
-
-  * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
-  * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
-  * ``oldest``: use the oldest strain as outgroup
-
-
-Polytomy resolution
-===================
-
-if the data set contains many very similar sequences, their evolutionary relationship some times remains ambiguous resulting in zero-length branches or polytomies (that is internal nodes with more than 2 children).
-Augur partially resolves those polytomies if such resolution helps the make the tree fit the temporal structure in the data.
-If this is undesired, this can be switched-off using ``--keep-polytomies``.
-
-
+See :doc:`../../faq/refine`.

From 035df23793c72f3f9062e7eccf3b5ad85f14c166 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 12:01:51 -0700
Subject: [PATCH 2/3] Reword FAQ titles as questions

Otherwise the section title doesn't make sense.
---
 docs/faq/clades.md           | 2 +-
 docs/faq/faq.rst             | 2 +-
 docs/faq/metadata.rst        | 4 ++--
 docs/faq/refine.rst          | 6 +++---
 docs/faq/skip_augur_tree.rst | 6 +++---
 docs/faq/what-is-a-build.md  | 2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/faq/clades.md b/docs/faq/clades.md
index 17c8f50cc..abee0a101 100644
--- a/docs/faq/clades.md
+++ b/docs/faq/clades.md
@@ -1,4 +1,4 @@
-# Labeling `clades`
+# How do I label `clades`?
 
 Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
 Augur has a command to determine the position of such clade labels and assign sequences to clades.
diff --git a/docs/faq/faq.rst b/docs/faq/faq.rst
index 67fe6a5b7..2ec1b362f 100644
--- a/docs/faq/faq.rst
+++ b/docs/faq/faq.rst
@@ -14,4 +14,4 @@ common questions and problems users run into.
    metadata
    clades
    refine
-   Creating a tree using your own tree builder <skip_augur_tree>
+   skip_augur_tree
diff --git a/docs/faq/metadata.rst b/docs/faq/metadata.rst
index cbf54850b..de22b4417 100644
--- a/docs/faq/metadata.rst
+++ b/docs/faq/metadata.rst
@@ -1,5 +1,5 @@
-Preparing Your Metadata
-=======================
+How do I prepare metadata?
+==========================
 
 Analyses are vastly more interesting if the sequences or samples
 analyzed have rich 'meta data' wherever possible. This metadata could
diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
index d0c896f43..511dee346 100644
--- a/docs/faq/refine.rst
+++ b/docs/faq/refine.rst
@@ -1,6 +1,6 @@
-===========================
-Specifying ``refine`` rates
-===========================
+==================================
+How do I specify ``refine`` rates?
+==================================
 
 How we use refine in the zika tutorial
 ======================================
diff --git a/docs/faq/skip_augur_tree.rst b/docs/faq/skip_augur_tree.rst
index e6f58d81b..71616351b 100644
--- a/docs/faq/skip_augur_tree.rst
+++ b/docs/faq/skip_augur_tree.rst
@@ -1,6 +1,6 @@
-===========================================
-Creating a tree using your own tree builder
-===========================================
+=================================
+How do I use my own tree builder?
+=================================
 
 The `augur tree` command is a light wrapper around tree building programs such as IQ-TREE, RAxML and FastTree.
 It's possible that the functionality you want isn't available in those programs, or that it is available but that `augur tree` doesn't expose the functionality you need.
diff --git a/docs/faq/what-is-a-build.md b/docs/faq/what-is-a-build.md
index 405c67163..83c05f172 100644
--- a/docs/faq/what-is-a-build.md
+++ b/docs/faq/what-is-a-build.md
@@ -1,4 +1,4 @@
-# The concept of a 'build'
+# What is a "build"?
 
 Nextstrain's focus on providing a _real-time_ snapshot of evolving pathogen populations necessitates a reproducible analysis that can be rerun when new sequences are available.
 The individual steps necessary to repeat analysis together comprise a "build".

From be37dbb740087447588bdf631f2399d2b85c4db5 Mon Sep 17 00:00:00 2001
From: Victor Lin <13424970+victorlin@users.noreply.github.com>
Date: Fri, 26 Apr 2024 12:03:06 -0700
Subject: [PATCH 3/3] Update seasonal-flu link

---
 docs/faq/clades.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/faq/clades.md b/docs/faq/clades.md
index abee0a101..313b93dd5 100644
--- a/docs/faq/clades.md
+++ b/docs/faq/clades.md
@@ -1,6 +1,6 @@
 # How do I label `clades`?
 
-Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
+Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/seasonal-flu).
 Augur has a command to determine the position of such clade labels and assign sequences to clades.
 The definition of these clades are provided in a tab-delimited file (tsv) using the following format:
 ```