Commit

start docs

CodyCBakerPhD committed Feb 18, 2024
1 parent 72cdfde commit 11a75ee
Showing 9 changed files with 136 additions and 90 deletions.
41 changes: 1 addition & 40 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,47 +1,8 @@
# nwb_benchmarks

Benchmark suite for NWB performance using [airspeed velocity](https://asv.readthedocs.io/en/stable/).
Benchmark suite for NWB performance using a customization of [airspeed velocity](https://asv.readthedocs.io/en/stable/).

## Getting Started

To get started, clone this repo...

```
git clone https://github.com/neurodatawithoutborders/nwb_benchmarks.git
cd nwb_benchmarks
```

Setup the environment...

```
conda env create --file environments/nwb_benchmarks.yaml --no-default-packages
conda activate nwb_benchmarks
```

Configure tracking of our custom machine-dependent parameters by calling...

```
asv machine --yes
python src/nwb_benchmarks/setup/configure_machine.py
```

Please note that we do not currently distinguish configurations based on your internet connection; as such, differences may be observed in the results database from the same machine if that machine is a laptop that runs the testing suite over a wide variety of internet qualities.

## Running Benchmarks

To run the full benchmark suite, please ensure you are not running any additional heavy processes in the background to avoid interference or bottlenecks, then execute the command...

```
nwb_benchmarks run
```

Many of the current tests can take several minutes to complete; the entire suite can take 10 or more minutes. Grab some coffee, read a book, or better yet (when the suite becomes larger) just leave it to run overnight.

To run only a single benchmark, use the `--bench <benchmark file stem or module+class+test function names>` flag.

To contribute your results back to the project, just be sure to `git add` and `commit` the results in the main `results` folder.

Note: Each result file should be single to double-digit KB in size; if we ever reach the point where this is prohibitive to store on GitHub itself, then we will investigate other upload strategies and purge the folder from the repository history.

## Building the documentation

Expand Down
27 changes: 17 additions & 10 deletions docs/development.rst
@@ -3,13 +3,20 @@ Development

This section covers advanced details of managing the operation of the AirSpeed Velocity testing suite.

- TODO: add section on environment matrices and current `python=same`
- TODO: add section on custom network packet tracking
- TODO: add section outlining the approach of the machine customization

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`

Customized Machine Header
-------------------------


Customized Call to Run
----------------------


Customized Parsing of Results
-----------------------------


Network Tracking
----------------

Please contact Oliver Ruebel for details.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -1,7 +1,7 @@
nwb_benchmarks
==============

This project is an effort to establish and understand, in a robust and reproducible manner, the principles underlying optimized file storage patterns for reading and writing NWB files from both local filesystems and the cloud (in particular, AWS S3).
This project is an effort to understand, in a robust and reproducible manner, the principles underlying optimized file storage patterns for reading and writing NWB files from both local filesystems and remotely from the cloud (in particular, AWS S3 buckets).

Funding is provided by NOSI ...

@@ -10,7 +10,7 @@ Funding is provided by NOSI ...
:caption: Contents

setup
using_asv
running_benchmarks
writing_benchmarks
development

Expand Down
52 changes: 52 additions & 0 deletions docs/running_benchmarks.rst
@@ -0,0 +1,52 @@
Running the Benchmarks
======================

Before running the benchmark suite, please ensure you are not running any additional heavy processes in the background to avoid interference or bottlenecks.

To run the full benchmark suite, simply call...

.. code-block::

   nwb_benchmarks run

Many of the current tests can take several minutes to complete; the entire suite will take many times that. Grab some coffee, read a book, or better yet (when the suite becomes larger) just leave it to run overnight.


Additional Flags
----------------

Subset of the Suite
~~~~~~~~~~~~~~~~~~~

To run only a single benchmark suite (a single file in the ``benchmarks`` directory), use the command...

.. code-block::

   nwb_benchmarks run --bench <benchmark file stem or module+class+test function names>

For example,

.. code-block::

   nwb_benchmarks run --bench time_remote_slicing

Debug mode
~~~~~~~~~~

If you want to get a full traceback to examine why a new test might be failing, simply add the flag...

.. code-block::

   nwb_benchmarks run --debug

Contributing Results
--------------------

To contribute your results back to the project, all you have to do is ``git add`` and ``commit`` the results in the ``results`` folder.

Then, open a PR to merge the results into the ``main`` branch.

.. note::

Each result file should be single to double-digit KB in size; if we ever reach the point where this is prohibitive to store on GitHub itself, then we will investigate other upload strategies and purge the folder from the repository history.
27 changes: 19 additions & 8 deletions docs/setup.rst
@@ -1,11 +1,22 @@
Setup
=====

TODO: move from README

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
To get started, clone this repo...

.. code-block::

   git clone https://github.com/neurodatawithoutborders/nwb_benchmarks.git
   cd nwb_benchmarks

Set up a completely fresh environment...

.. code-block::

   conda env create --file environments/nwb_benchmarks.yaml --no-default-packages
   conda activate nwb_benchmarks

Set up initial machine configuration values with...

.. code-block::

   nwb_benchmarks setup
11 changes: 0 additions & 11 deletions docs/using_asv.rst

This file was deleted.

46 changes: 33 additions & 13 deletions docs/writing_benchmarks.rst
@@ -1,16 +1,36 @@
Writing Benchmarks
==================

Have an idea for how to speed up read or write from a local or remote NWB file? This section explains how to write your own benchmark to prove it robustly across platforms, architectures, and environments.

- TODO: cover standard prefixes
- TODO: cover standard setup values
- TODO: cover custom setup/teardown trackers
- TODO: cover params

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
Have an idea for how to speed up read or write from a local or remote NWB file?

This section explains how to write your own benchmark to prove it robustly across platforms, architectures, and environments.


Standard Prefixes
-----------------

Just as ``pytest`` automatically detects and runs any function or method whose name begins with ``test_``, AirSpeed Velocity runs timing tests for anything prefixed with ``time_``, tracks peak memory for anything prefixed with ``peakmem_``, and records custom values, such as our functions for network traffic, for anything prefixed with ``track_`` (these functions must return the value being tracked); there are many other prefixes as well. Check out the full listing in the `primary AirSpeed Velocity documentation <https://asv.readthedocs.io/en/stable/index.html>`_.

A single tracking function should perform only the minimal operations you wish to measure, and it can track only a single value. The philosophy here is to avoid interference between cross-measurements; for example, the act of tracking the memory of an operation may affect how long that operation takes to complete, so you would not want to measure both time and memory simultaneously.
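As an illustration, here is a minimal sketch of these prefix conventions; the class and operation names below are hypothetical and not part of this suite:

```python
import time


class ExamplePrefixBenchmarks:
    """Hypothetical benchmark class illustrating ASV's prefix conventions."""

    def time_small_sleep(self):
        # Timed automatically because of the ``time_`` prefix.
        time.sleep(0.01)

    def peakmem_large_list(self):
        # Peak memory usage is recorded because of the ``peakmem_`` prefix.
        data = [0] * 1_000_000

    def track_number_of_elements(self):
        # ``track_`` functions must return the single value being tracked.
        return 1_000_000
```

Note that each method performs exactly one measurable operation, in keeping with the single-value philosophy above.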


Class Structure
---------------

A single benchmark suite is a file within the ``benchmarks`` folder, and it contains one or more benchmark classes. The word 'Benchmark' does not itself need to appear in the name of the class; only the prefix on the function matters.

The class has several attributes, the most important of which are ``round``, ``repeat``, and ``timeout``. All functions in a class can be repeated in round-robin fashion using ``round > 1``; the philosophy here is to 'average out' variation on the system over time, and it may not always be worth increasing. Each function in a suite is repeated ``repeat`` times to estimate the standard deviation of the operation. Every function in the suite has at most ``timeout`` seconds to complete; otherwise it counts as a failure.

Similar to ``unittest.TestCase`` classes, these have ``setup`` and ``teardown`` methods which are called before and after execution of every ``round`` and every ``repeat`` of every tracking function (such as timing) in the class. ``setup`` should therefore be as light as possible since it is repeated so often, though sometimes even a minimal setup can take time (such as reading a large remote NWB file using a suboptimal method). A ``setup_cache`` method can also be defined; it runs only once per class to precompute some operation, such as the creation of a fake dataset for testing on local disk.

.. note::

Be careful to assign objects fetched by operations within the tracking functions; otherwise, you may unintentionally track the garbage collection step triggered when the reference count of the return value reaches zero in the namespace. For relatively heavy I/O operations this can be non-negligible.
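A minimal sketch of this lifecycle, using a local temporary file rather than a real NWB asset (all names below are hypothetical):

```python
import pathlib
import tempfile


class ExampleLifecycleBenchmark:
    """Hypothetical class showing setup/teardown around a timed operation."""

    repeat = 3
    timeout = 120  # seconds allowed per function before it counts as a failure

    def setup(self):
        # Called before every round and repeat of every tracking function;
        # keep this as light as possible.
        self._directory = tempfile.TemporaryDirectory()
        self.file_path = pathlib.Path(self._directory.name) / "data.bin"
        self.file_path.write_bytes(b"\x00" * 1024)

    def teardown(self):
        # Called after every round and repeat; release resources here.
        self._directory.cleanup()

    def time_read_all_bytes(self):
        # Assign the result so that garbage collection of the return value
        # is not accidentally included in the timing.
        self.data = self.file_path.read_bytes()
```

Assigning the fetched bytes to ``self.data`` follows the note above about avoiding accidental measurement of garbage collection.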

Finally, you can leverage ``params`` and ``param_names`` to perform a structured iteration over many inputs to the operations. ``param_names`` is a list whose length equals the number of inputs you wish to pass to an operation. ``params`` is a list of lists; the outer list is equal in length to the number of inputs, and each inner list is equal in length to the number of different cases to iterate over.

.. note::

This structure for ``params`` can be very inconvenient to specify; if you desire a helper function that would instead take a flat list of dictionaries to serve as keyword arguments for all the iteration cases, please request it on our issues board.
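For concreteness, here is a sketch of the ``params`` layout described above; the parameter names and values are made up for illustration:

```python
class ExampleParametrizedBenchmark:
    """Hypothetical class showing ``params`` and ``param_names``."""

    # Two inputs per operation, so two names and two inner lists.
    param_names = ["chunk_size", "use_copy"]
    params = [
        [1_000, 10_000],  # cases for chunk_size
        [True, False],    # cases for use_copy
    ]

    def time_build_list(self, chunk_size, use_copy):
        # ASV calls this once per combination from the Cartesian product,
        # passing one value from each inner list in order.
        data = [0] * chunk_size
        if use_copy:
            data = data.copy()
```

Here the suite would run ``time_build_list`` four times, once per combination of ``chunk_size`` and ``use_copy``.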

For more advanced details, refer to the `primary AirSpeed Velocity documentation <https://asv.readthedocs.io/en/stable/index.html>`_.
15 changes: 12 additions & 3 deletions src/nwb_benchmarks/command_line_interface.py
@@ -13,7 +13,7 @@
)


def main():
def main() -> None:
"""Simple wrapper around `asv run` for convenience."""
# TODO: swap to click
if len(sys.argv) <= 1:
@@ -28,10 +28,19 @@ def main():
if bench_mode:
specific_benchmark_pattern = flags_list[flags_list.index("--bench") + 1]

if command == "run":
default_asv_machine_file_path = pathlib.Path.home() / ".asv-machine.json"
if command == "setup":
if default_asv_machine_file_path.exists():
ensure_machine_info_current(file_path=default_asv_machine_file_path)
return

process = subprocess.Popen(["asv", "machine", "--yes"], stdout=subprocess.PIPE)
process.wait()

customize_asv_machine_file(file_path=default_asv_machine_file_path)
elif command == "run":
commit_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()

default_asv_machine_file_path = pathlib.Path.home() / ".asv-machine.json"
if default_asv_machine_file_path.exists():
ensure_machine_info_current(file_path=default_asv_machine_file_path)
else:
Expand Down
3 changes: 0 additions & 3 deletions src/nwb_benchmarks/setup/_configure_machine.py
@@ -112,9 +112,6 @@ def ensure_machine_info_current(file_path: pathlib.Path):
machine_info_from_file.pop("machine")
machine_info_from_file.pop("custom")

with open(file=file_path.parent / "test.json", mode="w") as io:
json.dump(fp=io, obj=current_machine_info, indent=4)

if machine_info_from_file == current_machine_info:
return

Expand Down
