Commit

start docs

CodyCBakerPhD committed Feb 18, 2024
1 parent 72cdfde commit 11a75ee
Showing 9 changed files with 136 additions and 90 deletions.
41 changes: 1 addition & 40 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,47 +1,8 @@
# nwb_benchmarks

Benchmark suite for NWB performance using [airspeed velocity](https://asv.readthedocs.io/en/stable/).
Benchmark suite for NWB performance using a customization of [airspeed velocity](https://asv.readthedocs.io/en/stable/).

## Getting Started

To get started, clone this repo...

```
git clone https://github.com/neurodatawithoutborders/nwb_benchmarks.git
cd nwb_benchmarks
```

Setup the environment...

```
conda env create --file environments/nwb_benchmarks.yaml --no-default-packages
conda activate nwb_benchmarks
```

Configure tracking of our custom machine-dependent parameters by calling...

```
asv machine --yes
python src/nwb_benchmarks/setup/configure_machine.py
```

Please note that we do not currently distinguish configurations based on your internet connection; as such, differences may be observed in the results database from the same machine if that machine is a laptop that runs the testing suite over a wide variety of internet qualities.

## Running Benchmarks

To run the full benchmark suite, please ensure you are not running any additional heavy processes in the background to avoid interference or bottlenecks, then execute the command...

```
nwb_benchmarks run
```

Many of the current tests can take several minutes to complete; the entire suite can take 10 or more minutes. Grab some coffee, read a book, or better yet (when the suite becomes larger) just leave it to run overnight.

To run only a single benchmark, use the `--bench <benchmark file stem or module+class+test function names>` flag.

To contribute your results back to the project, just be sure to `git add` and `commit` the results in the main `results` folder.

Note: Each result file should be single to double-digit KB in size; if we ever reach the point where this is prohibitive to store on GitHub itself, then we will investigate other upload strategies and purge the folder from the repository history.

## Building the documentation

Expand Down
27 changes: 17 additions & 10 deletions docs/development.rst
@@ -3,13 +3,20 @@ Development

This section covers advanced details of managing the operation of the AirSpeed Velocity testing suite.

- TODO: add section on environment matrices and current `python=same`
- TODO: add section on custom network packet tracking
- TODO: add section outlining the approach of the machine customization

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`

Customized Machine Header
-------------------------


Customized Call to Run
----------------------


Customized Parsing of Results
-----------------------------


Network Tracking
----------------

Please contact Oliver Ruebel for details.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -1,7 +1,7 @@
nwb_benchmarks
==============

This project is an effort to establish and understand, in a robust and reproducible manner, the principles underlying optimized file storage patterns for reading and writing NWB files from both local filesystems and the cloud (in particular, AWS S3).
This project is an effort to understand, in a robust and reproducible manner, the principles underlying optimized file storage patterns for reading and writing NWB files from both local filesystems and remotely from the cloud (in particular, AWS S3 buckets).

Funding is provided by NOSI ...

@@ -10,7 +10,7 @@ Funding is provided by NOSI ...
:caption: Contents

setup
using_asv
running_benchmarks
writing_benchmarks
development

Expand Down
52 changes: 52 additions & 0 deletions docs/running_benchmarks.rst
@@ -0,0 +1,52 @@
Running the Benchmarks
======================

Before running the benchmark suite, please ensure you are not running any additional heavy processes in the background to avoid interference or bottlenecks.

To run the full benchmark suite, simply call...

.. code-block::

   nwb_benchmarks run

Many of the current tests can take several minutes to complete; the entire suite will take many times that. Grab some coffee, read a book, or better yet (when the suite becomes larger) just leave it to run overnight.


Additional Flags
----------------

Subset of the Suite
~~~~~~~~~~~~~~~~~~~

To run only a single benchmark suite (a single file in the ``benchmarks`` directory), use the command...

.. code-block::

   nwb_benchmarks run --bench <benchmark file stem or module+class+test function names>

For example,

.. code-block::

   nwb_benchmarks run --bench time_remote_slicing

Debug mode
~~~~~~~~~~

If you want to get a full traceback to examine why a new test might be failing, simply add the flag...

.. code-block::

   nwb_benchmarks run --debug

Contributing Results
--------------------

To contribute your results back to the project, all you have to do is ``git add`` and ``commit`` the results in the ``results`` folder.

Then, open a PR to merge the results into the ``main`` branch.

.. note::

Each result file should be single to double-digit KB in size; if we ever reach the point where this is prohibitive to store on GitHub itself, then we will investigate other upload strategies and purge the folder from the repository history.
27 changes: 19 additions & 8 deletions docs/setup.rst
@@ -1,11 +1,22 @@
Setup
=====

TODO: move from README

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
To get started, clone this repo...

.. code-block::

   git clone https://github.com/neurodatawithoutborders/nwb_benchmarks.git
   cd nwb_benchmarks

Set up a completely fresh environment...

.. code-block::

   conda env create --file environments/nwb_benchmarks.yaml --no-default-packages
   conda activate nwb_benchmarks

Set up initial machine configuration values with...

.. code-block::

   nwb_benchmarks setup
11 changes: 0 additions & 11 deletions docs/using_asv.rst

This file was deleted.

46 changes: 33 additions & 13 deletions docs/writing_benchmarks.rst
@@ -1,16 +1,36 @@
Writing Benchmarks
==================

Have an idea for how to speed up read or write from a local or remote NWB file? This section explains how to write your own benchmark to prove it robustly across platforms, architectures, and environments.

- TODO: cover standard prefixes
- TODO: cover standard setup values
- TODO: cover custom setup/teardown trackers
- TODO: cover params

.. Indices and tables
.. ==================
..
.. * :ref:`genindex`
.. * :ref:`modindex`
.. * :ref:`search`
Have an idea for how to speed up read or write from a local or remote NWB file?

This section explains how to write your own benchmark to prove it robustly across platforms, architectures, and environments.


Standard Prefixes
-----------------

Just as ``pytest`` automatically detects and runs any function or method whose name begins with ``test_``, AirSpeed Velocity runs timing tests for anything prefixed with ``time_``, tracks peak memory for anything prefixed with ``peakmem_``, and records custom values, such as our functions for network traffic, for anything prefixed with ``track_`` (these functions must return the value being tracked); there are many other prefixes as well. Check out the full listing in the `primary AirSpeed Velocity documentation <https://asv.readthedocs.io/en/stable/index.html>`_.

A single tracking function should perform only the minimal operations you wish to measure, and it can track only a single value. The philosophy here is to avoid interference between cross-measurements; for example, the act of tracking the memory of an operation may affect how long that operation takes to complete, so you would not want to measure both time and memory simultaneously.
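As an illustration, here is a minimal sketch of these prefix conventions; the class and operation names below are hypothetical and not part of this suite:

```python
import time


class ExamplePrefixBenchmarks:
    """Hypothetical benchmark class illustrating ASV's prefix conventions."""

    def time_small_sleep(self):
        # Timed automatically because of the ``time_`` prefix.
        time.sleep(0.01)

    def peakmem_large_list(self):
        # Peak memory usage is recorded because of the ``peakmem_`` prefix.
        data = [0] * 1_000_000

    def track_number_of_elements(self):
        # ``track_`` functions must return the single value being tracked.
        return 1_000_000
```

Note that each method performs exactly one measurable operation, in keeping with the single-value philosophy above.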


Class Structure
---------------

A single benchmark suite is a file within the ``benchmarks`` folder, and it contains one or more benchmark classes. The word 'Benchmark' does not itself need to appear in the name of the class; only the prefix on the function matters.

The class has several attributes, the most important of which are ``round``, ``repeat``, and ``timeout``. All functions in a class can be repeated in round-robin fashion using ``round > 1``; the philosophy here is to 'average out' variation on the system over time, and it may not always be worth increasing. Each function in a suite is repeated ``repeat`` times to estimate the standard deviation of the operation. Every function in the suite has at most ``timeout`` seconds to complete; otherwise it counts as a failure.

Similar to ``unittest.TestCase`` classes, these have ``setup`` and ``teardown`` methods which are called before and after execution of every ``round`` and every ``repeat`` of every tracking function (such as timing) in the class. ``setup`` should therefore be as light as possible since it is repeated so often, though sometimes even a minimal setup can take time (such as reading a large remote NWB file using a suboptimal method). A ``setup_cache`` method can also be defined; it runs only once per class to precompute some operation, such as the creation of a fake dataset for testing on local disk.

.. note::

Be careful to assign objects fetched by operations within the tracking functions; otherwise, you may unintentionally track the garbage collection step triggered when the reference count of the return value reaches zero in the namespace. For relatively heavy I/O operations this can be non-negligible.
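A minimal sketch of this lifecycle, using a local temporary file rather than a real NWB asset (all names below are hypothetical):

```python
import pathlib
import tempfile


class ExampleLifecycleBenchmark:
    """Hypothetical class showing setup/teardown around a timed operation."""

    repeat = 3
    timeout = 120  # seconds allowed per function before it counts as a failure

    def setup(self):
        # Called before every round and repeat of every tracking function;
        # keep this as light as possible.
        self._directory = tempfile.TemporaryDirectory()
        self.file_path = pathlib.Path(self._directory.name) / "data.bin"
        self.file_path.write_bytes(b"\x00" * 1024)

    def teardown(self):
        # Called after every round and repeat; release resources here.
        self._directory.cleanup()

    def time_read_all_bytes(self):
        # Assign the result so that garbage collection of the return value
        # is not accidentally included in the timing.
        self.data = self.file_path.read_bytes()
```

Assigning the fetched bytes to ``self.data`` follows the note above about avoiding accidental measurement of garbage collection.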

Finally, you can leverage ``params`` and ``param_names`` to perform a structured iteration over many inputs to the operations. ``param_names`` is a list whose length equals the number of inputs you wish to pass to an operation. ``params`` is a list of lists; the outer list is equal in length to the number of inputs, and each inner list is equal in length to the number of different cases to iterate over.

.. note::

This structure for ``params`` can be very inconvenient to specify; if you desire a helper function that would instead take a flat list of dictionaries to serve as keyword arguments for all the iteration cases, please request it on our issues board.
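For concreteness, here is a sketch of the ``params`` layout described above; the parameter names and values are made up for illustration:

```python
class ExampleParametrizedBenchmark:
    """Hypothetical class showing ``params`` and ``param_names``."""

    # Two inputs per operation, so two names and two inner lists.
    param_names = ["chunk_size", "use_copy"]
    params = [
        [1_000, 10_000],  # cases for chunk_size
        [True, False],    # cases for use_copy
    ]

    def time_build_list(self, chunk_size, use_copy):
        # ASV calls this once per combination from the Cartesian product,
        # passing one value from each inner list in order.
        data = [0] * chunk_size
        if use_copy:
            data = data.copy()
```

Here the suite would run ``time_build_list`` four times, once per combination of ``chunk_size`` and ``use_copy``.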

For more advanced details, refer to the `primary AirSpeed Velocity documentation <https://asv.readthedocs.io/en/stable/index.html>`_.
15 changes: 12 additions & 3 deletions src/nwb_benchmarks/command_line_interface.py
@@ -13,7 +13,7 @@
)


def main():
def main() -> None:
"""Simple wrapper around `asv run` for convenience."""
# TODO: swap to click
if len(sys.argv) <= 1:
@@ -28,10 +28,19 @@ def main():
if bench_mode:
specific_benchmark_pattern = flags_list[flags_list.index("--bench") + 1]

if command == "run":
default_asv_machine_file_path = pathlib.Path.home() / ".asv-machine.json"
if command == "setup":
if default_asv_machine_file_path.exists():
ensure_machine_info_current(file_path=default_asv_machine_file_path)
return

process = subprocess.Popen(["asv", "machine", "--yes"], stdout=subprocess.PIPE)
process.wait()

customize_asv_machine_file(file_path=default_asv_machine_file_path)
elif command == "run":
commit_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode("ascii").strip()

default_asv_machine_file_path = pathlib.Path.home() / ".asv-machine.json"
if default_asv_machine_file_path.exists():
ensure_machine_info_current(file_path=default_asv_machine_file_path)
else:
Expand Down
3 changes: 0 additions & 3 deletions src/nwb_benchmarks/setup/_configure_machine.py
@@ -112,9 +112,6 @@ def ensure_machine_info_current(file_path: pathlib.Path):
machine_info_from_file.pop("machine")
machine_info_from_file.pop("custom")

with open(file=file_path.parent / "test.json", mode="w") as io:
json.dump(fp=io, obj=current_machine_info, indent=4)

if machine_info_from_file == current_machine_info:
return

Expand Down
