Fix #24 add docs for network tracking
oruebel committed Mar 5, 2024
1 parent fcba83f commit 83fa7bf
Showing 4 changed files with 72 additions and 6 deletions.
5 changes: 1 addition & 4 deletions docs/development.rst
@@ -112,7 +112,4 @@ which is also indented for improved human readability and line-by-line GitHub tr
If this ``results`` folder eventually becomes too large for Git to reasonably handle, we will explore options to share via other data storage services.


Network Tracking
----------------

Stay tuned https://github.com/NeurodataWithoutBorders/nwb_benchmarks/issues/24
.. include:: network_tracking.rst
24 changes: 24 additions & 0 deletions docs/network_tracking.rst
@@ -0,0 +1,24 @@
.. _network-tracking:

Network Tracking
----------------

Network tracking is implemented in the ``nwb_benchmarks.core`` module and consists of the following main components:

* ``CaptureConnections``: This class uses the ``psutil`` library to capture network connections and map them to process IDs (PIDs). This mapping is used downstream to filter network packets by PID, so that we can distinguish traffic generated by our own process from traffic generated by other processes running on the same system. See `core/_capture_connections.py <https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/main/src/nwb_benchmarks/core/_capture_connections.py>`_
* ``NetworkProfiler``: This class uses the ``tshark`` command line tool (and the ``pyshark`` package) to capture the network traffic (packets) generated by all processes on the system. In combination with ``CaptureConnections``, we can then filter the captured packets to retrieve those generated by a particular PID via the ``get_packets_for_connections`` function. See `core/_network_profiler.py <https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/main/src/nwb_benchmarks/core/_network_profiler.py>`_
* ``NetworkStatistics``: This class provides functions for processing the network packets captured by the ``NetworkProfiler`` to compute basic network statistics, such as the number of packets sent/received or the size of the data uploaded/downloaded. The ``get_statistics`` function provides a convenient way to retrieve all the metrics via a single function call. See `core/_network_statistics.py <https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/main/src/nwb_benchmarks/core/_network_statistics.py>`_
* ``NetworkTracker`` and ``network_activity_tracker``: The ``NetworkTracker`` class and the corresponding ``network_activity_tracker`` context manager build on the functionality implemented in the above classes to make it easy to track and compute network statistics for a given period of time during the execution of a piece of code.
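
The way these components fit together can be illustrated with a small, self-contained sketch. It is not the implementation in ``nwb_benchmarks.core``: the ``Connection`` and ``Packet`` records, the ``packets_for_pid`` and ``summarize`` helpers, and the statistic names are all hypothetical and only show the idea of filtering captured packets by the connections of one PID and then summarizing them.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    """Hypothetical record of one socket: owning PID plus local/remote endpoints."""

    pid: int
    local: tuple  # (ip, port)
    remote: tuple  # (ip, port)


class Packet(NamedTuple):
    """Hypothetical record of one captured packet."""

    src: tuple  # (ip, port)
    dst: tuple  # (ip, port)
    length: int  # size in bytes


def packets_for_pid(packets, connections, pid):
    """Keep only the packets whose endpoints match a connection owned by `pid`."""
    endpoints = {(c.local, c.remote) for c in connections if c.pid == pid}
    return [
        p
        for p in packets
        if (p.src, p.dst) in endpoints or (p.dst, p.src) in endpoints
    ]


def summarize(packets, local_ip):
    """Count packets and bytes sent from and received by `local_ip`."""
    sent = [p for p in packets if p.src[0] == local_ip]
    received = [p for p in packets if p.dst[0] == local_ip]
    return {
        "num_packets": len(packets),
        "num_packets_uploaded": len(sent),
        "num_packets_downloaded": len(received),
        "bytes_uploaded": sum(p.length for p in sent),
        "bytes_downloaded": sum(p.length for p in received),
    }


# Made-up sample data: two processes share one machine (10.0.0.5).
connections = [
    Connection(pid=100, local=("10.0.0.5", 50000), remote=("93.184.216.34", 443)),
    Connection(pid=200, local=("10.0.0.5", 50001), remote=("1.1.1.1", 53)),
]
packets = [
    Packet(src=("10.0.0.5", 50000), dst=("93.184.216.34", 443), length=120),
    Packet(src=("93.184.216.34", 443), dst=("10.0.0.5", 50000), length=1500),
    Packet(src=("10.0.0.5", 50001), dst=("1.1.1.1", 53), length=60),
]

mine = packets_for_pid(packets, connections, pid=100)
stats = summarize(mine, local_ip="10.0.0.5")
```

In this sketch the packet belonging to PID 200 is dropped before any statistics are computed, which is the reason for capturing connections alongside packets.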

.. note::

    ``CaptureConnections`` uses `psutil.net_connections() <https://psutil.readthedocs.io/en/latest/#psutil.net_connections>`_, which requires sudo/root access on macOS and AIX.

.. note::

    Network tracking starts additional threads and processes to capture traffic while the main code is running: **1)** ``NetworkProfiler.start_capture`` launches a ``subprocess`` that runs the ``tshark`` command line tool and is terminated when ``NetworkProfiler.stop_capture`` is called, and **2)** ``CaptureConnections`` implements a ``Thread`` that runs in the background. The ``NetworkTracker`` automatically starts and terminates these processes/threads, so a user typically does not need to manage them directly.
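
The background-thread half of this pattern can be sketched with the standard library alone. The ``BackgroundPoller`` class below is a hypothetical illustration, not the ``CaptureConnections`` implementation; it only shows how a thread can be started, polled at an interval, and stopped cleanly. The subprocess half would follow the same start/stop shape using ``subprocess.Popen`` and ``terminate()``.

```python
import threading
import time


class BackgroundPoller:
    """Hypothetical helper: call `poll` repeatedly in a background thread until stopped."""

    def __init__(self, poll, interval=0.01):
        self._poll = poll
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep; it returns True once stop() is called.
        while not self._stop_event.wait(self._interval):
            self._poll()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()


samples = []
poller = BackgroundPoller(poll=lambda: samples.append(time.monotonic()))
poller.start()
time.sleep(0.1)  # stands in for the code being profiled
poller.stop()
```

Using an ``Event`` rather than a plain ``time.sleep`` loop means that ``stop()`` takes effect without waiting out a full polling interval.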

Typical usage
^^^^^^^^^^^^^

In most cases, users will use the ``NetworkTracker`` or ``network_activity_tracker`` to track network traffic and statistics as illustrated in :ref:`network-tracking-benchmarks`.
4 changes: 3 additions & 1 deletion docs/running_benchmarks.rst
@@ -14,7 +14,9 @@ use `psutil net_connections <https://psutil.readthedocs.io/en/latest/#psutil.net
sudo nwb_benchmarks run
Or drop the ``sudo`` if on Windows. Running on Windows may also require you to set the ``TSHARK_PATH`` environment variable beforehand, which should be the absolute path to the ``tshark.exe`` on your system.
Or drop the ``sudo`` if on Windows.

When running on Windows, or if ``tshark`` is not on the system ``PATH``, you may also need to set the ``TSHARK_PATH`` environment variable beforehand, which should be the absolute path to the ``tshark`` executable (e.g., ``tshark.exe``) on your system.
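
To check which ``tshark`` executable would be found before running the suite, a lookup along the following lines can help. This is a minimal sketch under stated assumptions: ``resolve_tshark_path`` is a hypothetical helper, and the order shown (``TSHARK_PATH`` first, then the system ``PATH``) is an assumption rather than the documented behavior of ``nwb_benchmarks``.

```python
import os
import shutil


def resolve_tshark_path(environ=None):
    """Return the path to tshark, or None if it cannot be located.

    Hypothetical helper: TSHARK_PATH is checked first, then the system PATH.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("TSHARK_PATH")
    if explicit:
        return explicit
    # shutil.which searches the given PATH string for an executable named "tshark".
    return shutil.which("tshark", path=environ.get("PATH"))


print(resolve_tshark_path())
```

If this prints ``None``, setting ``TSHARK_PATH`` explicitly is the likely fix.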

Many of the current tests can take several minutes to complete; the entire suite will take many times that. Grab some coffee, read a book, or better yet (when the suite becomes larger) just leave it to run overnight.

45 changes: 44 additions & 1 deletion docs/writing_benchmarks.rst
@@ -116,4 +116,47 @@ Notice how the ``read_hdf5_nwbfile_remfile`` function (which reads an HDF5-backe
nwbfile = io.read()
return (nwbfile, io, file, byte_stream)
and so we managed to save ~5 lines of code for every occurence of this logic in the benchmarks. Good choices of function names are critical to effectively communicating the actions being undertaken. Thorough annotation of signatures is likewise critical to understanding input/output relationships for these functions.
and so we managed to save ~5 lines of code for every occurrence of this logic in the benchmarks. Good choices of function names are critical to effectively communicating the actions being undertaken. Thorough annotation of signatures is likewise critical to understanding input/output relationships for these functions.


.. _network-tracking-benchmarks:


Writing a network tracking benchmark
------------------------------------

Functions that require network access, such as reading a file from S3, are often a black box, with functions in other libraries (e.g., ``h5py``, ``fsspec``) managing access to the remote resources. The runtime performance of such functions is often driven by how they use the network to access those resources. It is therefore important to profile the network traffic they generate to better understand, for example, the amount of data being downloaded and uploaded and the number of requests being sent and received.

To simplify the implementation of benchmarks that track network statistics, the ``nwb_benchmarks.core`` module provides various helper classes and functions. The network tracking functionality is designed to track the network traffic generated by the main Python process running our tests during a user-defined period of time. The ``network_activity_tracker`` context manager can be used to track the network traffic generated by the code within the context. A basic network benchmark then looks as follows:

.. code-block:: python

    from nwb_benchmarks import TSHARK_PATH
    from nwb_benchmarks.core import network_activity_tracker
    import requests  # Only used here for illustration purposes


    class SimpleNetworkBenchmark:
        def track_network_activity_uri_request(self):
            # Network traffic generated inside this context is tracked
            with network_activity_tracker(tshark_path=TSHARK_PATH) as network_tracker:
                x = requests.get('https://nwb-benchmarks.readthedocs.io/en/latest/setup.html')
            # Return the collected statistics once the tracked block has finished
            return network_tracker.asv_network_statistics

In cases where a context manager is not sufficient, we can instead use the ``NetworkTracker`` class directly to explicitly control when tracking starts and stops.

.. code-block:: python

    from nwb_benchmarks import TSHARK_PATH
    from nwb_benchmarks.core import NetworkTracker
    import requests  # Only used here for illustration purposes


    class SimpleNetworkBenchmark:
        def track_network_activity_uri_request(self):
            tracker = NetworkTracker()
            # Explicitly start the capture before the code to be profiled
            tracker.start_network_capture(tshark_path=TSHARK_PATH)
            x = requests.get('https://nwb-benchmarks.readthedocs.io/en/latest/setup.html')
            # Explicitly stop the capture once the code has finished
            tracker.stop_network_capture()
            return tracker.asv_network_statistics

By default, the ``NetworkTracker`` and ``network_activity_tracker`` track the network activity of the current process ID (i.e., ``os.getpid()``), but the PID to track can also be set explicitly if a different process needs to be monitored.
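
The "default to the current PID" behavior can be illustrated with a small stand-in. The keyword that ``NetworkTracker`` actually accepts for an explicit PID is not shown in this document, so the ``pid`` argument, the ``PidScopedRecorder`` class, and the ``recorder_for`` context manager below are hypothetical and only sketch the pattern.

```python
import os
from contextlib import contextmanager


class PidScopedRecorder:
    """Hypothetical stand-in for a tracker that remembers which PID it monitors."""

    def __init__(self, pid=None):
        # Fall back to the current process when no PID is given.
        self.pid = os.getpid() if pid is None else pid
        self.active = False

    def start(self):
        self.active = True

    def stop(self):
        self.active = False


@contextmanager
def recorder_for(pid=None):
    """Start a recorder on entry and always stop it on exit, even on errors."""
    recorder = PidScopedRecorder(pid=pid)
    recorder.start()
    try:
        yield recorder
    finally:
        recorder.stop()


# No PID given: the recorder monitors the current process.
with recorder_for() as default_recorder:
    pass

# Explicit PID (made-up value): the recorder monitors that process instead.
with recorder_for(pid=4242) as other_recorder:
    pass
```

The ``try``/``finally`` in the context manager is what guarantees that tracking stops even if the profiled code raises.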
