Background
As part of this long-term goal, we want to develop a benchmarking tool that, given two Consensus versions, compares the cost of performing the five main ledger operations between them. These five ledger operations are:
Forecast.
Header tick.
Header application.
Block tick.
Block application.
These operations combined constitute the bulk of the time used for block adoption.
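To make the measurements concrete, here is a minimal sketch of how one timing sample per block could be modelled. All names are illustrative assumptions, not the actual db-analyser or ouroboros-consensus API:

```python
from dataclasses import dataclass

@dataclass
class LedgerOpTimings:
    """Per-block timing sample (in microseconds) for the five ledger
    operations; field names are hypothetical, chosen for illustration."""
    forecast: float
    header_tick: float
    header_application: float
    block_tick: float
    block_application: float

    def total(self) -> float:
        # Combined time the five operations contribute to adopting
        # this one block.
        return (self.forecast + self.header_tick
                + self.header_application + self.block_tick
                + self.block_application)
```

Comparing two Consensus versions then reduces to comparing, per operation, the series of such samples produced by each version over the same range of blocks.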
We want this tool to be usable in the development process.
Motivation
We want to provide a means for the Consensus and Ledger developers, as well as the release engineers, to be able to spot performance regressions early on.
Definition of done
Produce a tool that makes it possible to compare the cost of the main ledger operations across two Consensus versions. The comparison can be carried out by inspecting the artefacts produced by the tool, described below. No automation in the detection of performance regressions is required.
The tool should:
Allow specifying the versions of Consensus to compare.
Allow specifying the GHC version used to build each Consensus version under comparison.
Allow specifying the RTS options used to run db-analyser.
Produce a plot per ledger operation, which shows the execution time of both versions (see this example).
TODO: Produce a report/table with <which values?> and <which format?>.
Make each report traceable by storing data like "build information".
Be properly documented so that other developers can use it.
Yield results that are consistent with the system-level benchmarks.
Additionally, we should:
Provide the developers with infrastructure (eg AWS instances) and data that they can use to run the benchmark comparison tool.
As future steps, we could consider running these benchmarks on CI, if that adds value.
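As an illustration of what a report artefact could look like, the sketch below renders per-operation mean execution times of two versions as a Markdown table. The column choice and format are assumptions for illustration only; the actual values and format remain to be decided:

```python
def markdown_report(op_means_a: dict, op_means_b: dict) -> str:
    """Render per-operation mean execution times (ms) of two Consensus
    versions as a Markdown table, with the relative change from A to B.
    Both dicts map operation name -> mean time; columns are illustrative."""
    lines = [
        "| Ledger operation | A (ms) | B (ms) | change |",
        "|---|---|---|---|",
    ]
    for op in op_means_a:
        a, b = op_means_a[op], op_means_b[op]
        change = (b - a) / a * 100.0  # percentage change relative to A
        lines.append(f"| {op} | {a:.2f} | {b:.2f} | {change:+.1f}% |")
    return "\n".join(lines)
```

A negative change would indicate that version B is faster than version A for that operation.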
We’ve developed the existing prototype into an automatable, self-contained benchmark called beacon, as well as systematized workloads and run structure for it. Moreover, we’ve demonstrated the usefulness of the metrics and their reproducibility and identified domains that are viable for QTAs with system-level benchmarks.
#161 created a tool for comparing benchmarks. We can use it as a starting point. Additional improvements to this tool include (in no particular order):
Make analyseFromSlot and numBlocksToProcess optional.
Add support for command line argument parsing.
Replace A and B in the plot title with the names of versions A and B.
Render output data in a more legible format (e.g. Markdown).
Round benchmarking metrics to two or three decimal places.
Compute the distance between the metric vectors (per data point).
Perform statistical analysis of the outliers detected during the first benchmarking pass.
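The last two items above could be sketched as follows. This is a minimal illustration under assumed choices: the Euclidean distance as the vector metric and a z-score cutoff as the outlier criterion, neither of which is a settled design decision:

```python
import math
from statistics import mean, stdev

def per_point_distance(xs, ys):
    """Absolute per-data-point difference between two equally long
    metric vectors (one entry per slot/block)."""
    assert len(xs) == len(ys)
    return [abs(x - y) for x, y in zip(xs, ys)]

def euclidean_distance(xs, ys):
    """Single-number summary: Euclidean distance between the vectors."""
    return math.sqrt(sum(d * d for d in per_point_distance(xs, ys)))

def zscore_outliers(xs, threshold=3.0):
    """Indices of values lying more than `threshold` sample standard
    deviations from the mean; a common, simple outlier criterion."""
    m, s = mean(xs), stdev(xs)
    if s == 0:
        return []
    return [i for i, x in enumerate(xs) if abs(x - m) / s > threshold]
```

Flagged outliers from a first benchmarking pass could then be re-measured or excluded before the versions are compared.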
Subtasks
db-analyser ledger ops result. #223
beacon SlotDataPoint and Metadata. #918
beacon upgrade. #919