Refactor measurements evaluations #351
Conversation
Force-pushed from 31b8f87 to 38ac87a
Code-wise this looks good.
I have just one question about the output. I see that there is:
and
I'm wondering if we could change the output so that the individual metric names are the same in both the metric measurement description and the metric evaluation description, that is, e.g.:
or ideally (I'm aware that this would require metric name to human-readable name translation):
This is probably out of scope for this MR and could be done separately.
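A purely illustrative sketch of the translation idea, not part of this PR: a small mapping from internal metric names to human-readable labels, falling back to the raw name when no translation exists. The metric names below are made up for the example.

```python
# Hypothetical translation table from internal metric names to the labels
# shown in both the measurement description and the evaluation description.
METRIC_LABELS = {
    "generator_results": "Generator throughput",
    "receiver_results": "Receiver throughput",
    "cpu_util": "CPU utilization",
}

def human_readable(metric_name: str) -> str:
    # Fall back to the raw metric name when no translation is defined.
    return METRIC_LABELS.get(metric_name, metric_name)
```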
This should be simple, so I'll do at least that in this PR.
Force-pushed from 41d45d8 to 5567455
Force-pushed from 5567455 to ad36ad5
Force-pushed from cf97742 to d274332
Looks good.
Added one more commit that makes the OvSDPDKPvPRecipe and VhostNetPvPRecipe use hostids consistent with the rest of the ENRT recipes. Tested the OvSDPDKRecipe in
Force-pushed from 3a79273 to 7646480
This will be helpful for automation and easier implementation of classes that use the results - we can now write generic code that evaluates ANY result class by simply iterating the metrics to evaluate.

Signed-off-by: Ondrej Lichtner <[email protected]>
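A minimal sketch of what such a generic loop could look like, assuming a result class that exposes a `metrics` property; the class and field names here are illustrative, not the actual LNST classes.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    # Illustrative metric container: a name plus the measured value.
    name: str
    value: float

class ExampleResult:
    """Stand-in for a measurement result exposing a `metrics` property."""
    def __init__(self, metrics):
        self._metrics = metrics

    @property
    def metrics(self):
        return self._metrics

def evaluate(result, baseline, threshold=0.05):
    # Generic evaluation: iterate the metrics instead of special-casing
    # each result class.
    comparisons = []
    for metric, base in zip(result.metrics, baseline.metrics):
        diff = (metric.value - base.value) / base.value
        comparisons.append((metric.name, diff, abs(diff) <= threshold))
    return comparisons
```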
…sults class

This is a better place for the generic flow description generation, and is more consistent with the cpu measurement and cpu measurement results as well.

Signed-off-by: Ondrej Lichtner <[email protected]>
Now that MeasurementResults export a `metrics` property, Baseline evaluation can be refactored and unified to work with almost any MeasurementResult in a much simpler manner. This commit does just that; there are also two related changes:

* the MetricComparison class is extended to contain more information so that we can unify it and remove the use of a "comparison dictionary"
* a BaselineEvaluationResult class is added to store all of the grouped MetricComparisons and to generate a nice description for them.

This does change the Result description format, as it was previously more customized to the individual MeasurementResults, but IMO this is actually more accurate, although maybe a little less nice to look at.

Finally, this gives us much more power to machine post-process the baseline evaluations, so I've also removed the "good direction -> improvement warning" override, which IMO should be done as a post-processing step and not directly as part of the evaluation method.

The unification of the BaselineEvaluator means that we can completely remove:

* BaselineFlowAverageEvaluator
* BaselineRDMABandwidthAverageEvaluator
* BaselineTcRunAverageEvaluator

We need to keep the BaselineCPUAverageEvaluator, as it still overrides some methods wrt. how filtering and result grouping works for CPU Measurements.

Signed-off-by: Ondrej Lichtner <[email protected]>
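A rough sketch of the unified shape described above; the field names and comparison rendering are assumptions for illustration, not the actual LNST implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MetricComparison:
    # Carries everything needed to post-process the comparison later,
    # replacing the old "comparison dictionary".
    metric_name: str
    current_value: float
    baseline_value: float
    threshold: float
    result: str  # e.g. "PASS" or "FAIL"

    @property
    def description(self):
        return (f"{self.metric_name}: {self.current_value} vs baseline "
                f"{self.baseline_value} (threshold {self.threshold}) "
                f"-> {self.result}")

@dataclass
class BaselineEvaluationResult:
    # Groups all MetricComparisons and renders one combined description.
    comparisons: list = field(default_factory=list)

    @property
    def description(self):
        return "\n".join(c.description for c in self.comparisons)
```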
This parameter logically doesn't fit into the generic upstream BaseEnrtRecipe class. From the upstream point of view it doesn't actually do anything - it only recognizes the "none" and "nonzero" values, which are special use cases, and the implementation was a bit clunky.

We could refactor this to make it more sensible upstream and extensible - I think this should be an extensible Mixin class that handles evaluator registration for enrt perf measurements. But it doesn't have a valid use case upstream yet, so I'm removing it for now. We may want to revisit this later.

Signed-off-by: Ondrej Lichtner <[email protected]>
Signed-off-by: Ondrej Lichtner <[email protected]>
Signed-off-by: Ondrej Lichtner <[email protected]>
These two tests use hostids that are different from every other ENRT recipe for no reason. Making them consistent should make the data migration that we need to do for our measurement database easier.

Signed-off-by: Ondrej Lichtner <[email protected]>
This function was written back in 2015, when we couldn't use python3 yet and the statistics module was only added in 3.4. At the same time, the formula used is "incorrect": it calculates the POPULATION standard deviation, whereas we really should be using the SAMPLE standard deviation formula.

Signed-off-by: Ondrej Lichtner <[email protected]>
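For reference, the two formulas differ only in the divisor (N vs N-1), and the standard library now provides both, so the hand-rolled helper can simply be dropped; a minimal comparison:

```python
import statistics

samples = [10.0, 12.0, 11.0, 13.0]

# Population standard deviation: divides by N (what the old helper computed).
pop_std = statistics.pstdev(samples)

# Sample standard deviation: divides by N - 1 (Bessel's correction), which
# is the appropriate estimator when the measured runs are only a sample.
sample_std = statistics.stdev(samples)

print(pop_std, sample_std)
```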
Force-pushed from 7646480 to 3d07e4e
Description
This implements some much-needed refactoring of MeasurementResults and the BaselineEvaluator and, as such, brings some reduction and unification of the relevant code.
Tests
Tested locally with some custom "current result -> baseline result" and "threshold = 1" code adjustments, using the following recipe:
After running this, this is what the evaluation result descriptions look like:
Additionally, if you remove the "cpu evaluation filters", this is what we get:
so an interleaving of "result.description" and "comparison.description"...