GH-15940 Added kolmogorov-Smirnov statistic method to H2OBinomialModelMetrics #16353
base: master
Hi @Shashank1202. Thanks for your contribution! Please add a test case to test the new functionality you added.
Hey @maurever. Thanks for your suggestion; I have added tests for this.
model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=5)
model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
ks = model.kolmogorov_smirnov()
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
I suggest keeping both cases:
- call the method without thresholds
- call the method with thresholds
Hey @maurever
This is my first open-source contribution. After your review, I checked the code in detail and made the necessary changes. Could you please go through it once more to make sure everything is in order?
Hi @Shashank1202. I am happy you are trying to contribute to our open-source library. 👍
But your code is not working. I went through it again and found that the goal is not to add thresholds as a parameter but to implement the kolmogorov_smirnov method on the performance object. So, we can call:
model.model_performance(data).kolmogorov_smirnov()
No thresholds are needed. The KS metric, like gains/lift, is calculated with different thresholds than other metrics such as AUC.
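For readers unfamiliar with the metric, here is a minimal sketch of the general idea: the KS statistic is the largest gap between the cumulative true-positive rate and the cumulative false-positive rate as the score threshold is lowered. The function name `ks_statistic` and its arguments are hypothetical, and this is not H2O's implementation; H2O derives its value from the gains/lift bins, so its numbers may differ from this exact computation.

```python
def ks_statistic(labels, scores):
    """Largest gap between cumulative TPR and FPR over all score thresholds.

    Generic illustration only; `labels` holds 0/1 values and `scores`
    holds the predicted probability of class 1.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    # Walk the rows from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Consume tied scores together so each threshold is well defined.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / positives - fp / negatives))
        i = j
    return best


perfect = ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
mixed = ks_statistic([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
```

A perfectly separating score gives a value of 1, and a score that carries no information about the label gives a value near 0.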
…2OBinomialModelMetrics
…into add-ks-method
@@ -976,3 +976,26 @@ def thresholds_and_metric_scores(self):
        if 'thresholds_and_metric_scores' in self._metric_json:
            return self._metric_json['thresholds_and_metric_scores']
        return None

    def kolmogorov_smirnov(self, thresholds= None):
- def kolmogorov_smirnov(self, thresholds= None):
+ def kolmogorov_smirnov(self):
        ...                validation_frame = valid)
        >>> cars_gbm.kolmogorov_smirnov()
        """
        return self.metric("ks", thresholds=thresholds)
The goal is something like this:
- return self.metric("ks", thresholds=thresholds)
+ return max(self.gains_lift()["kolmogorov_smirnov"])
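To illustrate the suggested one-liner without a running H2O cluster, here is a small sketch using a stand-in class. `FakeBinomialMetrics` and the plain dict it returns from `gains_lift()` are assumptions made for illustration; the real `gains_lift()` returns an H2O table object, and only the `"kolmogorov_smirnov"` column name is taken from the suggestion above.

```python
class FakeBinomialMetrics:
    """Hypothetical stand-in for the metrics object; not H2O code."""

    def __init__(self, gains_lift_table):
        # Assumed shape: a mapping from column name to a list of per-bin values.
        self._gains_lift_table = gains_lift_table

    def gains_lift(self):
        # The real method returns a table; a dict of columns is assumed here.
        return self._gains_lift_table

    def kolmogorov_smirnov(self):
        # Same expression as the suggested change: the KS statistic is the
        # largest per-bin KS value in the gains/lift table.
        return max(self.gains_lift()["kolmogorov_smirnov"])


metrics = FakeBinomialMetrics({"kolmogorov_smirnov": [0.12, 0.31, 0.27, 0.0]})
ks = metrics.kolmogorov_smirnov()
```

Because the method takes no arguments, the value depends only on the data the metrics object was computed on, which is what makes `model.model_performance(data).kolmogorov_smirnov()` work for any dataset.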
# Test with specific thresholds
model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=5)
model.train(x=["Origin", "Distance"], y="IsDepDelayed", training_frame=airlines)
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
Have you tried running the test? It does not work, even if you add the method to binomial.py. The goal is not to change and test model.kolmogorov_smirnov(), but to add and test model.model_performance(data).kolmogorov_smirnov().
ks = model.kolmogorov_smirnov(thresholds=[0.01, 0.5, 0.99])
This PR addresses issue #15940.
Added a "kolmogorov_smirnov()" method to the model_performance object in the H2O library.
Previously, users could retrieve the Kolmogorov-Smirnov (KS) statistic using model.kolmogorov_smirnov(), which provided the value on the training sample or out-of-bag (OOB) estimate for DRF models.
This enhancement improves usability by providing a more straightforward way to compute the KS statistic on different datasets.