Wrong response type for given model parameters #15513
hasithjp asked this question in Technical Notes
Motivation
This technical note was inspired by the following question from a Data Science user:
GBM with Bernoulli distribution
ERR: _distribution: Binomial requires the response to be a 2-class categorical....
any suggestion on this?
Discussion
Certain combinations of distributions/families and response types don't make sense together. For example, if you specify that the model be built using a "multinomial" distribution, then the response must be categorical (e.g., "cat", "dog", "mouse"), and not numerical (e.g., -19...15).
All supervised H2O models (GLM/GBM/DRF/DL) require the user to specify a "correct" response type for the given set of parameters. If no explicit model type parameter is available, as for DRF, then the type of problem is inferred from the response type. For classification problems, for example, the response must be converted explicitly to a categorical column. Note that simply specifying an integer column with N unique values is not sufficient: it could equally be a numerical column whose few non-integer "outliers" happen to be absent from the training data, and a new non-integer value might then appear at test time. To avoid these kinds of issues, H2O uses the response column type as the highest-priority indicator for the problem type.
The only exception to this rule is GLM, which will accept any strictly binary response for the "binomial" family.
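The rules above can be summarized in a small pure-Python sketch. This is not H2O's implementation: the `Column` class, the `problem_type` function, and the error wording are hypothetical names introduced only for illustration, and "strictly binary" is assumed here to mean exactly two distinct values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Column:
    """Hypothetical stand-in for a response column."""
    values: Sequence
    is_categorical: bool  # the declared column type, never guessed from the values


def problem_type(response: Column, distribution: Optional[str] = None,
                 algo: str = "gbm") -> str:
    """Return the problem type, or raise if the response type does not fit."""
    n_distinct = len(set(response.values))

    # No explicit distribution (as for DRF): the column type alone decides.
    if distribution is None:
        return "classification" if response.is_categorical else "regression"

    if distribution in ("bernoulli", "binomial"):
        # Assumed reading of the GLM exception: any two-valued response is fine.
        if algo == "glm" and n_distinct == 2:
            return "classification"
        if response.is_categorical and n_distinct == 2:
            return "classification"
        raise ValueError("response must be a 2-class categorical column")

    if distribution == "multinomial":
        if response.is_categorical:
            return "classification"
        raise ValueError("response must be a categorical column")

    raise NotImplementedError("distribution not covered by this sketch")


numeric_01 = Column([0, 1, 1, 0], is_categorical=False)
factor_01 = Column(["0", "1", "1", "0"], is_categorical=True)

print(problem_type(numeric_01))                         # regression
print(problem_type(factor_01, "bernoulli"))             # classification
print(problem_type(numeric_01, "binomial", algo="glm")) # classification
```

Note that the integer column with two unique values is treated as a regression target until it is declared categorical, which is the behavior the note describes.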
Example
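A minimal sketch of the usual fix with the H2O Python client, assuming the `h2o` package is installed and a local cluster can be started. The toy data, the column names `x1` and `y`, and the function name `train_bernoulli_gbm` are made up for illustration. The key step is converting the numeric 0/1 response to a categorical column with `asfactor()` before training with `distribution="bernoulli"`.

```python
def train_bernoulli_gbm():
    """Train a tiny Bernoulli GBM after converting the response to a factor."""
    # Imported lazily so the function can be defined without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # starts or connects to a local H2O cluster

    # Hypothetical toy data: "y" is parsed as a numeric column.
    frame = h2o.H2OFrame({
        "x1": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6],
        "y": [0, 0, 0, 1, 1, 0, 1, 1],
    })

    # Without this conversion, a Bernoulli GBM rejects the numeric response.
    frame["y"] = frame["y"].asfactor()

    # min_rows is lowered only because the toy frame has so few rows.
    model = H2OGradientBoostingEstimator(
        distribution="bernoulli", ntrees=5, min_rows=1
    )
    model.train(x=["x1"], y="y", training_frame=frame)
    return model


# Usage (requires a working H2O installation):
# model = train_bernoulli_gbm()
```

In R, the equivalent conversion is `as.factor()` on the response column.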
JIRA Issue Migration Info
Jira Issue: TN-2
Assignee: Arno Candel
Reporter: Arno Candel
State: Resolved