We should add a warning to the Python/R method h2o.cor() that tells users the method is only intended for numeric columns whenever they pass a categorical column.
We should also add a runit/pyunit test covering what happens when a user passes a categorical column. Right now it seems that we return NA for categorical columns with more than two levels.
{code}
library(h2o)
h2o.init()
# create a categorical column called k2 with 5 levels and 20 values
k2 = rep(c('her', 'him', 'cat', 'mouse', 'dog'),4)
# create a categorical column called k with two levels and 20 values
k = rep(c('her', 'him'),10)
# create a numeric column with 20 values
n <- 20
h <- runif(n)
# see what happens if you try to calculate the correlation of a numeric with a binary categorical
h2o.cor(as.h2o(k),as.h2o(h))
# returned 0.07981525 (the exact value varies because h is random)
# see what happens when you try to calculate the correlation of a numeric with a multi-level categorical
h2o.cor(as.h2o(k2),as.h2o(h))
# returned NA
{code}
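One possible shape for the requested warning, sketched in plain Python so it runs without a cluster. It assumes the client can obtain a column-name-to-type mapping (for example the dict exposed by `H2OFrame.types`) and that numeric columns are labelled `int` or `real`. The helper names `non_numeric_columns` and `warn_if_non_numeric` are hypothetical and not part of h2o.

```python
import warnings

# Type labels treated as numeric. Assumption: numeric columns are reported
# as "int" or "real" in the column-type mapping.
NUMERIC_TYPES = {"int", "real"}


def non_numeric_columns(col_types):
    """Return the names of columns whose type is not numeric, in input order."""
    return [name for name, col_type in col_types.items()
            if col_type not in NUMERIC_TYPES]


def warn_if_non_numeric(col_types):
    """Emit one warning listing every non-numeric column; return that list."""
    offending = non_numeric_columns(col_types)
    if offending:
        warnings.warn(
            "cor() is only intended for numeric columns; "
            "non-numeric columns: " + ", ".join(offending),
            UserWarning,
            stacklevel=2,
        )
    return offending


# Hypothetical type mapping mirroring the reproduction above:
# k2 is categorical ("enum"), h is numeric ("real").
example_types = {"k2": "enum", "h": "real"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_non_numeric(example_types)
print(flagged)
print(len(caught))
```

A pyunit test could call the same check on both the binary and the multi-level categorical frames and assert that a warning is raised in each case, independently of whether the returned correlation is a number or NA.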