Releases: mayer79/missRanger
CRAN release 2.6.0
Major bug fix
Fixes a major bug, by which responses would be used as covariates in the random forests. Thanks for reporting @flystar233, see #78.
You can expect different and better imputations.
Major feature
Out-of-sample application is now possible! Thanks to @jeandigitale for pushing the idea in #58.
This means you can run imp <- missRanger(..., keep_forests = TRUE) and then apply its models to new data via predict(imp, newdata). The "missRanger" object can be saved/loaded as a binary file, e.g., via saveRDS()/readRDS() for later use.
Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missings in variables used as covariates in random forests). We call this the "easy case". In the "hard case", even multiple iterations (set by iter) can lead to unsatisfactory results.
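The workflow described above might look as follows. This is a minimal sketch that assumes missRanger >= 2.6.0 is installed. The split of iris into training data and newdata, the number of trees, and the temporary file path are arbitrary choices for illustration only.

```r
library(missRanger)

# Arbitrary split: every 5th row of iris serves as "new" data
ix <- seq(1, nrow(iris), by = 5)
train <- generateNA(iris[-ix, ], p = 0.1, seed = 1)  # add 10% missings per column
newdata <- iris[ix, ]
newdata$Sepal.Length[1:3] <- NA  # one missing value per row ("easy case")

# Fit the imputation models and keep the forests for later use
imp <- missRanger(train, num.trees = 100, keep_forests = TRUE, seed = 1, verbose = 0)

# Store the "missRanger" object and load it again, e.g., in another session
path <- tempfile(fileext = ".rds")
saveRDS(imp, path)
imp_loaded <- readRDS(path)

# Apply the stored models to the new data
filled <- predict(imp_loaded, newdata)
head(filled)
```

Rows of newdata with several missing values would fall under the "hard case" and might be imputed less reliably.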
The out-of-sample algorithm works as follows:
1. Impute univariately all relevant columns by randomly drawing values from the original unimputed data. This step will only impact "hard case" rows.
2. Replace univariate imputations by predictions of random forests. This is done sequentially over variables, where the variables are sorted to minimize the impact of univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
3. Repeat Step 2 for "hard case" rows multiple times.
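The three steps can be illustrated with a small base R sketch. It is not the package's implementation: linear models stand in for the random forests, PMM is left out, only numeric columns are handled, and sorting the variables by their number of missing values is an assumption. The function name oos_impute() is hypothetical.

```r
set.seed(1)

train <- iris[1:4]  # numeric columns only, to keep the sketch short
vars <- names(train)

# Stand-in models: one linear regression per variable (NOT random forests)
models <- lapply(
  setNames(vars, vars),
  function(v) lm(reformulate(setdiff(vars, v), response = v), data = train)
)

newdata <- head(train, 5)
newdata[1, "Sepal.Length"] <- NA                    # "easy case": one missing
newdata[2, c("Sepal.Width", "Petal.Length")] <- NA  # "hard case": two missings

oos_impute <- function(newdata, train, models, iter = 4) {
  miss <- is.na(newdata)      # remember which cells were missing
  hard <- rowSums(miss) > 1   # rows with more than one missing value
  n_miss <- colSums(miss)
  to_fill <- names(sort(n_miss[n_miss > 0]))  # assumed order: fewest missings first

  # Step 1: univariate imputation by random draws from observed training values
  for (v in to_fill) {
    pool <- train[[v]][!is.na(train[[v]])]
    newdata[miss[, v], v] <- sample(pool, sum(miss[, v]), replace = TRUE)
  }

  # Steps 2 and 3: replace draws by model predictions, variable by variable;
  # from the second iteration on, only "hard case" rows are updated
  for (j in seq_len(iter)) {
    for (v in to_fill) {
      rows <- if (j == 1) miss[, v] else miss[, v] & hard
      if (any(rows)) {
        newdata[rows, v] <- predict(models[[v]], newdata[rows, , drop = FALSE])
      }
    }
    if (!any(hard)) break
  }
  newdata
}

out <- oos_impute(newdata, train, models)
out
```

In the "easy case" row, the prediction depends only on observed values, so the random draw of Step 1 has no influence on the result. In the "hard case" row, the first prediction still relies on a random draw, which is why repeating Step 2 can help.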
Possibly breaking changes
- Columns of special type like date/time can't be imputed anymore. You will need to convert them to numeric before imputation.
- pmm() is more picky: xtrain and xtest must both be either numeric, logical, or factor (with identical levels).
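For date columns, the conversion to numeric and back could look like this. It is a base R sketch: the data frame and its column names are made up, and a simple median fill stands in for the actual imputation step, which would typically be a call to missRanger().

```r
df <- data.frame(
  when = as.Date(c("2024-01-01", NA, "2024-01-03")),
  x = c(1.5, 2.5, NA)
)

# Convert the date to numeric (days since 1970-01-01) before imputation
df$when <- as.numeric(df$when)

# Stand-in for the imputation step (in practice, e.g., missRanger(df))
df$when[is.na(df$when)] <- median(df$when, na.rm = TRUE)

# Convert back to Date after imputation
df$when <- as.Date(df$when, origin = "1970-01-01")
df$when
```

Imputed values are not necessarily whole numbers, so rounding before converting back may be needed, depending on the column.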
Minor changes in output object
- Add original data as data_raw.
- Renamed visit_seq to to_impute.
Other changes
- Now requires ranger >= 0.16.0.
- More compact vignettes.
- Better examples and README.
- Many relevant ranger() arguments are now explicit arguments in missRanger() to improve tab-completion experience:
  - num.trees = 500
  - mtry = NULL
  - min.node.size = NULL
  - min.bucket = NULL
  - max.depth = NULL
  - replace = TRUE
  - sample.fraction = if (replace) 1 else 0.632
  - case.weights = NULL
  - num.threads = NULL
  - save.memory = FALSE
- For variables that can't be used, more information is printed.
- If keep_forests = TRUE, the argument data_only is set to FALSE by default.
- "missRanger" object now stores pmm.k.
- verbose argument is passed to ranger() as well.
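As an illustration of the explicit ranger() arguments listed above, a call could look like this. The sketch assumes missRanger is installed, and the chosen values are arbitrary, not recommendations.

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)  # add 10% missings per column

# Smaller and shallower forests than the defaults, to speed things up
imp <- missRanger(
  iris_NA,
  num.trees = 50,
  max.depth = 6,
  min.node.size = 5,
  num.threads = 1,
  seed = 1,
  verbose = 0
)
head(imp)
```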
CRAN release 2.5.0
Bug fixes
- Since Release 2.3.0, unintentionally, negative formula terms haven't been dropped, see #62. This is fixed now.
Enhancements
- The vignette on multiple imputations has been revised, and a larger number of donors in predictive mean matching is being used in the example.
CRAN release 2.4.0
Future Output API
- New argument data_only = TRUE to control if only the imputed data should be returned (default), or an object of class "missRanger". This object contains the imputed data and infos like OOB prediction errors, fixing #28. The value FALSE will later become the default in {missRanger 3.0.0}. This will be announced via deprecation cycle.
Enhancements
- New argument keep_forests = FALSE. Should the random forests of the best iteration (the one that generated the final imputed data) be added to the "missRanger" object? Note that this will use a lot of memory. Only relevant if data_only = FALSE. This solves #54.
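A short sketch of how the two arguments could be combined, assuming missRanger is installed. The element names data, pred_errors, and forests are taken from the package documentation as far as remembered and should be checked against the installed version.

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)

# Return the full "missRanger" object, including the fitted forests
imp <- missRanger(
  iris_NA,
  data_only = FALSE,
  keep_forests = TRUE,
  num.trees = 50,
  seed = 1,
  verbose = 0
)

class(imp)          # object of class "missRanger"
head(imp$data)      # the imputed data
imp$pred_errors     # OOB prediction errors (assumed element name)
names(imp$forests)  # one forest per imputed variable (assumed element name)
```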
Bug fixes
- In case the algorithm did not converge, the data of the last iteration was returned instead of the current one. This has been fixed.
CRAN release 2.3.0
Major improvements
- missRanger() now works with syntactically wrong variable names like "1bad:variable". This solves an old issue, recently popping up in this new issue.
- missRanger() now works with any number of features, as long as the formula is left at its default, i.e., . ~ .. This solves this issue.
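A quick way to try out the handling of non-syntactic names, assuming missRanger is installed. The renamed column is purely illustrative.

```r
library(missRanger)

df <- generateNA(iris, p = 0.1, seed = 1)
names(df)[1] <- "1bad:variable"  # syntactically invalid column name

# The formula is left at its default
imp <- missRanger(df, num.trees = 50, seed = 1, verbose = 0)
names(imp)
```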
Other changes
- Documentation improvement.
- ranger() is now called via the x/y interface, not the formula interface anymore.
CRAN release 2.2.1
- Switch from importFrom to :: code style
- Documentation improved
CRAN release 2.2.0
missRanger 2.2.0
Fewer dependencies
- Removed {mice} from "suggested" packages.
- Removed {dplyr} from "suggested" packages.
- Removed {survival} from "suggested" packages.
Maintenance
- Adding Github pages.
- Introduction of Github actions.
Release 2.1.5
A maintenance release, mainly improving the package structuring.
CRAN release 2.1.3
cran submission 2.1.3