
Releases: mayer79/missRanger

CRAN release 2.6.0

17 Aug 16:07
f3cd709

Major bug fix

Fixes a major bug that caused responses to be used as covariates in the random forests. Thanks to @flystar233 for reporting it, see #78.
You can expect different and better imputations.

Major feature

Out-of-sample application is now possible! Thanks to @jeandigitale for pushing the idea in #58.

This means you can run imp <- missRanger(..., keep_forests = TRUE) and then apply its models to new data via predict(imp, newdata). The "missRanger" object can be saved and loaded as a binary file, e.g., via saveRDS()/readRDS(), for later use.
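A minimal sketch of this workflow, assuming a random split of the built-in iris data (the split, the temporary file path, and the small num.trees value are illustrative choices, not recommendations from the release notes):

```r
library(missRanger)

# Illustrative split: 100 rows for fitting, 5 held-out rows as "new" data
set.seed(1)
idx <- sample(nrow(iris), 100)
train <- generateNA(iris[idx, ], p = 0.1, seed = 1)

# Keep the forests so the fitted models can be applied later
imp <- missRanger(train, num.trees = 50, keep_forests = TRUE, seed = 1, verbose = 0)

# Save the "missRanger" object and load it again, e.g., in another session
path <- tempfile(fileext = ".rds")
saveRDS(imp, path)
imp2 <- readRDS(path)

# New rows with a single missing value each
newdata <- iris[-idx, ][1:5, ]
newdata$Sepal.Length[1] <- NA
newdata$Species[2] <- NA

filled <- predict(imp2, newdata)
```

Keeping the forests makes the saved object considerably larger than the imputed data alone.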

Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missing values in variables used as covariates in the random forests). We call this the "easy case". In the "hard case", where a row has several such missing values, even multiple iterations (set by iter) can lead to unsatisfactory results.
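To check which rows of newdata fall into which case, one could count missing values per row. The helper classify_rows() below is hypothetical (it is not part of missRanger) and assumes, for simplicity, that all listed columns act as covariates:

```r
# Hypothetical helper, not part of missRanger: label each row by the number
# of missing values in the covariate columns
classify_rows <- function(newdata, covariates = names(newdata)) {
  n_missing <- rowSums(is.na(newdata[, covariates, drop = FALSE]))
  # 0 missing -> "complete", exactly 1 -> "easy", 2 or more -> "hard"
  cut(n_missing, breaks = c(-Inf, 0, 1, Inf), labels = c("complete", "easy", "hard"))
}

newdata <- data.frame(
  a = c(1, NA, NA),
  b = c(2, 3, NA),
  c = c("x", "y", NA)
)
cases <- classify_rows(newdata)
```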

The out-of-sample algorithm works as follows:

  1. Impute univariately all relevant columns by randomly drawing values
    from the original unimputed data. This step will only impact "hard case" rows.
  2. Replace univariate imputations by predictions of random forests. This is done
    sequentially over variables, where the variables are sorted to minimize the impact
    of univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
  3. Repeat Step 2 for "hard case" rows multiple times.
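As an illustration of Step 1 only, the following base R sketch fills each missing value with a random draw from the observed values of the same column in the original data. The function impute_univariate() is a hypothetical stand-in; the package's internal implementation may differ in detail.

```r
# Hypothetical sketch of Step 1, not the package's internal code:
# fill missing values by random draws from the original, unimputed data
impute_univariate <- function(newdata, data_raw, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  for (v in names(newdata)) {
    miss <- is.na(newdata[[v]])
    donors <- data_raw[[v]][!is.na(data_raw[[v]])]
    if (any(miss) && length(donors) > 0L) {
      picks <- sample.int(length(donors), sum(miss), replace = TRUE)
      newdata[[v]][miss] <- donors[picks]
    }
  }
  newdata
}

raw <- data.frame(a = c(1, 2, NA, 4), b = c("x", NA, "y", "y"))
new <- data.frame(a = c(NA, 10), b = c("z", NA))
out <- impute_univariate(new, raw, seed = 1)
```

In Step 2, these placeholder values would then be replaced by random forest predictions, one variable at a time.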

Possibly breaking changes

  • Columns of special type like date/time can't be imputed anymore. You will need to convert them to numeric before imputation.
  • pmm() is more picky: xtrain and xtest must both be either numeric, logical, or factor (with identical levels).
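For the first point, one possible workaround is to convert a Date column to numeric (days since 1970-01-01) before imputation and back afterwards. The data frame below is made up for illustration:

```r
library(missRanger)

# Made-up data with a Date column and a numeric column, both with gaps
df <- data.frame(
  date = as.Date("2024-01-01") + 0:29,
  x = seq(1, 30) + 0.5
)
df$date[c(3, 7)] <- NA
df$x[c(5, 12)] <- NA

# Dates become days since 1970-01-01
df$date <- as.numeric(df$date)

imp <- missRanger(df, num.trees = 50, seed = 1, verbose = 0)

# Round to whole days and convert back to Date
imp$date <- as.Date(round(imp$date), origin = "1970-01-01")
```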

Minor changes in output object

  • Add original data as data_raw.
  • Renamed visit_seq to to_impute.
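A short sketch of how these two elements might be inspected, assuming the "missRanger" object is accessed like a list (the iris-based data is illustrative):

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)

# data_only = FALSE returns the full "missRanger" object instead of a data frame
imp <- missRanger(iris_NA, num.trees = 50, data_only = FALSE, seed = 1, verbose = 0)

# Original data with missing values, as passed to missRanger()
head(imp$data_raw)

# Variables that were imputed (called visit_seq in earlier releases)
imp$to_impute
```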

Other changes

  • Now requires ranger >= 0.16.0.
  • More compact vignettes.
  • Better examples and README.
  • Many relevant ranger() arguments are now explicit arguments in missRanger() to improve the tab-completion experience:
    • num.trees = 500
    • mtry = NULL
    • min.node.size = NULL
    • min.bucket = NULL
    • max.depth = NULL
    • replace = TRUE
    • sample.fraction = if (replace) 1 else 0.632
    • case.weights = NULL
    • num.threads = NULL
    • save.memory = FALSE
  • For variables that can't be used, more information is printed.
  • If keep_forests = TRUE, the argument data_only is set to FALSE by default.
  • "missRanger" object now stores pmm.k.
  • verbose argument is passed to ranger() as well.
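For example, some of the arguments listed above can be set directly in the call. The values below are arbitrary and only meant to show the syntax:

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)

imp <- missRanger(
  iris_NA,
  pmm.k = 5,          # predictive mean matching with 5 donors
  num.trees = 100,    # fewer trees than the default of 500
  max.depth = 6,
  min.node.size = 5,
  num.threads = 1,
  data_only = FALSE,
  seed = 1,
  verbose = 0
)

# The object remembers the pmm.k used during fitting
imp$pmm.k
```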

CRAN release 2.5.0

13 Jul 07:30
20247e3

Bug fixes

  • Since Release 2.3.0, unintentionally, negative formula terms haven't been dropped, see #62. This is fixed now.
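A negative formula term excludes a variable from the given side of the formula. A small sketch, assuming iris with artificially generated missing values:

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)

# Impute all variables, but do not use Species as a covariate
imp <- missRanger(iris_NA, . ~ . - Species, num.trees = 50, seed = 1, verbose = 0)
```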

Enhancements

  • The vignette on multiple imputations has been revised, and a larger number of donors in predictive mean matching is being used in the example.

CRAN release 2.4.0

19 Nov 09:56
fa1775c

Future Output API

  • New argument data_only = TRUE to control whether only the imputed data should be returned (default), or an object of class "missRanger". This object contains the imputed data and information such as OOB prediction errors, fixing #28. The value FALSE will later become the default in {missRanger 3.0.0}. This will be announced via a deprecation cycle.

Enhancements

  • New argument keep_forests = FALSE. Should the random forests of the best iteration (the one that generated the final imputed data) be added to the "missRanger" object? Note that this will use a lot of memory. Only relevant if data_only = FALSE. This solves #54.
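A brief sketch of both arguments together. The element names data, pred_errors, and forests are assumed here from the package documentation and should be checked against the installed version:

```r
library(missRanger)

iris_NA <- generateNA(iris, p = 0.1, seed = 1)

imp <- missRanger(
  iris_NA,
  num.trees = 50,
  data_only = FALSE,    # return the "missRanger" object
  keep_forests = TRUE,  # also store the forests of the best iteration
  seed = 1,
  verbose = 0
)

head(imp$data)      # imputed data
imp$pred_errors     # OOB prediction errors
names(imp$forests)  # one forest per imputed variable
```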

Bug fixes

  • In case the algorithm did not converge, the data of the last iteration was returned instead of the current one. This has been fixed.

CRAN release 2.3.0

20 Oct 19:32
ac24bae

Major improvements

  • missRanger() now works with syntactically wrong variable names like "1bad:variable". This solves an old issue, recently popping up in this new issue.
  • missRanger() now works with any number of features, as long as the formula is left at its default, i.e., . ~ .. This solves this issue.
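To illustrate the first point, a column can be given a syntactically invalid name before imputation. The renaming below is purely for demonstration:

```r
library(missRanger)

df <- generateNA(iris, p = 0.1, seed = 1)

# Not a valid R name: starts with a digit and contains a colon
names(df)[1] <- "1bad:variable"

imp <- missRanger(df, num.trees = 50, seed = 1, verbose = 0)
names(imp)
```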

Other changes

  • Documentation improvement.
  • ranger() is now called via the x/y interface, not the formula interface anymore.

CRAN release 2.2.1

28 Apr 16:48
300e9fd
  • Switch from importFrom to :: code style
  • Documentation improved

CRAN release 2.2.0

25 Mar 08:46
382bb46

missRanger 2.2.0

Fewer dependencies

  • Removed {mice} from "suggested" packages.
  • Removed {dplyr} from "suggested" packages.
  • Removed {survival} from "suggested" packages.

Maintenance

  • Added GitHub Pages.
  • Introduced GitHub Actions.

Release 2.1.5

29 Jan 11:44
e65a394

A maintenance release, mainly improving the package structuring.

CRAN release 2.1.3

10 Apr 14:05
CRAN submission 2.1.3