README

PanCake is a Python package that allows users to stack scikit-learn models over a number of folds and train stacker models using out-of-sample predictions of input models.

The stacking tool builds a stacking module composed of an in-layer (the models being stacked) and an out-layer (the stacker models). Training the module produces a list or matrix of predictions, which can either be used as the final result or fed into another module.
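The idea of training a stacker on out-of-sample predictions can be sketched with plain scikit-learn. This is not PanCake's implementation, only a minimal illustration of the technique; the synthetic data, the choice of models, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=200)

splitter = KFold(n_splits=5, shuffle=True, random_state=0)

# In-layer: every sample is predicted by a model fitted on the other folds,
# so each prediction is out-of-sample.
in_layer = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4, random_state=0)]
oof_columns = [cross_val_predict(model, X, y, cv=splitter) for model in in_layer]
oof_matrix = np.column_stack(oof_columns)  # shape: (n_samples, n_in_layer_models)

# Out-layer: the stacker model is trained on the out-of-sample predictions.
stacker_model = LinearRegression().fit(oof_matrix, y)
stacked_preds = stacker_model.predict(oof_matrix)
```

Using out-of-sample predictions as the stacker's inputs keeps the out-layer from simply rewarding whichever in-layer model overfits the training data most.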

Installation

After cloning the repository, install the package from its root directory with

pip install .

Usage

Initiating stacker

stacker = Stacker(X, y, splitter, evalMetric, family)

where X is the data matrix (numpy array), y is the target vector (numpy array), splitter is a scikit-learn cross-validation generator (KFold or StratifiedKFold), evalMetric is the metric to be maximized during training, and family is the type of problem (currently "regression" or "binary").
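A possible way to prepare the splitter and evalMetric arguments is sketched below. KFold and StratifiedKFold are the scikit-learn classes named above; the helper neg_mse and its (y_true, y_pred) signature are assumptions, since the exact callable PanCake expects for evalMetric is not specified here. Because the metric is maximized, an error measure would presumably need to be negated.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(20, dtype=float).reshape(10, 2)  # 10 samples, 2 features
y_binary = np.array([0, 1] * 5)                # balanced binary target

# Plain k-fold splitting.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
# Stratified splitting keeps the class ratio in every fold.
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each sample appears in exactly one validation fold, which is what makes
# out-of-sample predictions available for the whole training set.
val_indices = np.concatenate([val for _, val in kfold.split(X)])
fold_class_counts = [
    np.bincount(y_binary[val], minlength=2).tolist()
    for _, val in skfold.split(X, y_binary)
]

def neg_mse(y_true, y_pred):
    """Hypothetical metric: negated MSE, so that larger values are better."""
    return -mean_squared_error(y_true, y_pred)
```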

Adding models (in-layer):

Add a scikit-learn model modelObj to the in-layer with

stacker.addModelIn(modelObj, trainable, hyperParameters)

If trainable is set to True, the model's hyper-parameters are tuned across folds using hyperParameters, a dictionary that defines the hyper-parameter grid for the model (see scikit-learn's documentation for the parameters the model accepts). If it is set to False, the model is treated as fixed and is only fitted across folds.
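To make the shape of such a dictionary concrete, the sketch below expands a grid into its individual candidate settings. It assumes an exhaustive search over all combinations, in the style of scikit-learn's grid search; how PanCake actually traverses the grid is not stated here. The parameter names and the helper expand_grid are hypothetical.

```python
from itertools import product

# Hypothetical grid for a Ridge-like model; keys are constructor argument names
# and values are the candidate settings to try.
hyper_parameters = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}

def expand_grid(grid):
    """Return every combination of values in a {name: [values]} grid."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

candidates = expand_grid(hyper_parameters)  # 3 * 2 = 6 candidate settings
```

The number of candidates grows multiplicatively with each added parameter, so small grids keep training time across folds manageable.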

Adding stacker models (out-layer):

Add a scikit-learn model modelObj to the out-layer with

stacker.addModelOut(modelObj, hyperParameters)

Again, hyperParameters is a dictionary containing the grid of hyper-parameters for the model.

Training and Predictions:

To train the model and get predictions on the training data, use

predsTrain = stacker.stackTrain(matrixOut)

which returns the final predictions of each out-layer model as a list when matrixOut is set to False. When it is set to True, the predictions of each out-layer model are stored as the columns of an array.
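The two output layouts can be pictured with a small numpy sketch. The prediction values and variable names are made up for illustration, and the exact array type returned by stackTrain is an assumption.

```python
import numpy as np

# Hypothetical predictions from two out-layer models on four training samples.
preds_model_a = np.array([0.10, 0.40, 0.35, 0.80])
preds_model_b = np.array([0.20, 0.30, 0.40, 0.70])

# matrixOut=False: one prediction vector per out-layer model, collected in a list.
preds_as_list = [preds_model_a, preds_model_b]

# matrixOut=True: the same vectors placed side by side as columns.
preds_as_matrix = np.column_stack(preds_as_list)  # shape (n_samples, n_out_models)
```

The matrix form is convenient when the predictions are fed into another module, since it already has the (samples, features) layout that scikit-learn models expect.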

For predictions on the test set, use:

predsTest = stacker.stackTest(X_ts, matrixOut)

where X_ts is the test data and matrixOut is the same as above.

Summary, Saving and Loading:

To get a summary of the CV scores and of the fit and training times for each in-layer and out-layer model, use

stacker.summary()

To save the trained stacker for later use, call

saveModel(stacker, savePath)

To load a trained model from disk, call

stacker = loadModel(savePath)
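How saveModel and loadModel store the stacker internally is not described here. As a general illustration of persisting a fitted scikit-learn object, the sketch below round-trips a model through pickle; this is an assumption about one common approach, not a description of PanCake's own mechanism, and the file name is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# A small fitted model standing in for a trained object.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name

    # Save the fitted object to disk.
    with open(save_path, "wb") as fh:
        pickle.dump(model, fh)

    # Load it back; the restored object predicts without refitting.
    with open(save_path, "rb") as fh:
        restored = pickle.load(fh)

same_predictions = np.allclose(model.predict(X), restored.predict(X))
```

Pickled models should only be loaded from trusted sources and with the same library versions that were used to save them.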

Examples

Jupyter notebooks analyzing the Boston Housing data are included in the repo.

TODO

Multi-class classification problems
Parallelization at the model and/or hyper-parameter level
