
Where should I add the sklearn-compatible ILOGMR? #50

Open · jgrizou opened this issue Oct 27, 2015 · 5 comments

@jgrizou (Member) commented Oct 27, 2015

Hi guys,

I started using ILOGMR to compare it with various regression algorithms in scikit-learn. For this I had to create a class wrapping the ilo_gmm from explauto into an sklearn-compatible ILOGMR estimator.

Should I add this file to the explauto repo? If yes, where should it land?

Currently the class is as follows:

import numpy

from sklearn.base import BaseEstimator

from explauto.sensorimotor_model import ilo_gmm


class ILOGMR(BaseEstimator):

    def __init__(self, conf, n_components=3, n_neighbors=100, random_state=None):
        self.conf = conf
        self.explauto_ilo_gmm = ilo_gmm.IloGmm(conf, n_components)
        self.explauto_ilo_gmm.n_neighbors = n_neighbors
        self.random_state = random_state

    def fit(self, X, y):
        # Feed the samples one by one: the underlying model is incremental.
        self.explauto_ilo_gmm.dataset.reset()
        for n in range(X.shape[0]):
            self.explauto_ilo_gmm.update(X[n, :], y[n, :])
        return self  # sklearn estimators are expected to return self from fit

    def compute_conditional_gmm(self, x):
        return self.explauto_ilo_gmm.compute_conditional_gmm(
            self.conf.m_dims, self.conf.s_dims, x)

    def predict(self, X):
        y_pred = numpy.zeros((X.shape[0], len(self.conf.s_dims)))
        for n in range(X.shape[0]):
            gmm = self.compute_conditional_gmm(X[n, :])
            # Best guess = weighted sum of the component means (axis=1 keeps
            # one sum per output dimension when s_dims is multi-dimensional).
            y_pred[n, :] = numpy.sum(gmm.means_.T * gmm.weights_, axis=1)
        return y_pred

This uses the new interface of ilo_gmm.py that is currently under pull request #49.

It still lacks the set_params and get_params methods, so we cannot yet use it with the really convenient GridSearchCV, but that will come at some point.
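For GridSearchCV compatibility, a minimal sketch of what the revision could look like (not code from the repo): BaseEstimator derives get_params/set_params automatically from the __init__ signature, provided the constructor only stores its arguments; building the underlying model is then deferred to fit, with the fitted attribute following sklearn's trailing-underscore convention.

import numpy

from sklearn.base import BaseEstimator, RegressorMixin

from explauto.sensorimotor_model import ilo_gmm


class ILOGMR(BaseEstimator, RegressorMixin):

    def __init__(self, conf, n_components=3, n_neighbors=100, random_state=None):
        # Only store the arguments, under their own names:
        # get_params()/set_params() then work out of the box.
        self.conf = conf
        self.n_components = n_components
        self.n_neighbors = n_neighbors
        self.random_state = random_state

    def fit(self, X, y):
        # Build the explauto model here, so that set_params() followed
        # by fit() picks up the new hyperparameters.
        self.explauto_ilo_gmm_ = ilo_gmm.IloGmm(self.conf, self.n_components)
        self.explauto_ilo_gmm_.n_neighbors = self.n_neighbors
        for n in range(X.shape[0]):
            self.explauto_ilo_gmm_.update(X[n, :], y[n, :])
        return self

    # compute_conditional_gmm() and predict() stay as above, reading
    # self.explauto_ilo_gmm_ instead of self.explauto_ilo_gmm.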

@jgrizou (Member, Author) commented Oct 27, 2015

To give you an idea, here is a comparison with Support Vector Regression (SVR) on a simple problem.

[Figure ilogmr_sinus_predict: ILOGMR vs. SVR predictions on a noisy sine]

Note that prediction with ILOGMR is always a bit tricky because it outputs a GMM. For the best guess, I chose to use the definition from the original paper: the weighted sum of the component means, numpy.sum(gmm.means_.T * gmm.weights_, axis=1).
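As a toy illustration of that definition (the numbers are made up):

import numpy

# Hypothetical 2-component conditional GMM over a 1-D y:
weights = numpy.array([0.25, 0.75])      # gmm.weights_
means = numpy.array([[0.0], [2.0]])      # gmm.means_, shape (n_components, s_dim)

# Best guess = mixture mean, i.e. the weighted sum of the component means.
best_guess = numpy.sum(means.T * weights, axis=1)
print(best_guess)  # [1.5], i.e. 0.25 * 0.0 + 0.75 * 2.0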

Note that the variance of the data points increases with x. Whereas SVR cannot model this variance, ILOGMR captures it nicely. We can see it on the probability map below, where the probability distribution becomes smoother as x increases.

[Figure ilogmr_sinus_probability: per-column map of the conditional probability p(y | x)]

The figure above should be read per column: for each x, we infer the conditional distribution from the local GMM, then plot the probability associated with each y.
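A minimal sketch of how such a map can be produced (model is a fitted ILOGMR instance from the wrapper above; the grid bounds, and the assumption that the conditional GMM exposes sklearn's score_samples log-densities, are mine):

import numpy
import matplotlib.pyplot as plt

x_grid = numpy.linspace(-1.0, 1.0, 200)
y_grid = numpy.linspace(-2.0, 2.0, 200)

# One column of the map per query x: p(y | x) from the local conditional GMM.
prob_map = numpy.zeros((len(y_grid), len(x_grid)))
for i, x in enumerate(x_grid):
    gmm = model.compute_conditional_gmm(numpy.array([x]))
    prob_map[:, i] = numpy.exp(gmm.score_samples(y_grid.reshape(-1, 1)))

plt.imshow(prob_map, origin='lower', aspect='auto',
           extent=[x_grid[0], x_grid[-1], y_grid[0], y_grid[-1]])
plt.xlabel('x')
plt.ylabel('y')
plt.show()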

@clement-moulin-frier (Contributor)

Nice job as well :)

I don't have a clear opinion on this, but here is one:

ilo-gmm (including your wrapper) could actually live in the models package (instead of sensorimotor_model, where it is now). models is supposed to gather the various algorithms that can be used by the other packages.

If we do this, all that would remain in sensorimotor_model is the wrapper of ilo-gmm as a SensorimotorModel subclass (without the actual implementation of the algorithm, which would live in models).

What do you think?

@jgrizou (Member, Author) commented Oct 30, 2015

ilo-gmm (including your wrapper) could actually live in the models package (instead of sensorimotor_model, where it is now). models is supposed to gather the various algorithms that can be used by the other packages.

If we do this, all that would remain in sensorimotor_model is the wrapper of ilo-gmm as a SensorimotorModel subclass (without the actual implementation of the algorithm, which would live in models).

Yes, I think that would be best. Having the part in models compatible with sklearn would be really nice for comparing against the various scikit-learn algorithms with all their tools.

About the naming, I tend to think it should be called ILOGMR rather than ILOGMM, but I am not sure. This is again to stay in line with the sklearn interface: we are most likely to use the algorithm for regression, and all regressors in sklearn have a predict function that returns a y given an X. In sklearn, a GMM is rather seen as (i.e. has the interface of) a classifier: GMM.predict(X) returns a class, i.e. the id of the Gaussian the sample X "belongs" to, and not a predicted y. Well, the naming is not that important.
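To make the contrast concrete, with the current class name GaussianMixture (it was sklearn.mixture.GMM at the time):

import numpy
from sklearn.mixture import GaussianMixture

# Two well-separated 1-D clusters.
X = numpy.concatenate([numpy.random.randn(50, 1) - 5,
                       numpy.random.randn(50, 1) + 5])
gmm = GaussianMixture(n_components=2).fit(X)

# predict() returns the index of the most likely component, not a regressed y.
print(gmm.predict(numpy.array([[-5.0], [5.0]])))  # e.g. [0 1]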

Also, do you think it would be useful to rely on the NearestNeighbors tools from sklearn? Other than self.dataset.nn_x and self.dataset.nn_y, do you use other functionalities of the Dataset class?
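For reference, a sketch of how the sklearn tool could stand in for dataset.nn_x (the data here is made up):

import numpy
from sklearn.neighbors import NearestNeighbors

X = numpy.random.rand(1000, 2)

# Fit a k-NN index (a ball tree or kd-tree under the hood).
nn = NearestNeighbors(n_neighbors=100).fit(X)

# Distances and indices of the 100 nearest stored points to a query.
dists, idx = nn.kneighbors(numpy.array([[0.5, 0.5]]))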

Let's see when we find time.

@clement-moulin-frier (Contributor)

Regarding GMM vs GMR, I think the approach taken in Explauto is more general than the one in sklearn. sklearn focuses on regression and classification (input -> output), whereas Explauto focuses on general inference, i.e. both X->Y (prediction) and Y->X (inverse model), and more generally any A->B inference, where A and B are disjoint subsets of X ∪ Y (see our ICDL2014 abstract). Since GMMs are particularly well suited for general inference, I would keep GMM instead of GMR (but why not code a restricted class for sklearn compatibility).
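For a single component, that A->B inference reduces to standard Gaussian conditioning; a minimal sketch (explauto's actual implementation may differ):

import numpy

def condition_gaussian(mean, cov, in_dims, out_dims, x_in):
    # Mean and covariance of p(x_out | x_in) for one joint Gaussian
    # N(mean, cov); a conditional GMM re-weights and conditions each
    # component this way.
    mu_a = mean[in_dims]
    mu_b = mean[out_dims]
    S_aa = cov[numpy.ix_(in_dims, in_dims)]
    S_ba = cov[numpy.ix_(out_dims, in_dims)]
    S_bb = cov[numpy.ix_(out_dims, out_dims)]
    gain = S_ba.dot(numpy.linalg.inv(S_aa))
    cond_mean = mu_b + gain.dot(x_in - mu_a)
    cond_cov = S_bb - gain.dot(S_ba.T)
    return cond_mean, cond_cov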

Regarding NN, we actually rely on code from the models library by @humm (Fabien). One drawback of it (but I think this is also the case with the sklearn implementation) is that it recomputes the whole model (a kd-tree) each time you insert a new point. An online version of approximate NN would be awesome (and probably exists somewhere).
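One common workaround, sketched below (not explauto code): buffer incoming points, answer queries from the tree plus a brute-force pass over the small buffer, and only rebuild the kd-tree periodically, so the rebuild cost is amortized over many insertions.

import numpy
from scipy.spatial import cKDTree

class BufferedKDTree(object):
    # Amortized incremental nearest neighbors: O(1) inserts, periodic rebuild.

    def __init__(self, dim, rebuild_every=256):
        self.points = numpy.empty((0, dim))
        self.buffer = []
        self.rebuild_every = rebuild_every
        self.tree = None

    def add(self, p):
        self.buffer.append(numpy.asarray(p))
        if len(self.buffer) >= self.rebuild_every:
            self.points = numpy.vstack([self.points] + self.buffer)
            self.buffer = []
            self.tree = cKDTree(self.points)  # rebuilt once per batch

    def query(self, q, k=1):
        # Candidates from the (stale) tree plus the not-yet-indexed buffer.
        candidates = []
        if self.tree is not None:
            d, i = self.tree.query(q, k=min(k, len(self.points)))
            for dist, idx in zip(numpy.atleast_1d(d), numpy.atleast_1d(i)):
                candidates.append((dist, self.points[idx]))
        for p in self.buffer:
            candidates.append((numpy.linalg.norm(p - q), p))
        candidates.sort(key=lambda c: c[0])
        return candidates[:k]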

Unfortunately it will be complicated for me to find time to work on all of this, but I'm happy to provide all the necessary information when needed.

@oudeyer (Member) commented Nov 6, 2015

Hi, I think that keeping GMM and general inference is indeed a good thing, and why not make a specific class for GMR.

For incremental kd-trees, one interesting thread: http://stackoverflow.com/questions/4274218/incremental-nearest-neighbor-algorithm-in-python
and a fast implementation in C with Python bindings: http://www.cs.ubc.ca/research/flann/

