Where should I add the sklearn-compatible ILOGMR? #50
Comments
To give you an idea, here is a comparison with Support Vector Regression (SVR) on a simple problem. Note that prediction with ILOGMR is always a bit tricky because it outputs a GMM. For the best guess, I chose to use the best-guess definition from the original paper. Note that the variance of the data points increases with x. Whereas SVR cannot model this variance, ILOGMR does it nicely. We can see it on the probability map below, whose probability distribution gets smoother as x increases. The figure above should be read per column: for each x, we infer the conditional probability from the local GMM model, then plot the probability associated with each y.
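Extracting a single prediction from the conditional GMM can be sketched as follows. This is a minimal numpy sketch with hypothetical helper names; the best-guess rule used here (the mean of the most probable component) is one common choice, not necessarily the exact definition from the original paper:

```python
import numpy as np

def gmm_best_guess(weights, means):
    """One simple best-guess rule for a 1-D conditional GMM p(y | x):
    return the mean of the most probable mixture component.
    (Hypothetical helper, not Explauto's actual API.)"""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    return means[np.argmax(weights)]

def gmm_pdf(y, weights, means, variances):
    """Evaluate the mixture density p(y), e.g. to draw the per-column
    probability map described above."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    norm = 1.0 / np.sqrt(2.0 * np.pi * variances)
    dens = weights * norm * np.exp(-0.5 * (y - means) ** 2 / variances)
    return float(np.sum(dens))
```

Evaluating `gmm_pdf` on a grid of y values, one column per x, reproduces the kind of probability map mentioned above.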
Nice job as well :) I don't have a clear opinion on this, but here is one: ilo-gmm (including your wrapper) could actually be in `…`. If we do this, what will be in `…`? What do you think?
Yes, I think that would be best: having that part in `…`. About the naming, I tend to think it should be called ILOGMR rather than ILOGMM, but I am not sure. This is again to be in line with the sklearn interface: we are more likely to use the algorithm for regression, and all regressors in sklearn have a predict function that returns a Y given an X. In sklearn, GMM is rather seen as (i.e. has the interface of) a classifier, in that GMM.predict(X) returns a class, i.e. the id of the Gaussian the sample X "belongs" to, and not a predicted Y. Well, the naming is not that important. Also, do you think it would be useful to rely on the NearestNeighbors tools from sklearn, other than `…`? Let's see when we find time.
Regarding GMM vs GMR, I think the approach taken in Explauto is more general than the one in sklearn. sklearn focuses on regression and classification (input -> output), whereas Explauto focuses on general inference, i.e. both X->Y (prediction) and Y->X (inverse model), and more generally any A->B inference, where A and B are disjoint subsets of X∪Y (see our ICDL2014 abstract). Since GMMs are particularly well suited for general inference, I would keep GMM instead of GMR (but why not code a restricted class for sklearn compatibility). Regarding NN, we actually rely on code from the models library by @humm (Fabien). One drawback of it (but I think it is also the case for the sklearn implementation) is that it recomputes the whole model (a kd-tree) each time you insert a new point. An online version of approximate NN would be awesome (and probably exists somewhere). Unfortunately it will be complicated for me to find time to work on all of this, but I'm happy to provide all the necessary information when needed.
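The "any A->B inference" idea rests on Gaussian conditioning, which works for an arbitrary split of the joint dimensions; per-component conditioning plus reweighting extends it to a full mixture. A minimal numpy sketch for a single Gaussian (the function name is hypothetical):

```python
import numpy as np

def gaussian_condition(mean, cov, idx_in, idx_out, x_in):
    """Condition a joint Gaussian on the dimensions idx_in taking value
    x_in; return the conditional mean and covariance over idx_out.
    Because idx_in/idx_out can be any disjoint index sets, the same code
    serves forward prediction (X->Y) and inverse inference (Y->X)."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mu_a, mu_b = mean[idx_in], mean[idx_out]
    S_aa = cov[np.ix_(idx_in, idx_in)]
    S_ba = cov[np.ix_(idx_out, idx_in)]
    S_bb = cov[np.ix_(idx_out, idx_out)]
    K = S_ba @ np.linalg.inv(S_aa)                     # regression gain
    cond_mean = mu_b + K @ (np.asarray(x_in, dtype=float) - mu_a)
    cond_cov = S_bb - K @ S_ba.T
    return cond_mean, cond_cov
```

Swapping `idx_in` and `idx_out` gives the inverse model with no extra code, which is the generality argument made above.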
Hi, I think indeed keeping GMM and general inference is a good thing, and why not make a specific class for GMR. For incremental kd-trees, one interesting thread: http://stackoverflow.com/questions/4274218/incremental-nearest-neighbor-algorithm-in-python On 6 Nov 2015 at 16:15, Clément Moulin-Frier [email protected] wrote:
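Short of a truly incremental kd-tree, a common workaround for the rebuild-on-every-insert cost is to buffer new points and rebuild the tree only periodically, scanning the small buffer linearly at query time. A hypothetical sketch (this class is not part of Explauto or sklearn) using scipy's cKDTree:

```python
import numpy as np
from scipy.spatial import cKDTree

class LazyRebuildNN:
    """Amortized nearest-neighbour index: new points go into a buffer
    and the kd-tree is rebuilt only every `rebuild_every` insertions."""

    def __init__(self, rebuild_every=50):
        self.rebuild_every = rebuild_every
        self.tree = None
        self.tree_points = np.empty((0, 0))
        self.buffer = []

    def add(self, point):
        self.buffer.append(np.asarray(point, dtype=float))
        if len(self.buffer) >= self.rebuild_every:
            self._rebuild()

    def _rebuild(self):
        # Merge buffered points into the indexed set, then rebuild once.
        new = np.array(self.buffer)
        if self.tree_points.size:
            self.tree_points = np.vstack([self.tree_points, new])
        else:
            self.tree_points = new
        self.tree = cKDTree(self.tree_points)
        self.buffer = []

    def query(self, point):
        # Check the tree (fast) and the buffer (linear scan, but small).
        point = np.asarray(point, dtype=float)
        best_dist, best = np.inf, None
        if self.tree is not None:
            d, i = self.tree.query(point)
            best_dist, best = d, self.tree_points[i]
        for p in self.buffer:
            d = np.linalg.norm(p - point)
            if d < best_dist:
                best_dist, best = d, p
        return best, best_dist
```

Queries stay exact at all times; only the rebuild cost is amortized over `rebuild_every` insertions.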
Hi guys,
I started using ILOGMR to compare it with various regression algorithms in scikit-learn. For this, I had to create a class wrapping the ilo_gmm from Explauto into an ILOGMR estimator in sklearn.
Should I add this file to the explauto repo? If yes, where should it land?
Currently the class is as follows:
This uses the new interface for ilo_gmm.py that is currently under pull request #49.
It still lacks the set_params and get_params methods needed to use it with the really convenient GridSearchCV, but they will come at some point.
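For reference, the estimator contract that get_params/set_params belong to can be sketched without sklearn itself. This is a hypothetical wrapper skeleton: the real version would inherit sklearn.base.BaseEstimator and RegressorMixin (which provide get_params/set_params automatically) and delegate to ilo_gmm instead of the trivial 1-nearest-neighbour stand-in used here:

```python
import numpy as np

class ILOGMRRegressor:
    """Hypothetical sketch of an sklearn-compatible regressor; names and
    hyper-parameters are illustrative, not Explauto's actual API."""

    def __init__(self, n_components=3, n_neighbors=20):
        # sklearn convention: __init__ only stores hyper-parameters.
        self.n_components = n_components
        self.n_neighbors = n_neighbors

    def get_params(self, deep=True):
        # This (with set_params below) is what GridSearchCV relies on.
        return {"n_components": self.n_components,
                "n_neighbors": self.n_neighbors}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        # Stand-in for fitting the local GMM model: memorise the data.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        # Stand-in prediction (1-nearest neighbour) where the real
        # wrapper would query ilo_gmm and return its best guess.
        X = np.asarray(X, dtype=float)
        dists = np.linalg.norm(self.X_[None, :, :] - X[:, None, :], axis=2)
        return self.y_[np.argmin(dists, axis=1)]
```

With get_params/set_params in place, `GridSearchCV(ILOGMRRegressor(), {"n_components": [2, 3, 5]})` can clone and re-parameterize the estimator during the search.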