scikit-learn compatible API #67

Open
koaning opened this issue Nov 11, 2022 · 4 comments

Comments

@koaning

koaning commented Nov 11, 2022

Is there a reason why the library doesn't offer a scikit-learn compatible API? A class that can work via the fit_transform() API?

@akshayka
Member

akshayka commented Dec 7, 2022

Hi! Thanks for raising this issue, and sorry for the delay in my response.

I'm happy to consider adding an API that's compatible with scikit-learn.

I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).

Off the top of my head:

We could have versions of preserve_neighbors and preserve_distances that implemented this API. That makes sense to me, because these functions take raw vector data and preprocess it (conceptually, fit). The transform method would actually compute the embedding.

Would that be helpful?
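
A minimal sketch of the fit/transform split described above, assuming a hypothetical wrapper class named `NeighborEmbedder` (not part of PyMDE). It relies only on `pymde.preserve_neighbors` and the `embed()` method of the MDE problem it returns. The choice to have `transform` ignore its argument and embed the data seen in `fit` is an assumption made for illustration.

```python
class NeighborEmbedder:
    """Hypothetical scikit-learn style wrapper; not shipped by PyMDE."""

    def __init__(self, embedding_dim=2):
        self.embedding_dim = embedding_dim

    def fit(self, X, y=None):
        # Imported lazily so the class can be defined without PyMDE installed.
        import pymde

        # "fit": preprocess the raw vectors into an MDE problem.
        self.mde_ = pymde.preserve_neighbors(X, embedding_dim=self.embedding_dim)
        return self

    def transform(self, X=None):
        # "transform": solve the problem built in fit and return the embedding.
        # X is ignored here (an assumption): the embedding is of the fitted data.
        return self.mde_.embed().cpu().numpy()

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)
```

With PyMDE installed, `NeighborEmbedder(embedding_dim=2).fit_transform(X)` would return one row per input item. A real implementation would probably also subclass scikit-learn's `BaseEstimator` and `TransformerMixin`.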

@koaning
Author

koaning commented Dec 7, 2022

> I'm assuming you're talking about scikit-learn's estimator and transform APIs (fit, transform, and fit_transform).

Yep! That's the one! I'm interested in such an API because it might help users in my bulk labelling interface.

In terms of implementation, maybe the neatest way is to add a class, maybe something like:

import pymde
from pymde import PyMDE

component = PyMDE(method="preserve_neighbors", constraint=pymde.Standardized())

If you want to go the extra mile, I'd even consider accepting the constraint parameter as a string and allowing keyword arguments to pass through. That way, if folks want to use GridSearchCV, they still get readable output: strings and numbers work a bit better in summary tables than Python objects. But I think just having a scikit-learn compatible class, even if it only uses standard parameters, will go a long way toward getting more people to try out your library.
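
To make the string-parameter idea concrete, here is a rough sketch of what the hypothetical `PyMDE` class from the snippet above might look like. The class name, the `CONSTRAINTS` lookup table, and the hand-written `get_params`/`set_params` are all assumptions for illustration; only `pymde.preserve_neighbors`, `pymde.preserve_distances`, `pymde.Centered`, and `pymde.Standardized` are existing PyMDE names.

```python
class PyMDE:
    """Hypothetical estimator sketch; PyMDE does not ship this class."""

    # Plain strings mapped to the names of PyMDE constraint classes.
    CONSTRAINTS = {"centered": "Centered", "standardized": "Standardized"}

    def __init__(self, method="preserve_neighbors", constraint="standardized", **kwargs):
        self.method = method
        self.constraint = constraint
        self.kwargs = kwargs  # passed through to the chosen PyMDE function

    def get_params(self, deep=True):
        # Flat dict of strings/numbers, which reads well in search summaries.
        return {"method": self.method, "constraint": self.constraint, **self.kwargs}

    def set_params(self, **params):
        for name, value in params.items():
            if name in ("method", "constraint"):
                setattr(self, name, value)
            else:
                self.kwargs[name] = value
        return self

    def fit_transform(self, X, y=None):
        # Imported lazily so the class can be defined without PyMDE installed.
        import pymde

        build = getattr(pymde, self.method)  # e.g. pymde.preserve_neighbors
        constraint = getattr(pymde, self.CONSTRAINTS[self.constraint])()
        mde = build(X, constraint=constraint, **self.kwargs)
        return mde.embed().cpu().numpy()
```

For example, `PyMDE(constraint="centered", n_neighbors=5).get_params()` reports only strings and numbers. A production version would more likely subclass scikit-learn's `BaseEstimator` than hand-roll parameter handling.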

@koaning
Author

koaning commented Dec 7, 2022

ps. I'm also a huge fan of cvxpy by the way!

@akshayka
Member

akshayka commented Dec 9, 2022

Okay, great! I'd love for PyMDE to be useful for bulk, which looks awesome, by the way.

Thanks for the code snippet --- something like that could definitely work. I'll put something together in the coming weeks.
