GridSearch for Walking and Sampling Strategies #107
Comments
Hi Chris, thank you for your suggestion. I agree that it would be a nice addition! However, it might be really difficult/expensive to run this, especially given that some of these strategies have their own hyper-parameters as well. The hyper-parameters of the embedding techniques (e.g. Word2Vec) might change per strategy as well (and even the hyper-parameters of the downstream ML model), and the number of possible combinations of walking + sampling strategies grows quickly.
I see, thanks for that. I have a few other questions/remarks/wishes:
Best regards
Yes, we tried to support that through the Embedder interface. You can implement your own embedding model (see, e.g., the FastText implementation we provide).
I agree that this indeed would be nice! However, how would one define most_similar? Cosine distance comes to mind as a metric, but perhaps other people would want other metrics? This should be made configurable. But nevertheless, it is indeed an interesting suggestion!
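A configurable similarity search along those lines could be sketched as follows. This is only an illustration, not part of the package: `most_similar`, `cosine_similarity`, and `negative_euclidean` are hypothetical helper names, and the embeddings are assumed to be available as a plain dict mapping each entity to its vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as dissimilar to everything.
    return dot / norm if norm else 0.0


def negative_euclidean(u, v):
    """Negated Euclidean distance, so that larger still means more similar."""
    return -math.dist(u, v)  # math.dist requires Python 3.8+


def most_similar(entity, embeddings, topn=3, metric=cosine_similarity):
    """Return the `topn` entities closest to `entity` under `metric`.

    `embeddings` is assumed to be a dict {entity: vector}; `metric` is any
    callable taking two vectors and returning a score (higher = closer).
    """
    query = embeddings[entity]
    scored = [
        (other, metric(query, vector))
        for other, vector in embeddings.items()
        if other != entity
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

Passing the metric as a callable keeps cosine similarity as the default while letting users plug in any other scoring function without touching the search itself.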
Do you mean plotting during the training procedure? That should be possible with gensim models indeed.
We don't have an optimal fix yet. However, you can specify the paths used to extract this numerical information from the KG. You could then append this numerical data to your embeddings before fitting a downstream ML model on them (or take these numbers into account for your similarity search as well).
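Combining the extracted numbers with the embeddings can be as simple as concatenating them per entity. The sketch below assumes the embeddings and the literals are already available as two parallel lists (one row per entity) and that a missing literal is represented by `None`; both the `append_literals` helper and that representation are assumptions made for illustration, not the package's API.

```python
def append_literals(embeddings, literals, fill_value=0.0):
    """Concatenate each entity's embedding with its numerical literals.

    `embeddings` and `literals` are parallel lists with one row per entity.
    Missing literals (assumed here to be `None`) are replaced by `fill_value`.
    """
    if len(embeddings) != len(literals):
        raise ValueError("embeddings and literals must have the same length")

    features = []
    for vector, values in zip(embeddings, literals):
        cleaned = [fill_value if v is None else float(v) for v in values]
        features.append(list(vector) + cleaned)
    return features
```

The resulting rows can be passed to any downstream model. Since literals and embedding dimensions may live on very different scales, scaling the literal columns first is usually worth considering.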
A mechanism for updating the model has been implemented. If you set
No, these are optional, but they might be useful to extract numerical information (per your comment above).
The feature requests are noted. I'll take a look if I ever find the bandwidth :). Any PRs are more than welcome as well, of course. Literals are needed to use numerical information in your ML model. Word2Vec internally uses a BoW representation for its tokens.
🚀 Feature
Hi guys, first of all, thanks for this amazing package, I love it.
While doing some stuff with it, I asked myself what the best walking / sampling strategy would be.
To evaluate that, two things would be missing:
A GridSearch kind of class to run all of the different options.
Additional context
Solution
A GridSearch class à la sklearn would be awesome. Is that possible?
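To make the request concrete, a minimal grid search over strategy combinations could look like the sketch below. It is deliberately independent of the package: `grid_search` is a hypothetical helper, and `evaluate` is a user-supplied callable that is assumed to build the walker and sampler for a given parameter combination, fit the embeddings, train the downstream model, and return a score where higher is better.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every combination in `param_grid` and return the best one.

    `param_grid` maps a parameter name to the list of values to try.
    `evaluate` takes one combination as a dict and returns a numeric score.
    Returns (best_params, best_score, all_results).
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))

    # Compare on the score only, so ties never fall back to comparing dicts.
    best_score, best_params = max(results, key=lambda result: result[0])
    return best_params, best_score, results


# Example grid; the string labels are placeholders for strategy objects.
example_grid = {
    "walker": ["random", "weisfeiler_lehman"],
    "sampler": ["uniform", "pagerank"],
    "max_depth": [2, 4],
}
```

Even this small grid yields 2 × 2 × 2 = 8 full training runs, and every extra hyper-parameter multiplies that number, which is where the cost concern raised in the comments comes from.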