-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Working Workflow Example #36
base: master
Are you sure you want to change the base?
Conversation
…ded customizable CUR utility intruction
…ative to uncentered train kernel
…ative to uncentered train kernel
Commiting to rebase Edits from Felix Edits from Felix
Edits from Felix
W --> PTY or PKY in NB4 W --> PTY or PKY in NB4
Replace utility classes with sklearn classes in NB2
Removed all references to the utility classes and referenced the appr…
Co-authored-by: rosecers <[email protected]>
Replace utility classes with sklearn classes in NB4
Replace utility classes with sklearn classes in NB3
Some notes: - because this is living in the kernel-tutorials repo, I expanded the sections on methods covered _in_ kernel tutorials, but left the methods not covered in kernel-tutorials somewhat cursory. In other words, cells for librascal and chemiscope are somewhat succinct, although this can be brought up for debate. - I've left the notebook unstripped so that you don't have to run it to see the output. We can strip upon merge. - There is the resulting chemiscope file for viewing under `notebooks/kpcovr-chemiscope.json`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mostly small comments, as well as a higher level one.
This notebook is very good as a "you have done all of kernel-tutorials, now here is a end-to-end example of how you could use it" notebook, which is a different use case from the "here is how you get from a structure to a chemiscope input" which I had in mind for lab-cosmo/chemiscope#114.
In particular, the use of (PCov-)FPS, (PCov-)CUR, and sparse methods is adding a lot of complexity for people that might just want to try out chemiscope quickly. The notebook also assumes knowledge of SOAP and descriptors, and do not introduce the reason behind dimensionality reduction.
I think the question here is how much do we want to couple this to the rest of kernels-tutorials/skcosmo. Should someone be able to read/use this notebook without reading the rest of kernel-tutorials?
"from sklearn.linear_model import RidgeCV\n", | ||
"\n", | ||
"\n", | ||
"def calc_rmse(y1, y2, y_scaler, rounding=4):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somehow self-explanatory, but a short comment/docstring for parameters would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Because we are demonstrating with an environment-centred property, we pull the properties from the frame.arrays object.**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"**Because we are demonstrating with an environment-centred property, we pull the properties from the frame.arrays object.**" | |
"**Because we are using environment-centred properties in this demonstration, we pull the properties from the frame.arrays object.**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"# Read the first N frames of CSD-500\n", | ||
"frames = np.asarray(io.read(input_file, index=\":{}\".format(N)), dtype=object)\n", | ||
"\n", | ||
"# Extract chemical shifts\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment do not match the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
} | ||
], | ||
"source": [ | ||
"# Compute SOAPs (from librascal tutorial)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a link to said tutorial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"# # Atomistic structure manipulation\n", | ||
"# from ase import io\n", | ||
"\n", | ||
"# # Import SOAP vectors from numpy file\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"# # Import SOAP vectors from numpy file\n", | |
"# # read/load SOAP vectors from numpy file\n", |
import
has a separate meaning in Python, so I think something like read or load would remove a possible source of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"normalization_ = np.dot(U / np.sqrt(S), V)\n", | ||
"components_ = X_active.copy()\n", | ||
"\n", | ||
"Phi_train = np.dot(\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the code is not very clear. Amongst other things, Phi is not defined, the 1e-12
in S = np.maximum(S, 1e-12)
is a bit arbitrary, and the use of SVD to build a normalization matrix is not explained. Is this talked about in the previous notebooks?
"kpcovr = KPCovR(kernel=\"linear\", center=True, **kpcovr_hypers)\n", | ||
"kpcovr.fit(\n", | ||
" Phi_train, Y_train, \n", | ||
" Yhat=Yp_train # This will throw an error if Y has not been estimated.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what the comment means. How can I know/make sure Y has been estimated, and what does "estimate Y" means in this context?
If this is all material covered in previous notebooks, then my question becomes how much do we want to couple this notebook to previous ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Exporting to a Chemiscope" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"# Exporting to a Chemiscope" | |
"# Exporting to chemiscope" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We'll write one output combining the training set and the testing set." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"We'll write one output combining the training set and the testing set." | |
"Chemiscope is an online interactive visualizer that allows you to look at the projection in relation with the structures. Here, we'll create a chemiscope input file combining the training set and the testing set." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
"\n", | ||
"chemiscope_properties = {\n", | ||
" **{\n", | ||
" f\"kpcovr_{i}\": {\"target\": target, \"values\": T[:, i]}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is done automatically in chemiscope.write_input if you pass
chemiscope_properties = {
"kpcovr": {
"target": "atom",
"values": T[:, :kpcovr.n_components]
},
"cur_feature": {
"target": "atom",
"values": cur_features[:, :n_cur]
},
**{
prop_name: {"target": target, "values": Y[:, i]}
for i, prop_name in enumerate(properties)
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in most recent commit!
@Luthaf As for your larger comment, I think that's the conversation you, me, @max-veit, and @ceriottm need to have. If we want a front-to-back ase-to-chemiscope pipeline, where should it live? Is it meant to be internal to the group or publicly available? I made this notebook 1) to start this conversation, 2) because I think regardless of the answer, it's a nice "bow" to K-T, and 3) because until we can figure out a more centralized example, it can serve as a "until-then" solution for group members wanting a front-to-back solution. What do you think? |
Fully agree on all points, and we can move forward with this notebook as a "bow" for kernels-tutorials regardless of the outcome of the other discussion! Personally, I think having a separate "this is the simplest way to create a chemiscope input" example would be very valuable; and could point to the notebook introduced in this PR as a good point of "here is how you can improve the projection a lot". |
f215822
to
a083353
Compare
9c4576d
to
7e0716d
Compare
Some notes:
notebooks/kpcovr-chemiscope.json