Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working Workflow Example #36

Open
wants to merge 204 commits into
base: master
Choose a base branch
from
Open

Working Workflow Example #36

wants to merge 204 commits into from

Conversation

rosecers
Copy link
Contributor

@rosecers rosecers commented Feb 8, 2021

Some notes:

  • because this is living in the kernel-tutorials repo, I expanded the sections on methods covered in kernel tutorials, but left the methods not covered in kernel-tutorials somewhat cursory. In other words, cells for librascal and chemiscope are somewhat succinct, although this can be brought up for debate.
  • I've left the notebook unstripped so that you don't have to run it to see the output. We can strip upon merge.
  • There is the resulting chemiscope file for viewing under notebooks/kpcovr-chemiscope.json

ceriottm and others added 30 commits February 9, 2020 14:33
Commiting to rebase

Edits from Felix

Edits from Felix
Edits from Felix
W --> PTY or PKY in NB4

W --> PTY or PKY in NB4
PKGuo and others added 20 commits November 18, 2020 00:34
Replace utility classes with sklearn classes in NB2
Removed all references to the utility classes and referenced the appr…
Replace utility classes with sklearn classes in NB4
Replace utility classes with sklearn classes in NB3
Some notes:
- because this is living in the kernel-tutorials repo, I expanded the sections on methods covered _in_ kernel tutorials, but left the methods not covered in kernel-tutorials somewhat cursory. In other words, cells for librascal and chemiscope are somewhat succinct, although this can be brought up for debate.
- I've left the notebook unstripped so that you don't have to run it to see the output. We can strip upon merge.
- There is the resulting chemiscope file for viewing under `notebooks/kpcovr-chemiscope.json`
@rosecers rosecers requested a review from Luthaf February 8, 2021 12:38
Copy link
Contributor

@Luthaf Luthaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mostly small comments, as well as a higher level one.

This notebook is very good as a "you have done all of kernel-tutorials, now here is a end-to-end example of how you could use it" notebook, which is a different use case from the "here is how you get from a structure to a chemiscope input" which I had in mind for lab-cosmo/chemiscope#114.

In particular, the use of (PCov-)FPS, (PCov-)CUR, and sparse methods is adding a lot of complexity for people that might just want to try out chemiscope quickly. The notebook also assumes knowledge of SOAP and descriptors, and do not introduce the reason behind dimensionality reduction.

I think the question here is how much do we want to couple this to the rest of kernels-tutorials/skcosmo. Should someone be able to read/use this notebook without reading the rest of kernel-tutorials?

"from sklearn.linear_model import RidgeCV\n",
"\n",
"\n",
"def calc_rmse(y1, y2, y_scaler, rounding=4):\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somehow self-explanatory, but a short comment/docstring for parameters would be nice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"cell_type": "markdown",
"metadata": {},
"source": [
"**Because we are demonstrating with an environment-centred property, we pull the properties from the frame.arrays object.**"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"**Because we are demonstrating with an environment-centred property, we pull the properties from the frame.arrays object.**"
"**Because we are using environment-centred properties in this demonstration, we pull the properties from the frame.arrays object.**"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"# Read the first N frames of CSD-500\n",
"frames = np.asarray(io.read(input_file, index=\":{}\".format(N)), dtype=object)\n",
"\n",
"# Extract chemical shifts\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment do not match the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

}
],
"source": [
"# Compute SOAPs (from librascal tutorial)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a link to said tutorial?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"# # Atomistic structure manipulation\n",
"# from ase import io\n",
"\n",
"# # Import SOAP vectors from numpy file\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"# # Import SOAP vectors from numpy file\n",
"# # read/load SOAP vectors from numpy file\n",

import has a separate meaning in Python, so I think something like read or load would remove a possible source of confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"normalization_ = np.dot(U / np.sqrt(S), V)\n",
"components_ = X_active.copy()\n",
"\n",
"Phi_train = np.dot(\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code is not very clear. Amongst other things, Phi is not defined, the 1e-12 in S = np.maximum(S, 1e-12) is a bit arbitrary, and the use of SVD to build a normalization matrix is not explained. Is this talked about in the previous notebooks?

"kpcovr = KPCovR(kernel=\"linear\", center=True, **kpcovr_hypers)\n",
"kpcovr.fit(\n",
" Phi_train, Y_train, \n",
" Yhat=Yp_train # This will throw an error if Y has not been estimated.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what the comment means. How can I know/make sure Y has been estimated, and what does "estimate Y" means in this context?

If this is all material covered in previous notebooks, then my question becomes how much do we want to couple this notebook to previous ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"cell_type": "markdown",
"metadata": {},
"source": [
"# Exporting to a Chemiscope"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"# Exporting to a Chemiscope"
"# Exporting to chemiscope"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"cell_type": "markdown",
"metadata": {},
"source": [
"We'll write one output combining the training set and the testing set."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"We'll write one output combining the training set and the testing set."
"Chemiscope is an online interactive visualizer that allows you to look at the projection in relation with the structures. Here, we'll create a chemiscope input file combining the training set and the testing set."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

"\n",
"chemiscope_properties = {\n",
" **{\n",
" f\"kpcovr_{i}\": {\"target\": target, \"values\": T[:, i]}\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done automatically in chemiscope.write_input if you pass

chemiscope_properties = {
    "kpcovr": {
        "target": "atom", 
        "values": T[:, :kpcovr.n_components]
    },
    "cur_feature": {
        "target": "atom", 
        "values": cur_features[:, :n_cur]
   },
    **{
        prop_name: {"target": target, "values": Y[:, i]}
        for i, prop_name in enumerate(properties)
    },
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in most recent commit!

@rosecers
Copy link
Contributor Author

rosecers commented Feb 8, 2021

@Luthaf As for your larger comment, I think that's the conversation you, me, @max-veit, and @ceriottm need to have. If we want a front-to-back ase-to-chemiscope pipeline, where should it live? Is it meant to be internal to the group or publicly available?
As I said, I went heavy on the scikit-cosmo-like functionalities because we're in kernel-tutorials. A possible solution is to have three notebooks all doing the same thing, but with different emphases (librascal, scikit-cosmo/KT, chemiscope), living in the three packages. This is a bit dangerous though, due to the constantly updating nature of the packages.

I made this notebook 1) to start this conversation, 2) because I think regardless of the answer, it's a nice "bow" to K-T, and 3) because until we can figure out a more centralized example, it can serve as a "until-then" solution for group members wanting a front-to-back solution. What do you think?

@Luthaf
Copy link
Contributor

Luthaf commented Feb 8, 2021

Fully agree on all points, and we can move forward with this notebook as a "bow" for kernels-tutorials regardless of the outcome of the other discussion!

Personally, I think having a separate "this is the simplest way to create a chemiscope input" example would be very valuable; and could point to the notebook introduced in this PR as a good point of "here is how you can improve the projection a lot".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants