-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic identification of rings in sisl.Geometry() #688
Comments
Thanks for this, indeed we should think about this, and your proposals incorporate ideas from #687 #202 and other friends. One thing to think about in such a case is whether an atom in a ring, can be part of another ring. But perhaps that can be refined in a post-processing of the categories. My idea here is to use categories, imagine it could also be used to discover defects 7-5-7 etc.Hmmm. lets keep this in mind with #687 |
I have often come up with cases when I don't want a category to tell me if an atom is from that category or not, but instead I want to classify atoms based on some property. E.g. on the number of neighbours they have or, in this case, on the ring they belong to. |
Could you elaborate? Couldn't categories be used for this as well, e.g. the neighbour category can tell which atoms has a specific neighbor environment. |
Yes, but what if I have three different environments, and I want an array telling me which environment each atom belongs to. With the current situation I can do it, but I always have to write some extra code to apply each category separately and then merge the categorizations. |
That is true. cat1 = Category(...)
cat2 = Category(...)
(cat1 or cat2).categorize(geometry) # returns only cat1 or cat2
cat1.categorize(geometry) and cat2.categorize(geometry) # returns and or category the ambiguity of the categories is also that they return them-selves. Which is a bit of a head-ache, you never know if you are using a result category, or a just created category. A possible solution would be if Then I can foresee that |
@pfebrer what do you need from the categories, are they too unflexible, what should we do to make them simpler and, more importantly, intuiitive to use. |
First time seeing these categories/categorize in action, so I am bit lost. Just let me mention that for the hexagons example I made earlier, one should be able to have atoms in different hexagons, whatever the way they are classified, because in a GNR or NPG, hexagons share atoms with their neighboring hexagons, so... besides, is it somewhere explained what these lines do (or are supposed to do)? cat1 = Category(...)
cat1.categorize(geometry) Just to understand what all the Category philosophy is about, then probably I may provide better ideas |
currently, we haven't really publicised them too much... The idea is that one can categorize an atom based on some criteria. The actual code can be found here. For instance one can get all carbon atoms via: cat_C = sisl.geom.AtomZ(6) then one can categorize atoms based on that criteria: result = cat_C.categorize(geometry)
for ia, cat in enumerate(result):
print(f"atom {ia} is a Carbon atom") The true gain is when one starts combining categories: from sisl.geom import *
cat_C = AtomZ(6) # match all Z == 6
neighbor = AtomNeighbor(3, R=(1, 1.44)) # match all atoms with 3 atoms within the [1; 1.44] distance
C_neighbor = cat_C & neighbor
C_neighbor.categorize(geometry) in this way one can combine pretty complicated requirements to match a certain atom (there are many other options as well). |
Just to elaborate a bit more on the ideas we briefly discussed yesterday in Discord. If fragment categories were possible, one could easily do the atom classification that @pfebrer was talking about in #689. For example, and staying within the topic of this issue, we could have a from sisl.geom import *
cat_arylRing = Ring('C', 6) # match all 6-membered rings made of carbon atoms
aryl_rings = cat_arylRing.categorize(geometry) # list of all aryl rings in the geometry
# now we may loop through all aryl rings and get associated info from
# each category result (i.e. each aryl_ring instance)
for aryl in aryl_rings :
print(aryl.atom_ids) # e.g.: 0, 4, 5, 8, 1, 6
print(aryl.id) # e.g.: 4
print(aryl.dihedral) # e.g.: 55.7 degs Here of course I've gone a bit too far with the possible attributes that the category results from I perfectly understand if you think this is a bit too much/far for now. But I felt this could be a possible way to allow categories to go a bit crazier and do significantly more complex things than simply identifying atoms with a particular characteristic. |
A note on the implementation here, my guess is that this would be much faster and simpler by using Voronoi diagrams of coordinates to extract probable positions of rings. import sisl as si
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
graphene = si.geom.graphene() * (5, 5, 1)
vor = Voronoi(graphene.xyz[:, :2])
fig = voronoi_plot_2d(vor)
plt.show() this might be much simpler to parse, since with the Voronoi points one can use a single Note, that this can easily be generalized for |
hNPG_str.fdf.txt I am not sure if these issues could be handled by having specific inputs to the Voronoi function (e.g. through |
thanks for checking this out @ialcon ! My approach would be something like this:
The above I think could be pretty generic. |
I find some issues though... First, as you may see with the NPG example, one also gets points of "rings" that are not really aryl rings - the nanopore in NPG is the most clear example of that. Therefore, there should be a way to discard the Voronoi points that do not belong to the targeted So, I think that providing a With regards to the bi-layer, some questions here: bilayerGr = geom.bilayer(..., bottom_atoms=C, top_atoms=C)
top_layer = bilayerGr.top
bottom_layer = bilayerGr.bottom
# GET RINGS
vor_top = Voronoi(top_layer.xyz[:, :2])
vor_bottom = Voronoi(bottom_layer.xyz[:, :2])
... b) Realize that if we provide the 3D-coordinates, we will get Voronoi points between the two graphene layers. So these out-of-plane Voronoi points should also be discarded somehow. This could be done via a fine-tuned So, to me, I find that some inputs should be provided by the user because, otherwise, it would be hard for |
Another approach to find regular polygons, which I think would be more straightforward but maybe slower would be to:
|
Well, there are multiple use-cases. :) I think the What I think is important, is if this method finds all aryl rings and other rings as well. In this case I think it does exactly what it is supposed to. :) That is why I suggest to first use the method to create a base-class that encapsulates the ring property (this includes pores etc.) Then later, more specialized, ring finders can use the atoms on the rings to figure out if they are rings or not, and discard them if not. For instance, a
The voronoi algorithm still works for 3d coordinates, simply pass And it would actually separate the layers by using my method described above. So this might be an even simpler method to separate layers. E.g. a tilted bi-layer, a double walled nanotube, etc.
Well, I am pretty convinced that Voronoi can fill the gap to a large extend, then finetuning the results can narrow the search.
I think the Voronoi seems extremely generic and it will also be very efficient in terms of searching things since we can narrow the search space by the cells. I think creating the graph here isn't really necessary. One would always have to post-process the coordinates found by any algorithm to ensure the group of ring atoms forms an aryl ring. |
How would the Voronoi points allow you to separate the layers? By analyzing the distance of atoms to the Voronoi points? Or distance between Voronoi points? What I do not have fully clear as of now is your points 3 and 4. Regarding 3, how do you assign atoms that are sitting between two Voronoi points? This might be particularly tricky when you have rings of different radius (e.g. a meta-aryl-ring and the pore close to it). Also, note that pores do not have a regular circular "radius". How would you point 3 select only those atoms that define the nanopore? Intuitively I would say that @pfebrer's approach should be more stable (though maybe less efficient computationally) because there you are already working with the graph obtained from the atomic connections (the "neighborhood"). Your Voronoi approach, on the contrary, starts without a clue about the connections, and only looks at the atomic distribution of atoms. This makes it more general, but I think it may complicate the post-processing and more easily generate errors. It might as well be that I need to play more with the Voronoi function to get around it... |
Hmm, I need to test this more rigorously.
My idea is not to only use Voronoi. Voronoi has a good classification of regions. But we still require to post-analyse the regions as per the regular approach. So Voronoi is the initial segregator of regions of space, subsequent things needs to be done on the same rigourous way. The nice thing about Voronoi is that is separates parts, regardless of bond-lengths etc. So it is un-biased for elements etc. This is my main motivation for it, it is general in this sense. So, point 3 would do it as @pfebrer suggests. And the
@pfebrer neighbor requires a correct radius input, and say an MD simulation where the radius' are slightly out of bounds would result in problems. At least here one could define them based on some other measures rather than aryl ring radius. Well, it of course needs to be explored how it should be done... |
I also was thinking that if one can convert the structure to an image (where for example bonds are drawn in white over a black background, then one can apply typical algorithms to find contours (e.g. skimage's In any case, handling periodicity is not trivial, no? Would you expand to the auxiliary cell to run the algorithm? |
This is actually a very good point which, by the way, would be solved if we used graphs instead of Voronoi points (because periodicity is already accounted for in the neighbours list). Though, as @pfebrer suggests, maybe one could expand to the auxiliary cell to include inter-cell Voronoi points (i.e. inter-cell
I guess the question then is what is more robust (to, for e.g., a molecular dynamics): bond-lengths or ring-radius? |
I think handling periodicity in the graph case is also not trivial. But maybe it is simpler, I don't know :) |
By the way, your original goal (separating two ribbons in NPG) I think could be very easily solved by building the graph and then applying spectral clustering to separate the two parts (which you could do with sklearn: https://scikit-learn.org/stable/modules/clustering.html). You don't need to detect rings 😅 |
Well, the GNR separation in NPGs would be one usage for That being said, the approach of @zerothi is even more general than that (detection of any type of |
But inter-cell bonds will be present in the |
There should probably be some differentiation between nodes of different cells, so that for example the searching algorithms don't see it as the same node. I think in the end one would need to create the graph for the auxiliary cell as well. |
The idea of this feature would be to have an automated search of rings of a particular number of atoms (that is, squares, pentagons, hexagons, etc) within a sisl.Geometry() object. Such a feature would be extremely useful to further analyze/categorize the inner fragments of a given geometry. For example, say you have a nanoporous graphene structure, such as in this open-access paper (Fig. 1 in https://onlinelibrary.wiley.com/doi/full/10.1002/adfm.202104031), which is formed as a 2D array of (1D) graphene nanoribbons (GNRs). By getting the hexagons within the NPG, one could probably find a relatively simple (and general) manner to classify the atoms within the NPG as belonging to one GNR or the other, which would be something much harder to do (in a general way) without having that prior classification of atoms as within hexagons.
The way that I implemented this thing (outside sisl, of course) was focused exclusively on hexagonal rings in organic systems (i.e. benzene or aryl rings). I find that using the connections (or bonds) information is the most general way to do this. I attach a snipped of that implementation I made, where this is done within a polymer class that I made myself, so the "self" is a polymer instance.
rings_snipped.py.txt
As you will see, this is just 7 for loops over all atomic indices, and for each index we loop over the bonds of that atom (via "for id_3 in self.bonds[id_2]", for example) and we keep this going on until we find a closed 6-atoms ring. Of course, this is not a very efficient implementation, but for small unit cells of a few hundreds of atoms this takes absolutely nothing in a regular laptop. Of course, such implementation would be more problematic for large-scale structures, I guess, but for regular periodic or molecular structures this works fine.
Another important question is how to treat the hexagon (or pentagon, or whatever) structure (i.e. atomic indices): that is, if you treat them as a list of atoms, or as a geometry object in itself, or, if I understand it correctly, as a geometry.category - obviously, the more things we could do with that object the better, however, the complexity of the thing could explode quite easily. But that might be subject for another issue/discussion.
The text was updated successfully, but these errors were encountered: