-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First try at automated data CLI #801
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #801 +/- ##
==========================================
- Coverage 87.07% 86.93% -0.14%
==========================================
Files 400 405 +5
Lines 51657 51785 +128
==========================================
+ Hits 44978 45018 +40
- Misses 6679 6767 +88 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to play a bit more with this after vacation. Thanks!
# List that keeps track of the data that the CLI has generated. | ||
# When chaining commands, the first command will generate data and | ||
# the subsequent commands will be methods to apply on that data. | ||
GENERATED = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is sub-optimal, imagine one parses a grid, and a geometry (from two sources) and then writes them out, then one has to figure out the order.
I think click contexts are better here, with a dict or something, then sub-commands can also change data to other data-types.
I guess we just need some way of distinguishing data, and also enable the possibility of adding more of the same data-type. E.g. comparing two DOS is not that unreasonable, and here it would be, perhaps better, to have a data-container that contains origin of data. So one can query where it came from.
import sys | ||
from typing import Callable, Optional | ||
|
||
import rich_click as click |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably more in favor of importing click, and if available importing rich-click.
Some general comments here. I don't particularly fancy the special data-type specification. I can see why it may be useful, but I kind of find it a bit duplicate because it builds on the current state of things and adds a layer between simple numpy arrays. And it is never know exactly when you might need what.... Your How, in this framework, would you envision how things comes out of generic stuff. For instance, I could imagine reading a Hamiltonian, and then post-processing it for a DOS, but also PDOS and then plotting it? What I do like about this is that it does not become depending on the sile it self... Perhaps I should play a bit more with this. I will have a look at how this could be generalized a bit, towards simpler data-structures. E.g. the |
@pfebrer would you mind rebasing this one? It should not interfere! |
I have some time now to be able to work on this. Would you be available to review and merge? I think it is really important so that the plotting CLI can be build, which would be really useful.
Duplication of what? I just took your current argument parser. Once there are multiple entry points that do something similar one can always generalize, of course.
How would you centralize all the ways of computing a density of states? DOSData can also be private, and be simply a tool for centralizing all the functionality.
I think initially the CLI doesn't need to do very complex stuff, just a CLI that allows doing the same analysis (e.g. DOS) from different entry points is useful enough I think. An advanced user can always write their own code to do custom stuff if needed. From my perspective, most users on a daily basis need simpler things than a generic interface to compute many properties at once. Having a more specific CLI for a particular popular analysis I think it is good for widespread adoption. As an example, you can just take the confusion of @ialcon with the We can always add later another command to the CLI to do more generic stuff... |
As discussed with Nick, this is a minimal first attempt for the automatic generation of a CLI to postprocess data.
The aim is to organize things in a way that new methods to generate data can be simply registered to data classes and automatically added to the CLI, without needing to maintain CLI options all across sisl.
The automatic generation of a CLI requires of course some boilerplate code, but the amount of boilerplate code is almost constant when we add new functionality.
The approach taken:
sisl.data
that should help gathering all the possible ways in which the same data can be generated. For now I only implementedDOSData
.DOSData
fromeigSileSiesta
. This functions should be type annotated to be automatically supported by the CLI.sisl.cli.sdata
. For now it can be ran withpython -m sisl.cli.sdata
. That file is hyper documented, and there I explain all the parts that play a role in generating the CLI.This standarization should not only help to create sisl's CLI but also facilitate the creation of other interfaces to sisl down the line (including the GUI).
To use it, for now, you need to install
nodify
's dev version,click
andrich_click
:The CLI can work in two ways:
Specify the type of data that you want and how to get it.
Main:
DOS data:
DOS data from the EIG file:
Pass a file, then only specify which data you want to get
Main:
DOS data: