First try at automated data CLI #801

pfebrer · 2024-06-26T21:27:30Z

As discussed with Nick, this is a minimal first attempt for the automatic generation of a CLI to postprocess data.

The aim is to organize things in a way that new methods to generate data can be simply registered to data classes and automatically added to the CLI, without needing to maintain CLI options all across sisl.

The automatic generation of a CLI requires of course some boilerplate code, but the amount of boilerplate code is almost constant when we add new functionality.

The approach taken:

Created data classes in sisl.data that should help gathering all the possible ways in which the same data can be generated. For now I only implemented DOSData.
Then functions to create that data should be registered. For now I have only registered the function that creates DOSData from eigSileSiesta. This functions should be type annotated to be automatically supported by the CLI.
The CLI is created independently in sisl.cli.sdata. For now it can be ran with python -m sisl.cli.sdata. That file is hyper documented, and there I explain all the parts that play a role in generating the CLI.

This standarization should not only help to create sisl's CLI but also facilitate the creation of other interfaces to sisl down the line (including the GUI).

To use it, for now, you need to install nodify's dev version, click and rich_click:

pip install [email protected]:pfebrer/nodify.git click rich_click

The CLI can work in two ways:

Specify the type of data that you want and how to get it.

Main:

DOS data:

DOS data from the EIG file:

Pass a file, then only specify which data you want to get

Main:

DOS data:

codecov · 2024-06-26T21:35:03Z

Codecov Report

Attention: Patch coverage is 31.78295% with 88 lines in your changes missing coverage. Please review.

Project coverage is 86.93%. Comparing base (49d4398) to head (f4bd621).

Files	Patch %	Lines
src/sisl/cli/sdata.py	0.00%	68 Missing ⚠️
src/sisl/io/siesta/eig.py	22.22%	14 Missing ⚠️
src/sisl/data/data.py	75.00%	4 Missing ⚠️
src/sisl/data/_singledispatch.py	93.75%	1 Missing ⚠️
src/sisl/data/dos.py	88.88%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #801      +/-   ##
==========================================
- Coverage   87.07%   86.93%   -0.14%     
==========================================
  Files         400      405       +5     
  Lines       51657    51785     +128     
==========================================
+ Hits        44978    45018      +40     
- Misses       6679     6767      +88

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

zerothi

I'll have to play a bit more with this after vacation. Thanks!

zerothi · 2024-07-13T18:38:19Z

src/sisl/cli/sdata.py

+# List that keeps track of the data that the CLI has generated.
+# When chaining commands, the first command will generate data and
+# the subsequent commands will be methods to apply on that data.
+GENERATED = []


I think this is sub-optimal, imagine one parses a grid, and a geometry (from two sources) and then writes them out, then one has to figure out the order.

I think click contexts are better here, with a dict or something, then sub-commands can also change data to other data-types.
I guess we just need some way of distinguishing data, and also enable the possibility of adding more of the same data-type. E.g. comparing two DOS is not that unreasonable, and here it would be, perhaps better, to have a data-container that contains origin of data. So one can query where it came from.

zerothi · 2024-07-13T18:39:28Z

src/sisl/cli/sdata.py

+import sys
+from typing import Callable, Optional
+
+import rich_click as click


I am probably more in favor of importing click, and if available importing rich-click.

zerothi · 2024-08-07T18:55:15Z

Some general comments here.

I don't particularly fancy the special data-type specification. I can see why it may be useful, but I kind of find it a bit duplicate because it builds on the current state of things and adds a layer between simple numpy arrays. And it is never know exactly when you might need what....

Your dos_from_eig is also duplication, I had imagined something more generic for something that reads an eigenvalue spectrum. For instance when you also do a dos_from_eig(vasp_eig) then the arguments needs to be duplicated. I guess the only way to do this generically would be to have some kind of dos_from_eig(eig, weights, Erange, dE, distribution, smearing) and then do dos_from_vasp(sile, **<extend interface with dos_from_eig>).

How, in this framework, would you envision how things comes out of generic stuff.

For instance, I could imagine reading a Hamiltonian, and then post-processing it for a DOS, but also PDOS and then plotting it?

What I do like about this is that it does not become depending on the sile it self... Perhaps I should play a bit more with this.

I will have a look at how this could be generalized a bit, towards simpler data-structures. E.g. the DOSdata seems like it will quickly explode in terms of how things gets implemented, we need specific classes for a huge number of things... Perhaps it wuold be smarter to be data-type agnostic in some form, but rely on some simpler metadata.

zerothi · 2024-08-07T18:56:05Z

@pfebrer would you mind rebasing this one? It should not interfere!

pfebrer · 2024-10-04T10:26:57Z

I have some time now to be able to work on this. Would you be available to review and merge?

I think it is really important so that the plotting CLI can be build, which would be really useful.

Your dos_from_eig is also duplication, I had imagined something more generic for something that reads an eigenvalue spectrum.

Duplication of what? I just took your current argument parser. Once there are multiple entry points that do something similar one can always generalize, of course.

E.g. the DOSdata seems like it will quickly explode in terms of how things gets implemented, we need specific classes for a huge number of things... Perhaps it wuold be smarter to be data-type agnostic in some form, but rely on some simpler metadata.

How would you centralize all the ways of computing a density of states?

DOSData can also be private, and be simply a tool for centralizing all the functionality.

How, in this framework, would you envision how things comes out of generic stuff.

For instance, I could imagine reading a Hamiltonian, and then post-processing it for a DOS, but also PDOS and then plotting it?

I think initially the CLI doesn't need to do very complex stuff, just a CLI that allows doing the same analysis (e.g. DOS) from different entry points is useful enough I think.

An advanced user can always write their own code to do custom stuff if needed. From my perspective, most users on a daily basis need simpler things than a generic interface to compute many properties at once. Having a more specific CLI for a particular popular analysis I think it is good for widespread adoption. As an example, you can just take the confusion of @ialcon with the MonkhorstPack object.

We can always add later another command to the CLI to do more generic stuff...

pfebrer added 2 commits June 26, 2024 23:04

First try at automated data CLI

caf5f84

Forgot to pass sile

f4bd621

zerothi requested changes Jul 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

First try at automated data CLI #801

First try at automated data CLI #801

pfebrer commented Jun 26, 2024

codecov bot commented Jun 26, 2024 •

edited

Loading

zerothi left a comment

zerothi Jul 13, 2024

zerothi Jul 13, 2024

zerothi commented Aug 7, 2024

zerothi commented Aug 7, 2024

pfebrer commented Oct 4, 2024 •

edited

Loading

First try at automated data CLI #801

Are you sure you want to change the base?

First try at automated data CLI #801

Conversation

pfebrer commented Jun 26, 2024

codecov bot commented Jun 26, 2024 • edited Loading

Codecov Report

zerothi left a comment

Choose a reason for hiding this comment

zerothi Jul 13, 2024

Choose a reason for hiding this comment

zerothi Jul 13, 2024

Choose a reason for hiding this comment

zerothi commented Aug 7, 2024

zerothi commented Aug 7, 2024

pfebrer commented Oct 4, 2024 • edited Loading

codecov bot commented Jun 26, 2024 •

edited

Loading

pfebrer commented Oct 4, 2024 •

edited

Loading