Challenge 31 - Advance user capabilities to handle data constraints when using CDSAPI #9
Comments
Hi! What do the constraints look like? Also, can two datasets from the same family (like ERA5, for example) have different constraints?
Hi Catalin,
Constraints are represented as JSON files. In principle, a constraints file (one per dataset) is a list of dictionaries having:
Here is an example (for use in the context of this challenge only):
corresponding to this dataset. Beware that constraints can evolve over time, though. Each such dictionary (i.e. a constraint) represents a complete data cube, i.e. all possible combinations of widget values in it correspond to existing data granules. The list of all constraints covers the universe of available data for a dataset. Take for example the last constraint in the example above (i.e. the last dictionary). Source, version, variable and year are the widget names/dimensions. The available data granules are
Yes. And that is typically the case, i.e. one constraint file per dataset. They can vary a lot:
If you have any other questions, please let us know. Have a nice day! Petrut COBARZAN & the team
For example for this dataset https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=form
By |
Yes. That is a cleverly constructed example (one that can be inferred without knowing the specific constraints for this dataset). Very good! The initial request would be broken into (at least) these two sub-requests, which might themselves be broken into more fine-grained sub-requests (if necessary, and not necessarily in this order). The ultimate objective is to determine (and then submit for execution) a set of sub-requests for which the entire corresponding data cube is available. Ideally, the union of this set of sub-requests would be equal to the intersection between the client's initial request/selection and the set of constraints/available data cubes. Also, the sub-requests would be pairwise disjoint (so that no data granule is covered more than once). Finally, the set would be as small as possible (so that we perform the minimum possible number of requests). However, each individual request should (generally) be small enough that the CDS engine does not get clogged with large requests.
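The intersection step described above can be sketched roughly as follows. This is only an illustration, not the cdsapi implementation: it assumes a constraint is a dictionary mapping widget names to lists of allowed values, and a request is a dictionary of the same shape. The function name `split_request` and the sample values are hypothetical. The sketch does not enforce pairwise disjointness or limit the size of each sub-request.

```python
def split_request(request, constraints):
    """Intersect a request with each constraint (data cube).

    Assumption: request and constraints map widget names to lists of values.
    Returns one sub-request per constraint that overlaps the request.
    """
    sub_requests = []
    for constraint in constraints:
        sub = {}
        for dim, allowed in constraint.items():
            # Keep only the requested values that this data cube contains.
            common = [v for v in request.get(dim, []) if v in allowed]
            if not common:
                sub = None  # no overlap on this dimension: skip the cube
                break
            sub[dim] = common
        if sub is not None and sub not in sub_requests:
            sub_requests.append(sub)
    return sub_requests


# Hypothetical constraints and request, for illustration only.
constraints = [
    {"variable": ["temperature"], "year": ["2000", "2001"]},
    {"variable": ["humidity"], "year": ["2001"]},
]
request = {"variable": ["temperature", "humidity"], "year": ["2000", "2001"]}
sub_requests = split_request(request, constraints)
```

With these sample values the request is split into one sub-request per constraint, and the unavailable combination (humidity, 2000) is dropped. If constraints overlap, the resulting sub-requests may overlap too, so a further step would be needed to make them disjoint.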
Will the implementation of the solution be integrated into cdsapi?
Yes (subject to the quality of the resulting solution, of course). Development could be carried out in a fork of the repository or as a totally independent solution.
In this dataset, I think the "Pressure level" section is optional. How are these optional keys defined in a constraint?
In situations where the constraints/data cubes vary in terms of dimensionality, some widgets/dimensions are not required in certain selection combinations. In the example above, the pressure level is only relevant for multi-level variables. In such cases, the constraints concerning single-level variables would not contain the pressure level dimension, while the ones concerning the multi-level variables might.
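A minimal sketch of how a client could check a single data granule against constraints of varying dimensionality is given below. It assumes a constraint is a dictionary mapping dimension names to lists of allowed values, and that a dimension absent from a constraint is simply not required by it (this handling of extra dimensions is an assumption, not documented behaviour). The function name `granule_is_available` and all sample values are hypothetical.

```python
def granule_is_available(granule, constraints):
    """Return True if some constraint (data cube) contains the granule.

    Assumption: a granule maps each dimension to one value. Dimensions that
    a constraint does not mention are ignored when matching against it.
    """
    for constraint in constraints:
        if all(granule.get(dim) in allowed for dim, allowed in constraint.items()):
            return True
    return False


# Hypothetical constraints: one single-level cube, one multi-level cube.
constraints = [
    {"variable": ["2m_temperature"], "year": ["2020"]},
    {"variable": ["temperature"], "pressure_level": ["500", "850"], "year": ["2020"]},
]

single_level = {"variable": "2m_temperature", "year": "2020"}
multi_level = {"variable": "temperature", "pressure_level": "500", "year": "2020"}
missing_level = {"variable": "temperature", "year": "2020"}

single_ok = granule_is_available(single_level, constraints)
multi_ok = granule_is_available(multi_level, constraints)
missing_ok = granule_is_available(missing_level, constraints)
```

Here the single-level granule matches without a pressure level, while the multi-level variable is rejected when the pressure level is missing, because the only cube that contains it requires that dimension.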
Challenge 31 - Advance user capabilities to handle data constraints when using CDSAPI
Goal
Create a Python library that allows users to embed additional intelligence in their scripts to handle CDS dataset constraints, improving the accuracy of requests submitted via cdsapi.
Mentors and skills
https://cds.climate.copernicus.eu/
https://cds.climate.copernicus.eu/api-how-to
Challenge description
Problem: Currently, constraints are only useful to users of the interactive web download form. Constraints manage the availability of different combinations while the user fills in the form, guiding users towards valid requests by activating or deactivating the available options in the widgets. These constraints are exposed via cdsapi but are hidden from users and not documented. As a result, the CDS processes many user requests that are wrong in scope and eventually fail. This is good neither for the users nor for the system.
Data/System to be used: This challenge only requires a Python development environment, an account on the CDS (https://cds.climate.copernicus.eu/) and the cdsapi (https://cds.climate.copernicus.eu/api-how-to).
Solution: A Python library that can access the constraints definition for a given dataset via CDSAPI and decode it on the client side, allowing the user to perform different actions:
- Get information about the scope and definition of a dataset via the API (variables, time ranges, ...)
- Automate the definition of a valid set of requests before submission via the API.
- Implement automatic checks of data availability to trigger the submission of requests (e.g. for data updated periodically)
- Check the validity of a request before submission via the API.
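As an illustration of the first action in the list above, the overall scope of a dataset could be summarised from its constraints roughly as follows. This sketch assumes the constraints have already been decoded into a list of dictionaries mapping dimension names to lists of values. The function name `dataset_scope` and the sample values are hypothetical and not part of cdsapi.

```python
def dataset_scope(constraints):
    """Collect every value seen for each dimension across all constraints.

    Assumption: constraints is a list of dicts {dimension: [values, ...]}.
    The result lists the values that exist at all, not which combinations
    of them are valid.
    """
    scope = {}
    for constraint in constraints:
        for dim, values in constraint.items():
            scope.setdefault(dim, set()).update(values)
    return {dim: sorted(values) for dim, values in scope.items()}


# Hypothetical constraints, for illustration only.
constraints = [
    {"variable": ["temperature"], "year": ["2000", "2001"]},
    {"variable": ["humidity"], "year": ["2001"]},
]
scope = dataset_scope(constraints)
```

The summary shows the variables and years on offer, but a value appearing in it does not guarantee that every combination is available. That still has to be checked against the individual constraints.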
Ideas for implementation: these have been introduced in the previous paragraphs. Mentors will help participants configure their accounts and cdsapi, understand the constraints definition file (JSON) and the system, provide guidance on datasets and refine the functional scope of the requirements.
The resulting libraries will be put in the hands of cdsapi users, giving them broader visibility of the real availability of data and allowing more accurate requests. On the one hand, this will improve users' efficiency in accessing the system; on the other, it will reduce unnecessary request traffic to the system. This feature will extend the capabilities of the new CDS Engine and API.