Searching the catalog for a dataset #203
Comments
I agree with you, this is confusing. We should discuss that in our next meeting.
We'll need to check the current state of the catalog vis-a-vis completeness of the variable column - I think there are some experiments that won't have that defined.
I think if it's not defined, it's somewhat OK. The issue here is that you search for a variable and get back a datastore, but the datastore isn't filtered by the variable you searched for. If you can't refine by variable when searching for a datastore, it doesn't matter. The same logic applies to the other columns too btw (principally frequency, although also short_name / long_name).
I'm not sure whether you're talking about the issue described in the documentation I linked above, or the fact that Intake-ESM only reduces returned datasets by queries on the
Ah the first one, although I would like the second one to change too :-) Can/could/should
It's pretty clunky because it only works nicely when the columns in the catalog and datastore are the same, which isn't guaranteed. Otherwise it throws warnings.
I think this relates to the search functionality in intake-dataframe-catalog. I'm working on trying to better understand the search functionality there - just commenting so I can come back and find this more easily.
Is your feature request related to a problem? Please describe.
Currently, to retrieve a dataset you need to do two search operations - one to find an intake-esm datastore, and another to search that datastore for a variable of interest. This is confusing: the catalog can be searched for variables, but the resulting datastore contains all variables instead of only the variables searched for:
i.e.
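A toy sketch of the behaviour described above, in plain Python. `ToyCatalog`, `ToyDatastore`, the `other_experiment` name and the `hi_m` variable are hypothetical stand-ins used only for illustration; this is not the real catalog or intake-esm API, just a model of why a second search is needed.

```python
class ToyDatastore:
    """Hypothetical stand-in for a datastore: a list of asset rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def search(self, **query):
        # Keep only the rows whose columns match every query term.
        keep = [r for r in self.rows
                if all(r.get(k) == v for k, v in query.items())]
        return ToyDatastore(keep)

    def variables(self):
        return sorted({r["variable"] for r in self.rows})


class ToyCatalog:
    """Hypothetical stand-in for the catalog: a mapping of name -> datastore."""

    def __init__(self, datastores):
        self.datastores = dict(datastores)

    def search(self, **query):
        # A datastore is a hit if any of its rows match, but the datastore
        # that comes back is NOT narrowed to the matching rows.
        hits = {name: ds for name, ds in self.datastores.items()
                if ds.search(**query).rows}
        return ToyCatalog(hits)

    def __getitem__(self, name):
        return self.datastores[name]


cat = ToyCatalog({
    "025deg_jra55_ryf9091_gadi": ToyDatastore(
        [{"variable": "aice_m"}, {"variable": "hi_m"}]
    ),
    "other_experiment": ToyDatastore([{"variable": "temp"}]),
})

# First search: narrows the catalog to datastores containing the variable.
found = cat.search(variable="aice_m")
datastore = found["025deg_jra55_ryf9091_gadi"]
all_vars = datastore.variables()  # still every variable in the datastore

# Second search: only now is the datastore itself filtered.
filtered = datastore.search(variable="aice_m").variables()
```

In this model the first search selects the right datastore, yet `all_vars` still lists every variable it holds; only the second search leaves `aice_m` alone.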
Describe the feature you'd like
I would like to be able to get from a catalog search directly to a dataset
e.g.:
cat.search(name='025deg_jra55_ryf9091_gadi', variable='aice_m').to_dask()
Returns an xarray dataset
I would like for catalog searches to return a datastore search if possible:
e.g.
cat.search(name='025deg_jra55_ryf9091_gadi', variable='aice_m').to_source()
Returns the same result as:
cat['025deg_jra55_ryf9091_gadi'].search(variable='aice_m').to_dask()
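Until something like this exists, the two steps could be wrapped in a small user-side helper. The sketch below is only an illustration: `search_to_dask` is a hypothetical name, and it assumes nothing beyond the calls already shown in this issue (`cat[name]`, `.search(...)` and `.to_dask()`).

```python
def search_to_dask(cat, name, **query):
    """Hypothetical helper: go from a catalog straight to a dataset.

    Assumes `cat[name]` returns a datastore whose `search(**query)` result
    has a `to_dask()` method, as in the example above.
    """
    datastore = cat[name]               # step 1: pick the datastore by name
    subset = datastore.search(**query)  # step 2: filter the datastore itself
    return subset.to_dask()             # load only the filtered assets
```

With the catalog from this issue, that would be called as `search_to_dask(cat, '025deg_jra55_ryf9091_gadi', variable='aice_m')`. It does not address the column-mismatch concern raised in the comments, since every keyword is passed straight to the datastore search.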
Describe alternatives you've considered
No change - train users in the current implementation
Additional context
I haven't thought about how this might apply to the CMIP6 datastores - which are formatted / handled a bit differently