Manipulation of coordinates does not materialize to kerchunk refs #281
Testing the same as above but modifying another variable:

vds_modified = vds.set_coords(['vertices_longitude'])
vds_modified.virtualize.to_kerchunk(
    'testing.parquet', format="parquet"
)
ds_reopened = xr.open_dataset(
    'testing.parquet',
    engine='kerchunk',
    backend_kwargs={
        'storage_options': {"remote_options": {'anon': True}}
    }
)
ds_reopened

and one more time:

vds_modified = vds.set_coords(['lev_bnds'])
vds_modified.virtualize.to_kerchunk(
    'testing.parquet', format="parquet"
)
ds_reopened = xr.open_dataset(
    'testing.parquet',
    engine='kerchunk',
    backend_kwargs={
        'storage_options': {"remote_options": {'anon': True}}
    }
)
ds_reopened

Both give the same output as above. It is also the same output if I do not modify the coordinates at all!

vds_modified = vds
vds_modified.virtualize.to_kerchunk(
    'testing.parquet', format="parquet"
)
ds_reopened = xr.open_dataset(
    'testing.parquet',
    engine='kerchunk',
    backend_kwargs={
        'storage_options': {"remote_options": {'anon': True}}
    }
)
ds_reopened

So I think this might be a combination of #189 and a broken correspondence between the order of data variables and coordinates in the in-memory virtual dataset and the refs on disk (or the way xarray reads them back in).
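One way to test this hypothesis might be to look at what actually ends up in the references, independent of how xarray reads them back. The following is only a sketch: it assumes the references are available as a kerchunk-style dict or JSON document (version 1 layout, with array metadata under keys such as `name/.zattrs`) rather than as parquet, and the helper name `summarize_array_attrs`, the variable `tos`, and the bucket URL are made up for illustration.

```python
import json


def summarize_array_attrs(refs):
    """Collect dimension names and any `coordinates` attribute per array.

    `refs` is assumed to be a kerchunk-style reference mapping, either the
    flat key/value dict or a version-1 document that nests it under "refs".
    """
    store = refs.get("refs", refs)
    summary = {}
    for key, value in store.items():
        if not key.endswith("/.zattrs"):
            continue  # skip group-level metadata and chunk references
        name = key[: -len("/.zattrs")]
        # Metadata is usually stored as a JSON string; tolerate a plain dict too.
        attrs = json.loads(value) if isinstance(value, str) else value
        summary[name] = {
            "dims": attrs.get("_ARRAY_DIMENSIONS", []),
            "coordinates": attrs.get("coordinates"),
        }
    return summary


# Hand-written stand-in for a real reference file (all values hypothetical).
example_refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "tos/.zattrs": json.dumps(
            {"_ARRAY_DIMENSIONS": ["time", "y", "x"], "coordinates": "longitude latitude"}
        ),
        "longitude/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["y", "x"]}),
        "tos/0.0.0": ["s3://example-bucket/example.nc", 1000, 2048],
    },
}

summary = summarize_array_attrs(example_refs)
for name, info in summary.items():
    print(name, info)
```

If a variable passed to `set_coords` is not named in any `coordinates` attribute in this summary, that would suggest the information is lost on write rather than on read.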
Coordinates don't exist in Zarr's data model, so when Xarray opens a Zarr store (or a kerchunk references representation of one), my understanding of how it determines which Zarr arrays should be set as coordinates is that it
(It would be great if you could confirm this, @dcherian.)
I believe that right now VirtualiZarr handles (1) correctly, has a bug in (2) (#189), and does not even attempt (3) yet. Ayush's PR solves only (2), but it was never finished, as it has no tests. I tried to solve both (2) and (3) together in my PR by calling the same logic that Xarray uses when it does CF decoding. This is a bit of a rabbit hole though, and it would probably be better to fix one thing at a time. It would be great if one of you could pick up Ayush's (small!) PR and see if that solves your issue.
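To make the consequence concrete, here is a rough pure-Python sketch of how a reader could split arrays into coordinates and data variables when the store itself has no notion of coordinates. It is not Xarray's implementation and does not claim to reproduce the steps listed above; it assumes only two rules: a 1-D array named after its own dimension is a coordinate, and any array named in a CF-style `coordinates` attribute is a coordinate. The function name and the example metadata (including `thetao`) are hypothetical.

```python
def split_coords_and_data_vars(arrays):
    """Split array names into (coords, data_vars), both as sorted lists.

    `arrays` maps each array name to {"dims": [...], "attrs": {...}}.
    """
    coord_names = set()
    for name, meta in arrays.items():
        # Assumed rule A: a 1-D array named after its dimension is a coordinate.
        if list(meta["dims"]) == [name]:
            coord_names.add(name)
        # Assumed rule B: names listed in a space-separated `coordinates`
        # attribute on any array are coordinates.
        listed = meta.get("attrs", {}).get("coordinates", "")
        coord_names.update(listed.split())
    coord_names &= set(arrays)  # ignore names that are not arrays in the store
    data_vars = set(arrays) - coord_names
    return sorted(coord_names), sorted(data_vars)


# Hypothetical metadata loosely modelled on the variables mentioned in this issue.
arrays = {
    "time": {"dims": ["time"], "attrs": {}},
    "lev": {"dims": ["lev"], "attrs": {}},
    "thetao": {
        "dims": ["time", "lev", "y", "x"],
        "attrs": {"coordinates": "latitude longitude"},
    },
    "latitude": {"dims": ["y", "x"], "attrs": {}},
    "longitude": {"dims": ["y", "x"], "attrs": {}},
    "lev_bnds": {"dims": ["lev", "bnds"], "attrs": {}},
    "vertices_longitude": {"dims": ["y", "x", "vertices"], "attrs": {}},
}

coords, data_vars = split_coords_and_data_vars(arrays)
print("coords:", coords)
print("data_vars:", data_vars)
```

Under these assumed rules, `lev_bnds` and `vertices_longitude` come back as data variables unless the writer records them in some `coordinates` attribute, which would match the behaviour reported in this issue.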
This is not a burning priority for the meeting as far as I can tell right now. Definitely struggling to get things sorted for the ESGF meeting next week, but please ping me afterwards if there is still a need!
@norlandrhagen and I just came across what we believe is a bug when I manually set variables as coordinates on a virtual dataset.
To reproduce, I take a single CMIP6 output file and virtualize it:
This works great, but some coordinates are declared as data variables (maybe this is related to #189?). Either way, if I try to correct this on the virtualized dataset, everything seems fine:
Now, I expected these modifications to be saved when I materialize and reload the dataset,
but somehow I am getting another variable as a coordinate? Note that 'longitude' is now a coordinate all of a sudden...