Handle case where each variable is represented as a separate granule #32

Open
hrodmn opened this issue Oct 15, 2024 · 2 comments
@hrodmn
Contributor

hrodmn commented Oct 15, 2024

Right now titiler-cmr can handle the case where each granule represents a distinct time point and every granule has the same set of variables. There are many datasets in CMR where each granule represents the same timestep but holds a different variable. For example, the Regridded Harmonized World Soil Database v1.2 dataset has 27 granules that each contain estimates of a different soil property.

To handle this case we could use the bands_regex parameter in the xarray backend case to filter the granule results down to the one that matches the regex. We would need to change the format of mosaic_assets in this case, since the ZarrReader can't handle the dictionary of {band: url} entries that get_assets produces when you provide bands_regex.

I hacked a solution together just to see if it is feasible. Here are some tiles from the soil pH layer of that dataset:
[image: tiles from the soil pH layer]

$ git diff titiler/.
diff --git a/titiler/cmr/backend.py b/titiler/cmr/backend.py
index 50307e7..1d2a676 100644
--- a/titiler/cmr/backend.py
+++ b/titiler/cmr/backend.py
@@ -231,12 +231,21 @@ class CMRBackend(BaseBackend):
             access=s3_auth_config.access,
             bands_regex=bands_regex,
         )
-
         if not mosaic_assets:
             raise NoAssetFoundError(
                 f"No assets found for tile {tile_z}-{tile_x}-{tile_y}"
             )

+        # reformat the mosaic_assets to match expectation for xarray backend
+        # would only want to do this for the backend="xarray" case...
+        if bands_regex:
+            asset = mosaic_assets[0]
+            if len(asset) > 1:
+                raise ValueError("bands_regex returned multiple assets!")
+            url = list(asset["url"].values())[0]
+            provider = asset["provider"]
+            mosaic_assets = [{"url": url, "provider": provider}]
+
         def _reader(asset: Asset, x: int, y: int, z: int, **kwargs: Any) -> ImageData:
             if (
                 s3_auth_config.strategy == "environment"
diff --git a/titiler/cmr/factory.py b/titiler/cmr/factory.py
index e10d3af..8e7b73d 100644
--- a/titiler/cmr/factory.py
+++ b/titiler/cmr/factory.py
@@ -121,7 +121,9 @@ def parse_reader_options(

     if reader_params.backend == "xarray":
         reader = ZarrReader
-        read_options = {}
+        read_options = {
+            "bands_regex": rasterio_params.bands_regex,
+        }

         options = {
             "variable": zarr_params.variable,
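
For reference, the reformatting step from the backend.py hunk can be tried on its own. This is only a minimal sketch, assuming get_assets returns dicts shaped like {"url": {band: url}, "provider": ...} when bands_regex is given. The helper name flatten_single_band_assets is hypothetical, and unlike the diff it counts the entries in the url mapping rather than the keys of the asset dict.

```python
from typing import Any, Dict, List


def flatten_single_band_assets(
    mosaic_assets: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Collapse {"url": {band: url}} assets into {"url": url} assets.

    Hypothetical helper: the asset shape is an assumption based on the
    diff above, not a documented titiler-cmr structure.
    """
    flattened = []
    for asset in mosaic_assets:
        urls = asset["url"]
        if isinstance(urls, dict):
            # The regex should have narrowed each granule to exactly one file.
            if len(urls) != 1:
                raise ValueError(
                    f"bands_regex matched {len(urls)} assets, expected exactly 1"
                )
            url = next(iter(urls.values()))
        else:
            # Already a plain URL string, nothing to unwrap.
            url = urls
        flattened.append({"url": url, "provider": asset["provider"]})
    return flattened
```

Counting the url entries means an asset dict with both "url" and "provider" keys does not trip the check by itself.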
@vincentsarago
Member

🤯 there are too many cases to handle:

  • COG
  • COG but stored as multiple assets (file per band)
  • zarr/netcdf
  • netcdf stored as multiple assets (file per variable)

First, I think we should rename bands_regex -> assets_regex

I think I lost track, but for an xarray dataset we need a Variable= option, right? I'm not sure why we need to pass bands_regex to the reader (with read_options); the variable should be one asset from the list of assets returned.
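
A rough sketch of the "one asset from the list" idea, assuming each returned asset is a dict with a plain "url" string. The function name select_asset and the bucket and file names in the usage example are made-up placeholders, and assets_regex is the proposed rename, not an existing parameter.

```python
import re
from typing import Dict, List


def select_asset(assets: List[Dict[str, str]], assets_regex: str) -> Dict[str, str]:
    """Return the single asset whose URL matches assets_regex.

    Hypothetical helper: raises if the regex matches zero or several
    assets, so the caller always gets exactly one file to open.
    """
    pattern = re.compile(assets_regex)
    matches = [asset for asset in assets if pattern.search(asset["url"])]
    if len(matches) != 1:
        raise ValueError(
            f"assets_regex {assets_regex!r} matched {len(matches)} assets, expected 1"
        )
    return matches[0]


# Placeholder granule list: one file per variable, same timestep.
granule_assets = [
    {"url": "s3://example-bucket/soil/T_PH_H2O.nc4", "provider": "EXAMPLE"},
    {"url": "s3://example-bucket/soil/T_OC.nc4", "provider": "EXAMPLE"},
]
selected = select_asset(granule_assets, r"T_PH_H2O")
```

With the asset picked up front, the reader would only need the variable name and would not have to know about the regex at all.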

@hrodmn
Contributor Author

hrodmn commented Oct 15, 2024

The mixed xarray/rasterio logic is starting to get a bit messy with conditional checks in the single CMRBackend class. Maybe we are at the point where it would be cleaner to have several backends: CMRRasterioBackend and CMRXarrayBackend. There could be some shared utility functions but this structure might make it easier to do the right thing for each of these cases.
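
Something like the following is what I have in mind, purely as a structural sketch. CMRRasterioBackend and CMRXarrayBackend are the names proposed above; CMRBackendBase, assets_for_tile, and prepare_asset are hypothetical, and a plain ABC stands in for the real base class so the example runs without titiler installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Asset = Dict[str, Any]


class CMRBackendBase(ABC):
    """Shared logic both backends would reuse (hypothetical base class)."""

    def __init__(self, granules: List[Asset]):
        # A real backend would query CMR; a fixed list keeps the sketch runnable.
        self.granules = granules

    def assets_for_tile(self) -> List[Asset]:
        if not self.granules:
            raise LookupError("No assets found")
        # Each subclass decides what asset shape its reader expects.
        return [self.prepare_asset(granule) for granule in self.granules]

    @abstractmethod
    def prepare_asset(self, asset: Asset) -> Asset:
        ...


class CMRRasterioBackend(CMRBackendBase):
    def prepare_asset(self, asset: Asset) -> Asset:
        # The rasterio path keeps the {band: url} mapping untouched.
        return asset


class CMRXarrayBackend(CMRBackendBase):
    def prepare_asset(self, asset: Asset) -> Asset:
        # The xarray path needs a single URL per granule.
        urls = asset["url"]
        if isinstance(urls, dict):
            if len(urls) != 1:
                raise ValueError("expected exactly one matching asset per granule")
            urls = next(iter(urls.values()))
        return {**asset, "url": urls}
```

The conditional on the backend type disappears from the shared code, and each case keeps its own asset handling in one place.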
