Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exactextract Python, no coverage_fraction? #125

Open
ZZMitch opened this issue Jun 24, 2024 · 3 comments
Open

exactextract Python, no coverage_fraction? #125

ZZMitch opened this issue Jun 24, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@ZZMitch
Copy link

ZZMitch commented Jun 24, 2024

Hello,

I have a shapefile (GeoDataFrame in Python) that I want to convert to raster (e.g., xarray in Python) with a certain pixel grid with each pixel quantifying the % coverage of that shapefile for the pixel area. After some searching, I came across exactextract.

I was able to able to install the Python version of exactextract using pip install exactextract and confirmed it is functioning properly:

import exactextract as ee
ee.exact_extract(rast = xras, vec = gdf, ops = 'mean')

Where xras is an xarray DataArray with the pixel grid I want to match and gdf is the GeoDataFrame where I want the % coverage. I know this test code does not get that result, but it does show the package is installed and working.

From what I can tell, there is no equivalent of coverage_fraction for the current Python version of exactextract?

I know they Python version is pretty new, so wanted to confirm this functionality is not yet available. If not, does anyone have any options for achieving this result currently in Python? I know about rasterio's rasterize but it is classifies pixels based on centroids or intersection ("all_touched"), rather than % coverage.

Thanks!

@dbaston
Copy link
Member

dbaston commented Jun 24, 2024

At the moment, this function only exists on the R side. It would not be difficult to expose in Python, but I haven't done so yet.

Here are some questions that come to mind on the implementation.

A limitation of the R function is that it returns a list of in-memory rasters, which is simple but greatly limits the size of the data you can process. Should Python do the same thing, and if so, what format? A list of GDAL in-memory rasters? A list of NumPy matrices with associated extents?

Alternatively (or additionally) should it write the outputs directly to disk to avoid memory limitations? As individual rasters - how would they be named? Or as a multiband raster with one band per input feature?

@dbaston dbaston added the enhancement New feature or request label Jun 24, 2024
@ZZMitch
Copy link
Author

ZZMitch commented Jun 25, 2024

Hey @dbaston, thanks for the information!

I am not sure how it would be applied, but I enjoy using xarray(which is backed by dask and numpy) to work with larger-than-memory and multi-dimensional raster files. On the vector side there is dask support for geopandas.

For larger-then-memory operations, could there be some process that runs on chunks of the input vector file and produces raster tiles - saved on disk... that could later be combined with xarray operations? I am not sure about naming, generally I am interested in rasterizing to a single layer (e.g., not different %s for different vector attributes).

Alternatively - if the operation will use too much memory the function could just error out, and it be up the user to chunk up their vector file to a size that will fit in their memory and combine the rasterized outputs later?

@dcherian
Copy link

dcherian commented Sep 11, 2024

A list of NumPy matrices with associated extents?

This could be OK but the structure of the output seems to most naturally fit a nD sparse array (e.g. https://sparse.pydata.org/en/stable/) i.e. we could add a "polygon" dimension - slices along this dimension contain fractional weights for that polygon. I bet we could construct this sparse arrays if you returned dense numpy arrays and associated extents from the C++ layer. Potentially it'd be better to return dense arrays that mapped to COO, CSR, or CSC representations of sparse arrays.

Then you could imagine chunking along the "polygon" dimension that maps to a chunked vector of geometries (e.g. dask-geopandas).

PS: your talk at FOSS4G-NA was quite nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants