Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make include_nodata argument available for built-in operations #147

Open
dbaston opened this issue Aug 26, 2024 · 3 comments
Open

Make include_nodata argument available for built-in operations #147

dbaston opened this issue Aug 26, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@dbaston
Copy link
Member

dbaston commented Aug 26, 2024

Would make count(include_nodata=true) available as a more legible alternative to e.g. count(default_value=0). Not sure if it's worth it.

@dbaston dbaston added the enhancement New feature or request label Aug 26, 2024
@theroggy
Copy link
Contributor

Not sure what the effort is, so difficult to judge the cost/benefits, but at least it is quite a bit cleaner. In quite some cases it might even be impossible to use the work around.

Finding a values that is surely not being used next to the actual nodata value will at least be a fuss, and in some cases it will even be impossible without actually recoding the data, or using a fake value if that works?

E.g. data that has been encoded to a byte, often all values are "taken", so choosing a default_value different than the actual nodata value can be "difficult". Maybe it is possible to just pick any in value... but anyway it is a hassle.
Not sure why it is called default_value anyway instead of nodata or a variant.

@dbaston
Copy link
Member Author

dbaston commented Aug 26, 2024

Maybe it is possible to just pick any in value... but anyway it is a hassle.

I was thinking of "count", where it doesn't matter if the value is "taken." But I guess it's also useful for "unique" and "frac".

Not sure why it is called default_value anyway instead of nodata or a variant.

I would expect nodata to specify a value to be ignored, whereas this is specifying a value to use in place of nodata. I think of it like an SQL COALESCE(value, default_value). For example, for a population raster that uses NaN for ocean cells, you would want these to be considered as 0 for most stats. The problem would really best be solved with a GDAL VRT, but it's unfortunately a bit cumbersome to construct one for this scenario.

@theroggy
Copy link
Contributor

theroggy commented Aug 26, 2024

Not sure why it is called default_value anyway instead of nodata or a variant.

I would expect nodata to specify a value to be ignored, whereas this is specifying a value to use in place of nodata. I think of it like an SQL COALESCE(value, default_value). For example, for a population raster that uses NaN for ocean cells, you would want these to be considered as 0 for most stats. The problem would really best be solved with a GDAL VRT, but it's unfortunately a bit cumbersome to construct one for this scenario.

True... clear names (for everyone) are sometimes difficult to find. Once explained it does make sense... I also misunderstood how it worked, now I understand.

Adding a keyword will be clearer, but I suppose just documenting it properly with some examples should be ok as well? Keywords for many different use cases where you could use it but actually mapping to the same thing might make the API even less understandable in end, even though for this one case this is not the case yet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants