Make include_nodata argument available for built-in operations #147

dbaston · 2024-08-26T12:51:01Z

Would make count(include_nodata=true) available as a more legible alternative to e.g. count(default_value=0). Not sure if it's worth it.

theroggy · 2024-08-26T14:18:23Z

Not sure what the effort is, so it is difficult to judge the costs and benefits, but it is at least quite a bit cleaner. In quite a few cases it might even be impossible to use the workaround.

Finding a value, other than the actual nodata value, that is certainly not used in the data will be a hassle at best. In some cases it will even be impossible without actually recoding the data, or using a fake value if that works?

E.g. for data that has been encoded as bytes, often all values are "taken", so choosing a default_value different from the actual nodata value can be "difficult". Maybe it is possible to just pick any value... but anyway it is a hassle.
Not sure why it is called default_value anyway instead of nodata or a variant.
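The concern about byte-encoded data can be illustrated with a small plain-Python sketch. It is only an illustration of the argument, not part of the library; the helper name `find_free_byte_value` is hypothetical.

```python
def find_free_byte_value(values, nodata):
    """Return a value in 0..255 that is neither used by the data nor equal
    to the nodata value, or None if every byte value is already taken."""
    used = set(values) | {nodata}
    for candidate in range(256):
        if candidate not in used:
            return candidate
    return None


# Hypothetical byte raster that uses every value except the nodata value 255:
# no spare value is left to serve as a distinct default_value.
full_range = list(range(255))
print(find_free_byte_value(full_range, nodata=255))  # None

# A sparse raster leaves plenty of room.
sparse = [0, 1, 2, 255]
print(find_free_byte_value(sparse, nodata=255))  # 3
```

When the search returns None, the only ways to get a distinct substitute value are to recode the data or to accept a collision with a real value.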

dbaston · 2024-08-26T14:54:32Z

> Maybe it is possible to just pick any value... but anyway it is a hassle.

I was thinking of "count", where it doesn't matter if the value is "taken." But I guess it's also useful for "unique" and "frac".
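A rough plain-Python sketch of why the substitute value is irrelevant for "count" but not for "unique". The functions here are simplified stand-ins written for this illustration (nodata is modelled as `None`, and "count" as a sum of coverage fractions); they are assumptions, not the library's implementation.

```python
NODATA = None  # stand-in for a masked / nodata cell in this sketch


def coalesce(values, default_value):
    """Replace nodata cells with default_value."""
    return [default_value if v is NODATA else v for v in values]


def count(values, coverage):
    """Sum of coverage fractions over cells that are not nodata."""
    return sum(c for v, c in zip(values, coverage) if v is not NODATA)


def unique(values):
    """Sorted distinct values, ignoring nodata cells."""
    return sorted({v for v in values if v is not NODATA})


values = [0, 3, NODATA, 3, NODATA]
coverage = [1.0, 0.5, 0.25, 1.0, 0.25]

print(count(values, coverage))                # 2.5 (nodata cells skipped)
print(count(coalesce(values, 0), coverage))   # 3.0
print(count(coalesce(values, 99), coverage))  # 3.0, same for any substitute
print(unique(coalesce(values, 0)))            # [0, 3], nodata merged into real 0
print(unique(coalesce(values, 99)))           # [0, 3, 99]
```

For "count" any substitute gives the same answer, while for "unique" (and similarly "frac") a substitute that is already "taken" becomes indistinguishable from real data.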

> Not sure why it is called default_value anyway instead of nodata or a variant.

I would expect nodata to specify a value to be ignored, whereas this is specifying a value to use in place of nodata. I think of it like an SQL COALESCE(value, default_value). For example, for a population raster that uses NaN for ocean cells, you would want these to be considered as 0 for most stats. The problem would really best be solved with a GDAL VRT, but it's unfortunately a bit cumbersome to construct one for this scenario.
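The distinction between "a value to ignore" and "a value to use in place of nodata" can be sketched in plain Python for the population example. The function names and the tiny data set are made up for illustration and do not reflect the library's API.

```python
import math


def mean_ignoring_nodata(values):
    """Treat NaN as nodata to be ignored: average over valid cells only."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


def mean_with_default(values, default_value):
    """COALESCE-style: substitute default_value for NaN, then average all cells."""
    filled = [default_value if math.isnan(v) else v for v in values]
    return sum(filled) / len(filled)


# Hypothetical population raster: two land cells, two ocean cells stored as NaN.
population = [120.0, math.nan, 80.0, math.nan]

print(mean_ignoring_nodata(population))    # 100.0 (ocean cells dropped)
print(mean_with_default(population, 0.0))  # 50.0 (ocean cells count as 0 people)
```

The total population is the same either way; the difference shows up in statistics such as the mean, where ocean cells treated as 0 still contribute to the denominator.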

theroggy · 2024-08-26T17:23:39Z

> Not sure why it is called default_value anyway instead of nodata or a variant.

> I would expect nodata to specify a value to be ignored, whereas this is specifying a value to use in place of nodata. I think of it like an SQL COALESCE(value, default_value). For example, for a population raster that uses NaN for ocean cells, you would want these to be considered as 0 for most stats. The problem would really best be solved with a GDAL VRT, but it's unfortunately a bit cumbersome to construct one for this scenario.

True... clear names (for everyone) are sometimes difficult to find. Once explained it does make sense... I also misunderstood how it worked, now I understand.

Adding a keyword would be clearer, but I suppose just documenting it properly with some examples should be OK as well? Adding separate keywords for many different use cases that actually map to the same thing might make the API less understandable in the end, even though that is not yet a problem for this one case.

dbaston added the enhancement New feature or request label Aug 26, 2024

theroggy mentioned this issue Aug 29, 2024

DOC: small improvements to docs #150

Merged
