
Review validation and shapes in demography #342

Merged
21 commits merged into develop on Nov 6, 2024

Conversation

davidorme (Collaborator) commented Oct 25, 2024

Description

This PR:

  • Creates some new validation tools in demography.core:

    • _validate_demography_array_arguments which is intended to replace demography.t_model_functions._validate_t_model_args and demography.crown._validate_z_qz_args. These two overlap a lot. The function will check that:
      • trait arguments (the columns in demography data arrays) and 'size' arguments like height and stem diameter (the rows in demography data arrays) are congruent.
      • that additional "at size" arguments are then congruent with the trait and size data. An example use case here is allocating GPP values to existing allometries for stems at a given size - users need to provide congruently shaped GPP inputs.
    • _enforce_2D, which will be used to ensure a common return dimensionality from functions: various legal inputs can give rise to 0, 1 or 2D results from calculations, and it is easier to standardise the single return array than to handle every input combination (see the sketch after this list).
    • Adds tests for these tools
  • Updates validation and shapes in demography:

    • StemTraits is now validated when created directly, but Flora.get_stem_traits() bypasses validation because the inputs are guaranteed correct.
    • The functions in t_model_functions now use _validate_demography_array_arguments and gain new validation flag arguments. The StemAllometry and StemAllocation classes now also optionally validate the inputs once; since the inputs are then known to be congruent, the per-function validation is skipped.
    • The same validation has then been rolled out across the other modules: crown, community and canopy. The tests are updated to match and the docs issues are fixed.
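For orientation, here is a minimal sketch of what the two new helpers are intended to do, written from the description above; the real implementations in demography.core differ in detail, and the signatures and error messages shown here are assumptions rather than the actual code.

```python
import numpy as np


def _validate_demography_array_arguments(trait_args, size_args, at_size_args=None):
    """Sketch: check that trait (column) and size (row) arguments are congruent.

    Trait arguments provide one value per stem and define the column axis; size
    arguments and any optional "at size" arguments (such as GPP) must broadcast
    against that axis.
    """
    # All trait arguments should be one-dimensional and of equal length.
    trait_lengths = {np.asarray(val).size for val in trait_args.values()}
    if len(trait_lengths) > 1:
        raise ValueError(f"Trait arguments of unequal length: {list(trait_args)}")
    (n_stems,) = trait_lengths

    # Size and "at size" arguments must be broadcastable against the trait axis,
    # e.g. shapes (n_stems,), (n_sizes, 1) or (n_sizes, n_stems).
    for name, val in {**size_args, **(at_size_args or {})}.items():
        try:
            np.broadcast_shapes(np.asarray(val).shape, (1, n_stems))
        except ValueError as excep:
            raise ValueError(f"{name} is not congruent with the trait data") from excep


def _enforce_2D(array):
    """Sketch: promote 0D and 1D calculation results to a single 2D layout."""
    array = np.asarray(array)
    return array if array.ndim >= 2 else array.reshape(1, -1)
```

With checks like these, trait arrays of shape (3,) and a dbh column of shape (4, 1) pass validation and broadcast to a (4, 3) result without the user having to pre-broadcast anything.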

Dimensionality and validation

There's an issue with differing approaches to validation here.

  1. Most of the validation is about checking that the array shapes of inputs are congruent with each other. The _enforce_2D function is about ensuring that the output dimensions are consistent with the two-dimensional sizes-by-stems array model.
  2. The first drafts of this PR (and the functions it is replacing/overhauling) use check_input_shapes within the validation. This is a pyrealm.core function that was set up for use with the PModel parts of the code, and it insists that arrays are not simply broadcastable but are identically shaped or scalar.
  3. This actually leads to one of the features of the current broader pyrealm API that I dislike and would like to get rid of: users have to pre-broadcast inputs so that shapes match before anything will run. I suspect we could relax check_input_shapes and simply apply np.broadcast_shapes as a validation tool. The existing code should then handle broadcasting correctly and users would not need to pre-broadcast data. They would still need to know which axis is which, and allowing xarray inputs would make this easier.
  4. But if we use check_input_shapes here specifically, then the internals of demography will start insisting on identical shapes where values are actually sanely broadcastable. That ends up requiring all the initial demography inputs to be identically shaped, which is a significant usability issue and makes for uglier, more convoluted code.

So I'm going to try to get the demography code working with the less constraining 'broadcastable' checking; a short sketch of the difference follows below. I think that's something we should try to adopt more widely, but there is a difference in approach for now.
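To illustrate the contrast between points 2 and 3 (check_input_shapes itself lives in pyrealm.core; the snippet below just illustrates the two checking styles with plain numpy, using made-up example values):

```python
import numpy as np

h_max = np.array([20.0, 30.0, 40.0])           # one trait value per stem: (3,)
dbh = np.array([[0.1], [0.2], [0.5], [1.0]])   # sizes as a column vector: (4, 1)

# Strict checking in the check_input_shapes style: identical shapes or scalars
# only, so this pairing would be rejected and the user forced to pre-broadcast.
if h_max.shape != dbh.shape:
    print("strict check: rejected")

# Broadcastable checking: np.broadcast_shapes only raises ValueError when the
# shapes are genuinely incompatible, so the calculation can simply run.
print(np.broadcast_shapes(h_max.shape, dbh.shape))  # (4, 3)
print((h_max * dbh).shape)                           # (4, 3)
```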

Fixes #317

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@davidorme linked an issue Oct 25, 2024 that may be closed by this pull request
Comment on lines +87 to +91
+ _validate_demography_array_arguments(
+     trait_args={"h_max": h_max, "a_hd": a_hd}, size_args={"dbh": dbh}
+ )
+
- return h_max * (1 - np.exp(-a_hd * dbh / h_max))
+ return _enforce_2D(h_max * (1 - np.exp(-a_hd * dbh / h_max)))
davidorme (Collaborator, Author)
@j-emberton and @MarionBWeinzierl

I just wanted to pause here and get a sanity check before rolling this out further. Basically, there are two axes here: traits of a cohort (columns) and some measure of size for which something is being calculated. All of the functions accept arrays with varying dimensions, and the calculations can give rise to output arrays with 0, 1 or 2 dimensions. The calculations all work, but the response dimensionality is unpredictable. My plan here is to promote all outputs to (n_traits, n_sizes), so that the outputs are consistent.

Does that seem sane?
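For illustration only (a sketch with made-up parameter values, using the height calculation from the snippet above), the raw result dimensionality tracks the inputs, which is exactly the unpredictability the 2D promotion is meant to remove:

```python
import numpy as np

def height(h_max, a_hd, dbh):
    # The height calculation from the snippet above, without any promotion.
    return h_max * (1 - np.exp(-a_hd * dbh / h_max))

print(np.ndim(height(25.0, 116.0, 0.2)))                    # 0: all scalars
print(np.ndim(height(np.array([25.0, 30.0]), 116.0, 0.2)))  # 1: trait row only
print(np.ndim(height(np.array([25.0, 30.0]), 116.0,
                     np.array([[0.1], [0.2], [0.4]]))))     # 2: sizes and traits
```

Promoting every return value to two dimensions gives downstream code a single predictable layout to work against.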

Collaborator
Yes, I think having a consistent shape is reasonable, so that all downstream calcs can be simplified to assume this shape.

davidorme (Collaborator, Author)

Oh god. This actually opens a bit of a can of worms. I've added some discussion to the PR description.

codecov-commenter commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 99.00990% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.09%. Comparing base (1f315ba) to head (6e8423c).
Report is 442 commits behind head on develop.

Files with missing lines                   Patch %   Lines
pyrealm/demography/t_model_functions.py   97.82%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #342      +/-   ##
===========================================
+ Coverage    95.29%   96.09%   +0.80%     
===========================================
  Files           28       35       +7     
  Lines         1720     2766    +1046     
===========================================
+ Hits          1639     2658    +1019     
- Misses          81      108      +27     


@davidorme marked this pull request as ready for review October 31, 2024 21:40
MarionBWeinzierl (Collaborator) left a comment

Overall, it looks sensible to me. I agree that _enforce_2D is not ideal, but I cannot think of a more elegant way of doing this right now.

MarionBWeinzierl (Collaborator) left a comment

I think the error messages in two of the tests might not be complete, see above.

@davidorme merged commit 1b41b80 into develop Nov 6, 2024
23 checks passed
@davidorme deleted the 317-review-validation-and-shapes-in-demography branch November 6, 2024 10:33
Successfully merging this pull request may close these issues.

Review validation and return shapes from demography objects