
Review validation and shapes in demography #342

Merged
21 commits merged into develop on Nov 6, 2024

Conversation

davidorme (Collaborator) commented Oct 25, 2024

Description

This PR:

  • Creates some new validation tools in demography.core:

    • _validate_demography_array_arguments which is intended to replace demography.t_model_functions._validate_t_model_args and demography.crown._validate_z_qz_args. These two overlap a lot. The function will check that:
      • trait arguments (the columns in demography data arrays) and 'size' arguments like height and stem diameter (the rows in demography data arrays) are congruent.
      • that additional "at size" arguments are then congruent with the trait and size data. An example use case here is allocating GPP values to existing allometries for stems at a given size - users need to provide congruently shaped GPP inputs.
    • _enforce_2D, which will be used to ensure a common return dimensionality from functions: various legal inputs can give rise to 0, 1 or 2D results from calculations, and it is easier to standardise the single return array than to handle every input combination (see the sketch after this list).
    • Adds tests for these tools
  • Updates validation and shapes in demography:

    • StemTraits is now validated when created directly, but Flora.get_stem_traits() bypasses validation because the inputs are guaranteed correct.
    • The functions in t_model_functions now use _validate_demography_array_arguments and gain new validation flag arguments. The StemAllometry and StemAllocation classes now also optionally validate the inputs once; since the inputs are then known to be congruent, the per-function validation is skipped.
    • The same validation has then been rolled out across the other modules: crown, community and canopy. The tests are updated to match and the docs issues are fixed.
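For orientation, here is a minimal sketch of what the two new helpers are intended to do, written from the description above; the real implementations in demography.core differ in detail, and the signatures and error messages shown here are assumptions rather than the actual code.

```python
import numpy as np


def _validate_demography_array_arguments(trait_args, size_args, at_size_args=None):
    """Sketch: check that trait (column) and size (row) arguments are congruent.

    Trait arguments provide one value per stem and define the column axis; size
    arguments and any optional "at size" arguments (such as GPP) must broadcast
    against that axis.
    """
    # All trait arguments should be one-dimensional and of equal length.
    trait_lengths = {np.asarray(val).size for val in trait_args.values()}
    if len(trait_lengths) > 1:
        raise ValueError(f"Trait arguments of unequal length: {list(trait_args)}")
    (n_stems,) = trait_lengths

    # Size and "at size" arguments must be broadcastable against the trait axis,
    # e.g. shapes (n_stems,), (n_sizes, 1) or (n_sizes, n_stems).
    for name, val in {**size_args, **(at_size_args or {})}.items():
        try:
            np.broadcast_shapes(np.asarray(val).shape, (1, n_stems))
        except ValueError as excep:
            raise ValueError(f"{name} is not congruent with the trait data") from excep


def _enforce_2D(array):
    """Sketch: promote 0D and 1D calculation results to a single 2D layout."""
    array = np.asarray(array)
    return array if array.ndim >= 2 else array.reshape(1, -1)
```

With checks like these, trait arrays of shape (3,) and a dbh column of shape (4, 1) pass validation and broadcast to a (4, 3) result without the user having to pre-broadcast anything.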

Dimensionality and validation

There's an issue with differing approaches to validation here.

  1. Most of the validation is about checking that the array shapes of inputs are congruent with each other. The _enforce_2D function is about ensuring that the output dimensions are consistent with the two-dimensional sizes-by-stems array model.
  2. The first drafts of this PR (and the functions it is replacing/overhauling) use check_input_shapes within the validation. This is a pyrealm.core function that was set up for use with the PModel parts of the code, and it insists that arrays are not simply broadcastable but are identically shaped or scalar.
  3. This actually leads to one of the features of the current broader pyrealm API that I dislike and would like to get rid of: users have to pre-broadcast inputs so that shapes match before anything will run. I suspect we could relax check_input_shapes and simply apply np.broadcast_shapes as a validation tool. The existing code should then handle broadcasting correctly and users would not need to pre-broadcast data. They would still need to know which axis is which, and allowing xarray inputs would make this easier.
  4. But if we use check_input_shapes here specifically, then the internals of demography will start insisting on identical shapes where values are actually sanely broadcastable. That ends up requiring all the initial demography inputs to be identically shaped, which is a significant usability issue and makes for uglier, more convoluted code.

So I'm going to try to get the demography code working with the less constraining 'broadcastable' checking; a short sketch of the difference follows below. I think that's something we should try to adopt more widely, but there is a difference in approach for now.
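To illustrate the contrast between points 2 and 3 (check_input_shapes itself lives in pyrealm.core; the snippet below just illustrates the two checking styles with plain numpy, using made-up example values):

```python
import numpy as np

h_max = np.array([20.0, 30.0, 40.0])           # one trait value per stem: (3,)
dbh = np.array([[0.1], [0.2], [0.5], [1.0]])   # sizes as a column vector: (4, 1)

# Strict checking in the check_input_shapes style: identical shapes or scalars
# only, so this pairing would be rejected and the user forced to pre-broadcast.
if h_max.shape != dbh.shape:
    print("strict check: rejected")

# Broadcastable checking: np.broadcast_shapes only raises ValueError when the
# shapes are genuinely incompatible, so the calculation can simply run.
print(np.broadcast_shapes(h_max.shape, dbh.shape))  # (4, 3)
print((h_max * dbh).shape)                           # (4, 3)
```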

Fixes #317

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@davidorme linked an issue Oct 25, 2024 that may be closed by this pull request
Comment on lines +87 to +91
+ _validate_demography_array_arguments(
+     trait_args={"h_max": h_max, "a_hd": a_hd}, size_args={"dbh": dbh}
+ )
+
- return h_max * (1 - np.exp(-a_hd * dbh / h_max))
+ return _enforce_2D(h_max * (1 - np.exp(-a_hd * dbh / h_max)))
davidorme (Collaborator, Author)
@j-emberton and @MarionBWeinzierl

I just wanted to pause here and get a sanity check before rolling this out further. Basically, there are two axes here: traits of a cohort (columns) and some measure of size for which something is being calculated. All of the functions accept arrays with varying dimensions, and the calculations can give rise to output arrays with 0, 1 or 2 dimensions. The calculations all work, but the response dimensionality is unpredictable. My plan here is to promote all outputs to (n_traits, n_sizes), so that the outputs are consistent.

Does that seem sane?
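For illustration only (a sketch with made-up parameter values, using the height calculation from the snippet above), the raw result dimensionality tracks the inputs, which is exactly the unpredictability the 2D promotion is meant to remove:

```python
import numpy as np

def height(h_max, a_hd, dbh):
    # The height calculation from the snippet above, without any promotion.
    return h_max * (1 - np.exp(-a_hd * dbh / h_max))

print(np.ndim(height(25.0, 116.0, 0.2)))                    # 0: all scalars
print(np.ndim(height(np.array([25.0, 30.0]), 116.0, 0.2)))  # 1: trait row only
print(np.ndim(height(np.array([25.0, 30.0]), 116.0,
                     np.array([[0.1], [0.2], [0.4]]))))     # 2: sizes and traits
```

Promoting every return value to two dimensions gives downstream code a single predictable layout to work against.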

Collaborator
Yes, I think having a consistent shape is reasonable, so that all downstream calcs can be simplified to assume this shape.

davidorme (Collaborator, Author)

Oh god. This actually opens a bit of a can of worms. I've added some discussion to the PR description.

codecov-commenter commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 99.00990% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.09%. Comparing base (1f315ba) to head (6e8423c).
Report is 442 commits behind head on develop.

Files with missing lines                   Patch %   Lines
pyrealm/demography/t_model_functions.py   97.82%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #342      +/-   ##
===========================================
+ Coverage    95.29%   96.09%   +0.80%     
===========================================
  Files           28       35       +7     
  Lines         1720     2766    +1046     
===========================================
+ Hits          1639     2658    +1019     
- Misses          81      108      +27     


@davidorme marked this pull request as ready for review October 31, 2024 21:40
MarionBWeinzierl (Collaborator) left a comment

Overall, it looks sensible to me. I agree that _enforce_2D is not ideal, but I cannot think of a more elegant way of doing this right now.

MarionBWeinzierl (Collaborator) left a comment

I think the error messages in two of the tests might not be complete, see above.

@davidorme merged commit 1b41b80 into develop Nov 6, 2024
23 checks passed
@davidorme deleted the 317-review-validation-and-shapes-in-demography branch November 6, 2024 10:33
Successfully merging this pull request may close these issues.

Review validation and return shapes from demography objects