Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recipe modules with injectable kws should still be importable (use defaults?) #587

Closed
cisaacstern opened this issue Aug 30, 2023 · 7 comments · Fixed by #588
Closed

Recipe modules with injectable kws should still be importable (use defaults?) #587

cisaacstern opened this issue Aug 30, 2023 · 7 comments · Fixed by #588

Comments

@cisaacstern
Copy link
Member

We currently expect recipes to leave certain injectable kws empty, e.g. StoreToZarr.target_root, with the understanding that pangeo-forge-runner will inject values for these kws at deploy time. (Better documentation of this is forthcoming.)

StoreToZarr.target_root is required, meaning modules leaving it unassigned are not importable, raising

TypeError: __init__() missing 1 required positional argument: 'target_root'

This is suboptimal for at least two reasons:

  • Linters/type-checkers don't like it
  • We can't import this module to do other things with it outside the pangeo-forge-runner deployment context. An example (which motivated this opening issue for me) would be to import functions defined in the module for unit testing.

I see a few possible fixes:

  • ❌ Use kw_only dataclasses to allow the ZarrWriterMixin (from which StoreToZarr inherits target_root) to define an inheritable default for target_root. This is my favorite option but its not tractable because this feature is only available in Python 3.10 and not easily back-ported into 3.9.
  • ❌ Set a default target_root on ZarrWriterMixin and give all args on its child classes defaults. I don't like this because some of the child class args are required (and therefore should not have defaults).
  • ✅ Move target_root out of ZarrWriterMixin and just require that child classes declare this arg themselves. Possibly enforceable via a __post_init__ on ZarrWriterMixin? This means the only value of the mixin is to define the get_forge_target method, which still seems worthwhile to me.

In the meantime, the workaround for this would be to just give a placeholder value to StoreToZarr.target_root in the recipe, under the assumption that pangeo-forge-runner will still overwrite it. 🤔

@yuvipanda
Copy link
Contributor

I actually think the solution is to instead support python 3.10 and up only. dataflow used to require 3.9 but that's not been true for a while.

@yuvipanda
Copy link
Contributor

We already test 3.10 in this repo, and #567 tests on 3.11 (and the tests pass). And pangeo-forge/pangeo-forge-runner#90 enables us running on all python versions.

@yuvipanda
Copy link
Contributor

And this is a good motivating reason to drop 3.9 support :)

@cisaacstern
Copy link
Member Author

And this is a good motivating reason to drop 3.9 support :)

Is this something we should be motivated to do? If 3.9 works (and people are used to using it), why drop now just for this feature?

In the meantime, the workaround for this would be to just give a placeholder value to StoreToZarr.target_root in the recipe, under the assumption that pangeo-forge-runner will still overwrite it. 🤔

I discovered this is not a workaround. Setting something like:

recipe = (
    ...
    | StoreToZarr(
         ...
         target_root="placeholder",
    )

results in pangeo-forge-runner's rewriter throws SyntaxError: keyword argument repeated: target_root:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/bin/pangeo-forge-runner", line 8, in <module>
    sys.exit(main())
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_runner/cli.py", line 28, in main
    app.start()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_runner/cli.py", line 23, in start
    super().start()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/traitlets/config/application.py", line 464, in start
    return self.subapp.start()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_runner/commands/bake.py", line 201, in start
    recipes = feedstock.parse_recipes()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_runner/feedstock.py", line 81, in parse_recipes
    recipes[r["id"]] = self._import(r["object"])
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_runner/feedstock.py", line 66, in _import
    exec(compile(source=rewritten_ast, filename=filename, mode="exec"), ns)
  File "feedstock/recipe.py", line 85
    | StoreToZarr(
      ^
SyntaxError: keyword argument repeated: target_root

but this issue is avoided by using a default arg; e.g. #588 does work, as tested in pangeo-forge/aqua-modis-feedstock@2ac8ca7, which successfully deployed to Dataflow while replacing the default target_root.

This makes sense, because it's equivalent to:

def something(foo="bar"):
   pass

something(foo="baz")

which is fine. As opposed to:

def something(foo):
   pass

something(foo="bar", foo="baz")

which is not 😄 .

@yuvipanda
Copy link
Contributor

Is this something we should be motivated to do? If 3.9 works (and people are used to using it), why drop now just for this feature?

The way I think of it, 3.9 is actually holding us back. For example, we can not actually use the most recent version of the pangeo/forge image because it's no longer on 3.9, but on 3.10. And that would bring us newer version of beam, but we can't do that because it's tied to version of 3.10.

In all the systems I maintain, the only python 3.9 still present is because of the pangeo/forge pin. Nowhere else. This also actually bit @rsignell-usgs when we were trying to get flink running, as the default python installed in his environment was not actually 3.9. These are all the problems of the pangeo/forge pin ofc rather than supporting 3.9 - but mostly saying that I don't think anyone using this right now wants to use 3.9. Our pangeo/forge pin forces them to, and that's why they are.

So I think I would like to frame the question instead as:

Supporting Python 3.9 takes effort, so we should do it if there are people currently using it who are on python 3.9 but can not migrate for some reason.

@yuvipanda
Copy link
Contributor

Separately, that seems to be a bug in the rewriter - if the target argument is already present, I think the rewriter should 'do nothing'. Can you open an issue there?

@cisaacstern
Copy link
Member Author

Your context for the state of 3.9 use in production is much broader than mine, and this argument is convincing that we should move to drop that.

As noted in #588 (comment), #588 is perhaps less concise than it might be with 3.10, but it does fix for this specific issue of importability.

What would you say to use merging #588 to close this issue, and also opening a new issue to deprecate 3.9 support, which notes the discussion here insofar as deprecating 3.9 may motivate us to adjust the implementation detail of where #588 defines the default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants