Extend path-like keys in the catalog #4309
Replies: 5 comments
-
Hi @PetitLepton, we haven't heard this request before, but we'd be happy to accept a contribution for it 🙂 |
Beta Was this translation helpful? Give feedback.
-
Coming back to this. The offending code is kedro/kedro/framework/context/context.py Lines 91 to 92 in e8f1bfd We spoke about this at length when debating #2965 and related "Kedro as a library" issues. The current hardcoded list is less than ideal, but this "magic" behavior is deeply ingrained in Kedro at the moment, and the only realistic way to move past it is to introduce variables in the filepaths so that the user is in control: whatever:
type: my_project.extras.datasets.templated_sql_dataset.TemplatedSQLQueryDataSet
filepath: ${kedro:project_root}/data/01_raw/template.sql I think the proposed solution, namely
Goes in the opposite direction of expanding this behavior. If you want to avoid mangling there's a workaround which is not using This probably deserves some more discussion. |
Beta Was this translation helpful? Give feedback.
-
Hi @PetitLepton, @astrojuanlu - I believe #3561 resolves this issue and gets rid of the hardcoding. We now use relative paths across the board, does the example above still have issues? |
Beta Was this translation helpful? Give feedback.
-
I think this issue is still there: kedro/kedro/framework/context/context.py Lines 93 to 94 in f54c6fb Recently I saw a user doing lots of
so it's a pattern that our users are putting in the wild now. Instead of "let's extend or make configurable the keys that receive a special treatment", I think we should find ways to support something like this: whatever:
type: my_project.extras.datasets.templated_sql_dataset.TemplatedSQLQueryDataSet
filepath: ${kedro:project_root}/data/01_raw/template.sql which would also solve @PetitLepton original request in #1934 (comment). This essentially goes back to option 3 in our original debate #2924 (comment), imitating what Intake does https://intake.readthedocs.io/en/latest/catalog.html#yaml-format, and @yetudada's question here #2819 (comment) which we addressed by adding more docs #3247. |
Beta Was this translation helpful? Give feedback.
-
Not much has changed since my last comment. I'm personally not happy about this bit of magic that continues to bite users but no solutions have been proposed. About this particular proposal of expanding the path-like keys, I'm turning this into a Discussion to signal it's an enhancement proposal. |
Beta Was this translation helpful? Give feedback.
-
Description
In the current implementation of the parsing of the context, the function
_convert_paths_to_absolute_posix
— which converts relative to absolute paths — restrict the conversion to three types of keys, namelyThis is very restrictive and can lead to unexpected issues while implementing custom data set as explained in the example below.
Context
Let's imagine that I want to extend the
SQLQueryDataSet
by using a Jinja template and feed it with parameters. I would have done something like the followingWhile this works when running a pipeline or using
kedro ipython
, it potentially breaks when used in a notebook undernotebooks
with a catalog entry likebecause the relative paths are not properly parsed when loading the context.
One can solve this particular example by replacing the two paths by
filepath
andpath
for example which belong to the list of parsed keys mentioned at the top of this issue. In terms of user experience, it is nevertheless pretty unintuitive. Another possibility is to have nested elements like"template": {"filepath": ...}
.Possible Implementation
Could we use something like a regex containing
path
instead of a rigid list?Thanks in advance!
Beta Was this translation helpful? Give feedback.
All reactions