Replies: 5 comments
-
Thank you for reporting this; we will have a look at it soon. This example looks artificial. Did you encounter this problem while using certain libraries?
-
Potentially related to #2161.
-
No, my use case is that I create a parameterized sampler with pretty complicated behavior, where each iteration depends on the state of the previous iteration, and I don't know in advance how many iterations it will generate, so it makes sense to implement it as an iterator. I have several downstream tasks that consume the sampler and generate analytics.
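For concreteness, such a sampler might be sketched like this. This is a hypothetical illustration, not the actual code; the name parameterized_sampler and the stopping condition are invented:

```python
import random
from collections.abc import Iterator


def parameterized_sampler(start: float, step_scale: float) -> Iterator[float]:
    # Each sample depends on the previous one, and the number of
    # iterations isn't known in advance, so an iterator is a natural fit.
    state = start
    while abs(state) < 10.0:  # stopping condition only discovered while iterating
        yield state
        state = state + random.gauss(0.0, step_scale)
```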
-
I think generator functions only work for passing to datasets, but can't be passed through to other nodes (via other datasets)? This is something I expected to work but ran into an issue with, in https://github.com/deepyaman/partitioned-dataset-demo/tree/dd2d05f14fac0d2ff7fb4a949e8aac062dc70431:

```
(kedro) deepyaman@deepyaman-mac new-kedro-project % kedro run
[05/31/24 13:48:45] INFO  Kedro project new-kedro-project  session.py:324
[05/31/24 13:48:46] INFO  Using synchronous mode for loading and saving data. Use the --async flag for potential performance gains.
                          https://docs.kedro.org/en/stable/nodes_and_pipelines/run_a_pipeline.html#load-and-save-asynchronously  sequential_runner.py:64
                    INFO  Loading data from params:n (MemoryDataset)...  data_catalog.py:483
                    INFO  Running node: generate_emails([params:n]) -> [emails]  node.py:361
                    INFO  Saving data to emails (PartitionedDataset)...  data_catalog.py:525
                    INFO  Completed 1 out of 4 tasks  sequential_runner.py:90
                    INFO  Loading data from emails (PartitionedDataset)...  data_catalog.py:483
                    INFO  Running node: capitalize_content([emails]) -> [capitalized_emails]  node.py:361
                    INFO  Saving data to capitalized_emails (PartitionedDataset)...  data_catalog.py:525
                    INFO  Completed 2 out of 4 tasks  sequential_runner.py:90
                    INFO  Loading data from capitalized_emails (PartitionedDataset)...  data_catalog.py:483
                    INFO  Running node: extract_content([capitalized_emails]) -> [contents]  node.py:361
                    INFO  Saving data to contents (MemoryDataset)...  data_catalog.py:525
                    INFO  Saving data to contents (MemoryDataset)...  data_catalog.py:525
                    INFO  Saving data to contents (MemoryDataset)...  data_catalog.py:525
                    INFO  Completed 3 out of 4 tasks  sequential_runner.py:90
                    INFO  Loading data from contents (MemoryDataset)...  data_catalog.py:483
                    INFO  Running node: tokenize([contents]) -> [tokens]  node.py:361
                    INFO  Saving data to tokens (PartitionedDataset)...  data_catalog.py:525
                    INFO  Completed 4 out of 4 tasks  sequential_runner.py:90
                    INFO  Pipeline execution completed successfully.  runner.py:119
```

Would expect
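Judging from the three consecutive saves to contents in the log, extract_content is the generator node here. A hedged reconstruction (not the repo's exact code) that would produce log output like the above:

```python
from collections.abc import Callable, Iterator
from typing import Any


def extract_content(capitalized_emails: dict[str, Callable[[], Any]]) -> Iterator[str]:
    # Generator function: Kedro iterates it and saves each yielded chunk
    # separately, hence the three "Saving data to contents" lines above.
    # With a MemoryDataset, each save overwrites the previous chunk, so
    # the downstream tokenize node only ever sees the last chunk.
    for partition_id, load_func in sorted(capitalized_emails.items()):
        yield load_func()
```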
-
Turning this into a discussion.
-
Description
When you define a node that returns an iterator, the Kedro pipeline iterates over (and exhausts) the iterator.
Context
I am aware that Kedro supports generator functions in nodes, which is good. However, Kedro shouldn't assume what the user wants to do with a generator: it should pass it through as-is by default, rather than implicitly guessing when the user wants the generator executed, and it should provide an explicit argument in pipeline.node for choosing whether to keep the generator as-is or execute it.
In general, people expect the pipeline to behave like a pure function by default; any other behavior should be explicit.
Currently, there is no way to specify that I want to keep the iterator as-is as an output; the iterator is always exhausted when the node runs.
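For illustration, such an explicit opt-out could look like the following. This is purely a hypothetical sketch of the proposed API; no iterate_output argument exists in Kedro today, and make_sampler is an invented name:

```python
from kedro.pipeline import node

# Hypothetical flag: keep the returned iterator as the node's output
# instead of exhausting it (iterate_output=True would restore the
# current chunk-by-chunk behavior). NOT a real Kedro argument.
sampler_node = node(
    make_sampler,
    inputs="params:sampler_config",
    outputs="sampler",
    iterate_output=False,
)
```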
Steps to Reproduce
in nodes.py:
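The original snippet was not preserved; below is a minimal sketch consistent with the Expected/Actual Results that follow. The function names make_iterator and consume are hypothetical:

```python
# nodes.py
from collections.abc import Iterator


def make_iterator() -> Iterator[int]:
    # Return (not yield) an iterator over 1..10; sum(1..10) == 55.
    return iter(range(1, 11))


def consume(numbers: Iterator[int]) -> int:
    # If the iterator arrives here already exhausted, sum() returns 0.
    return sum(numbers)
```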
in pipeline.py:
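Again a hypothetical reconstruction, wiring the two nodes together so that the iterator produced by the first node is consumed by the second:

```python
# pipeline.py
from kedro.pipeline import Pipeline, node

from .nodes import consume, make_iterator


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(make_iterator, inputs=None, outputs="numbers"),
            node(consume, inputs="numbers", outputs="res"),
        ]
    )
```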
Expected Result
res should be 55
Actual Result
res is 0
Kedro version used (pip show kedro or kedro -V): kedro, version 0.18.11
Python version used (python -V): Python 3.10.12