[Pipelines] Design - Pipelines language should be accessible to new users #1004

Open
Jason94 opened this issue Feb 27, 2024 · 0 comments
Labels
Pipelines Project This issue is for the Pipelines project

Comments

Jason94 (Collaborator) commented Feb 27, 2024

The Pipelines project targets two user groups. One of them is people who are new to Python: they may be familiar with basic SQL but don't know programming. We'll call them new users. It's critical to the mission of Pipelines that it is accessible to new users.

New users should be able to easily:

  • Use pipelines to perform ETL tasks
  • Take advantage of advanced pipelines features, such as data orchestration (for example, the Prefect integration), "out of the box"

New users do not need to be able to perform more advanced tasks like:

  • Writing new pipes
  • Extending the pipelines framework with new functionality, like adding data orchestration plugins

To focus the conversation, below is a copy of the current example script in the pipelines branch (as of 2/27/2024). I believe it captures the surface area of functionality new users will be expected to engage with.

    clean_year = CompoundPipe(
        filter_rows("{Year} is not None"),
        convert("Year", int)
    )

    load_after_1975 = Pipeline(
        "Load after 1975",
        load_from_csv("deniro.csv"),
        clean_year(),
        filter_rows("{Year} > 1975"),
        write_csv("after_1975.csv")
    )

    split_on_1980 = Pipeline(
        "Split on 1980",
        load_from_csv("deniro.csv"),
        clean_year(),
        split_data("'gte_1980' if {Year} >= 1980 else 'lt_1980'"),
        for_streams({
            "lt_1980": write_csv("before_1980.csv"),
            "gte_1980": write_csv("after_1979.csv")
        })
    )

    save_lotr_books = Pipeline(
        "Save LOTR Books",
        load_lotr_books_from_api(),
        write_csv("lotr_books.csv")
    )

    after_1990_and_all_time = Pipeline(
        "Copy into streams test",
        load_from_csv("deniro.csv"),
        clean_year(),
        copy_data_into_streams("0", "1"),
        for_streams({
            "0": CompoundPipe(
                filter_rows("{Year} > 1990"),
                write_csv("after_1990.csv")
            )(),
            "1": write_csv("all_years.csv")
        })
    )

    dashboard = Dashboard(
        load_after_1975,
        split_on_1980,
        save_lotr_books,
        after_1990_and_all_time,
    )
    dashboard.run()
Jason94 added the Pipelines Project label Feb 27, 2024
Jason94 changed the title from "[Pipelines] Design Conversation" to "[Pipelines] Design - Pipelines language should be accessible to new users" Feb 27, 2024