Este lab esta diseñado para servir como metodo de aprendizaje de Kedro + Great expectations
Contiene 2 datasets
- Iris data
- Netflix Tittles
Inicializemos el ambiente con
great_expectations init
Luego debemos confirmar el tipo de dataset, path y nombre que queremos darle.
Confirmemos para hacer el profiling:
================================================================================
Would you like to profile new Expectations for a single data asset within your new Datasource? [Y/n]: y
Would you like to:
1. choose from a list of data assets in this datasource
2. enter the path of a data file
: 1
Which data would you like to use?
1. iris (file)
2. netflix_titles (file)
Don't see the name of the data asset in the list above? Just type it
: 1
Name the new Expectation Suite [iris.warning]:
Great Expectations will choose a couple of columns and generate expectations about them
to demonstrate some examples of assertions you can make about your data.
Great Expectations will store these expectations in a new Expectation Suite 'iris.warning' here:
file:///workspace/kedro_ge/great_expectations/expectations/iris/warning.json
Would you like to proceed? [Y/n]:
Generating example Expectation Suite...
Done generating example Expectation Suite
================================================================================
Would you like to build Data Docs? [Y/n]:
The following Data Docs sites will be built:
- local_site: file:///workspace/kedro_ge/great_expectations/uncommitted/data_docs/local_site/index.html
Would you like to proceed? [Y/n]:
Building Data Docs...
Done building Data Docs
Would you like to view your new Expectations in Data Docs? This will open a new browser window. [Y/n]:
para agregar un nuevo dataset basta que corramos:
great_expectations datasource new
y volveremos a pasar por el menu para agregar un dataset.
great_expetations datasource profile <my_dataset>
great_expectations suite edit
great_expectations suite new
great_expectations suite list
great_expectations suite delete
great_expectations docs build
great_expectations docs clean
great_expectations checkpoint new
great_expectations checkpoint list
great_expectations checkpoint run
great_expectations checkpoint script
great_expectations datasource list
great_expectations datasource profile
great_expectations datasource delete
great_expectations validation-operator run
great_expectations init
https://docs.greatexpectations.io/en/latest/reference/core_concepts.html#key-ideas
https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html
Existe un Kernel error cuando se edita la Suite desde
great_expectations suite edit <suite_name>
para remediar esto, es mejor abrir una sesion de Kedro con un parametro para abrir el puerto:
kedro jupyter notebook --NotebookApp.allow_origin=\'$(gp url 8888)\'