This package includes macros and scripts to be used within a dbt project to flatten json and array documents. It is designed to work with data from Fivetran's MongoDB connector "packed mode" described by this documentation.
To use this dbt package, you must have the following:
- At least one Fivetran MongoDB connector syncing data into your destination.
- A SQL Server destination.
By default, this package runs using your destination and the mongo
schema. If this is not where your MongoDB data is (for example, if your MongoDD schema is named mongo_fivetran
), add the following configuration to your root dbt_project.yml
file:
vars:
mongo_database: your_database_name
mongo_schema: your_schema_name
In order to use the macros included in this package, you will need to have a properly configured MongoDB source named mongo
in your own dbt project. An example of a properly configured mongo source yml can be found in the src_mongo.yml
file in integration_tests. This integration_tests folder is just for testing purposes - your source file will need to be in the models folder of your root dbt project. You are welcome to copy/paste the example source configuration into your src_mongo.yml
file and modify for your use case.
In particular, you will need the following sources defined in your src_mongo.yml
file:
version: 2
sources:
- name: mongo # It would be best to keep this named mongo
database: {{ var('mongo_database', target.database) }} # Configure the database where your MongoDB data resides if different from the target database. Eg. 'my_mongo_database'. By default the target.database is used.
schema: "{{ var('mongo_schema', 'mongo') }}" # Configure the mongo_schema var from your dbt_project.yml (alternatively you can hard-code the schema here if only using one MongoDB connector)
tables:
- name: my_first_table # Duplicate this section, configure the table name and keep the columns section as is
columns:
- name: id
description: System-generated unique ID of the active volume measurement.
- name: data
description: the packed data
- name: my_second_table
columns:
- name: id
description: System-generated unique ID of the active volume measurement.
- name: data
description: the packed data
## Any other source tables you are creating models for should be defined here as well. They aren't required, but it is best organizational practice and allows Fivetran to compile data lineage graphs
If you would like your model to flatten the entire document at once related to your source table then you will need to:
- Create a new file in your models folder and name it
your_table_name_here.sql
(e.g.customer.sql
; this is not necessary but recommended as best practice). - Add the below snippet calling the
generate_flatten_json
macro into the file. Update thesource_table
argument to be the source table name for which you are generating the model (e.g.customer
). Thoughmode
is explicitly included here, it isflatten_array
by default.
{{
generate_flatten_json(
model_name = source('mongo', 'source_table'),
json_column = 'data',
mode = 'flatten_array'
)
}}
If you would like your model to flatten the document but store arrays in singular fields then you will need to:
- Create a new file in your models folder and name it
your_table_name_here.sql
(e.g.customer.sql
; this is not necessary but recommended as best practice). - Add the below snippet calling the
generate_flatten_json
macro into the file. Update thesource_table
argument to be the source table name for which you are generating the model (e.g.customer
).
{{
generate_flatten_json(
model_name = source('mongo', 'source_table'),
json_column = 'data',
mode = 'do_not_flatten_array'
)
}}
If you would like your model to flatten arrays (generated by Option 2) then you will need to:
- Create a new file in your models folder and name it
your_array_table_name_here.sql
(e.g.customer_addresses.sql
; this is not necessary but recommended as best practice). - Add the below snippet calling the
generate_flatten_array
macro into the file.- Update the
source_model
argument to be the source model name for which you are flattening the array (e.g. model name from Step 2). - Update the
column_name
argument to be the column name containing the array to flatten(e.g.addresses
).
- Update the
{{
generate_flatten_array(
model_name = ref('source_model'),
json_column = 'column_name'
)
}}
Once you have created all your desired models and copied/modified the sql snippet into each model you will execute dbt run
to generate the models.
Expand to view details
Fivetran offers the ability for you to orchestrate your dbt project through Fivetran Transformations for dbt Core™. Learn how to set up your project for orchestration through Fivetran in our Transformations for dbt Core setup guides.
generate_flatten_json (source)
This macro flattens the packed document (json) generated by the MongoDB connector to the desired table.
Usage:
{{
generate_flatten_json(
model_name = source('mongo', 'source_table'),
json_column = 'data',
mode = 'flatten_array'
)
}}
Args:
source_table
(required): The table containing the data to flatten.json_column
(required): The table containing the packed data. Set to data when using "Packed/Standard mode"mode
(optional, default =flatten_array
): Defines whether arrays are flattened or stored in a single column. Supported values areflatten_array
anddo_not_flatten_array
generate_flatten_array (source)
This macro flattens the packed document (json) generated by the MongoDB connector to the desired table.
Usage:
{{
generate_flatten_array(
model_name = ref('source_model'),
json_column = 'column_name'
)
}}
Args:
source_model
(required): The model containing the array to flatten.column_name
(required): The name of the column containing the array
No
There is no maintenance of this package
Contributions are welcome
- Have questions or want to just say hi? Raise an issue