This example uses AWS CloudFormation to create an Amazon SageMaker Jupyter notebook and an AWS Fargate cluster for running Dask for distributed computation over large data volumes.
The Jupyter notebook shows how to use Dask to load NetCDF files directly from Amazon S3. The mean and standard deviation of the loaded data are then computed to demonstrate how Dask can accelerate computations over large data volumes. Finally, time series are pulled from the loaded data to demonstrate how to select specific locations in a raster field.
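The core pattern in the notebook looks roughly like the sketch below: open a NetCDF file from S3 as a chunked (Dask-backed) xarray dataset, run reductions on the workers, and select a point time series. The scheduler address, bucket path, variable name (`air_temperature_at_2_metres`), and dimension names (`time0`, `lat`, `lon`) are assumptions based on the public ERA5 dataset layout, not code taken verbatim from this repo.

```python
# A minimal sketch of the notebook's workflow, not the exact repo code.
# Assumptions: the scheduler address comes from your stack, and the
# bucket/key and variable/dimension names follow the public ERA5
# dataset layout (s3://era5-pds/); adjust them to your data.
import s3fs
import xarray as xr
from dask.distributed import Client

# Connect to the Dask scheduler running on the Fargate cluster.
client = Client("tcp://dask-scheduler:8786")  # placeholder address

# Open one month of 2 m air temperature directly from S3.
fs = s3fs.S3FileSystem(anon=True)
ds = xr.open_dataset(
    fs.open("era5-pds/2008/01/data/air_temperature_at_2_metres.nc"),
    chunks={"time0": 100},  # chunking makes the variable a lazy Dask array
)
temp = ds["air_temperature_at_2_metres"]

# Reductions build task graphs; .compute() runs them on the workers.
mean = temp.mean(dim="time0").compute()
std = temp.std(dim="time0").compute()

# Pull a time series at the grid point nearest a location of interest
# (ERA5 longitudes run 0-360, so 255.0 is 105 degrees W).
series = temp.sel(lat=40.0, lon=255.0, method="nearest").compute()
```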
- Launch the stack. By default it launches in the `us-east-1` region (since that is where the ERA5 data lives), but you can change it to any region you prefer. A scripted alternative using boto3 is sketched after these steps.
- On the Parameters page, enter your `DaskWorkerGitToken`, which is a GitHub OAuth token. See below for how to get one if you don't have one. You can leave all the other parameters alone for now.
- Hit `Next` twice, and acknowledge that the stack will create IAM resources.
- Wait for the stack to finish creating, then navigate to the `Outputs` tab for the link to your Jupyter notebook.
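If you prefer scripting the launch over clicking through the console, a minimal boto3 sketch follows. The stack name and `TemplateURL` are hypothetical placeholders; `DaskWorkerGitToken` is the parameter described in the steps above.

```python
# A minimal sketch of launching the stack programmatically. The stack
# name and TemplateURL below are hypothetical placeholders;
# DaskWorkerGitToken is the parameter described in the steps above.
import boto3

cf = boto3.client("cloudformation", region_name="us-east-1")

cf.create_stack(
    StackName="dask-fargate",  # hypothetical name
    TemplateURL="https://example-bucket.s3.amazonaws.com/dask.template",  # placeholder
    Parameters=[
        {
            "ParameterKey": "DaskWorkerGitToken",
            "ParameterValue": "<your GitHub OAuth token>",
        },
    ],
    # Required acknowledgement because the template creates IAM resources.
    Capabilities=["CAPABILITY_IAM"],
)

# Block until creation finishes, then print the outputs, which include
# the link to the Jupyter notebook.
cf.get_waiter("stack_create_complete").wait(StackName="dask-fargate")
for output in cf.describe_stacks(StackName="dask-fargate")["Stacks"][0]["Outputs"]:
    print(output["OutputKey"], "=", output["OutputValue"])
```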
The AWS services require a GitHub OAuth token to build the Docker container image for the Dask worker and scheduler nodes. To generate the token, go to https://github.com/settings/tokens. It is enough for the token to have only the `public_repo` scope.