Skip to content

zflamig/dask-era5

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Working with ERA5 using Dask and AWS Fargate

This example uses AWS CloudFormation to create an Amazon SageMaker Jupyter Notebook and AWS Fargate cluster for using Dask for distributed computation over large data volumes.

The Jupyter notebook shows an example of how to use Dask to load netcdf files directly from S3. The mean and standard deviation of the loaded data are then computed to demonstrate how Dask can be used to accelerate computations over large data volumes. Finally, time series are pulled from the loaded data to demonstrate how to select specific locations in a raster field.

Getting started

cloudformation-launch-stack

  1. Launch the stack, by default it will be in the us-east-1 region (since that's where the ERA5 data is) but you can change it to any region you prefer.
  2. On the Parameters page, enter your DaskWorkerGitToken which is a GitHub OAuth Token. See below for how to get one if you don't have it. You can leave all the other parameters alone for now.
  3. Hit next twice, agree that you know this will create IAM resources.
  4. Wait for the stack to create, and then navigate to the Outputs tab for the link to your Jupyter Notebook.

Github OAuth Token

The AWS services require a GitHub OAuth token to be able to build the Docker container image for the Dask worker & scheduler nodes. To generate the token go to https://github.com/settings/tokens. It is enough for the token to only have public_repo permissions.

Architecture

architecture

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published