
GeoCARET-ReEmission with Docker

About The Library

This repository provides instructions for running two software packages: GeoCARET and RE-Emission from their respective Docker images.

GeoCARET is an open-source tool for geospatial analysis of reservoirs and catchments. RE-Emission is an open-source tool for estimating, reporting and visualising reservoir emissions. The two tools can be used sequentially to estimate emissions from multiple reservoirs: GeoCARET processes geospatial datasets to generate input data for emission models, which RE-Emission then uses to calculate, report and visualise emissions.

Both tools are written in Python and can be installed as Python packages. To simplify and speed up the installation process, we also provide their Docker images (see here for GeoCARET and here for RE-Emission) which enable running both tools via command-line in isolated environments.

This repository outlines how to run both packages as Docker containers and provides the folder structure required to facilitate data exchange between your local file system and the containerized applications.

Pre-Requisites

Docker Desktop, Docker Engine & Docker Compose

Before running the software, you need a working Docker installation. If you are using Windows or macOS, you should install Docker Desktop, which provides an easy-to-use interface for managing Docker on desktop operating systems. Please visit Docker's official documentation and follow the appropriate instructions for installing Docker Desktop on your system. Linux users also have the option of installing Docker Engine and Docker Compose as an alternative.

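Docker can be set to start automatically at system startup. If you choose not to enable this option, you'll need to launch it manually before use. To check that Docker is running, open your console or terminal and type docker info, which reports an error if the Docker daemon cannot be reached. Note that docker -v only prints the client version and succeeds even when the daemon is stopped.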
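A script that drives the steps in this README may want to confirm that the Docker daemon is reachable before doing anything else. The sketch below shows one way to do this from Python. It relies on docker info, which exits with a non-zero status when the daemon cannot be reached. The function name docker_daemon_is_running and its injectable run parameter are illustrative and not part of either tool.

```python
import subprocess
from typing import Callable


def docker_daemon_is_running(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `docker info` succeeds, i.e. the daemon is reachable."""
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,  # we only care about the exit status
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The docker executable is missing or cannot be executed.
        return False
    return result.returncode == 0
```

Calling docker_daemon_is_running() with no arguments runs the real docker info command. The run parameter exists only so the check can be exercised without Docker installed.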

Earth Engine Account Setup

Additionally, to run GeoCARET, you will need to set up an account with Google Earth Engine / Google Cloud and request access to the private assets hosted in our project folder. Detailed instructions on how to do this are available in the Account Setup section of the GeoCARET documentation.


Fetching / Building Docker Images

It is recommended to fetch the Docker images directly from the GitHub Container Registry (GHCR) using the following commands:

docker pull ghcr.io/reservoir-research/geocaret:release
docker pull ghcr.io/tomjanus/reemission:release

Docker automatically names the pulled images after their URLs. After you have pulled both images, the docker images command should produce output similar to the following:

❯ docker images
REPOSITORY                            TAG       IMAGE ID       CREATED       SIZE
ghcr.io/reservoir-research/geocaret   release   xxxxxxxxxxxx   x weeks ago   x.xxGB
ghcr.io/tomjanus/reemission           release   xxxxxxxxxxxx   x weeks ago   x.xxGB

Image names have a [REPOSITORY]:[TAG] format. In other words, whenever you need to refer to the images, use ghcr.io/reservoir-research/geocaret:release and ghcr.io/tomjanus/reemission:release.
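To make the [REPOSITORY]:[TAG] convention concrete, here is a minimal Python sketch that splits an image name into its two parts. The helper split_image_reference is hypothetical. It treats a colon as the tag separator only when it follows the last slash, and it does not handle digest references (name@sha256:...).

```python
def split_image_reference(reference: str) -> tuple[str, str]:
    """Split 'repository:tag' into (repository, tag).

    A colon counts as the tag separator only if it appears after the last
    '/', so a registry port such as 'localhost:5000/app' is not mistaken
    for a tag.
    """
    last_segment_start = reference.rfind("/") + 1
    colon = reference.find(":", last_segment_start)
    if colon == -1:
        # Docker falls back to the 'latest' tag when none is given.
        return reference, "latest"
    return reference[:colon], reference[colon + 1:]


for name in (
    "ghcr.io/reservoir-research/geocaret:release",
    "ghcr.io/tomjanus/reemission:release",
):
    repository, tag = split_image_reference(name)
    print(f"{repository}  (tag: {tag})")
```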

Alternatively, you can build the images from source by downloading the source code for each package from their respective GitHub repositories. Instructions for building GeoCARET can be found here, and for RE-Emission here.

Folder structure

Running both applications requires a specific folder structure that maps directories on your local file system to corresponding locations inside the Docker containers. This mapping enables data exchange, e.g. reading inputs and saving outputs, and is defined in two separate Compose files, one for each application.

The folder structure includes a demo.csv input file for a GeoCARET demonstration run and a test_input.json input file for testing RE-Emission. You can also create your own input files for both GeoCARET and RE-Emission, placing them in the appropriate folders as shown in the file tree below.

geocaret-reemission (root)
├── geocaret
│   ├── auth
│   ├── compose.yml
│   ├── data
│   │   ├── demo.csv
│   │   └── README.md
│   └── outputs
└── reemission
    ├── compose.yaml
    ├── examples
    │   └── test_input.json
    └── outputs

The geocaret/data and geocaret/outputs directories are used to store the input and output data for GeoCARET calculations, respectively. The geocaret/auth folder holds authentication credentials for Earth Engine and Google Cloud. Similarly, reemission/examples and reemission/outputs store the input and output data for RE-Emission.
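If you assemble the workspace by hand rather than cloning the repository, a short script can confirm that the directories from the file tree exist. The following sketch uses only the Python standard library. The function name ensure_workspace is hypothetical. It creates missing directories but does not create the Compose files or the example inputs.

```python
from pathlib import Path

# Directories taken from the file tree above, relative to the repository root.
EXPECTED_DIRS = [
    "geocaret/auth",
    "geocaret/data",
    "geocaret/outputs",
    "reemission/examples",
    "reemission/outputs",
]


def ensure_workspace(root: Path) -> list[str]:
    """Create any missing directories and return the ones that were created."""
    created = []
    for relative in EXPECTED_DIRS:
        directory = root / relative
        if not directory.is_dir():
            directory.mkdir(parents=True, exist_ok=True)
            created.append(relative)
    return created
```

For example, ensure_workspace(Path("geocaret-reemission")) returns an empty list when the structure is already complete.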


Usage

Running GeoCARET

To check that everything is working correctly, first run the following command from inside the GeoCARET workspace folder geocaret:

docker compose run --rm geocaret

Note

You should see the following message, after which GeoCARET will exit: You must specify a command to run. See https://Reservoir-Research.github.io/geocaret/running_geocaret/running_docker.html for details

More information about running GeoCARET with Docker can be found in GeoCARET's documentation.

Running RE-Emission

To test if the RE-Emission Docker image is working, run RE-Emission using docker compose, similar to how you ran GeoCARET. Start from the reemission folder:

docker compose run --rm reemission

You should see a short usage guide beginning with:

Usage: reemission [OPTIONS] COMMAND [ARGS]...

This will be followed by additional details about the available options and commands for the RE-Emission command-line interface (CLI). More information on running RE-Emission with Docker can be found here.

Simple analysis

You can run a test analysis using the input data provided in ./examples/test_input.json. From within the reemission folder, enter the following command:

docker compose run --rm reemission reemission calculate examples/test_input.json -o outputs/test_output.json -o outputs/test_output.xlsx

The analysis should complete successfully and you should find two output files - test_output.json and test_output.xlsx - in the outputs folder.

Attention

The two instances of "reemission" in the command are intentional. The first is the name of the Docker Compose service whose container is started, while the second invokes the RE-Emission CLI inside that container.
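When the command is assembled from a script, keeping the two roles of "reemission" apart helps avoid dropping one of them by accident. The sketch below builds the same argument list in Python. The helper compose_run is hypothetical and only constructs the command. Pass the result to subprocess.run to execute it.

```python
import shlex


def compose_run(service: str, *command: str) -> list[str]:
    """Build the argument list for `docker compose run --rm <service> <command...>`."""
    return ["docker", "compose", "run", "--rm", service, *command]


args = compose_run(
    "reemission",  # Compose service name
    "reemission", "calculate", "examples/test_input.json",  # CLI call inside the container
    "-o", "outputs/test_output.json",
    "-o", "outputs/test_output.xlsx",
)
print(shlex.join(args))  # prints the same command line as shown above
```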

The output files include the input, intermediate, and final output variables in both .json and Excel formats.

Note

RE-Emission also supports generating output reports in PDF format. However, this feature requires a working LaTeX installation, which is not currently included in the RE-Emission Docker image. To generate a PDF report, provided that LaTeX is installed in your OS, add the -o [output-filename].pdf flag to the reemission calculate command. You can view an example PDF report in the RE-Emission documentation here.

ReEmission Demo

You can run a short demo that demonstrates how RE-Emission processes tabular input data, along with reservoir and catchment delineations from GeoCARET, to calculate reservoir emissions and visualize them on an interactive map. More details about the demo can be found on the corresponding page in RE-Emission's documentation. To run the demo from within the reemission folder, use the following command:

docker compose run --rm reemission reemission run-demo examples

where examples is the directory where all input and output files will be stored.

First, pre-calculated outputs from GeoCARET will be fetched from an online source. This includes tabular outputs with input data for the emission model and geospatial data with delineations (reservoir, river, and catchment) in .shp format—one for each analyzed dam.

Once the analysis is complete, you will find several files and folders in the examples directory, including the fetched input data. The structure of the examples folder should look like this:

examples
├── demo_interactive_map
├── geocaret_outputs
├── reemission_demo_dam_db
├── reemission_demo_delineations
├── reemission_outputs
└── test_input.json

The contents of each folder are explained below:

  • reemission_demo_delineations : Contains geospatial outputs from GeoCARET, including reservoir and catchment delineations, impounded river segments, and tabular outputs with emission input parameters in output_parameters.csv. Each folder contains outputs from a single simulation run. This folder is fetched from a remote source.
  • reemission_demo_dam_db : Contains a shapefile of hydroelectric dams, represented as points with associated data about dam and turbine properties. This folder is also fetched from a remote source.
  • geocaret_outputs : Contains merged shapefiles from reemission_demo_delineations, updated with emission estimates calculated by RE-Emission. This folder also includes the reemission_inputs.json file, created by merging and converting multiple output_parameters.csv files.
  • reemission_outputs : Contains the output data generated by RE-Emission in both .json and .xlsx formats.
  • demo_interactive_map : Contains an interactive map that visualizes the delineated reservoirs and their estimated emissions.
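After the demo finishes, you can check that all of the folders listed above were produced. The following is a minimal sketch using pathlib. The function name missing_demo_folders is hypothetical, and the check looks only at folder names, not at their contents.

```python
from pathlib import Path

# Folder names as listed in the examples directory tree above.
DEMO_FOLDERS = [
    "demo_interactive_map",
    "geocaret_outputs",
    "reemission_demo_dam_db",
    "reemission_demo_delineations",
    "reemission_outputs",
]


def missing_demo_folders(examples_dir: Path) -> list[str]:
    """Return the expected demo folders that are absent from examples_dir."""
    return [name for name in DEMO_FOLDERS if not (examples_dir / name).is_dir()]
```

Run from within the reemission folder, missing_demo_folders(Path("examples")) returns an empty list when the demo has produced every folder.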

An animated GIF demonstrating how to run the demo (albeit not from within the Docker container) is shown below. Running the demo using the Docker image should look the same, except for the initial command.

[Animated GIF: demo-22-05-24]

The interactive map can also be accessed at https://tomjanus.github.io/reemission_demo_map.


Running GeoCARET/ReEmission calculations

Here, we will demonstrate how to run GeoCARET and RE-Emission sequentially. We will begin with dam locations and dam heights/full supply levels, then use GeoCARET to delineate the respective reservoirs and catchments, and calculate reservoir- and catchment-level inputs for the emission model. Finally, we will calculate reservoir emissions using RE-Emission.

The demo illustrates the process for a batch of three reservoirs.

Note

To run the analysis, you need to be authenticated with GeoCARET, have a Google Cloud project folder set up, and have at least one assets directory within your Google Cloud project folder. For details on how to do this, please refer to the Account Setup section of the GeoCARET documentation. For this example, we will assume that the project folder is named geocaret-demo and that it contains an assets folder called emissions.

Attention

An internet connection needs to be present throughout the GeoCARET calculations. Short interruptions in connectivity are permissible.

Running Times

The majority of the processing time is spent in GeoCARET, where reservoirs and catchments are delineated and their respective characteristics computed. The computational time varies depending on the size of the reservoirs and catchments, as well as the availability of Google’s servers and their current workload. In our experience, it took 19 minutes to analyze three dams during this GeoCARET demonstration run, using an Earth Engine account with unpaid usage. Please note that this time does not include the optional export of delineations to Google Drive, which took an additional 2.5 minutes. The run times for the subsequent steps are measured in seconds.

GeoCARET-ReEmission Demo

Here, we will outline the steps necessary to calculate the reservoir emissions associated with the construction of three dams. The process involves computing the reservoir and catchment characteristics for each dam and utilizing this data to estimate greenhouse gas (GHG) emissions.

Step 1 - Run GeoCARET

Assuming your project is called geocaret-demo and you choose the standard simulation scenario (see Export Settings and Outputs in GeoCARET's documentation for options), you can run GeoCARET from within the geocaret folder using the following command:

docker compose run --rm geocaret python heet_cli.py data/demo.csv geocaret-demo demo standard

This run delineates the reservoirs and catchments for three dam locations and calculates the characteristics of each that are relevant for estimating reservoir GHG emissions with ReEmission.

Step 2 - Copy output_parameters.csv between geocaret and reemission folders

After the analysis completes, you should see a subfolder in the outputs folder named DEMO_[yyyymmdd]-[hhmm], where [yyyymmdd] is the date of the simulation (e.g. 20240924) and [hhmm] is the simulation start time (e.g. 1914). The file we are interested in is output_parameters.csv, which contains the input data for RE-Emission. You will need to copy this file to a subfolder of the examples folder within the reemission directory (examples/demo in the commands below), either manually or programmatically. In a Linux terminal, you can do the following (from within the geocaret folder):

mkdir ../reemission/examples/demo
cp outputs/DEMO_[yyyymmdd]-[hhmm]/output_parameters.csv  ../reemission/examples/demo/output_parameters.csv

The equivalent commands in Windows Command Prompt (CMD) are:

mkdir ..\reemission\examples\demo
copy outputs\DEMO_[yyyymmdd]-[hhmm]\output_parameters.csv ..\reemission\examples\demo\output_parameters.csv
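The copy can also be done programmatically in a way that works on both Linux and Windows. The sketch below picks the most recent DEMO_[yyyymmdd]-[hhmm] folder by parsing its name and copies output_parameters.csv into reemission/examples/demo. The function names are hypothetical, and the folder-name pattern is assumed to be exactly as described above.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path

# Matches run folders such as DEMO_20240924-1914.
RUN_FOLDER = re.compile(r"^DEMO_(\d{8})-(\d{4})$")


def latest_run_folder(outputs_dir: Path) -> Path:
    """Return the DEMO_* folder with the most recent start time in its name."""
    runs = []
    for entry in outputs_dir.iterdir():
        match = RUN_FOLDER.match(entry.name)
        if entry.is_dir() and match:
            started = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")
            runs.append((started, entry))
    if not runs:
        raise FileNotFoundError(f"No DEMO_[yyyymmdd]-[hhmm] folder in {outputs_dir}")
    return max(runs, key=lambda run: run[0])[1]


def copy_output_parameters(geocaret_dir: Path, reemission_dir: Path) -> Path:
    """Copy output_parameters.csv from the latest run into examples/demo."""
    source = latest_run_folder(geocaret_dir / "outputs") / "output_parameters.csv"
    target_dir = reemission_dir / "examples" / "demo"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / "output_parameters.csv"))
```

From the repository root, copy_output_parameters(Path("geocaret"), Path("reemission")) performs the same copy as the shell commands above.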

Step 3 - Convert output_parameters.csv to ReEmission input file in JSON format

The tabular output data from GeoCARET needs to be converted into the JSON file format required by RE-Emission. This involves using two conversion scripts in RE-Emission: the first script imputes any missing data that cannot be sourced from known geospatial datasets, and the second converts the tabular data into the appropriate JSON format.

Add missing columns (variables)

Three input variables for ReEmission cannot be derived from geospatial datasets and must be entered manually. These variables are the level of wastewater treatment in the catchment, the land-use intensity in the catchment and the reservoir's purpose/type, represented by c_treatment_factor, c_landuse_intensity and type, respectively. Here, we will apply default values for these parameters (across all reservoirs) using a script in ReEmission. You can adjust them later in output_parameters_upgraded.csv. To run the script, navigate to the reemission folder and execute the following command:

docker compose run --rm reemission reemission-geocaret process-tab-outputs \
  -i examples/demo/output_parameters.csv \
  -o examples/demo/output_parameters_upgraded.csv \
  -cv 'c_treatment_factor' 'primary (mechanical)' \
  -cv 'c_landuse_intensity' 'low intensity' \
  -cv 'type' 'unknown'

The allowed values for each of the three variables are listed below.

| c_landuse_intensity | c_treatment_factor | type |
|---|---|---|
| low intensity | no treatment | hydroelectric |
| high intensity | primary (mechanical) | multipurpose |
| | secondary biological treatment | potable |
| | tertiary | irrigation |
| | | flood control |
| | | unknown |

Convert CSV to JSON

We then convert the tabular data, updated with the missing input variables, into a JSON representation that conforms to RE-Emission's input data specification.

docker compose run --rm reemission reemission-geocaret tab-to-json \
  -i examples/demo/output_parameters_upgraded.csv \
  -o examples/demo/input_data_demo.json
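If you edit output_parameters_upgraded.csv by hand, a quick check against the allowed values listed earlier can catch typos before the conversion. The sketch below is one possible check using Python's csv module. It assumes the three columns appear in the CSV under exactly these names and that values are matched case-sensitively, neither of which is confirmed here. The function name invalid_cells is hypothetical.

```python
import csv

# Allowed values as listed in the table of allowed values above.
ALLOWED = {
    "c_landuse_intensity": {"low intensity", "high intensity"},
    "c_treatment_factor": {
        "no treatment",
        "primary (mechanical)",
        "secondary biological treatment",
        "tertiary",
    },
    "type": {
        "hydroelectric", "multipurpose", "potable",
        "irrigation", "flood control", "unknown",
    },
}


def invalid_cells(csv_file) -> list[tuple[int, str, str]]:
    """Return (row_number, column, value) for every value outside ALLOWED.

    Row numbers count data rows, starting at 1 after the header.
    A missing column is reported as an empty value.
    """
    problems = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append((row_number, column, value))
    return problems
```

A typical call would be: with open("examples/demo/output_parameters_upgraded.csv", newline="") as f: print(invalid_cells(f)). An empty list means all three columns contain only allowed values.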

Step 4 - Run ReEmission

Finally, we estimate the reservoir emissions with ReEmission.

docker compose run --rm reemission reemission calculate \
  examples/demo/input_data_demo.json \
  -o outputs/output_data_demo.json \
  -o outputs/output_data_demo.xlsx

The generated files, outputs/output_data_demo.json and outputs/output_data_demo.xlsx, contain the inputs, the calculated intermediate values and emissions. Both files contain equivalent numerical information. JSON files are intended for data transfer and processing via scripts, e.g. in Python or JavaScript, while XLSX files are suitable for quick examination and manipulation using Microsoft Excel or LibreOffice Calc.
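As a starting point for script-based processing, the sketch below loads the JSON output and reports its top-level structure. It makes no assumptions about RE-Emission's output schema, which is not described here. It only lists each top-level key together with the Python type of its value. The function name summarize_json is hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path: Path) -> dict[str, str]:
    """Map each top-level key of a JSON object to the type name of its value."""
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # The file holds a list or a scalar rather than an object.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

For instance, summarize_json(Path("outputs/output_data_demo.json")) gives a quick overview of what the file contains before you write code against specific fields.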

Exporting delineations to GDrive - OPTIONAL

We have configured GeoCARET to output only the tabular values. However, you can optionally export the other generated files, including reservoir and catchment delineations (polygons), dam locations (points), and impounded river sections (polylines), in shapefile format. These files will be exported to your Google Drive. To perform the export, use the following command, where DEMO_[yyyymmdd]-[hhmm] corresponds to the output folder generated during the GeoCARET run and geocaret-demo is the project folder. The --drive-folder flag sets the path on your Google Drive to which the results will be exported. All three arguments (--results-path, --drive-folder and --project) are required.

docker compose run --rm geocaret python heet_export_cli.py \
  --results-path projects/geocaret-demo/assets/emissions/XHEET/DEMO_[yyyymmdd]-[hhmm] \
  --drive-folder DEMO_[yyyymmdd]-[hhmm] \
  --project geocaret-demo

You can download the folder manually from your Google Drive, either via a web browser or using a Google Drive client.
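Because the results path repeats the project name, the assets folder and the run folder, it is easy to mistype. The sketch below assembles the export command from those three values. The path layout (projects/[project]/assets/[assets folder]/XHEET/[run folder]) is inferred from the example command above and may differ in other setups. The helper export_command is hypothetical and only builds the argument list.

```python
def export_command(project: str, assets_folder: str, run_folder: str) -> list[str]:
    """Build the export command for one GeoCARET run.

    The results path layout is inferred from the example in this README.
    """
    results_path = f"projects/{project}/assets/{assets_folder}/XHEET/{run_folder}"
    return [
        "docker", "compose", "run", "--rm", "geocaret",
        "python", "heet_export_cli.py",
        "--results-path", results_path,
        "--drive-folder", run_folder,  # same name is reused on Google Drive
        "--project", project,
    ]


# Values from this demo: project geocaret-demo, assets folder emissions.
command = export_command("geocaret-demo", "emissions", "DEMO_20240924-1914")
print(" ".join(command))
```

To execute the command from Python, pass the list to subprocess.run from within the geocaret folder.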

Known Issues

1. Docker in Rootless mode

You may encounter permission issues when writing to volumes (the created folder structure) if your Docker is configured to run in rootless mode, which is not the default option. While using Docker in rootless mode is recommended for critical server-side applications that require increased security, we discourage its use with our applications. However, if you choose to run Docker in rootless mode and encounter permission issues—such as being unable to write files into folders—you can refer to this article for potential fixes.


License

GPL-3.0

📬 Contact

Resources


Contributors ✨


Tomasz Janus

💻⚠️ 🐛🎨📖

James Sinnott

💻⚠️ 🐛🎨📖

This project follows the all-contributors specification. Contributions of any kind are welcome!
