DaCHS on Docker

These are some notes on encapsulating DaCHS in Docker container(s).

Rationale

DaCHS is composed of two running blocks: (1) the data access interface, which consults (2) a PostgreSQL database. Not all data is stored in the SQL database; part of it resides in files within the DaCHS directory tree. Typically, those files -- which, we can say, connect the two running blocks -- are placed inside DaCHS's GAVO_ROOT/inputs.

A very relevant point, then, is to have a way to persist those data and to keep different datasets separated from each other.

The file structure of DaCHS looks like this:

/var/gavo
├── cache
├── etc
│   ├── defaultmeta.txt
│   ├── userconfig.rd
│   └── ...
├── inputs
│   ├── DATASET_1
│   │   ├── data
│   │   └── q.rd
│   └── DATASET_2
│       ├── data
│       └── q.rd
├── logs
├── state
├── tmp
└── web
    └── templates
        └── root.html

Here DATASET_1 and DATASET_2 are hypothetical datasets, each described by a resource descriptor file named q.rd. Without loss of generality, many files have been omitted from this example tree and a few others have been deliberately exposed; the reason is to call attention to the files carrying information of interest for persistence.

For instance, it would be nice to have DATASET_1 and DATASET_2 as "pluggable" containers/volumes. Also, site-dependent files like the ones in etc and web should be part of the "main" container, but remain editable.
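As a rough illustration of the "pluggable datasets" idea (host paths and image name below are made up, not part of the actual dachs-docker setup), one host directory per dataset could be bind-mounted into the container's inputs tree:

# Hypothetical sketch: one host directory per dataset, mounted under inputs/.
$ docker run -d \
      -v /srv/datasets/DATASET_1:/var/gavo/inputs/DATASET_1 \
      -v /srv/datasets/DATASET_2:/var/gavo/inputs/DATASET_2 \
      dachs-image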

The server

The (main) container encapsulates the server itself, together with the files and directories needed to run the software.
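A minimal sketch of such an image could be a Dockerfile along these lines; the Debian package name, the apt repository location and the key handling are assumptions here, not a tested recipe:

# Sketch of the main DaCHS image (package and repository details assumed).
FROM debian:jessie

# Install DaCHS from the GAVO Debian repository; importing the archive key
# properly is preferable to --allow-unauthenticated.
RUN echo "deb http://vo.ari.uni-heidelberg.de/debian release main" \
        > /etc/apt/sources.list.d/gavo.list \
    && apt-get update \
    && apt-get install -y --allow-unauthenticated gavodachs-server \
    && rm -rf /var/lib/apt/lists/*

# Keep site configuration, datasets and templates outside the image.
VOLUME ["/var/gavo/etc", "/var/gavo/inputs", "/var/gavo/web"]

EXPOSE 8080
CMD ["gavo", "serve", "debug"]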

Detached config

To keep the settings independent from the software installation -- for maintenance purposes, for example -- we would like the files in /var/gavo/etc (remember the file /etc/gavo.rc) and similar directories to live in a separate Docker volume.
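A minimal way to do that, assuming a named volume and the hypothetical image name used above, would be:

# Hypothetical sketch: keep /var/gavo/etc in a named volume.
$ docker volume create --name dachs-etc
$ docker run -d -v dachs-etc:/var/gavo/etc dachs-image
# On first use the empty named volume is seeded with the image's /var/gavo/etc
# content, which can then be edited and survives container removal.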

Mobile datasets

Whenever a dataset is added to dachs-docker, a gavo import command should be run. For example, mounting the DATASET_1 volume at /var/gavo/inputs/DATASET_1 should trigger the command:

$ gavo import /var/gavo/inputs/DATASET_1/q.rd
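One way of wiring that up is an entrypoint script that scans inputs/ for resource descriptors at start-up. This is only a sketch of the idea, not the actual dachs-docker entrypoint:

#!/bin/bash
# Hypothetical entrypoint: import every mounted dataset that ships a q.rd,
# then keep the DaCHS server as the container's foreground process.
set -e
service postgresql start              # v0.1 layout: PostgreSQL in the same container
for rd in /var/gavo/inputs/*/q.rd; do
    [ -e "$rd" ] || continue          # no datasets mounted
    gavo import "$rd"
done
exec gavo serve debug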

About ingesting data

Before getting into the Docker side of things, it is worth highlighting the steps the system goes through, and the states it must be in, to have a dataset ingested and available through DaCHS.

First, we have to place the data and its resource descriptor (RD) in some directory under inputs -- for instance, DATASET_1/. To ingest the data, the gavo/DaCHS server has to be running, as well as PostgreSQL. Then we can run gavo import DATASET_1/q.
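In plain shell terms (no Docker yet, assuming the default GAVO_ROOT of /var/gavo), that sequence is roughly:

$ cp -r DATASET_1 /var/gavo/inputs/     # data plus its q.rd descriptor
$ service postgresql start              # the database must be up
$ gavo serve start                      # the DaCHS server must be running
$ gavo import DATASET_1/q               # ingest; the RD id is relative to inputs/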

Picture the components:

                   +-------------+          +------------+
                   | gavo daemon |   ----   | postgresql |
                   +-------------+          +------------+
                    |                        |
+...........+       |                        |
| DATASET_1 |  -----+----  gavo import  -----`
+...........+       |      ```````````
                    |
                    |
                    `===============
                     | data access |
                     |  interface  |
                     ===============

It is important to have this diagram in mind to understand not only the components but also the steps needed to make the data available, because in Docker each container can (ideally) run only one process.

Dockerizing

A first try at dockerizing DaCHS can be seen on Docker Hub. There you have the DaCHS and PostgreSQL servers running together in one container -- but no data, no persistence. Let's call this version v0.1.
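To try that image out, something like the following should do; the repository name used here, chbrandt/dachs, is a guess, so check Docker Hub for the actual one:

$ docker pull chbrandt/dachs            # image name is an assumption
$ docker run -d -p 8080:8080 chbrandt/dachs
# DaCHS listens on port 8080 by default; in v0.1 everything lives in this one container.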

The next step is to plug in data volumes, i.e. to have data added from the outside world -- take DATASET_1 and DATASET_2 as examples.

Volumes on

Attaching a volume to a container -- as well as detaching it and keeping it around for a future mount, for persistence -- is a quite simple procedure; just a few rules have to be followed to do it properly.

First of all, volumes can be attached to a container only at the moment the container is created; volumes cannot be mounted on an already running container.
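In other words, the -v mappings are given to docker create (or docker run) and are fixed for the container's lifetime; a sketch, reusing the hypothetical names from above:

# Volumes are declared at creation time only.
$ docker create --name dachs -v dachs-etc:/var/gavo/etc dachs-image
$ docker start dachs
# To change the volume set-up, the container has to be removed and re-created.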
