Home
This page describes how to encapsulate DaCHS in Docker container(s).
DaCHS is composed of two running blocks: (1) the data access interface, consulting (2) a Postgres database. Not all data is stored in the SQL database; some of it resides in files within the DaCHS directory tree. Typically, those files (which, we can say, connect the two running blocks) are placed inside DaCHS's GAVO_ROOT/inputs.
A very relevant point is to have a way to persist data and keep datasets separated.
The file structure of DaCHS looks like:

```
/var/gavo
├── cache
├── etc
│   ├── defaultmeta.txt
│   ├── userconfig.rd
│   └── ...
├── inputs
│   ├── DATASET_1
│   │   ├── data
│   │   └── q.rd
│   └── DATASET_2
│       ├── data
│       └── q.rd
├── logs
├── state
├── tmp
└── web
    └── templates
        └── root.html
```
, where DATASET_1 and DATASET_2 are hypothetical datasets, each with a file q.rd describing its resources. Without loss of generality, many files have been omitted from this example tree while some others have been exposed; the reason is to call attention to the files carrying information of interest for persistence.
For instance, it would be nice to have DATASET_1 and DATASET_2 as "pluggable" containers/volumes. Also, site-dependent files like the ones in etc and web should compose the "main" container, but remain editable.
The (main) container encapsulates the server itself: the files and directories needed to run the software.
To keep the settings independent from the software installation (for maintenance purposes, for example), we would like the files in /var/gavo/etc (remember the file /etc/gavo.rc) and similar directories to be part of another Docker volume.
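As a sketch of that idea, the settings directory could live in a named volume. The volume name dachs_etc and the image name dachs-docker below are assumptions, not names defined by this project; RUN defaults to "echo", so the commands are only printed (set RUN= to really execute them).

```shell
# Hypothetical: keep site settings in their own named volume "dachs_etc",
# so they survive reinstallation of the software image "dachs-docker".
# RUN defaults to "echo" for a safe dry run.
RUN="${RUN:-echo}"

$RUN docker volume create dachs_etc
$RUN docker run -d --name dachs -v dachs_etc:/var/gavo/etc dachs-docker
```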
Whenever a dataset is added to dachs-docker, a gavo import command should be run. For example, mounting the DATASET_1 volume at /var/gavo/inputs/DATASET_1 should trigger the command:
$ gavo import /var/gavo/inputs/DATASET_1/q.rd
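Put together, the mount-then-import sequence might look like the sketch below. The volume name dataset_1, container name dachs, and image name dachs-docker are assumptions for illustration; DOCKER defaults to "echo docker" so the commands are only printed (set DOCKER=docker to actually run them).

```shell
# Hypothetical sketch: start the container with the dataset volume mounted
# under inputs/, then trigger the ingestion inside it.
# DOCKER defaults to "echo docker" for a safe dry run.
DOCKER="${DOCKER:-echo docker}"

$DOCKER run -d --name dachs -v dataset_1:/var/gavo/inputs/DATASET_1 dachs-docker
$DOCKER exec dachs gavo import /var/gavo/inputs/DATASET_1/q.rd
```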
Before getting into the Docker side, it is worth highlighting the steps and states of the system needed to have a dataset ingested and available through DaCHS.
First, we have to place the data and its descriptor (RD) in some directory, for instance DATASET_1/. To ingest the data, the gavo/DaCHS server has to be running, as well as postgresql. Then we can run gavo import DATASET_1/q.
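The steps above can be sketched as a short command sequence (paths and names are the hypothetical ones from the tree; RUN defaults to "echo", so nothing is executed unless you set RUN= on a box with DaCHS installed):

```shell
# Dry-run sketch of the ingestion steps.
RUN="${RUN:-echo}"

$RUN mkdir -p /var/gavo/inputs/DATASET_1/data      # 1. place data + q.rd here
$RUN gavo serve start                              # 2. DaCHS (and postgres) must be running
$RUN gavo import /var/gavo/inputs/DATASET_1/q.rd   # 3. ingest the dataset
```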
Picture the components:
```
+-------------+        +------------+
| gavo daemon | ------ | postgresql |
+-------------+        +------------+
       |
+...........+
| DATASET_1 | ----+---- gavo import
+...........+     |
                  |
          ===============
          | data access |
          |  interface  |
          ===============
```
It is important to keep this diagram in mind to understand not only the components but also the steps needed to make data available, because in Docker each container can (ideally) run only one process.
A first try at dockerizing DaCHS can be seen on Docker Hub. There you have the DaCHS and Postgres servers running all together, but with no data and no persistence. Let's call this version v0.1.
The next step is to plug in data volumes, to have data added from the outside world; take DATASET_1 and DATASET_2 as examples.
Attaching a volume to a container (as well as detaching it and keeping it for mounting again in the future, for persistence) is quite a simple procedure; just a few rules have to be followed to do it properly.
First of all, volumes can be attached to a container only at the moment the container is created; volumes cannot be mounted on already running containers.
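The attach/detach/re-attach cycle can be sketched as below. The names dataset_1, dachs_a, dachs_b, and dachs-docker are hypothetical; RUN defaults to "echo" so the commands are only printed.

```shell
# Dry-run sketch: a named volume outlives the containers it is mounted in.
RUN="${RUN:-echo}"

$RUN docker volume create dataset_1           # volume exists independently
$RUN docker run -d --name dachs_a -v dataset_1:/var/gavo/inputs/DATASET_1 dachs-docker
$RUN docker rm -f dachs_a                     # container removed, data kept
$RUN docker run -d --name dachs_b -v dataset_1:/var/gavo/inputs/DATASET_1 dachs-docker
```

Note that the volume is attached in the `docker run` call itself, respecting the rule above: it cannot be added to dachs_a or dachs_b after they have started.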