Remember that containers have volatile state: when a container is removed, data modified inside its (virtual) filesystem is deleted. Clearly, we need to work out a setup where data and settings remain safe -- persist -- across container shutdowns and upgrades.
Docker volumes are a central concept here: volumes are essentially mount points between the host and a container, or among containers. Through volumes we can
- share data between host and container,
- share data between containers (with a storage-dedicated container).
The best strategy depends very much on each data provider's workflow and on the amount of data. Here (together with the Workflow document) we will work through a couple of examples on the subject.
Generally, when we think about data to persist across DaCHS instances, we think of the contents of:

/var/gavo/inputs/*
: by default, the services' directories;

/var/gavo
: almost completely defines the site;

/etc/gavo.rc
: the site's metadata.
The site's metadata, though, is something rather stable -- actually, static -- in
every site. For that component, it may be reasonable to bake it into a custom
container inheriting from chbrandt/dachs:server,
as shown in the README's 'FROM dachs:server' section.
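As a sketch of that approach -- assuming a gavo.rc file sitting next to the Dockerfile, which is an assumption of this example -- such a derived image could look like:

```dockerfile
# Derive from the stock DaCHS server image and bake in the
# (static) site metadata; 'gavo.rc' is assumed to sit next
# to this Dockerfile.
FROM chbrandt/dachs:server
COPY gavo.rc /etc/gavo.rc
```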
Let's consider we have a set of services under our host's /dachs/sets:
$ tree /dachs/sets
/dachs/sets/
├── arihip
│   ├── data
│   │   └── data.txt.gz
│   └── q.rd
└── datasetx
    ├── data.csv
    └── q.rd
We can run our Dachs container/site as follows:
# Run the 'postgres' container and then..
#
(host)$ docker run -dt --name dachs -p 80:80 \
-v /dachs/sets/arihip:/var/gavo/inputs/arihip \
-v /dachs/sets/datasetx:/var/gavo/inputs/datasetx \
chbrandt/dachs:server
And then, from another terminal window, manage (i.e., import and publish) the services:
(host)$ docker exec -it dachs bash
(cont)$ gavo imp arihip/q && gavo pub arihip/q
(cont)$ gavo imp datasetx/q && gavo pub datasetx/q
(cont)$ gavo serve reload
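If you do this often, the exec-and-publish steps can be scripted from the host. Below is a minimal sketch; the publish_all helper and its --dry-run flag are inventions for illustration, while the container name and RD identifiers match the example above:

```shell
# publish_all CONTAINER [--dry-run] RD...
# Imports and publishes each resource descriptor (RD) inside a
# running DaCHS container, then reloads the server.
# With --dry-run, the docker commands are printed instead of executed.
publish_all() {
  container=$1; shift
  run=""
  if [ "$1" = "--dry-run" ]; then
    run="echo"
    shift
  fi
  for rd in "$@"; do
    $run docker exec "$container" sh -c "gavo imp $rd && gavo pub $rd" || return 1
  done
  $run docker exec "$container" gavo serve reload
}
```

Usage: `publish_all dachs arihip/q datasetx/q` (add `--dry-run` right after the container name to preview the commands first).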
Obviously, you can handle this process in whatever way best fits your workflow.
For example, I have a "utils" directory with scripts I use on a daily basis
for administrative tasks, among them DaCHS data/services management.
I usually bring my "utils" tools with me inside the container:
# Run the 'postgres' container and then..
#
(host)$ docker run -dt --name dachs -p 80:80 \
-v /dachs/sets/arihip:/var/gavo/inputs/arihip \
-v /dachs/sets/datasetx:/var/gavo/inputs/datasetx \
-v /dachs/utils:/usr/host/utils \
-v /dachs/etc/gavo.rc:/etc/gavo.rc:ro \
chbrandt/dachs:server
Notice that in this example I also mounted the site's metadata (/etc/gavo.rc), with an extra parameter, ":ro" -- read-only mode. By default, volumes are mounted in read-write mode, which means that files/directories can be modified either from inside the container or from the host. The ":ro" flag blocks modifications from inside the container.
Another way to persist data in a Docker setup is through volume containers: containers dedicated to serving as a storage hub, exporting volumes to other, running containers.
There are two ways to have a volume container:
- (traditional) build a container that exposes specific VOLUME paths;
- (recommended) create a docker-volume to store different paths.
In the traditional way of creating a volume container, a Dockerfile is
defined to export certain volumes.
For example, the following Dockerfile could be used to pool everything from
/var/gavo:
FROM debian
RUN mkdir -p /var/gavo
VOLUME /var/gavo
And if we built it with the following command line:
$ docker build -t mydachs:volume ./
We would then use it as:
(host)$ docker run -dt --name dachs_vargavo mydachs:volume
(host)$
(host)$ docker run -dt --name dachs -p 80:80 \
--volumes-from dachs_vargavo \
chbrandt/dachs:server
Everything you do inside /var/gavo (create, move, delete) will be saved in
dachs_vargavo.
Since you can commit the changes made to a container into a new image, you could
also version your dachs_vargavo container (image mydachs:volume) each
time a new service/resource comes in, for example.
Again, it is up to the data publisher to decide whether that is a reasonable workflow;
you will probably not do it if your services carry a lot of data.
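Such a snapshot can be taken with docker commit; the tag "volume-v2" below is just an example name:

```
(host)$ docker commit dachs_vargavo mydachs:volume-v2
```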
Nowadays Docker provides the volume interface, specific to non-running
containers and dedicated to data persistence.
The first thing to know about Docker volumes is that they are only deleted from
your host's filesystem when explicitly removed -- which is a nice, very safe
feature (though notice that if you play with volumes and forget to
clean up afterwards, data may start to accumulate under the hood).
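To keep an eye on such leftovers, Docker can list the existing volumes and remove the unused ones (docker volume prune removes every volume not referenced by any container, so use it with care):

```
(host)$ docker volume ls
(host)$ docker volume rm <volume-name>
(host)$ docker volume prune
```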
To create a docker volume is rather simple:
$ docker volume create dachs_store
And then, we can "mount" it at whichever path we want when starting the companion container. Note that a named volume holds a single directory tree, so it should be mounted at one path; a single file such as /etc/gavo.rc is better handled with a read-only bind mount, as shown earlier:
$ docker run -dt --name dachs -p 80:80 \
-v dachs_store:/var/gavo \
chbrandt/dachs:server
If the volume is empty at a given path (e.g., /var/gavo), Docker will first copy
the content of that path from the companion container (e.g., dachs) into it;
otherwise, it will just mount the volume at the corresponding location, exposing
its content.
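If you ever need to find out where a volume's data actually lives on the host, docker volume inspect reports it (look for the Mountpoint field):

```
(host)$ docker volume inspect dachs_store
```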