Home
Genomics requires many computational tools, from assessing the quality of sequencing data, through assembly and annotation, to comparison and analysis. Bioinformatics tools are often hard to install (except on very specific computers), and version changes alter their inputs and outputs, making experiments difficult to reproduce.
To make matters worse, biologists often lack the computational training needed to set up the tens of tools available for each task, compare them, choose the best one, and conduct their experiments.
We need easy access to bioinformatics tools:
- on HPC, fat servers, small PCs
- set up to be reproducible and shareable
Docker is a new technology that makes reproducible setups possible. Docker builds an image "to the specification" written in a Dockerfile, and that image is then run in an isolated container. Dockerfiles and the resulting images can be stored indefinitely and shared or published over the Internet, making it possible for anybody to recreate the exact same setup at any point in the future.
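The build, run and share cycle described above maps onto a handful of standard Docker CLI commands. The sketch below assumes a Dockerfile in the current directory and uses a hypothetical image tag `genomics/example:1.0` and archive name; setting `DRY_RUN=1` prints each command instead of running it, so the sequence can be inspected without Docker installed.

```shell
#!/usr/bin/env bash
# Sketch of the Dockerfile -> image -> container -> archive cycle.
# IMAGE and ARCHIVE are hypothetical placeholders, not names from this project.
set -euo pipefail

IMAGE="genomics/example:1.0"
ARCHIVE="genomics-example-1.0.tar"

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build an image "to the specification" in ./Dockerfile.
build_image()   { run_or_echo docker build -t "$IMAGE" .; }

# Start a throwaway, isolated container from the image.
run_container() { run_or_echo docker run --rm -it "$IMAGE"; }

# Persist the image as a tarball that can be archived or shared...
save_image()    { run_or_echo docker save -o "$ARCHIVE" "$IMAGE"; }

# ...and restore the identical image later, on any Docker host.
load_image()    { run_or_echo docker load -i "$ARCHIVE"; }
```

For example, `DRY_RUN=1 build_image` prints `docker build -t genomics/example:1.0 .`. Saving the built image, rather than only the Dockerfile, is what guarantees an identical setup later, since rebuilding from a Dockerfile can pick up newer base images or packages.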
The aim of this project is to make complex genomics software, and even BioLinux, available with just one command:
```shell
# terminal 1
$ switch biolinux7
... do stuff in biolinux
# terminal 2
$ switch pacbio
... do stuff with pacbio
# terminal 3 - run a job
$ switch maker -genome Si_gnF.fa -cpus 24
```
To that end, we will create Dockerfiles and the resulting images for common genomics software and distribute them through a central repository.
On a standard Linux host (initially Ubuntu 14.04), BioLinux (or similar) should run under Docker in user space (a small amount of installation as root is acceptable; most things should happen in the home directory):
- the host OS home directory and other mounts should be mounted in the client (BioLinux) at the same paths as on the host.
- user and group IDs need to be respected (e.g. for accessing shared directories).
Once this works on Ubuntu 14.04, make it work under CentOS 7.
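The mounting and ID requirements above translate fairly directly into `docker run` options (`-v`, `--user`, `--group-add`, `-w`). Below is a minimal sketch of the `switch` helper from the terminal examples, written as a Bash function; its name and behaviour are assumptions rather than an existing tool, and setting `DRY_RUN=1` prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical `switch` helper (an assumption, not an existing tool): run a
# Docker image with the host's home directory, user/group IDs and current
# directory mirrored inside the container.
set -euo pipefail

switch() {
  local image="${1:?usage: switch IMAGE [COMMAND...]}"
  shift

  # Remove the container on exit; mount $HOME at the same path; run as the
  # host user and primary group; start in the host's current directory.
  local args=(run --rm
    -v "$HOME:$HOME" -e "HOME=$HOME"
    --user "$(id -u):$(id -g)"
    -w "$PWD")

  # Keep an interactive terminal only when one is available (not in batch jobs).
  if [ -t 0 ]; then
    args+=(-it)
  fi

  # Also mount the current directory when it lies outside $HOME.
  case "$PWD/" in
    "$HOME"/*) ;;
    *) args+=(-v "$PWD:$PWD") ;;
  esac

  # Carry over supplementary groups, e.g. for shared project directories.
  local gid
  for gid in $(id -G); do
    args+=(--group-add "$gid")
  done

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo docker "${args[@]}" "$image" "$@"
  else
    docker "${args[@]}" "$image" "$@"
  fi
}
```

Running the container as the host UID/GID means files written to the mounted home directory keep the right owner, although the container may show the user as nameless if that UID is not in its `/etc/passwd`. Only `$HOME` and the current directory are mounted here; other host mounts would need extra `-v` options.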
Instructions/scripts to make the above happen easily:
- setup should be one line, similar to installing oh-my-zsh
- switching to/from the OS should be one line (this should include `cd`-ing within the client to the host's current directory, `pwd`)
- instructions and scripts should be kept in the GitHub repo
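In the spirit of the oh-my-zsh installer, the one-line setup could boil down to a script that copies the helper into the home directory and registers it in the shell startup file exactly once, with no root access. A minimal sketch, assuming a hypothetical `switch.sh` from a clone of the repository and `~/.bashrc` as the startup file; the `~/.switch` location and the `install_switch` name are illustrative, not the project's actual layout.

```shell
#!/usr/bin/env bash
# Sketch of an idempotent installer in the spirit of oh-my-zsh's one-liner.
# Assumptions: a clone of the repository provides a hypothetical switch.sh
# defining the `switch` function; ~/.switch and ~/.bashrc are illustrative.
set -euo pipefail

install_switch() {
  local src="${1:?usage: install_switch PATH/TO/switch.sh [RC_FILE]}"
  local rc="${2:-$HOME/.bashrc}"
  local dest_dir="$HOME/.switch"
  local line="source \"$dest_dir/switch.sh\""

  # Everything lives under $HOME, so no root access is needed.
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/switch.sh"
  touch "$rc"

  # Add the source line only once, so re-running the installer is harmless.
  if ! grep -qxF "$line" "$rc"; then
    printf '%s\n' "$line" >> "$rc"
  fi
}

# Run directly as: bash install.sh path/to/switch.sh
if [ "$#" -gt 0 ]; then
  install_switch "$@"
fi
```

Setup is then a single command such as `bash install.sh ./switch.sh`, followed by opening a new shell; because the startup file is only appended to when the line is missing, re-running the installer after an update just refreshes the copied script.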
Note: have a look at Kitematic.
There will still be a lot of work, including:
- stress-testing specific apps (including a Galaxy setup on a MacBook host)
- roll-out on the Apocrita compute cluster under Scientific Linux 6
- usability under the SGE queue
- appropriate documentation, publication, and presentation to project partners
- a tutorial with example analysis pipelines
- testing on the RAL JASMIN cluster (coordinate via Tim Booth & Phil Kershaw)
- optimising the filesystem mounting protocol for huge, fragmented datasets
- documenting why PRoot is inappropriate
- ...