Database for OOI CamHD, where video files are indexed by scenes
- Launch a Google Compute Engine instance and SSH into it.
- Install requirements on the instance:
  libraries/tools:
  - ffmpeg
  - docker (for running PostgreSQL)
  Python modules:
  - boto
  - gcs_oauth2_boto_plugin
  - psycopg2
- Create the PostgreSQL database:
  # start the PostgreSQL server
  ./start_postgres.sh
  # create the ashdm database tables
  python create_db.py
This creates two tables: scenes and scene_bounds.
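Before moving on, it can help to confirm that the requirements listed above are present on the instance. The following is a minimal sketch, not part of this repository: the helper name `find_missing` is hypothetical, and it assumes it is run with Python 3.3 or later (the project's own scripts may target a different interpreter, in which case the module check should be run with that interpreter instead).

```python
import importlib.util
import shutil

# Names taken from the requirements list above.
REQUIRED_TOOLS = ["ffmpeg", "docker"]
REQUIRED_MODULES = ["boto", "gcs_oauth2_boto_plugin", "psycopg2"]


def find_missing(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) for the current machine.

    Tools are looked up on PATH; modules are looked up in the interpreter
    that runs this script, without importing them.
    """
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    missing_tools, missing_modules = find_missing()
    if missing_tools:
        print("Missing tools:", ", ".join(missing_tools))
    if missing_modules:
        print("Missing Python modules:", ", ".join(missing_modules))
    if not missing_tools and not missing_modules:
        print("All listed requirements were found.")
```

The check only reports whether each name can be found; it does not verify versions or that the docker daemon is running.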
These instructions assume the videos are already in a Google Storage bucket called escience_camhd.
To read from and write to Google Storage buckets, make sure you are authenticated. If you use the instance's default service account, and that service account has permission to access your bucket, your terminal session on the instance is already authenticated.
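One way to see what the default service account is allowed to do is to ask the Compute Engine metadata server for its access scopes. The sketch below is an illustration under assumptions rather than part of this repository: the function names are hypothetical, and it only inspects scopes. The bucket's own permissions still have to grant access to the service account, which this check does not cover.

```python
import urllib.error
import urllib.request

# Metadata server endpoint that lists the default service account's scopes.
METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes that allow writing to Cloud Storage.
WRITE_SCOPES = {
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/cloud-platform",
}


def fetch_default_scopes(timeout=2.0):
    """Return the list of scopes, or None when the metadata server is unreachable."""
    request = urllib.request.Request(
        METADATA_SCOPES_URL, headers={"Metadata-Flavor": "Google"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8").split()
    except (urllib.error.URLError, OSError):
        return None


def can_write_storage(scopes):
    """Return True if any scope in the list permits Cloud Storage writes."""
    return any(scope in WRITE_SCOPES for scope in scopes)


if __name__ == "__main__":
    scopes = fetch_default_scopes()
    if scopes is None:
        print("Metadata server not reachable; probably not on a GCE instance.")
    else:
        print("Storage write scope present:", can_write_storage(scopes))
```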
The following command processes* all the videos to find their scene bounds.
python index_videos.py \
--src-uri gs://escience_camhd/files/RS03ASHS/PN03B/06-CAMHDA301/2016/04/04 \
--find-scene-bounds
* Right now the processing is a stub that inserts hardcoded bounds for each video.
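To make the footnote concrete, a stub of this kind could look roughly like the sketch below. Everything in it is hypothetical: the function name, the record fields, and the bound values are placeholders chosen for illustration and are not taken from index_videos.py.

```python
# Placeholder bounds as (scene_id, start_seconds, end_seconds).
# These values are made up for illustration only.
HARDCODED_BOUNDS = [
    (0, 0.0, 30.0),
    (1, 30.0, 60.0),
    (2, 60.0, 90.0),
]


def stub_find_scene_bounds(video_uri):
    """Return the same placeholder bounds for any video, ignoring its content."""
    return [
        {"video": video_uri, "scene_id": scene_id, "start": start, "end": end}
        for scene_id, start, end in HARDCODED_BOUNDS
    ]


if __name__ == "__main__":
    for row in stub_find_scene_bounds("gs://example-bucket/video.mp4"):
        print(row)
```

A real implementation would replace the body of the function with actual scene detection while keeping the same return shape, so the code that stores the bounds would not need to change.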
The following command takes all the mp4 files in gs://escience_camhd/files/RS03ASHS/PN03B/06-CAMHDA301/2016/04/04 and indexes them into the scenes specified in the scene_bounds table.
python index_videos.py \
--src-uri gs://escience_camhd/files/RS03ASHS/PN03B/06-CAMHDA301/2016/04/04 \
--dst-uri gs://bdmyers_escience_camhd/files/RS03ASHS/PN03B/06-CAMHDA301/2016/04/04/scenes
The overall result is new rows in the scenes table and new video files in gs://bdmyers_escience_camhd/files/RS03ASHS/PN03B/06-CAMHDA301/2016/04/04/scenes.
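Since ffmpeg is among the requirements, one plausible way to cut a scene out of a video is to invoke it with a start offset and a duration. The sketch below only builds such a command; it is an assumption about how the cutting could be done, not a description of what index_videos.py does. The function name is hypothetical, and the paths are assumed to be local files that were copied from the bucket beforehand.

```python
def build_ffmpeg_cut_command(src_path, dst_path, start, end):
    """Build an ffmpeg command that copies the [start, end) range of a video.

    start and end are offsets in seconds. The streams are copied without
    re-encoding, so the actual cut points may snap to nearby keyframes.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-ss", str(start),        # seek to the scene start
        "-i", src_path,
        "-t", str(end - start),   # keep only the scene's duration
        "-c", "copy",             # copy streams instead of re-encoding
        dst_path,
    ]


if __name__ == "__main__":
    command = build_ffmpeg_cut_command("video.mp4", "video_scene0.mp4", 12.5, 30.0)
    print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. Dropping `-c copy` makes ffmpeg re-encode, which is slower but gives cuts that are not tied to keyframes.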
For now, you can think of a query as having two parts: a SQL query that returns the URLs of scenes, and code that does something with each scene file.
See an example query of the ashdm database in example_query.py.
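The two-part shape of a query described above can be sketched as a small helper that takes the SQL and a per-scene function. This is not a copy of example_query.py. The helper name, the table columns (`scene_id`, `url`), and the example URLs are hypothetical, and the demo uses an in-memory sqlite3 database as a stand-in for PostgreSQL so that it runs without a server. With psycopg2, the same helper would receive a connection from `psycopg2.connect(...)` instead.

```python
import sqlite3


def run_scene_query(conn, sql, handle_scene):
    """Run sql, which must return scene URLs in its first column,
    and apply handle_scene to each URL. Returns the list of results."""
    cursor = conn.cursor()
    cursor.execute(sql)
    results = [handle_scene(row[0]) for row in cursor.fetchall()]
    cursor.close()
    return results


def demo():
    # Stand-in database with a hypothetical, simplified scenes table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scenes (scene_id INTEGER, url TEXT)")
    conn.executemany(
        "INSERT INTO scenes VALUES (?, ?)",
        [
            (0, "gs://example-bucket/scenes/a_0.mp4"),
            (1, "gs://example-bucket/scenes/a_1.mp4"),
            (0, "gs://example-bucket/scenes/b_0.mp4"),
        ],
    )
    # Part 1: SQL that returns scene URLs.
    sql = "SELECT url FROM scenes WHERE scene_id = 0 ORDER BY url"
    # Part 2: code applied to each scene; here it just extracts the file name.
    names = run_scene_query(conn, sql, lambda url: url.rsplit("/", 1)[-1])
    conn.close()
    return names


if __name__ == "__main__":
    print(demo())
```

Keeping the SQL and the per-scene function separate means either part can change independently: the same handler can run over a different selection of scenes, and the same selection can feed a different handler.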
Run the postgresql shell with ./ashdm_psql.sh