Osprey is a system that checks the images vendors produce in mass digitization projects run by the Collections Digitization program of the Digitization Program Office, OCIO, Smithsonian.
The system checks that the files pass a number of tests and displays the results in a web dashboard. This allows the vendor, the project manager, and the unit to monitor the progress and detect problems early.
This repo hosts the code for the dashboard, which presents the progress in each project and highlights any issues in the files.
The Osprey Worker runs on Linux and updates the dashboard via an API (see below). The Worker can be configured to run one or more of these checks:
- unique_file - Unique file name in the project
- raw_pair - There is a raw file paired in a subfolder (e.g. tifs and raws (.eip/.iiq) subfolders)
- jhove - The file is a valid image according to JHOVE
- tifpages - The tif files contain neither an embedded thumbnail nor more than one image per file
- magick - The file is a valid image according to ImageMagick
- tif_compression - The tif file is compressed using LZW to save disk space
Other file checks can be added. Documentation to be added.
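To illustrate what two of these checks verify, here is a minimal sketch of unique_file and raw_pair logic using only the standard library. It is not the Osprey Worker's code: the function names, the "tifs"/"raws" subfolder names, and the extension sets are assumptions drawn from the examples in the list above, and unique_file is assumed to compare full file names (including extension) across the project.

```python
from collections import defaultdict
from pathlib import Path

# Assumed extensions, based on the examples in the check list above.
TIF_EXTENSIONS = {".tif", ".tiff"}
RAW_EXTENSIONS = {".eip", ".iiq"}


def duplicate_names(paths):
    """unique_file sketch: map each file name seen more than once to its paths."""
    seen = defaultdict(list)
    for path in paths:
        seen[Path(path).name].append(path)
    return {name: found for name, found in seen.items() if len(found) > 1}


def missing_raw_pairs(folder, tif_dir="tifs", raw_dir="raws"):
    """raw_pair sketch: list tif files with no raw file sharing their stem.

    The subfolder names tif_dir and raw_dir are hypothetical defaults.
    """
    folder = Path(folder)
    raw_path = folder / raw_dir
    raw_stems = set()
    if raw_path.is_dir():
        raw_stems = {p.stem for p in raw_path.iterdir()
                     if p.suffix.lower() in RAW_EXTENSIONS}
    return sorted(p.name for p in (folder / tif_dir).iterdir()
                  if p.suffix.lower() in TIF_EXTENSIONS and p.stem not in raw_stems)
```

For a folder with tifs/a.tif, tifs/b.tif, and raws/a.iiq, missing_raw_pairs reports only b.tif. Matching on the file stem keeps the pairing independent of which raw format the vendor delivers.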
The app is written in Python using the Flask framework and requires a MySQL database. Install and populate the database according to the instructions in database/tables.sql.
To install the required environment and modules to the default location (/var/www/app):
mkdir /var/www/app
cd /var/www/app
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
Then, test the app by running the main file:
./app.py
or:
python3 app.py
which will start the service at http://localhost:5000/.
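If you want to confirm from a script that the test server came up, a small polling helper like the one below can be used. This is a sketch, not part of the repo: the function name wait_for_service and its retry timings are illustrative choices, and it only checks that the page at the given URL answers with a 2xx/3xx status.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://localhost:5000/", timeout=10.0):
    """Return True once url answers with a 2xx/3xx status, False otherwise.

    Retries every half second until timeout seconds have passed.
    """
    # Bypass any configured HTTP proxy; the service is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=2) as resp:
                return 200 <= resp.status < 400
        except urllib.error.HTTPError:
            # The server is up but the page returned an error status (e.g. 500).
            return False
        except OSError:
            # Connection refused or timed out: the app has not started yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.5)
```

Run it in a second terminal while ./app.py is running, for example with python3 -c "from check import wait_for_service; print(wait_for_service())" if you saved it as a hypothetical check.py.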
Update permissions:
deactivate
sudo chown -R apache:apache /var/www/app
Set up apache2/httpd as described in the web_server folder.
The application includes an API that requires a key, sent via POST in the api_key field:
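import requests
# Placeholders: replace with your dashboard URL, a project alias, and your API key
API_URL, PROJECT_ALIAS, KEY = 'https://osprey.example.org', 'project_alias', 'your_api_key'
payload = {'api_key': KEY}
r = requests.post('{}/api/projects/{}'.format(API_URL, PROJECT_ALIAS), data=payload)
print(r.json())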
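When calling several routes, it can help to wrap the api_key handling in one place. The class below is a hypothetical helper, not part of Osprey: it uses only the Python standard library instead of requests, sends api_key as form data via POST as described above, and assumes every route returns JSON (stated for /api/ in the route list, assumed for the others).

```python
import json
import urllib.parse
import urllib.request


class OspreyClient:
    """Hypothetical helper around the API: POSTs api_key and decodes JSON."""

    def __init__(self, api_url, api_key):
        self.api_url = api_url.rstrip("/")  # e.g. "https://osprey.example.org"
        self.api_key = api_key

    def url(self, route):
        """Join the base URL and a route such as "/api/projects/"."""
        return self.api_url + "/" + route.lstrip("/")

    def request(self, route):
        """Build a POST request with api_key as form-encoded data."""
        body = urllib.parse.urlencode({"api_key": self.api_key}).encode("ascii")
        return urllib.request.Request(self.url(route), data=body, method="POST")

    def get(self, route):
        """Send the request and parse the response body as JSON (assumed format)."""
        with urllib.request.urlopen(self.request(route), timeout=30) as resp:
            return json.load(resp)
```

For example, OspreyClient(API_URL, KEY).get("/api/projects/") would return the project list. Pass routes exactly as listed, including trailing slashes, since a POST that gets redirected to a different URL may not be followed.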
These routes are available:
/api/
: Print the available routes in JSON
/api/projects/
: Get the list of projects in the system
/api/projects/<project_alias>
: Get the details of a project by specifying its project_alias. The response includes:
- project_alias - String alias of the project
- project_id - ID of the project (integer)
- folders - Folders in this project
- project_unit - SI Unit
- project_type - Production or Pilot
- project_status - Status of the project (e.g. ongoing, paused, completed)
- project_area - Discipline area of the project
- project_description - Description of the project, its goals, and the collection digitized
- project_checks - Checks that run for all files in the project
- project_postprocessing - Post-project steps tracked in the system
- project_manager - Project manager (PM) of the project
- project_method - Method used for digitization
- project_start - Date when the project started digitization
- project_end - Date when the digitization ended
- project_stats - Main stats of the project
- reports - Data reports in this project
/api/folders/<folder_id>
: Get the details of a folder and the list of its files. The response includes:
- folder - Name of the folder
- folder_id - ID of this folder (integer)
- folder_date - Date when the folder was created by the vendor
- no_files - Number of files in the folder
- project_id - ID of the project (integer)
- project_alias - String alias of the project
- delivered_to_dams - Status of the folder regarding delivery to the DAMS
- qc_status - QC status of the folder
- files - Files in this folder, including the file_id of each
/api/files/<file_id>
: Get the details of a file by its file_id. The response includes:
- file_id - ID of the file in the system (integer)
- file_name - Filename
- dams_uan - DAMS UAN
- exif - EXIF metadata
- file_checks - Checks run on the file and their results
- file_postprocessing - Post-processing steps tracked for the file
- folder_id - ID of the folder containing the file
- links - Links to other systems related to this image
- md5_hashes - MD5 hashes of the files related to this image, usually a TIF and a RAW
- preview_image - If not null, a link to an external rendering of the image
/api/reports/<report_id>/
: Get the data from a project report
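The project, folder, and file routes can be chained to walk every file in a project. The sketch below is illustrative only: iter_project_files is a hypothetical name, and it assumes that "folders" is a list of objects carrying folder_id and that "files" is a list of objects carrying file_id; adjust it to the actual response shape. The caller supplies fetch, a function that POSTs api_key to a route and returns the decoded JSON.

```python
from typing import Callable, Iterator, Tuple

# fetch(route) -> decoded JSON dict; supplied by the caller.
Fetch = Callable[[str], dict]


def iter_project_files(fetch: Fetch, project_alias: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (folder_id, file_id, file_name) for every file in a project.

    Assumes "folders" is a list of objects with folder_id and "files" is a
    list of objects with file_id; adjust to the actual response shape.
    """
    project = fetch(f"/api/projects/{project_alias}")
    for folder in project.get("folders", []):
        details = fetch(f"/api/folders/{folder['folder_id']}")
        for entry in details.get("files", []):
            file_info = fetch(f"/api/files/{entry['file_id']}")
            yield details["folder_id"], file_info["file_id"], file_info["file_name"]
```

With the requests call shown earlier, fetch could be lambda route: requests.post(API_URL + route, data={'api_key': KEY}).json(). This issues one request per folder and one per file, so walking a large project can take a while.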
The system has two related repos:
- Osprey Worker - Python tool that runs a series of checks on folders. Results are sent to the dashboard via an HTTP API to be saved to the database.
- Osprey Misc - Database and scripts.
Available under the Apache License 2.0. Consult the LICENSE file for details.