Data Loch scripts interact with Instructure's Canvas Data service, which is refreshed daily.
We store Canvas data in Amazon S3 for future processing. Steps:
- Download Canvas data
- Upload the data to Amazon S3 (buckets are organized according to the DataLake design)
- Using Redshift Spectrum, create schemas/tables in the AWS Glue data catalog
- Run queries against Redshift tables to prepare analytics data for the ASC project
Amazon S3 is our storage layer, similar to a file system. Compute layers access the data directly, querying it and recognizing its schema. This eliminates the need for a long-running compute cluster with its own storage. The external schemas we create can be accessed by multiple compute layers without duplicating the data.
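As a rough illustration of how an external schema is wired up, the sketch below (assuming Node.js with the `pg` client, plus hypothetical cluster, database, IAM role, and table names) creates a Redshift Spectrum external schema backed by the AWS Glue data catalog and runs a query against it. The names in our actual deployment will differ.

```js
// Sketch only: endpoint, role, and table names are hypothetical; assumes the 'pg' package.
const { Client } = require('pg');

const createAndQueryExternalSchema = async () => {
  const client = new Client({
    host: 'redshift-cluster.example.us-west-2.redshift.amazonaws.com', // hypothetical endpoint
    port: 5439,
    database: 'dataloch',
    user: 'dataloch_user',
    password: process.env.REDSHIFT_PASSWORD
  });
  await client.connect();

  // Register an external schema that points at the Glue data catalog.
  // Redshift Spectrum reads table definitions from Glue and the data from S3,
  // so no data is copied into the cluster.
  await client.query(`
    CREATE EXTERNAL SCHEMA IF NOT EXISTS canvas_data
    FROM DATA CATALOG
    DATABASE 'canvas_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/dataloch-spectrum' -- hypothetical role
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
  `);

  // Query the external tables like any other Redshift tables.
  const { rows } = await client.query(
    'SELECT COUNT(*) FROM canvas_data.enrollment_dim;' // hypothetical table name
  );
  console.log('enrollment_dim rows:', rows[0].count);

  await client.end();
};

createAndQueryExternalSchema().catch(console.error);
```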
```
# .nvmrc file has preferred Node version
nvm use
npm install
```
Create the Redshift schema, download files from the Canvas Data API, and populate the database:
```
node app
```
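After the dumps are downloaded, each file is written to S3 under the DataLake bucket layout. A minimal sketch of that upload step, assuming the AWS SDK for JavaScript v3 (`@aws-sdk/client-s3`) and hypothetical bucket/key names; the app's actual code may use a different SDK version and layout:

```js
// Sketch only: bucket name, key layout, and region are hypothetical; assumes @aws-sdk/client-s3.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const fs = require('fs');

const s3 = new S3Client({ region: 'us-west-2' }); // hypothetical region

// Upload one downloaded dump file to the DataLake bucket.
// Reads the whole file into memory, which is acceptable for gzipped dump parts.
const uploadDumpFile = async (localPath, dumpId, tableName, fileName) => {
  await s3.send(new PutObjectCommand({
    Bucket: 'data-loch-canvas-data',                  // hypothetical bucket
    Key: `canvas/${dumpId}/${tableName}/${fileName}`, // hypothetical key layout
    Body: fs.readFileSync(localPath)
  }));
  console.log(`Uploaded ${fileName} to S3`);
};

uploadDumpFile('/tmp/requests-00000.gz', 'dump-001', 'requests', 'requests-00000.gz')
  .catch(console.error);
```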
Refresh the DataLake views. These high-performance materialized views give applications access to analytics data.
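How the refresh happens depends on the warehouse setup; as a rough sketch, assuming Node.js with the `pg` client and a hypothetical view name, a refresh script might issue a statement like the one below (clusters without materialized view support would instead rebuild the backing tables):

```js
// Sketch only: the connection string, schema, and view name are assumptions.
const { Client } = require('pg');

const refreshViews = async () => {
  const client = new Client({ connectionString: process.env.REDSHIFT_CONNECTION_STRING });
  await client.connect();

  // Hypothetical materialized view built on top of the external Canvas Data tables.
  await client.query('REFRESH MATERIALIZED VIEW analytics.student_activities;');

  await client.end();
};

refreshViews().catch(console.error);
```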
- Configure the cron-like job and the script to run. The `dataLake.cron.tasks` configs must have a valid cron pattern and the path to the cron script to run on trigger:
"cron": {
"enabled": true,
"tasks": {
"syncDumps" : {
"interval": "00 00 5 * * 1-5", // Runs every weekday at 5 AM
"script": "../store/syncDumps",
"runOnInit": false
}
}
- When the app starts, it also configures and schedules the cron tasks (a scheduling sketch follows this list)
- Alternatively, the cron scripts can also be triggered externally via a REST API for ad-hoc runs.
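A minimal sketch of how the app could schedule these tasks from the `dataLake.cron` config, assuming the `cron` and `config` npm packages; the scheduler and config loading the app actually uses may differ:

```js
// Sketch only: assumes the 'cron' and 'config' npm packages and that each task
// script exports a function; the app's actual wiring may differ.
const { CronJob } = require('cron');
const config = require('config');

const scheduleTasks = () => {
  const cronConfig = config.get('dataLake.cron');
  if (!cronConfig.enabled) {
    return;
  }
  Object.keys(cronConfig.tasks).forEach((name) => {
    const task = cronConfig.tasks[name];
    const job = new CronJob(
      task.interval,                // e.g. '00 00 5 * * 1-5' (seconds, minutes, hours, ...)
      () => require(task.script)(), // run the task's script on trigger (path resolution simplified)
      null,                         // onComplete
      true                          // start the job immediately
    );
    if (task.runOnInit) {
      job.fireOnTick();
    }
  });
};

scheduleTasks();
```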
Refer to the AWS deployment docs.