This repository has been archived by the owner on May 10, 2022. It is now read-only.
Steve Bennett edited this page Oct 26, 2017 · 5 revisions

Design

This page holds work-in-progress design planning. OzData was created at the Brisbane OzUnconf in April 2017. At the Melbourne OzUnconf we are extending and refactoring it.

Design goals

Problems to solve:

  • finding data across a wider range of sources than just data.gov.au
  • reducing the time between "this dataset looks useful" and actually querying it inside R
  • hiding the details of individual datasets and services, so that people can get on with actually using the data

Terminology

  • data catalogue: a place where datasets and data services are listed, like data.gov.au
  • data service: an online API through which data can be queried
  • dataset: one or more files that have to be downloaded to be used
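
As a rough illustration of how these three terms relate, here is a minimal sketch in Python (chosen only for brevity; the packages themselves target R). The class names, fields and the `search` method are all hypothetical and are not part of any existing OzData API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Dataset:
    """One or more files that have to be downloaded to be used."""
    title: str
    file_urls: List[str]


@dataclass
class DataService:
    """An online API through which data can be queried."""
    title: str
    endpoint: str


@dataclass
class DataCatalogue:
    """A place where datasets and data services are listed."""
    name: str
    entries: List[Union[Dataset, DataService]] = field(default_factory=list)

    def search(self, query: str) -> List[Union[Dataset, DataService]]:
        # Hypothetical search: case-insensitive substring match on the title.
        needle = query.lower()
        return [entry for entry in self.entries if needle in entry.title.lower()]
```

A catalogue can list both kinds of entry side by side, which is why `entries` accepts either type. The URLs passed in would be whatever the catalogue reports, not anything fixed by this sketch.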

We envision a set of packages that streamline the process of finding, accessing and using open datasets from government agencies and research organisations in Australia.

Functions

  • Search: Find datasets matching some query

  • Curated lists: Curated lists of datasets on some topic (e.g. "weather")

  • Generic datasets: (Things that work on thousands of different datasets)

  • Get tabular dataset: Download, unzip and lightly process some generic dataset that is ultimately a table

  • Known datasets: (Things that are individually implemented on a handful of particular datasets)

  • Get dataset: Download and process a dataset, manipulating it to be as useful as possible, possibly by applying a known standard.

  • Known dataservices: Wrappers around web APIs (such as ABS.Stat)

  • Query

  • Preview: Download and process the data as required to generate a visualisation.
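
The "Get tabular dataset" step listed above (download, unzip, lightly process) could look something like the following sketch. It is written in Python purely as a language-neutral illustration; the real implementation would be in R. The function names `parse_tabular` and `get_tabular_dataset` are hypothetical, and the sketch assumes the dataset is a UTF-8 CSV file, optionally wrapped in a zip archive.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def parse_tabular(payload: bytes) -> list:
    """Unzip if needed, then read the first CSV found into a list of dicts."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("archive contains no CSV file")
            payload = archive.read(csv_names[0])

    # "utf-8-sig" drops a leading byte-order mark if one is present.
    text = payload.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []

    # Light processing only: trim whitespace and skip completely empty lines.
    header = [name.strip() for name in rows[0]]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in rows[1:] if row]


def get_tabular_dataset(url: str) -> list:
    """Download the file at `url` (an assumed direct link) and parse it."""
    with urlopen(url) as response:
        return parse_tabular(response.read())
```

Keeping the download separate from the parsing means the processing step can be exercised on local bytes without network access, which may also suit the "Preview" function.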
