<h1 id="copernic-awesome-data-store"><a href="https://github.com/amirouche/copernic">copernic</a>: awesome data store</h1>
<div class="figure">
<img src="https://raw.githubusercontent.com/amirouche/copernic/master/data.jpg" alt="data" />
<p class="caption">data</p>
</div>
<h2 id="abstract">Abstract</h2>
<p>copernic is a web application (mostly) implemented in the Python programming language. It is backed by a database that is a quad store versioned in a directed acyclic graph. Time-traveling queries are possible at any point in history, while querying and modifying the latest version remains efficient. The versioned quad store is implemented using a novel approach dubbed the generic tuple store. The goal of copernic is to demonstrate that versioned databases make it possible to implement workflows that ease cooperation.</p>
<h2 id="keywords">Keywords</h2>
<ul>
<li>data management system</li>
<li>data science</li>
<li>distributed version control system</li>
<li>knowledge base</li>
<li>open data</li>
<li>quality assurance</li>
<li>reproducible science</li>
<li>python programming language</li>
</ul>
<h2 id="introduction">Introduction</h2>
<p>Versioning in production systems is a trick everybody knows about, whether it is through backups, logging systems, or ad-hoc <a href="https://code.djangoproject.com/wiki/AuditTrail">audit trails</a>. It makes it possible to inspect, debug and, in the worst cases, roll back to previous states. There is no need to explain the great importance of versioning in software management, as tools like git, mercurial, and fossil have shaped modern computing.</p>
<p>Having the power of multi-branch versioning opens the door to manifold applications. For example, it makes it possible to implement a mechanic similar to GitHub's pull requests and GitLab's merge requests in many products. That very mechanic makes explicit the actual human workflow in enterprise settings, in particular when one person validates a change made by another person.</p>
<p>The <em>versioned quad store</em> makes the implementation of such mechanics more systematic and less error-prone, as the implementation can be shared across various tools and organisations.</p>
<p>copernic takes the path of versioning data and applies the change-request mechanic to cooperation around the making of a knowledge base, similar in spirit to <a href="https://wikidata.org/">Wikidata</a> and inspired by existing data management systems like CKAN.</p>
<p>The use of a version control system to store <a href="https://en.wikipedia.org/wiki/Open_data">open data</a> is a good thing, as it draws a clear path toward reproducible science. But no existing tool meets all the expectations. <strong>copernic aims to replace the use of git and to make practical the cooperation around the creation, publication, storage, re-use and maintenance of knowledge bases that are possibly bigger than memory.</strong> The Resource Description Framework (RDF) offers a good canvas for cooperation around open data, but there is no solution that is good enough according to <a href="https://core.ac.uk/download/pdf/76527782.pdf">Collaborative Open Data versioning: a pragmatic approach using Linked Data, by Canova <em>et al.</em></a></p>
<p>copernic uses a novel approach to store quads in an <a href="https://en.wikipedia.org/wiki/Ordered_Key-Value_Store">ordered key-value store</a>. It uses the <a href="https://www.foundationdb.org/">FoundationDB</a> storage engine to deliver a pragmatic, versatile, ACID-compliant versioned quad store where people can cooperate around the making of the future of knowledge. It also relies on a new algorithm to query versioned tuples, based on a topological ordering of the graph of changes. copernic only stores changes between versions. copernic does not rely on the theory of patches introduced by Darcs, but re-uses some of its vocabulary. copernic is the future.</p>
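<p>To make the idea more concrete, here is a minimal sketch of how quads, changes and a topological ordering could fit together. It is not copernic's actual implementation: the class <code>VersionedQuadStore</code>, its methods, the key layout and the sample quads are all hypothetical, and a sorted in-memory list stands in for an ordered key-value store such as FoundationDB.</p>

```python
from bisect import bisect_left, insort
from graphlib import TopologicalSorter


class VersionedQuadStore:
    """Toy versioned quad store (hypothetical, for illustration only)."""

    def __init__(self):
        # Sorted keys stand in for an ordered key-value store.
        # Assumed key layout: (graph, subject, predicate, object, change, alive)
        self.keys = []
        # History DAG: change id -> tuple of parent change ids.
        self.parents = {}

    def apply(self, change, parents, added=(), removed=()):
        """Record a change; only the difference with its parents is stored."""
        self.parents[change] = tuple(parents)
        for quad in added:
            insort(self.keys, (*quad, change, True))
        for quad in removed:
            insort(self.keys, (*quad, change, False))

    def _rank(self, change):
        """Map every ancestor of `change` (itself included) to its
        position in a topological ordering of the history."""
        seen, todo = set(), [change]
        while todo:
            current = todo.pop()
            if current not in seen:
                seen.add(current)
                todo.extend(self.parents[current])
        graph = {c: self.parents[c] for c in seen}
        # Parents are yielded before their children.
        order = TopologicalSorter(graph).static_order()
        return {c: position for position, c in enumerate(order)}

    def query(self, change, prefix=()):
        """Return the quads starting with `prefix` as of `change`."""
        prefix = tuple(prefix)
        rank = self._rank(change)
        latest = {}  # quad -> (rank of the newest change touching it, alive)
        # Range scan: keys sharing the prefix are contiguous in sorted order.
        for key in self.keys[bisect_left(self.keys, prefix):]:
            quad, origin, alive = key[:4], key[4], key[5]
            if quad[:len(prefix)] != prefix:
                break
            if origin in rank:  # ignore changes outside this history
                seen_before = latest.get(quad)
                if seen_before is None or rank[origin] > seen_before[0]:
                    latest[quad] = (rank[origin], alive)
        return sorted(q for q, (_, alive) in latest.items() if alive)


store = VersionedQuadStore()
python = ("wiki", "copernic", "written-in", "python")
scheme = ("wiki", "copernic", "written-in", "scheme")
fdb = ("wiki", "copernic", "storage", "foundationdb")

store.apply("c1", [], added=[python])
store.apply("c2", ["c1"], added=[fdb])                        # first branch
store.apply("c3", ["c1"], added=[scheme], removed=[python])   # second branch
store.apply("c4", ["c2", "c3"])                               # merge

print(store.query("c1"))                      # time travel to the first change
print(store.query("c4", prefix=("wiki",)))    # state after the merge
```

<p>In this sketch, a query at a given change only considers the ancestors of that change, and the most recent entry in topological order decides whether a quad is present. Two concurrent branches that touch the same quad have no defined relative order here, so a real system would need an explicit conflict resolution step at merge time.</p>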