
Data Operations

Cristian Vasquez edited this page Oct 21, 2024 · 13 revisions

Data operations refer to the processes involved in managing, transforming, and maintaining data within a system. This includes tasks such as data transformation, integration, cleaning, enrichment, validation, loading, monitoring, and archiving.

This page focuses on the processes that manage the transformation and reprocessing of data in response to changes in the pipeline, such as updates to pipeline components, ontology versions, or URI patterns. These operations ensure that downstream systems, such as triplestores, always have up-to-date and accurate data.

Data sources are selected for reprocessing at a granular level by querying the Operational Metadata. This gives precise control over which notices or datasets are reprocessed or updated, and avoids unnecessary operations.
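As a rough illustration of this kind of selection, the sketch below filters in-memory records by notice type, publication date, and mapping version. The `NoticeRecord` fields, the `select_for_reprocessing` function, and the sample values are all hypothetical; the actual operational metadata schema and query mechanism are not described on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NoticeRecord:
    """Hypothetical operational-metadata record for one notice."""
    notice_id: str
    notice_type: str
    publication_date: date
    mapping_version: str  # version of the mappings last applied (assumed field)


def select_for_reprocessing(
    records: Iterable[NoticeRecord],
    *,
    notice_type: Optional[str] = None,
    published_from: Optional[date] = None,
    current_mapping: Optional[str] = None,
) -> List[str]:
    """Return the IDs of notices matching every criterion that is set."""
    selected = []
    for record in records:
        if notice_type is not None and record.notice_type != notice_type:
            continue
        if published_from is not None and record.publication_date < published_from:
            continue
        # Notices already transformed with the current mappings are skipped.
        if current_mapping is not None and record.mapping_version == current_mapping:
            continue
        selected.append(record.notice_id)
    return selected


# Illustrative data only.
records = [
    NoticeRecord("n-001", "contract-notice", date(2024, 1, 10), "1.0"),
    NoticeRecord("n-002", "contract-notice", date(2024, 6, 2), "1.0"),
    NoticeRecord("n-003", "award-notice", date(2024, 6, 3), "1.0"),
    NoticeRecord("n-004", "contract-notice", date(2024, 7, 1), "1.1"),
]

to_reprocess = select_for_reprocessing(
    records,
    notice_type="contract-notice",
    published_from=date(2024, 6, 1),
    current_mapping="1.1",
)
print(to_reprocess)
```

Leaving a criterion unset disables that filter, so the same function can express both broad and narrow selections.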

Use Case Scenarios

Operational metadata plays a key role in various data operation scenarios, such as:

  • Reprocessing Notices for New Mappings: When new RML mappings become available, specific types of procurement notices need to be reprocessed to apply the new mappings.
  • Adapting to Ontology Updates: As the eProcurement Ontology evolves, notices need to be reprocessed to align with updated mappings and controlled vocabularies. Given the large volume of notices, selection criteria like notice type or publication date can be used to narrow down the scope of reprocessing.
  • Adjusting to Privacy Policy Changes: Changes in data privacy regulations may require certain notices to be reprocessed to apply new rules, such as activating or deactivating private fields.
  • Recovering from Transformation Failures: If transformation errors occur, such as those caused by connectivity issues, operational metadata allows identification of affected notices so they can be flagged and reprocessed.
  • Troubleshooting Failed Jobs: When a job fails, operational metadata provides detailed information, such as issues with the mappings or system errors, which can assist in diagnosing and resolving the problem.
  • Summarizing Processed Notices: Metadata can provide a summary of processed notices for specific batches, making it easier to track and report data transformation progress.
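The failure-recovery and troubleshooting scenarios above could be supported by grouping failed notices by their recorded cause. The sketch below assumes each job log entry is a mapping with `notice_id`, `status`, and an optional `error` key; these names and the `failed_by_cause` helper are illustrative and not taken from the actual metadata model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def failed_by_cause(entries: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group the IDs of failed notices by their recorded error cause."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry["status"] == "failed":
            # Entries without an error label are collected under "unknown".
            grouped[entry.get("error", "unknown")].append(entry["notice_id"])
    return dict(grouped)


# Illustrative job log only.
job_log = [
    {"notice_id": "n-101", "status": "success"},
    {"notice_id": "n-102", "status": "failed", "error": "connectivity"},
    {"notice_id": "n-103", "status": "failed", "error": "mapping"},
    {"notice_id": "n-104", "status": "failed", "error": "connectivity"},
]

failures = failed_by_cause(job_log)
print(failures)
```

Notices grouped under a transient cause such as connectivity could then be queued for reprocessing, while mapping-related failures would more likely need investigation first.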

Accessing Metadata

A convenient method for retrieving metadata is by dereferencing a URL, which enables downstream systems to trace process details and integrate this information into data catalogs, KPIs, and quality metrics.
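A minimal sketch of dereferencing such a URL with the Python standard library follows. It assumes the metadata document can be served as JSON via an `Accept` header; the real serialization may differ (for example, an RDF format), and `fetch_metadata`, `parse_metadata`, and the sample keys are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def parse_metadata(raw: bytes) -> dict:
    """Decode a metadata document, assuming UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_metadata(url: str, timeout: float = 10.0) -> dict:
    """Dereference a metadata URL and return the parsed document."""
    # Asking for JSON is an assumption about what the server offers.
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_metadata(response.read())


# Parsing an illustrative payload without any network access.
sample = b'{"job_id": "2024-10-21", "transformed": 3000}'
document = parse_metadata(sample)
print(document["transformed"])
```

A downstream system would call `fetch_metadata` with the URL of a job's metadata document and pass the result on to its catalog or KPI tooling.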

Example

For each daily job, a new metadata document is generated. This document, accessible via a URL, can be shared in Microsoft Teams to report transformation outcomes. For instance, a notification might state: “3,000 notices were successfully transformed today.”
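A notification like this could be derived from the daily metadata document roughly as follows. The `transformed` key and the `format_notification` helper are assumptions made for illustration; the actual document structure is not specified here.

```python
from typing import Mapping


def format_notification(document: Mapping[str, int]) -> str:
    """Build a one-line summary from a daily metadata document."""
    count = int(document["transformed"])  # assumed key holding the success count
    # The ',' format specifier adds thousands separators (3000 becomes 3,000).
    return f"{count:,} notices were successfully transformed today."


message = format_notification({"transformed": 3000})
print(message)
```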
