Data Operations

Cristian Vasquez edited this page Oct 21, 2024 · 13 revisions

Data operations are the processes that transform and reprocess data when the pipeline changes, for example when pipeline components, ontology versions, or URI patterns are updated. They ensure that downstream systems, such as triplestores, always hold up-to-date and accurate data.

Data sources are selected for reprocessing at a granular level by querying the operational metadata. This gives precise control over which notices or datasets are reprocessed or updated, and avoids unnecessary operations.
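
As a rough illustration of such a selection, the sketch below filters an in-memory list of metadata records by notice type, publication date, and mapping version. The record fields (`notice_id`, `notice_type`, `publication_date`, `mapping_version`), the sample values, and the function name are assumptions made for this example; the actual operational metadata schema and query language are not described on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class NoticeRecord:
    """Hypothetical operational-metadata record for one transformed notice."""
    notice_id: str
    notice_type: str
    publication_date: date
    mapping_version: str


def select_for_reprocessing(
    records: Iterable[NoticeRecord],
    notice_types: Optional[Set[str]] = None,
    published_from: Optional[date] = None,
    published_to: Optional[date] = None,
    current_mapping_version: Optional[str] = None,
) -> List[str]:
    """Return the IDs of notices that match every criterion that is set."""
    selected = []
    for r in records:
        if notice_types is not None and r.notice_type not in notice_types:
            continue
        if published_from is not None and r.publication_date < published_from:
            continue
        if published_to is not None and r.publication_date > published_to:
            continue
        # Skip notices already transformed with the current mappings.
        if current_mapping_version is not None and r.mapping_version == current_mapping_version:
            continue
        selected.append(r.notice_id)
    return selected


# Made-up sample records, for illustration only.
records = [
    NoticeRecord("n-001", "contract-notice", date(2024, 1, 15), "1.0"),
    NoticeRecord("n-002", "contract-award", date(2024, 2, 3), "1.0"),
    NoticeRecord("n-003", "contract-notice", date(2024, 3, 9), "2.0"),
    NoticeRecord("n-004", "contract-notice", date(2023, 12, 30), "1.0"),
]

to_reprocess = select_for_reprocessing(
    records,
    notice_types={"contract-notice"},
    published_from=date(2024, 1, 1),
    current_mapping_version="2.0",
)
print(to_reprocess)
```

Each criterion is optional, so the same function can narrow the scope by type, by date range, by mapping version, or by any combination of them.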

Use Case Scenarios

Operational metadata plays a key role in managing various data operation scenarios, such as:

  • Reprocessing Notices for New Mappings: When new RML mappings become available, specific types of procurement notices need to be reprocessed to apply the new mappings.
  • Adapting to Ontology Updates: As the eProcurement Ontology evolves, notices need to be reprocessed to align with updated mappings and controlled vocabularies. Given the large volume of notices, selection criteria like notice type or publication date can be used to narrow down the scope of reprocessing.
  • Adjusting to Privacy Policy Changes: Changes in data privacy regulations may require certain notices to be reprocessed to apply new rules, such as activating or deactivating private fields.
  • Recovering from Transformation Failures: If transformation errors occur, such as those caused by connectivity issues, operational metadata allows identification of affected notices so they can be flagged and reprocessed.
  • Troubleshooting Failed Jobs: When a job fails, operational metadata provides detailed information, such as issues with the mappings or system errors, which can assist in diagnosing and resolving the problem.
  • Summarizing Processed Notices: Operational metadata can provide a summary of processed notices for specific batches, making it easier to track and report data transformation progress.
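
The failure-recovery and summary scenarios above could be sketched as follows. This is a minimal example under stated assumptions: the `JobRecord` fields, the status values `"success"` and `"failed"`, and the error labels are all hypothetical and stand in for whatever the operational metadata actually records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-notice outcome recorded by a transformation job."""
    batch_id: str
    notice_id: str
    status: str  # assumed values: "success" or "failed"
    error: Optional[str] = None


def summarize_batch(records: Iterable[JobRecord], batch_id: str) -> Dict[str, int]:
    """Count notices per status within one batch."""
    return dict(Counter(r.status for r in records if r.batch_id == batch_id))


def failed_by_error(records: Iterable[JobRecord], batch_id: str) -> Dict[str, List[str]]:
    """Group the IDs of failed notices in one batch by their recorded error."""
    grouped: Dict[str, List[str]] = {}
    for r in records:
        if r.batch_id == batch_id and r.status == "failed":
            grouped.setdefault(r.error or "unknown", []).append(r.notice_id)
    return grouped


# Made-up sample records, for illustration only.
job_records = [
    JobRecord("2024-10-21", "n-001", "success"),
    JobRecord("2024-10-21", "n-002", "failed", "connection-timeout"),
    JobRecord("2024-10-21", "n-003", "failed", "mapping-error"),
    JobRecord("2024-10-21", "n-004", "failed", "connection-timeout"),
    JobRecord("2024-10-20", "n-005", "success"),
]

summary = summarize_batch(job_records, "2024-10-21")
failures = failed_by_error(job_records, "2024-10-21")
print(summary)
print(failures)
```

Grouping failures by error separates notices that can simply be retried (for example after a connectivity issue) from those that point to a problem in the mappings.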

Accessing Operational Metadata

Operational metadata can be retrieved by dereferencing a URL, allowing downstream systems to trace process details and integrate the information into data catalogs, KPIs, and quality metrics.
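
A downstream system might dereference such a URL along these lines. The sketch uses only the Python standard library; the URL is a placeholder, and the JSON media type in the `Accept` header is an assumption, since this page does not state which representations the metadata documents are served in.

```python
import json
import urllib.request
from typing import Any


def build_metadata_request(url: str, accept: str = "application/json") -> urllib.request.Request:
    """Prepare a GET request asking for a machine-readable representation."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def fetch_metadata(url: str, timeout: float = 10.0) -> Any:
    """Dereference the URL and parse the body, assuming a JSON response."""
    request = build_metadata_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical URL; replace it with the address of a real metadata document.
request = build_metadata_request("https://example.org/operational-metadata/jobs/2024-10-21")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Only the request is built here; calling `fetch_metadata` performs the actual network access and would need a reachable metadata URL.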

Example

For each daily job, a new metadata document is generated. This document, accessible via a URL, can be shared with teams to report transformation outcomes. For instance, a notification might state: “3,000 notices were successfully transformed today.”
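
A notification like this could be generated from the daily document, as in the sketch below. The document keys (`transformed`, `failed`, `url`) and the URL itself are hypothetical; they illustrate the idea rather than the real document structure.

```python
from typing import Any, Dict


def format_daily_report(doc: Dict[str, Any]) -> str:
    """Build a short notification from a hypothetical daily metadata document."""
    message = f"{doc['transformed']:,} notices were successfully transformed today."
    if doc.get("failed", 0) > 0:
        message += f" {doc['failed']:,} failed and may need reprocessing."
    return f"{message} Details: {doc['url']}"


# Made-up daily document, for illustration only.
daily_doc = {
    "transformed": 3000,
    "failed": 0,
    "url": "https://example.org/operational-metadata/jobs/2024-10-21",
}

report = format_daily_report(daily_doc)
print(report)
```

Including the document URL in the message lets recipients follow the link for the full process details.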
