Data Operations

Cristian Vasquez edited this page Oct 17, 2024 · 13 revisions

Operational Metadata plays a critical role in tracking the lifecycle of notice transformations. It provides valuable insights into the data’s origin, the transformations applied, any errors encountered, and the overall status of these processes.

Granularity

The operational metadata is available at two levels of granularity:

  • Batch (job) level
  • Individual (notice) level

This detailed metadata enables efficient identification of data that requires creation, updating, or deletion in downstream systems such as CELLAR, and supports notifying upstream systems of updates in the data sources.

Use Case Scenarios

Example use cases are:

  • Handling Failed Jobs: When a job fails, metadata offers key information for troubleshooting, such as identifying connectivity issues.
  • Managing Failed Notice Transformations: In case of transformation errors, metadata helps trace problems related to data sources, enrichment, or mapping issues.
  • Updating Notices for New Ontology Versions: As ontologies evolve, notices may need to be reprocessed to use updated mappings and controlled vocabularies.
  • Activating/Deactivating Private Fields: Changes in data privacy policies may require specific notices to be reprocessed to address privacy-related adjustments.
  • Summary of Processed Notices: Each day, the counts of processed notices are retrieved.
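Two of the use cases above (reprocessing for a new ontology version, and the daily summary) can be sketched as simple selections over operational metadata. The records, field names, and version labels below are hypothetical stand-ins for whatever the metadata store actually holds.

```python
from collections import Counter
from datetime import date

# Hypothetical in-memory metadata records; in practice these would be
# queried from the operational metadata store.
notices = [
    {"id": "n1", "status": "success", "ontology": "v2", "processed_on": date(2024, 10, 16)},
    {"id": "n2", "status": "failed",  "ontology": "v1", "processed_on": date(2024, 10, 16)},
    {"id": "n3", "status": "success", "ontology": "v1", "processed_on": date(2024, 10, 17)},
]

# Updating notices for a new ontology version: select everything mapped
# with an outdated version so it can be reprocessed.
outdated = [n["id"] for n in notices if n["ontology"] != "v2"]

# Daily summary: counts of processed notices per status for a given day.
summary = Counter(
    n["status"] for n in notices if n["processed_on"] == date(2024, 10, 16)
)
```

The same selection pattern covers the other use cases: failed jobs and failed transformations are just filters on `status` and the recorded error.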

Update a set of Notices of interest

  • Query mechanism: A query mechanism is essential to facilitate data operations. We need to define queries based on operational metadata, enabling the selection of notices for processing in new jobs.
  • Job creation and management: Notices identified through queries can trigger the creation of a new job. During processing, notices are locked to prevent concurrent job handling, avoiding race conditions. Once a transformation is completed or fails, the lock is released, allowing further actions.
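The locking behaviour described above might look like the following minimal sketch. This is an assumption about one possible in-process implementation; the real system may well use database-level locks instead, and the class and method names are illustrative.

```python
import threading


class NoticeLocks:
    """Per-notice locks so that two jobs never process the same notice
    concurrently, avoiding race conditions."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locked: set[str] = set()

    def acquire(self, notice_ids) -> bool:
        """Lock the given notices for one job; refuse if any is already locked."""
        with self._guard:
            if self._locked & set(notice_ids):
                return False
            self._locked.update(notice_ids)
            return True

    def release(self, notice_ids) -> None:
        """Release the locks once the transformation completed or failed."""
        with self._guard:
            self._locked.difference_update(notice_ids)


locks = NoticeLocks()
job_notices = ["n1", "n2"]            # notices selected by a metadata query
if locks.acquire(job_notices):
    try:
        pass                          # run the transformation job here
    finally:
        locks.release(job_notices)    # released on success or failure alike
```

The `try`/`finally` mirrors the requirement that the lock is released whether the transformation completes or fails, so the notices become available for further actions.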