Home
- Review Designs within the context of Data Sovereignty;
- Research CLI wrapper alternative to CDKTF
- Review Database Sharding within the context of Data Products’ data: https://aws.amazon.com/what-is/database-sharding/
- Review Value of data
- Verify CATs’ Project Update: Factory & Executor components; Invoice, Order, Function, Executor, & BOM Block Designs, Structure’s Ray Cluster Deployment on Kubernetes, BOM Initialization, CAT Node & Node Design
- Research Dynamic Terraform Providers for Plant Deployments
- Verify CATs’ Project Update: Structure Block Design, Data Service Collaboration Diagram, Ray Integration
- Watched Computational Governance Panel
- Review Ray documentation for InfraFunction Hooks
- Research Open Contracting Data Standard with respect to Data Product Teams: https://standard.open-contracting.org/latest/en/
- 1/22:
- Updated CATs integration tests and demo
- Resolved dependency bug
- Verify CATs’ Project Update: Process Component, Sub-Process Logging, Executor & Function Components
- 1/23:
- Updated Documentation and Demo
- Added License and Packaging for CATs
- Verify CATs’ Project Update: s3 & CoD Integration
- 1/24:
- Updated Documentation & Refactor
- CATs Data Verification
- Verify CATs’ Project Update: Updating Order Structure, Node, Service & Structure Components
- 1/25 - 1/26:
- Updated Documentation & Refactor
- Update Factory
- Reviewed Novo Nordisk Data Mesh Platform discussion
- Verify CATs’ Project Update: CATs s3 cache, BOM ERD
- Included Ubuntu 20.04 Installation Update
- Refactored CATs
- Researched CAT cache access management
- Research Economic Adapters for CATs from Ocean Protocol
- Research multilevel linked-list for CATs’ subgraph
- Research bidirectional mapping support for multilevel linked-lists for CATs’ subgraph
- Consider Transducers for CAT MIMO
- Updated PR Template
- Review Model-Driven Engineering: https://en.wikipedia.org/wiki/Model-driven_engineering
- 2/12: Drafted CATs capabilities in GitHub Project and reviewed Activity Artifact Policy
- 2/13: Reviewed implementation examples of Data Contracts
- 2/14 - 2/15:
- Reviewed Data Mesh Roundtable Discussions about Data Contracts and “Agile” Data Products
- Attended Protocol Labs project updates
- 2/16: Research System Architecture layers and wrote notes as Data Contract Article for CATs
Data Mesh Resources:
- “Inside a Data Contract”: https://www.youtube.com/watch?v=ye4geXMuJKs
- “Agile in Data”: https://www.youtube.com/watch?v=XnstATam0jM
- Data Contract Articles: https://www.datamesh-architecture.com/#data-contract
Data Contract Implementation Examples:
- https://blog.det.life/data-contracts-a-guide-to-implementation-86cf9b032065
- https://levelup.gitconnected.com/create-a-web-scraping-pipeline-with-python-using-data-contracts-281a30440442
- https://docs.soda.io/soda/data-contracts.html
System Architecture:
What does a CATs data contract do?
A Data Contract is a Service agreement between producer and consumer, with attribute dependencies for downstream Data Product evolution and dedicated lineage. A data contract can provide tools for collaborating on data requirements as product promises within a shared context that informs policies for contract mutation alongside Data Product releases.
A Data Contract’s Product Promises are what a data product’s owners and consumers can expect of each other, up to the latest block of information. These promises may include data quality, data usage terms and conditions, schema, service objectives, billing, etc. Data Contract policy mutations cascade downstream as bilateral agreements that “fork” lineage as a new Data Product version; for example, the consumer takes on the risk of violating privacy. Data Producers create Data Contracts on Organization and Business Terms. The consumer of the Data Contract enforces Governance policies. The producer of the Data Contract owns the Data Product if the organization doesn’t have a Governance body.
Governance policies are discussed between data producers and consumers to agree upon data producer requirements. These discussions should culminate in an amenable data structure / dataset. Structured data is conducive to pre-existing policies and less discussion; less structured data will need more discussion and policy feedback loops. We need a Minimal Viable Data Contract that includes what is necessary for an organization to govern, supports policy feedback loops, and guides discussion in a way that balances the prioritization of outcomes and methodologies.
Interdependent data domains have sub-domains with identifiers for generating Data Products. CAT Nodes will generate and execute Virtual Data Products composed as Data Contracts that enforce Data Provenance using Bills of Materials (BOMs). BOMs are CATs' Content-Addressed Data Provenance records for verifiable data processing and transport on a Mesh network of CAT Nodes. Data Contracts will contain BOM lineages and act as block headers for Content-Addressed Transformers (CATs) instances. Data Products are mutated during policy feedback loops informed by collaborators communicating their understanding of knowledge domains. Collaborators will identify knowledge sub-domains with references and will access sub-domains using Content-Addresses. Access is federated via knowledge domain hierarchies, in abstractions that enable collaborators to participate in governance cycles by leveraging their understanding of knowledge.
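A minimal sketch of the "Data Contract as block header over a BOM lineage" idea described above, assuming a sha256-over-canonical-JSON content address; all field names here are illustrative, not CATs' actual schema:

```python
import hashlib
import json

def content_address(record: dict) -> str:
    """Deterministic content address over a record's canonical JSON."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def make_bom(order: dict, parent_cid=None) -> dict:
    """A BOM links one processing Order to upstream lineage by content address."""
    bom = {"order": order, "parent": parent_cid}
    bom["cid"] = content_address(bom)
    return bom

bom_a = make_bom({"process": "ingest", "source": "s3://raw"})
bom_b = make_bom({"process": "transform"}, parent_cid=bom_a["cid"])

# The Data Contract pins the lineage head, acting like a block header.
contract = {"promises": {"schema": "v1"}, "lineage_head": bom_b["cid"]}
```

Because each BOM's address covers its parent's address, verifying the head of the lineage transitively verifies the whole provenance chain.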
- 2/19 - 2/21: Contextualize value of BOM within the context of Data as a Product that contains Data Contracts
- 2/22 - 2/23: Updated Readme informed by examples of Data Assets within the context of Machine-Readable Cataloging
Resources:
- https://www.loc.gov/marc/umb/um01to06.html
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/data-asset.html
What is a Content-Addressed Data Asset (CADA)?
CATs Data Products will consist of Data Contracts with provenance as executable BOM lineages, acting as block headers for Content-Addressed Transformers (CATs) instances that contain Data Assets. BOMs are CATs' Content-Addressed Data Provenance records for verifiable data processing and transport on a Mesh network of CAT Nodes, and can contain Data Assets. A data asset may be a system or application output (dataset) that is accessible and holds value for an organization or individual. Data Assets’ value can derive from the data's potential for generating insights, informing decision-making, contributing to product development, enhancing operational efficiency, or creating economic benefits through its sale or exchange.
CATs' Content-Addressed Data Assets are processed, sold / exchanged / published on CATs’ Data Mesh via CAT Nodes and subsumed by downstream CATs’ Data Products. Data Assets consist of the following:
- Data Domains - "A predefined or user-defined Model repository object that represents the functional meaning of an" attribute "based on column data or column name such as" account identification.
- Data Objects - Content-Addresses of data sources used to extract metadata for analysis.
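The Content-Addressed Data Asset idea above can be sketched as a digest over the asset's bytes, so a catalog keyed by that address gives reference-stable access. The sha256-hexdigest scheme and the metadata fields are illustrative, not CATs' actual CID format:

```python
import hashlib

def content_address(payload: bytes) -> str:
    """Content address: a digest derived solely from the bytes of the asset."""
    return hashlib.sha256(payload).hexdigest()

catalog = {}  # content address -> Data Asset metadata (Data Domain + Data Object)

def register_asset(payload: bytes, domain: str) -> str:
    """Register a Data Asset under its content address with its Data Domain."""
    cid = content_address(payload)
    catalog[cid] = {"domain": domain, "size": len(payload)}
    return cid

cid = register_asset(b"account_id,balance\n42,100.0\n", domain="account identification")
# The same bytes always resolve to the same address, so references are stable.
```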
- 2/26: Researched Digital Asset Management related Data Contracts and Data Mesh Registry & considered a Rule Asset being used for Network Policies in addition to Attribute Quality
- 2/27: Considered Data & Rule Assets for Data Mesh Registry Artifact Schema
- https://towardsdatascience.com/the-data-mesh-registry-a-window-into-your-data-mesh-20dece35e05a
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/data-asset.html
- https://docs.informatica.com/data-engineering/data-engineering-quality/10-2-1/business-glossary-guide/glossary-content-management/business-term-links/rule-asset.html
- 2/28: Verify CATs Executing FaaS on PaaS
- 2/29: Review Domain-Oriented Ownership with respect to Conway's law
- 3/1: Review Data Column Lineage’s value in establishing Domain-Oriented Ownership in CATs Invoice in a way that makes BOMs searchable and discoverable
What makes CATs Governable by including BOMs within Data Product’s Data Contracts?
CATs are governable and support multi-disciplinary collaboration of data processing because CATs Architectural Quantum is an abstract governance model enforced within CATs’ Bills-Of-Materials (BOMs) for which knowledge domains are represented as meta-data of data provenance records to support domain ownership.
BOMs are unique identifiers that provide the means of data production (assembly) and transportation as reproducible lineage contextualised by knowledge domains for federated governance. BOMs consist of Data Product service Orders of data processing that are Invoiced as fulfillments of service agreements specified by Data Products’ Data Contracts.
Federated Governance is enabled by BOMs for the following reason: domain-specific data provenance BOMs establish the legitimacy of network policy changes suggested by Fractional Stewards of Data Products, by enabling them to identify data quality issues at their source on a self-serviced Data Platform of many Data Products.
CATs enables Fractional Stewards to do this because historical data production is contextualised and reproducible within the scope of their knowledge domains by design, during development and production, as a requirement of a service Order. CATs data processes submitted by service Orders are Invoiced to fulfil agreements within Data Products’ Data Contracts.
A Data Contract is a Service agreement between producer and consumer, with attribute dependencies for downstream Data Product evolution and dedicated lineage. Governance policy discussions between data producers and consumers, held in policy feedback loops about data production requirements, should balance the prioritization of outcomes and methodologies and culminate in an amenable data structure / dataset.
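As a sketch of the Order-to-Invoice relationship described above, an Invoice can be modeled as a check of an Order's reported results against a Data Contract's promises. Field names and promise attributes here are hypothetical:

```python
def invoice(order: dict, contract: dict) -> dict:
    """Invoice an Order as a fulfilment of a Data Contract: each promised
    attribute must be matched by the Order's reported result."""
    unmet = [k for k, v in contract["promises"].items()
             if order["result"].get(k) != v]
    return {"order_id": order["id"], "fulfilled": not unmet, "unmet": unmet}

contract = {"promises": {"schema": "v1", "quality_check": "passed"}}
order = {"id": "ord-1", "result": {"schema": "v1", "quality_check": "passed"}}
inv = invoice(order, contract)  # inv["fulfilled"] -> True, inv["unmet"] -> []
```

An unfulfilled Invoice names the unmet promises, which is the raw material for the policy feedback loops discussed above.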
- 3/4: Contextualize “Data as an asset” with CATs Architecture
- 3/5: Contextualize Data sovereignty with “Data as an asset” for CATs Data Mesh
- 3/6: Contextually map Data Contract initialization roles to cross-functional Operational Model for Data Products
- 3/7: Contextually map "Fractional Ownership" of "Decentralized Data Objects" ("DDOs" / "Data Assets") to "Data as an asset" and Data Partioning / Sharding
- 3/8: Contextualize Ocean Protocol & CATs Architecture with prosumption
“Data as an asset” enables the consumption, production, and prosumption of Data Assets on CATs Data Mesh
“Data as an asset” conceptually emphasizes recognizing and treating data as a strategic investment that organizations can leverage to deliver future economic benefits, by enabling the consumption, production, and prosumption of one’s own data as an asset [0]. Prosumption is the consumption and production of value, "either for self-consumption or consumption by others, and can receive implicit or explicit incentives from organizations involved in the exchange." [1]
The availability of high-quality and domain-specified Data Assets enables Data Products on inter-connected CAT Nodes on CATs Data Mesh to facilitate cross-functional asset utilization within Data Initiatives in a way that supports Data Sovereignty. "Data sovereignty refers to a group or individual’s right to control and maintain their own data, which includes the collection, storage, and interpretation of data." [2]
Registering and cataloging CATs can accelerate innovative Data Product creation and facilitate Data Sovereignty in Data Initiatives that discover and utilize “Data as an asset”. Data Products use and operate CAT Nodes to produce, register, and catalog “Data as an asset” as Data Assets that are searchable and discoverable by Data Products on CATs Data Mesh. CATs Data Assets enhance strategic, operational, and analytical decision-making by using BOMs as feedback-loop mechanisms across domains in a way that suits specific collaborative contexts across organizations.
Resources:
- 3/11: Review Ocean Data NFTs and Datatokens and relate Hexagonal architecture to Data Contract SLAs
- https://docs.oceanprotocol.com/developers/contracts/datanft-and-datatoken
- https://en.wikipedia.org/wiki/Non-fungible_token#:~:text=A%20non%2Dfungible%20token%20(NFT,to%20be%20sold%20and%20traded.
- https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)
- https://blog.thepete.net/blog/2020/09/25/service-templates-service-chassis/
- 3/12
- Review Bidirectional Mapping libraries for Data Mesh BOM graph for cataloged representation
- example: https://github.com/jab/bidict
- Review Custom Terraform Provider software that enables providers to be written in any language for CATs Plant
- Review Model-Based System Engineering and relate it to knowledge organization infrastructure
- https://medium.com/block-science/knowledge-networks-and-the-politics-of-protocols-af81ad0fa2d4
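The bidirectional mapping under review above (bidict is one such library) can be sketched in plain Python: a pair of dicts kept one-to-one, so a catalogued BOM can be resolved by name or by content address. The class and names below are a hypothetical illustration, not CATs code:

```python
class BiMap:
    """Minimal bidirectional map: resolve a content address by name, or a
    name by content address, keeping both directions one-to-one."""
    def __init__(self):
        self.forward = {}  # name -> content address
        self.inverse = {}  # content address -> name

    def put(self, name, cid):
        # Evict any stale pairings first so both directions stay one-to-one.
        self.inverse.pop(self.forward.pop(name, None), None)
        self.forward.pop(self.inverse.pop(cid, None), None)
        self.forward[name] = cid
        self.inverse[cid] = name

catalog = BiMap()
catalog.put("orders.bom", "cid-a1")
# catalog.forward["orders.bom"] -> "cid-a1"; catalog.inverse["cid-a1"] -> "orders.bom"
```

bidict provides the same one-to-one guarantee (via its `inverse` view) with more safety checks; this sketch just shows why the inverse index must be maintained on every write.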
- 3/13 - 3/15
- Review 4 kinds of data moats within the context of data’s strategic value as a “data asset”
- Review Model-driven architecture approaches for CATs Architectural Quantum
- Review ocean.py for integration into CATs’ ingress and egress
- Review “Commons-based peer production” for CAT Node
- Updated CATs architecture, readme, and interactive logs
- 3/18 - 3/20: Contextualize data contract creation team’s role responsibilities into modern roles
- 3/21 - 3/22:
- Contextualize modern data contract creation team’s role responsibilities into CATs Control and Action planes for an operational model for the placement of Data Stewardship responsibilities
- Communicate the value of Data Contract inclusion in BOMs below.
Why should Data Contracts be included in CATs' BOMs for Data Product development on a Data Mesh?
Data Products’ CATs are executed by Data Contract deployments with Data Provenance, by Ordering CATs that are Invoiced within Bills of Materials (BOMs). BOMs are CATs' Content-Addressed Data Provenance records for verifiable data processing and transport on CAT Mesh. Data Contracts will contain BOM lineages and act as headers for Content-Addressed Transformer (CATs) instances. Data Contracts’ inclusion of BOMs is necessary for organizations to rapidly mutate Data Products alongside discussions that affect product outcomes and development methodologies.
Data Products are mutated during stakeholder discussions about Data Contracts with respect to network policy / protocol. These discussions continuously inform multi-lateral Data Product agreements between stakeholders and collaborators that produce and consume data, using BOMs as feedback-loop mechanisms for (re)submitting CAT Orders. These discussions should also culminate in a CAT Order of amenable data structures / datasets whose processing is Invoiced within BOMs. Collaborators can participate in data-provenance-supported product development by Content-Addressing Data as an Asset.
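One way to picture the "policy mutation forks lineage as a new Data Product version" behavior described above: a mutation produces a new version that extends, rather than rewrites, the existing lineage. Structure and field names are illustrative:

```python
def fork_lineage(product: dict, policy_change: dict) -> dict:
    """A Data Contract policy mutation 'forks' lineage as a new Data Product
    version: history is kept intact, and the change starts a new branch."""
    return {
        "version": product["version"] + 1,
        "policies": {**product["policies"], **policy_change},
        "lineage": product["lineage"] + [f"fork@v{product['version']}"],
    }

v1 = {"version": 1, "policies": {"pii": "masked"}, "lineage": ["bom-1", "bom-2"]}
v2 = fork_lineage(v1, {"pii": "dropped"})
# v2 is version 2 with the mutated policy; v1 is untouched.
```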
- 3/25 - 3/27:
- Review Bitol's Data Contract examples
- Review Data Contract Implementation Guide for CATs
- Review Wayfair's differentiation of Data Mesh design lean personas: Data Producer, Data Consumers, and Data Engineer
- Contextualize IBM’s Knowledge Catalog as a DataOps tool in consideration of KMS and CAT-aloging
- Review Statistical Process Control to contextualize the inclusion of https://www.soda.io/
- Research data product life cycle to contextualize Data Product Manager, Data Steward, and Data Engineer
- 3/28 - 3/29:
- Contextualize a Federated Governance Model within Federated Computational Governance
- Research types of Data Valuation to avoid confirmation bias
- Contextualize Event-Driven programming for CAT Plant and Dataflow programming for CATs Process and InfraFunction
- 4/1 - 4/3:
- Research "Stewardship Fractalization" and System Architecture facilitating it and relate it to Data Stewardship
- Consider Dynamic Prompt engineering using Generative AI via an LLM for contextualization of CAT Actions that fulfill Data Contracts. These actions are initially contextualized with CATs Architectural Quantum.
- 4/4 - 4/5:
- Distinguish between Quantitative and Qualitative design drivers for end-user and data product consumer contextualization
- Consider a Streaming Data Integration for Stewardship lineage views and metadata management
- Consider each CAT Factory Client a Stream Broker as a Consumer and Producer (https://www.scaler.com/topics/kafka-broker/)
- Consider "IoT Edge-Application Management" for "IoT Analytics"
- Consider a language like SISAL for stream dataflow composition
- Review updated CoD Architecture
- Research how Analysts support domain-oriented ownership in consideration of data procurement
- Research "telemetry data pipelines" from starburst.io to contextualize a “telemetry-catalog” in "data lakehouse" as a flatfile store
- Consider Data Engineering pain points to split and contextualize Data Engineering within CATs Action & Control Planes
- Distinguish between Data Lakes and Data Federation for the implementation of a data lake solution
- Research GPT to communicate a Federated Governance Model designed to be a GPT
- 4/15:
- Contextualize LLMs and Generative AI for Fractional Data Stewardship
- Reduce scope of Data Product with Stewardship Fractionalization dApp steps
- Note Dataflow Programming for CAT
- Note Data Flow Architecture for project definition
- Note Statistical process control (SPC) (as user responsibility)
- 4/16-18:
- Apply Manufacturing Production to BOM design with respect to Engineering & Manufacturing BOM types
- Contextualize CAT orders with a Transfer (Network) Function
- Contextually lift Mesh partnership with Model-Based Institution Design (MBID) and relate to Model-Based System Engineering in preparation to include Computer-Aided Governance in CATs v3
- Research LangGraph for CAT Mesh reification
- Note different types of SBOMs for each CAT Arch Quantum SubComponents
- Consider Multi-Agent Conversation for row-wise business function
- https://arxiv.org/abs/2308.08155
- https://github.com/langchain-ai/langgraph/blob/main/examples/multi_agent/multi-agent-collaboration.ipynb
- Consider Pro-curation for on-boarding information onto CAT Mesh reflective of Prosumer
- Research integrating langgraph `tool_node` into CAR (Content-Addressable Router)
- Research integrating langgraph `tool_executor` into CATs' Executor
- Review LangChain Agents for Network Governance Reification graph state tracking
- Review "Knowledge Networks and the Politics of Protocols" within the context of Roles
- Review "Engineering for Legitimacy"
- Review Scaled and Leveled Stewardship
- Review contextualization of responsibilities based on Prompt Engineering Questions & general responsibilities of "Fractional Stewards"
- Review Project Roadmap for Stewardship Fractalization in consideration for CAT Team Dynamics
- Review Fractional Stewardship MVP approach in consideration of publishing Policy developments in a Steward profile to Agent Nodes in LangGraph. These Policies are front-loaded as "algorithmic suggestions"
- Note Abstract User Stories as application references
- Review "DAO Governance Model" for comparison to Federated Computational Governance Model
- Consider Marketing Steward using Prompt Engineering / partial input being a "Comparison Table/Matrix summarizing different Stewardship Organization/Solutions missions/purposes, designs and features"
- Removed s3 cache from CATs and replaced with local storage solution
- Research adaptive Retrieval Augmented Generation (aRAG)
- Reviewed KMS-identity for integration into CATs
- Read "A Language for Studying Knowledge Networks: The Ethnography of LLMs"
- The Plant is a Transfer Function that accepts an Order as input and produces an Output by executing a Function (Process) with an Executor (Actuator). The Plant exposes the control variable (u(t)) for the Control Feedback Loop, and the Function (Process) produces the process variable (y(t)). The process variable is used for Statistical Process Control of CATs Dataset I/O (Ingress/Egress)
- Docker can be executed within an Alpine Linux Docker container ["Docker in Docker" (DinD)] for upcoming cadCAD nested Block executions, as a summation of the control variables (u(t)) that configure CATs Data Products and a summation of the process variables (y(t))
- Note: "Integral windup particularly occurs as a limitation of physical systems, compared with ideal systems, due to the ideal output being physically impossible (process saturation: the output of the process being limited at the top or bottom of its scale, making the error constant)."
- https://en.wikipedia.org/wiki/Integral_windup
- Alleviated by "A CAT at its core is a unit of computational work specified by the triplet 1) what the input is, 2) what does the computation, and 3) what the output is. Controllers require feedback, which is currently outside of the scope of a single cat. Any cyclic orchestration must be external to CATs." - BlockScience
- Alpine Linux Docker can be the execution paradigm of cadCAD and CATs Plant because they can run as Docker inside Docker ("DinD") and functionally map cadCAD multi-dimensional blocks to CAT Functions
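The integral-windup concern above can be made concrete with a small discrete feedback loop: a PI controller whose control variable u(t) is clamped at process saturation, with conditional integration as the anti-windup measure. The gains, the 0.5-gain plant model, and the limit are illustrative, not CATs values:

```python
def plant(u: float) -> float:
    """Toy Plant transfer function: the Process maps the control variable u(t)
    to the process variable y(t). The 0.5 gain is illustrative."""
    return 0.5 * u

def run_loop(setpoint: float, steps: int, u_max: float = 10.0) -> float:
    """Discrete PI feedback loop. u(t) is clamped at u_max (process saturation),
    and integration is paused while saturated -- a simple anti-windup measure."""
    kp, ki = 1.0, 0.5          # illustrative controller gains
    integral, y = 0.0, 0.0
    for _ in range(steps):
        error = setpoint - y
        u = kp * error + ki * integral
        if u > u_max:          # output limited at the top of its scale
            u = u_max          # without the pause below, `integral` would wind up
        else:
            integral += error
        y = plant(u)
    return y

# Reachable setpoint: the loop settles near the target.
# Unreachable setpoint (needs u > u_max): y saturates at plant(u_max),
# but `integral` does not wind up while the clamp is active.
```

This also illustrates the BlockScience note above: the feedback loop is external orchestration around the unit of computational work (the plant call), not part of it.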
- Review RAG stewardship fractionalization context
- Review Software Governance with respect to fractional stewardship
- Consider a Stewardship Profile that maps to agents within a Multi-agent system
- Consider roles as Architectural Responsibilities with respect to RolePlayer
- Review Docker workload on-boarding for CAT refactor
- Consider homestar (Everywhere Computer network) for IPVM inclusion for "resilience, certainty or portability"
- Updated Bacalhau Node and refactor for CoD interoperability for CATs v3
- Exposed ingress and egress to action plane via Process with an interoperable integration point for CATs v3
- Included data product disciplines to CATs Architectural Quantum for CATs v3
- Implement InfraStructure Sub Component separately
- IPFS daemon initiated by CAT Node
- partially implement function for applying sbom
- Refactored InfraFunction to compose Processor, Plant, and Infrastructure
- Installed KMS locally for cat/rid Integration
- Bring your own cache, otherwise it is local (background: Expanso introduces breaking changes to bacalhau without a stable release)
What is the Architectural purpose of CATs as a Function, a.k.a. the ACG Monad?
- Governance Plane: z(t)
  - is for the Stewardship of a Data Product Supply Network of CATs, represented as a Directed Acyclic Graph of Data Product Supply
- Control Plane: y(t)
  - is for the Networking of what is Produced as a result of Science & Engineering CATs
- Action Plane: x(t)
  - is for the Science & Engineering of Data Transformation as Computational Processing, a.k.a. CATs
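Read as plain function composition, the three planes above might be sketched like this. It is a toy illustration of the z(y(x(...))) reading of "CATs as a Function"; all behavior and names are hypothetical:

```python
supply_log = []  # z(t)'s provenance record over the Data Product Supply DAG

def action_plane(order):
    """x(t): data transformation as computational processing (a CAT)."""
    return {"order": order, "output": order["data"].upper()}

def control_plane(result):
    """y(t): networking what the Action Plane produced."""
    return {"routed": True, "result": result}

def governance_plane(event):
    """z(t): stewardship -- record the routed event in the supply log."""
    supply_log.append(event)
    return event

# The "CATs as a Function" reading: z(y(x(order))).
out = governance_plane(control_plane(action_plane({"data": "cat orders"})))
```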
Multi-Agent Collaboration (MAC) for CATs using Content-Addressable Router (CAR)
- Design Description
- CATs and LangGraph integration can enable a row-wise business function as a Chart Tool of Multi-Agent Collaboration (MAC) if CAT Orders act as a Transfer (Network) Function implemented as an OOP Command Pattern, for which CATs Ingress and Egress sub-processes can be executed by CATs’ Content-Addressable Router (CAR).
- Architectural Considerations: CATs can inform business decisions given the following:
- Action Plane: x(t)
- CAT Functions can be defined as LangGraph Call Tools executed by LangGraphs Tool Node
- CAT Factory produces CAT Executors integrated with LangGraphs Tool Executor.
- Control Plane: y(t) [aka Content-Addressable Router (CAR)]
- CAR integrated with LangGraphs Router.
- cadCAD (Network) Policies aka “Algorithmic Suggestions” can be deployed on LangGraphs Agent Nodes with specified Domain-Name references as Rule Asset RIDs
- Governance Plane: z(t)
- A GreyBox Model as a feature-parameterized Tensor Field with the process variable (PV) as label
- The business function is a CATs Control & Action Matrix - a 2-dimensional representation of a 3-dimensional space
- Action Plane: x(t)
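A minimal sketch of the OOP Command Pattern reading of CAT Orders from the Design Description above, with the CAR as the invoker. Handler names and payloads are hypothetical, and no LangGraph APIs are used here:

```python
class OrderCommand:
    """Command pattern: a CAT Order encapsulates a request so the invoker
    (the CAR) can dispatch it without knowing the handler."""
    def __init__(self, name: str, payload: dict):
        self.name, self.payload = name, payload

    def execute(self) -> dict:
        return HANDLERS[self.name](self.payload)

def ingress(payload: dict) -> dict:   # hypothetical Ingress sub-process
    return {"stage": "ingress", "data": payload}

def egress(payload: dict) -> dict:    # hypothetical Egress sub-process
    return {"stage": "egress", "data": payload}

HANDLERS = {"ingress": ingress, "egress": egress}

def car_route(command: OrderCommand) -> dict:
    """CAR as invoker: executes whatever command it is handed."""
    return command.execute()

result = car_route(OrderCommand("ingress", {"cid": "cid-x"}))
```

Because the router only calls `execute()`, new sub-processes can be registered in `HANDLERS` without changing the routing code, which is the point of the pattern.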