Problem
It is very uncommon for a user to have only a single cloud account, Kubernetes cluster, or Docker environment, and the user experience of having to install VMClarity once per environment is poor: discovered assets and detected security findings are only accessible from the environment in which they were found.
Goal
It should be possible to install VMClarity's control plane once and get a single pane of glass for all assets and security findings across a user's environments. The user should be able to configure and manage scans across all environments from that single control plane, but the asset scans themselves should remain distributed and run in the environment of the asset being scanned.
Solution
There is currently a 1:1 relationship between the Orchestrator and the enabled Provider. To satisfy the goal, we need to be able to connect multiple Providers to the VMClarity control plane.
To keep scanning distributed, secure, and stateless, we will split the orchestrator component into two parts: the orchestrator itself, which is deployed with the control plane, and a provider, which is deployed into each connected environment.
When you deploy the VMClarity control plane, the orchestrator will be deployed, but there will be no providers.
For each environment (AWS account, Kubernetes cluster, Docker daemon) that you want to connect to VMClarity, a provider will be deployed into that environment. The provider will connect back to the VMClarity control plane, feeding the API with discovered assets and watching for AssetScan objects that target assets belonging to that environment.
Each provider will consist of two parts: the shared provider-runtime library and an environment-specific provider driver. A provider is an instance of the provider-runtime library initialised with that specific driver. Each provider will be a separate Go module to avoid inter-provider Go module conflicts.
An example of the main cmd function for a provider might look like the following sketch. The package paths and the provider-runtime API used here (its Config, New and Run names, and the AWS driver package) are illustrative assumptions, not a finalised interface:
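```go
package main

import (
	"context"
	"log"
	"os"

	// NOTE: these import paths are placeholders for illustration only.
	providerruntime "github.com/openclarity/vmclarity/provider-runtime"
	awsdriver "github.com/openclarity/vmclarity-provider-aws/driver"
)

func main() {
	ctx := context.Background()

	// The driver implements the environment-specific discovery and scanning
	// hooks for this provider (AWS in this example).
	driver, err := awsdriver.New(ctx)
	if err != nil {
		log.Fatalf("failed to initialise AWS driver: %v", err)
	}

	// The provider-runtime implements the generic logic: registering the
	// provider with the control plane, reporting discovered assets, and
	// watching for AssetScans assigned to this provider's ID.
	runtime, err := providerruntime.New(providerruntime.Config{
		APIServerAddress: os.Getenv("VMCLARITY_APISERVER_ADDRESS"),
		Driver:           driver,
	})
	if err != nil {
		log.Fatalf("failed to initialise provider runtime: %v", err)
	}

	if err := runtime.Run(ctx); err != nil {
		log.Fatalf("provider runtime exited with error: %v", err)
	}
}
```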
The provider-runtime library will be developed alongside the control plane VMClarity code to ensure consistency between its logic and the orchestrator logic.
There should be a well-defined support policy between the provider-runtime and the control plane. The current recommendation is N-1, meaning that providers can be one provider-runtime version behind the control plane to which they are connecting. This allows the control plane to be upgraded first and the providers afterwards, avoiding the need to upgrade everything at once, while keeping the support window small enough that we shouldn't be limited by backward-compatibility issues.
Each provider should compile into an independent binary.
Each provider will also be responsible for maintaining its own installation mechanism for the environment it supports; for example, the AWS provider should maintain a CloudFormation template and the Kubernetes provider should maintain a Helm chart. These are separate from the control plane installation methods, which will be maintained in the core VMClarity repo.
The VMClarity API will be extended to include a new object type "Provider" which represents a Provider installed in an environment. As with all objects in the VMClarity API, it will have a unique UUID which will be used to identify that provider.
Providers will initially only have a small number of fields; an illustrative sketch of the object shape is given below.
Assets in VMClarity will need to be extended to include a relationship list of any providers that discover that asset. When a provider discovers an asset, it will add its unique ID to that asset's provider list.
When a provider is removed from the system, its ID will also be removed from every asset's provider list.
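A minimal sketch of what these objects might look like, assuming Go structs with JSON tags; only the unique ID and the asset's provider list come from this proposal, the remaining names are placeholders:

```go
package models

// Provider represents a provider installed in an environment.
// Illustrative sketch only: apart from the unique ID, the field names
// here are assumptions, not the finalised API schema.
type Provider struct {
	// Unique UUID assigned by the VMClarity API, used to identify the provider.
	Id string `json:"id"`
	// Human-readable name for the environment (assumed field).
	DisplayName string `json:"displayName,omitempty"`
}

// Asset is extended with a relationship list of providers that discovered it.
type Asset struct {
	Id string `json:"id"`
	// IDs of all providers that have discovered this asset. A provider appends
	// its own ID on discovery; the entry is removed when the provider is
	// removed from the system.
	Providers []string `json:"providers,omitempty"`
	// ... existing asset fields elided ...
}
```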
A new orchestrator controller which handles AssetScans will be added, with two responsibilities:
Process AssetScans in "Pending" and, based on the target Asset's provider list, assign a provider ID to the AssetScan in a new field "ProviderID". Once this has happened, the AssetScan is moved to "Scheduled" and is ready for the responsible provider to pick it up.
Process AssetScans in "Pending", "Scheduled", "ReadyToScan" and "InProgress", and move them to "Aborted" if they have exceeded the configured timeout. This runs in the control plane to ensure that AssetScans are timed out even if the responsible provider has gone offline.
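A rough sketch of the controller's two responsibilities, under the assumptions above; the type names, states, and selection policy are placeholders, not the final implementation:

```go
package orchestrator

import (
	"errors"
	"time"
)

// AssetScan is a simplified placeholder for the real API type; only the
// fields needed for this sketch are shown.
type AssetScan struct {
	State      string    // "Pending", "Scheduled", "ReadyToScan", "InProgress", ...
	ProviderID string    // new field: the provider assigned to run this scan
	CreatedAt  time.Time // used here for the timeout check
}

// assignProvider covers responsibility one: move a Pending AssetScan to
// Scheduled by picking a provider from the target asset's provider list.
func assignProvider(scan *AssetScan, assetProviders []string) error {
	if scan.State != "Pending" {
		return nil
	}
	if len(assetProviders) == 0 {
		return errors.New("no provider has discovered this asset")
	}
	// Placeholder policy: pick the first provider in the list; a real
	// implementation could take provider health or load into account.
	scan.ProviderID = assetProviders[0]
	scan.State = "Scheduled"
	return nil
}

// abortIfTimedOut covers responsibility two: abort scans that have exceeded
// the configured timeout, even if the responsible provider has gone offline.
func abortIfTimedOut(scan *AssetScan, timeout time.Duration, now time.Time) {
	switch scan.State {
	case "Pending", "Scheduled", "ReadyToScan", "InProgress":
		if now.Sub(scan.CreatedAt) > timeout {
			scan.State = "Aborted"
		}
	}
}
```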
The helper services (trivy-server, grype-server, etc.) can be deployed on the control plane, in the provider's environment, or externally somewhere else. The location of these services will be a configuration input per provider, not determined by the control plane.
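For illustration, per-provider configuration might look something like the sketch below; the field names are assumptions, not a finalised configuration surface:

```go
// ProviderConfig is an illustrative sketch of per-provider settings.
type ProviderConfig struct {
	// Address of the VMClarity control plane API this provider reports to.
	APIServerAddress string

	// Addresses of the helper services used by asset scans started by this
	// provider. They may point at services on the control plane, at services
	// deployed alongside the provider, or at external deployments.
	TrivyServerAddress string
	GrypeServerAddress string
}
```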
Alternatives considered
One alternative is for the provider installed into the cloud account to expose a gRPC endpoint directly, with all the orchestration logic remaining in the control plane. This is not favoured, as it requires the cloud account and control plane to expose an internet endpoint and to allow an ingress connection from the control plane to the provider. In the provider-runtime architecture, all communication from the cloud account is initiated from inside the cloud account towards the control plane, which means there is no need for any of the components to be publicly addressable or accessible.