Users of the operator want to monitor backup failures and successes, in particular to alert on failed backups or a lack of successful ones.
Design
A metric will be added to the exposed operator metrics as a counter of successes and failures for EtcdBackupSchedule resources. This counter will be labelled by the namespace and name of the EtcdBackupSchedule resource.
Other options
Instrumenting all backups
All backups could be counted by building our counter from EtcdBackup resources directly. However, the backup resource has no unique name to operate on and carries only a list of endpoints, so there is no good way to derive a unique identity for the cluster being backed up.
Without labels on the metric, it would be hard to tell from a dashboard or alert which etcd cluster (if there are several) is failing to back up.
Not using a metric
Alternatively, all of this information is already available in the Kubernetes API via a status field on EtcdBackup resources. However, this relies on a Kubernetes administrator using and configuring something like kube-state-metrics to support alerts and dashboards on this data.