
Commit

Merge pull request #52 from mcarbonne/feat_monitor_disk_space_increase
Feat monitor disk space increase
mcarbonne authored Oct 8, 2024
2 parents 0d6cf83 + 55426e3 commit 46aef5c
Showing 14 changed files with 394 additions and 39 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,7 @@
### unreleased
- `threshold_percent` replaced by `threshold` in `filesystemusage` provider ([details](README.md#filesystemusage)).

### 2.0.0
- `config.json` is now `config.yml`
- `scrape_interval` (for scrapers) is now a string with unit. Before, it was an integer (seconds). Example: `scrape_interval: 120` is now `scrape_interval: 120s` (or even `scrape_interval: 2m`).
- `filesystemusage` provider has been reworked to allow automatic mountpoint detection. See [here](README.md#filesystemusage) for details.
21 changes: 13 additions & 8 deletions README.md
@@ -29,11 +29,10 @@ Pre-built images are available on github packages:

For automatic updates ([watchtower](https://github.com/containrrr/watchtower), [podman-auto-update](https://docs.podman.io/en/latest/markdown/podman-auto-update.1.html)...), using the latest major tag available (`ghcr.io/mcarbonne/minimal-server-monitoring:2`) is recommended to avoid breaking changes.

## Migrations
### 1.x to 2.x
- `config.json` is now `config.yml`
- `scrape_interval` (for scrapers) is now a string with unit. Before, it was an integer (seconds). Example: `scrape_interval: 120` is now `scrape_interval: 120s` (or even `scrape_interval: 2m`).
- `filesystemusage` provider has been reworked to allow automatic mountpoints detection. See [here](#filesystemusage) for details.
## Changelog

See [here](CHANGELOG.md).


## Minimal configuration
### Default config.yml: container, services and available disk space monitoring, with shoutrrr alerts
@@ -109,7 +108,7 @@ docker run \
- container status (check if started)
- container restart (check if restarting forever)
#### filesystemusage
- provide one state per mountpoint (check if enough free disk space available)
- provide two states for each mountpoint (check if there is enough free disk space available and if there are rapid changes)
- multiple instances allowed

|parameter|description|required|default value|
@@ -118,7 +117,13 @@ docker run \
|fstypes|list of file system types to consider|no|[ext4, btrfs]|
|mountpoint_blacklist|list of mountpoints to ignore|no|[]|
|mountpoint_whitelist|list of mountpoints to monitor. **When set, `fstypes` and `mountpoint_blacklist` are ignored and autodiscovery is skipped**|no|[]|
|threshold_percent|minimum threshold (percentage) of available disk space|no|20|
|threshold|minimum threshold of available disk space<sup>1</sup>|no|20%|
|rate_threshold|rate threshold over rate_threshold_window period<sup>1,2</sup>|no|0.5%|
|rate_threshold_window|window duration<sup>2</sup>|no|5m|

1. Thresholds may be either relative (20%) or absolute (50m, 20gb ...). Absolute values are parsed with `ParseBytes` from [go-humanize](https://github.com/dustin/go-humanize); the supported unit suffixes are listed [here](https://github.com/dustin/go-humanize/blob/master/bytes.go).
2. The rate threshold triggers an alert when the remaining disk space changes by more than `rate_threshold` over `rate_threshold_window` (in either direction: increase or decrease).
Note: `rate_threshold_window` must be greater than or equal to `scrape_interval`.

#### ping
- provide one state per target (is target reachable)
@@ -165,7 +170,7 @@ scrapers:
params:
mountpoints:
- "/"
threshold_percent: 15
threshold: 15%

```

1 change: 1 addition & 0 deletions go.mod
@@ -6,6 +6,7 @@ require (
github.com/containrrr/shoutrrr v0.8.0
github.com/coreos/go-systemd/v22 v22.5.0
github.com/docker/docker v27.3.1+incompatible
github.com/dustin/go-humanize v1.0.1
github.com/goccy/go-yaml v1.12.0
github.com/moby/sys/mountinfo v0.7.2
golang.org/x/sys v0.26.0
2 changes: 2 additions & 0 deletions go.sum
@@ -20,6 +20,8 @@ github.com/docker/go-connections v0.5.0 h1:USnMq7hx7gwdVZq1L49hLXaFtUdTADjXGp+uj
github.com/docker/go-connections v0.5.0/go.mod h1:ov60Kzw0kKElRwhNs9UlUHAE/F9Fe6GLaXnqyDdmEXc=
github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4=
github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/fatih/color v1.17.0 h1:GlRw1BRJxkpqUCBKzKOw098ed57fEsKeNjpTe3cSjK4=
github.com/fatih/color v1.17.0/go.mod h1:YZ7TlrGPkiz6ku9fK3TLD/pl3CpsiFyu8N92HLgmosI=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
2 changes: 1 addition & 1 deletion pkg/scraping/provider/config.go
@@ -21,7 +21,7 @@ func LoadProviderFromConfig(ctx context.Context, cfg Config) (Provider, error) {
case "systemd":
return NewProviderSystemd(ctx, cfg.Params)
case "filesystemusage":
return NewProviderFileSystemUsage(cfg.Params)
return NewProviderFileSystemUsage(cfg.Params, cfg.ScrapeInterval)
default:
return nil, fmt.Errorf("illegal provider type: %v", cfg.Type)
}
71 changes: 61 additions & 10 deletions pkg/scraping/provider/filesystemusage.go
@@ -2,29 +2,76 @@ package provider

import (
"context"
"fmt"
"math"
"reflect"
"slices"
"strings"
"time"

"github.com/dustin/go-humanize"
"github.com/mcarbonne/minimal-server-monitoring/pkg/logging"
"github.com/mcarbonne/minimal-server-monitoring/pkg/storage"
"github.com/mcarbonne/minimal-server-monitoring/pkg/utils"
"github.com/mcarbonne/minimal-server-monitoring/pkg/utils/configmapper"
"github.com/mcarbonne/minimal-server-monitoring/pkg/utils/stats"
"github.com/moby/sys/mountinfo"
"golang.org/x/sys/unix"
)

type ProviderFileSystemUsage struct {
MountPrefix string `json:"mountprefix" default:""` // Host root filesytem when running inside a container
FSTypeWhitelist []string `json:"fstypes" default:"[ext4, btrfs]"`
MountPointBlacklist []string `json:"mountpoint_blacklist" default:"[]"`
MountPointWhitelist []string `json:"mountpoint_whitelist" default:"[]"`
SpaceRemainingThreshold uint `json:"threshold_percent" default:"20"`
MountPrefix string `json:"mountprefix" default:""` // Host root filesystem when running inside a container
FSTypeWhitelist []string `json:"fstypes" default:"[ext4, btrfs]"`
MountPointBlacklist []string `json:"mountpoint_blacklist" default:"[]"`
MountPointWhitelist []string `json:"mountpoint_whitelist" default:"[]"`
SpaceRemainingThreshold utils.RelativeAbsoluteValue `json:"threshold" default:"20%" custom:"relative_absolute_value"`
RateThreshold utils.RelativeAbsoluteValue `json:"rate_threshold" default:"0.5%" custom:"relative_absolute_value"`
RateThresholdWindow time.Duration `json:"rate_threshold_window" default:"5m"`

mountPointStats map[string]*stats.WindowCollector[uint64]
}

func NewProviderFileSystemUsage(params map[string]any) (Provider, error) {
cfg, err := configmapper.MapOnStruct[ProviderFileSystemUsage](params)
func NewProviderFileSystemUsage(params map[string]any, scrapeInterval time.Duration) (Provider, error) {
mapperCtx := configmapper.MakeContext()
mapperCtx.RegisterCustomParser("relative_absolute_value", func(s string) (reflect.Value, error) {
value, err := utils.RelativeAbsoluteValueFromString(s)
if err != nil {
return reflect.Value{}, err
} else {
return reflect.ValueOf(value), nil
}
})
cfg, err := configmapper.MapOnStructWithContext[ProviderFileSystemUsage](&mapperCtx, params)
if err != nil {
// Do not return a half-initialized provider when the mapping failed.
return nil, err
}
cfg.mountPointStats = make(map[string]*stats.WindowCollector[uint64])
if cfg.RateThresholdWindow < scrapeInterval {
return nil, fmt.Errorf("rate_threshold_window must be greater than or equal to scrape_interval")
}
return &cfg, nil
}

func (provider *ProviderFileSystemUsage) updateSpaceIncreaseStats(metric MetricWrapper, mountPoint string, remainingSpace, totalSpace uint64) {
_, ok := provider.mountPointStats[mountPoint]
if !ok {
v := stats.MakeWindowCollector[uint64](provider.RateThresholdWindow)
provider.mountPointStats[mountPoint] = &v
}
mpStats := provider.mountPointStats[mountPoint]
mpStats.AddNew(remainingSpace)
if mpStats.Count() >= 2 {
first := mpStats.First()
last := mpStats.Last()
// Convert to float64 before subtracting: a uint64 difference wraps around when free space shrinks.
rate := math.Abs(float64(last.Data)-float64(first.Data)) / last.Timestamp.Sub(first.Timestamp).Seconds()
threshold := float64(provider.RateThreshold.GetValue(totalSpace)) / provider.RateThresholdWindow.Seconds()
if rate >= threshold {
metric.PushFailure("available space changed quickly from %v to %v in %v",
humanize.Bytes(first.Data), humanize.Bytes(last.Data),
last.Timestamp.Sub(first.Timestamp).Round(time.Second))
} else {
metric.PushOK()
}
}
}

func (provider *ProviderFileSystemUsage) checkMountPoint(resultWrapper *ScrapeResultWrapper, mountPoint string) {
var stat unix.Statfs_t

@@ -36,13 +83,17 @@ func (provider *ProviderFileSystemUsage) checkMountPoint(resultWrapper *ScrapeRe
}

metric := resultWrapper.Metric("filesystemusage_"+prettyMountpoint, "mountpoint "+prettyMountpoint)
metricInc := resultWrapper.Metric("filesystemusage_"+prettyMountpoint+"_rate", "mountpoint "+prettyMountpoint)

if err != nil {
metric.PushFailure("unable to get remaining space: %v", err)
} else {
remainingSpace := 100 * stat.Bavail / stat.Blocks
if remainingSpace < uint64(provider.SpaceRemainingThreshold) {
metric.PushFailure("low space remaining (%v%%)", remainingSpace)
remainingSpace := stat.Bavail * uint64(stat.Bsize)
totalSpace := stat.Blocks * uint64(stat.Bsize)
provider.updateSpaceIncreaseStats(metricInc, mountPoint, remainingSpace, totalSpace)

if remainingSpace < provider.SpaceRemainingThreshold.GetValue(totalSpace) {
metric.PushFailure("low space remaining (%v%% / %v)", 100*remainingSpace/totalSpace, humanize.Bytes(remainingSpace))
} else {
metric.PushOK()
}
41 changes: 41 additions & 0 deletions pkg/utils/configmapper/context.go
@@ -0,0 +1,41 @@
package configmapper

import (
"errors"
"fmt"
"reflect"
)

type CustomParserFunction func(string) (reflect.Value, error)

type Context struct {
customParsers map[string]CustomParserFunction
}

func MakeContext() Context {
return Context{
customParsers: make(map[string]CustomParserFunction),
}
}

func (c *Context) RegisterCustomParser(name string, lambda CustomParserFunction) error {
_, ok := c.customParsers[name]
if ok {
return errors.New("Custom parser " + name + " already registered")
} else {
c.customParsers[name] = lambda
return nil
}
}

func (ctx *Context) getCustomParserIfAny(structField *reflect.StructField) (*CustomParserFunction, error) {
if tag, ok := structField.Tag.Lookup("custom"); ok {
parser, ok := ctx.customParsers[tag]
if ok {
return &parser, nil
} else {
return nil, fmt.Errorf("custom parser missing for '%v'", tag)
}
}
return nil, nil
}
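
As a rough illustration of how a `custom` struct tag can select a registered parser, here is a self-contained sketch. It does not reproduce `configmapper`: `applyCustom`, `parserFunc`, `percent` and `config` are hypothetical, and the real mapper (whose remaining files are not shown in this diff) may apply parsers differently.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// parserFunc mirrors the shape of a custom parser: string in, reflect.Value out.
type parserFunc func(string) (reflect.Value, error)

// percent is a hypothetical custom type used only in this sketch.
type percent float64

type config struct {
	Threshold percent `json:"threshold" custom:"percent"`
	Name      string  `json:"name"`
}

// applyCustom fills every field of dst (a pointer to a struct) that carries a
// `custom` tag, using the parser registered under the tag's value.
func applyCustom(dst any, raw map[string]string, parsers map[string]parserFunc) error {
	v := reflect.ValueOf(dst).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		tag, ok := field.Tag.Lookup("custom")
		if !ok {
			continue // no custom parser requested for this field
		}
		parse, ok := parsers[tag]
		if !ok {
			return fmt.Errorf("custom parser missing for '%v'", tag)
		}
		s, ok := raw[field.Tag.Get("json")]
		if !ok {
			continue // value not provided; leave the zero value
		}
		parsed, err := parse(s)
		if err != nil {
			return err
		}
		v.Field(i).Set(parsed)
	}
	return nil
}

// parsePercent turns "15%" into percent(15).
func parsePercent(s string) (reflect.Value, error) {
	f, err := strconv.ParseFloat(strings.TrimSuffix(s, "%"), 64)
	if err != nil {
		return reflect.Value{}, err
	}
	return reflect.ValueOf(percent(f)), nil
}

func main() {
	parsers := map[string]parserFunc{"percent": parsePercent}
	var cfg config
	err := applyCustom(&cfg, map[string]string{"threshold": "15%"}, parsers)
	fmt.Println(cfg.Threshold, err)
}
```

Keeping the parsers in a per-call registry rather than a global one lets each provider register only the value types it needs.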