At Agorapulse, we're divided into several tribes, and each tribe contains several squads.
Each squad owns its own infrastructure.
Their complete infrastructure is defined as code, using Terraform's HCL language.
We use a monorepo that contains a folder for every tribe and squad:
- tribe-name-1:
  - foo
  - bar
- tribe-name-2:
  - baz
  - foobar
Take the conversation tribe as an example. Its conversation folder contains:
- services: all sources for the serverless functions
- apps: all sources for the APIs
- infra: all Terraform resources for the tribe's infrastructure:
  - app: deploys only the API apps (and New Relic alerts such as 5XX error checks, instance counts, ...)
  - lambda: deploys the AWS serverless functions (and New Relic alerts such as runtime issues, overall status)
  - data: all AWS data stores (Elasticsearch, MySQL, Redis Cluster, ...) and their alerting
  - monitoring: the New Relic workload and the business-based SLOs
On this monitoring layer, we would like to add some SLIs.
We populate a New Relic custom event called PublisherEvent.
This event records the status (info, retry, error, success) and the media type (mediaType) of each publication posted to the social networks we support (TikTok, LinkedIn, Facebook, YouTube, Instagram, ...).
We add several SLIs per network:
- SLIs based on all publications,
- SLIs based on publications that don't contain videos (videos are excluded because of network issues, file size, ...).

They are defined in monitoring/slo.tf:

    # One service level per entry of local.services, attached to the workload.
    resource "newrelic_service_level" "publishing_per_services" {
      for_each    = local.services
      guid        = module.workload.guid
      name        = format("Publishing on %s", each.value.name)
      description = format("Proportion of Success Publisher on service %s.", each.value.name)

      events {
        account_id = var.new_relic_account_id

        # Valid events: every publication with a final status (info and retry are ignored).
        valid_events {
          from  = "PublisherEvent"
          where = format("service = '%s' AND status != 'info' AND status != 'retry' AND environments='%s'", each.value.service, var.env)
        }

        # Bad events: publications in error, optionally narrowed by append_nrql.
        bad_events {
          from  = "PublisherEvent"
          where = format("service = '%s' AND environments='%s' AND status = 'error' %s ", each.value.service, var.env, lookup(each.value, "append_nrql", ""))
        }
      }

      objective {
        target = each.value.objective

        # Objective evaluated over a rolling 7-day window.
        time_window {
          rolling {
            count = 7
            unit  = "DAY"
          }
        }
      }
    }

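
The resource iterates over local.services and reads name, service, objective and an optional append_nrql from each entry. The original map is not shown here, so the following is only a sketch of a shape that would fit: the keys, display names, objective values and the `mediaType != 'video'` filter are all assumptions made for illustration.

```hcl
locals {
  # Hypothetical entries: only the attribute names (name, service,
  # objective, append_nrql) come from the resource that reads them.
  services = {
    facebook = {
      name      = "Facebook"
      service   = "facebook"
      objective = 99.0 # assumed target, in percent
    }

    # Same network, but errors on video publications are filtered out.
    facebook_without_videos = {
      name        = "Facebook (without videos)"
      service     = "facebook"
      objective   = 99.5 # assumed target, in percent
      append_nrql = "AND mediaType != 'video'" # assumed mediaType value
    }
  }
}
```

With a map like this, adding an SLI for another network would only mean adding an entry. Entries without append_nrql fall back to an empty string through lookup, so their bad-event filter stays unchanged.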
To deploy the SLIs to the beta environment, run Terraform from the monitoring folder:

    cd monitoring
    AWS_PROFILE=beta terraform init
    AWS_PROFILE=beta terraform apply -var-file=environments/beta.tfvars -var="new_relic_api_key=NRAK-FOOBAR"
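
The commands above rely on three input variables: env, new_relic_account_id and new_relic_api_key. Their declarations are not shown in this post, so this is a minimal sketch of what they might look like. The types, descriptions and the sensitive flag are assumptions.

```hcl
# variables.tf (sketch; types and descriptions are assumed)
variable "env" {
  description = "Environment name, matched against the environments attribute of PublisherEvent"
  type        = string
}

variable "new_relic_account_id" {
  description = "New Relic account that receives the PublisherEvent data"
  type        = number
}

variable "new_relic_api_key" {
  description = "New Relic user API key, passed on the command line"
  type        = string
  sensitive   = true # keeps the key out of plan and apply output
}
```

The matching environments/beta.tfvars would then carry the non-secret values. Both values below are placeholders, not real ones:

```hcl
# environments/beta.tfvars (sketch)
env                  = "beta"  # assumed to match the value stored in PublisherEvent
new_relic_account_id = 1234567 # placeholder account id
```

Instead of passing the key with -var, Terraform can also read it from the TF_VAR_new_relic_api_key environment variable, which keeps it out of the shell history.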