slic-watch

Automatic, best-practice CloudWatch Dashboards and Alarms for your SAM, CloudFormation, CDK and Serverless Framework applications.

SLIC Watch supports: AWS Lambda, API Gateway, DynamoDB, Kinesis Data Streams, SQS Queues, Step Functions, ECS (Fargate or EC2), SNS, EventBridge, Application Load Balancer and AppSync.

Supported tools include AWS SAM, AWS CloudFormation, AWS CDK and the Serverless Framework.

Getting Started with Serverless Framework

If you are using AWS SAM, CDK or CloudFormation, skip to the next section.

  1. 📦 Install the plugin:
npm install serverless-slic-watch-plugin --save-dev
  2. 🖋️ Add the plugin to the plugins section of serverless.yml:
plugins:
  - serverless-slic-watch-plugin
  3. 🪛 Optionally, add configuration for the plugin to the custom -> slicWatch section of serverless.yml. This is where you can reference an SNS topic for alarms, which you usually want so that you receive alarm notifications via email, Slack, etc.
custom:
  slicWatch:
    alarmActionsConfig:
      alarmActions:
        - Ref: myTopic

See the Configuration section below for more detailed instructions on fine tuning SLIC Watch to your needs.

  4. 🚢 Deploy your application in the usual way, for example:
sls deploy
  5. 👀 Head to the CloudWatch section of the AWS Console to check out your new dashboards 📊 and alarms ⏰!

Getting Started with AWS SAM, CDK or CloudFormation

ℹ️ IMPORTANT: If you are using AWS SAM, CDK, or just plain CloudFormation, the most important thing to know is that the SLIC Watch Macro must be deployed in your AWS account and region before you do anything else. Once that's done, adding the macro as a transform to your SAM or CloudFormation template or CDK stack is very simple.

Deploying the SLIC Watch Macro

It would be nice if CloudFormation allowed us to publicly publish a macro so you don't need this step, but for now, you can deploy the SLIC Watch Macro using any of the following options. We have made the macro available as a Serverless Application Repository (SAR) app. This SAR app is used in Options 1 and 2 below. Option 3 is a manual option where you deploy the macro from this repository directly without using SAR.

  • Option 1 (using the SAR console): Go to SLIC Watch in the Serverless Application Repository and click the Deploy button.
  • Option 2 (using SAR with CloudFormation): If you prefer to automate the deployment of SAR apps using Infrastructure as Code, you can add the SAR app as a resource in any CloudFormation template. Note that this cannot be the same template as the application in which you want to use SLIC Watch! The snippet of CloudFormation is as follows.
Transform: AWS::Serverless-2016-10-31  # Required for AWS::Serverless::Application resources
Resources:
  ...
  SlicWatchMacro:
    Type: AWS::Serverless::Application
    Properties:
      Location:
        ApplicationId: arn:aws:serverlessrepo:eu-west-1:949339270388:applications/slic-watch-app
        SemanticVersion: <enter latest version>

To determine the list of available versions, you can use the AWS CLI:

aws serverlessrepo list-application-versions \
  --application-id arn:aws:serverlessrepo:eu-west-1:949339270388:applications/slic-watch-app
  • Option 3 (manual Macro deployment using SAM directly from source):
npm install
sam build --base-dir . --template-file cf-macro/template.yaml
sam deploy --guided

Adding the SLIC Watch Transform to SAM or CloudFormation templates

Once you have deployed the macro, you can start using SLIC Watch in SAM or CloudFormation templates by adding this to the Transform section:

Transform:
  - ...
  - SlicWatch-v3

🪛 Optionally, add configuration for the plugin to the Metadata -> slicWatch section of template.yml. This is where you can reference an SNS topic for alarms, which you usually want so that you receive alarm notifications via email, Slack, etc.

Metadata:
  slicWatch:
    enabled: true
    alarmActionsConfig:
      alarmActions:
        - !Ref MonitoringTopic

See the Configuration section below for more detailed instructions on fine tuning SLIC Watch to your needs.

To override the default alarm and dashboard settings for an individual Lambda Function resource, add a slicWatch property to that resource's Metadata section, as shown in SAM/CloudFormation function-level configuration below.

Adding the SLIC Watch Transform to CDK Apps

Once you have deployed the macro, add it as a transform in the constructor of each class that extends Stack. This should be done for every Stack in the CDK App.

// JavaScript/TypeScript:
import * as cdk from 'aws-cdk-lib'

export class MyStack extends cdk.Stack {
  constructor (scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props)

    this.addTransform('SlicWatch-v3')
    ...
  }
}
# Python:
self.add_transform("SlicWatch-v3")
// C#:
this.AddTransform("SlicWatch-v3");
// Java:
this.addTransform("SlicWatch-v3");

🪛 Optionally, add some configuration for the plugin as below:

this.templateOptions.metadata = {
  slicWatch: {
    enabled: true,
    alarmActionsConfig: {
      alarmActions: ["arn:aws:sns:eu-west-1:xxxxxxx:topic"],
      okActions: ["arn:aws:sns:eu-west-1:xxxxxxx:topic"],
      actionsEnabled: true
    }
  }
}

Features

CloudWatch Alarms and Dashboard widgets are created for all supported resources in the CloudFormation stack, whether it is generated by the Serverless Framework, SAM or CDK, or written directly in CloudFormation. This includes generated resources as well as resources specified explicitly (for example, in the resources section of serverless.yml). Any feature can be configured or disabled completely; see the Configuration section for details.

Lambda Functions

Lambda Function alarms are created for:

  1. Errors
  2. Throttles, as a percentage of the number of invocations
  3. Duration, as a percentage of the function's configured timeout
  4. Invocations, disabled by default
  5. IteratorAge, for functions triggered by an Event Source Mapping
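
To make the two percentage-based alarms concrete, here is a minimal TypeScript sketch of the arithmetic. It is not SLIC Watch's implementation: durationThresholdMs and throttlePercent are hypothetical helpers, and the throttle calculation assumes that throttled requests are not counted in the Lambda Invocations metric, so the rate is taken over invocations plus throttles.

```typescript
// Hypothetical helpers illustrating percentage-based Lambda alarm thresholds.
// They are not part of SLIC Watch; they only show the arithmetic.

/** Absolute Duration threshold (ms) for a percentage of the function timeout (s). */
function durationThresholdMs (timeoutSeconds: number, percentOfTimeout: number): number {
  // Multiply before dividing so whole-number inputs stay exact
  return (timeoutSeconds * 1000 * percentOfTimeout) / 100
}

/**
 * Throttles as a percentage of attempted invocations.
 * Assumes throttled requests are NOT included in the Invocations count,
 * so the denominator is invocations + throttles.
 */
function throttlePercent (throttles: number, invocations: number): number {
  const attempts = throttles + invocations
  if (attempts === 0) return 0 // No traffic, so nothing was throttled
  return (throttles * 100) / attempts
}

// With a 6 s timeout, a 95% Duration threshold corresponds to 5700 ms
console.log(durationThresholdMs(6, 95)) // 5700
// 5 throttled requests alongside 95 invocations is a 5% throttle rate
console.log(throttlePercent(5, 95)) // 5
```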

Lambda dashboard widgets show:

  • Errors
  • Throttles
  • Duration (Average, P95 and Maximum)
  • Invocations
  • Concurrent Executions
  • Iterator Age

API Gateway

API Gateway alarms are created for:

  1. 5XX Errors
  2. 4XX Errors
  3. Latency

API Gateway dashboard widgets show:

  • 5XX Errors
  • 4XX Errors
  • Latency
  • Count

DynamoDB

DynamoDB alarms are created for:

  1. Read Throttle Events (Table and GSI)
  2. Write Throttle Events (Table and GSI)
  3. UserErrors
  4. SystemErrors

Dashboard widgets are created for tables and GSIs:

  • ReadThrottleEvents (Table)
  • WriteThrottleEvents (Table)
  • ReadThrottleEvents (GSI)
  • WriteThrottleEvents (GSI)

Kinesis Data Streams

Kinesis data stream alarms are created for:

  1. Iterator Age
  2. Read Provisioned Throughput Exceeded
  3. Write Provisioned Throughput Exceeded
  4. PutRecord.Success
  5. PutRecords.Success
  6. GetRecords.Success

Kinesis data stream dashboard widgets show:

  • Iterator Age
  • Read Provisioned Throughput Exceeded
  • Write Provisioned Throughput Exceeded
  • Put/Get Success

SQS Queues

SQS Queue alarms are created for:

  1. Age Of Oldest Message (disabled by default). If enabled, a threshold in seconds should be specified.
  2. In Flight Messages Percentage. This is a percentage of the AWS hard limits (20,000 messages for FIFO queues and 120,000 for standard queues).
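
The In Flight Messages Percentage can be sketched as follows, assuming the hard limits quoted above. In CloudWatch, in-flight messages are reported by the ApproximateNumberOfMessagesNotVisible metric. The IN_FLIGHT_LIMIT constant and both functions are hypothetical names that only show how a percentage threshold translates into a message count.

```typescript
// Hypothetical sketch of the In Flight Messages Percentage arithmetic.
// The limits are the AWS hard limits quoted above; names are illustrative only.

type QueueType = 'standard' | 'fifo'

const IN_FLIGHT_LIMIT: Record<QueueType, number> = {
  standard: 120_000,
  fifo: 20_000
}

/** In-flight messages as a percentage of the hard limit for the queue type. */
function inFlightPercent (queueType: QueueType, inFlightMessages: number): number {
  return (inFlightMessages * 100) / IN_FLIGHT_LIMIT[queueType]
}

/** Number of in-flight messages at which a percentage threshold is reached. */
function inFlightMessagesAtThreshold (queueType: QueueType, thresholdPc: number): number {
  return (IN_FLIGHT_LIMIT[queueType] * thresholdPc) / 100
}

console.log(inFlightPercent('fifo', 19_000)) // 95
console.log(inFlightMessagesAtThreshold('standard', 80)) // 96000
```

The same percentage threshold therefore means very different absolute counts for FIFO and standard queues, which is why expressing it relative to the limit is convenient.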

SQS queue dashboard widgets show:

  • Messages Sent, Received and Deleted
  • Messages Visible
  • Age of Oldest Message

Step Functions

Step Function alarms are created for:

  1. Execution Throttled
  2. Executions Failed
  3. Executions Timed Out

The dashboard contains one widget per Step Function, showing ExecutionsFailed, ExecutionThrottled and ExecutionsTimedOut.

ECS / Fargate

ECS alarms are created for Fargate or EC2 clusters:

  1. Memory Utilization
  2. CPU Utilization

SNS

SNS alarms are created for:

  1. Number of Notifications Filtered Out due to Invalid Attributes
  2. Number of Notifications Failed

SNS Topic dashboard widgets show:

  • Messages Filtered Out - Invalid Attributes
  • Notifications Failed

EventBridge

EventBridge alarms are created for:

  1. Failed Invocations
  2. Throttled Rules

EventBridge Rule dashboard widgets show:

  • Failed Invocations
  • Invocations

Application Load Balancer

Application Load Balancer alarms are created for:

  1. HTTP Code ELB 5XX Count
  2. Rejected Connection Count
  3. HTTP Code Target 5XX Count
  4. UnHealthy Host Count
  5. Lambda Internal Error
  6. Lambda User Error

Application Load Balancer dashboard widgets show:

  • HTTP Code ELB 5XX Count
  • HTTP Code Target 5XX Count
  • Rejected Connection Count
  • UnHealthy Host Count
  • Lambda User Error
  • Lambda Internal Error

AppSync

AppSync alarms are created for:

  1. 5XX Error
  2. Latency

AppSync dashboard widgets show:

  • API widget: 5XX Error, Latency, 4XX Error and Requests
  • Real-time Subscriptions widget: Connect Server Error, Disconnect Server Error, Subscribe Server Error, Unsubscribe Server Error and Publish Data Message Server Error

Configuration

Configuration is entirely optional - SLIC Watch provides defaults that work out of the box.

Note: Alarm configuration is cascading. This means that configuration properties are automatically propagated from parent to child nodes (unless an override is present at the given node).

You can customize the configuration:

  • at the top level, for all resources in each service, and/or
  • at the level of individual resources
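
The cascading rule can be illustrated with a small recursive merge. This is a hypothetical sketch, not SLIC Watch's code: cascade treats every non-object value (such as Period, Threshold or enabled) as a property that flows down to child nodes, and every nested object as a child node. The alarms, Period and Lambda keys in the usage example are illustrative.

```typescript
// Hypothetical sketch of cascading configuration: non-object properties set on a
// parent node flow down to every child node unless the child overrides them.
// This is not SLIC Watch's implementation; it only illustrates the rule.

type ConfigNode = { [key: string]: unknown }

function isNode (value: unknown): value is ConfigNode {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function cascade (node: ConfigNode, inherited: ConfigNode = {}): ConfigNode {
  // Start from the inherited properties, then apply this node's own on top
  const properties: ConfigNode = { ...inherited }
  for (const [key, value] of Object.entries(node)) {
    if (!isNode(value)) properties[key] = value
  }
  // Recurse into child nodes, passing down the merged properties
  const resolved: ConfigNode = { ...properties }
  for (const [key, value] of Object.entries(node)) {
    if (isNode(value)) resolved[key] = cascade(value, properties)
  }
  return resolved
}

const resolved = cascade({
  alarms: {
    Period: 60, // Applies to every alarm below unless overridden
    Lambda: {
      Errors: { Threshold: 0 },
      Invocations: { enabled: false, Period: 300 }
    }
  }
})
console.log(JSON.stringify(resolved, null, 2))
```

In this example, Errors inherits Period: 60 from alarms, while Invocations keeps its own Period: 300.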

Top-level configuration

Top-level SLIC Watch configuration can be specified for all resources of each type:

  • For Serverless Framework applications, in the custom -> slicWatch section of serverless.yml:
custom:
  slicWatch:
    enabled: true
    ...
  • For CloudFormation or SAM templates, in the Metadata -> slicWatch section of the template:
Metadata:
  slicWatch:
    enabled: true
    ...
  • For CDK Stacks, the top-level SLIC Watch configuration can be set as follows:
this.templateOptions.metadata = {
  slicWatch: {
    enabled: true,
    ...
  }
}
  • The optional alarmActionsConfig specifies one or more SNS Topic destinations that are notified when an alarm changes state to ALARM (alarmActions) or OK (okActions). If you omit destination topics, alarms are still created but notifications are not sent anywhere. For example:
slicWatch:
  alarmActionsConfig:
    alarmActions: # Defaults to no actions
      - arn:aws:sns:eu-west-1:123456789012:<topic-name>
    okActions: # Defaults to no actions
      - arn:aws:sns:eu-west-1:123456789012:<topic-name>
    actionsEnabled: true # Defaults to true
  • Alarms or dashboards can be disabled at any level in the configuration by adding enabled: false. You can even disable all plugin functionality by specifying enabled: false at the top-level plugin configuration.
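
As a rough illustration of the alarmActionsConfig defaults described above, the sketch below maps the configuration onto the AlarmActions, OKActions and ActionsEnabled properties of an AWS::CloudWatch::Alarm resource. The mapping and the toAlarmActionProperties name are assumptions for illustration, not taken from SLIC Watch's source, and the topic ARN is a placeholder.

```typescript
// Hypothetical sketch: how an alarmActionsConfig could map onto the
// AlarmActions, OKActions and ActionsEnabled properties of an
// AWS::CloudWatch::Alarm resource, applying the defaults described above.

interface AlarmActionsConfig {
  alarmActions?: string[]
  okActions?: string[]
  actionsEnabled?: boolean
}

interface AlarmActionProperties {
  AlarmActions: string[]
  OKActions: string[]
  ActionsEnabled: boolean
}

function toAlarmActionProperties (config: AlarmActionsConfig = {}): AlarmActionProperties {
  return {
    AlarmActions: config.alarmActions ?? [], // No destinations by default
    OKActions: config.okActions ?? [], // No destinations by default
    ActionsEnabled: config.actionsEnabled ?? true // Enabled by default
  }
}

const props = toAlarmActionProperties({
  alarmActions: ['arn:aws:sns:eu-west-1:123456789012:<topic-name>']
})
console.log(props.OKActions.length, props.ActionsEnabled) // 0 true
```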

A complete set of supported options, along with their defaults, is shown in default-config.js.

Example projects are also provided for reference:

Resource-level configuration

Alarms and dashboards for each resource can be customized using CloudFormation metadata. This configuration takes precedence over the top-level configuration.

Resources:
  regularQueue:
    Type: AWS::SQS::Queue
    Metadata:
      slicWatch:
        alarms:
          InFlightMessagesPc:
            Threshold: 95
        dashboard:
          ApproximateAgeOfOldestMessage:
            yAxis: right
          NumberOfMessagesReceived:
            enabled: false

This can be done for any CloudFormation or SAM resource. In CDK, set the same metadata on the underlying CloudFormation resource:

import { CfnResource } from 'aws-cdk-lib'
import * as sqs from 'aws-cdk-lib/aws-sqs'

// Inside the Stack constructor:
const dlq = new sqs.Queue(this, 'DeadLetterQueue')
const cfnDlq = dlq.node.defaultChild as CfnResource
cfnDlq.cfnOptions.metadata = {
  slicWatch: {
    alarms: {
      InFlightMessagesPc: {
        Threshold: 95
      }
    },
    dashboard: {
      ApproximateAgeOfOldestMessage: {
        yAxis: 'right'
      },
      NumberOfMessagesReceived: {
        enabled: false
      }
    }
  }
}
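
Resource-level precedence can be pictured as a deep overlay of a resource's slicWatch metadata on top of the top-level configuration. The sketch below assumes that behaviour; overlay is a hypothetical helper, and the top-level values are invented for the example rather than copied from default-config.js.

```typescript
// Hypothetical sketch: resource-level metadata overlaid on the top-level
// configuration for the same resource type, with the resource winning.
// The top-level values below are invented for illustration; see
// default-config.js for the real defaults.

type Config = { [key: string]: unknown }

function isObject (value: unknown): value is Config {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function overlay (base: Config, override: Config): Config {
  const result: Config = { ...base }
  for (const [key, value] of Object.entries(override)) {
    const current = result[key]
    // Merge nested objects key by key; otherwise the override replaces the value
    result[key] = isObject(current) && isObject(value) ? overlay(current, value) : value
  }
  return result
}

const topLevelSqs: Config = {
  alarms: { InFlightMessagesPc: { Threshold: 80, enabled: true } }
}
const regularQueueMetadata: Config = {
  alarms: { InFlightMessagesPc: { Threshold: 95 } }
}

const effective = overlay(topLevelSqs, regularQueueMetadata)
console.log(JSON.stringify(effective))
// {"alarms":{"InFlightMessagesPc":{"Threshold":95,"enabled":true}}}
```

Only Threshold is overridden for the queue; every other property falls through from the top-level configuration.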

Serverless Framework function-level configuration

Function-level configuration works a bit differently for Serverless Framework functions: the slicWatch property is set directly on the function. For each function, add slicWatch to override the alarm and dashboard settings for the corresponding AWS Lambda Function resource.

functions:
  hello:
    handler: basic-handler.hello
    slicWatch:
      dashboard:
        enabled: false    # No Lambda widgets will be created for this function
      alarms:
        Invocations:
          Threshold: 2  # The invocation threshold is specific to
                        # this function's expected invocation count

To disable all alarms for any given function, use:

functions:
  hello:
    handler: basic-handler.hello
    slicWatch:
      alarms:
        enabled: false

SAM/CloudFormation function-level configuration

Resources:
  LambdaFunction1:
    Type: AWS::Serverless::Function  # Can also be applied to AWS::Lambda::Function resources
    Properties:
      Handler: lambda1.functionHandler
    Metadata:
      slicWatch:
        alarms:
          Invocations:
            Threshold: 3
        dashboard:
          enabled: true

To disable all alarms for any given function, use:

Resources:
  LambdaFunction1:
    Type: AWS::Serverless::Function  # Can also be applied to AWS::Lambda::Function resources
    Properties:
      Handler: lambda1.functionHandler
    Metadata:
      slicWatch:
        alarms:
          enabled: false

CDK function-level configuration

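import { CfnResource } from 'aws-cdk-lib'
import * as lambda from 'aws-cdk-lib/aws-lambda'

// Inside the Stack constructor; the runtime and code path are illustrative
const hello = new lambda.Function(this, 'Hello', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'basic-handler.hello',
  code: lambda.Code.fromAsset('src')
})
const cfnFuncHello = hello.node.defaultChild as CfnResource
cfnFuncHello.cfnOptions.metadata = {
  slicWatch: {
    alarms: {
      Invocations: {
        Threshold: 2
      }
    }
  }
}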

A note on CloudWatch cost

This plugin creates additional CloudWatch resources that, apart from a limited free tier, have an associated cost. Depending on what you enable, SLIC Watch creates one dashboard and multiple alarms for each stack. The number of each depends on the number of resources in your stack and the number of stacks you have.

Check out the AWS CloudWatch Pricing page to understand the cost impact of creating CloudWatch resources.
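
To get a feel for how resource counts scale, here is a hypothetical estimator covering only Lambda functions and SQS queues. The per-resource alarm counts are assumptions read from the Features section with default settings (Errors, Throttles and Duration for every function, IteratorAge for functions with an Event Source Mapping, and In Flight Messages Percentage for every queue), and it assumes one dashboard per stack. Check default-config.js and your deployed stacks for the real numbers, then apply the prices from the pricing page.

```typescript
// Hypothetical estimator for the number of CloudWatch resources SLIC Watch
// might create, counting only Lambda functions and SQS queues. The per-resource
// alarm counts are assumptions based on the default settings in Features.

interface StackInventory {
  lambdaFunctions: number
  lambdaFunctionsWithEventSourceMapping: number
  sqsQueues: number
}

interface ResourceEstimate {
  dashboards: number
  alarms: number
}

function estimateStack (stack: StackInventory): ResourceEstimate {
  const lambdaAlarms =
    stack.lambdaFunctions * 3 + // Errors, Throttles, Duration (Invocations disabled by default)
    stack.lambdaFunctionsWithEventSourceMapping // IteratorAge
  // In Flight Messages Percentage only; Age Of Oldest Message is disabled by default
  const sqsAlarms = stack.sqsQueues
  return { dashboards: 1, alarms: lambdaAlarms + sqsAlarms } // Assumes one dashboard per stack
}

function estimateApp (stacks: StackInventory[]): ResourceEstimate {
  return stacks
    .map(estimateStack)
    .reduce(
      (total, s) => ({ dashboards: total.dashboards + s.dashboards, alarms: total.alarms + s.alarms }),
      { dashboards: 0, alarms: 0 }
    )
}

const stack: StackInventory = { lambdaFunctions: 4, lambdaFunctionsWithEventSourceMapping: 1, sqsQueues: 2 }
console.log(estimateApp([stack, stack])) // { dashboards: 2, alarms: 30 }
```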

References

Other Projects

  1. serverless-plugin-aws-alerts
  2. Real World Serverless Application - Serverless Operations
  3. CDK Watchful
  4. CDK Patterns - The CloudWatch Dashboard

Reading

  1. AWS Well Architected Serverless Applications Lens
  2. How to Monitor Lambda with CloudWatch Metrics - Yan Cui

LICENSE

Apache - LICENSE