This document is provided for informational purposes only. It represents the current product offerings and practices from Amazon Web Services (AWS) as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided “as is” without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.
© 2021 Amazon Web Services, Inc. or its affiliates. All Rights Reserved. This work is licensed under a Creative Commons Attribution 4.0 International License.
This AWS Content is provided subject to the terms of the AWS Customer Agreement available at http://aws.amazon.com/agreement or other written agreement between the Customer and either Amazon Web Services, Inc. or Amazon Web Services EMEA SARL or both.
- The playbooks should be stored in markdown format in a git repository.
- Create a print-friendly, self-contained version of each playbook for sharing with those who do not have access to the git repository.
- It is recommended that incident response team members are allowed to branch the git project and clone to their local environment, such as PC, VDI, or laptop.
- When improvements are made to the branch, submit them to be reviewed, approved, and merged into the master branch.
- The git project will host documentation and code.
- Consider using a CI/CD pipeline to facilitate playbook governance and deployment.
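As an illustration of the kind of governance check a pipeline could run, the sketch below reports which of the six playbook sections described in this document are missing from a playbook's markdown text. It is a minimal sketch, not part of the playbook template: the function name `missing_sections` and the assumption that each section appears as a markdown heading (optionally prefixed with its number, such as `## 1. Threat`) are hypothetical and should be adapted to the headings your playbooks actually use.

```python
from __future__ import annotations

import re

# Section titles taken from the playbook structure described in this document.
# Assumption: each one appears as a markdown heading in the playbook file.
REQUIRED_SECTIONS = [
    "Threat",
    "Endgame",
    "Response steps",
    "Simulation",
    "Incident classification, handling, and detection",
    "Incident handling process",
]

# Matches a markdown heading and captures its title without the "#" marks
# and without an optional leading section number such as "1. ".
HEADING_PATTERN = re.compile(r"^#{1,6}\s+(?:\d+\.\s*)?(.+?)\s*$")


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section titles that have no matching heading."""
    headings = []
    for line in markdown_text.splitlines():
        match = HEADING_PATTERN.match(line)
        if match:
            headings.append(match.group(1).lower())
    # A heading counts as a match if it starts with the required title,
    # so "Simulation [CODE]" satisfies "Simulation".
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(heading.startswith(section.lower()) for heading in headings)
    ]


if __name__ == "__main__":
    sample = "# Example playbook\n## 1. Threat\n## 2. Endgame\n"
    for section in missing_sections(sample):
        print(f"missing section: {section}")
```

A pipeline step could run this against every playbook file in the repository and fail the build when the returned list is not empty.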
- Threat: Describes the threat that is being addressed by the playbook.
- Endgame: Describes the desired outcomes for the playbook based on the security perspective of the AWS Cloud Adoption Framework (CAF) and industry-accepted security patterns, such as vulnerability assessment and impact analysis.
- Response steps: Provides a step-by-step procedure, in chronological order, to respond to the event based on NIST 800-61r2 - Computer Security Incident Handling Guide. Refer to figure A.
- Simulation [CODE]: Provides a step-by-step procedure to generate the indicators required to trigger the alert that initiates the response.
- Incident classification, handling, and detection: Categorizes the playbook per MITRE ATT&CK enterprise tactics, and enumerates the tools required for running the playbook, the indicators (a.k.a. findings) used for detection and alert generation, the log sources required to generate indicators and facilitate analysis, and the teams involved.
- Incident handling process: Prescriptive guidance to follow for each discipline of the response. These disciplines are not listed in chronological order of execution; they are for reference while going through section 3 (Response steps). Throughout the response process, it is important to document all actions performed and centralize all collected evidence in a known repository with proper entitlements based on the incident response team RACI. It is also essential to have suitable communication channels during the response, with centralized orchestration capabilities that enable incident response team members to focus on the tasks within their specialties. The centralized orchestrating function keeps communications flowing and makes sure all required activities are being performed with business knowledge and approval.
- Analysis - alert validation: The contents of the alert notification need to be verified directly against the source generating the indicators (i.e., the “ground truth”). This is required for at least two reasons. First, the original indicators are generally transformed as they flow through different systems before reaching the incident response team, which could cause unintended modification of critical incident data. Second, the notification could have been generated due to a machine or human configuration error.
- Analysis - alert triage: Based on the indicators presented by the alert, search the logs that were used to generate them to build context that facilitates the analysis.
- Analysis - scope: Search for evidence in the available and alert-related log sources to determine the activity the actor has performed throughout the life cycle of the event.
- Analysis - impact: Determine which workloads and components have been affected by the actor’s activity, and to what extent. Formulate hypotheses to discuss with the greater incident response team to determine the impact to the business and prioritize next steps.
- Containment: Determine the containment strategy for the incident during or at the end of the analysis phase. The appropriate strategy depends on scope and impact, and must be approved by workload owners. Containment activities should be aligned with the affected workload’s threat model and the actual context during incident response. A few different or complementary containment options should be available to minimize possible collateral effects of containment actions, such as downtime or data destruction. During this phase, due care must be taken during evidence collection. Although forensic data aggregation started during analysis (e.g., CloudTrail logs, VPC Flow Logs, and GuardDuty logs), this is the most appropriate time for snapshots and backups, as the services are reconfigured to prevent further activity (e.g., EC2 instance snapshots, RDS database backups, and Lambda trigger removal).
- Eradication & Recovery: Resources provisioned by the adversary are disabled or completely removed, and the vulnerabilities and configuration issues of all affected resources are identified. After workload owner approval, all resources are properly reconfigured and security updates are applied to reduce the likelihood of success of the same or similar exploits. The security posture of the workload is then reassessed. If, after configuration changes and security updates are applied, the security posture of the workload still poses an unacceptable level of risk for the business, recommendations for medium- and long-term architectural changes should be drafted and submitted for evaluation by the responsible and accountable teams.
- Post-incident activity: After all previous phases are completed, an in-depth analysis of the incident is performed in the form of “lessons learned” sessions. The incident response timeline is published and discussed, focusing on the changes that need to be made to enhance preparedness for the next security incident. During this phase, the incident response team focuses on enhancing directive, preventative, detective, and responsive controls (refer to figure B), not only for the affected workload but for all workloads owned by the enterprise.
Figure A - Incident Response Life Cycle
Figure B - Security Perspective of the AWS Cloud Adoption Framework
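The alert validation discipline described above can be pictured as a field-by-field comparison between the notification the team received and the finding retrieved directly from the source. The sketch below is illustrative only: the function name `validate_alert` and the field names (`finding_id`, `account_id`, `region`, `severity`) are hypothetical and do not correspond to the schema of any specific AWS service. In practice, the source record would be fetched from the service that generated the indicators.

```python
from __future__ import annotations


def validate_alert(
    notification: dict, source_finding: dict, fields: list[str]
) -> dict:
    """Compare a notification against the source finding ("ground truth").

    Returns a dict mapping each differing field to a tuple of
    (value in the notification, value in the source finding).
    An empty dict means the notification matches the source on all fields.
    """
    mismatches = {}
    for field in fields:
        notified = notification.get(field)
        actual = source_finding.get(field)
        if notified != actual:
            mismatches[field] = (notified, actual)
    return mismatches


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    fields = ["finding_id", "account_id", "region", "severity"]
    notification = {
        "finding_id": "example-finding-1",
        "account_id": "111122223333",
        "region": "us-east-1",
        "severity": "HIGH",
    }
    source_finding = dict(notification, severity="MEDIUM")
    print(validate_alert(notification, source_finding, fields))
    # prints {'severity': ('HIGH', 'MEDIUM')}
```

Any mismatch is worth recording with the rest of the evidence, since it may point to a transformation or configuration error somewhere between the source and the incident response team.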
- Select the threat the playbook will address and describe it in section 1 (Threat). Provide as many references as needed to help the playbook reader understand it.
- Review section 2 (Endgame) of the playbook template and make changes or keep it as is. The outcomes listed there are based on AWS security and industry patterns such as the Security Perspective of the CAF, Security pillar of the AWS Well-Architected Framework, Amazon Web Services: Overview of Security Processes, AWS Security Incident Response Guide, and NIST Special Publication 800-61r2 Computer Security Incident Handling Guide.
- Fill out section 5 (Incident classification, handling, and detection) with the appropriate information. You can return to this section later if, while building other sections of the playbook, you uncover new indicators or other tools you might want to use.
- Define steps to trigger the indicators for the threat. Document the process, AWS resources, IAM principals, policies, and code required, such as AWS CLI commands or AWS SDK based code, preferably wrapped in a shell script or a program written in a supported language. Add screenshots illustrating the various logs generated by the simulation activity, using an analytical tool such as CloudWatch Logs Insights or Athena.
- Develop response steps under section 3 (Response steps), highlighting the NIST IR life cycle phase each step belongs to. The steps should be enumerated in chronological order, aligned with the affected workload’s threat model. Make it clear in the playbook that the chronological order is not immutable and can be changed depending on the context of the event. It is recommended that any deviation from the established execution order go through a previously approved vetting process, to minimize the risk of actions that could further damage the affected workload.
- Create AWS CLI commands and Athena queries that support each phase of the NIST IR life cycle in section 6 (Incident handling process). The series of queries and commands should fulfill the requirements of each phase from a general perspective, and their output should be documented in the form of screenshots. The commands are run against the affected account using a role similar or equal to the incident responder’s entitlements. The queries are run with Athena against the related log repositories, at a minimum CloudTrail and VPC Flow Logs. If the logs that need to be analyzed are not available through Athena, use whatever analytical tool is available, such as a SIEM or big data solution.
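To make the last step more concrete, the sketch below builds an Athena query that lists the CloudTrail activity of a single access key within a time window, which is the kind of query the scope analysis calls for. It rests on several assumptions that should be checked against your environment: the table name `cloudtrail_logs` is hypothetical, the column names (`eventtime`, `eventsource`, `eventname`, `awsregion`, `sourceipaddress`, `useridentity`) assume a table created with the layout AWS documents for querying CloudTrail logs with Athena, and `eventtime` is assumed to be stored as an ISO 8601 string. The function name `build_scope_query` and the access key format check are illustrative only.

```python
import re
from datetime import datetime, timezone

# Sanity checks so that untrusted values are never placed in the SQL text.
# Assumption: access key IDs are 16 to 128 uppercase alphanumeric characters.
ACCESS_KEY_PATTERN = re.compile(r"^[A-Z0-9]{16,128}$")
TABLE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_scope_query(access_key_id, start, end, table="cloudtrail_logs"):
    """Return an Athena SQL string listing activity for one access key.

    `start` and `end` must be timezone-aware datetimes; they are converted
    to UTC and compared as ISO 8601 strings against `eventtime`.
    """
    if not ACCESS_KEY_PATTERN.match(access_key_id):
        raise ValueError("access_key_id has an unexpected format")
    if not TABLE_PATTERN.match(table):
        raise ValueError("table name has an unexpected format")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be earlier than end")

    start_text = start.astimezone(timezone.utc).strftime(TIME_FORMAT)
    end_text = end.astimezone(timezone.utc).strftime(TIME_FORMAT)
    return (
        "SELECT eventtime, eventsource, eventname, awsregion,\n"
        "       sourceipaddress, useridentity.arn\n"
        f"FROM {table}\n"
        f"WHERE useridentity.accesskeyid = '{access_key_id}'\n"
        f"  AND eventtime BETWEEN '{start_text}' AND '{end_text}'\n"
        "ORDER BY eventtime"
    )


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    print(
        build_scope_query(
            "AKIAIOSFODNN7EXAMPLE",
            datetime(2021, 1, 1, tzinfo=timezone.utc),
            datetime(2021, 1, 2, tzinfo=timezone.utc),
        )
    )
```

The resulting text can be pasted into the Athena console or submitted through whichever client the team already uses. Keeping the query in the repository alongside a screenshot of its output documents both the command and its expected result.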