AWS Monitoring, Audit and Performance

AWS CloudWatch Metrics

• CloudWatch provides metrics for every service in AWS

• A metric is a variable to monitor (CPUUtilization, NetworkIn…)

• Metrics belong to namespaces

• A dimension is an attribute of a metric (instance ID, environment, etc.)

• Up to 30 dimensions per metric (the limit was 10 before 2022)

• Metrics have timestamps

• You can create CloudWatch dashboards of metrics

EC2 Detailed monitoring

• By default, EC2 instance metrics are reported every 5 minutes

• With detailed monitoring (for a cost), you get data every 1 minute

• Use detailed monitoring if you want your ASG to scale faster

• The AWS Free Tier allows 10 detailed monitoring metrics

• Note: EC2 memory usage is not pushed by default (it must be pushed from inside the instance as a custom metric)

CloudWatch Custom Metrics

• You can define and send your own custom metrics to CloudWatch

• Examples: memory (RAM) usage, disk space, number of logged-in users…

• Use the PutMetricData API call

• You can use dimensions (attributes) to segment metrics

  • Instance.id

  • Environment.name

• Metric resolution (StorageResolution API parameter – two possible values):

  • Standard: 1 minute (60 seconds)

  • High resolution: 1/5/10/30 second(s), at a higher cost

• Important: CloudWatch accepts metric data points up to two weeks in the past and up to two hours in the future (make sure your EC2 instance clock is configured correctly)
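The points above can be sketched as a small helper that shapes one PutMetricData entry and rejects timestamps outside the accepted window. This is a minimal sketch, not a definitive implementation: the namespace, metric name, dimension values, and the helper name `build_metric_datum` are all hypothetical, and the request is only built locally, not sent.

```python
from datetime import datetime, timedelta, timezone

# Window stated in the notes: two weeks in the past, two hours in the future.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def build_metric_datum(name, value, dimensions, timestamp,
                       unit="Percent", high_resolution=False, now=None):
    """Return one MetricData entry shaped like a PutMetricData request item."""
    now = now or datetime.now(timezone.utc)
    if timestamp < now - MAX_PAST or timestamp > now + MAX_FUTURE:
        raise ValueError("timestamp is outside the window CloudWatch accepts")
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": timestamp,
        "Value": float(value),
        "Unit": unit,
        # StorageResolution: 1 = high resolution, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
datum = build_metric_datum(
    "MemoryUsage",  # hypothetical metric name
    63.5,
    {"InstanceId": "i-0123456789abcdef0", "Environment": "dev"},  # hypothetical
    timestamp=now,
    now=now,
)
request = {"Namespace": "MyApp", "MetricData": [datum]}  # hypothetical namespace

# Assuming boto3 is installed and credentials are configured, this could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**request)
```

Checking the timestamp on the client side is a design choice for illustration only; it surfaces a misconfigured instance clock before the data point is sent.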

CloudWatch Dashboards

• Great way to set up custom dashboards for quick access to key metrics and alarms

• Dashboards are global

• Dashboards can include graphs from different AWS accounts and regions

• You can change the time zone and time range of the dashboards

• You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)

• Dashboards can be shared with people who don’t have an AWS account (public, email address, third-party SSO provider through Amazon Cognito)

• Pricing:

  • 3 dashboards (up to 50 metrics) for free

  • $3/dashboard/month afterwards


CloudWatch Logs

• Log groups: arbitrary name, usually representing an application

• Log streams: instances within an application / log files / containers

• You can define log expiration policies (never expire, 30 days, etc.)

• CloudWatch Logs can send logs to:

  • Amazon S3 (exports)

  • Kinesis Data Streams

  • Kinesis Data Firehose

  • AWS Lambda

  • Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)

CloudWatch Logs – Sources

• SDK, CloudWatch Logs Agent, CloudWatch Unified Agent

• Elastic Beanstalk: collection of logs from the application

• ECS: collection from containers

• AWS Lambda: collection from function logs

• VPC Flow Logs: VPC-specific logs

• API Gateway

• CloudTrail, based on a filter

• Route 53: logs DNS queries

CloudWatch Logs Metric Filter & Insights

• CloudWatch Logs can use filter expressions

  • For example, find a specific IP address inside a log

  • Or count occurrences of “ERROR” in your logs

• Metric filters can be used to trigger CloudWatch alarms

• CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch dashboards
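To make the idea of a metric filter concrete, here is a local simulation of the two examples above: finding an IP address and counting “ERROR” lines, then comparing the count with an alarm threshold. It is only an illustration in plain Python; the sample log lines, the threshold, and the helper name `count_matches` are hypothetical, and it does not use the CloudWatch filter pattern syntax.

```python
def count_matches(log_lines, term):
    """Count log lines that contain the given term (case-sensitive)."""
    return sum(1 for line in log_lines if term in line)


# Hypothetical log events, one string per event.
sample_logs = [
    "2024-01-01T12:00:00Z INFO request served to 10.0.0.12",
    "2024-01-01T12:00:01Z ERROR upstream timeout for 10.0.0.45",
    "2024-01-01T12:00:02Z ERROR upstream timeout for 10.0.0.12",
]

# "Find a specific IP inside of a log"
lines_for_ip = [line for line in sample_logs if "10.0.0.12" in line]

# "Count occurrences of ERROR": this count plays the role of the metric value
error_count = count_matches(sample_logs, "ERROR")

# An alarm on that metric would compare it with a threshold (hypothetical value).
ALARM_THRESHOLD = 1
should_alarm = error_count > ALARM_THRESHOLD
```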

CloudWatch Logs – S3 Export

• Log data can take up to 12 hours to become available for export

• The API call is CreateExportTask

• Exports are neither real-time nor near-real-time… use Logs Subscriptions instead
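A rough sketch of the parameters for a CreateExportTask call follows. The parameter names follow the boto3 `create_export_task` keyword arguments as I understand them (`fromTime` and `to` are in milliseconds since the epoch); the log group, bucket, prefix, and helper names are hypothetical, and nothing is sent to AWS.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert an aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task(log_group, bucket, start, end, prefix="exported-logs"):
    """Return keyword arguments shaped for a CreateExportTask request."""
    return {
        "taskName": "export-" + log_group.strip("/").replace("/", "-"),
        "logGroupName": log_group,
        "fromTime": to_epoch_millis(start),
        "to": to_epoch_millis(end),
        "destination": bucket,          # name of the target S3 bucket
        "destinationPrefix": prefix,
    }


params = build_export_task(
    "/my-app/prod",                     # hypothetical log group
    "my-log-archive-bucket",            # hypothetical bucket
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)

# Assuming boto3 is installed and credentials are configured:
#   boto3.client("logs").create_export_task(**params)
```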

CloudWatch Logs for EC2

• By default, no logs from your EC2 machine will go to CloudWatch

• You need to run a CloudWatch agent on EC2 to push the log files you want

• Make sure IAM permissions are correct

• The CloudWatch log agent can be set up on-premises too

CloudWatch Logs Agent & Unified Agent

• For virtual servers (EC2 instances, on-premises servers…)

• CloudWatch Logs Agent

  • Old version of the agent

  • Can only send to CloudWatch Logs

• CloudWatch Unified Agent

  • Collects additional system-level metrics such as RAM, processes, etc.

  • Collects logs to send to CloudWatch Logs

  • Centralized configuration using SSM Parameter Store

CloudWatch Unified Agent – Metrics

• Collected directly on your Linux server / EC2 instance

  • CPU (active, guest, idle, system, user, steal)

  • Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)

  • RAM (free, inactive, used, total, cached)

  • Netstat (number of TCP and UDP connections, net packets, bytes)

  • Processes (total, dead, blocked, idle, running, sleep)

  • Swap Space (free, used, used %)

• Reminder: the out-of-the-box metrics for EC2 are disk, CPU, and network (high level)
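As an illustration of what the Unified Agent is told to collect, the sketch below builds a small agent configuration as a Python dictionary and serializes it to JSON. The section and measurement names follow the agent configuration file format as I understand it; the log file path, log group name, and the selection of measurements are hypothetical choices for this example.

```python
import json

# Minimal, illustrative Unified Agent configuration (not a complete reference).
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/my-app/app.log",  # hypothetical path
                        "log_group_name": "my-app",              # hypothetical group
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

# The JSON text is what would be stored, for example, in SSM Parameter Store.
config_json = json.dumps(agent_config, indent=2)
```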

CloudWatch Alarms

• Alarms are used to trigger notifications for any metric

• Various options (sampling, %, max, min, etc.)

• Alarm States:

  • OK

  • INSUFFICIENT_DATA

  • ALARM

• Period:

  • Length of time in seconds to evaluate the metric

  • High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec
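The three alarm states and the period rule can be sketched with a deliberately simplified evaluator. This is an assumption-laden model, not how CloudWatch is implemented: it only handles a "greater than threshold" comparison, treats any missing datapoint as not breaching, and the function names are hypothetical.

```python
def is_valid_period(seconds):
    """Periods allowed for high resolution metrics: 10, 30, or a multiple of 60."""
    return seconds in (10, 30) or (seconds > 0 and seconds % 60 == 0)


def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return the alarm state for the most recent evaluation periods.

    datapoints: one statistic value per period, oldest first; None means missing.
    """
    recent = datapoints[-evaluation_periods:]
    present = [v for v in recent if v is not None]
    if len(recent) < evaluation_periods or not present:
        return "INSUFFICIENT_DATA"
    # Simplification: ALARM only when every period has data and breaches.
    if len(present) == evaluation_periods and all(v > threshold for v in present):
        return "ALARM"
    return "OK"


# Hypothetical CPU utilization values, one per period.
state = evaluate_alarm([50, 85, 90, 95], threshold=80, evaluation_periods=3)
```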

CloudWatch Alarm Targets

• Stop, Terminate, Reboot, or Recover an EC2 instance

• Trigger an Auto Scaling action

• Send a notification to SNS (from which you can do pretty much anything)

CloudWatch Events

• Event Pattern: intercept events from AWS services (sources)

  • Example sources: EC2 Instance Start, CodeBuild Failure, S3, Trusted Advisor

  • Can intercept any API call with CloudTrail integration

• Schedule or Cron (example: create an event every 4 hours)

• A JSON payload is created from the event and passed to a target…

  • Compute: Lambda, Batch, ECS task

  • Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose

  • Orchestration: Step Functions, CodePipeline, CodeBuild

  • Maintenance: SSM, EC2 Actions
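To show what an event pattern and a schedule look like, the sketch below defines a pattern for EC2 instances entering the running state, a sample event, and a tiny matcher. The matcher is a simplified stand-in that only supports exact-value matching and nested fields; the real service supports more operators. The instance ID and the `matches` helper are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    Leaf values in a pattern are lists of allowed values; nested dicts recurse.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Event pattern: EC2 instances that enter the "running" state.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

# Trimmed sample of the JSON payload passed to a target (hypothetical instance ID).
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

is_match = matches(pattern, event)

# Schedule alternative: a rule that fires every 4 hours.
schedule_expression = "rate(4 hours)"
```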

Amazon EventBridge

β€’ EventBridge is the next evolution of CloudWatch Events

• Default Event Bus – generated by AWS services (CloudWatch Events)

• Partner Event Bus – receives events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0…)

• Custom Event Buses – for your own applications

• Event buses can be accessed by other AWS accounts

• You can archive events (all or filtered) sent to an event bus (indefinitely or for a set period)

• Ability to replay archived events

• Rules: how to process the events (like CloudWatch Events)

Amazon EventBridge – Schema Registry

• EventBridge can analyze the events in your bus and infer the schema

• The Schema Registry allows you to generate code for your application that knows in advance how data is structured in the event bus

• Schemas can be versioned

Amazon EventBridge – Resource-based Policy

• Manage permissions for a specific Event Bus

• Example: allow/deny events from another AWS account or AWS region

• Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region
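A resource-based policy of this kind might look like the sketch below, which builds a policy document allowing another account to put events on a central bus. The account IDs, bus name, region, and the helper `allow_put_events` are hypothetical; the document is only constructed locally.

```python
import json


def allow_put_events(bus_arn, source_account_id):
    """Build a policy document letting another account send events to the bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"AllowAccount{source_account_id}",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "events:PutEvents",
                "Resource": bus_arn,
            }
        ],
    }


# Hypothetical central bus and source account.
central_bus_arn = "arn:aws:events:us-east-1:123456789012:event-bus/central-event-bus"
policy = allow_put_events(central_bus_arn, "111122223333")
policy_json = json.dumps(policy, indent=2)
```

For the aggregation use case, one statement per source account (or a condition on the organization) would be attached to the bus in the central account.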

Amazon EventBridge vs CloudWatch Events

• Amazon EventBridge builds upon and extends CloudWatch Events.

• It uses the same service API and endpoint, and the same underlying service infrastructure.

• EventBridge lets you add event buses for your custom applications and your third-party SaaS apps.

• EventBridge has the Schema Registry capability

• EventBridge has a different name to mark the new capabilities

• Over time, the CloudWatch Events name will be replaced with EventBridge.

AWS CloudTrail

• Provides governance, compliance and audit for your AWS Account

• CloudTrail is enabled by default!

• Get a history of events / API calls made within your AWS Account by:

  • Console

  • SDK

  • CLI

  • AWS Services

• Can put logs from CloudTrail into CloudWatch Logs or S3

• A trail can be applied to All Regions (default) or a single Region.

• If a resource is deleted in AWS, investigate CloudTrail first!

CloudTrail Events

• Management Events:

  • Operations that are performed on resources in your AWS account

  • Examples:

    • Configuring security (IAM AttachRolePolicy)

    • Configuring rules for routing data (Amazon EC2 CreateSubnet)

    • Setting up logging (AWS CloudTrail CreateTrail)

  • By default, trails are configured to log management events.

  • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)

• Data Events:

  • By default, data events are not logged (because they are high-volume operations)

  • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events

  • AWS Lambda function execution activity (the Invoke API)

• CloudTrail Insights Events:

  • See the CloudTrail Insights section below
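The management/data and read/write distinctions can be illustrated by classifying a few trimmed records. This sketch assumes each record carries the `eventCategory` and `readOnly` fields found in CloudTrail log records; the sample records are hypothetical and heavily shortened, and `classify` is an illustrative helper.

```python
def classify(record):
    """Return (category, access) for a trimmed CloudTrail record."""
    access = "Read" if record["readOnly"] else "Write"
    return record["eventCategory"], access


# Hypothetical, heavily trimmed records.
records = [
    {"eventSource": "iam.amazonaws.com", "eventName": "AttachRolePolicy",
     "readOnly": False, "eventCategory": "Management"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances",
     "readOnly": True, "eventCategory": "Management"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "readOnly": True, "eventCategory": "Data"},
]

classified = {r["eventName"]: classify(r) for r in records}

# Names of the management write events in the sample.
management_writes = [
    name for name, (category, access) in classified.items()
    if category == "Management" and access == "Write"
]
```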

CloudTrail Insights

• Enable CloudTrail Insights to detect unusual activity in your account:

  • Inaccurate resource provisioning

  • Hitting service limits

  • Bursts of AWS IAM actions

  • Gaps in periodic maintenance activity

• CloudTrail Insights analyzes normal management events to create a baseline

• It then continuously analyzes write events to detect unusual patterns

  • Anomalies appear in the CloudTrail console

  • An event is sent to Amazon S3

  • An EventBridge event is generated (for automation needs)

AWS Config

• Helps with auditing and recording compliance of your AWS resources

• Helps record configurations and changes over time

• Questions that can be answered by AWS Config:

  • Is there unrestricted SSH access to my security groups?

  • Do my buckets have any public access?

  • How has my ALB configuration changed over time?

• You can receive alerts (SNS notifications) for any changes

• AWS Config is a per-region service

• Can be aggregated across regions and accounts

• Possibility of storing the configuration data in S3 (to be analyzed by Athena)

Config Rules

• Can use AWS managed config rules (over 75)

• Can make custom config rules (must be defined in AWS Lambda)

  • Ex: evaluate if each EBS disk is of type gp2

  • Ex: evaluate if each EC2 instance is t2.micro

• Rules can be evaluated / triggered:

  • For each config change

  • And / or: at regular time intervals

• AWS Config Rules do not prevent actions from happening (no deny)

• Pricing: no free tier, $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region
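The gp2 example and the pricing line can be sketched as two small functions. The evaluation function is only the core check a custom rule's Lambda function might perform, assuming a configuration item with `resourceType` and `configuration.volumeType` fields; the volume ID, the usage numbers, and both helper names are hypothetical. The prices are the ones quoted in the notes.

```python
from decimal import Decimal


def evaluate_volume(configuration_item, required_type="gp2"):
    """Return a compliance type for one configuration item."""
    if configuration_item["resourceType"] != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    volume_type = configuration_item["configuration"]["volumeType"]
    return "COMPLIANT" if volume_type == required_type else "NON_COMPLIANT"


def estimate_cost(items_recorded, rule_evaluations):
    """Cost in USD for one region, using the prices quoted in the notes."""
    return Decimal("0.003") * items_recorded + Decimal("0.001") * rule_evaluations


# Hypothetical configuration item for an EBS volume.
item = {
    "resourceType": "AWS::EC2::Volume",
    "resourceId": "vol-0123456789abcdef0",
    "configuration": {"volumeType": "io1"},
}
result = evaluate_volume(item)

# Hypothetical usage: 1,000 items recorded and 5,000 rule evaluations.
cost = estimate_cost(1000, 5000)
```

Note that the rule only reports non-compliance; as stated above, it does not block the change.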


CloudWatch vs CloudTrail vs Config

• CloudWatch

  • Performance monitoring (metrics, CPU, network, etc.) & dashboards

  • Events & Alerting

  • Log Aggregation & Analysis

• CloudTrail

  • Record API calls made within your Account by everyone

  • Can define trails for specific resources

  • Global Service

• Config

  • Record configuration changes

  • Evaluate resources against compliance rules

  • Get timeline of changes and compliance

For an Elastic Load Balancer

• CloudWatch:

  • Monitor the incoming connections metric

  • Visualize error codes as % over time

  • Make a dashboard to get an idea of your load balancer performance

• Config:

  • Track security group rules for the Load Balancer

  • Track configuration changes for the Load Balancer

  • Ensure an SSL certificate is always assigned to the Load Balancer (compliance)

• CloudTrail:

  • Track who made any changes to the Load Balancer with API calls
