AWS Monitoring, Audit and Performance
AWS CloudWatch Metrics
• CloudWatch provides metrics for every services in AWS
• Metric is a variable to monitor (CPUUtilization, NetworkIn…)
• Metrics belong to namespaces
• Dimension is an attribute of a metric (instance id, environment, etc…).
• Up to 10 dimensions per metric
• Metrics have timestamps
• Can create CloudWatch dashboards of metrics
EC2 Detailed monitoring
• EC2 instance metrics have metrics “every 5 minutes”
• With detailed monitoring (for a cost), you get data “every 1 minute”
• Use detailed monitoring if you want to scale faster for your ASG!
• The AWS Free Tier allows us to have 10 detailed monitoring metrics
• Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
CloudWatch Custom Metrics
• Possibility to define and send your own custom metrics to CloudWatch
• Example: memory (RAM) usage, disk space, number of logged in users …
• Use API call PutMetricData
• Ability to use dimensions (attributes) to segment metrics
• Instance.id
• Environment.name
• Metric resolution (StorageResolution API parameter – two possible value):
• Standard: 1 minute (60 seconds)
• High Resolution: 1/5/10/30 second(s) – Higher cost
• Important: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
aws cloudwatch put-metric-data --metric-name Buffers --namespace MyNameSpace --unit Bytes --value 231434333 --dimensions InstanceID=1-23456789,InstanceType=m1.small
CloudWatch Dashboards
• Great way to setup custom dashboards for quick access to key metrics and alarms
• Dashboards are global
• Dashboards can include graphs from different AWS accounts and regions
• You can change the time zone & time range of the dashboards
• You can setup automatic refresh (10s, 1m, 2m, 5m, 15m)
• Dashboards can be shared with people who don’t have an AWS account (public, email address, 3rd party SSO provider through Amazon Cognito)
• Pricing:
• 3 dashboards (up to 50 metrics) for free
• $3/dashboard/month afterwards
you can use multiple AWS accounts and regions

CloudWatch Logs
• Log groups: arbitrary name, usually representing an application
• Log stream: instances within application / log files / containers
• Can define log expiration policies (never expire, 30 days, etc..)
• CloudWatch Logs can send logs to:
• Amazon S3 (exports)
• Kinesis Data Streams
• Kinesis Data Firehose
• AWS Lambda
• ElasticSearch
CloudWatch LogsSources
• SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
• Elastic Beanstalk: collection of logs from application
• ECS: collection from containers
• AWS Lambda: collection from function logs
• VPC Flow Logs: VPC specific logs
• API Gateway
• CloudTrail based on filter
• Route53: Log DNS querie
CloudWatch Logs Metric Filter & Insights
• CloudWatch Logs can use filter expressions
• For example, find a specific IP inside of a log
• Or count occurrences of “ERROR” in your logs
• Metric filters can be used to trigger CloudWatch alarms
• CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards

CloudWatch Logs – S3 Export
• Log data can take up to 12 hours to become available for export
• The API call is CreateExportTask
• Not near-real time or real-time… use Logs Subscriptions instead



CloudWatch Logs for EC2
• By default, no logs from your EC2 machine will go to CloudWatch
• You need to run a CloudWatch agent on EC2 to push the log files you want
• Make sure IAM permissions are correct
• The CloudWatch log agent can be setup on-premises too
CloudWatch Logs Agent & Unified Agent
• For virtual servers (EC2 instances, on-premises servers…)
• CloudWatch Logs Agent
• Old version of the agent
• Can only send to CloudWatch Logs
• CloudWatch Unified Agent
• Collect additional system-level metrics such as RAM, processes, etc…
• Collect logs to send to CloudWatch Logs
• Centralized configuration using SSM Parameter Store
CloudWatch Unified Agent – Metrics
• Collected directly on your Linux server / EC2 instance
• CPU (active, guest, idle, system, user, steal)
• Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
• RAM (free, inactive, used, total, cached)
• Netstat (number of TCP and UDP connections, net packets, bytes)
• Processes (total, dead, bloqued, idle, running, sleep)
• Swap Space (free, used, used %)
• Reminder: out-of-the box metrics for EC2 – disk, CPU, network (high level)
CloudWatch Alarms
• Alarms are used to trigger notifications for any metric
• Various options (sampling, %, max, min, etc…)
• Alarm States:
• OK
• INSUFFICIENT_DATA
• ALARM
• Period:
• Length of time in seconds to evaluate the metric
• High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec
CloudWatch Alarm Targets
• Stop, Terminate, Reboot, or Recover an EC2 Instance
• Trigger Auto Scaling Action
• Send notification to SNS (from which you can do pretty much anything)

CloudWatch Events
• Event Pattern: Intercept events from AWS services (Sources)
• Example sources: EC2 Instance Start, CodeBuild Failure, S3, Trusted Advisor
• Can intercept any API call with CloudTrail integration
• Schedule or Cron (example: create an event every 4 hours)
• A JSON payload is created from the event and passed to a target…
• Compute: Lambda, Batch, ECS task
• Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose
• Orchestration: Step Functions, CodePipeline, CodeBuild
• Maintenance: SSM, EC2 Actions
Amazon EventBridge
• EventBridge is the next evolution of CloudWatch Events
• Default Event Bus – generated by AWS services (CloudWatch Events)
• Partner Event Bus – receive events from SaaS service or applications (Zendesk, DataDog, Segment, Auth0…)
• Custom Event Buses – for your own applications
• Event buses can be accessed by other AWS accounts
• You can archive events (all/filter) sent to an event bus (indefinitely or set period)
• Ability to replay archived events
• Rules: how to process the events (like CloudWatch Events)
Amazon EventBridge – Schema Registry
• EventBridge can analyze the events in your bus and infer the schema
• The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
• Schema can be versioned

Amazon EventBridge – Resource-based Policy
• Manage permissions for a specific Event Bus
• Example: allow/deny events from another AWS account or AWS region
• Use case: aggregate all events from your AWS Organization in a single AWS account or AWS region

Amazon EventBridge vs CloudWatch Events
• Amazon EventBridge builds upon and extends CloudWatch Events.
• It uses the same service API and endpoint, and the same underlying service infrastructure.
• EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps.
• Event Bridge has the Schema Registry capability
• EventBridge has a different name to mark the new capabilities
• Over time, the CloudWatch Events name will be replaced with EventBridge.
AWS CloudTrail
• Provides governance, compliance and audit for your AWS Account
• CloudTrail is enabled by default!
• Get an history of events / API calls made within your AWS Account by:
• Console
• SDK
• CLI
• AWS Services
• Can put logs from CloudTrail into CloudWatch Logs or S3
• A trail can be applied to All Regions (default) or a single Region.
• If a resource is deleted in AWS, investigate CloudTrail first!

CloudTrail Events
• Management Events:
• Operations that are performed on resources in your AWS account
• Examples:
• Configuring security (IAM AttachRolePolicy)
• Configuring rules for routing data (Amazon EC2 CreateSubnet)
• Setting up logging (AWS CloudTrail CreateTrail)
• By default, trails are configured to log management events.
• Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
• Data Events:
• By default, data events are not logged (because high volume operations)
• Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
• AWS Lambda function execution activity (the Invoke API)
• CloudTrail Insights Events:
• See next slide
CloudTrail Insights
• Enable CloudTrail Insights to detect unusual activity in your account:
• inaccurate resource provisioning
• hitting service limits
• Bursts of AWS IAM actions
• Gaps in periodic maintenance activity
• CloudTrail Insights analyzes normal management events to create a baseline
• And then continuously analyzes write events to detect unusual patterns
• Anomalies appear in the CloudTrail console
• Event is sent to Amazon S3
• An EventBridge event is generated (for automation needs)


AWS Config
• Helps with auditing and recording compliance of your AWS resources
• Helps record configurations and changes over time
• Questions that can be solved by AWS Config:
• Is there unrestricted SSH access to my security groups?
• Do my buckets have any public access?
• How has my ALB configuration changed over time?
• You can receive alerts (SNS notifications) for any changes
• AWS Config is a per-region service
• Can be aggregated across regions and accounts
• Possibility of storing the configuration data into S3 (analyzed by Athena)
Config Rules
• Can use AWS managed config rules (over 75)
• Can make custom config rules (must be defined in AWS Lambda)
• Ex: evaluate if each EBS disk is of type gp2
• Ex: evaluate if each EC2 instance is t2.micro
• Rules can be evaluated / triggered:
• For each config change
• And / or: at regular time intervals
• AWS Config Rules does not prevent actions from happening (no deny)
• Pricing: no free tier, $0.003 per configuration item recorded per region, $0.001 per config rule evaluation per region

\


CloudWatch vs CloudTrail vs Config
• CloudWatch
• Performance monitoring (metrics, CPU, network, etc…) & dashboards
• Events & Alerting
• Log Aggregation & Analysis
• CloudTrail
• Record API calls made within your Account by everyone
• Can define trails for specific resources
• Global Service
• Config
• Record configuration changes
• Evaluate resources against compliance rules
• Get timeline of changes and complian
For an Elastic Load Balancer
• CloudWatch:
• Monitoring Incoming connections metric
• Visualize error codes as % over time
• Make a dashboard to get an idea of your load balancer performance
• Config:
• Track security group rules for the Load Balancer
• Track configuration changes for the Load Balancer
• Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
• CloudTrail:
• Track who made any changes to the Load Balancer with API calls
Last updated