How to Monitor EC2, CloudWatch, EBS, RDS, ELB, ElastiCache Using Metrics

AWS is a front-runner when it comes to highly available, fault-tolerant, secure and highly scalable services that can integrate with almost everything in the cloud as well as your own data center. For monitoring your VPC (Virtual Private Cloud), AWS has these two well-known services:
– CloudWatch and
– CloudTrail

While ‘CloudTrail’ is primarily used to record the API calls made to other services or applications, ‘CloudWatch’ is used for monitoring metrics and logging events periodically (every 5 minutes by default, or every minute with detailed monitoring). It can be used to monitor:
– Compute resources like EC2 instances, ELBs, Route 53, Auto Scaling groups,
– Storage & CDN resources like S3, CloudFront, EBS Volumes,
– Database and analytics services like DynamoDB, ElastiCache, RDS, Elastic MapReduce, Redshift,
– SNS, SQS etc.
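As a rough illustration of how the services listed above show up in CloudWatch, each one publishes its metrics under its own namespace. The sketch below maps a few of them to their namespaces and builds the keyword arguments one might pass to the boto3 CloudWatch client's `list_metrics` call. The helper names (`namespace_for`, `list_metrics_params`) are made up for this example, and the mapping is only a partial one.

```python
from typing import Dict, Optional

# Partial mapping of the services listed above to their CloudWatch namespaces.
CLOUDWATCH_NAMESPACES: Dict[str, str] = {
    "EC2": "AWS/EC2",
    "ELB": "AWS/ELB",
    "Route53": "AWS/Route53",
    "AutoScaling": "AWS/AutoScaling",
    "S3": "AWS/S3",
    "CloudFront": "AWS/CloudFront",
    "EBS": "AWS/EBS",
    "DynamoDB": "AWS/DynamoDB",
    "ElastiCache": "AWS/ElastiCache",
    "RDS": "AWS/RDS",
    "EMR": "AWS/ElasticMapReduce",
    "Redshift": "AWS/Redshift",
    "SNS": "AWS/SNS",
    "SQS": "AWS/SQS",
}


def namespace_for(service: str) -> str:
    """Return the CloudWatch namespace for a service key (hypothetical helper)."""
    return CLOUDWATCH_NAMESPACES[service]


def list_metrics_params(service: str, metric_name: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for a CloudWatch list_metrics request.

    Assumption: the result is meant to be used as
    boto3.client("cloudwatch").list_metrics(**params).
    """
    params = {"Namespace": namespace_for(service)}
    if metric_name is not None:
        params["MetricName"] = metric_name
    return params


if __name__ == "__main__":
    print(list_metrics_params("EC2", "CPUUtilization"))
    print(list_metrics_params("RDS"))
```

Keeping the request parameters in a plain dictionary makes it possible to inspect them without AWS credentials before any call is made.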
Let’s cover how CloudWatch monitors the different services, one by one.

A- EC2 (Elastic Compute Cloud)

CloudWatch can monitor the following metrics:

# CPU
– CPUCreditUsage (number of CPU credits consumed by the instance; one CPU credit equals one vCPU running at 100% utilization for one minute)
– CPUCreditBalance (number of CPU credits available for the instance to burst beyond its base CPU utilization; credits expire 24 hrs after they are earned)
– CPUUtilization (percentage of allocated EC2 compute units currently in use on the instance)

# Network
– NetworkIn (number of bytes received on all network interfaces by the instance)
– NetworkOut (number of bytes sent out on all network interfaces by the instance)
– NetworkPacketsIn (number of packets received on all network interfaces by the instance)
– NetworkPacketsOut (number of packets sent out on all network interfaces by the instance)

# Disk
– DiskReadOps (completed read operations from all instance store volumes)
– DiskWriteOps (completed write operations to all instance store volumes)
– DiskReadBytes (bytes read from all instance store volumes)
– DiskWriteBytes (bytes written to all instance store volumes)

# Status Check
– StatusCheckFailed (reports whether the instance has passed both the instance status check and the system status check in the last minute; 0 means passed, 1 means failed)
– StatusCheckFailed_Instance (whether the instance has passed the instance status check in the last minute)
– StatusCheckFailed_System (whether the instance has passed the system status check in the last minute)

You can create alarms based on the above metrics to watch the health of the host and the instances.

Rest of the article to be continued…
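To make the EC2 metrics and alarms above a little more concrete, here is a minimal sketch in Python. It does two things: it works out CPU credit consumption from the definition given above (one credit is one vCPU at 100% for one minute), and it builds the parameters for an alarm on an EC2 metric. The function names, the alarm naming scheme, the instance ID and the thresholds are all assumptions chosen for illustration. The dictionary keys follow the keyword arguments of the boto3 CloudWatch client's `put_metric_alarm`, which is not called here.

```python
from typing import Any, Dict

EC2_NAMESPACE = "AWS/EC2"


def credits_consumed(vcpus: int, utilization_percent: float, minutes: float) -> float:
    """CPU credits used, given that 1 credit = 1 vCPU at 100% for 1 minute."""
    return vcpus * (utilization_percent / 100.0) * minutes


def build_alarm_params(
    instance_id: str,
    metric_name: str,
    threshold: float,
    statistic: str = "Average",
    period_seconds: int = 300,   # basic monitoring interval (5 minutes)
    evaluation_periods: int = 2,
) -> Dict[str, Any]:
    """Build alarm parameters for one EC2 instance metric.

    Assumption: the result is meant to be used as
    boto3.client("cloudwatch").put_metric_alarm(**params).
    The alarm fires when the statistic is >= threshold for
    `evaluation_periods` consecutive periods.
    """
    return {
        "AlarmName": f"{instance_id}-{metric_name}",  # hypothetical naming scheme
        "Namespace": EC2_NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


if __name__ == "__main__":
    # One vCPU at 50% utilization for an hour uses 30 credits.
    print(credits_consumed(vcpus=1, utilization_percent=50, minutes=60))

    # Placeholder instance ID; thresholds are illustrative, not recommendations.
    cpu_alarm = build_alarm_params("i-0123456789abcdef0", "CPUUtilization", 80.0)
    status_alarm = build_alarm_params(
        "i-0123456789abcdef0",
        "StatusCheckFailed",
        threshold=1.0,
        statistic="Maximum",
        period_seconds=60,
    )
    print(cpu_alarm["AlarmName"], status_alarm["AlarmName"])
```

Using `Maximum` for StatusCheckFailed means a single failed check within the period is enough to count, while `Average` smooths out short CPU spikes. Either choice depends on how sensitive the alarm should be.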