AWS Elastic Load Balancing provides redundant access to computing resources such as EC2 instances. By placing several instances behind a single load balancer, you ensure that traffic only goes to active instances: if any node becomes unhealthy, the load balancer automatically steers traffic away from it. This is configured using target groups, which group instances together in a pool and define what healthy means. While this system provides various reporting facilities, there’s no direct way to filter the health status of those targets with custom rules, so that you are notified if a target becomes unhealthy but not if, for example, the instance has been shut down on purpose for maintenance. This is where a Lambda function can come in handy.
For this tutorial, we’ll need the following resources:
- One or more EC2 instances mapped to one or more target groups. These are the instances whose health we will monitor.
- An ELB load balancer driving web traffic to those target groups. You can create one on the EC2 page in the AWS console, mapping specific hostnames to target groups.
- An SNS topic set up to receive notifications. This can be mapped to an email address or SMS phone number, so that you will get alerts when an instance becomes unhealthy.
The type of application you’re using doesn’t matter, nor does the type of target. When you add instances to a target group, you configure what counts as healthy and unhealthy, such as a specific HTTP status code returned by the health check. All our Lambda function will do is check whether each target is reported as healthy or not.
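Once your resources are deployed, create a new Lambda function using a Python runtime: in the AWS console, go to Lambda and select Create function. The function’s execution role must be allowed to call ec2:DescribeRegions, elasticloadbalancing:DescribeTargetGroups, elasticloadbalancing:DescribeTargetHealth and sns:Publish. You will be shown a basic function structure to which we’ll add some code.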
First, let’s import boto3, the AWS SDK for Python, and enumerate regions:
import boto3
import json
results = []
client = boto3.client('ec2', region_name='us-east-1')
regions = client.describe_regions()['Regions']
Note that if you’re only using one region, you could skip that part. Then let’s enumerate all target groups in each region:
for region in regions:
    elb = boto3.client('elbv2', region_name=region['RegionName'])
    tgs = elb.describe_target_groups()
Now, let’s enumerate the health of all targets in each target group and pull the health information of each target, including the instance ID, the port being monitored by the load balancer, the state (healthy or unhealthy) and the description. Note that the description is not always provided:
    for tg in tgs['TargetGroups']:
        tgh = elb.describe_target_health(TargetGroupArn=tg['TargetGroupArn'])
        for t in tgh['TargetHealthDescriptions']:
            result = {'id': t['Target']['Id'], 'port': t['Target']['Port'], 'state': t['TargetHealth']['State'], 'reason': ""}
            if 'Description' in t['TargetHealth']:
                result['reason'] = t['TargetHealth']['Description']
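The same extraction can be written as a small helper, which makes it easy to try against a sample response without calling AWS. This is only a minimal sketch: the function name `summarize_target` and the sample data are hypothetical, and it uses `.get()` on the assumption that both `Port` and `Description` may be absent from an entry.

```python
def summarize_target(t):
    """Flatten one entry of TargetHealthDescriptions into a small dict."""
    health = t['TargetHealth']
    return {
        'id': t['Target']['Id'],
        # Assumption: Port may be missing for some target types.
        'port': t['Target'].get('Port'),
        'state': health['State'],
        # Description is not always provided, so fall back to an empty string.
        'reason': health.get('Description', ""),
    }


# Hypothetical sample shaped like one describe_target_health entry.
sample = {
    'Target': {'Id': 'i-0123456789abcdef0', 'Port': 80},
    'TargetHealth': {'State': 'healthy'},
}
print(summarize_target(sample))
```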
The main reason to write this function is that we can now decide when we want to be alerted about a target’s health status. For example, we want to be notified when a target is in any state other than healthy or initial (the state a target is in while it is still being registered). However, we don’t want to be notified if the instance has simply been stopped, only if there is an actual failure. So let’s add the result to our list only if those conditions are true:
            if (result['state'] != "healthy" and result['state'] != "initial" and result['reason'] != "Target is in the stopped state"):
                results.append("Instance {} on port {} reported {} status. Reason: {}".format(result['id'], result['port'], result['state'], result['reason']))
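The condition above can also be factored into a helper so that further rules are easy to add in one place. The following is a sketch, not part of the original function: the names `IGNORED_STATES`, `IGNORED_REASONS`, `should_alert` and `format_alert` are made up for illustration, and the sets simply restate the three checks from the `if` statement.

```python
# States and reasons that should not trigger an alert (hypothetical names).
IGNORED_STATES = {"healthy", "initial"}
IGNORED_REASONS = {"Target is in the stopped state"}


def should_alert(result):
    """Return True when a target's status deserves a notification."""
    if result['state'] in IGNORED_STATES:
        return False
    if result['reason'] in IGNORED_REASONS:
        return False
    return True


def format_alert(result):
    """Build the same message text the tutorial appends to the results list."""
    return "Instance {} on port {} reported {} status. Reason: {}".format(
        result['id'], result['port'], result['state'], result['reason'])
```

With this layout, silencing targets that are being deregistered would, for instance, only require adding "draining" to `IGNORED_STATES`.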
Here you can add any additional conditions you want. Finally, all we have left to do is send a message off to our SNS topic if there are any positive results:
if len(results) > 0:
    client = boto3.client('sns')
    response = client.publish(
        TargetArn="arn:aws:sns:us-east-1:XXXXXXXXXXXXXXX",
        Message=json.dumps({'default': json.dumps(results, sort_keys=False, indent=4)}),
        Subject='Targets health failure',
        MessageStructure='json'
    )
You’ll need to change the TargetArn to correspond to your own topic.
Here is the finished code:
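Put together, the snippets above might look like the following sketch. A few things here are assumptions rather than part of the steps above: the handler name `lambda_handler` (the console's default), the split into a `collect_failures` helper, the `TOPIC_ARN` constant, and the return value. The results list is kept local to the helper so that a reused Lambda container does not carry entries over from a previous run.

```python
import json

# Placeholder from the tutorial; replace it with your own topic ARN.
TOPIC_ARN = "arn:aws:sns:us-east-1:XXXXXXXXXXXXXXX"


def collect_failures(elb_clients):
    """Return one alert string per target that meets the conditions."""
    results = []
    for elb in elb_clients:
        tgs = elb.describe_target_groups()
        for tg in tgs['TargetGroups']:
            tgh = elb.describe_target_health(TargetGroupArn=tg['TargetGroupArn'])
            for t in tgh['TargetHealthDescriptions']:
                result = {'id': t['Target']['Id'], 'port': t['Target']['Port'],
                          'state': t['TargetHealth']['State'], 'reason': ""}
                if 'Description' in t['TargetHealth']:
                    result['reason'] = t['TargetHealth']['Description']
                # Skip healthy, still-registering and deliberately stopped targets.
                if (result['state'] != "healthy" and result['state'] != "initial"
                        and result['reason'] != "Target is in the stopped state"):
                    results.append(
                        "Instance {} on port {} reported {} status. Reason: {}".format(
                            result['id'], result['port'], result['state'], result['reason']))
    return results


def lambda_handler(event, context):
    # Imported here (an assumption of this sketch) so that collect_failures
    # can be exercised locally without boto3 or AWS credentials.
    import boto3

    client = boto3.client('ec2', region_name='us-east-1')
    regions = client.describe_regions()['Regions']
    elb_clients = [boto3.client('elbv2', region_name=r['RegionName']) for r in regions]
    results = collect_failures(elb_clients)

    if len(results) > 0:
        sns = boto3.client('sns')
        sns.publish(
            TargetArn=TOPIC_ARN,
            Message=json.dumps({'default': json.dumps(results, sort_keys=False, indent=4)}),
            Subject='Targets health failure',
            MessageStructure='json'
        )
    return results
```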
That’s all the function needs, and you should now be able to save and test it by pressing the Test button at the top of the console page. If any target is unhealthy and meets the conditions defined, you should get a message listing the instances that should be looked at. Of course, you don’t want to run the function by hand every time you want to check your targets, so let’s add a CloudWatch Event that will run our function automatically every 5 minutes.
On the AWS console, select CloudWatch and click on the Rules option on the left side. Create a new rule by clicking the Create rule button and give it a meaningful name. Under Event source, select a schedule of 5 minutes; under Targets, select your newly created Lambda function; then save your rule. Your Lambda function should now run automatically every 5 minutes and warn you if any of your targets becomes unhealthy, as defined by your own custom conditions.
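If you would rather script the schedule than click through the console, the same rule can be sketched with boto3. The helper name `schedule_function` and the default rule name `check-target-health` are hypothetical; the clients are passed in as arguments, and the function ARN is whatever your own Lambda function reports.

```python
def schedule_function(events, lambda_client, function_arn,
                      rule_name="check-target-health"):
    """Create a 5-minute schedule rule and point it at the Lambda function."""
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(5 minutes)",
        State="ENABLED",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Register the function as the rule's only target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   schedule_function(boto3.client('events'), boto3.client('lambda'),
#                     "<your function ARN>")
```

Note that running this twice with the same rule name is likely to fail on `add_permission`, since the statement ID would already exist on the function.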