Drain and Replace EKS Worker Nodes

Unlike managed node groups, self-managed EKS worker node groups have to be recycled outside of EKS after the AMI reference is updated. I recently migrated all of my hosted sites from ECS to EKS and use Terragrunt/Terraform for all infrastructure as code. After upgrading the AMI used by the worker node groups, I had to write a script to recycle the nodes gracefully.

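First, get the kubelet version string of the nodes you want to replace and set it in the variable K8S_VERSION. Only nodes still running this outdated version are selected, which makes the script idempotent: re-running it skips nodes that have already been replaced (this relies on the new AMI shipping a different kubelet version). The value appears in the VERSION column of kubectl get nodes; just remove the leading “v”.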

K8S_VERSION="1.21.2-eks-55daa9d"
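If you would rather not strip the “v” by hand, shell parameter expansion can do it. This is a minimal sketch: `strip_v_prefix` is a hypothetical helper name, and the commented kubectl line (which needs cluster access) simply lists the distinct kubelet versions so you can pick the outdated one.

```shell
#!/bin/bash
# Hypothetical helper: drop the leading "v" from a kubelet version string,
# e.g. "v1.21.2-eks-55daa9d" -> "1.21.2-eks-55daa9d".
strip_v_prefix() {
  local version="$1"
  printf '%s\n' "${version#v}"
}

# Example (requires cluster access): list the distinct kubelet versions,
# then pass the outdated one through strip_v_prefix.
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' | sort -u
#   K8S_VERSION=$(strip_v_prefix "v1.21.2-eks-55daa9d")
```

`${version#v}` removes a single leading “v” if present and leaves the string unchanged otherwise, so passing an already-stripped value is harmless.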

Next, get a list of EKS nodes that require recycling.

nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
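The jsonpath filter keeps only nodes whose `kubeletVersion` equals `v$K8S_VERSION`. To preview the selection, or to check the logic without a cluster, the same filter can be applied to `name version` pairs. This is a sketch: `select_outdated_nodes` is a hypothetical helper that reads lines of the form `<node-name> <kubelet-version>` on stdin, such as kubectl's `custom-columns` output.

```shell
#!/bin/bash
# Hypothetical helper: print the names of nodes whose kubelet version is
# exactly "v<target>", mirroring the jsonpath filter above.
# Input lines: "<node-name> <kubelet-version>", whitespace-separated.
select_outdated_nodes() {
  local target="v$1"
  awk -v target="$target" '$2 == target { print $1 }'
}

# Example (requires cluster access):
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion \
#     | select_outdated_nodes "$K8S_VERSION"
```

The output is one node name per line, which word-splits in a `for` loop the same way as the space-separated jsonpath result.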

Finally, iterate through the list of EKS nodes. On each node:

  • Retrieve the EC2 instance ID.
  • Drain the node.
  • Delete the node.
  • Terminate the EC2 instance.
  • Wait for the Auto Scaling group to launch a replacement node and for that node to start accepting pods.
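For the first step, an alternative to querying EC2 by private DNS name is to read the instance ID from the node object itself. On EKS a node's `spec.providerID` normally has the form `aws:///<availability-zone>/<instance-id>`, so the ID is the last path segment. This is a minimal sketch under that assumption; `instance_id_from_provider_id` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: extract the EC2 instance ID from a node's providerID.
# Assumes the AWS provider format "aws:///<az>/<instance-id>".
instance_id_from_provider_id() {
  local provider_id="$1"
  printf '%s\n' "${provider_id##*/}"
}

# Example (requires cluster access):
#   provider_id=$(kubectl get node "$node" -o jsonpath='{.spec.providerID}')
#   ec2_instance_id=$(instance_id_from_provider_id "$provider_id")
```

This avoids an EC2 API call per node (and the `ec2:DescribeInstances` permission it needs), at the cost of depending on the providerID format.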

Here is the full script – I have added delays to allow me to validate the process as it progresses.

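#!/bin/bash
# Stop on the first failure (e.g. a drain that cannot evict a pod) so an
# instance is never terminated before its node has been drained.
set -euo pipefail

# Kubelet version (without the leading "v") of the nodes to replace.
K8S_VERSION="1.21.2-eks-55daa9d"
# Space-separated names of the nodes still running that version.
nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")

# ${nodes} is left unquoted on purpose so it splits into one name per iteration.
for node in ${nodes}
do
  echo "node: ${node}"
  # Find the running EC2 instance whose private DNS name matches the node name.
  ec2_instance_id=$(aws ec2 describe-instances \
    --filters "Name=private-dns-name,Values=${node}" "Name=instance-state-name,Values=running" \
    --query 'Reservations[*].Instances[*].InstanceId' --output text)
  if [ -z "${ec2_instance_id}" ]; then
    echo "no running ec2 instance found for ${node}" >&2
    exit 1
  fi
  echo "ec2 instance ID: ${ec2_instance_id}"

  echo "draining node: ${node}"
  kubectl drain --ignore-daemonsets --delete-emptydir-data --force "${node}"
  sleep 120
  echo "deleting node: ${node}"
  kubectl delete node "${node}"
  sleep 120
  echo "terminating ec2 instance: ${ec2_instance_id}"
  aws ec2 terminate-instances --instance-ids "${ec2_instance_id}"

  # Give the Auto Scaling group time to launch a replacement that can take pods.
  sleep 600
done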
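The fixed `sleep 600` after each termination is simple, but it either wastes time or, on a slow scale-up, moves on before the replacement is ready. One alternative is to poll until the number of Ready nodes is back to what it was before the node was drained. This is a sketch under assumptions: `count_ready_nodes` and `wait_for_ready_nodes` are hypothetical helper names, the expected count is captured before draining, and the timeout and poll interval are arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: count lines of `kubectl get nodes --no-headers` output
# whose STATUS column is exactly "Ready" (a cordoned node shows
# "Ready,SchedulingDisabled" and is not counted).
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Hypothetical helper: poll until at least $1 nodes are Ready, giving up
# after $2 seconds (default 900). Requires kubectl access to the cluster.
wait_for_ready_nodes() {
  local expected="$1" timeout="${2:-900}" interval=15 waited=0 ready
  while true; do
    ready=$(kubectl get nodes --no-headers | count_ready_nodes)
    if [ "$ready" -ge "$expected" ]; then
      echo "ready nodes: ${ready}/${expected}"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${expected} ready nodes (have ${ready})" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example inside the loop (requires cluster access):
#   expected=$(kubectl get nodes --no-headers | count_ready_nodes)   # before draining
#   ... drain, delete, terminate ...
#   wait_for_ready_nodes "$expected" || exit 1
```

Counting only the exact `Ready` status means a cordoned or `NotReady` node never satisfies the wait, so the loop moves on only once a schedulable replacement has joined.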

How do you handle rolling upgrades of AMIs on EKS worker node groups?