Horizontal, Vertical & Autoscaling with CloudFormation & Kubernetes (Part 2)
— CloudFormation, Kubernetes, IaC, EC2, AWS, Load Balancing, Microservices, Prometheus, Grafana — 6 min read
Scenario: Enhancing a Learning Management System (LMS) Deployment
Context:
As the root admin for lms.educationapps.vic.gov.au, I'm responsible for ensuring the system is scalable, resilient, and performs efficiently under varying loads. This LMS is critical for delivering online education content to thousands of learners and instructors.
Horizontal Scaling with AWS Auto Scaling
Challenge: During peak usage times, such as the start of a new school term or during major online assessments, the LMS experiences significant spikes in traffic. The goal is to ensure the LMS can handle these spikes without performance degradation.
Solution: I implemented AWS Auto Scaling to dynamically adjust the number of EC2 instances based on traffic load, ensuring consistent performance.
How I did it:
- Created an Auto Scaling Group: Configured an Auto Scaling Group (ASG) with a Launch Configuration specifying the AMI, instance type, and security groups. Set the minimum, maximum, and desired number of instances based on expected traffic patterns.
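- Configured Scaling Policies: Implemented a target tracking scaling policy on average CPU utilization. Target tracking holds the metric near a single target value (the template in this section uses 50%): new instances are launched when average CPU rises above the target, and instances are terminated when it stays well below it. Fixed thresholds such as "scale out above 70%, scale in below 30%" belong to step scaling, which is a separate policy type.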
- Integrated Load Balancer: Deployed an Elastic Load Balancer (ELB) to distribute incoming traffic across multiple EC2 instances in the ASG, ensuring even load distribution and high availability.
- Continuous Monitoring: Used Amazon CloudWatch to monitor metrics and alarms. Target tracking policies create and manage their own CloudWatch alarms to trigger scaling activities, so the system adjusts automatically to changing traffic conditions.
Outcome: The LMS maintained high performance and availability during peak times, with the system automatically scaling out to handle increased traffic and scaling in during off-peak times to optimize costs.
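The monitoring step can be sketched as a CloudWatch alarm on the group's average CPU. This is a minimal sketch and not necessarily the alarm used in the deployment described here: it assumes the fragment is merged into a template that defines a resource named AutoScalingGroup, and the OpsNotificationTopic SNS topic, the HighCpuAlarm name, and the 80% threshold are all hypothetical. Because target tracking manages its own scaling alarms, this alarm only notifies operators; it does not trigger scaling.

```yaml
Resources:
  # Hypothetical SNS topic that receives alarm notifications.
  OpsNotificationTopic:
    Type: AWS::SNS::Topic

  # Fires when average CPU across the group stays above 80%
  # for two consecutive 5-minute periods (threshold is an assumption).
  HighCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Average CPU across the LMS Auto Scaling Group is above 80%"
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref AutoScalingGroup
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref OpsNotificationTopic
```

A sustained alarm here would suggest the group is pinned at its maximum size, which is a cue to revisit MaxSize or the instance type.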
AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for Horizontal Scaling with Auto Scaling Group"

Parameters:
  InstanceType:
    Type: String
    Default: t3.medium
    Description: EC2 instance type
  VPC:
    Type: AWS::EC2::VPC::Id
    Description: VPC for the Auto Scaling Group
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Subnet IDs for the Auto Scaling Group

Resources:
  # Specifies the AMI, instance type, and security group for EC2 instances.
  # Note: AWS recommends launch templates over launch configurations for new deployments.
  LaunchConfiguration:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-0c55b159cbfafe1f0 # Replace with a valid AMI ID in your region
      InstanceType: !Ref InstanceType
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      UserData:
        Fn::Base64: |
          #!/bin/bash
          # Install and start the LMS application (Apache httpd as a placeholder)
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd

  # Manages the scaling of instances based on load, with a minimum size of 1,
  # a maximum size of 5, and a desired capacity of 2.
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier: !Ref SubnetIds
      LaunchConfigurationName: !Ref LaunchConfiguration
      MinSize: "1"
      MaxSize: "5"
      DesiredCapacity: "2"
      TargetGroupARNs:
        - !Ref TargetGroup
      MetricsCollection:
        - Granularity: "1Minute"
      HealthCheckType: "EC2"
      HealthCheckGracePeriod: 300

  # A single target tracking policy scales out and in around one CPU target.
  # Two target tracking policies on the same metric (50% and 30%) would conflict:
  # the group follows whichever policy asks for more capacity, so the lower target wins.
  CpuTargetTrackingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref AutoScalingGroup
      PolicyType: "TargetTrackingScaling"
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 50.0

  # Allows HTTP access to the instances.
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Enable HTTP access"
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  # Used by the load balancer to route traffic to the instances.
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VPC
      Port: 80
      Protocol: HTTP
      HealthCheckProtocol: HTTP
      HealthCheckPort: "80"
      HealthCheckPath: "/"
      Matcher:
        HttpCode: "200"
      TargetType: instance

Outputs:
  AutoScalingGroupName:
    Description: "Auto Scaling Group Name"
    Value: !Ref AutoScalingGroup
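The template above defines a target group but no load balancer or listener to send traffic to it. The fragment below is one way the load balancing step might be completed. It is a sketch under assumptions: an internet-facing Application Load Balancer on plain HTTP, placed in the same subnets passed as SubnetIds (an ALB needs subnets in at least two Availability Zones). The resource names LoadBalancerSecurityGroup, LoadBalancer, and HttpListener are hypothetical and do not come from the original template.

```yaml
Resources:
  # Allows inbound HTTP to the load balancer (hypothetical resource).
  LoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "Enable HTTP access to the load balancer"
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  # Application Load Balancer spanning the subnets used by the Auto Scaling Group.
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: application
      Scheme: internet-facing
      Subnets: !Ref SubnetIds
      SecurityGroups:
        - !Ref LoadBalancerSecurityGroup

  # Forwards all HTTP traffic on port 80 to the existing target group.
  HttpListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref LoadBalancer
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
```

With a load balancer in front, the instance security group could be tightened to accept port 80 only from LoadBalancerSecurityGroup rather than from 0.0.0.0/0.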
Vertical Scaling with Kubernetes on AWS
Challenge: Certain compute-intensive operations within the LMS, such as video processing or large-scale data analysis, required more resources than initially provisioned. The goal was to ensure these operations could be performed efficiently without over-provisioning resources during regular usage.
Solution: I leveraged Kubernetes to manage containerized applications, allowing for vertical scaling of specific pods to handle resource-intensive tasks.
How I did it:
- Deployed Kubernetes Cluster: Set up a Kubernetes cluster on AWS using Amazon EKS (Elastic Kubernetes Service). Deployed the LMS application as a set of microservices within the cluster.
- Implemented Vertical Pod Autoscaler (VPA): Configured the Vertical Pod Autoscaler to automatically adjust the CPU and memory requests and limits for specific pods based on their observed usage. This ensured that resource-intensive tasks received the necessary resources while maintaining efficient resource utilization during normal operations.
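- Utilized Horizontal Pod Autoscaler (HPA): In addition to VPA, used the Horizontal Pod Autoscaler to scale the number of pods based on traffic load. This provided a dual-layered approach to scaling, handling both high traffic and resource-intensive operations. VPA and HPA should not both act on CPU or memory for the same workload, so the HPA is best driven by a traffic metric such as request rate.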
- Monitoring and Optimization: Implemented monitoring using Prometheus and Grafana to track resource usage and performance metrics. Continuously optimized resource requests and limits based on real-time data.
Outcome: The LMS efficiently handled compute-intensive operations by vertically scaling specific pods, while also maintaining the ability to horizontally scale the entire application during high traffic periods. This dual-scaling approach ensured optimal performance and resource utilization.
By leveraging AWS Auto Scaling for horizontal scaling and Kubernetes for both horizontal and vertical scaling, I ensured that the LMS was capable of handling varying loads and resource-intensive tasks efficiently. These implementations improved the system's scalability, resilience, and performance, contributing to a better user experience for students and teachers relying on the LMS.
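The monitoring step could look something like the Prometheus alerting rule below, which flags pods whose CPU usage stays close to or above their CPU request. This is only a sketch: it assumes Prometheus scrapes cAdvisor (for container_cpu_usage_seconds_total) and kube-state-metrics v2 (for kube_pod_container_resource_requests), and that pods belong to a Deployment named lms-deployment, the name used in the manifest later in this post. The group name, alert name, 90% ratio, and 10-minute window are hypothetical.

```yaml
groups:
  - name: lms-resources          # hypothetical rule group name
    rules:
      - alert: LmsCpuNearRequest # hypothetical alert name
        # Ratio of actual CPU usage to the CPU request, per pod.
        expr: |
          sum by (pod) (
            rate(container_cpu_usage_seconds_total{pod=~"lms-deployment-.*", container!=""}[5m])
          )
          /
          sum by (pod) (
            kube_pod_container_resource_requests{pod=~"lms-deployment-.*", resource="cpu"}
          )
          > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "LMS pod {{ $labels.pod }} is using more than 90% of its CPU request"
```

The same ratio can be plotted as a Grafana panel to see how far observed usage drifts from the configured requests over time.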
Steps:
- Create an EKS cluster using CloudFormation.
- Use Kubernetes manifests to configure Vertical Pod Autoscaler (VPA) for specific pods.
AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for EKS Cluster"

Parameters:
  ClusterName:
    Type: String
    Default: "eks-cluster"
    Description: "The name of the EKS cluster"
  ClusterRoleArn:
    # Assumption: this parameter was not in the original template, but AWS::EKS::Cluster requires RoleArn.
    Type: String
    Description: "ARN of an existing IAM role for the EKS control plane"
  NodeInstanceType:
    Type: String
    Default: t3.medium
    Description: "EC2 instance type for the EKS worker nodes"
  NodeGroupSize:
    Type: Number
    Default: 2
    Description: "The desired number of worker nodes"

Resources:
  # Creates the EKS control plane; the worker nodes are defined separately below.
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref ClusterName
      RoleArn: !Ref ClusterRoleArn
      ResourcesVpcConfig:
        SubnetIds:
          - subnet-12345678 # Replace with your Subnet IDs
          - subnet-87654321 # Replace with your Subnet IDs
        SecurityGroupIds:
          # The control plane uses its own group, which the node group allows as a source.
          - !Ref ControlPlaneSecurityGroup

  # Manages the worker nodes using an Auto Scaling Group.
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: EKSCluster # Nodes can only bootstrap once the cluster exists
    Properties:
      DesiredCapacity: !Ref NodeGroupSize
      MinSize: "1"
      MaxSize: "4"
      VPCZoneIdentifier:
        - subnet-12345678 # Replace with your Subnet IDs
        - subnet-87654321 # Replace with your Subnet IDs
      LaunchConfigurationName: !Ref NodeLaunchConfig

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      InstanceType: !Ref NodeInstanceType
      ImageId: ami-0c55b159cbfafe1f0 # Replace with a valid EKS optimized AMI ID
      # Nodes also need an IamInstanceProfile with the EKS worker node policies to join the cluster.
      SecurityGroups:
        - !Ref NodeSecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          /etc/eks/bootstrap.sh ${ClusterName}

  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "EKS worker nodes security group"
      VpcId: vpc-12345678 # Replace with your VPC ID
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 0
          ToPort: 65535
          SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: "EKS control plane security group"
      VpcId: vpc-12345678 # Replace with your VPC ID

Outputs:
  ClusterName:
    Description: "EKS Cluster Name"
    Value: !Ref ClusterName
  NodeInstanceType:
    Description: "EKS Node Instance Type"
    Value: !Ref NodeInstanceType
  NodeGroupSize:
    Description: "EKS Node Group Size"
    Value: !Ref NodeGroupSize
Kubernetes Manifest for Vertical Pod Autoscaler (VPA):
# Configures VPA to automatically adjust the CPU and memory requests and limits
# for the LMS deployment based on observed usage.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: lms-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: lms-deployment
  updatePolicy:
    updateMode: "Auto"
---
# Defines the LMS application deployment with initial resource requests and limits,
# which can be adjusted by VPA.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lms-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lms
  template:
    metadata:
      labels:
        app: lms
    spec:
      containers:
        - name: lms-container
          image: my-lms-image:latest # Replace with your LMS image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
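The post describes pairing VPA with a Horizontal Pod Autoscaler but does not show a manifest for it. A minimal sketch of what that might look like follows. Since VPA already manages CPU and memory for lms-deployment, this HPA scales on a per-pod request-rate metric instead. The metric name http_requests_per_second is hypothetical and assumes a custom metrics adapter (for example, one backed by Prometheus) is installed in the cluster. The name lms-hpa, the replica bounds, and the target of 100 requests per second per pod are likewise assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lms-hpa # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lms-deployment
  minReplicas: 2
  maxReplicas: 10 # assumed upper bound
  metrics:
    # Scale on traffic rather than CPU or memory, which VPA already manages.
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"
```

Keeping the two autoscalers on different signals avoids a feedback loop in which VPA raises CPU requests while HPA reacts to the resulting change in CPU utilization.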