Learn how to scale applications in Kubernetes using manual scaling, Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Understand Services and Ingress Controllers for load balancing.
1. Scaling Overview
Kubernetes provides multiple ways to scale applications:
- Manual Scaling: Manually adjust replica count
- Horizontal Pod Autoscaler (HPA): Automatically scale Pods based on metrics
- Vertical Pod Autoscaler (VPA): Automatically adjust resource requests/limits
- Cluster Autoscaler: Automatically add/remove Nodes
Load balancing is handled by Services and Ingress Controllers, which distribute traffic across Pod instances.
2. Manual Scaling
The simplest way to scale is manually adjusting the replica count.
2.1 Scale Using kubectl
# Scale Deployment to 5 replicas
kubectl scale deployment nginx-deployment --replicas=5
# Scale ReplicaSet
kubectl scale replicaset nginx-replicaset --replicas=3
# Scale StatefulSet
kubectl scale statefulset web --replicas=5
2.2 Scale Using YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5  # Set desired replica count
  selector:
    matchLabels:
      app: nginx
  template:
    # ... Pod template
kubectl apply -f deployment.yaml
2.3 Verify Scaling
# Watch Pods being created
kubectl get pods -w
# Check Deployment status
kubectl get deployment nginx-deployment
# Describe Deployment
kubectl describe deployment nginx-deployment
3. Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods based on observed CPU utilization, memory usage, or custom metrics.
3.1 HPA Requirements
- Metrics Server must be installed (for CPU/memory metrics)
- Deployment/ReplicaSet must have resource requests defined
- For custom metrics, need metrics API adapter
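For example, the container spec in the target Deployment must declare resource requests before HPA can compute CPU or memory utilization (the values below are illustrative):

```yaml
# Pod template fragment for the scaled Deployment
containers:
- name: nginx
  image: nginx:1.25
  resources:
    requests:
      cpu: 250m        # HPA computes utilization relative to this request
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
```

Without the `requests` block, HPA reports the target as `<unknown>` and never scales.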
3.2 HPA Definition
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
3.3 Create HPA
# Create HPA from YAML
kubectl apply -f hpa.yaml
# Create HPA imperatively
kubectl autoscale deployment nginx-deployment \
  --cpu-percent=70 \
  --min=2 \
  --max=10
# View HPA status
kubectl get hpa
# Describe HPA
kubectl describe hpa nginx-hpa
3.4 HPA Behavior
HPA checks metrics every 15 seconds (default). Scaling decisions are based on:
- Current metric values
- Target utilization
- Number of Pods
HPA calculates: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]. For example, with 4 replicas averaging 90% CPU against a 70% target, the desired count is ceil(4 * 90/70) = 6.
3.5 Custom Metrics
metrics:
- type: Pods
  pods:
    metric:
      name: requests-per-second
    target:
      type: AverageValue
      averageValue: "100"
4. Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests/limits for Pods based on historical usage.
4.1 VPA Modes
- Off: Only provides recommendations
- Initial: Sets resources only at Pod creation
- Auto: Updates resources automatically (requires Pod restart)
- Recreate: Deletes and recreates Pods with new resources
4.2 VPA Definition
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
4.3 VPA vs HPA
VPA and HPA serve different purposes:
- HPA: Scales number of Pods horizontally
- VPA: Adjusts resource requests/limits per Pod
- They can be used together, but not on the same metrics
5. Cluster Autoscaler
The Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing Nodes based on Pod scheduling needs.
5.1 How It Works
- Monitors Pods that cannot be scheduled due to insufficient resources
- Adds Nodes when Pods are pending
- Removes Nodes when they're underutilized
- Works with cloud provider Node groups
5.2 Cluster Autoscaler Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  replicas: 1
  selector:               # required for apps/v1 Deployments
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.24.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --nodes=1:10:node-group-name
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
5.3 Node Conditions
Cluster Autoscaler considers a Node for removal if:
- All Pods can be moved to other Nodes
- Node utilization is below threshold
- No Pods with local storage
- No Pods with PodDisruptionBudget preventing eviction
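A PodDisruptionBudget that would block such an eviction might look like this (the name and selector below are illustrative, matching the nginx examples used throughout):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2          # keep at least 2 Pods running during voluntary disruptions
  selector:
    matchLabels:
      app: nginx
```

The Cluster Autoscaler will not drain a Node if doing so would violate this budget.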
6. Services
A Service provides a stable network endpoint to access Pods. It load balances traffic across Pod instances.
6.1 Service Types
- ClusterIP: Internal cluster IP (default)
- NodePort: Exposes service on each Node's IP at a static port
- LoadBalancer: Creates external load balancer (cloud providers)
- ExternalName: Maps service to external DNS name
6.2 ClusterIP Service
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
6.3 LoadBalancer Service
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
6.4 Service Load Balancing
Services rely on kube-proxy to load balance traffic:
- Even distribution: the iptables mode selects a backend Pod at random; the IPVS mode additionally supports round-robin and other algorithms
- Session affinity: sessionAffinity: ClientIP routes requests from the same client IP to the same Pod
- Implementation: kube-proxy programs iptables or IPVS rules on each Node
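Session affinity is enabled per Service; a minimal sketch (the service name and ports follow the earlier examples):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  sessionAffinity: ClientIP        # route each client IP to the same Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity window (default is 3 hours)
  ports:
  - port: 80
    targetPort: 8080
```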
6.5 Headless Service
A headless service (clusterIP: None) returns individual Pod IPs, useful for StatefulSets:
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80
7. Ingress Controllers
An Ingress provides HTTP/HTTPS routing to Services based on hostname and path. It requires an Ingress Controller.
7.1 Ingress Controllers
Popular Ingress Controllers:
- NGINX Ingress Controller: Most popular
- Traefik: Cloud-native reverse proxy
- HAProxy: High-performance load balancer
- Istio Gateway: Part of service mesh
7.2 Ingress Definition
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
7.3 TLS/HTTPS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: tls-secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80
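The tls-secret referenced by the Ingress is a standard kubernetes.io/tls Secret; a sketch with placeholder data (the certificate and key must be base64-encoded PEM):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder
```

The Secret must live in the same namespace as the Ingress that references it.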
7.4 Install NGINX Ingress Controller
# Using Helm
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
# Using kubectl
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.0/deploy/static/provider/cloud/deploy.yaml
8. Best Practices
8.1 Set Resource Requests
Always set resource requests for HPA to work correctly. Without requests, HPA cannot calculate utilization.
8.2 Use HPA for Stateless Applications
HPA works best with stateless applications. For stateful applications, consider StatefulSets with manual scaling.
8.3 Monitor Autoscaling Behavior
Monitor HPA decisions and adjust thresholds based on actual application behavior. Use metrics to understand scaling patterns.
8.4 Use Readiness Probes
Ensure Pods have readiness probes so Services only route traffic to healthy Pods during scaling events.
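A simple HTTP readiness probe might look like this (the path and port are illustrative; use whatever health endpoint your application exposes):

```yaml
containers:
- name: nginx
  image: nginx:1.25
  readinessProbe:
    httpGet:
      path: /healthz       # illustrative health endpoint
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
```

A Pod is removed from Service endpoints whenever this probe fails, so new replicas receive traffic only once they are actually ready.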
8.5 Configure PodDisruptionBudgets
Use PodDisruptionBudgets to ensure minimum availability during rolling updates and Node maintenance.
8.6 Use Ingress for HTTP/HTTPS
Use Ingress instead of LoadBalancer Services for HTTP/HTTPS traffic to save costs and simplify routing.
8.7 Enable Metrics Server
Ensure Metrics Server is installed and running for HPA to function. It's required for CPU and memory-based autoscaling.
Summary: Kubernetes provides multiple scaling mechanisms: manual scaling, HPA for Pod scaling, VPA for resource adjustment, and Cluster Autoscaler for Node scaling. Services provide load balancing, while Ingress Controllers handle HTTP/HTTPS routing. Always set resource requests and monitor autoscaling behavior.