Scaling and Load Balancing

Learn how to scale applications in Kubernetes using manual scaling, Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Understand Services and Ingress Controllers for load balancing.

1. Scaling Overview

Kubernetes provides multiple ways to scale applications:

  • Manual Scaling: Manually adjust replica count
  • Horizontal Pod Autoscaler (HPA): Automatically scale Pods based on metrics
  • Vertical Pod Autoscaler (VPA): Automatically adjust resource requests/limits
  • Cluster Autoscaler: Automatically add/remove Nodes

Load balancing is handled by Services and Ingress Controllers, which distribute traffic across Pod instances.

2. Manual Scaling

The simplest way to scale is manually adjusting the replica count.

2.1 Scale Using kubectl

# Scale Deployment to 5 replicas
kubectl scale deployment nginx-deployment --replicas=5

# Scale ReplicaSet
kubectl scale replicaset nginx-replicaset --replicas=3

# Scale StatefulSet
kubectl scale statefulset web --replicas=5

2.2 Scale Using YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 5  # Set desired replica count
  selector:
    matchLabels:
      app: nginx
  template:
    # ... Pod template

# Apply the updated manifest
kubectl apply -f deployment.yaml

2.3 Verify Scaling

# Watch Pods being created
kubectl get pods -w

# Check Deployment status
kubectl get deployment nginx-deployment

# Describe Deployment
kubectl describe deployment nginx-deployment

3. Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods based on observed CPU utilization, memory usage, or custom metrics.

3.1 HPA Requirements

  • Metrics Server must be installed (for CPU/memory metrics)
  • Deployment/ReplicaSet must have resource requests defined
  • For custom metrics, need metrics API adapter
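The resource-requests requirement matters in practice: HPA computes utilization as a percentage of each container's requests, and without them `kubectl get hpa` shows `<unknown>` targets. A minimal container fragment for the target Deployment (image and values are illustrative):

```yaml
# Container spec fragment; HPA computes utilization
# relative to these requests (values illustrative)
containers:
- name: nginx
  image: nginx:1.25
  resources:
    requests:
      cpu: 250m
      memory: 128Mi
```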

3.2 HPA Definition

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

3.3 Create HPA

# Create HPA from YAML
kubectl apply -f hpa.yaml

# Create HPA imperatively
kubectl autoscale deployment nginx-deployment \
  --cpu-percent=70 \
  --min=2 \
  --max=10

# View HPA status
kubectl get hpa

# Describe HPA
kubectl describe hpa nginx-hpa

3.4 HPA Behavior

HPA checks metrics every 15 seconds (default). Scaling decisions are based on:

  • Current metric values
  • Target utilization
  • Number of Pods

HPA calculates: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
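As a rough illustration, the ceiling in that formula can be reproduced with shell integer arithmetic; the replica and metric values below are made up:

```shell
# ceil(a / b) for positive integers is (a + b - 1) / b
current_replicas=4
current_cpu=90      # observed average utilization (%)
target_cpu=70       # HPA target (%)

desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"     # ceil(4 * 90 / 70) = ceil(5.14) = 6
```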

3.5 Custom Metrics

metrics:
- type: Pods
  pods:
    metric:
      name: requests-per-second
    target:
      type: AverageValue
      averageValue: "100"

4. Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests/limits for Pods based on historical usage.

4.1 VPA Modes

  • Off: Only provides recommendations
  • Initial: Sets resources only at Pod creation
  • Auto: Updates resources automatically (currently by evicting and recreating Pods)
  • Recreate: Deletes and recreates Pods with new resources

4.2 VPA Definition

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi

4.3 VPA vs HPA

VPA and HPA serve different purposes:

  • HPA: Scales number of Pods horizontally
  • VPA: Adjusts resource requests/limits per Pod
  • They can be used together, but should not both act on the same resource metrics (for example, both reacting to CPU)

5. Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing Nodes based on Pod scheduling needs.

5.1 How It Works

  • Monitors Pods that cannot be scheduled due to insufficient resources
  • Adds Nodes when Pods are pending
  • Removes Nodes when they're underutilized
  • Works with cloud provider Node groups

5.2 Cluster Autoscaler Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  replicas: 1
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.24.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --nodes=1:10:node-group-name
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false

5.3 Node Conditions

Cluster Autoscaler considers a Node for removal if:

  • All Pods can be moved to other Nodes
  • Node utilization is below threshold
  • No Pods with local storage
  • No Pods with PodDisruptionBudget preventing eviction

6. Services

A Service provides a stable network endpoint to access Pods. It load balances traffic across Pod instances.

6.1 Service Types

  • ClusterIP: Internal cluster IP (default)
  • NodePort: Exposes service on each Node's IP at a static port
  • LoadBalancer: Creates external load balancer (cloud providers)
  • ExternalName: Maps service to external DNS name

6.2 ClusterIP Service

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

6.3 LoadBalancer Service

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer

6.4 Service Load Balancing

Services use kube-proxy to load balance traffic:

  • Distribution: iptables mode selects a backend pseudo-randomly; IPVS mode supports round-robin and other algorithms
  • Session affinity: Routes the same client to the same Pod (sessionAffinity: ClientIP)
  • Implemented as iptables or IPVS rules on each Node
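Session affinity is opt-in per Service via the sessionAffinity field. A sketch (service name and timeout are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 80
  sessionAffinity: ClientIP        # pin each client IP to one Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # affinity window (3 hours)
```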

6.5 Headless Service

A headless service (clusterIP: None) returns individual Pod IPs, useful for StatefulSets:

apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80

7. Ingress Controllers

An Ingress provides HTTP/HTTPS routing to Services based on hostname and path. It requires an Ingress Controller.

7.1 Ingress Controllers

Popular Ingress Controllers:

  • NGINX Ingress Controller: Most popular
  • Traefik: Cloud-native reverse proxy
  • HAProxy: High-performance load balancer
  • Istio Gateway: Part of service mesh

7.2 Ingress Definition

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80
  - host: api.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

7.3 TLS/HTTPS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: tls-secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80

7.4 Install NGINX Ingress Controller

# Using Helm
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# Using kubectl
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.0/deploy/static/provider/cloud/deploy.yaml

8. Best Practices

8.1 Set Resource Requests

Always set resource requests for HPA to work correctly. Without requests, HPA cannot calculate utilization.

8.2 Use HPA for Stateless Applications

HPA works best with stateless applications. For stateful applications, consider StatefulSets with manual scaling.

8.3 Monitor Autoscaling Behavior

Monitor HPA decisions and adjust thresholds based on actual application behavior. Use metrics to understand scaling patterns.

8.4 Use Readiness Probes

Ensure Pods have readiness probes so Services only route traffic to healthy Pods during scaling events.
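A minimal HTTP readiness probe fragment (path, port, and timings are illustrative); the Pod is only added to the Service's endpoints once this probe succeeds:

```yaml
containers:
- name: nginx
  image: nginx:1.25
  readinessProbe:
    httpGet:
      path: /
      port: 80
    initialDelaySeconds: 5   # wait before the first check
    periodSeconds: 10        # check interval
```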

8.5 Configure PodDisruptionBudgets

Use PodDisruptionBudgets to ensure minimum availability during rolling updates and Node maintenance.
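A sketch of a PodDisruptionBudget that keeps at least two replicas running during voluntary disruptions (name and selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2            # never voluntarily evict below 2 Pods
  selector:
    matchLabels:
      app: nginx
```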

8.6 Use Ingress for HTTP/HTTPS

Use Ingress instead of LoadBalancer Services for HTTP/HTTPS traffic to save costs and simplify routing.

8.7 Enable Metrics Server

Ensure Metrics Server is installed and running for HPA to function. It's required for CPU and memory-based autoscaling.

Summary: Kubernetes provides multiple scaling mechanisms: manual scaling, HPA for Pod scaling, VPA for resource adjustment, and Cluster Autoscaler for Node scaling. Services provide load balancing, while Ingress Controllers handle HTTP/HTTPS routing. Always set resource requests and monitor autoscaling behavior.
