Kubernetes Architecture

Deep dive into Kubernetes architecture: understand the control plane, master/worker nodes, API Server, Controller Manager, Scheduler, and how components work together to orchestrate containers.

1. Architecture Overview

Kubernetes follows a control plane/worker node architecture with two main types of components:

  • Control Plane (Master): Manages the cluster and makes global decisions
  • Worker Nodes: Run containerized applications

The control plane maintains the desired state of the cluster, while worker nodes execute the workloads.
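
You can see this split on a running cluster: kubectl shows the role of each node. The node names and versions in the sample output below are illustrative.

kubectl get nodes
# NAME       STATUS   ROLES           AGE   VERSION
# master-1   Ready    control-plane   12d   v1.29.0
# worker-1   Ready    <none>          12d   v1.29.0
# worker-2   Ready    <none>          12d   v1.29.0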

2. Control Plane (Master Nodes)

The Control Plane is the brain of Kubernetes. It makes decisions about the cluster (scheduling, detecting and responding to cluster events) and maintains the cluster's desired state.

2.1 Control Plane Components

The control plane consists of four main components:

  1. API Server: Front-end for the Kubernetes control plane
  2. etcd: Consistent and highly-available key-value store
  3. Scheduler: Assigns Pods to Nodes
  4. Controller Manager: Runs controller processes
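
On clusters bootstrapped with kubeadm, these components run as static Pods in the kube-system namespace, so you can list them directly (managed clusters hide the control plane, so the output differs there):

kubectl get pods -n kube-system
# On a kubeadm cluster this typically includes kube-apiserver-<node>,
# etcd-<node>, kube-scheduler-<node> and kube-controller-manager-<node>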

2.2 High Availability

For production clusters, the control plane should be highly available, with multiple master nodes. This provides:

  • Fault tolerance if a master node fails
  • Load distribution across multiple API server replicas
  • A multi-member etcd cluster for data redundancy
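
As a rough sketch of how such a control plane is bootstrapped with kubeadm (assuming a load balancer in front of the API servers; the endpoint, token, hash, and key below are placeholders):

# On the first master, register the load balancer as the API endpoint:
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs

# On each additional master, join as a control plane node:
kubeadm join lb.example.com:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>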

3. API Server

The API Server is the central management entity and the only component that directly communicates with etcd. All other components interact with the API Server.

3.1 Responsibilities

  • Exposes the Kubernetes API
  • Validates and processes API requests
  • Authenticates and authorizes requests
  • Reads from and writes to etcd
  • Manages API versioning

3.2 API Server Features

  • RESTful API: All operations are REST API calls
  • Authentication: Supports multiple authentication methods (certificates, tokens, etc.)
  • Authorization: RBAC (Role-Based Access Control) for fine-grained permissions
  • Admission Control: Validates and mutates requests before persistence
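
RBAC permissions are themselves API objects. A minimal sketch of a Role and RoleBinding that grant read-only access to Pods in one namespace (the names and the user are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader          # illustrative name
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane                # illustrative user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io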

3.3 API Server Endpoint

The API Server typically runs on port 6443 (HTTPS). You can access it via:

kubectl cluster-info
# Output shows API server endpoint
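
Because every operation is a REST call, you can also talk to the API Server directly. One low-friction way, reusing kubectl's own credentials, is kubectl proxy:

kubectl proxy --port=8001 &
# The proxy handles authentication, so plain HTTP works locally:
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods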

4. etcd

etcd is a consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data.

4.1 What etcd Stores

  • Cluster configuration
  • State of all objects (Pods, Services, Deployments, etc.)
  • Secrets and ConfigMaps
  • Network policies
  • RBAC rules

4.2 etcd Characteristics

  • Consistency: All members agree on the same data (writes go through the Raft consensus protocol)
  • Availability: Tolerates failure of a minority of members, as long as a quorum remains
  • Partition tolerance: The partition that holds a quorum continues to serve requests
  • Watch API: Components can watch keys for changes

4.3 etcd Cluster

In production, etcd runs as a cluster (typically 3 or 5 nodes) for high availability. Only the API Server directly communicates with etcd.
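
If you run your own etcd (for example on a kubeadm master node), you can inspect the cluster with etcdctl. The certificate paths below are the kubeadm defaults and may differ in your environment:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# "endpoint health" and "endpoint status" work with the same flags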

5. Scheduler

The Scheduler watches for newly created Pods with no assigned Node and selects a Node for them to run on.

5.1 Scheduling Process

The scheduler considers:

  • Resource requests: CPU and memory the Pod asks for; the scheduler filters out Nodes without enough unreserved capacity
  • Resource limits: Maximum CPU and memory; enforced at runtime by the kubelet and container runtime rather than used for placement
  • Affinity/anti-affinity rules: Pod placement preferences
  • Taints and tolerations: Node restrictions
  • Node selectors: Label-based selection
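
Most of these constraints are declared directly in the Pod spec. A minimal sketch (the node label and taint key are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod            # illustrative name
spec:
  nodeSelector:
    disktype: ssd                  # illustrative node label
  tolerations:
  - key: "dedicated"               # illustrative taint key
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:                    # used by the scheduler for placement
        cpu: "250m"
        memory: "256Mi"
      limits:                      # enforced at runtime
        cpu: "500m"
        memory: "512Mi"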

5.2 Scheduling Algorithm

The scheduler uses a two-step process:

  1. Filtering: Find Nodes that can run the Pod
  2. Scoring: Rank Nodes and select the best one

5.3 Custom Schedulers

You can run multiple schedulers and specify which scheduler to use for each Pod, allowing for custom scheduling policies.
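
A Pod opts into a particular scheduler with spec.schedulerName; Pods that omit it use the default scheduler. A sketch (the scheduler name is illustrative and must match a scheduler you actually run):

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-custom-scheduler   # illustrative; defaults to "default-scheduler"
  containers:
  - name: app
    image: nginx:1.25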

6. Controller Manager

The Controller Manager runs controller processes that regulate the state of the cluster. Each controller watches the shared state of the cluster through the API Server and makes changes to move the current state toward the desired state.

6.1 Built-in Controllers

Key controllers include:

  • Deployment Controller: Manages Deployments by creating and scaling their ReplicaSets
  • ReplicaSet Controller: Maintains desired number of Pod replicas
  • Node Controller: Monitors Node health
  • Service Controller: Manages cloud load balancers
  • Namespace Controller: Manages Namespace lifecycle
  • Job Controller: Manages Job objects

6.2 Controller Pattern

Controllers follow a reconciliation loop:

  1. Observe the current state
  2. Compare with desired state
  3. Take action to reconcile differences
  4. Repeat

6.3 Example: ReplicaSet Controller

If a ReplicaSet specifies 3 replicas but only 2 Pods exist, the ReplicaSet Controller creates a new Pod to match the desired state.
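
You can watch this reconciliation happen. Assuming a Deployment or ReplicaSet labeled app=web with 3 replicas already exists (the label and name are illustrative), deleting one Pod prompts the controller to replace it:

kubectl get pods -l app=web
# Shows 3 running Pods
kubectl delete pod <one-of-the-pod-names>
kubectl get pods -l app=web
# A replacement Pod appears almost immediately, restoring 3 replicas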

7. Worker Nodes

Worker Nodes are machines that run your containerized applications. Each Node must run three components:

  1. kubelet: Node agent
  2. kube-proxy: Network proxy
  3. Container runtime: Runs containers

Worker Nodes communicate with the control plane through the API Server.

8. Kubelet

The kubelet is an agent that runs on each Node. It ensures containers are running in a Pod.

8.1 Kubelet Responsibilities

  • Receives Pod specifications from the API Server
  • Ensures containers described in Pod specs are running
  • Reports Pod and Node status to the API Server
  • Monitors container health
  • Mounts volumes and secrets
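
Container health monitoring is configured per container through probes, which the kubelet executes. A minimal sketch of a liveness probe against an nginx container (the probe settings are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: probed-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
    livenessProbe:
      httpGet:
        path: /              # nginx serves this path by default
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10      # the kubelet probes every 10 seconds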

8.2 Kubelet Communication

The kubelet communicates with:

  • API Server: Receives Pod specs and reports status
  • Container runtime: Starts/stops containers
  • cAdvisor: Collects resource usage metrics

9. Kube-proxy

kube-proxy is a network proxy that runs on each Node. It maintains network rules that allow communication to Pods from inside and outside the cluster.

9.1 Kube-proxy Responsibilities

  • Implements Service abstraction
  • Load balances traffic to Pods
  • Maintains iptables/IPVS rules
  • Handles Service discovery
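
The Service abstraction that kube-proxy implements is itself declared as an API object. A minimal sketch of a Service that balances traffic across Pods labeled app: web (the name, label, and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # traffic is balanced across Pods with this label
  ports:
  - protocol: TCP
    port: 80          # the Service's ClusterIP port
    targetPort: 8080  # the container port traffic is forwarded to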

9.2 Proxy Modes

kube-proxy can run in different modes:

  • iptables: Uses iptables rules (the default mode)
  • IPVS: Uses IP Virtual Server (better performance for clusters with many Services)
  • userspace: Userspace proxy (legacy; removed in recent Kubernetes releases)
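
On kubeadm clusters the active mode is set in the kube-proxy ConfigMap (managed clusters may expose this differently); one way to check it, assuming that ConfigMap exists:

kubectl get configmap kube-proxy -n kube-system -o yaml | grep "mode:"
# An empty value (mode: "") means the default, iptables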

10. Container Runtime

The Container Runtime is the software responsible for running containers. Kubernetes supports several container runtimes through the Container Runtime Interface (CRI).

10.1 Supported Runtimes

  • containerd: Industry-standard runtime (recommended)
  • CRI-O: Lightweight runtime designed for Kubernetes
  • Docker Engine: Supported via the cri-dockerd adapter (the built-in dockershim was removed in Kubernetes 1.24)
  • Mirantis Container Runtime: Docker alternative

10.2 Container Runtime Interface (CRI)

CRI is a plugin interface that enables kubelet to use different container runtimes without recompiling. It standardizes how Kubernetes interacts with container runtimes.
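
You can see which runtime each Node is using from the Node status:

kubectl get nodes -o wide
# The CONTAINER-RUNTIME column shows, for example, containerd://1.7.x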

11. Communication Flow

Understanding how components communicate helps troubleshoot issues:

11.1 Creating a Pod

  1. User submits Pod spec via kubectl
  2. kubectl sends request to API Server
  3. API Server validates and stores in etcd
  4. Scheduler watches API Server for unscheduled Pods
  5. Scheduler selects Node and updates Pod spec
  6. API Server updates Pod in etcd
  7. kubelet on selected Node watches API Server
  8. kubelet creates Pod via container runtime
  9. kubelet reports Pod status to API Server
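
You can trace these steps on a live cluster by creating a Pod and reading its events (the Pod name is illustrative):

kubectl run demo --image=nginx:1.25
kubectl describe pod demo
# The Events section shows the flow: Scheduled (by the scheduler),
# then Pulling/Pulled, Created and Started (reported by the kubelet)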

11.2 Component Communication

  • All components → API Server: Single point of communication
  • API Server ↔ etcd: Only API Server talks to etcd
  • kubelet → API Server: Reports status, receives Pod specs
  • Controllers → API Server: Watch and reconcile state

Summary: Kubernetes architecture consists of a control plane (API Server, etcd, Scheduler, Controller Manager) and worker nodes (kubelet, kube-proxy, container runtime). All components communicate through the API Server, which is the central management entity.
