Kubernetes Architecture

Deep dive into Kubernetes architecture: understand the control plane, master/worker nodes, API Server, Controller Manager, Scheduler, and how components work together to orchestrate containers.

1. Architecture Overview

Kubernetes follows a control plane/worker node architecture with two main types of components:

  • Control Plane (Master): Manages the cluster and makes global decisions
  • Worker Nodes: Run containerized applications

The control plane maintains the desired state of the cluster, while worker nodes execute the workloads.
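
You can see this split on a running cluster: kubectl shows the role of each node. The node names and versions in the sample output below are illustrative.

kubectl get nodes
# NAME       STATUS   ROLES           AGE   VERSION
# master-1   Ready    control-plane   12d   v1.29.0
# worker-1   Ready    <none>          12d   v1.29.0
# worker-2   Ready    <none>          12d   v1.29.0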

2. Control Plane (Master Nodes)

The Control Plane is the brain of Kubernetes. It makes decisions about the cluster (scheduling, detecting and responding to cluster events) and maintains the cluster's desired state.

2.1 Control Plane Components

The control plane consists of four main components:

  1. API Server: Front-end for the Kubernetes control plane
  2. etcd: Consistent and highly-available key-value store
  3. Scheduler: Assigns Pods to Nodes
  4. Controller Manager: Runs controller processes
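
On clusters bootstrapped with kubeadm, these components run as static Pods in the kube-system namespace, so you can list them directly (managed clusters hide the control plane, so the output differs there):

kubectl get pods -n kube-system
# On a kubeadm cluster this typically includes kube-apiserver-<node>,
# etcd-<node>, kube-scheduler-<node> and kube-controller-manager-<node>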

2.2 High Availability

For production clusters, the control plane should be highly available, with multiple master nodes. This provides:

  • Fault tolerance if a master node fails
  • Load distribution across multiple API server replicas
  • A multi-member etcd cluster for data redundancy
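
As a rough sketch of how such a control plane is bootstrapped with kubeadm (assuming a load balancer in front of the API servers; the endpoint, token, hash, and key below are placeholders):

# On the first master, register the load balancer as the API endpoint:
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs

# On each additional master, join as a control plane node:
kubeadm join lb.example.com:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>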

3. API Server

The API Server is the central management entity and the only component that directly communicates with etcd. All other components interact with the API Server.

3.1 Responsibilities

  • Exposes the Kubernetes API
  • Validates and processes API requests
  • Authenticates and authorizes requests
  • Reads from and writes to etcd
  • Manages API versioning

3.2 API Server Features

  • RESTful API: All operations are REST API calls
  • Authentication: Supports multiple authentication methods (certificates, tokens, etc.)
  • Authorization: RBAC (Role-Based Access Control) for fine-grained permissions
  • Admission Control: Validates and mutates requests before persistence
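
RBAC permissions are themselves API objects. A minimal sketch of a Role and RoleBinding that grant read-only access to Pods in one namespace (the names and the user are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader          # illustrative name
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane                # illustrative user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io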

3.3 API Server Endpoint

The API Server typically runs on port 6443 (HTTPS). You can access it via:

kubectl cluster-info
# Output shows API server endpoint
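
Because every operation is a REST call, you can also talk to the API Server directly. One low-friction way, reusing kubectl's own credentials, is kubectl proxy:

kubectl proxy --port=8001 &
# The proxy handles authentication, so plain HTTP works locally:
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods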

4. etcd

etcd is a consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data.

4.1 What etcd Stores

  • Cluster configuration
  • State of all objects (Pods, Services, Deployments, etc.)
  • Secrets and ConfigMaps
  • Network policies
  • RBAC rules

4.2 etcd Characteristics

  • Consistency: All members agree on the same data (writes go through the Raft consensus protocol)
  • Availability: Tolerates failure of a minority of members, as long as a quorum remains
  • Partition tolerance: The partition that holds a quorum continues to serve requests
  • Watch API: Components can watch keys for changes

4.3 etcd Cluster

In production, etcd runs as a cluster (typically 3 or 5 nodes) for high availability. Only the API Server directly communicates with etcd.
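
If you run your own etcd (for example on a kubeadm master node), you can inspect the cluster with etcdctl. The certificate paths below are the kubeadm defaults and may differ in your environment:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# "endpoint health" and "endpoint status" work with the same flags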

5. Scheduler

The Scheduler watches for newly created Pods with no assigned Node and selects a Node for them to run on.

5.1 Scheduling Process

The scheduler considers:

  • Resource requests: CPU and memory the Pod asks for; the scheduler filters out Nodes without enough unreserved capacity
  • Resource limits: Maximum CPU and memory; enforced at runtime by the kubelet and container runtime rather than used for placement
  • Affinity/anti-affinity rules: Pod placement preferences
  • Taints and tolerations: Node restrictions
  • Node selectors: Label-based selection
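
Most of these constraints are declared directly in the Pod spec. A minimal sketch (the node label and taint key are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod            # illustrative name
spec:
  nodeSelector:
    disktype: ssd                  # illustrative node label
  tolerations:
  - key: "dedicated"               # illustrative taint key
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:                    # used by the scheduler for placement
        cpu: "250m"
        memory: "256Mi"
      limits:                      # enforced at runtime
        cpu: "500m"
        memory: "512Mi"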

5.2 Scheduling Algorithm

The scheduler uses a two-step process:

  1. Filtering: Find Nodes that can run the Pod
  2. Scoring: Rank Nodes and select the best one

5.3 Custom Schedulers

You can run multiple schedulers and specify which scheduler to use for each Pod, allowing for custom scheduling policies.
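
A Pod opts into a particular scheduler with spec.schedulerName; Pods that omit it use the default scheduler. A sketch (the scheduler name is illustrative and must match a scheduler you actually run):

apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-custom-scheduler   # illustrative; defaults to "default-scheduler"
  containers:
  - name: app
    image: nginx:1.25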

6. Controller Manager

The Controller Manager runs controller processes that regulate the state of the cluster. Each controller watches the shared state of the cluster through the API Server and makes changes to move the current state toward the desired state.

6.1 Built-in Controllers

Key controllers include:

  • Deployment Controller: Manages Deployments by creating and scaling their ReplicaSets
  • ReplicaSet Controller: Maintains desired number of Pod replicas
  • Node Controller: Monitors Node health
  • Service Controller: Manages cloud load balancers
  • Namespace Controller: Manages Namespace lifecycle
  • Job Controller: Manages Job objects

6.2 Controller Pattern

Controllers follow a reconciliation loop:

  1. Observe the current state
  2. Compare with desired state
  3. Take action to reconcile differences
  4. Repeat

6.3 Example: ReplicaSet Controller

If a ReplicaSet specifies 3 replicas but only 2 Pods exist, the ReplicaSet Controller creates a new Pod to match the desired state.
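
You can watch this reconciliation happen. Assuming a Deployment or ReplicaSet labeled app=web with 3 replicas already exists (the label and name are illustrative), deleting one Pod prompts the controller to replace it:

kubectl get pods -l app=web
# Shows 3 running Pods
kubectl delete pod <one-of-the-pod-names>
kubectl get pods -l app=web
# A replacement Pod appears almost immediately, restoring 3 replicas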

7. Worker Nodes

Worker Nodes are machines that run your containerized applications. Each Node must run three components:

  1. kubelet: Node agent
  2. kube-proxy: Network proxy
  3. Container runtime: Runs containers

Worker Nodes communicate with the control plane through the API Server.

8. Kubelet

The kubelet is an agent that runs on each Node. It ensures containers are running in a Pod.

8.1 Kubelet Responsibilities

  • Receives Pod specifications from the API Server
  • Ensures containers described in Pod specs are running
  • Reports Pod and Node status to the API Server
  • Monitors container health
  • Mounts volumes and secrets
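
Container health monitoring is configured per container through probes, which the kubelet executes. A minimal sketch of a liveness probe against an nginx container (the probe settings are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: probed-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
    livenessProbe:
      httpGet:
        path: /              # nginx serves this path by default
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10      # the kubelet probes every 10 seconds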

8.2 Kubelet Communication

The kubelet communicates with:

  • API Server: Receives Pod specs and reports status
  • Container runtime: Starts/stops containers
  • cAdvisor: Collects resource usage metrics

9. Kube-proxy

kube-proxy is a network proxy that runs on each Node. It maintains network rules that allow communication to Pods from inside and outside the cluster.

9.1 Kube-proxy Responsibilities

  • Implements Service abstraction
  • Load balances traffic to Pods
  • Maintains iptables/IPVS rules
  • Handles Service discovery
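
The Service abstraction that kube-proxy implements is itself declared as an API object. A minimal sketch of a Service that balances traffic across Pods labeled app: web (the name, label, and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # traffic is balanced across Pods with this label
  ports:
  - protocol: TCP
    port: 80          # the Service's ClusterIP port
    targetPort: 8080  # the container port traffic is forwarded to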

9.2 Proxy Modes

kube-proxy can run in different modes:

  • iptables: Uses iptables rules (the default mode)
  • IPVS: Uses IP Virtual Server (better performance for clusters with many Services)
  • userspace: Userspace proxy (legacy; removed in recent Kubernetes releases)
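
On kubeadm clusters the active mode is set in the kube-proxy ConfigMap (managed clusters may expose this differently); one way to check it, assuming that ConfigMap exists:

kubectl get configmap kube-proxy -n kube-system -o yaml | grep "mode:"
# An empty value (mode: "") means the default, iptables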

10. Container Runtime

The Container Runtime is the software responsible for running containers. Kubernetes supports several container runtimes through the Container Runtime Interface (CRI).

10.1 Supported Runtimes

  • containerd: Industry-standard runtime (recommended)
  • CRI-O: Lightweight runtime designed for Kubernetes
  • Docker Engine: Supported via the cri-dockerd adapter (the built-in dockershim was removed in Kubernetes 1.24)
  • Mirantis Container Runtime: Docker alternative

10.2 Container Runtime Interface (CRI)

CRI is a plugin interface that enables kubelet to use different container runtimes without recompiling. It standardizes how Kubernetes interacts with container runtimes.
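
You can see which runtime each Node is using from the Node status:

kubectl get nodes -o wide
# The CONTAINER-RUNTIME column shows, for example, containerd://1.7.x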

11. Communication Flow

Understanding how components communicate helps troubleshoot issues:

11.1 Creating a Pod

  1. User submits Pod spec via kubectl
  2. kubectl sends request to API Server
  3. API Server validates and stores in etcd
  4. Scheduler watches API Server for unscheduled Pods
  5. Scheduler selects Node and updates Pod spec
  6. API Server updates Pod in etcd
  7. kubelet on selected Node watches API Server
  8. kubelet creates Pod via container runtime
  9. kubelet reports Pod status to API Server
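
You can trace these steps on a live cluster by creating a Pod and reading its events (the Pod name is illustrative):

kubectl run demo --image=nginx:1.25
kubectl describe pod demo
# The Events section shows the flow: Scheduled (by the scheduler),
# then Pulling/Pulled, Created and Started (reported by the kubelet)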

11.2 Component Communication

  • All components → API Server: Single point of communication
  • API Server ↔ etcd: Only API Server talks to etcd
  • kubelet → API Server: Reports status, receives Pod specs
  • Controllers → API Server: Watch and reconcile state

Summary: Kubernetes architecture consists of a control plane (API Server, etcd, Scheduler, Controller Manager) and worker nodes (kubelet, kube-proxy, container runtime). All components communicate through the API Server, which is the central management entity.
