Kubernetes Operators

Kubernetes Operators represent a powerful pattern for managing complex applications on Kubernetes. They extend the Kubernetes API to create, configure, and manage instances of stateful applications on behalf of Kubernetes users.

What are Operators?

Operators are software extensions to Kubernetes that use custom resources to manage applications and their components. They follow Kubernetes principles, notably the control loop concept, to automate operational tasks that would typically require human intervention.

The Operator Pattern

The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides. It combines:

  • Custom Resource Definitions (CRDs): Extend the Kubernetes API with application-specific resources
  • Custom Controllers: Implement the control loop that watches and reconciles the desired state
  • Operational Knowledge: Encode human operational expertise into software

Why Use Operators?

Operators are particularly useful for:

  • Stateful Applications: Databases, message queues, and other stateful systems
  • Complex Deployment Procedures: Applications requiring multi-step installation/configuration
  • Day-2 Operations: Backup, restore, scaling, upgrades, and failure recovery
  • Domain-Specific Knowledge: Encoding operational expertise into automation

How Operators Work

Operators follow this basic workflow:

  1. Watch for changes to custom resources and the objects they own
  2. Compare the current state with the desired state
  3. Take action to reconcile any differences
  4. Update the status of the custom resource
  5. Repeat whenever a watched object changes, and periodically on resync
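
The workflow above can be sketched in plain Go with a toy, in-memory "cluster". Spec, Status, Cluster, and reconcile are hypothetical stand-ins for the custom resource, the API server, and a real controller. Each call to reconcile plays the role of one watch event. The function compares desired and observed replica counts instead of reacting to what changed, and that is what makes it safe to run repeatedly.

```go
package main

import "fmt"

// Spec is the desired state a user declares (hypothetical stand-in for a custom resource's spec).
type Spec struct {
	Replicas int
}

// Status is what the operator reports after observing the cluster.
type Status struct {
	AvailableReplicas int
}

// Cluster is a toy, hypothetical in-memory stand-in for the objects the operator manages.
type Cluster struct {
	Running int
}

// reconcile runs one pass of steps 2-4: compare observed with desired state,
// act to close the gap, and return the new status plus the actions taken.
// It is level-based and idempotent: on a converged cluster it does nothing.
func reconcile(desired Spec, c *Cluster) (Status, []string) {
	var actions []string
	for c.Running < desired.Replicas {
		c.Running++
		actions = append(actions, "start")
	}
	for c.Running > desired.Replicas {
		c.Running--
		actions = append(actions, "stop")
	}
	return Status{AvailableReplicas: c.Running}, actions
}

func main() {
	cluster := &Cluster{Running: 1}
	spec := Spec{Replicas: 3}

	// Each call stands in for one event delivered by the watch (step 1).
	status, actions := reconcile(spec, cluster)
	fmt.Println(status.AvailableReplicas, actions) // 3 [start start]

	status, actions = reconcile(spec, cluster)
	fmt.Println(status.AvailableReplicas, actions) // 3 []

	cluster.Running = 0 // drift: pods removed behind the operator's back
	status, actions = reconcile(spec, cluster)
	fmt.Println(status.AvailableReplicas, len(actions)) // 3 3
}
```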

Popular Operators

Many popular applications have operators available:

  • Prometheus Operator: Manages Prometheus monitoring instances
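  • Elastic Cloud on Kubernetes (ECK): Manages Elasticsearch clusters
  • PostgreSQL operators (for example CloudNativePG, Zalando's postgres-operator, Crunchy PGO): Manage PostgreSQL clusters, including failover and backups
  • Redis operators: Manage Redis clusters
  • cert-manager: Issues and renews TLS certificates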

Building a Custom Operator

You can build operators using various frameworks and tools. The most popular approaches are:

  • Kubebuilder: SDK for building Kubernetes APIs using CRDs
  • Operator SDK: Framework from the Operator Framework project for building operators in Go, Ansible, or Helm; its Go support is built on Kubebuilder
  • KUDO: Kubernetes Universal Declarative Operator
  • Metacontroller: Lightweight add-on that runs the watch-and-reconcile machinery and calls your webhooks, which can be written in any language
  • Native Go client: Using client-go and other Kubernetes client libraries

Step 1: Define a Custom Resource

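First, define a Custom Resource Definition (CRD) that extends the Kubernetes API. In a Kubebuilder project, controller-gen generates this file from the Go types in Step 2 (make manifests) into config/crd/bases; it is shown here in full for clarity: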

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.webapp.example.com
spec:
  group: webapp.example.com
  names:
    kind: WebApp
    listKind: WebAppList
    plural: webapps
    singular: webapp
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    # Enable the /status subresource used by r.Status().Update in the controller
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required:
            - image
            - port
            - replicas
            properties:
              replicas:
                type: integer
                minimum: 1
                maximum: 5
              image:
                type: string
              port:
                type: integer
          status:
            type: object
            properties:
              availableReplicas:
                type: integer
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
                    observedGeneration:
                      type: integer
    
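The names in this CRD determine how the new resource is addressed. The CRD's metadata.name must be <plural>.<group>, objects declare apiVersion <group>/<version>, and the API server serves namespaced objects under /apis/<group>/<version>/namespaces/<namespace>/<plural>/<name>. The hypothetical helpers below only spell out those rules for the group used elsewhere in this walkthrough (webapp.example.com). The printed path can be fetched directly with kubectl get --raw.

```go
package main

import "fmt"

// crdName returns the required CRD metadata.name: <plural>.<group>.
// (Hypothetical helper that only restates the naming rule.)
func crdName(plural, group string) string {
	return plural + "." + group
}

// apiVersion returns the apiVersion custom resources declare: <group>/<version>.
func apiVersion(group, version string) string {
	return group + "/" + version
}

// resourcePath returns the REST path under which the API server serves a
// namespaced custom resource.
func resourcePath(group, version, namespace, plural, name string) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s", group, version, namespace, plural, name)
}

func main() {
	const group, version, plural = "webapp.example.com", "v1alpha1", "webapps"
	fmt.Println(crdName(plural, group))     // webapps.webapp.example.com
	fmt.Println(apiVersion(group, version)) // webapp.example.com/v1alpha1
	fmt.Println(resourcePath(group, version, "default", plural, "example-webapp"))
	// /apis/webapp.example.com/v1alpha1/namespaces/default/webapps/example-webapp
}
```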

Step 2: Create API Types

Define Go types that represent your custom resource:

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

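// WebAppSpec defines the desired state of WebApp
type WebAppSpec struct {
	// Replicas is the desired number of pods; the markers match the CRD schema bounds
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=5
	Replicas int32 `json:"replicas"`
	// Image is the container image to run
	Image string `json:"image"`
	// Port is the container port, also exposed by the Service
	Port int32 `json:"port"`
}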

// WebAppStatus defines the observed state of WebApp
type WebAppStatus struct {
	AvailableReplicas int32              `json:"availableReplicas"`
	Conditions        []metav1.Condition `json:"conditions,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// WebApp is the Schema for the webapps API
type WebApp struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   WebAppSpec   `json:"spec,omitempty"`
	Status WebAppStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// WebAppList contains a list of WebApp
type WebAppList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []WebApp `json:"items"`
}

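// SchemeBuilder is declared in groupversion_info.go, which Kubebuilder scaffolds
// next to this file; the DeepCopy methods that runtime.Object requires are
// generated into zz_generated.deepcopy.go by controller-gen (make generate).
func init() {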
	SchemeBuilder.Register(&WebApp{}, &WebAppList{})
}
    

Step 3: Implement the Controller

Create a controller that reconciles the desired state:

package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1alpha1 "github.com/example/webapp-operator/api/v1alpha1"
)

// WebAppReconciler reconciles a WebApp object
type WebAppReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete

// Reconcile is the main control loop function. It runs whenever a WebApp, or a
// Deployment or Service it owns, changes, so every step must be idempotent.
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the WebApp instance
	webapp := &webappv1alpha1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		if apierrors.IsNotFound(err) {
			// The WebApp was deleted after the request was queued; its Deployment
			// and Service are garbage-collected through their owner references.
			return ctrl.Result{}, nil
		}
		// Error reading the object; returning it requeues the request with backoff
		return ctrl.Result{}, err
	}

	// Check if the Deployment already exists; if not, create a new one
	deployment := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}, deployment)
	if err != nil && apierrors.IsNotFound(err) {
		dep, err := r.deploymentForWebApp(webapp)
		if err != nil {
			logger.Error(err, "Failed to build Deployment")
			return ctrl.Result{}, err
		}
		logger.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
		if err := r.Create(ctx, dep); err != nil {
			logger.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
			return ctrl.Result{}, err
		}
		// Deployment created successfully; reconcile again to continue with the Service
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		logger.Error(err, "Failed to get Deployment")
		return ctrl.Result{}, err
	}

	// Ensure the Deployment's replica count and image match the spec
	needsUpdate := false
	size := webapp.Spec.Replicas
	if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != size {
		deployment.Spec.Replicas = &size
		needsUpdate = true
	}
	if containers := deployment.Spec.Template.Spec.Containers; len(containers) > 0 && containers[0].Image != webapp.Spec.Image {
		containers[0].Image = webapp.Spec.Image
		needsUpdate = true
	}
	if needsUpdate {
		if err := r.Update(ctx, deployment); err != nil {
			logger.Error(err, "Failed to update Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
			return ctrl.Result{}, err
		}
	}

	// Check if the Service already exists; if not, create a new one
	service := &corev1.Service{}
	err = r.Get(ctx, types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}, service)
	if err != nil && apierrors.IsNotFound(err) {
		svc, err := r.serviceForWebApp(webapp)
		if err != nil {
			logger.Error(err, "Failed to build Service")
			return ctrl.Result{}, err
		}
		logger.Info("Creating a new Service", "Service.Namespace", svc.Namespace, "Service.Name", svc.Name)
		if err := r.Create(ctx, svc); err != nil {
			logger.Error(err, "Failed to create new Service", "Service.Namespace", svc.Namespace, "Service.Name", svc.Name)
			return ctrl.Result{}, err
		}
		// Service created successfully
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		logger.Error(err, "Failed to get Service")
		return ctrl.Result{}, err
	}

	// Update the WebApp status with the available replicas
	webapp.Status.AvailableReplicas = deployment.Status.AvailableReplicas
	if err := r.Status().Update(ctx, webapp); err != nil {
		logger.Error(err, "Failed to update WebApp status")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

// deploymentForWebApp returns a Deployment for the WebApp, owned by it
func (r *WebAppReconciler) deploymentForWebApp(w *webappv1alpha1.WebApp) (*appsv1.Deployment, error) {
	labels := labelsForWebApp(w.Name)
	replicas := w.Spec.Replicas

	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      w.Name,
			Namespace: w.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: labels,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: labels,
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Image: w.Spec.Image,
						Name:  "webapp",
						Ports: []corev1.ContainerPort{{
							ContainerPort: w.Spec.Port,
							Name:          "http",
						}},
					}},
				},
			},
		},
	}

	// Set the WebApp as owner and controller: the Deployment is garbage-collected
	// with the WebApp, and Owns() below maps its events back to the WebApp
	if err := controllerutil.SetControllerReference(w, dep, r.Scheme); err != nil {
		return nil, err
	}
	return dep, nil
}

// serviceForWebApp returns a NodePort Service for the WebApp, owned by it
func (r *WebAppReconciler) serviceForWebApp(w *webappv1alpha1.WebApp) (*corev1.Service, error) {
	labels := labelsForWebApp(w.Name)

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      w.Name,
			Namespace: w.Namespace,
		},
		Spec: corev1.ServiceSpec{
			Selector: labels,
			Ports: []corev1.ServicePort{{
				Name: "http",
				Port: w.Spec.Port,
				// NodePort is left unset so Kubernetes allocates a free one; a
				// hard-coded value would collide as soon as a second WebApp exists
			}},
			Type: corev1.ServiceTypeNodePort,
		},
	}

	// Set the WebApp as owner and controller of the Service
	if err := controllerutil.SetControllerReference(w, svc, r.Scheme); err != nil {
		return nil, err
	}
	return svc, nil
}

// labelsForWebApp returns the labels for selecting the resources
func labelsForWebApp(name string) map[string]string {
	return map[string]string{"app": "webapp", "webapp_cr": name}
}

// SetupWithManager sets up the controller with the Manager
func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1alpha1.WebApp{}).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Service{}).
		Complete(r)
}
    
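Owns(&appsv1.Deployment{}) relies on the controller reference that SetControllerReference sets. When an owned Deployment or Service changes, controller-runtime follows its controlling ownerReference back to the WebApp and enqueues that WebApp's namespace/name. The garbage collector uses the same reference to delete the children together with their owner. The stdlib sketch below models only that mapping step, using hypothetical trimmed types; in controller-runtime, the real implementation is handler.EnqueueRequestForOwner.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerReference is a trimmed, hypothetical mirror of metav1.OwnerReference.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// Object stands in for an owned object such as the WebApp's Deployment.
type Object struct {
	Namespace       string
	Name            string
	OwnerReferences []OwnerReference
}

// Request mirrors the namespace/name key that Reconcile receives.
type Request struct {
	Namespace string
	Name      string
}

// groupOf extracts the API group from an apiVersion ("apps/v1" -> "apps",
// "v1" -> "" for the core group).
func groupOf(apiVersion string) string {
	group, _, found := strings.Cut(apiVersion, "/")
	if !found {
		return ""
	}
	return group
}

// requestForOwner maps an event on obj to a reconcile request for its
// controlling owner of the given group and kind, ignoring the version. A
// namespaced owner must live in the dependent's namespace, so obj.Namespace
// is reused for the request.
func requestForOwner(obj Object, ownerGroup, ownerKind string) (Request, bool) {
	for _, ref := range obj.OwnerReferences {
		if ref.Controller && ref.Kind == ownerKind && groupOf(ref.APIVersion) == ownerGroup {
			return Request{Namespace: obj.Namespace, Name: ref.Name}, true
		}
	}
	return Request{}, false
}

func main() {
	deployment := Object{
		Namespace: "default",
		Name:      "example-webapp",
		OwnerReferences: []OwnerReference{{
			APIVersion: "webapp.example.com/v1alpha1",
			Kind:       "WebApp",
			Name:       "example-webapp",
			Controller: true,
		}},
	}
	req, ok := requestForOwner(deployment, "webapp.example.com", "WebApp")
	fmt.Println(req, ok) // {default example-webapp} true
}
```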

Step 4: Build and Deploy the Operator

Create a Dockerfile and build the operator image:

# Dockerfile
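FROM golang:1.19 AS builder
WORKDIR /workspace
# Download dependencies first so this layer is cached across source changes
COPY go.mod go.mod
COPY go.sum go.sum
RUN go mod download
COPY . .
# distroless/static has no libc, so the manager must be built as a static binary
RUN CGO_ENABLED=0 make manager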

FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/bin/manager .
USER 65532:65532
ENTRYPOINT ["/manager"]
    

Create deployment manifests:

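# config/manager/manager.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: webapp-system
---
# The controller runs as this ServiceAccount; the RBAC rules generated from the
# //+kubebuilder:rbac markers (config/rbac) must be bound to it, together with
# permission to manage Leases for --leader-elect
apiVersion: v1
kind: ServiceAccount
metadata:
  name: controller-manager
  namespace: webapp-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: webapp-system
spec:
  replicas: 1
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      serviceAccountName: controller-manager
      securityContext:
        runAsNonRoot: true
      containers:
      - command:
        - /manager
        args:
        - --leader-elect
        image: controller:latest
        name: manager
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 10m
            memory: 64Mi
      terminationGracePeriodSeconds: 10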
    

Step 5: Deploy and Use the Operator

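Install the CRD, deploy the operator, and then create a WebApp custom resource. Before deploying, replace the image: controller:latest placeholder in manager.yaml with the image you built and pushed. The controller also needs the RBAC rules that controller-gen generates from the //+kubebuilder:rbac markers (under config/rbac). In a Kubebuilder project, make deploy IMG=<your-image> renders config/default with kustomize and applies the CRD, RBAC, and manager Deployment in one step.

# Deploy the CRD
kubectl apply -f config/crd/bases/webapp.example.com_webapps.yaml

# Deploy the operator
kubectl apply -f config/manager/manager.yaml

Then create a WebApp instance, for example by saving the following as webapp.yaml and running kubectl apply -f webapp.yaml:

apiVersion: webapp.example.com/v1alpha1
kind: WebApp
metadata:
  name: example-webapp
spec:
  replicas: 3
  image: nginx:latest
  port: 80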
    

Operator Best Practices

Design Considerations

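  • Make reconciliation idempotent: running it again against the same state must be safe and should change nothing
  • Return errors from Reconcile so the request is retried with backoff, and use RequeueAfter for waits you expect
  • Use finalizers to clean up resources that owner references cannot garbage-collect, such as records in external systems
  • Report observed state and conditions in the custom resource's status, not only in logs
  • Support multiple versions of your custom resource with conversion webhooks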

Testing

  • Write unit tests for your reconciliation logic
  • Use envtest (from controller-runtime) for integration tests against a real API server and etcd
  • Implement end-to-end tests with kind or minikube
  • Test upgrade paths and backward compatibility
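
Unit tests for reconciliation logic are easiest to write when the logic talks to the API through a narrow interface that a test can fake. In the sketch below, DeploymentClient, fakeClient, and reconcileReplicas are hypothetical. The fake counts writes, so the table can also assert idempotence: a second reconcile must not write anything. For tests against the real client types, controller-runtime provides a fake client in sigs.k8s.io/controller-runtime/pkg/client/fake, while envtest covers behavior that needs a real API server.

```go
package main

import "fmt"

// DeploymentClient is a hypothetical, narrow interface over the few calls the
// replica logic needs. Production code would implement it on top of the
// controller-runtime client; tests use the in-memory fake below.
type DeploymentClient interface {
	GetReplicas(name string) (replicas int32, found bool)
	Create(name string, replicas int32)
	UpdateReplicas(name string, replicas int32)
}

// fakeClient keeps deployments in a map and counts every write.
type fakeClient struct {
	replicas map[string]int32
	writes   int
}

func (f *fakeClient) GetReplicas(name string) (int32, bool) {
	r, ok := f.replicas[name]
	return r, ok
}

func (f *fakeClient) Create(name string, replicas int32) {
	f.replicas[name] = replicas
	f.writes++
}

func (f *fakeClient) UpdateReplicas(name string, replicas int32) {
	f.replicas[name] = replicas
	f.writes++
}

// reconcileReplicas mirrors the create-or-scale step of the WebApp controller.
func reconcileReplicas(c DeploymentClient, name string, desired int32) {
	current, found := c.GetReplicas(name)
	switch {
	case !found:
		c.Create(name, desired)
	case current != desired:
		c.UpdateReplicas(name, desired)
	}
}

func main() {
	// Table-driven cases; in a real project this loop would live in a
	// _test.go file and report failures through testing.T.
	cases := []struct {
		name       string
		existing   map[string]int32
		desired    int32
		wantWrites int // total writes over two consecutive reconciles
	}{
		{"creates when missing", map[string]int32{}, 3, 1},
		{"scales when drifted", map[string]int32{"web": 1}, 3, 1},
		{"no-op when converged", map[string]int32{"web": 3}, 3, 0},
	}
	for _, tc := range cases {
		fc := &fakeClient{replicas: tc.existing}
		reconcileReplicas(fc, "web", tc.desired)
		reconcileReplicas(fc, "web", tc.desired) // the second pass must not write
		if fc.writes != tc.wantWrites || fc.replicas["web"] != tc.desired {
			panic(fmt.Sprintf("%s: writes=%d replicas=%d", tc.name, fc.writes, fc.replicas["web"]))
		}
		fmt.Println("ok:", tc.name)
	}
}
```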

Security

  • Follow the principle of least privilege for RBAC permissions
  • Run your operator with a non-root user
  • Serve your operator's webhooks and metrics endpoint over TLS
  • Regularly update dependencies for security patches

Operator Frameworks Comparison

Framework      | Language          | Learning curve  | Best for
Kubebuilder    | Go                | Moderate        | Complex operators, full control
Operator SDK   | Go, Ansible, Helm | Low to moderate | Various skill levels, multiple approaches
KUDO           | YAML/Declarative  | Low             | Simple operators, declarative approach
Metacontroller | Any language      | Low             | Simple webhook-based operators

Operators extend the declarative, self-healing model that Kubernetes applies to Deployments and Services to complex applications. By encoding operational knowledge into software, they move routine procedures such as provisioning, upgrades, backups, and failover from human runbooks into repeatable automation, which makes stateful applications easier to run reliably.
