Kubernetes Operators represent a powerful pattern for managing complex applications on Kubernetes. They extend the Kubernetes API to create, configure, and manage instances of stateful applications on behalf of Kubernetes users.
What are Operators?
Operators are software extensions to Kubernetes that use custom resources to manage applications and their components. They follow Kubernetes principles, notably the control loop concept, to automate operational tasks that would typically require human intervention.
The Operator Pattern
The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides. It combines:
- Custom Resource Definitions (CRDs): Extend the Kubernetes API with application-specific resources
- Custom Controllers: Implement the control loop that watches and reconciles the desired state
- Operational Knowledge: Encode human operational expertise into software
Why Use Operators?
Operators are particularly useful for:
- Stateful Applications: Databases, message queues, and other stateful systems
- Complex Deployment Procedures: Applications requiring multi-step installation/configuration
- Day-2 Operations: Backup, restore, scaling, upgrades, and failure recovery
- Domain-Specific Knowledge: Encoding operational expertise into automation
How Operators Work
Operators follow this basic workflow:
- Watch for changes to custom resources
- Compare the current state with the desired state
- Take action to reconcile any differences
- Update the status of the custom resource
- Repeat the process continuously
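The workflow above can be sketched in a few lines of plain Go. This is only an illustration under simplifying assumptions: `Spec`, `Status`, `Cluster`, and `reconcile` are hypothetical names, the "cluster" is an in-memory struct, and a slice of specs stands in for watch events. A real operator would receive events from the API server through a client library.

```go
package main

import "fmt"

// Spec is the desired state a user writes into the custom resource.
type Spec struct{ Replicas int }

// Status is the observed state the operator reports back.
type Status struct{ AvailableReplicas int }

// Cluster is a hypothetical in-memory stand-in for the real cluster.
type Cluster struct{ Running int }

// reconcile compares desired and actual state, acts on any difference,
// and returns the status to record plus whether anything had to change.
func reconcile(spec Spec, c *Cluster) (Status, bool) {
	changed := c.Running != spec.Replicas
	if changed {
		c.Running = spec.Replicas // take action to close the gap
	}
	return Status{AvailableReplicas: c.Running}, changed
}

func main() {
	cluster := &Cluster{Running: 1}
	// Each element stands in for a watch event carrying the latest spec.
	events := []Spec{{Replicas: 3}, {Replicas: 3}, {Replicas: 2}}
	for _, spec := range events {
		status, changed := reconcile(spec, cluster)
		fmt.Printf("desired=%d available=%d changed=%t\n",
			spec.Replicas, status.AvailableReplicas, changed)
	}
}
```

The second event carries the same spec as the first, so `reconcile` reports no change. Repeating a reconciliation against an already-correct state should be a no-op.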
Popular Operators
Many popular applications have operators available:
- Prometheus Operator: Manages Prometheus monitoring instances
- Elasticsearch Operator: Manages Elasticsearch clusters
- PostgreSQL Operator: Manages PostgreSQL databases
- Redis Operator: Manages Redis clusters
- Cert-Manager: Manages TLS certificates
Building a Custom Operator
You can build operators using various frameworks and tools. The most popular approaches are:
- Kubebuilder: SDK for building Kubernetes APIs using CRDs
- Operator SDK: Framework for building operators in Go, Ansible, or Helm; its Go projects are built on the same libraries and project layout as Kubebuilder
- KUDO: Kubernetes Universal Declarative Operator
- Metacontroller: Lightweight way to write controllers
- Native Go client: Using client-go and other Kubernetes client libraries
Step 1: Define a Custom Resource
First, define a Custom Resource Definition (CRD) that extends the Kubernetes API:
# The group must match the one used by the controller's RBAC markers and by
# the WebApp manifests in Step 5 (webapp.example.com).
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.webapp.example.com
spec:
  group: webapp.example.com
  names:
    kind: WebApp
    listKind: WebAppList
    plural: webapps
    singular: webapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      # Enables the /status subresource that the controller updates.
      subresources:
        status: {}
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
                image:
                  type: string
                port:
                  type: integer
            status:
              type: object
              properties:
                availableReplicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      message:
                        type: string
Step 2: Create API Types
Define Go types that represent your custom resource:
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// WebAppSpec defines the desired state of WebApp
type WebAppSpec struct {
	Replicas int32  `json:"replicas"`
	Image    string `json:"image"`
	Port     int32  `json:"port"`
}

// WebAppStatus defines the observed state of WebApp
type WebAppStatus struct {
	AvailableReplicas int32              `json:"availableReplicas"`
	Conditions        []metav1.Condition `json:"conditions,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// WebApp is the Schema for the webapps API
type WebApp struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   WebAppSpec   `json:"spec,omitempty"`
	Status WebAppStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// WebAppList contains a list of WebApp
type WebAppList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []WebApp `json:"items"`
}

func init() {
	SchemeBuilder.Register(&WebApp{}, &WebAppList{})
}
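When the CRD is generated from these Go types rather than written by hand, schema limits such as the 1 to 5 replica range from Step 1 can be expressed as Kubebuilder validation markers on the fields. The sketch below shows how that might look. Only the `Replicas` limits come from the Step 1 schema; the `MinLength` rule on `Image` and the port range are illustrative assumptions, as is the `encodeSpec` helper. The type is placed in `package main` only so the file runs on its own; in a Kubebuilder project it would stay in the API package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WebAppSpec mirrors the Step 2 type, with validation markers added.
// The markers are comments read by the CRD generator; they have no
// effect on how the Go code itself compiles or runs.
type WebAppSpec struct {
	// Replicas is the desired number of pods (limits taken from Step 1).
	//+kubebuilder:validation:Minimum=1
	//+kubebuilder:validation:Maximum=5
	Replicas int32 `json:"replicas"`

	// Image is the container image to run (non-empty rule is an assumption).
	//+kubebuilder:validation:MinLength=1
	Image string `json:"image"`

	// Port is the container port to expose (range is an assumption).
	//+kubebuilder:validation:Minimum=1
	//+kubebuilder:validation:Maximum=65535
	Port int32 `json:"port"`
}

// encodeSpec is a hypothetical helper that shows the JSON field names
// the API server would see for this spec.
func encodeSpec(s WebAppSpec) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeSpec(WebAppSpec{Replicas: 3, Image: "nginx:latest", Port: 80})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

With markers like these, the API server rejects an out-of-range value before the controller ever sees it, so the reconcile logic does not need to re-check those bounds.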
Step 3: Implement the Controller
Create a controller that reconciles the desired state:
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1alpha1 "github.com/example/webapp-operator/api/v1alpha1"
)

// WebAppReconciler reconciles a WebApp object
type WebAppReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.example.com,resources=webapps/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete

// Reconcile is the main control loop function
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Named logger (not log) so the variable does not shadow the log package.
	logger := log.FromContext(ctx)

	// Fetch the WebApp instance
	webapp := &webappv1alpha1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		if errors.IsNotFound(err) {
			// Request object not found, could have been deleted after reconcile request
			return ctrl.Result{}, nil
		}
		// Error reading the object
		return ctrl.Result{}, err
	}

	// The Deployment and the Service share the WebApp's name and namespace.
	key := types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}

	// Check if the Deployment already exists, if not create a new one
	deployment := &appsv1.Deployment{}
	err := r.Get(ctx, key, deployment)
	if err != nil && errors.IsNotFound(err) {
		// Define a new Deployment
		dep, err := r.deploymentForWebApp(webapp)
		if err != nil {
			logger.Error(err, "Failed to build Deployment")
			return ctrl.Result{}, err
		}
		logger.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
		if err := r.Create(ctx, dep); err != nil {
			logger.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
			return ctrl.Result{}, err
		}
		// Deployment created successfully
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		logger.Error(err, "Failed to get Deployment")
		return ctrl.Result{}, err
	}

	// Ensure the deployment size is the same as the spec.
	// The nil check guards against dereferencing an unset Replicas pointer.
	size := webapp.Spec.Replicas
	if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != size {
		deployment.Spec.Replicas = &size
		if err := r.Update(ctx, deployment); err != nil {
			logger.Error(err, "Failed to update Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
			return ctrl.Result{}, err
		}
	}

	// Check if the Service already exists, if not create a new one
	service := &corev1.Service{}
	err = r.Get(ctx, key, service)
	if err != nil && errors.IsNotFound(err) {
		// Define a new Service
		svc, err := r.serviceForWebApp(webapp)
		if err != nil {
			logger.Error(err, "Failed to build Service")
			return ctrl.Result{}, err
		}
		logger.Info("Creating a new Service", "Service.Namespace", svc.Namespace, "Service.Name", svc.Name)
		if err := r.Create(ctx, svc); err != nil {
			logger.Error(err, "Failed to create new Service", "Service.Namespace", svc.Namespace, "Service.Name", svc.Name)
			return ctrl.Result{}, err
		}
		// Service created successfully
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		logger.Error(err, "Failed to get Service")
		return ctrl.Result{}, err
	}

	// Update the WebApp status with the available replicas
	webapp.Status.AvailableReplicas = deployment.Status.AvailableReplicas
	if err := r.Status().Update(ctx, webapp); err != nil {
		logger.Error(err, "Failed to update WebApp status")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

// deploymentForWebApp returns a WebApp Deployment object
func (r *WebAppReconciler) deploymentForWebApp(w *webappv1alpha1.WebApp) (*appsv1.Deployment, error) {
	labels := labelsForWebApp(w.Name)
	replicas := w.Spec.Replicas

	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      w.Name,
			Namespace: w.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: labels,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: labels,
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Image: w.Spec.Image,
						Name:  "webapp",
						Ports: []corev1.ContainerPort{{
							ContainerPort: w.Spec.Port,
							Name:          "http",
						}},
					}},
				},
			},
		},
	}

	// Set WebApp instance as the owner and controller.
	// Without the owner reference, garbage collection and Owns() stop working,
	// so the error is returned rather than ignored.
	if err := controllerutil.SetControllerReference(w, dep, r.Scheme); err != nil {
		return nil, err
	}
	return dep, nil
}

// serviceForWebApp returns a WebApp Service object
func (r *WebAppReconciler) serviceForWebApp(w *webappv1alpha1.WebApp) (*corev1.Service, error) {
	labels := labelsForWebApp(w.Name)

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      w.Name,
			Namespace: w.Namespace,
		},
		Spec: corev1.ServiceSpec{
			Selector: labels,
			Ports: []corev1.ServicePort{
				{
					// NodePort is left unset so Kubernetes assigns a free port.
					// A fixed value such as 30000 would collide as soon as a
					// second WebApp is created in the cluster.
					Port: w.Spec.Port,
				},
			},
			Type: corev1.ServiceTypeNodePort,
		},
	}

	// Set WebApp instance as the owner and controller
	if err := controllerutil.SetControllerReference(w, svc, r.Scheme); err != nil {
		return nil, err
	}
	return svc, nil
}

// labelsForWebApp returns the labels for selecting the resources
func labelsForWebApp(name string) map[string]string {
	return map[string]string{"app": "webapp", "webapp_cr": name}
}

// SetupWithManager sets up the controller with the Manager
func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1alpha1.WebApp{}).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Service{}).
		Complete(r)
}
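Note that the controller above compares only the replica count with the spec, so a later change to `image` or `port` on an existing WebApp would not reach the Deployment. One possible way to extend it is to compute which fields have drifted and update the Deployment when the list is non-empty. The sketch below shows that comparison with plain structs so it runs without any Kubernetes dependencies. `Desired`, `Observed`, and `drift` are hypothetical names, and in a real controller the observed values would be read from the live `appsv1.Deployment`.

```go
package main

import "fmt"

// Desired holds the WebAppSpec fields the controller cares about.
type Desired struct {
	Replicas int32
	Image    string
	Port     int32
}

// Observed is a hypothetical, flattened view of the live Deployment.
type Observed struct {
	Replicas *int32 // nil when the field is unset
	Image    string
	Port     int32
}

// drift lists the fields whose live value differs from the desired value.
// An empty result means the Deployment already matches the spec.
func drift(d Desired, o Observed) []string {
	var fields []string
	if o.Replicas == nil || *o.Replicas != d.Replicas {
		fields = append(fields, "replicas")
	}
	if o.Image != d.Image {
		fields = append(fields, "image")
	}
	if o.Port != d.Port {
		fields = append(fields, "port")
	}
	return fields
}

func main() {
	two := int32(2)
	want := Desired{Replicas: 3, Image: "nginx:1.25", Port: 80}
	live := Observed{Replicas: &two, Image: "nginx:1.24", Port: 80}
	fmt.Println(drift(want, live)) // prints [replicas image]
}
```

Keeping the comparison in a pure function like this also makes it easy to unit test separately from the API calls.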
Step 4: Build and Deploy the Operator
Create a Dockerfile and build the operator image:
# Dockerfile
FROM golang:1.19 AS builder
WORKDIR /workspace
COPY go.mod go.mod
COPY go.sum go.sum
RUN go mod download
COPY . .
RUN make manager

FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/bin/manager .
USER 65532:65532
ENTRYPOINT ["/manager"]
Create deployment manifests:
# config/manager/manager.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: webapp-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: webapp-system
spec:
  replicas: 1
  selector:
    matchLabels:
      control-plane: controller-manager
  template:
    metadata:
      labels:
        control-plane: controller-manager
    spec:
      containers:
        - command:
            - /manager
          args:
            - --leader-elect
          image: controller:latest
          name: manager
          resources:
            limits:
              cpu: 100m
              memory: 30Mi
            requests:
              cpu: 100m
              memory: 20Mi
      terminationGracePeriodSeconds: 10
Step 5: Deploy and Use the Operator
Deploy the operator and create a WebApp custom resource:
# Deploy the CRD
kubectl apply -f config/crd/bases/webapp.example.com_webapps.yaml

# Deploy the operator
kubectl apply -f config/manager/manager.yaml

# Create a WebApp instance
apiVersion: webapp.example.com/v1alpha1
kind: WebApp
metadata:
  name: example-webapp
spec:
  replicas: 3
  image: nginx:latest
  port: 80
Operator Best Practices
Design Considerations
- Make your operator idempotent - it should handle multiple reconciliations safely
- Implement proper error handling and backoff strategies
- Use finalizers for proper resource cleanup
- Provide comprehensive status information
- Support multiple versions of your custom resource with conversion webhooks
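To make the finalizer and idempotency points concrete, here is a minimal sketch of the usual finalizer flow, written with plain Go types rather than a Kubernetes client. The finalizer name, the `Object` struct, and the helper functions are all hypothetical. In a real controller the finalizer list lives in the object's metadata and "deleting" corresponds to a set deletion timestamp.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerName is a hypothetical finalizer key for this sketch.
const finalizerName = "webapp.example.com/cleanup"

// errCleanupFailed simulates a failing external cleanup call.
var errCleanupFailed = errors.New("cleanup failed")

// Object is a minimal stand-in for a custom resource's metadata.
type Object struct {
	Finalizers []string
	Deleting   bool // true once deletion has been requested
}

func hasFinalizer(o *Object) bool {
	for _, f := range o.Finalizers {
		if f == finalizerName {
			return true
		}
	}
	return false
}

func removeFinalizer(o *Object) {
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizerName {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
}

// reconcileFinalizer is safe to call repeatedly: it adds the finalizer once,
// runs cleanup only while the object is being deleted, and keeps the
// finalizer in place when cleanup fails so that a later reconcile retries.
func reconcileFinalizer(o *Object, cleanup func() error) (string, error) {
	switch {
	case !o.Deleting && !hasFinalizer(o):
		o.Finalizers = append(o.Finalizers, finalizerName)
		return "finalizer added", nil
	case o.Deleting && hasFinalizer(o):
		if err := cleanup(); err != nil {
			return "", err
		}
		removeFinalizer(o)
		return "cleaned up", nil
	default:
		return "nothing to do", nil
	}
}

func main() {
	obj := &Object{}
	ok := func() error { return nil }
	fail := func() error { return errCleanupFailed }

	msg, err := reconcileFinalizer(obj, ok) // live object: finalizer is added
	fmt.Println(msg, err)

	obj.Deleting = true
	msg, err = reconcileFinalizer(obj, fail) // cleanup fails: finalizer stays
	fmt.Println(msg, err)

	msg, err = reconcileFinalizer(obj, ok) // retry succeeds: finalizer removed
	fmt.Println(msg, err)
}
```

Because the finalizer is removed only after cleanup succeeds, a failed cleanup leaves the object in place and the next reconciliation tries again, which is where a backoff strategy becomes useful.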
Testing
- Write unit tests for your reconciliation logic
- Use envtest for integration testing with a real API server
- Implement end-to-end tests with kind or minikube
- Test upgrade paths and backward compatibility
Security
- Follow the principle of least privilege for RBAC permissions
- Run your operator with a non-root user
- Secure your operator's communication with TLS
- Regularly update dependencies for security patches
Operator Frameworks Comparison
| Framework | Language | Learning Curve | Best For |
|---|---|---|---|
| Kubebuilder | Go | Moderate | Complex operators, full control |
| Operator SDK | Go, Ansible, Helm | Low to Moderate | Various skill levels, multiple approaches |
| KUDO | YAML/Declarative | Low | Simple operators, declarative approach |
| Metacontroller | Any language | Low | Simple webhook-based operators |
Operators work by encoding operational knowledge into software. Done well, they can substantially reduce the operational burden of running stateful applications on Kubernetes while making those applications more reliable and easier to manage.