Effective monitoring and logging are essential for maintaining the health, performance, and security of Kubernetes clusters and applications. Kubernetes provides various tools and patterns to collect metrics, monitor resources, and aggregate logs from containers, pods, and cluster components.
Monitoring with Prometheus and Grafana
Prometheus has become the de facto standard for monitoring Kubernetes clusters, while Grafana provides powerful visualization capabilities for the collected metrics.
Prometheus Architecture
Prometheus is a pull-based monitoring system that collects metrics from configured targets:
- Prometheus Server: Scrapes and stores time series data
- Exporters: Expose metrics in Prometheus format (Node Exporter, cAdvisor, etc.)
- Pushgateway: Handles metrics from short-lived jobs
- Alertmanager: Handles alerts sent by Prometheus Server
- Service Discovery: Automatically discovers monitoring targets in Kubernetes
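To make the service-discovery piece concrete, below is a minimal, hand-written prometheus.yml fragment that discovers pods and keeps only those annotated with prometheus.io/scrape: "true". This is only a sketch: when you use the Prometheus Operator (covered below), ServiceMonitor and PodMonitor resources generate this kind of configuration for you, and the job name and annotation convention here are illustrative assumptions.

# Hand-written prometheus.yml fragment; the Prometheus Operator generates
# equivalent scrape configuration from ServiceMonitor/PodMonitor resources.
scrape_configs:
  - job_name: kubernetes-pods          # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                      # discover every pod in the cluster
    relabel_configs:
      # Keep only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Carry the pod's namespace and name into the scraped series as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod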
Key Kubernetes Metrics to Monitor
Cluster-level Metrics
- Node CPU and memory utilization
- Disk space and I/O
- Network bandwidth
- API server latency and error rates
Workload-level Metrics
- Pod CPU and memory usage
- Container restarts
- Application-specific metrics
- Request latency and error rates
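A few example PromQL queries for the metrics listed above. These assume the exporters that ship with kube-prometheus-stack (node_exporter, cAdvisor via the kubelet, and kube-state-metrics); metric names can vary between exporter versions.

# Node CPU utilization in percent, from node_exporter
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory working set per pod, from cAdvisor
sum(container_memory_working_set_bytes{container!="",container!="POD"}) by (pod)

# Container restarts over the last hour, from kube-state-metrics
increase(kube_pod_container_status_restarts_total[1h])

# API server 99th percentile request latency, from the API server's own metrics
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, verb))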
Setting Up Prometheus in Kubernetes
Using the Prometheus Operator
The Prometheus Operator simplifies Prometheus setup and management in Kubernetes:
# Install the kube-prometheus-stack chart (Prometheus Operator plus Prometheus,
# Alertmanager, Grafana, node_exporter, and kube-state-metrics) using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
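After the install completes, a quick sanity check is to list the pods and port-forward the Prometheus service. The operator always creates a service named prometheus-operated in front of the Prometheus pods; other resource names are generated from the Helm release name.

# See what the chart installed (Prometheus, Alertmanager, Grafana, exporters, the operator)
kubectl get pods

# Port-forward the operator-managed service and open http://localhost:9090
# to browse targets and metrics
kubectl port-forward svc/prometheus-operated 9090:9090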
ServiceMonitor Resource
A ServiceMonitor selects Kubernetes Services by label and tells Prometheus which port and path to scrape on them:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: web
      interval: 30s
      path: /metrics
  namespaceSelector:
    matchNames:
      - my-namespace
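For this ServiceMonitor to find anything, a Service must exist in my-namespace whose labels match the selector and which exposes a port named web. A minimal sketch is shown below; the app name and port number 8080 are illustrative assumptions. With the kube-prometheus-stack chart's default values, the release: prometheus label on the ServiceMonitor is also what allows the operator's Prometheus instance to select it.

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app            # matched by the ServiceMonitor's spec.selector
spec:
  selector:
    app: my-app            # pods backing the service
  ports:
    - name: web            # must match the ServiceMonitor endpoint's "port"
      port: 8080
      targetPort: 8080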
PodMonitor Resource
A PodMonitor selects Pods by label and scrapes them directly, without requiring a Service in front of them:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-pod-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
    - port: metrics
      interval: 30s
      path: /metrics
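PodMonitors are handy for workloads with no Service, such as batch jobs. The only requirement on the workload side is that the metrics container port is named, so the PodMonitor can reference it. A sketch of a matching Deployment, where the image and port number are illustrative assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app          # matched by the PodMonitor's spec.selector
    spec:
      containers:
        - name: my-app
          image: ghcr.io/example/my-app:latest   # illustrative image
          ports:
            - name: metrics                      # referenced by podMetricsEndpoints.port
              containerPort: 9100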
Grafana Dashboards
Grafana connects to Prometheus as a data source and visualizes the collected metrics through customizable dashboards. The example below provisions a dashboard declaratively using the Grafana Operator's GrafanaDashboard custom resource:
Example Dashboard Configuration
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: kubernetes-cluster-monitoring
  labels:
    app: grafana
spec:
  json: |
    {
      "title": "Kubernetes Cluster Monitoring",
      "tags": ["kubernetes", "prometheus"],
      "timezone": "browser",
      "panels": [
        {
          "title": "CPU Usage",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",container!=\"\"}[5m])) by (pod)",
              "legendFormat": "{{pod}}"
            }
          ]
        }
      ]
    }
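The GrafanaDashboard resource requires the Grafana Operator (the integreatly.org API group). If Grafana was installed as part of kube-prometheus-stack instead, its dashboard sidecar can load dashboards from ConfigMaps carrying the grafana_dashboard label (the sidecar's default; check your chart values). A minimal sketch with a placeholder dashboard:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard
  labels:
    grafana_dashboard: "1"        # default label watched by the Grafana dashboard sidecar
data:
  my-app-dashboard.json: |
    {
      "title": "My App",
      "tags": ["my-app"],
      "panels": []
    }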
Alerting with Prometheus
PrometheusRule resources define alerts based on metric conditions:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  labels:
    release: prometheus
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: HighMemoryUsage
          expr: (container_memory_working_set_bytes{container!="",container!="POD"} / container_spec_memory_limit_bytes) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage in pod {{ $labels.pod }}"
            description: "Pod {{ $labels.pod }} is using {{ $value | humanizePercentage }} of its memory limit"
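Firing alerts still need somewhere to go. With the Prometheus Operator, routing can be declared with an AlertmanagerConfig resource; the sketch below routes everything to a Slack webhook stored in a Secret. The receiver name, channel, and Secret are assumptions, and whether Alertmanager picks the resource up depends on the alertmanagerConfigSelector and namespace settings of your installation.

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: my-app-routing
  labels:
    release: prometheus
spec:
  route:
    receiver: slack-notifications
    groupBy: ["alertname", "namespace"]
    repeatInterval: 4h
  receivers:
    - name: slack-notifications
      slackConfigs:
        - channel: "#alerts"             # hypothetical channel
          apiURL:
            name: slack-webhook          # Secret holding the webhook URL
            key: url
          sendResolved: true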
Log Management with Fluentd, Elasticsearch, and Kibana (EFK Stack)
The EFK stack is a popular solution for collecting, storing, and analyzing logs in Kubernetes:
Fluentd Architecture
Fluentd collects, processes, and forwards logs from various sources:
- Input Plugins: Collect logs from sources (files, systemd, etc.)
- Parser Plugins: Parse logs into structured data
- Filter Plugins: Process and modify log records
- Output Plugins: Send logs to destinations (Elasticsearch, S3, etc.)
- Buffer: Temporarily stores logs during processing
Setting Up Fluentd in Kubernetes
Fluentd DaemonSet Configuration
Fluentd typically runs as a DaemonSet to collect logs from each node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc.cluster.local"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "http"
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluentd-config
              mountPath: /fluentd/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluentd-config
          configMap:
            name: fluentd-config
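The DaemonSet references a fluentd service account, which the kubernetes_metadata filter needs in order to look up pod and namespace metadata from the API server. A minimal RBAC setup for it might look like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]   # read-only access for metadata enrichment
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: logging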
Fluentd Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix fluentd
      include_tag_key true
      type_name fluentd
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever true
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>
Elasticsearch Configuration
Elasticsearch stores and indexes the log data collected by Fluentd. The manifest below runs a single-node instance that is suitable for development and testing; production clusters need multiple nodes, persistent volumes, and security enabled:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 1                 # single-node discovery below; scaling out requires multi-node discovery settings
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
          env:
            - name: discovery.type
              value: single-node
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
            - name: xpack.security.enabled
              value: "false"
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
      volumes:
        - name: data
          emptyDir: {}        # for production, use volumeClaimTemplates with persistent storage
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      name: http
    - port: 9300
      name: transport
  clusterIP: None
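Once Fluentd and Elasticsearch are running, a quick way to confirm that logs are flowing is to port-forward an Elasticsearch pod and check cluster health and the fluentd-* indices (the index name follows the logstash_prefix configured above):

kubectl -n logging port-forward elasticsearch-0 9200:9200 &

# Cluster health should report green (or yellow on a single node when indices have replicas configured)
curl -s http://localhost:9200/_cluster/health?pretty

# Daily indices named fluentd-YYYY.MM.DD should start appearing
curl -s 'http://localhost:9200/_cat/indices/fluentd-*?v'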
Kibana Configuration
Kibana provides a web interface for searching, analyzing, and visualizing log data:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.10.2
          env:
            - name: ELASTICSEARCH_HOSTS
              value: "http://elasticsearch.logging.svc.cluster.local:9200"
          ports:
            - containerPort: 5601
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
            limits:
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
spec:
  selector:
    app: kibana
  ports:
    - port: 5601
      targetPort: 5601
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
  namespace: logging
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: kibana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kibana
                port:
                  number: 5601
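If no Ingress controller (or DNS for kibana.example.com) is in place yet, Kibana can also be reached with a port-forward; once it is up, create an index pattern matching fluentd-* to start browsing logs:

kubectl -n logging port-forward deploy/kibana 5601:5601
# then open http://localhost:5601 and add an index pattern for "fluentd-*"
# (Stack Management -> Index Patterns in Kibana 7.x)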
Best Practices for Monitoring and Logging
Monitoring Best Practices
- Monitor at multiple levels: cluster, node, pod, and container
- Set up meaningful alerts with appropriate thresholds
- Use histograms for latency measurements instead of averages (see the example query after this list)
- Regularly review and update your monitoring dashboards
- Monitor resource utilization and plan for capacity
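To illustrate the histogram recommendation above: if an application exposes a Prometheus histogram such as http_request_duration_seconds (the metric name here is an assumption), the 95th-percentile latency can be computed with histogram_quantile rather than averaging:

# 95th percentile request latency over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))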
Logging Best Practices
- Implement structured logging in your applications
- Include correlation IDs for tracing requests across services
- Set appropriate log retention policies
- Secure access to your logging infrastructure
- Regularly archive old logs to cold storage
Performance Considerations
- Limit the cardinality of your metrics to prevent Prometheus overload
- Use sampling for high-volume logs
- Configure appropriate buffer sizes for Fluentd
- Monitor the monitoring system itself
- Consider using Thanos or Cortex for long-term metric storage
Implementing a comprehensive monitoring and logging solution is crucial for maintaining the reliability and performance of your Kubernetes clusters and applications. The combination of Prometheus for metrics and the EFK stack for logs provides a powerful observability platform that can scale with your needs.