Monitoring and Logging

Effective monitoring and logging are essential for maintaining the health, performance, and security of Kubernetes clusters and applications. Kubernetes provides various tools and patterns to collect metrics, monitor resources, and aggregate logs from containers, pods, and cluster components.

Monitoring with Prometheus and Grafana

Prometheus has become the de facto standard for monitoring Kubernetes clusters, while Grafana provides powerful visualization capabilities for the collected metrics.

Prometheus Architecture

Prometheus is a pull-based monitoring system that collects metrics from configured targets:

  • Prometheus Server: Scrapes and stores time series data
  • Exporters: Expose metrics in Prometheus format (Node Exporter, cAdvisor, etc.)
  • Pushgateway: Handles metrics from short-lived jobs
  • Alertmanager: Handles alerts sent by Prometheus Server
  • Service Discovery: Automatically discovers monitoring targets in Kubernetes (see the scrape configuration sketch after this list)
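
When Prometheus runs without the Operator, targets are discovered through kubernetes_sd_configs in prometheus.yml. The sketch below follows the common (but not built-in) convention of scraping only pods annotated with prometheus.io/scrape; the Operator-based setup shown later replaces this with ServiceMonitor and PodMonitor resources.

# Sketch of a scrape job using Kubernetes pod service discovery
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Use the prometheus.io/port annotation as the scrape port, if present
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  # Record pod and namespace as labels on every scraped series
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace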

Key Kubernetes Metrics to Monitor

Cluster-level Metrics

  • Node CPU and memory utilization
  • Disk space and I/O
  • Network bandwidth
  • API server latency and error rates

Workload-level Metrics

  • Pod CPU and memory usage
  • Container restarts
  • Application-specific metrics
  • Request latency and error rates
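
A few example PromQL queries for the metrics above, assuming the exporters that ship with kube-prometheus-stack (node_exporter, cAdvisor via the kubelet, and kube-state-metrics) are installed:

# Node CPU utilization (fraction of time not idle, per node)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Node memory utilization
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# Pod CPU usage in cores, summed per pod
sum(rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])) by (pod)

# Container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# API server 99th percentile request latency by verb
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])) by (le, verb))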

Setting Up Prometheus in Kubernetes

Using the Prometheus Operator

The Prometheus Operator simplifies Prometheus setup and management in Kubernetes:

# Install Prometheus Operator using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
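
The chart is usually installed into its own namespace; a minimal variant (the monitoring namespace name is just a convention):

# Install into a dedicated namespace and check that the pods come up
helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace
kubectl get pods --namespace monitoring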
    

ServiceMonitor Resource

A ServiceMonitor tells Prometheus which Services to scrape, selected by label, and how to scrape them:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: web
    interval: 30s
    path: /metrics
  namespaceSelector:
    matchNames:
    - my-namespace
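
Note that the selector matches the labels of a Service (not a Pod), and port refers to a named port on that Service. A minimal Service that the ServiceMonitor above would pick up could look like this (the names reuse the example values above):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app            # matched by the ServiceMonitor selector
spec:
  selector:
    app: my-app            # selects the application pods
  ports:
  - name: web              # referenced by the ServiceMonitor's port: web
    port: 8080
    targetPort: 8080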
    

PodMonitor Resource

A PodMonitor defines how Prometheus should scrape Pods directly, without requiring a Service in front of them:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-pod-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
  - port: metrics
    interval: 30s
    path: /metrics
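
Unlike a ServiceMonitor, a PodMonitor selects Pods by their own labels, so no Service is needed; the pods only have to expose a named container port matching podMetricsEndpoints. An illustrative excerpt from the workload's pod template:

# Container ports in the pod template (illustrative values)
containers:
- name: my-app
  image: my-app:1.0
  ports:
  - name: metrics          # referenced by the PodMonitor's port: metrics
    containerPort: 9090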
    

Grafana Dashboards

Grafana connects to Prometheus as a data source and visualizes metrics through customizable dashboards. The GrafanaDashboard resource shown below is a custom resource provided by the Grafana Operator; with a plain Grafana deployment, dashboards are instead imported through the UI or provisioned from ConfigMaps.

Example Dashboard Configuration

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: kubernetes-cluster-monitoring
  labels:
    app: grafana
spec:
  json: |
    {
      "title": "Kubernetes Cluster Monitoring",
      "tags": ["kubernetes", "prometheus"],
      "timezone": "browser",
      "panels": [
        {
          "title": "CPU Usage",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",container!=\"\"}[5m])) by (pod)",
              "legendFormat": "{{pod}}"
            }
          ]
        }
      ]
    }
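
kube-prometheus-stack also deploys Grafana with a set of standard cluster dashboards. For a quick look without an Ingress, port-forwarding is enough; the service name below assumes the Helm release was named prometheus and installed into the monitoring namespace as in the earlier example:

# Forward the Grafana UI to http://localhost:3000
kubectl port-forward --namespace monitoring svc/prometheus-grafana 3000:80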
    

Alerting with Prometheus

PrometheusRule resources define alerts based on metric conditions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  labels:
    release: prometheus
spec:
  groups:
  - name: my-app.rules
    rules:
    - alert: HighMemoryUsage
      expr: (container_memory_working_set_bytes{container!="",container!="POD"} / container_spec_memory_limit_bytes) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage in pod {{ $labels.pod }}"
        description: "Pod {{ $labels.pod }} is using {{ $value | humanizePercentage }} of its memory limit"
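
Prometheus only evaluates these rules; firing alerts are sent to Alertmanager, which groups, deduplicates, and routes them to receivers. A minimal Alertmanager configuration sketch with a Slack receiver (the webhook URL and channel are placeholders):

# alertmanager.yaml (sketch): route all alerts to a Slack channel
route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: slack-notifications
receivers:
- name: slack-notifications
  slack_configs:
  - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
    channel: '#alerts'
    send_resolved: true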
    

Log Management with Fluentd, Elasticsearch, and Kibana (EFK Stack)

The EFK stack is a popular solution for collecting, storing, and analyzing logs in Kubernetes:

Fluentd Architecture

Fluentd collects, processes, and forwards logs from various sources:

  • Input Plugins: Collect logs from sources (files, systemd, etc.)
  • Parser Plugins: Parse logs into structured data
  • Filter Plugins: Process and modify log records
  • Output Plugins: Send logs to destinations (Elasticsearch, S3, etc.)
  • Buffer: Temporarily stores logs during processing

Setting Up Fluentd in Kubernetes

Fluentd DaemonSet Configuration

Fluentd typically runs as a DaemonSet to collect logs from each node:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluentd-config
          mountPath: /fluentd/etc
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluentd-config
        configMap:
          name: fluentd-config
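
The DaemonSet references a fluentd service account. The kubernetes_metadata filter used in the configuration below enriches log records with pod and namespace metadata, so the service account needs read access to those resources, typically granted like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: logging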
    

Fluentd Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix fluentd
      include_tag_key true
      type_name fluentd
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever true
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>

Elasticsearch Configuration

Elasticsearch stores and indexes the log data collected by Fluentd. The manifest below is a minimal single-node deployment suitable for testing; production clusters need multiple Elasticsearch nodes, proper discovery settings, and persistent volumes instead of emptyDir:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 1   # discovery.type=single-node below cannot form a multi-node cluster
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.10.2
        env:
        - name: discovery.type
          value: single-node
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        - name: xpack.security.enabled
          value: "false"
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 1Gi
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport
  clusterIP: None
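
Once Fluentd is shipping logs, a quick way to confirm that indices are being created is to port-forward the Elasticsearch pod and query its cat API:

# In one terminal: expose Elasticsearch locally
kubectl port-forward --namespace logging pod/elasticsearch-0 9200:9200
# In another terminal: check that fluentd-* indices are being written
curl "http://localhost:9200/_cat/indices?v"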
    

Kibana Configuration

Kibana provides a web interface for searching, analyzing, and visualizing log data:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.10.2
        env:
        - name: ELASTICSEARCH_HOSTS
          value: "http://elasticsearch.logging.svc.cluster.local:9200"
        ports:
        - containerPort: 5601
        resources:
          requests:
            cpu: 100m
            memory: 500Mi
          limits:
            memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
spec:
  selector:
    app: kibana
  ports:
  - port: 5601
    targetPort: 5601
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana
  namespace: logging
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: kibana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kibana
            port:
              number: 5601
    

Best Practices for Monitoring and Logging

Monitoring Best Practices

  • Monitor at multiple levels: cluster, node, pod, and container
  • Set up meaningful alerts with appropriate thresholds
  • Use histograms for latency measurements instead of averages (see the example query after this list)
  • Regularly review and update your monitoring dashboards
  • Monitor resource utilization and plan for capacity
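
For the histogram point above, quantiles can be computed at query time with histogram_quantile; the metric and label names here follow the Prometheus client library conventions and stand in for whatever your application exposes:

# 95th percentile request latency over the last 5 minutes (service is an example label)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))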

Logging Best Practices

  • Implement structured logging in your applications (see the example entry after this list)
  • Include correlation IDs for tracing requests across services
  • Set appropriate log retention policies
  • Secure access to your logging infrastructure
  • Regularly archive old logs to cold storage
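
A structured, single-line JSON log entry carrying a correlation ID (the field names here are only a suggestion) is easy for Fluentd to parse and for Kibana to filter on:

{"timestamp":"2024-01-15T10:23:45.123Z","level":"info","service":"checkout","correlation_id":"8f14e45f-ceea-4e7a-9c3b-1d2f3a4b5c6d","message":"order created","duration_ms":42}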

Performance Considerations

  • Limit the cardinality of your metrics to prevent Prometheus overload (see the relabeling sketch after this list)
  • Use sampling for high-volume logs
  • Configure appropriate buffer sizes for Fluentd
  • Monitor the monitoring system itself
  • Consider using Thanos or Cortex for long-term metric storage
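
One way to keep cardinality in check with the Prometheus Operator is to drop high-cardinality metrics or labels at scrape time via metricRelabelings on a ServiceMonitor endpoint. A sketch extending the earlier ServiceMonitor example (the metric and label names are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: web
    metricRelabelings:
    # Drop a high-cardinality debug metric entirely
    - sourceLabels: [__name__]
      regex: my_app_debug_.*
      action: drop
    # Remove a per-request ID label that would explode series count
    - regex: request_id
      action: labeldrop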

Implementing a comprehensive monitoring and logging solution is crucial for maintaining the reliability and performance of your Kubernetes clusters and applications. The combination of Prometheus for metrics and the EFK stack for logs provides a powerful observability platform that can scale with your needs.
