9.8 Observability and Monitoring

Gaining Visibility into Kubernetes

Observability combines metrics, logs, and traces to understand system behavior and troubleshoot issues.

Metrics with Prometheus

Collecting Application Metrics

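# ServiceMonitor (a Prometheus Operator custom resource): scrape every Service
# labeled app: webapp in this namespace. The Prometheus resource's
# serviceMonitorSelector must also match this object's labels.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-metrics
spec:
  selector:
    matchLabels:
      app: webapp     # matches Service labels, not Pod labels
  endpoints:
  - port: metrics     # name of a port on the Service, not a number
    interval: 30s
    path: /metrics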

Custom Metrics

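# Deployment exposing a named metrics port, plus the Service the ServiceMonitor selects
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:             # required in apps/v1; must match the template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        ports:
        - name: metrics
          containerPort: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: webapp
  labels:
    app: webapp         # matched by the ServiceMonitor's selector
spec:
  selector:
    app: webapp
  ports:
  - name: metrics       # matched by the ServiceMonitor's endpoint port
    port: 9090
    targetPort: metrics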
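
The Deployment only declares the port. The application itself has to serve metrics there. Below is a minimal sketch using only the Python standard library; in practice you would normally use an official Prometheus client library instead. It assumes a hypothetical counter named `http_requests_total` with `method` and `status` labels. The counter is rendered in Prometheus's text exposition format at `/metrics`.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counter: (method, status) -> number of requests handled.
REQUESTS = Counter()
LOCK = threading.Lock()


def record_request(method: str, status: int) -> None:
    """Call from request handlers to count one completed request."""
    with LOCK:
        REQUESTS[(method, str(status))] += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    with LOCK:
        # Label values are not escaped here; that is fine for methods and status codes.
        for (method, status), count in sorted(REQUESTS.items()):
            lines.append(f'http_requests_total{{method="{method}",status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it on the port named `metrics` above, run `ThreadingHTTPServer(("", 9090), MetricsHandler).serve_forever()` at startup. Call `record_request(...)` from the application's own handlers.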

Logging

Centralized Log Collection

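# Fluent Bit DaemonSet: runs one log collector per node and tails
# container log files from the host's /var/log
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:latest   # pin a specific version in production
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: config
        configMap:
          name: fluent-bit-config
---
# Minimal fluent-bit-config ConfigMap: tail container logs and print them to
# stdout. Replace the OUTPUT with your log backend (Elasticsearch, Loki, etc.);
# adding a kubernetes FILTER to attach pod metadata also requires RBAC.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*

    [OUTPUT]
        Name   stdout
        Match  *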
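
On containerd and CRI-O nodes, the files under `/var/log/containers` that the DaemonSet above reads are in the CRI logging format: `<timestamp> <stream> <tag> <message>`. The tag is `P` for a partial chunk of a long line and `F` for the final chunk. Fluent Bit's built-in `cri` parser handles this for you. The sketch below shows roughly what that parsing involves; the function and type names are illustrative only.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str  # timestamp of the final chunk
    stream: str     # "stdout" or "stderr"
    message: str


def parse_cri_lines(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Parse CRI container log lines, joining P (partial) chunks per stream."""
    pending: Dict[str, List[str]] = {}  # stream -> partial chunks seen so far
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip malformed lines
        timestamp, stream, tag = parts[0], parts[1], parts[2]
        message = parts[3] if len(parts) == 4 else ""
        pending.setdefault(stream, []).append(message)
        if tag == "P":
            continue  # more chunks of this line will follow
        yield LogEntry(timestamp, stream, "".join(pending.pop(stream)))
```

Partial chunks are buffered per stream because stdout and stderr lines can interleave in the same file.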

Application Logging

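# Structured logging: LOG_LEVEL and LOG_FORMAT are application conventions.
# Kubernetes passes them to the container but does not interpret them.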
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: LOG_LEVEL
      value: "info"
    - name: LOG_FORMAT
      value: "json"
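
Because Kubernetes does nothing with `LOG_LEVEL` or `LOG_FORMAT`, the application has to read them itself. The sketch below assumes a Python service using the standard `logging` module. It treats `LOG_FORMAT=json` as "one JSON object per line on stdout", which is where `kubectl logs` and the node-level Fluent Bit agent pick logs up. The field names (`time`, `level`, `logger`, `message`) are an arbitrary choice.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(env=os.environ) -> logging.Logger:
    """Configure the root logger from LOG_LEVEL and LOG_FORMAT."""
    level = env.get("LOG_LEVEL", "info").upper()
    # Container stdout/stderr is what kubectl logs and node log agents read.
    handler = logging.StreamHandler(sys.stdout)
    if env.get("LOG_FORMAT", "text") == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
    return root
```

With the Pod's environment, `configure_logging()` followed by `logging.getLogger("webapp").info("request handled")` emits one JSON line per call.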

Health Checks

Probes for Application Health

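# Deployment with liveness, readiness, and startup probes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:             # required in apps/v1; must match the template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        # Liveness: restart the container if this check keeps failing
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        # Readiness: remove the pod from Service endpoints while this fails
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        # Startup: liveness and readiness checks wait until this succeeds;
        # 30 failures x 10s allows up to 300s for the app to start
        startupProbe:
          httpGet:
            path: /startup
            port: 8080
          failureThreshold: 30
          periodSeconds: 10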
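
Each probe path needs a handler that reports the matching state. The kubelet treats any HTTP status from 200 to 399 as success, so this handler returns 200 when a check passes and 503 when it fails. The sketch below uses Python's standard library and assumes hypothetical `started`, `ready`, and `healthy` flags that the application sets itself.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    """Hypothetical flags the application flips as it starts and runs."""

    def __init__(self) -> None:
        self.started = False  # set once initialization completes
        self.ready = False    # set while dependencies (DB, caches) are reachable
        self.healthy = True   # cleared if the process detects it is wedged


STATE = AppState()


class ProbeHandler(BaseHTTPRequestHandler):
    # Map each probe path from the Deployment to the condition it reports.
    CHECKS = {
        "/startup": lambda: STATE.started,
        "/ready": lambda: STATE.started and STATE.ready,
        "/health": lambda: STATE.healthy,
    }

    def do_GET(self) -> None:
        check = self.CHECKS.get(self.path)
        if check is None:
            self.send_response(404)
        elif check():
            self.send_response(200)
        else:
            self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep frequent probe requests out of the application log


def serve(port: int = 8080) -> ThreadingHTTPServer:
    """Serve the probe endpoints on a background daemon thread."""
    server = ThreadingHTTPServer(("", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `serve(8080)` at process start, then set `STATE.started` and `STATE.ready` as initialization finishes. Keeping `/health` independent of downstream dependencies means a database outage takes pods out of rotation through `/ready` rather than restarting them all.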

Alerts

Prometheus Alerting Rules

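# PrometheusRule: fire when any http_requests_total series counts more than
# 0.1 5xx responses per second for 5 minutes (assumes the app exports
# http_requests_total with a status label)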
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: webapp-alerts
spec:
  groups:
  - name: webapp.rules
    rules:
    - alert: HighErrorRate
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value }} errors per second"
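
The sketch below models the two parts of this rule in plain Python. The first is a simplified `rate()` that turns counter samples into a per-second increase. Like Prometheus, it treats a drop in value as a counter reset, but unlike Prometheus it does not extrapolate to the window edges. The second is the `for: 5m` logic that moves an alert from pending to firing. The function names are illustrative and not part of any Prometheus API.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the samples in one window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset (e.g. a pod restart) the counter starts again from zero,
        # so the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def alert_state(evaluations: List[Sample], threshold: float, hold_for: float) -> str:
    """Return "inactive", "pending", or "firing" after the last evaluation.

    evaluations: (timestamp, expression value) pairs in time order.
    """
    pending_since: Optional[float] = None
    for ts, value in evaluations:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
        else:
            pending_since = None    # any false evaluation resets the timer
    if pending_since is None:
        return "inactive"
    held = evaluations[-1][0] - pending_since
    return "firing" if held >= hold_for else "pending"
```

For example, a 5xx counter that grows by 30 over 300 seconds has a rate of exactly 0.1/s. That value does not exceed the rule's `> 0.1` threshold, so the alert stays inactive.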

Distributed Tracing

Tracing with Jaeger

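# Deployment configured for the Jaeger client libraries, which read these
# variables and send spans over UDP to a jaeger-agent on port 6831.
# Those clients are deprecated; OpenTelemetry SDKs are configured with
# OTEL_EXPORTER_OTLP_ENDPOINT instead, and recent Jaeger versions accept OTLP.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:             # required in apps/v1; must match the template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        env:
        - name: JAEGER_AGENT_HOST
          value: "jaeger-agent"   # Service name of the Jaeger agent
        - name: JAEGER_AGENT_PORT
          value: "6831"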
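
The environment variables only tell the client library where to send spans. What ties spans from different services into one trace is context propagation: each outgoing request carries the trace ID and the caller's span ID in a header. Jaeger's legacy clients used their own `uber-trace-id` header. To illustrate the idea, the sketch below uses the W3C Trace Context `traceparent` header instead, which is the default in OpenTelemetry. The function and type names are illustrative.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, lowercase hex (W3C Trace Context, version 00)
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str  # shared by every span in one request's trace
    span_id: str   # the calling span (the header's parent-id field)
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; return None if it is invalid."""
    m = TRACEPARENT_RE.fullmatch(header.strip())
    if not m:
        return None
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: Optional[SpanContext]) -> str:
    """Header for a downstream call: keep the trace ID, mint a new span ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)  # start a new trace
    sampled = parent.sampled if parent else True
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
```

Because every service forwards the same trace ID, the tracing backend can stitch the spans from each hop into a single request timeline.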

Dashboards with Grafana

Visualization Setup

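# Grafana datasource provisioning: mount this ConfigMap into the Grafana
# container at /etc/grafana/provisioning/datasources/ so it is loaded at startup
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      # In-cluster Prometheus Service; the name and port depend on how Prometheus was installed
      url: http://prometheus-server:80
      access: proxy
      isDefault: true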

Resource Monitoring

Cluster Resource Usage

# Current CPU and memory usage (requires Metrics Server to be installed)
kubectl top nodes
kubectl top pods --all-namespaces
kubectl top pods --containers

Metrics Server

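# The Service in front of the metrics-server Deployment (an excerpt, not a full
# install). Metrics Server is usually installed from the project's release
# manifest, which also creates the Deployment, RBAC rules, and APIService:
#   kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    targetPort: 4443   # must match the container's --secure-port, which varies by release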

Essential Commands

# Metrics and monitoring (servicemonitors and prometheusrules require the Prometheus Operator CRDs)
kubectl top nodes
kubectl top pods
kubectl get servicemonitors
kubectl get prometheusrules

# Logs
kubectl logs pod-name
kubectl logs pod-name -c container-name
kubectl logs -f deployment/webapp
kubectl logs --previous pod-name

# Events and troubleshooting
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl describe pod problematic-pod

What’s Next?

Next, we’ll explore Helm Package Management for deploying and managing complex applications.