################################
9.8 Observability and Monitoring
################################

**Gaining Visibility into Kubernetes**

Observability combines metrics, logs, and traces to understand system behavior and troubleshoot issues.

=======================
Metrics with Prometheus
=======================

**Collecting System Metrics**

A ``ServiceMonitor`` is a Prometheus Operator resource that tells Prometheus which Services to scrape. It selects Services by label, and ``port`` refers to a named port on the selected Service.

.. code-block:: yaml

   # ServiceMonitor for app metrics
   apiVersion: monitoring.coreos.com/v1
   kind: ServiceMonitor
   metadata:
     name: webapp-metrics
   spec:
     selector:
       matchLabels:
         app: webapp
     endpoints:
     - port: metrics
       interval: 30s
       path: /metrics

**Custom Metrics**

.. code-block:: yaml

   # Deployment whose pods expose a metrics endpoint
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: webapp
   spec:
     selector:              # required for apps/v1 Deployments
       matchLabels:
         app: webapp
     template:
       metadata:
         labels:
           app: webapp
       spec:
         containers:
         - name: app
           image: webapp:latest
           ports:
           - name: metrics
             containerPort: 9090

=======
Logging
=======

**Centralized Log Collection**

Running the collector as a DaemonSet places one Fluent Bit pod on every node, where it reads container logs from the host's ``/var/log`` directory.

.. code-block:: yaml

   # Fluent Bit DaemonSet
   apiVersion: apps/v1
   kind: DaemonSet
   metadata:
     name: fluent-bit
     namespace: logging
   spec:
     selector:
       matchLabels:
         app: fluent-bit
     template:
       metadata:
         labels:
           app: fluent-bit
       spec:
         containers:
         - name: fluent-bit
           image: fluent/fluent-bit:latest
           volumeMounts:
           - name: varlog
             mountPath: /var/log
             readOnly: true
           - name: config
             mountPath: /fluent-bit/etc
         volumes:
         - name: varlog
           hostPath:
             path: /var/log
         - name: config
           configMap:
             name: fluent-bit-config

**Application Logging**

.. code-block:: yaml

   # Structured logging example
   # LOG_LEVEL and LOG_FORMAT are application-specific; the app must read them
   apiVersion: v1
   kind: Pod
   metadata:
     name: app-with-logging
   spec:
     containers:
     - name: app
       image: myapp:latest
       env:
       - name: LOG_LEVEL
         value: "info"
       - name: LOG_FORMAT
         value: "json"

=============
Health Checks
=============

**Probes for Application Health**

The startup probe gates the other two: liveness and readiness checks begin only after it succeeds. With the values below, the application has up to 300 seconds (30 failures at 10-second intervals) to start.

.. code-block:: yaml

   # Deployment with health checks
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: webapp
   spec:
     selector:              # required for apps/v1 Deployments
       matchLabels:
         app: webapp
     template:
       metadata:
         labels:
           app: webapp
       spec:
         containers:
         - name: app
           image: webapp:latest
           livenessProbe:       # restart the container when this fails
             httpGet:
               path: /health
               port: 8080
             initialDelaySeconds: 30
             periodSeconds: 10
           readinessProbe:      # remove the pod from Service endpoints when this fails
             httpGet:
               path: /ready
               port: 8080
             initialDelaySeconds: 5
             periodSeconds: 5
           startupProbe:        # hold off the other probes until the app has started
             httpGet:
               path: /startup
               port: 8080
             failureThreshold: 30
             periodSeconds: 10

======
Alerts
======

**Prometheus Alerting Rules**

.. code-block:: yaml

   # PrometheusRule for alerts
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     name: webapp-alerts
   spec:
     groups:
     - name: webapp.rules
       rules:
       - alert: HighErrorRate
         # Fires when a series returns more than 0.1 5xx responses per second for 5 minutes
         expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
         for: 5m
         labels:
           severity: warning
         annotations:
           summary: "High error rate detected"
           description: "Error rate is {{ $value }} errors per second"

===================
Distributed Tracing
===================

**Tracing with Jaeger**

.. code-block:: yaml

   # Application with tracing
   # The Jaeger client library reads these variables to locate the agent
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: webapp
   spec:
     selector:              # required for apps/v1 Deployments
       matchLabels:
         app: webapp
     template:
       metadata:
         labels:
           app: webapp
       spec:
         containers:
         - name: app
           image: webapp:latest
           env:
           - name: JAEGER_AGENT_HOST
             value: "jaeger-agent"
           - name: JAEGER_AGENT_PORT
             value: "6831"    # UDP port of the Jaeger agent

======================
Dashboard with Grafana
======================

**Visualization Setup**

.. code-block:: yaml

   # Grafana data source provisioning, stored in a ConfigMap
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: grafana-datasources
   data:
     prometheus.yaml: |
       apiVersion: 1
       datasources:
       - name: Prometheus
         type: prometheus
         url: http://prometheus-server:80
         access: proxy
         isDefault: true

===================
Resource Monitoring
===================

**Cluster Resource Usage**

``kubectl top`` reads from the Metrics API, so it works only when Metrics Server is installed in the cluster.

.. code-block:: bash

   # Built-in resource monitoring
   kubectl top nodes
   kubectl top pods --all-namespaces
   kubectl top pods --containers

**Metrics Server**

.. code-block:: yaml

   # Service in front of the Metrics Server pods (part of the Metrics Server install)
   apiVersion: v1
   kind: Service
   metadata:
     name: metrics-server
     namespace: kube-system
   spec:
     selector:
       k8s-app: metrics-server
     ports:
     - port: 443
       targetPort: 4443     # must match the port the metrics-server container listens on

==================
Essential Commands
==================

.. code-block:: bash

   # Metrics and monitoring
   kubectl top nodes
   kubectl top pods
   kubectl get servicemonitors      # requires the Prometheus Operator CRDs
   kubectl get prometheusrules      # requires the Prometheus Operator CRDs

   # Logs
   kubectl logs pod-name
   kubectl logs pod-name -c container-name
   kubectl logs -f deployment/webapp
   kubectl logs --previous pod-name  # logs from the previous container instance

   # Events and troubleshooting
   kubectl get events --sort-by=.metadata.creationTimestamp
   kubectl describe pod problematic-pod

============
What's Next?
============

Next, we'll explore **Helm Package Management** for deploying and managing complex applications.
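
============================
Sketch: Fluent Bit ConfigMap
============================

The Fluent Bit DaemonSet in the Logging section mounts a ConfigMap named ``fluent-bit-config`` at ``/fluent-bit/etc``, but the ConfigMap itself is not shown. The following is a minimal sketch of what it might contain, not a definitive configuration. It assumes the image loads ``/fluent-bit/etc/fluent-bit.conf`` by default, that container logs are available under ``/var/log/containers``, and that printing records to standard output is acceptable as a placeholder destination. The ``kube.*`` tag and the ``stdout`` output are illustrative choices; a real setup would normally forward to a log backend.

.. code-block:: yaml

   # Hypothetical ConfigMap backing the "config" volume of the fluent-bit DaemonSet
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: fluent-bit-config
     namespace: logging
   data:
     fluent-bit.conf: |
       [SERVICE]
           Flush        5
           Log_Level    info

       # Tail every container log file on the node
       [INPUT]
           Name              tail
           Path              /var/log/containers/*.log
           Tag               kube.*
           multiline.parser  docker, cri

       # Placeholder output: print records to the collector's own stdout
       [OUTPUT]
           Name   stdout
           Match  *

With this in place, ``kubectl logs -n logging daemonset/fluent-bit`` should show the collected records, which is a quick way to confirm that the volume mounts and the input path line up before switching to a real output.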