9.8 Observability and Monitoring
Gaining Visibility into Kubernetes
Observability combines metrics, logs, and traces to understand system behavior and troubleshoot issues.
Metrics with Prometheus
Collecting System Metrics
# ServiceMonitor for app metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: webapp-metrics
spec:
  selector:
    matchLabels:
      app: webapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
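A ServiceMonitor selects Service objects rather than pods, and its `port` field refers to a Service port by name. The sketch below shows a Service that the ServiceMonitor above could match. It assumes the pods carry the label `app: webapp` and expose a container port named `metrics`; the Service name and the Service port number are illustrative assumptions, not part of the original setup.

```yaml
# Hypothetical Service that the webapp-metrics ServiceMonitor would select
apiVersion: v1
kind: Service
metadata:
  name: webapp              # assumed name
  labels:
    app: webapp             # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: webapp             # routes traffic to pods labeled app: webapp
  ports:
  - name: metrics           # the ServiceMonitor's endpoints[].port refers to this name
    port: 9090              # assumed Service port
    targetPort: metrics     # named container port on the pod
```

If the label sits only on the pods and not on the Service itself, the ServiceMonitor finds nothing to scrape, so checking the Service's labels is a reasonable first step when a target does not appear in Prometheus.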
Custom Metrics
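# Deployment whose pods expose a metrics endpoint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:                 # required by apps/v1; must match the pod template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        ports:
        - name: metrics
          containerPort: 9090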
Logging
Centralized Log Collection
# Fluent Bit DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:latest
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: config
        configMap:
          name: fluent-bit-config
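The DaemonSet above mounts a ConfigMap named `fluent-bit-config` that is not shown. A minimal sketch of what it might contain follows, assuming the image reads `/fluent-bit/etc/fluent-bit.conf` at startup and that container logs are reachable under `/var/log/containers` on the node. The `stdout` output is only a placeholder for illustration; a real setup would forward to whatever log backend is in use.

```yaml
# Hypothetical minimal configuration for the fluent-bit DaemonSet
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        5
        Log_Level    info

    # Tail the container log files exposed through the /var/log hostPath mount
    [INPUT]
        Name         tail
        Path         /var/log/containers/*.log
        Tag          kube.*

    # Placeholder output: print collected records to Fluent Bit's own stdout
    [OUTPUT]
        Name         stdout
        Match        *
```

Because the ConfigMap is mounted over the whole `/fluent-bit/etc` directory, any files the image ships there are hidden, so everything the configuration references has to be provided in the ConfigMap.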
Application Logging
# Structured logging example
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: LOG_LEVEL
      value: "info"
    - name: LOG_FORMAT
      value: "json"
Health Checks
Probes for Application Health
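# Deployment with health checks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:                 # required by apps/v1; must match the pod template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        livenessProbe:      # the container is restarted when this probe fails
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:     # the pod receives no Service traffic while this probe fails
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        startupProbe:       # liveness and readiness checks wait until this probe succeeds
          httpGet:
            path: /startup
            port: 8080
          failureThreshold: 30   # allows up to 30 x 10s = 300s for startup
          periodSeconds: 10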
Alerts
Prometheus Alerting Rules
# PrometheusRule for alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: webapp-alerts
spec:
  groups:
  - name: webapp.rules
    rules:
    - alert: HighErrorRate
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value }} errors per second"
Distributed Tracing
Tracing with Jaeger
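# Application with tracing
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:                 # required by apps/v1; must match the pod template labels
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:latest
        env:
        - name: JAEGER_AGENT_HOST
          value: "jaeger-agent"
        - name: JAEGER_AGENT_PORT
          value: "6831"     # the agent's UDP port for compact Thrift spans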
Dashboards with Grafana
Visualization Setup
# Grafana ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server:80
      access: proxy
      isDefault: true
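The ConfigMap takes effect only once Grafana can read it from its provisioning directory. A minimal sketch of a Grafana Deployment that mounts it is shown below. The Deployment name, labels, and image tag are assumptions for illustration; `/etc/grafana/provisioning/datasources` is the directory Grafana scans for data source definitions at startup.

```yaml
# Hypothetical Grafana Deployment that loads the grafana-datasources ConfigMap
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana                    # assumed name
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000      # Grafana's default HTTP port
        volumeMounts:
        - name: datasources
          mountPath: /etc/grafana/provisioning/datasources
          readOnly: true
      volumes:
      - name: datasources
        configMap:
          name: grafana-datasources
```

Provisioned data sources are read when Grafana starts, so after changing the ConfigMap the pod typically needs a restart for the change to appear.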
Resource Monitoring
Cluster Resource Usage
# Resource usage via kubectl top (requires Metrics Server to be installed)
kubectl top nodes
kubectl top pods --all-namespaces
kubectl top pods --containers
Metrics Server
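# Service in front of the metrics-server pods (normally created by the Metrics Server install manifest)
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    targetPort: 4443        # container's secure port; the value can differ between Metrics Server versions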
Essential Commands
# Metrics and monitoring
kubectl top nodes
kubectl top pods
kubectl get servicemonitors
kubectl get prometheusrules
# Logs
kubectl logs pod-name
kubectl logs pod-name -c container-name
kubectl logs -f deployment/webapp
kubectl logs --previous pod-name
# Events and troubleshooting
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl describe pod problematic-pod
What’s Next?
Next, we’ll explore Helm Package Management for deploying and managing complex applications.