9.11 Production Best Practices

Running Kubernetes in Production

Production Kubernetes requires careful planning for security, reliability, monitoring, and resource management.

Resource Management

Resource Requests and Limits

# Properly configured resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: webapp:1.2.0   # pin a version tag; avoid :latest in production
        resources:
          requests:
            memory: "256Mi"   # the scheduler reserves this much
            cpu: "250m"
          limits:
            memory: "512Mi"   # the container is OOM-killed above this
            cpu: "500m"       # CPU is throttled above this

Quality of Service Classes

  • Guaranteed: every container sets requests equal to limits for both CPU and memory (see the sketch after this list)

  • Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria

  • BestEffort: no container sets any requests or limits

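A Guaranteed pod is evicted last under node memory pressure, which makes this class a common choice for critical services. As a minimal sketch (the pod name and image below are placeholders), setting requests equal to limits for every container yields the Guaranteed class:

# Guaranteed QoS: requests equal limits for every container
apiVersion: v1
kind: Pod
metadata:
  name: critical-app         # placeholder name
spec:
  containers:
  - name: app
    image: critical-app:1.0  # placeholder image
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"      # equal to requests
        cpu: "500m"          # equal to requests

The assigned class is visible with kubectl get pod critical-app -o jsonpath='{.status.qosClass}'.
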
High Availability

Pod Disruption Budgets

# Ensure minimum availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: webapp-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: webapp

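The budget can also be expressed as maxUnavailable: 1 instead of minAvailable: 2. Before a voluntary disruption such as a node drain, it is worth checking how many evictions the budget currently allows:

# Check allowed disruptions before draining a node
kubectl get poddisruptionbudget webapp-pdb
kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
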
Anti-Affinity Rules

# Spread pods across nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: webapp
            topologyKey: kubernetes.io/hostname

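The required rule above refuses to co-locate two webapp pods on the same node, which can leave replicas Pending on a small cluster. A softer sketch that prefers spreading but still schedules when it must (this fragment replaces the affinity block above):

# Soft anti-affinity: prefer spreading, schedule anyway if necessary
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: webapp
        topologyKey: kubernetes.io/hostname
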
Autoscaling

Horizontal Pod Autoscaler

# CPU-based autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

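CPU utilization is noisy, so production HPAs usually dampen scale-in. The optional behavior field, sketched below with assumed values, can be added under spec: of the HPA above to wait five minutes of sustained low load and remove at most one pod per minute:

# Optional: slow down scale-in (add under spec:)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes of sustained low load
    policies:
    - type: Pods
      value: 1                        # remove at most one pod
      periodSeconds: 60               # per 60-second window
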
Cluster Autoscaler

# Node autoscaling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0  # use the tag matching your cluster's Kubernetes minor version
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

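Assuming the default settings above, the autoscaler's scale-up and scale-down decisions can be checked from its logs and from the status ConfigMap it maintains in kube-system:

# Inspect cluster-autoscaler activity
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
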
Backup and Disaster Recovery

Velero Backup

# Backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    ttl: "720h"  # 30 days

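Alongside the schedule, the Velero CLI handles ad-hoc backups and restores; the backup name passed to --from-backup below is a placeholder for one listed by velero backup get:

# Ad-hoc backup and restore with the Velero CLI
velero backup create pre-upgrade --include-namespaces production
velero backup get
velero restore create --from-backup daily-backup-20250101020000
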
Cost Optimization

Resource Quotas

# Namespace resource limits
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "50"

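When a quota covers compute resources, pods that omit requests or limits are rejected in that namespace, so a ResourceQuota is usually paired with a LimitRange that injects defaults. A minimal sketch with assumed default sizes:

# Default requests and limits for containers that do not set them
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:               # applied as limits when none are set
      cpu: "500m"
      memory: 512Mi
    defaultRequest:        # applied as requests when none are set
      cpu: "250m"
      memory: 256Mi
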
Spot Instances and Node Pools

# Spot node labels and taint (normally applied by the node pool or
# kubelet registration flags rather than by creating Node objects by hand)
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1          # example node name
  labels:
    node-type: spot
    instance-type: m5.large
spec:
  taints:
  - key: spot-instance
    value: "true"
    effect: NoSchedule

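The taint keeps workloads off spot nodes unless they explicitly opt in. A fault-tolerant Deployment can target them with a matching toleration and node selector; the fragment below goes in the pod template spec:

# Pod template fragment: opt in to spot nodes
nodeSelector:
  node-type: spot
tolerations:
- key: spot-instance
  operator: Equal
  value: "true"
  effect: NoSchedule
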
Security Hardening

Pod Security Standards

# Restricted namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

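With restricted enforced, admission rejects pods that do not drop privileges explicitly. A container securityContext that satisfies the restricted profile looks roughly like this:

# Container security context compatible with the restricted profile
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
    - ALL
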
Network Policies

# Default deny all
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

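With everything denied, traffic must be re-allowed explicitly. The sketch below assumes an ingress controller running in an ingress-nginx namespace and an application port of 8080; note that the deny-all policy also blocks DNS, so an egress rule to kube-dns on port 53 is usually needed as well.

# Re-allow specific traffic after the default deny
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-webapp-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # assumed namespace
    ports:
    - protocol: TCP
      port: 8080                                       # assumed app port
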
Essential Commands

# Resource monitoring
kubectl top nodes
kubectl top pods --all-namespaces
kubectl describe node node-name

# Cluster health
kubectl get componentstatuses        # deprecated since v1.19; prefer the probe below
kubectl get --raw='/readyz?verbose'
kubectl get events --all-namespaces --sort-by=.lastTimestamp

# Resource management
kubectl get resourcequotas --all-namespaces
kubectl get poddisruptionbudgets
kubectl get hpa

What’s Next?

Next, we’ll explore Troubleshooting and Debugging techniques for production issues.