##############################
9.11 Production Best Practices
##############################

**Running Kubernetes in Production**

Production Kubernetes requires careful planning for security, reliability, monitoring, and resource management.

===================
Resource Management
===================

**Resource Requests and Limits**

Requests tell the scheduler how much CPU and memory to reserve for a container; limits cap what it may consume at runtime.

.. code-block:: yaml

   # Properly configured resources
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: webapp
   spec:
     selector:              # required for apps/v1; must match the template labels
       matchLabels:
         app: webapp
     template:
       metadata:
         labels:
           app: webapp
       spec:
         containers:
         - name: app
           image: webapp:latest   # pin a version tag or digest in production instead of latest
           resources:
             requests:
               memory: "256Mi"
               cpu: "250m"
             limits:
               memory: "512Mi"
               cpu: "500m"

**Quality of Service Classes**

- **Guaranteed**: every container sets CPU and memory requests and limits, and each request equals its limit
- **Burstable**: the pod does not qualify as Guaranteed, and at least one container sets a CPU or memory request or limit
- **BestEffort**: no container sets any requests or limits

The ``webapp`` Deployment above is Burstable because its requests are lower than its limits. Under node resource pressure, BestEffort pods are typically evicted first and Guaranteed pods last.

=================
High Availability
=================

**Pod Disruption Budgets**

.. code-block:: yaml

   # Keep at least 2 webapp pods running during voluntary disruptions
   # such as node drains and upgrades
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: webapp-pdb
   spec:
     minAvailable: 2
     selector:
       matchLabels:
         app: webapp

**Anti-Affinity Rules**

.. code-block:: yaml

   # Spread pods across nodes
   # (excerpt: selector, labels, and containers omitted for brevity)
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: webapp
   spec:
     template:
       spec:
         affinity:
           podAntiAffinity:
             # Hard rule: replicas beyond the number of nodes stay Pending.
             # Use preferredDuringSchedulingIgnoredDuringExecution for a soft rule.
             requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                 matchLabels:
                   app: webapp
               topologyKey: kubernetes.io/hostname

===================
Cluster Autoscaling
===================

**Horizontal Pod Autoscaler**

Utilization targets are measured against the containers' CPU requests, so the target Deployment must set them, and a metrics source such as metrics-server must be running in the cluster.

.. code-block:: yaml

   # CPU-based autoscaling
   apiVersion: autoscaling/v2
   kind: HorizontalPodAutoscaler
   metadata:
     name: webapp-hpa
   spec:
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: webapp
     minReplicas: 3
     maxReplicas: 20
     metrics:
     - type: Resource
       resource:
         name: cpu
         target:
           type: Utilization
           averageUtilization: 70

**Cluster Autoscaler**

.. code-block:: yaml

   # Node autoscaling
   # (excerpt: selector, labels, and service account omitted for brevity)
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: cluster-autoscaler
     namespace: kube-system
   spec:
     template:
       spec:
         containers:
         # k8s.gcr.io is frozen; images are served from registry.k8s.io.
         # Match the autoscaler version to the cluster's Kubernetes minor version.
         - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0
           name: cluster-autoscaler
           command:
           - ./cluster-autoscaler
           - --v=4
           - --stderrthreshold=info
           - --cloud-provider=aws
           - --skip-nodes-with-local-storage=false
           - --expander=least-waste
           - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

============================
Backup and Disaster Recovery
============================

**Velero Backup**

.. code-block:: yaml

   # Backup schedule: every day at 02:00
   apiVersion: velero.io/v1
   kind: Schedule
   metadata:
     name: daily-backup
     namespace: velero
   spec:
     schedule: "0 2 * * *"
     template:
       includedNamespaces:
       - production
       - staging
       ttl: "720h"  # 30 days

=================
Cost Optimization
=================

**Resource Quotas**

.. code-block:: yaml

   # Namespace resource limits
   apiVersion: v1
   kind: ResourceQuota
   metadata:
     name: compute-quota
     namespace: production
   spec:
     hard:
       requests.cpu: "100"
       requests.memory: 200Gi
       limits.cpu: "200"
       limits.memory: 400Gi
       pods: "50"

**Spot Instances and Node Pools**

Taint spot nodes so that only workloads that explicitly tolerate interruption are scheduled on them; a pod needs a matching toleration for the ``spot-instance`` taint to run there.

.. code-block:: yaml

   # Mixed instance types
   # Labels and taint as they appear on a spot node. These are normally set
   # through the node group configuration rather than by applying a Node manifest.
   apiVersion: v1
   kind: Node
   metadata:
     labels:
       node-type: spot
       instance-type: m5.large
   spec:
     taints:
     - key: spot-instance
       value: "true"
       effect: NoSchedule

==================
Security Hardening
==================

**Pod Security Standards**

.. code-block:: yaml

   # Restricted namespace
   apiVersion: v1
   kind: Namespace
   metadata:
     name: production
     labels:
       pod-security.kubernetes.io/enforce: restricted
       pod-security.kubernetes.io/audit: restricted
       pod-security.kubernetes.io/warn: restricted

**Network Policies**

.. code-block:: yaml

   # Default deny all
   # This also blocks DNS egress; add explicit allow policies for required traffic.
   apiVersion: networking.k8s.io/v1
   kind: NetworkPolicy
   metadata:
     name: default-deny-all
     namespace: production
   spec:
     podSelector: {}
     policyTypes:
     - Ingress
     - Egress

==================
Essential Commands
==================

.. code-block:: bash

   # Resource monitoring (kubectl top requires metrics-server)
   kubectl top nodes
   kubectl top pods --all-namespaces
   kubectl describe node <node-name>

   # Cluster health
   kubectl get componentstatuses        # deprecated since v1.19
   kubectl get --raw='/readyz?verbose'  # API server readiness checks
   kubectl get events --all-namespaces

   # Resource management
   kubectl get resourcequotas --all-namespaces
   kubectl get poddisruptionbudgets --all-namespaces
   kubectl get hpa --all-namespaces

============
What's Next?
============

Next, we'll explore **Troubleshooting and Debugging** techniques for production issues.