8.8 Container Best Practices

Figure: Container security layers and best practices.

From Working to World-Class

You can run containers. You can build images. You can orchestrate applications. Now comes the crucial transformation: turning your container knowledge into production-ready expertise that enterprises depend on. This section distills hard-won lessons from thousands of production deployments, security incidents, and performance optimizations.

These aren’t theoretical guidelines - they’re battle-tested practices that prevent outages, security breaches, and operational headaches.

Learning Objectives

By the end of this section, you will:

  • Implement container security hardening that passes enterprise audits

  • Optimize images for size, performance, and reliability

  • Design production-ready container architectures

  • Monitor container performance and troubleshoot issues effectively

  • Automate security scanning and compliance checking

  • Apply operational best practices for day-2 container management

Prerequisites: Solid understanding of containers, Dockerfiles, and orchestration concepts

Security Best Practices

The Container Security Model

Container security operates on multiple layers, often called “defense in depth”:

┌─────────────────────────────────────┐
│         Application Layer           │  ← Code vulnerabilities, secrets
├─────────────────────────────────────┤
│         Container Layer             │  ← Image vulnerabilities, runtime config
├─────────────────────────────────────┤
│         Host OS Layer               │  ← Kernel, system services
├─────────────────────────────────────┤
│       Infrastructure Layer          │  ← Network, storage, compute
└─────────────────────────────────────┘

1. Secure Base Images

Use Minimal Base Images:

# EXCELLENT: Distroless (no shell, minimal attack surface)
FROM gcr.io/distroless/python3

# GOOD: Alpine Linux (minimal, security-focused)
FROM python:3.11-alpine

# ACCEPTABLE: Slim images (smaller than full images)
FROM python:3.11-slim

# AVOID: Full images (unnecessary packages, larger attack surface)
FROM python:3.11  # Contains compilers, debuggers, etc.

Pin Specific Versions:

# GOOD: Specific version with SHA256 hash
FROM python:3.11.6-alpine@sha256:a5b78f3e2a63ce3b...

# ACCEPTABLE: Specific semantic version
FROM python:3.11.6-alpine

# BAD: Moving tags
FROM python:3.11-alpine  # Could change
FROM python:latest       # Definitely will change

Scan Images for Vulnerabilities:

# Using Trivy (free, comprehensive)
trivy image python:3.11-alpine

# Using Docker Scout (integrated with Docker)
docker scout cves python:3.11-alpine

# Using Snyk (commercial, detailed reporting)
snyk test --docker python:3.11-alpine

# Automate in CI/CD
# Fail builds if critical vulnerabilities found
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest

2. Non-Root User Security

Why Root is Dangerous:

# A root container that can reach the host filesystem (here via a bind mount,
# or in the worst case via a container escape) is effectively root on the host:
docker run --rm -it -v /:/host ubuntu:latest chroot /host
# ^ chroot into the mounted host gives full host access because the container's
#   root user maps directly to the host's root user

Secure User Implementation:

# Any base image that provides groupadd/useradd works here
FROM python:3.11-slim

# Create dedicated user with specific UID/GID
RUN groupadd -r appuser -g 1001 && \
    useradd -r -g appuser -u 1001 -s /bin/false appuser

# Create application directory and set ownership
WORKDIR /app
COPY --chown=appuser:appuser . .

# Install dependencies as root, then switch
RUN pip install -r requirements.txt

# Switch to non-root user for runtime
USER appuser

# Verify (for debugging)
RUN whoami  # Should output: appuser
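
To confirm the built image actually runs unprivileged, a quick check (assuming the image is tagged myapp:latest and the base image ships the id utility):

# Override the default command and print the runtime identity
docker run --rm myapp:latest id
# Expected: uid=1001(appuser) gid=1001(appuser) groups=1001(appuser)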

Advanced Security Hardening:

# Drop all capabilities, add only what's needed
# Use with: docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE

# Read-only root filesystem
# Use with: docker run --read-only --tmpfs /tmp

# No new privileges
# Use with: docker run --security-opt=no-new-privileges

# Use security profiles
# Use with: docker run --security-opt seccomp=seccomp-profile.json
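
Combined, these options might be applied as follows (a sketch: seccomp-profile.json is a placeholder for your own profile, and myapp:latest is the example image used throughout; drop any flag your workload does not need):

docker run -d \
  --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
  --read-only --tmpfs /tmp \
  --security-opt no-new-privileges \
  --security-opt seccomp=seccomp-profile.json \
  myapp:latest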

3. Secrets Management

Never Do This:

# NEVER: Secrets in images
ENV API_KEY=sk-1234567890abcdef
ENV DATABASE_PASSWORD=super_secret_password
COPY api_keys.txt /app/

Proper Secrets Handling:

# Docker Compose with external secrets
version: '3.8'
services:
  app:
    image: myapp:latest
    environment:
      - API_KEY_FILE=/run/secrets/api_key
    secrets:
      - api_key

secrets:
  api_key:
    external: true  # Managed outside compose

Runtime secret injection:

docker run -e API_KEY="$(cat /secure/api_key)" myapp

Other options include init containers that fetch secrets at startup, Kubernetes secret mounts, and HashiCorp Vault integration.
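
The Compose example above points the application at /run/secrets/api_key through an API_KEY_FILE variable; a minimal entrypoint sketch (the file name entrypoint.sh is a hypothetical example) that resolves such *_FILE variables at startup:

#!/bin/sh
# entrypoint.sh - resolve *_FILE secrets into plain environment variables
if [ -n "$API_KEY_FILE" ] && [ -f "$API_KEY_FILE" ]; then
    API_KEY="$(cat "$API_KEY_FILE")"
    export API_KEY
fi
# Hand off to the real command so signals still reach the application as PID 1
exec "$@"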

Image Optimization

Size Optimization Strategies

1. Multi-Stage Builds for Minimal Images:

# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci  # install dev dependencies too; the build step needs them, pruned below
COPY . .
RUN npm run build && npm prune --production

# Runtime stage - significantly smaller
FROM node:18-alpine AS runtime
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs
WORKDIR /app

# Copy only necessary files
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package.json ./package.json

USER nextjs
CMD ["npm", "start"]

2. Layer Optimization:

# BAD: Creates multiple layers, poor caching
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN apt-get clean

# GOOD: Single layer, better caching
RUN apt-get update && \
    apt-get install -y \
        curl \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

3. Dependency Management:

# Python: Use wheels and no-cache
RUN pip install --no-cache-dir --find-links wheels -r requirements.txt

# Node.js: Clean npm cache
RUN npm ci --only=production && npm cache clean --force

# Go: Use modules and static linking
RUN go mod download && \
    CGO_ENABLED=0 go build -ldflags="-w -s" -o app

4. Use .dockerignore:

# .dockerignore
.git
.gitignore
README.md
Dockerfile
.dockerignore
node_modules
.env
.env.local
coverage/
.nyc_output
target/
.pytest_cache
__pycache__

Image Size Comparison:

# Analyze image layers
docker history myapp:latest

# Compare sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"

# Dive tool for detailed analysis
dive myapp:latest

Performance Best Practices

Resource Management

Memory and CPU Limits:

# Docker Compose
services:
  web:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'

The same limits with docker run:

docker run -m 512m --cpus="1.0" myapp:latest

JVM Applications:

# Set heap size relative to container memory
ENV JAVA_OPTS="-Xmx400m -Xms400m"
# For 512MB container, leave ~100MB for non-heap

Health Checks for Reliability:

# Comprehensive health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

A matching health check endpoint (the HEALTHCHECK above assumes curl is installed in the image; the handler below assumes a Flask app with SQLAlchemy):

from datetime import datetime

import psutil
import requests

# app and db come from your application setup
@app.route('/health')
def health_check():
    try:
        # Check database connection
        db.session.execute('SELECT 1')

        # Check external dependencies respond in time
        response = requests.get('http://api.service.com/ping', timeout=5)
        response.raise_for_status()

        # Check resource usage
        memory_usage = psutil.virtual_memory().percent
        if memory_usage > 90:
            return {'status': 'unhealthy', 'reason': 'high_memory'}, 503

        return {'status': 'healthy', 'timestamp': datetime.utcnow().isoformat()}
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 503

Startup and Graceful Shutdown:

# Use exec form for proper signal handling
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

Graceful shutdown handling in the application:

import signal
import sys

def signal_handler(sig, frame):
    print('Gracefully shutting down...')
    # Close database connections
    # Finish processing current requests
    # Clean up resources
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
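
When a container is stopped, Docker delivers SIGTERM to PID 1 and follows up with SIGKILL after the grace period (10 seconds by default); give slow shutdowns more time explicitly:

# Allow up to 30 seconds for in-flight work to drain before SIGKILL
docker stop --time 30 mycontainer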

Monitoring and Logging

Structured Logging

import json
import logging
import os
from datetime import datetime

class StructuredLogger:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        handler.setFormatter(self.JSONFormatter())
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    class JSONFormatter(logging.Formatter):
        def format(self, record):
            log_data = {
                'timestamp': datetime.utcnow().isoformat(),
                'level': record.levelname,
                'service': 'my-app',
                'message': record.getMessage(),
                'container_id': os.environ.get('HOSTNAME', 'unknown')
            }
            if hasattr(record, 'request_id'):
                log_data['request_id'] = record.request_id
            return json.dumps(log_data)

Container Metrics Collection:

# Docker Compose with monitoring
version: '3.8'
services:
  app:
    image: myapp:latest
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  # Prometheus metrics collection
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Grafana for visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

Application Metrics:

from prometheus_client import Counter, Histogram, generate_latest

REQUEST_COUNT = Counter('app_requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('app_request_duration_seconds', 'Request duration')

@REQUEST_DURATION.time()
def process_request():
    REQUEST_COUNT.labels(method='GET', endpoint='/api/users').inc()
    # Your application logic here

Development Workflow

Efficient Development Setup

Hot Reloading Configuration:

# docker-compose.dev.yml
version: '3.8'
services:
  web:
    build:
      context: .
      target: development
    volumes:
      - .:/app
      - /app/node_modules  # Prevent overwriting
    environment:
      - NODE_ENV=development
      - CHOKIDAR_USEPOLLING=true  # For file watching in containers
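
Assuming the production configuration lives in docker-compose.yml, the development overrides can be layered on top at startup:

# Later -f files override matching settings from earlier ones
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build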

Testing in Containers:

# Multi-stage with test stage
FROM python:3.11-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

FROM base AS test
COPY requirements-test.txt .
RUN pip install -r requirements-test.txt
COPY . .
CMD ["pytest", "-v"]

FROM base AS production
COPY . .
CMD ["gunicorn", "app:app"]

Run the tests in a container:

docker build --target test -t myapp:test .
docker run --rm myapp:test

CI/CD Integration:

# .github/workflows/container.yml
name: Container CI/CD
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Security scan
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL \
            myapp:${{ github.sha }}

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: |
          docker build --target test -t myapp:test .
          docker run --rm myapp:test

Production Deployment

Zero-Downtime Deployments

Rolling Updates with Health Checks:

# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:v1.2.3
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
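
Triggering and observing a rolling update from the command line might look like this (v1.2.4 stands in for your next release; the container is named app as in the manifest above):

# Update the image and watch the rollout; the readiness probe gates traffic
kubectl set image deployment/myapp app=myapp:v1.2.4
kubectl rollout status deployment/myapp

# Roll back if the new version misbehaves
kubectl rollout undo deployment/myapp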

Blue-Green Deployments:

#!/bin/bash
# Blue-green deployment script
NEW_VERSION=$1

# Deploy new version to green environment
docker service create --name myapp-green --replicas 3 myapp:$NEW_VERSION

# Wait for health checks
while ! curl -f http://green.example.com/health; do
    sleep 5
done

# Switch traffic (update load balancer)
update_load_balancer_to_green

# Remove old blue environment
docker service rm myapp-blue

# Note: Swarm services cannot be renamed, so track the active color in your
# deployment tooling (or alternate the service names between runs) rather than
# renaming green to blue here

Backup and Recovery:

# Backup strategy for stateful services
services:
  postgres:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}  # injected at deploy time, never hard-coded

  backup:
    image: postgres:15
    volumes:
      - ./backups:/backups
    environment:
      - PGPASSWORD=${POSTGRES_PASSWORD}  # pg_dump connects over the network, no data mount needed
    command: |
      sh -c "
      while true; do
        pg_dump -h postgres -U postgres myapp > /backups/backup_$$(date +%Y%m%d_%H%M%S).sql
        sleep 3600
      done
      "

volumes:
  postgres_data:

Security Scanning Automation

Implementing Security Gates

#!/bin/bash
# security-scan.sh
IMAGE_NAME=$1

echo "Running security scan on $IMAGE_NAME..."

# Vulnerability scan
trivy image --format json --output scan-results.json $IMAGE_NAME

# Check for critical vulnerabilities
CRITICAL_VULN=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL")] | length' scan-results.json)

if [ "$CRITICAL_VULN" -gt 0 ]; then
    echo "ERROR: Critical vulnerabilities found. Deployment blocked."
    exit 1
fi

# License compliance check
docker run --rm -v $(pwd):/workspace fossa/fossa analyze

# Secret detection
docker run --rm -v $(pwd):/workspace trufflesecurity/trufflehog:latest filesystem /workspace

echo " Security scan passed."

Policy as Code:

# Open Policy Agent (OPA) policy
package docker.security

deny[msg] {
    input.User == "root"
    msg := "Container cannot run as root user"
}

deny[msg] {
    input.Image.tag == "latest"
    msg := "Image must use specific version tags, not 'latest'"
}

deny[msg] {
    not input.HealthCheck
    msg := "Container must define a health check"
}
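
One way to exercise such a policy locally is to feed a container's configuration to opa eval (a sketch: policy.rego is an assumed file name, and the input document must match the shape the policy expects):

# Export a running container's configuration and evaluate the deny rules
docker inspect myapp --format '{{json .Config}}' > input.json
opa eval --input input.json --data policy.rego "data.docker.security.deny"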

Troubleshooting Guide

Common Production Issues

Container Won’t Start:

# Check container logs
docker logs container_name

# Run interactively to debug
docker run -it --entrypoint /bin/sh image_name

# Check resource constraints
docker stats
docker system df

Performance Issues:

# Monitor resource usage
docker stats --no-stream

# Check container processes
docker exec container_name ps aux

# Memory analysis
docker exec container_name cat /proc/meminfo

# Check for memory leaks
docker exec container_name pmap -x 1

Network Issues:

# Test connectivity between containers
docker exec container1 ping container2

# Check DNS resolution
docker exec container_name nslookup service_name

# Inspect network configuration
docker network inspect network_name

Storage Issues:

# Check disk usage
docker system df

# Clean up unused resources
docker system prune -a

# Check volume mounts
docker inspect container_name | jq '.[].Mounts'

Operational Excellence

Day-2 Operations Checklist

Daily Tasks:

  • Monitor container health and resource usage

  • Review security scan results

  • Check backup completion

  • Monitor application metrics and alerts

Weekly Tasks:

  • Update base images for security patches

  • Review and clean up unused images/containers (a cleanup sketch follows the monthly list)

  • Performance baseline comparison

  • Security policy compliance audit

Monthly Tasks:

  • Disaster recovery testing

  • Capacity planning review

  • Security training and awareness

  • Tool and process optimization
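
A sketch of automating the weekly image/container cleanup (the 168-hour retention is an example; tune it to your environment and run it from cron or CI):

#!/bin/bash
# weekly-cleanup.sh - prune stopped containers and unused images older than a week
docker container prune --force --filter "until=168h"
docker image prune --all --force --filter "until=168h"

# Report remaining disk usage for the capacity planning review
docker system df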

Automated Monitoring:

# Comprehensive monitoring stack
version: '3.8'
services:
  # Log aggregation
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  # Metrics collection
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"

  # Alerting
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"

  # Visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

Future-Proofing

Container Technology Evolution

WebAssembly (WASM) Containers:

  • Smaller, faster, more secure than traditional containers

  • Language-agnostic runtime

  • Better isolation and portability

Confidential Computing:

  • Hardware-encrypted container execution

  • Protection against privileged access attacks

  • Secure multi-party computation

GitOps and Infrastructure as Code:

  • Declarative container configuration

  • Version-controlled infrastructure

  • Automated drift detection and correction

What’s Next?

You now have the knowledge to run containers securely and efficiently in production. These practices form the foundation for scaling to Kubernetes, implementing microservices architectures, and building robust cloud-native applications.

Key takeaways:

  • Security is multi-layered and must be built in from the start

  • Image optimization reduces costs and improves performance

  • Monitoring and logging are essential for production operations

  • Automation prevents human error and improves reliability

  • Best practices evolve - stay current with the container ecosystem

Warning

Continuous Learning: Container technology evolves rapidly. Join the community, follow security advisories, and regularly update your practices. What’s secure today may be vulnerable tomorrow.