8.8 Container Best Practices

From Working to World-Class
You can run containers. You can build images. You can orchestrate applications. Now comes the crucial transformation: turning your container knowledge into production-ready expertise that enterprises depend on. This section distills hard-won lessons from thousands of production deployments, security incidents, and performance optimizations.
These aren’t theoretical guidelines - they’re battle-tested practices that prevent outages, security breaches, and operational headaches.
Learning Objectives
By the end of this section, you will:
Implement container security hardening that passes enterprise audits
Optimize images for size, performance, and reliability
Design production-ready container architectures
Monitor container performance and troubleshoot issues effectively
Automate security scanning and compliance checking
Apply operational best practices for day-2 container management
Prerequisites: Solid understanding of containers, Dockerfiles, and orchestration concepts
Security Best Practices
The Container Security Model
Container security operates on multiple layers, often called “defense in depth”:
┌─────────────────────────────────────┐
│ Application Layer │ ← Code vulnerabilities, secrets
├─────────────────────────────────────┤
│ Container Layer │ ← Image vulnerabilities, runtime config
├─────────────────────────────────────┤
│ Host OS Layer │ ← Kernel, system services
├─────────────────────────────────────┤
│ Infrastructure Layer │ ← Network, storage, compute
└─────────────────────────────────────┘
1. Secure Base Images
Use Minimal Base Images:
# EXCELLENT: Distroless (no shell, minimal attack surface)
FROM gcr.io/distroless/python3
# GOOD: Alpine Linux (minimal, security-focused)
FROM python:3.11-alpine
# ACCEPTABLE: Slim images (smaller than full images)
FROM python:3.11-slim
# AVOID: Full images (unnecessary packages, larger attack surface)
FROM python:3.11 # Contains compilers, debuggers, etc.
Pin Specific Versions:
# GOOD: Specific version with SHA256 hash
FROM python:3.11.6-alpine@sha256:a5b78f3e2a63ce3b...
# ACCEPTABLE: Specific semantic version
FROM python:3.11.6-alpine
# BAD: Moving tags
FROM python:3.11-alpine # Could change
FROM python:latest # Definitely will change
Scan Images for Vulnerabilities:
# Using Trivy (free, comprehensive)
trivy image python:3.11-alpine
# Using Docker Scout (integrated with Docker)
docker scout cves python:3.11-alpine
# Using Snyk (commercial, detailed reporting)
snyk test --docker python:3.11-alpine
# Automate in CI/CD
# Fail builds if critical vulnerabilities found
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
2. Non-Root User Security
Why Root is Dangerous:
# If container escapes with root user:
docker run --rm -it -v /:/host ubuntu:latest chroot /host
# ^ This gives full host access if running as root
Secure User Implementation:
# Create dedicated user with specific UID/GID
RUN groupadd -r appuser -g 1001 && \
    useradd -r -g appuser -u 1001 -s /bin/false appuser
# Create application directory and set ownership
WORKDIR /app
COPY --chown=appuser:appuser . .
# Install dependencies as root, then switch
RUN pip install -r requirements.txt
# Switch to non-root user for runtime
USER appuser
# Verify (for debugging)
RUN whoami # Should output: appuser
Advanced Security Hardening:
# Drop all capabilities, add only what's needed
# Use with: docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE
# Read-only root filesystem
# Use with: docker run --read-only --tmpfs /tmp
# No new privileges
# Use with: docker run --security-opt=no-new-privileges
# Use security profiles
# Use with: docker run --security-opt=seccomp:seccomp-profile.json
3. Secrets Management
Never Do This:
# NEVER: Secrets in images
ENV API_KEY=sk-1234567890abcdef
ENV DATABASE_PASSWORD=super_secret_password
COPY api_keys.txt /app/
Proper Secrets Handling:
# Docker Compose with external secrets
version: '3.8'
services:
  app:
    image: myapp:latest
    environment:
      - API_KEY_FILE=/run/secrets/api_key
    secrets:
      - api_key

secrets:
  api_key:
    external: true  # Managed outside Compose
# Runtime secret injection
docker run -e API_KEY="$(cat /secure/api_key)" myapp
# Using init containers to fetch secrets
# Kubernetes secret mounting
# HashiCorp Vault integration
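The `API_KEY_FILE` convention above can be resolved in application code with a small helper: prefer the mounted secret file, then fall back to a plain environment variable. This is a minimal sketch; the helper name `read_secret` and the demo paths are illustrative, not part of any library.

```python
import os
import tempfile

def read_secret(name, default=None):
    """Resolve a secret using the *_FILE convention:
    prefer a mounted file (e.g. /run/secrets/...), then
    fall back to a plain environment variable."""
    file_path = os.environ.get(name + "_FILE")
    if file_path and os.path.exists(file_path):
        with open(file_path) as f:
            return f.read().strip()
    return os.environ.get(name, default)

# Demo: simulate a secret mounted as a file
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("s3cret\n")
os.environ["API_KEY_FILE"] = f.name
api_key = read_secret("API_KEY")
```

With the Compose file above, `read_secret("API_KEY")` picks up `/run/secrets/api_key` automatically, and the same code still works in local development with a plain `API_KEY` variable.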
Image Optimization
Size Optimization Strategies
1. Multi-Stage Builds for Minimal Images:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --production
# Runtime stage - significantly smaller
FROM node:18-alpine AS runtime
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001
WORKDIR /app
# Copy only necessary files
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package.json ./package.json
USER nextjs
CMD ["npm", "start"]
2. Layer Optimization:
# BAD: Creates multiple layers, poor caching
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN apt-get clean
# GOOD: Single layer, better caching
RUN apt-get update && \
    apt-get install -y \
        curl \
        wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
3. Dependency Management:
# Python: Use wheels and no-cache
RUN pip install --no-cache-dir --find-links wheels -r requirements.txt
# Node.js: Clean npm cache
RUN npm ci --only=production && npm cache clean --force
# Go: Use modules and static linking
RUN go mod download && \
    CGO_ENABLED=0 go build -ldflags="-w -s" -o app
4. Use .dockerignore:
# .dockerignore
.git
.gitignore
README.md
Dockerfile
.dockerignore
node_modules
.env
.env.local
coverage/
.nyc_output
target/
.pytest_cache
__pycache__
Image Size Comparison:
# Analyze image layers
docker history myapp:latest
# Compare sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# Dive tool for detailed analysis
dive myapp:latest
Performance Best Practices
Resource Management
Memory and CPU Limits:
# Docker Compose
services:
  web:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'
# Docker run
docker run -m 512m --cpus="1.0" myapp:latest
JVM Applications:
# Set heap size relative to container memory
ENV JAVA_OPTS="-Xmx400m -Xms400m"
# For a 512MB container, leave ~100MB of headroom for metaspace, threads, and native memory
# Modern JVMs (10+) can size the heap automatically: -XX:MaxRAMPercentage=75.0
Health Checks for Reliability:
# Comprehensive health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
# Health check endpoint implementation (Flask-style; assumes app, db,
# requests, and psutil are already set up elsewhere in the application)
from datetime import datetime

import psutil
import requests

@app.route('/health')
def health_check():
    try:
        # Check database connection
        db.session.execute('SELECT 1')
        # Check external dependencies
        requests.get('http://api.service.com/ping', timeout=5).raise_for_status()
        # Check resource usage
        memory_usage = psutil.virtual_memory().percent
        if memory_usage > 90:
            return {'status': 'unhealthy', 'reason': 'high_memory'}, 503
        return {'status': 'healthy', 'timestamp': datetime.utcnow().isoformat()}
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 503
Startup and Graceful Shutdown:
# Use exec form for proper signal handling
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
# Graceful shutdown handling
import signal
import sys
def signal_handler(sig, frame):
    print('Gracefully shutting down...')
    # Close database connections
    # Finish processing current requests
    # Clean up resources
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
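For worker processes, a common variant sets a flag instead of exiting immediately, so the main loop can finish its current item before stopping. A minimal sketch, assuming a simple sequential worker loop (the names `stop`, `handle_term`, and `worker` are illustrative):

```python
import signal
import threading

stop = threading.Event()

def handle_term(sig, frame):
    # Ask the worker loop to exit after the current item
    stop.set()

signal.signal(signal.SIGTERM, handle_term)

def worker(items):
    processed = []
    for item in items:
        if stop.is_set():
            break
        processed.append(item)  # stand-in for real work
    return processed
```

This pairs naturally with orchestrator behavior: Docker and Kubernetes send SIGTERM first and only SIGKILL after a grace period, so draining in the handler is what makes rolling updates lossless.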
Monitoring and Logging
Structured Logging
import json
import logging
import os
from datetime import datetime

class StructuredLogger:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        handler = logging.StreamHandler()
        handler.setFormatter(self.JSONFormatter())
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    class JSONFormatter(logging.Formatter):
        def format(self, record):
            log_data = {
                'timestamp': datetime.utcnow().isoformat(),
                'level': record.levelname,
                'service': 'my-app',
                'message': record.getMessage(),
                'container_id': os.environ.get('HOSTNAME', 'unknown')
            }
            if hasattr(record, 'request_id'):
                log_data['request_id'] = record.request_id
            return json.dumps(log_data)
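The payoff of JSON formatting is that every log line is machine-parseable, so aggregators like Loki or Elasticsearch can index fields instead of grepping text. A standalone check with a stripped-down formatter (the class and logger names here are illustrative):

```python
import io
import json
import logging

# Minimal JSON formatter mirroring the structure above
class MiniJSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
        })

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(MiniJSONFormatter())
logger = logging.getLogger("json-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("container started")

# Each emitted line round-trips through json.loads
parsed = json.loads(buf.getvalue().strip())
```

Because Docker's `json-file` driver captures stdout line by line, one JSON object per line is all an aggregation pipeline needs.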
Container Metrics Collection:
# Docker Compose with monitoring
version: '3.8'
services:
  app:
    image: myapp:latest
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  # Prometheus metrics collection
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Grafana for visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
Application Metrics:
from prometheus_client import Counter, Histogram, generate_latest

REQUEST_COUNT = Counter('app_requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('app_request_duration_seconds', 'Request duration')

@REQUEST_DURATION.time()
def process_request():
    REQUEST_COUNT.labels(method='GET', endpoint='/api/users').inc()
    # Your application logic here
Development Workflow
Efficient Development Setup
Hot Reloading Configuration:
# docker-compose.dev.yml
version: '3.8'
services:
  web:
    build:
      context: .
      target: development
    volumes:
      - .:/app
      - /app/node_modules  # Prevent overwriting
    environment:
      - NODE_ENV=development
      - CHOKIDAR_USEPOLLING=true  # For file watching in containers
Testing in Containers:
# Multi-stage with test stage
FROM python:3.11-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM base AS test
COPY requirements-test.txt .
RUN pip install -r requirements-test.txt
COPY . .
CMD ["pytest", "-v"]
FROM base AS production
COPY . .
CMD ["gunicorn", "app:app"]
# Run tests in container
docker build --target test -t myapp:test .
docker run --rm myapp:test
CI/CD Integration:
# .github/workflows/container.yml
name: Container CI/CD

on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Security scan
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL \
            myapp:${{ github.sha }}

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: |
          docker build --target test -t myapp:test .
          docker run --rm myapp:test
Production Deployment
Zero-Downtime Deployments
Rolling Updates with Health Checks:
# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v1.2.3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
Blue-Green Deployments:
#!/bin/bash
# Blue-green deployment script
NEW_VERSION=$1

# Deploy new version to green environment
docker service create --name myapp-green --replicas 3 myapp:"$NEW_VERSION"

# Wait for health checks
while ! curl -f http://green.example.com/health; do
    sleep 5
done

# Switch traffic (update load balancer)
update_load_balancer_to_green

# Remove old blue environment
docker service rm myapp-blue

# Rename green to blue for next deployment
docker service update --name myapp-blue myapp-green
Backup and Recovery:
# Backup strategy for stateful services
services:
  postgres:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./backups:/backups
    environment:
      - POSTGRES_DB=myapp

  backup:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data:ro
      - ./backups:/backups
    command: |
      sh -c "
      while true; do
        pg_dump -h postgres -U postgres myapp > /backups/backup_$$(date +%Y%m%d_%H%M%S).sql
        sleep 3600
      done
      "

volumes:
  postgres_data:
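An hourly dump loop fills the backup volume quickly, so production setups usually pair it with a retention policy. A minimal sketch of one in Python; the helper name `prune_backups` and the filename pattern are illustrative assumptions matching the dump naming above:

```python
import os
import tempfile
from pathlib import Path

def prune_backups(directory, keep=7):
    """Keep only the newest `keep` backup_*.sql files,
    deleting the rest (newest-first by modification time)."""
    backups = sorted(
        Path(directory).glob("backup_*.sql"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for old in backups[keep:]:
        old.unlink()
    return [p.name for p in backups[:keep]]

# Demo: five dummy dumps with distinct timestamps, keep the newest two
tmp = tempfile.mkdtemp()
for i in range(5):
    p = Path(tmp) / f"backup_2024010{i}_000000.sql"
    p.write_text("-- dump")
    os.utime(p, (i, i))  # force distinct, ordered mtimes
kept = prune_backups(tmp, keep=2)
```

Run it from cron or a sidecar container against the same `./backups` mount, and combine it with off-host copies: local retention protects against bloat, not against losing the host.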
Security Scanning Automation
Implementing Security Gates
#!/bin/bash
# security-scan.sh
IMAGE_NAME=$1

echo "Running security scan on $IMAGE_NAME..."

# Vulnerability scan
trivy image --format json --output scan-results.json "$IMAGE_NAME"

# Count critical vulnerabilities
CRITICAL_VULN=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL")] | length' scan-results.json)

if [ "$CRITICAL_VULN" -gt 0 ]; then
    echo "FAIL: Critical vulnerabilities found. Deployment blocked."
    exit 1
fi

# License compliance check
docker run --rm -v $(pwd):/workspace fossa/fossa analyze

# Secret detection
docker run --rm -v $(pwd):/workspace trufflesecurity/trufflehog:latest filesystem /workspace

echo "PASS: Security scan passed."
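The same gate logic can live in Python when the pipeline is already Python-based, walking the `Results[].Vulnerabilities[]` shape of Trivy's JSON output that the jq query above traverses. A sketch with a fabricated sample report (the CVE IDs are placeholders):

```python
import json

def count_critical(report):
    """Count CRITICAL findings in a Trivy-style JSON report."""
    return sum(
        1
        for result in report.get("Results") or []
        for vuln in result.get("Vulnerabilities") or []
        if vuln.get("Severity") == "CRITICAL"
    )

# Fabricated report for illustration; real input comes from
# `trivy image --format json --output scan-results.json ...`
sample = {
    "Results": [
        {"Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "HIGH"},
        ]},
        {"Vulnerabilities": None},  # Trivy emits null for clean targets
    ]
}
critical = count_critical(sample)
```

In a CI gate, load `scan-results.json` with `json.load`, call `count_critical`, and exit non-zero when the count is positive, mirroring the bash script above.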
Policy as Code:
# Open Policy Agent (OPA) policy
package docker.security
deny[msg] {
input.User == "root"
msg := "Container cannot run as root user"
}
deny[msg] {
input.Image.tag == "latest"
msg := "Image must use specific version tags, not 'latest'"
}
deny[msg] {
not input.HealthCheck
msg := "Container must define a health check"
}
Troubleshooting Guide
Common Production Issues
Container Won’t Start:
# Check container logs
docker logs container_name
# Run interactively to debug
docker run -it --entrypoint /bin/sh image_name
# Check resource constraints
docker stats
docker system df
Performance Issues:
# Monitor resource usage
docker stats --no-stream
# Check container processes
docker exec container_name ps aux
# Memory analysis
docker exec container_name cat /proc/meminfo
# Check for memory leaks
docker exec container_name pmap -x 1
Network Issues:
# Test connectivity between containers
docker exec container1 ping container2
# Check DNS resolution
docker exec container_name nslookup service_name
# Inspect network configuration
docker network inspect network_name
Storage Issues:
# Check disk usage
docker system df
# Clean up unused resources
docker system prune -a
# Check volume mounts
docker inspect container_name | jq '.[].Mounts'
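For scripted checks, the jq query above translates directly into Python over `docker inspect`'s JSON output (a list of container objects, each with a `Mounts` array). The function name and sample data here are illustrative:

```python
import json

def mount_summary(inspect_output):
    """Extract source/destination/writability for each mount
    from `docker inspect` JSON output."""
    return [
        {"Source": m.get("Source"), "Destination": m.get("Destination"), "RW": m.get("RW")}
        for container in json.loads(inspect_output)
        for m in container.get("Mounts", [])
    ]

# Fabricated `docker inspect` output with one bind mount
sample = json.dumps([{"Mounts": [
    {"Type": "bind", "Source": "/srv/data", "Destination": "/data", "RW": True}
]}])
mounts = mount_summary(sample)
```

In practice, feed it `subprocess.run(["docker", "inspect", name], capture_output=True)` output; surprising `RW: False` entries are a quick way to spot read-only mounts breaking an app's writes.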
Operational Excellence
Day-2 Operations Checklist
Daily Tasks:
Monitor container health and resource usage
Review security scan results
Check backup completion
Monitor application metrics and alerts
Weekly Tasks:
Update base images for security patches
Review and clean up unused images/containers
Performance baseline comparison
Security policy compliance audit
Monthly Tasks:
Disaster recovery testing
Capacity planning review
Security training and awareness
Tool and process optimization
Automated Monitoring:
# Comprehensive monitoring stack
version: '3.8'
services:
  # Log aggregation
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  # Metrics collection
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"

  # Alerting
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"

  # Visualization
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
Future-Proofing
Container Technology Evolution
WebAssembly (WASM) Containers:
Smaller, faster, more secure than traditional containers
Language-agnostic runtime
Better isolation and portability
Confidential Computing:
Hardware-encrypted container execution
Protection against privileged access attacks
Secure multi-party computation
GitOps and Infrastructure as Code:
Declarative container configuration
Version-controlled infrastructure
Automated drift detection and correction
What’s Next?
You now have the knowledge to run containers securely and efficiently in production. These practices form the foundation for scaling to Kubernetes, implementing microservices architectures, and building robust cloud-native applications.
Key takeaways:
Security is multi-layered and must be built in from the start
Image optimization reduces costs and improves performance
Monitoring and logging are essential for production operations
Automation prevents human error and improves reliability
Best practices evolve - stay current with the container ecosystem
Warning
Continuous Learning: Container technology evolves rapidly. Join the community, follow security advisories, and regularly update your practices. What’s secure today may be vulnerable tomorrow.