7.4 CI/CD Best Practices
Fast Feedback & Error Clarity
Optimal Job Ordering:
```yaml
# Fast validation first (1-2 min)
- name: Lint & format check
  run: |
    uv run ruff check .
    uv run ruff format --check .

# Core tests next (3-5 min)
- name: Unit tests
  run: uv run pytest tests/unit/ -v --tb=short

# Expensive tests last (10+ min)
- name: Integration tests
  run: uv run pytest tests/integration/
```
Security Integration (DevSecOps)
Essential Security Checks:
```yaml
security:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    # Vulnerability scanning
    - name: Scan dependencies
      run: |
        uv run safety check
        uv run pip-audit

    # Static security analysis
    - name: Security linting
      run: uv run bandit -r src/ -f json -o security.json

    # Secret detection
    - name: Scan for secrets
      uses: trufflesecurity/trufflehog@main
      with:
        path: ./
        base: main
        head: HEAD
```
Business value: Finding security issues in development costs $100. Finding them in production costs $10,000+.
Make Pipelines Observable
What you can’t measure, you can’t improve. Successful teams track pipeline metrics as carefully as application metrics.
Key metrics to monitor:
```yaml
- name: Record pipeline metrics
  run: |
    echo "PIPELINE_START_TIME=$(date +%s)" >> $GITHUB_ENV
    echo "COMMIT_SHA=${GITHUB_SHA}" >> $GITHUB_ENV
    echo "BUILD_NUMBER=${GITHUB_RUN_NUMBER}" >> $GITHUB_ENV

# At the end of your pipeline
- name: Report pipeline success
  if: success()
  run: |
    DURATION=$(($(date +%s) - $PIPELINE_START_TIME))
    curl -X POST "$METRICS_ENDPOINT" \
      -d "pipeline_duration_seconds=$DURATION" \
      -d "pipeline_result=success" \
      -d "commit_sha=$COMMIT_SHA"
```
Metrics that matter:
- Pipeline duration: how long builds take (optimize the slowest stages first)
- Success rate: percentage of builds that pass (target >95%)
- Flaky test rate: tests that sometimes fail (fix these immediately)
- Queue time: how long builds wait to start (indicates resource constraints)
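Flaky tests are easy to surface once you record per-test results across runs. A minimal sketch, assuming you export `(commit_sha, test_name, passed)` tuples from your test runner's reports (the data format and function name here are illustrative, not part of any CI API):

```python
from collections import defaultdict

def find_flaky_tests(results):
    """Identify tests that both passed and failed for the same commit.

    `results` is a list of (commit_sha, test_name, passed) tuples,
    e.g. collected from JUnit XML across CI runs.
    """
    outcomes = defaultdict(set)
    for sha, test, passed in results:
        outcomes[(sha, test)].add(passed)
    # A test is flaky if a single commit produced both outcomes
    return sorted({test for (sha, test), seen in outcomes.items() if len(seen) == 2})

results = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),   # same commit, different outcome
    ("abc123", "test_checkout", True),
    ("def456", "test_checkout", True),
]
print(find_flaky_tests(results))  # ['test_login']
```

Because the grouping key is the commit, legitimate failures caused by a code change are not flagged; only same-commit disagreement counts as flaky.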
Make Pipelines Self-Contained
- Each pipeline run should be completely independent
- Don't rely on previous build artifacts
- Use fresh environments for each run
Python-Specific Best Practices
Modern Dependency Management:
```yaml
- uses: astral-sh/setup-uv@v3
  with:
    enable-cache: true
    cache-dependency-glob: "uv.lock"
- run: uv sync --dev
```
Multi-Version Testing:
```yaml
strategy:
  matrix:
    python-version: ["3.11", "3.12", "3.13"]
    exclude:
      - python-version: "3.12"
        os: windows-latest
```
Quality Gates:
```yaml
- name: Quality checks
  run: |
    uv run ruff check .           # Linting
    uv run ruff format --check .  # Formatting
    uv run mypy src/              # Type checking
    uv run bandit -r src/         # Security
```
Security Best Practices
Secret Management:
```yaml
- name: Deploy to production
  env:
    API_KEY: ${{ secrets.PRODUCTION_API_KEY }}
  run: ./deploy.sh
```
Dependency Security:
```yaml
- name: Security audit
  run: |
    uv run bandit -r src/
    uv run safety check
    uv run pip-audit
```
Container Security:
```dockerfile
FROM python:3.12-slim
RUN adduser --disabled-password --gecos '' appuser
# Don't run as root
USER appuser
COPY --chown=appuser:appuser . /app
```
Testing Best Practices
Test Pyramid Structure:
```yaml
# Fast unit tests (70% of tests)
- name: Unit tests
  run: uv run pytest tests/unit/ --maxfail=1

# Medium integration tests (20% of tests)
- name: Integration tests
  run: uv run pytest tests/integration/ -v

# Slow E2E tests (10% of tests) - run only on main to keep PR feedback fast
- name: End-to-end tests
  if: github.ref == 'refs/heads/main'
  run: uv run pytest tests/e2e/ --timeout=300
```
Test Coverage:
```yaml
- name: Test with coverage
  run: |
    uv run pytest --cov=src --cov-report=xml --cov-fail-under=80
    uv run coverage report
```
Environment Management
Container-Based Consistency:
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: python:3.12-slim  # Match production
    env:
      DATABASE_URL: postgresql://test:test@postgres:5432/testdb
    services:
      postgres:
        image: postgres:15  # Same as production
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
```
Environment Promotion:
Environment Promotion:
```yaml
deploy:
  strategy:
    matrix:
      environment: [staging, production]
      include:
        - environment: staging
          url: https://staging.myapp.com
          requires_approval: false
        - environment: production
          url: https://myapp.com
          requires_approval: true
```
Deployment Best Practices
1. Environment Strategy
```yaml
deploy-staging:
  if: github.ref == 'refs/heads/develop'
  environment: staging
  steps:
    - name: Deploy to staging
      run: kubectl apply -f k8s/staging/

deploy-production:
  if: startsWith(github.ref, 'refs/tags/v')
  environment: production
  needs: [test, security-scan]
```
2. Blue-Green and Canary Deployments
- Minimize downtime with blue-green deployments
- Reduce risk with canary releases
- Always have a rollback plan
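The canary half of this strategy is essentially a control loop: shift a slice of traffic, watch error rates, and abort on regression. A sketch of that decision logic — `error_rate_fn` is a hypothetical stand-in for a query against your metrics backend:

```python
def canary_step_plan(error_rate_fn, steps=(5, 25, 50, 100), threshold=0.05):
    """Walk a canary rollout: shift traffic in stages and roll back
    if the observed error rate at any stage exceeds the threshold.

    Returns ("promoted", 100) on success, or ("rolled_back", stage)
    naming the traffic percentage at which the release failed.
    """
    for percent in steps:
        rate = error_rate_fn(percent)  # query monitoring at this stage
        if rate > threshold:
            return ("rolled_back", percent)
    return ("promoted", 100)

# Healthy release: error rate stays low at every stage
print(canary_step_plan(lambda p: 0.01))                       # ('promoted', 100)
# Bad release: errors spike once real traffic arrives
print(canary_step_plan(lambda p: 0.2 if p >= 25 else 0.01))   # ('rolled_back', 25)
```

Real canary controllers (Argo Rollouts, Flagger) implement the same loop, with the rollback step restoring the previous image instead of returning a tuple.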
3. Database Migration Safety
- Make migrations backward-compatible
- Test migrations on production-like data
- Have rollback procedures for schema changes
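The backward-compatibility rule is usually realized as an expand/contract (parallel change) migration: add the new column as nullable first, backfill, and only enforce constraints once every writer has been upgraded. A minimal sketch using SQLite's in-memory database (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

# Step 1 (expand): nullable column - old code that never writes it keeps working
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Step 2: backfill existing rows (in production, do this in batches)
conn.execute("UPDATE users SET email = name || '@example.com' WHERE email IS NULL")

# Step 3 (contract): in a LATER release, once all writers populate email,
# add the NOT NULL constraint and drop any legacy column.
rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', 'alice@example.com'), ('bob', 'bob@example.com')]
```

Because each step is individually deployable and reversible, the pipeline can ship the expand step, verify, and only later ship the contract step.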
Monitoring and Observability
1. Pipeline Monitoring
- Track pipeline success rates
- Monitor pipeline duration trends
- Alert on failures

```yaml
# Good: Failure notifications
- name: Notify on failure
  if: failure()
  run: |
    curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
      -H 'Content-Type: application/json' \
      -d '{"text": "Pipeline failed on ${{ github.ref }}"}'
```
2. Key Metrics to Track
- Lead Time: code commit to production
- Deployment Frequency: how often you deploy
- Mean Time to Recovery: how quickly you fix issues
- Change Failure Rate: percentage of deployments causing issues
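Two of these DORA metrics fall straight out of a deployment log. A sketch with an illustrative record format (the field names are assumptions, not a standard schema):

```python
from datetime import datetime, timedelta

def dora_metrics(deployments):
    """Compute change failure rate and MTTR from a deployment log.

    Each record has 'deployed_at' (datetime), 'failed' (bool), and
    'recovered_at' (datetime, or None for successful deploys).
    """
    failures = [d for d in deployments if d["failed"]]
    change_failure_rate = len(failures) / len(deployments)
    recovery_minutes = [
        (d["recovered_at"] - d["deployed_at"]).total_seconds() / 60
        for d in failures
    ]
    mttr = sum(recovery_minutes) / len(recovery_minutes) if recovery_minutes else 0.0
    return change_failure_rate, mttr

t0 = datetime(2024, 1, 1, 12, 0)
log = [
    {"deployed_at": t0, "failed": False, "recovered_at": None},
    {"deployed_at": t0, "failed": True, "recovered_at": t0 + timedelta(minutes=30)},
    {"deployed_at": t0, "failed": False, "recovered_at": None},
    {"deployed_at": t0, "failed": False, "recovered_at": None},
]
cfr, mttr = dora_metrics(log)
print(cfr, mttr)  # 0.25 30.0
```

Lead time needs one more timestamp per record (the commit time), and deployment frequency is just a count per time window over the same log.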
Workflow Organization
1. Branching Strategy Alignment
```yaml
# Good: Strategy-aligned triggers (a single push key - YAML forbids duplicates)
on:
  push:
    branches: [main, develop]  # main -> production, develop -> staging
  pull_request:
    branches: [main]           # PR validation
```
2. Job Dependencies and Parallelization
```yaml
# Good: Optimal job organization
jobs:
  # Fast parallel checks
  lint:
    runs-on: ubuntu-latest
  test:
    runs-on: ubuntu-latest
  security:
    runs-on: ubuntu-latest

  # Build only after checks pass
  build:
    needs: [lint, test, security]
    runs-on: ubuntu-latest

  # Deploy only after build succeeds
  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
```
Cost Optimization
1. Runner Selection
- Use ubuntu-latest for most jobs (cheapest)
- Use macOS/Windows only when necessary
- Consider self-hosted runners for heavy workloads
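The runner choice dominates cost because GitHub bills hosted minutes with per-OS multipliers (Linux 1x, Windows 2x, macOS 10x). A rough estimator — the per-minute rates below reflect GitHub's published standard-runner pricing at the time of writing, so check your plan's current numbers:

```python
# Assumed per-minute rates for GitHub-hosted standard runners on private
# repos (Linux baseline with 2x Windows and 10x macOS multipliers).
RATE_PER_MINUTE = {"ubuntu": 0.008, "windows": 0.016, "macos": 0.080}

def monthly_ci_cost(minutes_per_run, runs_per_day, os="ubuntu", days=30):
    """Rough monthly cost of one CI job on GitHub-hosted runners."""
    return minutes_per_run * runs_per_day * days * RATE_PER_MINUTE[os]

# The same 10-minute job, 20 runs/day, on each runner type:
for os_name in RATE_PER_MINUTE:
    print(f"{os_name}: ${monthly_ci_cost(10, 20, os=os_name):.2f}/month")
```

The 10x macOS multiplier is why teams gate macOS jobs behind tags or release branches rather than running them on every push.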
2. Cache Strategy
```yaml
# Good: Effective caching
- uses: actions/cache@v4
  with:
    path: ~/.cache/uv
    key: ${{ runner.os }}-uv-${{ hashFiles('uv.lock') }}
    restore-keys: ${{ runner.os }}-uv-
```
3. Conditional Execution
```yaml
# Good: Skip unnecessary work - a path-filtered trigger, rather than
# inspecting head_commit (which only lists files from a single commit)
on:
  push:
    paths: ['docs/**']

# ...then in the job:
- name: Deploy docs
  run: ./deploy-docs.sh
```
Performance Optimization
Advanced Caching Patterns:
```yaml
- name: Cache dependencies
  uses: actions/cache@v4
  with:
    path: |
      ~/.cache/uv
      .venv
    key: ${{ runner.os }}-uv-${{ hashFiles('uv.lock') }}
    restore-keys: |
      ${{ runner.os }}-uv-
```
Conditional Execution:
```yaml
check-changes:
  runs-on: ubuntu-latest
  outputs:
    backend-changed: ${{ steps.changes.outputs.backend }}
  steps:
    - uses: actions/checkout@v4
    - uses: dorny/paths-filter@v2
      id: changes
      with:
        filters: |
          backend: ['src/**', 'requirements.txt']
          frontend: ['frontend/**', 'package.json']

test-backend:
  needs: check-changes
  if: needs.check-changes.outputs.backend-changed == 'true'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: uv run pytest tests/
```
Resource Right-Sizing:
```yaml
lint:
  runs-on: ubuntu-latest          # Basic tasks
integration-tests:
  runs-on: ubuntu-latest-4-cores  # CPU intensive (larger-runner label)
```
Pipeline Monitoring
Metrics Collection:
```yaml
- name: Record metrics
  run: |
    echo "START_TIME=$(date +%s)" >> $GITHUB_ENV
    curl -X POST "$METRICS_ENDPOINT" \
      -d "pipeline_start=$(date +%s)" \
      -d "repo=${{ github.repository }}"

- name: Report completion
  if: always()
  run: |
    DURATION=$(($(date +%s) - $START_TIME))
    curl -X POST "$METRICS_ENDPOINT" \
      -d "duration=$DURATION" \
      -d "status=${{ job.status }}"
```
Key Metrics:
- Lead Time: commit → production
- Deploy Frequency: daily deployments
- Change Failure Rate: <5% rollbacks
- MTTR: <1 hour recovery
Failure Alerts:
```yaml
- name: Failure notification
  if: failure()
  uses: 8398a7/action-slack@v3
  with:
    status: failure
    channel: '#dev-alerts'
    text: |
      🚨 Pipeline failed: ${{ github.repository }}

      📊 Failure Context:
      - Commit: ${{ github.sha }} by ${{ github.actor }}
      - Branch: ${{ github.ref_name }}
      - Failed Job: ${{ github.job }}
      - Failure Rate: check if this is a pattern or a one-off

      🔗 Quick Actions:
      - Logs: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
      - Compare changes: ${{ github.event.compare }}
      - Rollback procedure: https://wiki.company.com/rollback
  env:
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```
Pipeline Health Dashboards
Essential Dashboard Widgets:
- Pipeline success rate by repository (rolling 7 days)
- Average build duration trends
- Most frequent failure causes and remediation times
- Developer productivity metrics (PRs merged per developer per week)
- Infrastructure costs by repository and team
Implementation using GitHub APIs:
```python
# Pipeline metrics collection script
import requests
from datetime import datetime

def collect_pipeline_metrics(repo, token):
    """Collect pipeline metrics for a dashboard."""
    headers = {'Authorization': f'token {token}'}
    url = f"https://api.github.com/repos/{repo}/actions/runs"

    # Get the last 100 workflow runs
    response = requests.get(url, headers=headers, params={'per_page': 100})
    runs = response.json()['workflow_runs']

    # Calculate metrics (guard against an empty run history)
    total_runs = len(runs)
    successful_runs = len([r for r in runs if r['conclusion'] == 'success'])
    success_rate = (successful_runs / total_runs) * 100 if total_runs else 0.0

    # Average duration for successful runs
    durations = [
        (datetime.fromisoformat(r['updated_at'].replace('Z', '+00:00')) -
         datetime.fromisoformat(r['created_at'].replace('Z', '+00:00'))).total_seconds()
        for r in runs if r['conclusion'] == 'success'
    ]
    avg_duration = sum(durations) / len(durations) if durations else 0

    return {
        'success_rate': success_rate,
        'average_duration_minutes': avg_duration / 60,
        'total_runs': total_runs
    }
```
Infrastructure Pipeline Integration
Treating Infrastructure as Code in Pipelines
Modern applications don’t deploy in isolation - they require databases, load balancers, monitoring systems, and cloud resources. Production-ready pipelines automate infrastructure changes alongside application deployments.
Infrastructure-Aware Deployment Patterns
Terraform Integration Example:
```yaml
# Infrastructure deployment as part of the application pipeline
deploy-infrastructure:
  runs-on: ubuntu-latest
  environment: production
  steps:
    - uses: actions/checkout@v4

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.6.0

    - name: Terraform Plan
      working-directory: ./infrastructure
      env:
        TF_VAR_app_version: ${{ github.sha }}
      run: |
        terraform init
        terraform plan -out=tfplan

    - name: Terraform Apply
      if: github.ref == 'refs/heads/main'
      working-directory: ./infrastructure
      run: terraform apply -auto-approve tfplan

    - name: Output infrastructure details
      working-directory: ./infrastructure
      run: |
        echo "DATABASE_URL=$(terraform output -raw database_url)" >> $GITHUB_ENV
        echo "API_ENDPOINT=$(terraform output -raw api_endpoint)" >> $GITHUB_ENV
```
Database Migration Integration:
```yaml
# Safe database migrations in pipelines
database-migration:
  runs-on: ubuntu-latest
  environment: production
  steps:
    - uses: actions/checkout@v4

    # Always back up before migrations
    - name: Create database backup
      run: |
        pg_dump $DATABASE_URL > backup-$(date +%Y%m%d-%H%M%S).sql
        aws s3 cp backup-*.sql s3://backups-bucket/

    # Run migrations and verify they all applied
    - name: Run database migrations
      run: |
        uv run python manage.py migrate

        # An unchecked box "[ ]" in the plan marks an unapplied migration
        if uv run python manage.py showmigrations --plan | grep -q '\[ \]'; then
          echo "Migrations incomplete - restore from backup before retrying"
          exit 1
        fi
        echo "Database migrations completed successfully"
```
Container Orchestration Integration
Kubernetes Deployment with Health Checks:
deploy-kubernetes:
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
- name: Configure kubectl
uses: azure/k8s-set-context@v3
with:
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy to Kubernetes
run: |
# Apply configuration changes
kubectl apply -f k8s/
# Update deployment with new image
kubectl set image deployment/myapp \
myapp=myregistry/myapp:${{ github.sha }}
# Wait for rollout to complete
kubectl rollout status deployment/myapp --timeout=300s
- name: Verify deployment health
run: |
# Check pod status
kubectl get pods -l app=myapp
# Verify health check endpoints
API_URL=$(kubectl get service myapp -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl -f http://$API_URL/health || exit 1
echo " Deployment successful and healthy"
Multi-Environment Infrastructure Management
Environment-Specific Infrastructure Patterns:
```yaml
# Environment promotion with infrastructure validation
deploy:
  strategy:
    matrix:
      environment: [staging, production]
      include:
        - environment: staging
          tf_workspace: staging
          cluster_name: staging-cluster
          requires_approval: false
        - environment: production
          tf_workspace: production
          cluster_name: prod-cluster
          requires_approval: true
  environment: ${{ matrix.environment }}
  runs-on: ubuntu-latest
  steps:
    # Infrastructure provisioning
    - name: Deploy infrastructure
      working-directory: ./terraform
      env:
        TF_WORKSPACE: ${{ matrix.tf_workspace }}
      run: |
        terraform init
        terraform plan -var="environment=${{ matrix.environment }}"
        terraform apply -auto-approve

    # Application deployment onto that infrastructure
    - name: Deploy application
      env:
        CLUSTER_NAME: ${{ matrix.cluster_name }}
      run: |
        aws eks update-kubeconfig --name $CLUSTER_NAME
        helm upgrade --install myapp ./helm-chart \
          --set image.tag=${{ github.sha }} \
          --set environment=${{ matrix.environment }}
```
Disaster Recovery & Compliance
Pipeline Resilience Patterns
Production pipelines must handle failures gracefully and provide audit trails for compliance requirements.
Automated Rollback Strategies
```yaml
# Automated rollback on deployment failure
deploy-with-rollback:
  runs-on: ubuntu-latest
  environment: production
  steps:
    - name: Record current version
      run: |
        CURRENT_VERSION=$(kubectl get deployment myapp -o jsonpath='{.spec.template.spec.containers[0].image}')
        echo "PREVIOUS_VERSION=$CURRENT_VERSION" >> $GITHUB_ENV

    - name: Deploy new version
      id: deploy
      run: |
        kubectl set image deployment/myapp myapp=myregistry/myapp:${{ github.sha }}
        kubectl rollout status deployment/myapp --timeout=300s

    - name: Health check post-deployment
      id: health_check
      run: |
        sleep 30  # Allow time for startup
        curl -f $API_ENDPOINT/health || exit 1

        # Check error rates in monitoring
        ERROR_RATE=$(curl -s "$METRICS_API/error_rate?minutes=5")
        if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
          echo "Error rate too high: $ERROR_RATE"
          exit 1
        fi

    # Roll back only if the new version was actually applied
    - name: Automatic rollback on failure
      if: failure() && steps.deploy.outcome == 'success'
      run: |
        echo "Initiating automatic rollback to $PREVIOUS_VERSION"
        kubectl set image deployment/myapp myapp=$PREVIOUS_VERSION
        kubectl rollout status deployment/myapp --timeout=300s

        # Notify the team of the rollback
        curl -X POST $SLACK_WEBHOOK \
          -H 'Content-Type: application/json' \
          -d '{"text": "🚨 Auto-rollback triggered for ${{ github.repository }}. Previous version restored."}'
```
Compliance and Audit Patterns
SOX/SOC2 Compliance Example:
```yaml
# Compliance-focused deployment with an audit trail
compliance-deployment:
  runs-on: ubuntu-latest
  environment: production
  steps:
    # Approval gate for production changes
    - name: Wait for deployment approval
      uses: trstringer/manual-approval@v1
      with:
        secret: ${{ secrets.GITHUB_TOKEN }}
        approvers: user1,user2,user3
        minimum-approvals: 2
        issue-title: "Production Deployment: ${{ github.ref_name }}"

    # Create an audit record
    - name: Log deployment attempt
      run: |
        curl -X POST "$AUDIT_API/deployments" \
          -H "Authorization: Bearer ${{ secrets.AUDIT_TOKEN }}" \
          -d '{
            "timestamp": "'$(date -Iseconds)'",
            "repository": "${{ github.repository }}",
            "commit_sha": "${{ github.sha }}",
            "deployer": "${{ github.actor }}",
            "environment": "production",
            "status": "initiated"
          }'

    # Security validation before deployment
    - name: Security scan before deployment
      run: |
        # Container image security scan
        trivy image myregistry/myapp:${{ github.sha }} \
          --severity HIGH,CRITICAL \
          --exit-code 1

        # Dependency vulnerability check
        uv run safety check

    - name: Deploy with verification
      run: |
        # Gradual rollout: cap surge/unavailable pods, then update the image
        kubectl patch deployment myapp -p '{
          "spec": {
            "strategy": {
              "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1}
            },
            "template": {
              "spec": {
                "containers": [{
                  "name": "myapp",
                  "image": "myregistry/myapp:${{ github.sha }}"
                }]
              }
            }
          }
        }'

    # Record the successful deployment
    - name: Log deployment success
      if: success()
      run: |
        curl -X POST "$AUDIT_API/deployments" \
          -H "Authorization: Bearer ${{ secrets.AUDIT_TOKEN }}" \
          -d '{
            "timestamp": "'$(date -Iseconds)'",
            "commit_sha": "${{ github.sha }}",
            "status": "completed",
            "deployment_id": "'$GITHUB_RUN_ID'"
          }'
```
- Optimize pipeline speed with caching
- Set up monitoring and alerts
- Document troubleshooting procedures
- Train team on CI/CD best practices