10. Infrastructure as Code - Introduction
Learning Objectives
By the end of this chapter, you will be able to:
Define Infrastructure as Code (IaC) principles and understand its business value
Compare different IaC approaches: declarative vs imperative, provisioning vs configuration
Identify when to use Terraform vs Ansible and how they complement each other
Design infrastructure workflows that combine provisioning and configuration management
Apply IaC best practices for version control, testing, and collaboration
Implement security and compliance strategies in infrastructure code
Understand the modern DevOps toolchain and where IaC fits
Prerequisites: Basic understanding of cloud computing concepts, version control (Git), and command-line operations.
Chapter Scope: This chapter provides a comprehensive foundation for Infrastructure as Code, covering both infrastructure provisioning (Terraform) and configuration management (Ansible) with hands-on examples using Google Cloud Platform.
What is Infrastructure as Code (IaC)?
Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools.
The Traditional Problem
Before IaC, infrastructure management was typically:
Manual: Clicking through web consoles or running ad-hoc commands
Error-prone: Human mistakes in configuration and deployment
Inconsistent: Different configurations across development, staging, and production
Undocumented: No clear record of how infrastructure was configured
Slow: Hours or days to provision new environments
Difficult to scale: Hard to replicate successful configurations
The IaC Solution
Infrastructure as Code addresses these challenges by treating infrastructure the same way we treat application code:
Traditional Approach:
Manual Console → Click, Click, Click → Infrastructure
IaC Approach:
Code → Version Control → Automated Deployment → Infrastructure
Core IaC Principles:
Declarative Configuration
Specify the desired end state, not the steps to achieve it:
# Declare what you want
resource "google_compute_instance" "web_server" {
  name         = "production-web"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"
  # boot_disk and network_interface blocks omitted for brevity
}
# Terraform works out how to create it
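To make the idea concrete, here is a minimal Python sketch (not how Terraform is implemented) of the core step a declarative tool performs: compare the desired state with the current state and derive a plan of actions. The resource names and attribute dictionaries are hypothetical stand-ins for real provider data. Real tools also distinguish in-place updates from replacements, which this sketch ignores.

```python
# Hypothetical illustration of how a declarative tool derives actions:
# compare desired state with current state and compute a plan.
# Resource names and attributes are illustrative, not a real provider schema.

def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (action, resource_name) pairs needed to reach the desired state."""
    actions = []
    for name, attrs in desired.items():
        if name not in current:
            actions.append(("create", name))   # declared but missing
        elif current[name] != attrs:
            actions.append(("update", name))   # exists but differs
    for name in current:
        if name not in desired:
            actions.append(("delete", name))   # exists but no longer declared
    return actions


desired = {"production-web": {"machine_type": "e2-standard-2", "zone": "us-central1-a"}}
current = {"old-web": {"machine_type": "e2-micro", "zone": "us-central1-a"}}
print(plan(desired, current))  # [('create', 'production-web'), ('delete', 'old-web')]
```

The user only edits `desired`; the sequence of create, update, and delete steps is computed, which is what "specify the end state, not the steps" means in practice.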
Version Control Integration
Infrastructure definitions are stored in Git alongside application code:
git log --oneline infrastructure/
a1b2c3d Add load balancer for high availability
e4f5g6h Update instance types for cost optimization
h7i8j9k Initial infrastructure setup
Automated Testing and Deployment
Infrastructure changes go through the same CI/CD process as code:
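# .github/workflows/infrastructure.yml (job steps only)
- name: Initialize
  run: terraform init
- name: Validate Infrastructure
  run: terraform validate
- name: Plan Changes
  run: terraform plan -out=tfplan
- name: Apply on Approval
  # Only on main; pair the job with a protected environment so a
  # reviewer must approve before this step runs.
  if: github.ref == 'refs/heads/main'
  run: terraform apply tfplan  # A saved plan applies without prompting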
Idempotency
Running the same configuration multiple times produces the same result:
terraform apply   # First run: creates resources
terraform apply   # Second run: no changes needed
terraform apply   # Third run: still no changes needed
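Idempotency follows from the same compare-then-act design. The hypothetical `apply` function below is a simplified sketch, assuming resources can be modeled as dictionaries: it changes only what differs from the desired state, so a second run finds nothing to do.

```python
# Minimal sketch of an idempotent "apply": converge the current state toward
# the desired state and report what changed. The dictionaries stand in for
# real cloud resources; this is not Terraform's implementation.

def apply(desired: dict[str, dict], current: dict[str, dict]) -> list[str]:
    """Mutate `current` to match `desired`; return a list of change messages."""
    changes = []
    for name, attrs in desired.items():
        if current.get(name) != attrs:
            verb = "updated" if name in current else "created"
            current[name] = dict(attrs)  # store a copy of the desired attributes
            changes.append(f"{verb} {name}")
    for name in list(current):  # copy keys so we can delete while iterating
        if name not in desired:
            del current[name]
            changes.append(f"deleted {name}")
    return changes


desired = {"web": {"machine_type": "e2-standard-2"}}
infra: dict[str, dict] = {}
print(apply(desired, infra))  # ['created web']
print(apply(desired, infra))  # []
print(apply(desired, infra))  # []
```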
Business Benefits of IaC:
| Business Value | Technical Implementation |
|---|---|
| Faster Time-to-Market | Automated provisioning can cut deployment time from days to minutes |
| Cost Optimization | Codified resources make right-sizing and automatic scaling easy to review and change |
| Risk Reduction | Consistent, tested configurations reduce human error |
| Compliance | Auditable infrastructure changes with full history and an approval process |
| Disaster Recovery | Infrastructure can be rebuilt quickly from code definitions |
IaC Tool Categories and Use Cases
Infrastructure as Code tools fall into several categories, each optimized for different tasks:
1. Infrastructure Provisioning Tools
These tools create and manage cloud resources (servers, networks, databases):
Terraform/OpenTofu: Multi-cloud, declarative, state-managed
SST/Pulumi: Multi-cloud, general-purpose programming languages (desired state declared in code)
AWS CloudFormation: AWS-specific, JSON/YAML
Google Cloud Deployment Manager: GCP-specific
Azure Resource Manager: Azure-specific
2. Configuration Management Tools
These tools configure software and settings on existing infrastructure:
Ansible: Agentless, push-based, YAML playbooks
Chef: Agent-based, Ruby DSL, pull-based
Puppet: Agent-based, declarative, pull-based
SaltStack: Agent-based, Python, event-driven
3. Container Orchestration
These tools manage containerized applications:
Kubernetes: Container orchestration platform
Docker Compose: Local multi-container applications
Helm: Kubernetes package manager
4. Immutable Infrastructure Tools
These tools create complete, immutable system images:
Packer: Multi-platform image building
Docker: Container images
Vagrant: Development environment provisioning
Infrastructure Provisioning: Terraform vs Pulumi
When choosing an infrastructure provisioning tool, Terraform and Pulumi are two of the most widely used options. Here’s a side-by-side comparison:
Terraform: Declarative HCL-Based Provisioning
| Aspect | Terraform |
|---|---|
| Language | HashiCorp Configuration Language (HCL) |
| Approach | Declarative configuration files |
| State Management | Built-in state tracking (`terraform.tfstate`) |
| Provider Ecosystem | Thousands of providers in the public registry; mature ecosystem |
| Learning Curve | Moderate; domain-specific language |
| Team Adoption | Easy for ops teams; new syntax for developers |
| Version Control | Native support, diff-friendly |
| Testing | Built-in `terraform test` (v1.6+), plus external tools such as Terratest |
Pulumi: General-Purpose Programming Languages
| Aspect | Pulumi |
|---|---|
| Language | Python, TypeScript, Go, C#, Java |
| Approach | General-purpose code that declares the desired state; the engine computes the changes |
| State Management | Pulumi Cloud or self-managed backend |
| Provider Ecosystem | Growing; can also use bridged Terraform providers |
| Learning Curve | Easy for developers; familiar syntax |
| Team Adoption | Great for dev teams; familiar programming model |
| Version Control | Standard programming practices |
| Testing | Full unit-testing support through each language's test frameworks |
Detailed Feature Comparison:
Configuration Syntax:
# Terraform HCL
# Assumes a google_compute_network.main resource defined elsewhere
resource "google_compute_instance" "web_server" {
  name         = "web-server"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }
  network_interface {
    network = google_compute_network.main.id
    access_config {}  # Ephemeral external IP
  }
}
# Pulumi Python
import pulumi_gcp as gcp
# main_network is assumed to be a gcp.compute.Network defined elsewhere
web_server = gcp.compute.Instance("web-server",
    machine_type="e2-standard-2",
    zone="us-central1-a",
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            image="ubuntu-os-cloud/ubuntu-2204-lts"
        )
    ),
    network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
        network=main_network.id,
        access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()]  # Ephemeral external IP
    )]
)
When to Choose Terraform:
Operations-focused teams comfortable with configuration languages
Multi-cloud environments requiring extensive provider support
Established workflows with existing Terraform expertise
Regulatory environments requiring mature, battle-tested solutions
Simple to moderate complexity infrastructure requirements
Strong community ecosystem needs (modules, examples)
When to Choose Pulumi:
Developer-heavy teams familiar with modern programming languages
Complex infrastructure logic requiring loops, conditionals, functions
Testing requirements needing unit tests for infrastructure code
Dynamic infrastructure based on external data sources or APIs
Integration needs with existing CI/CD and development workflows
Type safety requirements and IDE support (IntelliSense, debugging)
Real-World Example: Dynamic Infrastructure
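Terraform approach (dynamic input through a data source):
# Terraform - external logic runs in a separate script
data "external" "regions" {
  # get-regions.py must print a JSON object whose values are all strings,
  # e.g. {"regions": "[\"us-central1\", \"europe-west1\"]"}
  program = ["python", "get-regions.py"]
}
resource "google_compute_instance" "regional_servers" {
  for_each     = toset(jsondecode(data.external.regions.result.regions))
  name         = "server-${each.key}"
  zone         = "${each.key}-a"
  machine_type = "e2-micro"
  # boot_disk and network_interface blocks omitted for brevity
}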
Pulumi approach (native programming logic):
# Pulumi - native Python logic
import requests
import pulumi_gcp as gcp
# Fetch regions dynamically (placeholder API)
response = requests.get("https://api.example.com/regions", timeout=10)
response.raise_for_status()
regions = response.json()["regions"]
# Create instances with conditional sizing logic
servers = []
for region in regions:
    if region["capacity"] > 100:  # Larger machines for high-capacity regions
        instance_type = "e2-standard-2"
    else:
        instance_type = "e2-micro"
    server = gcp.compute.Instance(f"server-{region['name']}",
        machine_type=instance_type,
        zone=f"{region['name']}-a",
        # boot_disk and network_interfaces omitted for brevity
        # Metadata derived from the region data
        metadata={
            "region-tier": region["tier"],
            "compliance-zone": region["compliance_requirements"],
        },
    )
    servers.append(server)
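Keeping decisions like the capacity threshold in a plain function is what makes unit testing practical. A minimal sketch, assuming the same region dictionary shape as the example above (the helper name `machine_type_for` is hypothetical): the sizing logic can be tested without any cloud calls. Pulumi also supports mocking its engine for resource-level tests, which this sketch does not use.

```python
# Sketch: pull the sizing decision out of the resource loop into a pure
# function so it can be unit-tested without touching any cloud API.
# The threshold (100) and machine types mirror the example above.

def machine_type_for(region: dict) -> str:
    """Pick a machine type from a region's reported capacity."""
    return "e2-standard-2" if region["capacity"] > 100 else "e2-micro"


def test_machine_type_for() -> None:
    assert machine_type_for({"capacity": 150}) == "e2-standard-2"
    assert machine_type_for({"capacity": 100}) == "e2-micro"  # boundary: not > 100
    assert machine_type_for({"capacity": 10}) == "e2-micro"


if __name__ == "__main__":
    test_machine_type_for()
    print("sizing tests passed")
```

Saved as a `test_*.py` file, the test function is also picked up by pytest. The resource loop would then call `machine_type_for(region)` instead of branching inline.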
Migration Considerations:
| Migration Path | Considerations |
|---|---|
| Terraform → Pulumi | Import existing resources into Pulumi state; rewrite configurations in code; train the team on the programming model |
| Pulumi → Terraform | Import existing resources into Terraform state (e.g. with `terraform import`); convert code to HCL; simplify complex logic |
Configuration Management: Ansible vs Puppet
For configuration management, Ansible and Puppet represent different philosophies and operational models:
Ansible: Agentless Push-Based Management
| Aspect | Ansible |
|---|---|
| Architecture | Agentless, SSH-based communication |
| Execution Model | Push-based from control node |
| Language | YAML playbooks, Python modules |
| Learning Curve | Easy, human-readable YAML |
| State Management | Stateless, no central database |
| Deployment | Simple, no agent installation |
| Scalability | Good for small-medium environments |
| Real-time Enforcement | Manual or scheduled execution |
Puppet: Agent-Based Pull-Based Management
| Aspect | Puppet |
|---|---|
| Architecture | Agent-based with a central Puppet server (traditionally called the Puppet Master) |
| Execution Model | Pull-based, agents check for changes |
| Language | Puppet DSL (Domain Specific Language) |
| Learning Curve | Steep, requires learning Puppet DSL |
| State Management | Central PuppetDB with full state info |
| Deployment | Complex, requires agent on every node |
| Scalability | Excellent for large environments |
| Real-time Enforcement | Periodic agent runs (every 30 minutes by default) with automatic drift correction |
Architectural Differences:
Ansible Architecture (Push Model):
┌─────────────────┐    SSH/WinRM     ┌─────────────────┐
│  Control Node   │ ───────────────► │  Managed Node   │
│                 │                  │                 │
│ • Playbooks     │ ───────────────► │ • Python        │
│ • Inventory     │     Execute      │ • Target Apps   │
│ • Vault         │      Tasks       │ • No Agent      │
└─────────────────┘                  └─────────────────┘
Puppet Architecture (Pull Model):
┌─────────────────┐                  ┌─────────────────┐
│  Puppet Master  │ ◄─────────────── │  Managed Node   │
│                 │     Request      │                 │
│ • Manifests     │     Catalog      │ • Puppet Agent  │
│ • Modules       │ ───────────────► │ • Facter        │
│ • PuppetDB      │   Send Catalog   │ • Target Apps   │
└─────────────────┘                  └─────────────────┘
Configuration Syntax Comparison:
Ansible YAML Playbook:
---
- name: Configure web server
  hosts: web_servers
  become: true
  vars:
    nginx_port: 80
    app_name: "my-web-app"
  tasks:
    - name: Install Nginx
      package:
        name: nginx
        state: present
    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx
    - name: Start and enable Nginx
      service:
        name: nginx
        state: started
        enabled: yes
  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
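The `notify`/handler pair is easy to misread. The simplified Python model below (an illustration, not Ansible's code) captures the documented behavior: a handler is notified only when the notifying task reports `changed`, it runs once after the tasks no matter how many times it was notified, and handlers run in the order they are defined. In real Ansible, notified handlers are flushed at the end of each section of a play, or earlier with `meta: flush_handlers`.

```python
# Simplified model of Ansible handler behavior (not Ansible's actual code):
# - a task notifies a handler only when it reports "changed"
# - each notified handler runs once, after the tasks, however often notified
# - handlers run in the order they appear in the `handlers:` section

def run_play(tasks: list[dict], handlers: list[str]) -> list[str]:
    log = []
    notified = set()  # a set, so repeated notifications collapse into one
    for task in tasks:
        status = "changed" if task["changed"] else "ok"
        log.append(f"TASK {task['name']}: {status}")
        if task["changed"] and task.get("notify"):
            notified.add(task["notify"])
    for handler in handlers:  # definition order, not notification order
        if handler in notified:
            log.append(f"HANDLER {handler}")
    return log


tasks = [
    {"name": "Install Nginx", "changed": False},
    {"name": "Configure Nginx", "changed": True, "notify": "restart nginx"},
    {"name": "Start and enable Nginx", "changed": False},
]
for line in run_play(tasks, ["restart nginx"]):
    print(line)
```

Nginx is restarted only when the template actually changed, which is why re-running the playbook on an unchanged server causes no restart.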
Puppet Manifest:
# modules/webserver/manifests/init.pp
class webserver (
  $nginx_port = 80,
  $app_name   = 'my-web-app',
) {
  package { 'nginx':
    ensure => present,
  }
  file { '/etc/nginx/nginx.conf':
    ensure  => file,
    content => template('webserver/nginx.conf.erb'),
    require => Package['nginx'],
    notify  => Service['nginx'],  # Same relationship as subscribe below; either suffices
  }
  service { 'nginx':
    ensure    => running,
    enable    => true,
    require   => Package['nginx'],
    subscribe => File['/etc/nginx/nginx.conf'],
  }
}
# manifests/site.pp (node definitions usually live here)
node 'web-server-01.example.com' {
  include webserver
}
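The `require`, `notify`, and `subscribe` metaparameters each add an edge meaning "apply this resource before that one"; `notify` and `subscribe` additionally trigger a refresh (here, a restart). The sketch below uses Python's standard `graphlib` to show how such edges determine a valid application order. It models only ordering, not Puppet's catalog compiler. Like Puppet, which refuses to apply a catalog containing a dependency cycle, `graphlib` raises `CycleError` on cycles.

```python
# Sketch: relationship metaparameters as a dependency graph, ordered with a
# topological sort. Resource names mirror the manifest above.
from graphlib import TopologicalSorter

# resource -> resources that must be applied before it
dependencies = {
    "Package[nginx]": set(),
    "File[/etc/nginx/nginx.conf]": {"Package[nginx]"},  # require
    "Service[nginx]": {
        "Package[nginx]",                # require
        "File[/etc/nginx/nginx.conf]",   # subscribe / notify
    },
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['Package[nginx]', 'File[/etc/nginx/nginx.conf]', 'Service[nginx]']
```

Because the edges form a chain, only one order is valid: install the package, write the config file, then manage the service.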
Operational Model Comparison:
Ansible Execution Flow:
# Manual execution from control node
ansible-playbook -i inventory playbook.yml
# Scheduled execution (cron)
0 2 * * * /usr/bin/ansible-playbook -i /etc/ansible/inventory /etc/ansible/maintenance.yml
# Event-driven execution (CI/CD triggered)
# Runs when code changes or infrastructure events occur
Puppet Execution Flow:
# Agents run automatically in the background (every 30 minutes by default),
# contacting the Puppet server for a fresh catalog on each run.
# To trigger an immediate one-off run on a node:
puppet agent --test
# Continuous enforcement:
# each run re-applies the desired state and corrects drift
# without human intervention
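The pull model's self-healing behavior can be sketched in a few lines of Python. This is a simplified illustration, not Puppet's agent: the hypothetical `agent_run` function compares a desired-state catalog with the node's current state and corrects any drift, and the 30-minute schedule is simulated by calling it again.

```python
# Simplified model of a pull-based agent (not Puppet's implementation):
# each run fetches the desired state ("catalog"), compares it with the
# node's actual state, and corrects any drift it finds.

def agent_run(catalog: dict[str, str], node_state: dict[str, str]) -> list[str]:
    """Enforce the catalog on node_state; return the corrections made."""
    corrections = []
    for resource, wanted in catalog.items():
        actual = node_state.get(resource)
        if actual != wanted:
            node_state[resource] = wanted
            corrections.append(f"{resource}: {actual} -> {wanted}")
    return corrections


catalog = {"package:nginx": "present", "service:nginx": "running"}
node = {"package:nginx": "present", "service:nginx": "running"}

print(agent_run(catalog, node))   # [] (already compliant)
node["service:nginx"] = "stopped"  # someone stops the service by hand (drift)
print(agent_run(catalog, node))   # ['service:nginx: stopped -> running']
```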
When to Choose Ansible:
Small to moderate environments (roughly under 1,000 nodes)
Agentless architecture preferred (security, simplicity)
Ad-hoc task execution and orchestration workflows
Developer-friendly teams comfortable with YAML
Integration with existing SSH infrastructure
Event-driven automation (CI/CD pipelines)
Multi-platform environments with diverse systems
Quick setup and deployment requirements
When to Choose Puppet:
Large-scale environments (1000+ nodes)
Continuous compliance and drift correction needs
Enterprise governance and reporting requirements
Dedicated operations teams with configuration management expertise
Complex dependency management and ordering requirements
Centralized policy enforcement and auditing
High availability and disaster recovery needs
Long-term infrastructure lifecycle management
Hybrid Approaches:
Many organizations use both tools in complementary ways:
Common Hybrid Pattern:
1. Terraform provisions infrastructure
2. Ansible performs initial configuration and application deployment
3. Puppet maintains ongoing configuration compliance
4. Ansible handles application updates and orchestration
Migration Strategies:
| Migration Path | Strategy |
|---|---|
| Puppet → Ansible | |
| Ansible → Puppet | |
Decision Matrix:
| Requirement | Ansible | Puppet |
|---|---|---|
| Quick Setup | Excellent | Complex |
| Large Scale (5000+ nodes) | Challenging | Excellent |
| Continuous Compliance | Manual | Automatic |
| Learning Curve | Easy | Steep |
| Agent Requirements | Agentless | Agent Required |
| Orchestration | Excellent | Limited |
| Reporting/Auditing | Basic | Comprehensive |
The Modern IaC Stack
In practice, organizations use multiple tools together:
Layer 4: Applications
├── Kubernetes (Container Orchestration)
└── Helm (Package Management)
Layer 3: Configuration Management
├── Ansible (Software Installation & Configuration)
└── Docker (Application Packaging)
Layer 2: Infrastructure Provisioning
├── Terraform (Cloud Resources)
└── Packer (Machine Images)
Layer 1: Foundation
├── Git (Version Control)
└── CI/CD (Automated Deployment)
Terraform vs Ansible: When to Use Each
Understanding when to use Terraform versus Ansible is crucial for building effective IaC workflows:
Terraform: Infrastructure Provisioning
| What Terraform Does | Example Use Cases |
|---|---|
| Creates cloud resources | |
| Manages resource lifecycle | |
| Handles dependencies | |
Ansible: Configuration Management
| What Ansible Does | Example Use Cases |
|---|---|
| Configures existing infrastructure | |
| Deploys applications | |
| Orchestrates procedures | |
Typical Workflow: Terraform + Ansible
# Step 1: Provision infrastructure with Terraform
cd infrastructure/
terraform init
terraform plan
terraform apply
# Step 2: Configure the infrastructure with Ansible
cd ../configuration/
ansible-playbook -i gcp_inventory.yml site.yml
# Step 3: Deploy applications
ansible-playbook -i gcp_inventory.yml deploy.yml
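Step 2 needs an inventory listing the machines Terraform just created. One way to bridge the tools is to read `terraform output -json` and write a static inventory. The sketch below assumes the Terraform configuration defines a hypothetical output named `web_server_ips` holding a list of public IPs, and reuses the `infrastructure/` and `configuration/` directories from the workflow above. Alternatively, a dynamic inventory plugin such as `google.cloud.gcp_compute` can query GCP directly and skip this step.

```python
# Sketch: turn `terraform output -json` into a static Ansible INI inventory.
# Assumes a Terraform output named "web_server_ips" (hypothetical) that
# holds a list of IP addresses.
import json
import subprocess


def inventory_from_outputs(outputs: dict, group: str = "web_servers") -> str:
    """Build INI inventory text from parsed `terraform output -json` data."""
    ips = outputs["web_server_ips"]["value"]  # each output has a "value" key
    return "\n".join([f"[{group}]", *ips]) + "\n"


def main() -> None:
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        cwd="infrastructure",
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    with open("configuration/hosts.ini", "w") as fh:
        fh.write(inventory_from_outputs(json.loads(raw)))
```

Calling `main()` from the repository root writes `configuration/hosts.ini`; the group name matches the `hosts: web_servers` line used by the playbooks in this chapter.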
Example: Web Application Deployment
Terraform handles the “WHAT” (what infrastructure exists):
# Create the infrastructure
# (assumes google_compute_network.main is defined elsewhere)
resource "google_compute_instance" "web_servers" {
  count        = 3
  name         = "web-server-${count.index + 1}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }
  network_interface {
    network = google_compute_network.main.id
    access_config {}
  }
}
Ansible handles the “HOW” (how software is configured):
# Configure the infrastructure
- name: Configure web servers
  hosts: web_servers
  become: true
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
        update_cache: yes
    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx
    - name: Deploy application
      copy:
        src: "{{ app_files }}"  # app_files: variable defined elsewhere (e.g. group_vars)
        dest: /var/www/html/
  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
IaC Best Practices Overview
1. Version Control Everything
project/
├── infrastructure/              # Terraform code
│   ├── main.tf
│   ├── variables.tf
│   └── terraform.tfvars.example # Committed template; real *.tfvars stay git-ignored
├── configuration/               # Ansible playbooks
│   ├── site.yml
│   ├── inventory/
│   └── roles/
└── .github/workflows/           # CI/CD pipelines
    └── deploy.yml
2. Environment Separation
# Use workspaces or separate state files
terraform workspace select production
terraform workspace select staging
terraform workspace select development
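Workspaces keep environments apart by giving each one its own state. With the local backend, Terraform stores the `default` workspace's state in `terraform.tfstate` and every other workspace's state under `terraform.tfstate.d/<workspace>/`; remote backends such as GCS use their own naming. The small helper below (hypothetical, for illustration only) spells out the local layout.

```python
# Sketch of where the *local* backend keeps each workspace's state file.
# Remote backends use a different layout.
from pathlib import PurePosixPath


def local_state_path(workspace: str) -> PurePosixPath:
    if workspace == "default":
        return PurePosixPath("terraform.tfstate")
    return PurePosixPath("terraform.tfstate.d") / workspace / "terraform.tfstate"


for ws in ("default", "production", "staging", "development"):
    print(f"{ws:12} -> {local_state_path(ws)}")
```

A workspace must exist (for example, `terraform workspace new staging`) before `terraform workspace select` can switch to it.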
3. Modular Design
# Use modules for reusability
module "web_servers" {
  source         = "./modules/compute"
  instance_count = var.web_server_count
  machine_type   = var.web_machine_type
  environment    = var.environment
}
4. Security and Secrets Management
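# Never commit secrets to version control
echo "*.tfvars" >> .gitignore
echo "secrets/" >> .gitignore
# State files can contain sensitive values in plain text
echo "*.tfstate*" >> .gitignore
echo ".terraform/" >> .gitignore
# Use secure secret management
ansible-vault create secrets.yml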
5. Testing and Validation
# Validate before applying
terraform validate
terraform plan
# Test Ansible playbooks
ansible-playbook --check --diff site.yml
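In CI it is often useful to know whether a plan contains changes, not just whether it succeeded. `terraform plan -detailed-exitcode` reports this through its exit status: 0 for no changes, 1 for an error, and 2 for a successful plan with pending changes. The sketch below wraps that convention; the function names and the `infrastructure` working directory are assumptions for illustration.

```python
# Sketch for CI: interpret `terraform plan -detailed-exitcode`
# (0 = no changes, 1 = error, 2 = changes present).
import subprocess


def describe_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit status to a CI outcome."""
    if code == 0:
        return "no-changes"
    if code == 2:
        return "changes-pending"
    return "error"  # 1, or anything unexpected


def plan_status(workdir: str = "infrastructure") -> str:
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return describe_plan_exit(result.returncode)
```

A pipeline can, for example, skip the apply job on `"no-changes"` and request review on `"changes-pending"`.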
Chapter Structure and Learning Approach
This Infrastructure as Code section is organized into focused chapters that build upon each other:
Part A: Terraform - Infrastructure Provisioning
10.1 Terraform Introduction: Understanding declarative infrastructure
10.2 Terraform Core Concepts: Resources, variables, and state management
10.3 Terraform Workflow & GCP: Hands-on Google Cloud integration
10.4 Terraform Production Challenges: Real-world problems and solutions
10.5 Terraform Practical Examples: 14 comprehensive GCP examples
Part B: Ansible - Configuration Management
10.6 Ansible Introduction: Agentless configuration management
10.7 Ansible Core Concepts: Playbooks, roles, and inventory
10.8 Ansible Advanced Features: Templating, vaults, and orchestration
10.9 Ansible Production Patterns: Best practices and real-world examples
Learning Methodology:
Conceptual Understanding: Each chapter starts with theory and principles
Hands-on Examples: Practical examples using Google Cloud Platform
Production Readiness: Real-world challenges and enterprise patterns
Best Practices: Security, maintainability, and team collaboration
Prerequisites for Success:
GCP Account: Free tier provides sufficient resources for all examples
Local Development Environment: VS Code, Git, and terminal access
Basic Cloud Knowledge: Understanding of VMs, networks, and databases
Command Line Comfort: Ability to run commands and navigate directories
Next Steps:
Begin with Chapter 10.1: Terraform Introduction to start your Infrastructure as Code journey. Each chapter includes:
Theoretical concepts with clear explanations
Step-by-step practical examples
Production-ready code templates
Troubleshooting guides and common pitfalls
Review questions and further reading
Note
Infrastructure as Code is a journey, not a destination. Start with simple examples and gradually build complexity as you become more comfortable with the tools and concepts.