10. Infrastructure as Code - Introduction

Learning Objectives

By the end of this chapter, you will be able to:

  • Define Infrastructure as Code (IaC) principles and understand its business value

  • Compare different IaC approaches: declarative vs imperative, provisioning vs configuration

  • Identify when to use Terraform vs Ansible and how they complement each other

  • Design infrastructure workflows that combine provisioning and configuration management

  • Apply IaC best practices for version control, testing, and collaboration

  • Implement security and compliance strategies in infrastructure code

  • Understand the modern DevOps toolchain and where IaC fits

Prerequisites: Basic understanding of cloud computing concepts, version control (Git), and command-line operations.

Chapter Scope: This chapter provides a comprehensive foundation for Infrastructure as Code, covering both infrastructure provisioning (Terraform) and configuration management (Ansible) with hands-on examples using Google Cloud Platform.

What is Infrastructure as Code (IaC)?

Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools.

The Traditional Problem

Before IaC, infrastructure management was typically:

  • Manual: Clicking through web consoles or running ad-hoc commands

  • Error-prone: Human mistakes in configuration and deployment

  • Inconsistent: Different configurations across development, staging, and production

  • Undocumented: No clear record of how infrastructure was configured

  • Slow: Hours or days to provision new environments

  • Difficult to scale: Hard to replicate successful configurations

The IaC Solution

Infrastructure as Code addresses these challenges by treating infrastructure the same way we treat application code:

Traditional Approach:
Manual Console → Click, Click, Click → Infrastructure

IaC Approach:
Code → Version Control → Automated Deployment → Infrastructure

Core IaC Principles:

  1. Declarative Configuration

    Specify the desired end state, not the steps to achieve it:

    # Declare what you want
    resource "google_compute_instance" "web_server" {
      name         = "production-web"
      machine_type = "e2-standard-2"
      zone         = "us-central1-a"
    }
    
    # Tool figures out how to create it
    
  2. Version Control Integration

    Infrastructure definitions are stored in Git alongside application code:

    git log --oneline infrastructure/
    a1b2c3d Add load balancer for high availability
    e4f5g6h Update instance types for cost optimization
    h7i8j9k Initial infrastructure setup
    
  3. Automated Testing and Deployment

    Infrastructure changes go through the same CI/CD process as code:

    # .github/workflows/infrastructure.yml
    - name: Validate Infrastructure
      run: terraform validate
    
    - name: Plan Changes
      run: terraform plan
    
    - name: Apply on Approval
      run: terraform apply -auto-approve
    
  4. Idempotency

    Running the same configuration multiple times produces the same result:

    terraform apply  # Creates resources
    terraform apply  # No changes needed
    terraform apply  # Still no changes needed
    

Business Benefits of IaC:

| Business Value | Technical Implementation |
| --- | --- |
| Faster Time-to-Market | Automated provisioning reduces deployment time from days to minutes |
| Cost Optimization | Infrastructure as code enables right-sizing and automatic scaling |
| Risk Reduction | Consistent, tested configurations eliminate human error |
| Compliance | Auditable infrastructure changes with full history and approval process |
| Disaster Recovery | Infrastructure can be rebuilt quickly from code definitions |

IaC Tool Categories and Use Cases

Infrastructure as Code tools fall into several categories, each optimized for different tasks:

1. Infrastructure Provisioning Tools

These tools create and manage cloud resources (servers, networks, databases):

  • Terraform/OpenTofu: Multi-cloud, declarative, state-managed

  • SST/Pulumi: Multi-cloud, imperative, programming languages

  • AWS CloudFormation: AWS-specific, JSON/YAML

  • Google Cloud Deployment Manager: GCP-specific

  • Azure Resource Manager: Azure-specific

2. Configuration Management Tools

These tools configure software and settings on existing infrastructure:

  • Ansible: Agentless, push-based, YAML playbooks

  • Chef: Agent-based, Ruby DSL, pull-based

  • Puppet: Agent-based, declarative, pull-based

  • SaltStack: Agent-based, Python, event-driven
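
Ansible's agentless, push-based model, for example, means nothing has to be installed on managed hosts beyond Python and SSH access; a minimal ad-hoc sketch (hostnames are hypothetical):

# No agent on web1/web2 — Ansible connects over SSH and runs modules remotely
ansible all -i "web1.example.com,web2.example.com," -m ping

# Install a package on an inventory group, escalating privileges with --become
ansible web_servers -i inventory.ini --become -m apt -a "name=nginx state=present"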

3. Container Orchestration

These tools manage containerized applications:

  • Kubernetes: Container orchestration platform

  • Docker Compose: Local multi-container applications

  • Helm: Kubernetes package manager

4. Immutable Infrastructure Tools

These tools create complete, immutable system images:

  • Packer: Multi-platform image building

  • Docker: Container images

  • Vagrant: Development environment provisioning
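
To make the immutable-image idea concrete, a minimal Packer template for GCP might look roughly like the following (project ID, image name, and provisioning steps are illustrative, not a prescribed setup):

# build.pkr.hcl — bake Nginx into a reusable GCP image (illustrative)
source "googlecompute" "web_base" {
  project_id          = "my-gcp-project"   # hypothetical project
  source_image_family = "ubuntu-2204-lts"
  zone                = "us-central1-a"
  ssh_username        = "packer"
  image_name          = "web-base-v1"
}

build {
  sources = ["source.googlecompute.web_base"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y nginx",
    ]
  }
}

Every instance launched from the resulting image starts identical; changes are made by building a new image rather than mutating running servers.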

Infrastructure Provisioning: Terraform vs Pulumi

When choosing an infrastructure provisioning tool, Terraform and Pulumi are the two leading options. Here’s a comprehensive comparison:

Terraform: Declarative HCL-Based Provisioning

| Aspect | Terraform |
| --- | --- |
| Language | HashiCorp Configuration Language (HCL) |
| Approach | Declarative configuration files |
| State Management | Built-in state tracking (.tfstate) |
| Provider Ecosystem | 3000+ providers, mature ecosystem |
| Learning Curve | Moderate, domain-specific language |
| Team Adoption | Easy for ops teams, new syntax |
| Version Control | Native support, diff-friendly |
| Testing | Limited, requires external tools |

Pulumi: General-Purpose Programming Languages

| Aspect | Pulumi |
| --- | --- |
| Language | Python, TypeScript, Go, C#, Java |
| Approach | Imperative programming with SDK |
| State Management | Pulumi Cloud or self-managed backend |
| Provider Ecosystem | Growing, leverages Terraform providers |
| Learning Curve | Easy for developers, familiar syntax |
| Team Adoption | Great for dev teams, programming model |
| Version Control | Standard programming practices |
| Testing | Full unit testing capabilities |

Detailed Feature Comparison:

Configuration Syntax:

# Terraform HCL
resource "google_compute_instance" "web_server" {
  count        = var.instance_count
  name         = "web-server-${count.index + 1}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id
    access_config {}
  }
}

# Pulumi Python
import pulumi_gcp as gcp

web_server = gcp.compute.Instance("web-server",
    machine_type="e2-standard-2",
    zone="us-central1-a",
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            image="ubuntu-os-cloud/ubuntu-2204-lts"
        )
    ),
    network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
        network=main_network.id,
        access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()]
    )]
)

When to Choose Terraform:

  • Operations-focused teams comfortable with configuration languages

  • Multi-cloud environments requiring extensive provider support

  • Established workflows with existing Terraform expertise

  • Regulatory environments requiring mature, battle-tested solutions

  • Simple to moderate complexity infrastructure requirements

  • Strong community ecosystem needs (modules, examples)

When to Choose Pulumi:

  • Developer-heavy teams familiar with modern programming languages

  • Complex infrastructure logic requiring loops, conditionals, functions

  • Testing requirements needing unit tests for infrastructure code

  • Dynamic infrastructure based on external data sources or APIs

  • Integration needs with existing CI/CD and development workflows

  • Type safety requirements and IDE support (IntelliSense, debugging)
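
To illustrate the testing point above: because Pulumi programs are ordinary code, the resources they declare can be unit-tested with mocked cloud calls. A minimal sketch, assuming the earlier web_server definition lives in a hypothetical module named infra:

# test_infra.py — unit test with Pulumi's mocking support (illustrative)
import pulumi

class Mocks(pulumi.runtime.Mocks):
    def new_resource(self, args):
        # Pretend the resource was created; echo its inputs back as outputs
        return [args.name + "_id", args.inputs]

    def call(self, args):
        return {}

pulumi.runtime.set_mocks(Mocks())

import infra  # hypothetical module containing the web_server definition


@pulumi.runtime.test
def test_machine_type():
    def check(machine_type):
        assert machine_type == "e2-standard-2"

    return infra.web_server.machine_type.apply(check)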

Real-World Example: Dynamic Infrastructure

Terraform approach (limited dynamic capabilities):

# Terraform - requires external data sources
data "external" "regions" {
  program = ["python", "get-regions.py"]
}

resource "google_compute_instance" "regional_servers" {
  for_each = toset(jsondecode(data.external.regions.result.regions))

  name         = "server-${each.key}"
  zone         = "${each.key}-a"
  machine_type = "e2-micro"
}

Pulumi approach (native programming logic):

# Pulumi - native Python logic
import requests
import pulumi_gcp as gcp

# Fetch regions dynamically
response = requests.get("https://api.example.com/regions")
regions = response.json()["regions"]

# Create instances with complex logic
servers = []
for region in regions:
    if region["capacity"] > 100:  # Complex conditional logic
        instance_type = "e2-standard-2"
    else:
        instance_type = "e2-micro"

    server = gcp.compute.Instance(f"server-{region['name']}",
        machine_type=instance_type,
        zone=f"{region['name']}-a",
        # Complex configuration based on region data
        metadata={
            "region-tier": region["tier"],
            "compliance-zone": region["compliance_requirements"]
        }
    )
    servers.append(server)

Migration Considerations:

| Migration Path | Considerations |
| --- | --- |
| Terraform → Pulumi | Import existing state; rewrite configurations in code; train the team on the programming model |
| Pulumi → Terraform | Export state to Terraform format; convert code to HCL; simplify complex logic |

Configuration Management: Ansible vs Puppet

For configuration management, Ansible and Puppet represent different philosophies and operational models:

Ansible: Agentless Push-Based Management

| Aspect | Ansible |
| --- | --- |
| Architecture | Agentless, SSH-based communication |
| Execution Model | Push-based from control node |
| Language | YAML playbooks, Python modules |
| Learning Curve | Easy, human-readable YAML |
| State Management | Stateless, no central database |
| Deployment | Simple, no agent installation |
| Scalability | Good for small-to-medium environments |
| Real-time Enforcement | Manual or scheduled execution |

Puppet: Agent-Based Pull-Based Management

| Aspect | Puppet |
| --- | --- |
| Architecture | Agent-based with central Puppet Master |
| Execution Model | Pull-based, agents check for changes |
| Language | Puppet DSL (domain-specific language) |
| Learning Curve | Steep, requires learning Puppet DSL |
| State Management | Central PuppetDB with full state info |
| Deployment | Complex, requires agent on every node |
| Scalability | Excellent for large environments |
| Real-time Enforcement | Continuous, automatic drift correction |

Architectural Differences:

Ansible Architecture (Push Model):

┌─────────────────┐     SSH/WinRM     ┌─────────────────┐
│  Control Node   │ ───────────────►  │   Managed Node  │
│                 │                   │                 │
│ • Playbooks     │ ───────────────►  │ • Python        │
│ • Inventory     │     Execute       │ • Target Apps   │
│ • Vault         │     Tasks         │ • No Agent      │
└─────────────────┘                   └─────────────────┘

Puppet Architecture (Pull Model):

┌─────────────────┐                   ┌─────────────────┐
│  Puppet Master  │ ◄───────────────  │   Managed Node  │
│                 │   Request         │                 │
│ • Manifests     │   Catalog         │ • Puppet Agent  │
│ • Modules       │ ────────────────► │ • Facter        │
│ • PuppetDB      │   Send Catalog    │ • Target Apps   │
└─────────────────┘                   └─────────────────┘

Configuration Syntax Comparison:

Ansible YAML Playbook:

---
- name: Configure web server
  hosts: web_servers
  become: true

  vars:
    nginx_port: 80
    app_name: "my-web-app"

  tasks:
    - name: Install Nginx
      package:
        name: nginx
        state: present

    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Start and enable Nginx
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted

Puppet Manifest:

# manifests/webserver.pp
class webserver (
  $nginx_port = 80,
  $app_name = 'my-web-app',
) {

  package { 'nginx':
    ensure => present,
  }

  file { '/etc/nginx/nginx.conf':
    ensure  => file,
    content => template('webserver/nginx.conf.erb'),
    require => Package['nginx'],
    notify  => Service['nginx'],
  }

  service { 'nginx':
    ensure    => running,
    enable    => true,
    require   => Package['nginx'],
    subscribe => File['/etc/nginx/nginx.conf'],
  }
}

node 'web-server-01.example.com' {
  include webserver
}

Operational Model Comparison:

Ansible Execution Flow:

# Manual execution from control node
ansible-playbook -i inventory playbook.yml

# Scheduled execution (cron)
0 2 * * * /usr/bin/ansible-playbook -i /etc/ansible/inventory /etc/ansible/maintenance.yml

# Event-driven execution (CI/CD triggered)
# Runs when code changes or infrastructure events occur

Puppet Execution Flow:

# Puppet agents contact the master automatically (default: every 30 minutes)
puppet agent --test   # trigger an immediate run manually when testing

# Continuous enforcement
# Agents continuously ensure desired state
# Automatic drift correction without human intervention

When to Choose Ansible:

  • Simple to moderate environments (< 1000 nodes)

  • Agentless architecture preferred (security, simplicity)

  • Ad-hoc task execution and orchestration workflows

  • Developer-friendly teams comfortable with YAML

  • Integration with existing SSH infrastructure

  • Event-driven automation (CI/CD pipelines)

  • Multi-platform environments with diverse systems

  • Quick setup and deployment requirements

When to Choose Puppet:

  • Large-scale environments (1000+ nodes)

  • Continuous compliance and drift correction needs

  • Enterprise governance and reporting requirements

  • Dedicated operations teams with configuration management expertise

  • Complex dependency management and ordering requirements

  • Centralized policy enforcement and auditing

  • High availability and disaster recovery needs

  • Long-term infrastructure lifecycle management

Hybrid Approaches:

Many organizations use both tools in complementary ways:

Common Hybrid Pattern:

1. Terraform provisions infrastructure
2. Ansible performs initial configuration and application deployment
3. Puppet maintains ongoing configuration compliance
4. Ansible handles application updates and orchestration
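
In practice the provisioning and configuration steps are often chained in a single pipeline; a simplified GitHub Actions sketch (job layout, paths, and credential handling are illustrative and omitted for brevity):

# .github/workflows/provision-and-configure.yml (illustrative)
name: Provision and configure
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes Terraform, Ansible, and cloud credentials are already set up
      - name: Provision infrastructure
        working-directory: infrastructure
        run: |
          terraform init
          terraform apply -auto-approve
      - name: Configure and deploy
        working-directory: configuration
        run: ansible-playbook -i gcp_inventory.yml site.yml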

Migration Strategies:

| Migration Path | Strategy |
| --- | --- |
| Puppet → Ansible | Start with new projects in Ansible; gradually convert existing manifests; maintain a hybrid setup during the transition |
| Ansible → Puppet | Implement Puppet for new compliance requirements; keep Ansible for orchestration; focus Puppet on state enforcement |

Decision Matrix:

| Requirement | Ansible | Puppet |
| --- | --- | --- |
| Quick Setup | Excellent | Complex |
| Large Scale (5000+) | Challenging | Excellent |
| Continuous Compliance | Manual | Automatic |
| Learning Curve | Easy | Steep |
| Agent Requirements | Agentless | Agent required |
| Orchestration | Excellent | Limited |
| Reporting/Auditing | Basic | Comprehensive |

The Modern IaC Stack

In practice, organizations use multiple tools together:

Layer 4: Applications
├── Kubernetes (Container Orchestration)
├── Helm (Package Management)

Layer 3: Configuration Management
├── Ansible (Software Installation & Configuration)
├── Docker (Application Packaging)

Layer 2: Infrastructure Provisioning
├── Terraform (Cloud Resources)
├── Packer (Machine Images)

Layer 1: Foundation
├── Git (Version Control)
├── CI/CD (Automated Deployment)

Terraform vs Ansible: When to Use Each

Understanding when to use Terraform versus Ansible is crucial for building effective IaC workflows:

Terraform: Infrastructure Provisioning

| What Terraform Does | Example Use Cases |
| --- | --- |
| Creates cloud resources | GCP Compute Engine instances; VPC networks and subnets; Cloud SQL databases; load balancers and firewalls |
| Manages resource lifecycle | Scaling instance groups up/down; updating firewall rules; destroying unused resources |
| Handles dependencies | Ensures the network exists before VMs; creates the database before app servers (see the sketch below) |
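
For instance, Terraform infers ordering from references between resources, so the network below is always created before the instance that uses it; a minimal sketch:

# The instance references the network, creating an implicit dependency
resource "google_compute_network" "main" {
  name                    = "main-network"
  auto_create_subnetworks = true
}

resource "google_compute_instance" "app" {
  name         = "app-server"
  machine_type = "e2-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id  # network is created first
  }
}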

Ansible: Configuration Management

| What Ansible Does | Example Use Cases |
| --- | --- |
| Configures existing infrastructure | Installing Nginx on web servers; configuring SSL certificates; setting up monitoring agents |
| Deploys applications | Deploying web applications; updating application configurations; rolling application updates |
| Orchestrates procedures | Multi-step deployment workflows; backup and maintenance tasks; emergency response procedures |

Typical Workflow: Terraform + Ansible

# Step 1: Provision infrastructure with Terraform
cd infrastructure/
terraform init
terraform plan
terraform apply

# Step 2: Configure the infrastructure with Ansible
cd ../configuration/
ansible-playbook -i gcp_inventory.yml site.yml

# Step 3: Deploy applications
ansible-playbook -i gcp_inventory.yml deploy.yml
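
The gcp_inventory.yml used above is typically a dynamic inventory rather than a static host list; one possible shape using the google.cloud.gcp_compute inventory plugin (project, zone, and grouping key are illustrative, and the plugin conventionally expects a filename ending in .gcp.yml or .gcp_compute.yml):

# Dynamic inventory: discover GCE instances instead of listing hosts by hand
plugin: google.cloud.gcp_compute
projects:
  - my-gcp-project            # hypothetical project ID
zones:
  - us-central1-a
auth_kind: application        # use Application Default Credentials
keyed_groups:
  - key: labels.role          # instances labelled role=web_servers join that group
    prefix: ""
    separator: ""
compose:
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP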

Example: Web Application Deployment

Terraform handles the “WHAT” (what infrastructure exists):

# Create the infrastructure
resource "google_compute_instance" "web_servers" {
  count        = 3
  name         = "web-server-${count.index + 1}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id
    access_config {}
  }
}

Ansible handles the “HOW” (how software is configured):

# Configure the infrastructure
- name: Configure web servers
  hosts: web_servers
  become: true

  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
        update_cache: yes

    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Deploy application
      copy:
        src: "{{ app_files }}"
        dest: /var/www/html/

IaC Best Practices Overview

1. Version Control Everything

project/
├── infrastructure/          # Terraform code
│   ├── main.tf
│   ├── variables.tf
│   └── terraform.tfvars
├── configuration/           # Ansible playbooks
│   ├── site.yml
│   ├── inventory/
│   └── roles/
└── .github/workflows/       # CI/CD pipelines
    └── deploy.yml

2. Environment Separation

# Use workspaces or separate state files
terraform workspace select production
terraform workspace select staging
terraform workspace select development
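
An alternative (or complement) to workspaces is keeping one variable file and backend configuration per environment and selecting them explicitly; a sketch with hypothetical file names:

# Per-environment variable files
terraform plan  -var-file=environments/staging.tfvars
terraform apply -var-file=environments/production.tfvars

# Separate state per environment via partial backend configuration
terraform init -backend-config=environments/production.backend.hcl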

3. Modular Design

# Use modules for reusability
module "web_servers" {
  source = "./modules/compute"

  instance_count = var.web_server_count
  machine_type   = var.web_machine_type
  environment    = var.environment
}
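
The module block above assumes a ./modules/compute module that exposes those variables; a minimal sketch of what it might contain (layout and resource details are illustrative):

# modules/compute/variables.tf
variable "instance_count" { type = number }
variable "machine_type"   { type = string }
variable "environment"    { type = string }

# modules/compute/main.tf
resource "google_compute_instance" "this" {
  count        = var.instance_count
  name         = "${var.environment}-web-${count.index + 1}"
  machine_type = var.machine_type
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = "default"
  }
}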

4. Security and Secrets Management

# Never commit secrets to version control
echo "*.tfvars" >> .gitignore
echo "secrets/" >> .gitignore

# Use secure secret management
ansible-vault create secrets.yml
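
Once secrets live in an encrypted vault file, playbooks reference them like any other variables and the vault password is supplied at run time (paths are illustrative):

# Encrypt an existing variables file in place
ansible-vault encrypt group_vars/production/vault.yml

# Provide the vault password interactively...
ansible-playbook -i inventory site.yml --ask-vault-pass

# ...or from a file kept outside version control
ansible-playbook -i inventory site.yml --vault-password-file ~/.vault_pass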

5. Testing and Validation

# Validate before applying
terraform validate
terraform plan

# Test Ansible playbooks
ansible-playbook --check --diff site.yml
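
Lightweight static checks are also worth running in CI alongside the commands above (assuming terraform and ansible-lint are available on the runner):

# Fail the build if Terraform files are not canonically formatted
terraform fmt -check -recursive

# Lint playbooks for common mistakes and deprecated syntax
ansible-lint site.yml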

Chapter Structure and Learning Approach

This Infrastructure as Code section is organized into focused chapters that build upon each other:

Part A: Terraform - Infrastructure Provisioning

  • 10.1 Terraform Introduction: Understanding declarative infrastructure

  • 10.2 Terraform Core Concepts: Resources, variables, and state management

  • 10.3 Terraform Workflow & GCP: Hands-on Google Cloud integration

  • 10.4 Terraform Production Challenges: Real-world problems and solutions

  • 10.5 Terraform Practical Examples: 14 comprehensive GCP examples

Part B: Ansible - Configuration Management

  • 10.6 Ansible Introduction: Agentless configuration management

  • 10.7 Ansible Core Concepts: Playbooks, roles, and inventory

  • 10.8 Ansible Advanced Features: Templating, vaults, and orchestration

  • 10.9 Ansible Production Patterns: Best practices and real-world examples

Learning Methodology:

  1. Conceptual Understanding: Each chapter starts with theory and principles

  2. Hands-on Examples: Practical examples using Google Cloud Platform

  3. Production Readiness: Real-world challenges and enterprise patterns

  4. Best Practices: Security, maintainability, and team collaboration

Prerequisites for Success:

  • GCP Account: Free tier provides sufficient resources for all examples

  • Local Development Environment: VS Code, Git, and terminal access

  • Basic Cloud Knowledge: Understanding of VMs, networks, and databases

  • Command Line Comfort: Ability to run commands and navigate directories

Next Steps:

Begin with Chapter 10.1: Terraform Introduction to start your Infrastructure as Code journey. Each chapter includes:

  • Theoretical concepts with clear explanations

  • Step-by-step practical examples

  • Production-ready code templates

  • Troubleshooting guides and common pitfalls

  • Review questions and further reading

Note

Infrastructure as Code is a journey, not a destination. Start with simple examples and gradually build complexity as you become more comfortable with the tools and concepts.