10. Infrastructure as Code - Introduction

Learning Objectives

By the end of this chapter, you will be able to:

  • Define Infrastructure as Code (IaC) principles and understand its business value

  • Compare different IaC approaches: declarative vs imperative, provisioning vs configuration

  • Identify when to use Terraform vs Ansible and how they complement each other

  • Design infrastructure workflows that combine provisioning and configuration management

  • Apply IaC best practices for version control, testing, and collaboration

  • Implement security and compliance strategies in infrastructure code

  • Understand the modern DevOps toolchain and where IaC fits

Prerequisites: Basic understanding of cloud computing concepts, version control (Git), and command-line operations.

Chapter Scope: This chapter provides a comprehensive foundation for Infrastructure as Code, covering both infrastructure provisioning (Terraform) and configuration management (Ansible) with hands-on examples using Google Cloud Platform.

What is Infrastructure as Code (IaC)?

Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools.

The Traditional Problem

Before IaC, infrastructure management was typically:

  • Manual: Clicking through web consoles or running ad-hoc commands

  • Error-prone: Human mistakes in configuration and deployment

  • Inconsistent: Different configurations across development, staging, and production

  • Undocumented: No clear record of how infrastructure was configured

  • Slow: Hours or days to provision new environments

  • Difficult to scale: Hard to replicate successful configurations

The IaC Solution

Infrastructure as Code addresses these challenges by treating infrastructure the same way we treat application code:

Traditional Approach:
Manual Console → Click, Click, Click → Infrastructure

IaC Approach:
Code → Version Control → Automated Deployment → Infrastructure

Core IaC Principles:

  1. Declarative Configuration

    Specify the desired end state, not the steps to achieve it:

    # Declare what you want
    resource "google_compute_instance" "web_server" {
      name         = "production-web"
      machine_type = "e2-standard-2"
      zone         = "us-central1-a"
    }
    
    # Tool figures out how to create it
    
  2. Version Control Integration

    Infrastructure definitions are stored in Git alongside application code:

    git log --oneline infrastructure/
    a1b2c3d Add load balancer for high availability
    e4f5g6h Update instance types for cost optimization
    h7i8j9k Initial infrastructure setup
    
  3. Automated Testing and Deployment

    Infrastructure changes go through the same CI/CD process as code:

    # .github/workflows/infrastructure.yml
    - name: Validate Infrastructure
      run: terraform validate
    
    - name: Plan Changes
      run: terraform plan
    
    - name: Apply on Approval
      run: terraform apply -auto-approve
    
  4. Idempotency

    Running the same configuration multiple times produces the same result:

    terraform apply  # Creates resources
    terraform apply  # No changes needed
    terraform apply  # Still no changes needed
    

Business Benefits of IaC:

| Business Value | Technical Implementation |
| --- | --- |
| Faster Time-to-Market | Automated provisioning reduces deployment time from days to minutes |
| Cost Optimization | Infrastructure as code enables right-sizing and automatic scaling |
| Risk Reduction | Consistent, tested configurations eliminate human error |
| Compliance | Auditable infrastructure changes with full history and approval process |
| Disaster Recovery | Infrastructure can be rebuilt quickly from code definitions |

IaC Tool Categories and Use Cases

Infrastructure as Code tools fall into several categories, each optimized for different tasks:

1. Infrastructure Provisioning Tools

These tools create and manage cloud resources (servers, networks, databases):

  • Terraform/OpenTofu: Multi-cloud, declarative, state-managed

  • SST/Pulumi: Multi-cloud, imperative, programming languages

  • AWS CloudFormation: AWS-specific, JSON/YAML

  • Google Cloud Deployment Manager: GCP-specific

  • Azure Resource Manager: Azure-specific

2. Configuration Management Tools

These tools configure software and settings on existing infrastructure:

  • Ansible: Agentless, push-based, YAML playbooks

  • Chef: Agent-based, Ruby DSL, pull-based

  • Puppet: Agent-based, declarative, pull-based

  • SaltStack: Agent-based, Python, event-driven
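
Ansible's agentless, push-based model, for example, means nothing has to be installed on managed hosts beyond Python and SSH access; a minimal ad-hoc sketch (hostnames are hypothetical):

# No agent on web1/web2 — Ansible connects over SSH and runs modules remotely
ansible all -i "web1.example.com,web2.example.com," -m ping

# Install a package on an inventory group, escalating privileges with --become
ansible web_servers -i inventory.ini --become -m apt -a "name=nginx state=present"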

3. Container Orchestration

These tools manage containerized applications:

  • Kubernetes: Container orchestration platform

  • Docker Compose: Local multi-container applications

  • Helm: Kubernetes package manager

4. Immutable Infrastructure Tools

These tools create complete, immutable system images:

  • Packer: Multi-platform image building

  • Docker: Container images

  • Vagrant: Development environment provisioning
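
To make the immutable-image idea concrete, a minimal Packer template for GCP might look roughly like the following (project ID, image name, and provisioning steps are illustrative, not a prescribed setup):

# build.pkr.hcl — bake Nginx into a reusable GCP image (illustrative)
source "googlecompute" "web_base" {
  project_id          = "my-gcp-project"   # hypothetical project
  source_image_family = "ubuntu-2204-lts"
  zone                = "us-central1-a"
  ssh_username        = "packer"
  image_name          = "web-base-v1"
}

build {
  sources = ["source.googlecompute.web_base"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y nginx",
    ]
  }
}

Every instance launched from the resulting image starts identical; changes are made by building a new image rather than mutating running servers.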

Infrastructure Provisioning: Terraform vs Pulumi

When choosing an infrastructure provisioning tool, Terraform and Pulumi are the two leading options. Here’s a comprehensive comparison:

Terraform: Declarative HCL-Based Provisioning

| Aspect | Terraform |
| --- | --- |
| Language | HashiCorp Configuration Language (HCL) |
| Approach | Declarative configuration files |
| State Management | Built-in state tracking (.tfstate) |
| Provider Ecosystem | 3000+ providers, mature ecosystem |
| Learning Curve | Moderate, domain-specific language |
| Team Adoption | Easy for ops teams, new syntax |
| Version Control | Native support, diff-friendly |
| Testing | Limited, requires external tools |

Pulumi: General-Purpose Programming Languages

| Aspect | Pulumi |
| --- | --- |
| Language | Python, TypeScript, Go, C#, Java |
| Approach | Imperative programming with SDK |
| State Management | Pulumi Cloud or self-managed backend |
| Provider Ecosystem | Growing, leverages Terraform providers |
| Learning Curve | Easy for developers, familiar syntax |
| Team Adoption | Great for dev teams, programming model |
| Version Control | Standard programming practices |
| Testing | Full unit testing capabilities |

Detailed Feature Comparison:

Configuration Syntax:

# Terraform HCL
resource "google_compute_instance" "web_server" {
  count        = var.instance_count
  name         = "web-server-${count.index + 1}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id
    access_config {}
  }
}

# Pulumi Python
import pulumi_gcp as gcp

web_server = gcp.compute.Instance("web-server",
    machine_type="e2-standard-2",
    zone="us-central1-a",
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            image="ubuntu-os-cloud/ubuntu-2204-lts"
        )
    ),
    network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
        network=main_network.id,
        access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()]
    )]
)

When to Choose Terraform:

  • Operations-focused teams comfortable with configuration languages

  • Multi-cloud environments requiring extensive provider support

  • Established workflows with existing Terraform expertise

  • Regulatory environments requiring mature, battle-tested solutions

  • Simple to moderate complexity infrastructure requirements

  • Strong community ecosystem needs (modules, examples)

When to Choose Pulumi:

  • Developer-heavy teams familiar with modern programming languages

  • Complex infrastructure logic requiring loops, conditionals, functions

  • Testing requirements needing unit tests for infrastructure code

  • Dynamic infrastructure based on external data sources or APIs

  • Integration needs with existing CI/CD and development workflows

  • Type safety requirements and IDE support (IntelliSense, debugging)
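
To illustrate the testing point above: because Pulumi programs are ordinary code, the resources they declare can be unit-tested with mocked cloud calls. A minimal sketch, assuming the earlier web_server definition lives in a hypothetical module named infra:

# test_infra.py — unit test with Pulumi's mocking support (illustrative)
import pulumi

class Mocks(pulumi.runtime.Mocks):
    def new_resource(self, args):
        # Pretend the resource was created; echo its inputs back as outputs
        return [args.name + "_id", args.inputs]

    def call(self, args):
        return {}

pulumi.runtime.set_mocks(Mocks())

import infra  # hypothetical module containing the web_server definition


@pulumi.runtime.test
def test_machine_type():
    def check(machine_type):
        assert machine_type == "e2-standard-2"

    return infra.web_server.machine_type.apply(check)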

Real-World Example: Dynamic Infrastructure

Terraform approach (limited dynamic capabilities):

# Terraform - requires external data sources
data "external" "regions" {
  program = ["python", "get-regions.py"]
}

resource "google_compute_instance" "regional_servers" {
  for_each = toset(jsondecode(data.external.regions.result.regions))

  name         = "server-${each.key}"
  zone         = "${each.key}-a"
  machine_type = "e2-micro"
}

Pulumi approach (native programming logic):

# Pulumi - native Python logic
import requests
import pulumi_gcp as gcp

# Fetch regions dynamically
response = requests.get("https://api.example.com/regions")
regions = response.json()["regions"]

# Create instances with complex logic
servers = []
for region in regions:
    if region["capacity"] > 100:  # Complex conditional logic
        instance_type = "e2-standard-2"
    else:
        instance_type = "e2-micro"

    server = gcp.compute.Instance(f"server-{region['name']}",
        machine_type=instance_type,
        zone=f"{region['name']}-a",
        # Complex configuration based on region data
        metadata={
            "region-tier": region["tier"],
            "compliance-zone": region["compliance_requirements"]
        }
    )
    servers.append(server)

Migration Considerations:

| Migration Path | Considerations |
| --- | --- |
| Terraform → Pulumi | Import existing state; rewrite configurations in code; train the team on the programming model |
| Pulumi → Terraform | Export state to Terraform format; convert code to HCL; simplify complex logic |

Configuration Management: Ansible vs Puppet

For configuration management, Ansible and Puppet represent different philosophies and operational models:

Ansible: Agentless Push-Based Management

| Aspect | Ansible |
| --- | --- |
| Architecture | Agentless, SSH-based communication |
| Execution Model | Push-based from control node |
| Language | YAML playbooks, Python modules |
| Learning Curve | Easy, human-readable YAML |
| State Management | Stateless, no central database |
| Deployment | Simple, no agent installation |
| Scalability | Good for small-to-medium environments |
| Real-time Enforcement | Manual or scheduled execution |

Puppet: Agent-Based Pull-Based Management

| Aspect | Puppet |
| --- | --- |
| Architecture | Agent-based with central Puppet Master |
| Execution Model | Pull-based, agents check for changes |
| Language | Puppet DSL (domain-specific language) |
| Learning Curve | Steep, requires learning Puppet DSL |
| State Management | Central PuppetDB with full state info |
| Deployment | Complex, requires agent on every node |
| Scalability | Excellent for large environments |
| Real-time Enforcement | Continuous, automatic drift correction |

Architectural Differences:

Ansible Architecture (Push Model):

┌─────────────────┐     SSH/WinRM     ┌─────────────────┐
│  Control Node   │ ───────────────►  │   Managed Node  │
│                 │                   │                 │
│ • Playbooks     │ ───────────────►  │ • Python        │
│ • Inventory     │     Execute       │ • Target Apps   │
│ • Vault         │     Tasks         │ • No Agent      │
└─────────────────┘                   └─────────────────┘

Puppet Architecture (Pull Model):

┌─────────────────┐                   ┌─────────────────┐
│  Puppet Master  │ ◄───────────────  │   Managed Node  │
│                 │   Request         │                 │
│ • Manifests     │   Catalog         │ • Puppet Agent  │
│ • Modules       │ ────────────────► │ • Facter        │
│ • PuppetDB      │   Send Catalog    │ • Target Apps   │
└─────────────────┘                   └─────────────────┘

Configuration Syntax Comparison:

Ansible YAML Playbook:

---
- name: Configure web server
  hosts: web_servers
  become: true

  vars:
    nginx_port: 80
    app_name: "my-web-app"

  tasks:
    - name: Install Nginx
      package:
        name: nginx
        state: present

    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Start and enable Nginx
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted

Puppet Manifest:

# manifests/webserver.pp
class webserver (
  $nginx_port = 80,
  $app_name = 'my-web-app',
) {

  package { 'nginx':
    ensure => present,
  }

  file { '/etc/nginx/nginx.conf':
    ensure  => file,
    content => template('webserver/nginx.conf.erb'),
    require => Package['nginx'],
    notify  => Service['nginx'],
  }

  service { 'nginx':
    ensure    => running,
    enable    => true,
    require   => Package['nginx'],
    subscribe => File['/etc/nginx/nginx.conf'],
  }
}

node 'web-server-01.example.com' {
  include webserver
}

Operational Model Comparison:

Ansible Execution Flow:

# Manual execution from control node
ansible-playbook -i inventory playbook.yml

# Scheduled execution (cron)
0 2 * * * /usr/bin/ansible-playbook -i /etc/ansible/inventory /etc/ansible/maintenance.yml

# Event-driven execution (CI/CD triggered)
# Runs when code changes or infrastructure events occur

Puppet Execution Flow:

# Puppet agents contact the master automatically (default: every 30 minutes)
puppet agent --test   # trigger an immediate run manually when testing

# Continuous enforcement
# Agents continuously ensure desired state
# Automatic drift correction without human intervention

When to Choose Ansible:

  • Simple to moderate environments (< 1000 nodes)

  • Agentless architecture preferred (security, simplicity)

  • Ad-hoc task execution and orchestration workflows

  • Developer-friendly teams comfortable with YAML

  • Integration with existing SSH infrastructure

  • Event-driven automation (CI/CD pipelines)

  • Multi-platform environments with diverse systems

  • Quick setup and deployment requirements

When to Choose Puppet:

  • Large-scale environments (1000+ nodes)

  • Continuous compliance and drift correction needs

  • Enterprise governance and reporting requirements

  • Dedicated operations teams with configuration management expertise

  • Complex dependency management and ordering requirements

  • Centralized policy enforcement and auditing

  • High availability and disaster recovery needs

  • Long-term infrastructure lifecycle management

Hybrid Approaches:

Many organizations use both tools in complementary ways:

Common Hybrid Pattern:

1. Terraform provisions infrastructure
2. Ansible performs initial configuration and application deployment
3. Puppet maintains ongoing configuration compliance
4. Ansible handles application updates and orchestration
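
In practice the provisioning and configuration steps are often chained in a single pipeline; a simplified GitHub Actions sketch (job layout, paths, and credential handling are illustrative and omitted for brevity):

# .github/workflows/provision-and-configure.yml (illustrative)
name: Provision and configure
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes Terraform, Ansible, and cloud credentials are already set up
      - name: Provision infrastructure
        working-directory: infrastructure
        run: |
          terraform init
          terraform apply -auto-approve
      - name: Configure and deploy
        working-directory: configuration
        run: ansible-playbook -i gcp_inventory.yml site.yml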

Migration Strategies:

| Migration Path | Strategy |
| --- | --- |
| Puppet → Ansible | Start with new projects in Ansible; gradually convert existing manifests; maintain a hybrid setup during the transition |
| Ansible → Puppet | Implement Puppet for new compliance requirements; keep Ansible for orchestration; focus Puppet on state enforcement |

Decision Matrix:

| Requirement | Ansible | Puppet |
| --- | --- | --- |
| Quick Setup | Excellent | Complex |
| Large Scale (5000+) | Challenging | Excellent |
| Continuous Compliance | Manual | Automatic |
| Learning Curve | Easy | Steep |
| Agent Requirements | Agentless | Agent required |
| Orchestration | Excellent | Limited |
| Reporting/Auditing | Basic | Comprehensive |

The Modern IaC Stack

In practice, organizations use multiple tools together:

Layer 4: Applications
├── Kubernetes (Container Orchestration)
├── Helm (Package Management)

Layer 3: Configuration Management
├── Ansible (Software Installation & Configuration)
├── Docker (Application Packaging)

Layer 2: Infrastructure Provisioning
├── Terraform (Cloud Resources)
├── Packer (Machine Images)

Layer 1: Foundation
├── Git (Version Control)
├── CI/CD (Automated Deployment)

Terraform vs Ansible: When to Use Each

Understanding when to use Terraform versus Ansible is crucial for building effective IaC workflows:

Terraform: Infrastructure Provisioning

| What Terraform Does | Example Use Cases |
| --- | --- |
| Creates cloud resources | GCP Compute Engine instances; VPC networks and subnets; Cloud SQL databases; load balancers and firewalls |
| Manages resource lifecycle | Scaling instance groups up/down; updating firewall rules; destroying unused resources |
| Handles dependencies | Ensures the network exists before VMs; creates the database before app servers (see the sketch below) |
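
For instance, Terraform infers ordering from references between resources, so the network below is always created before the instance that uses it; a minimal sketch:

# The instance references the network, creating an implicit dependency
resource "google_compute_network" "main" {
  name                    = "main-network"
  auto_create_subnetworks = true
}

resource "google_compute_instance" "app" {
  name         = "app-server"
  machine_type = "e2-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id  # network is created first
  }
}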

Ansible: Configuration Management

| What Ansible Does | Example Use Cases |
| --- | --- |
| Configures existing infrastructure | Installing Nginx on web servers; configuring SSL certificates; setting up monitoring agents |
| Deploys applications | Deploying web applications; updating application configurations; rolling application updates |
| Orchestrates procedures | Multi-step deployment workflows; backup and maintenance tasks; emergency response procedures |

Typical Workflow: Terraform + Ansible

# Step 1: Provision infrastructure with Terraform
cd infrastructure/
terraform init
terraform plan
terraform apply

# Step 2: Configure the infrastructure with Ansible
cd ../configuration/
ansible-playbook -i gcp_inventory.yml site.yml

# Step 3: Deploy applications
ansible-playbook -i gcp_inventory.yml deploy.yml
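
The gcp_inventory.yml used above is typically a dynamic inventory rather than a static host list; one possible shape using the google.cloud.gcp_compute inventory plugin (project, zone, and grouping key are illustrative, and the plugin conventionally expects a filename ending in .gcp.yml or .gcp_compute.yml):

# Dynamic inventory: discover GCE instances instead of listing hosts by hand
plugin: google.cloud.gcp_compute
projects:
  - my-gcp-project            # hypothetical project ID
zones:
  - us-central1-a
auth_kind: application        # use Application Default Credentials
keyed_groups:
  - key: labels.role          # instances labelled role=web_servers join that group
    prefix: ""
    separator: ""
compose:
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP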

Example: Web Application Deployment

Terraform handles the “WHAT” (what infrastructure exists):

# Create the infrastructure
resource "google_compute_instance" "web_servers" {
  count        = 3
  name         = "web-server-${count.index + 1}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = google_compute_network.main.id
    access_config {}
  }
}

Ansible handles the “HOW” (how software is configured):

# Configure the infrastructure
- name: Configure web servers
  hosts: web_servers
  become: true

  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
        update_cache: yes

    - name: Configure Nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

    - name: Deploy application
      copy:
        src: "{{ app_files }}"
        dest: /var/www/html/

IaC Best Practices Overview

1. Version Control Everything

project/
├── infrastructure/          # Terraform code
│   ├── main.tf
│   ├── variables.tf
│   └── terraform.tfvars
├── configuration/           # Ansible playbooks
│   ├── site.yml
│   ├── inventory/
│   └── roles/
└── .github/workflows/       # CI/CD pipelines
    └── deploy.yml

2. Environment Separation

# Use workspaces or separate state files
terraform workspace select production
terraform workspace select staging
terraform workspace select development
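
An alternative (or complement) to workspaces is keeping one variable file and backend configuration per environment and selecting them explicitly; a sketch with hypothetical file names:

# Per-environment variable files
terraform plan  -var-file=environments/staging.tfvars
terraform apply -var-file=environments/production.tfvars

# Separate state per environment via partial backend configuration
terraform init -backend-config=environments/production.backend.hcl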

3. Modular Design

# Use modules for reusability
module "web_servers" {
  source = "./modules/compute"

  instance_count = var.web_server_count
  machine_type   = var.web_machine_type
  environment    = var.environment
}
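
The module block above assumes a ./modules/compute module that exposes those variables; a minimal sketch of what it might contain (layout and resource details are illustrative):

# modules/compute/variables.tf
variable "instance_count" { type = number }
variable "machine_type"   { type = string }
variable "environment"    { type = string }

# modules/compute/main.tf
resource "google_compute_instance" "this" {
  count        = var.instance_count
  name         = "${var.environment}-web-${count.index + 1}"
  machine_type = var.machine_type
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = "default"
  }
}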

4. Security and Secrets Management

# Never commit secrets to version control
echo "*.tfvars" >> .gitignore
echo "secrets/" >> .gitignore

# Use secure secret management
ansible-vault create secrets.yml
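
Once secrets live in an encrypted vault file, playbooks reference them like any other variables and the vault password is supplied at run time (paths are illustrative):

# Encrypt an existing variables file in place
ansible-vault encrypt group_vars/production/vault.yml

# Provide the vault password interactively...
ansible-playbook -i inventory site.yml --ask-vault-pass

# ...or from a file kept outside version control
ansible-playbook -i inventory site.yml --vault-password-file ~/.vault_pass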

5. Testing and Validation

# Validate before applying
terraform validate
terraform plan

# Test Ansible playbooks
ansible-playbook --check --diff site.yml
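
Lightweight static checks are also worth running in CI alongside the commands above (assuming terraform and ansible-lint are available on the runner):

# Fail the build if Terraform files are not canonically formatted
terraform fmt -check -recursive

# Lint playbooks for common mistakes and deprecated syntax
ansible-lint site.yml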

Chapter Structure and Learning Approach

This Infrastructure as Code section is organized into focused chapters that build upon each other:

Part A: Terraform - Infrastructure Provisioning

  • 10.1 Terraform Introduction: Understanding declarative infrastructure

  • 10.2 Terraform Core Concepts: Resources, variables, and state management

  • 10.3 Terraform Workflow & GCP: Hands-on Google Cloud integration

  • 10.4 Terraform Production Challenges: Real-world problems and solutions

  • 10.5 Terraform Practical Examples: 14 comprehensive GCP examples

Part B: Ansible - Configuration Management

  • 10.6 Ansible Introduction: Agentless configuration management

  • 10.7 Ansible Core Concepts: Playbooks, roles, and inventory

  • 10.8 Ansible Advanced Features: Templating, vaults, and orchestration

  • 10.9 Ansible Production Patterns: Best practices and real-world examples

Learning Methodology:

  1. Conceptual Understanding: Each chapter starts with theory and principles

  2. Hands-on Examples: Practical examples using Google Cloud Platform

  3. Production Readiness: Real-world challenges and enterprise patterns

  4. Best Practices: Security, maintainability, and team collaboration

Prerequisites for Success:

  • GCP Account: Free tier provides sufficient resources for all examples

  • Local Development Environment: VS Code, Git, and terminal access

  • Basic Cloud Knowledge: Understanding of VMs, networks, and databases

  • Command Line Comfort: Ability to run commands and navigate directories

Next Steps:

Begin with Chapter 10.1: Terraform Introduction to start your Infrastructure as Code journey. Each chapter includes:

  • Theoretical concepts with clear explanations

  • Step-by-step practical examples

  • Production-ready code templates

  • Troubleshooting guides and common pitfalls

  • Review questions and further reading

Note

Infrastructure as Code is a journey, not a destination. Start with simple examples and gradually build complexity as you become more comfortable with the tools and concepts.