Terraform is the industry standard for Infrastructure as Code, but a poorly organized Terraform codebase quickly becomes a maintenance nightmare. This guide covers the practices that keep your infrastructure maintainable, secure, and scalable.

Project Structure

A well-organized Terraform project separates concerns and promotes reusability. Here’s a battle-tested structure:

infrastructure/
├── modules/                    # Reusable modules
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── eks-cluster/
│   └── rds-instance/
├── environments/               # Environment-specific configs
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── prod/
├── global/                     # Shared resources (IAM, DNS)
│   ├── iam/
│   └── route53/
└── terraform.tf                # Provider versions

Writing Reusable Modules

Modules are the building blocks of maintainable Terraform. A good module is:

  • Single-purpose: Does one thing well
  • Configurable: Uses variables for all environment-specific values
  • Documented: Has clear inputs/outputs and a README

Module Example: VPC

# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for all VPC resources"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of AZs to use for subnets"
  type        = list(string)
}

variable "enable_nat_gateway" {
  description = "Create NAT gateways for private subnets"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use a single NAT gateway (cheaper, less HA)"
  type        = bool
  default     = false
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}
# modules/vpc/main.tf
locals {
  # Calculate subnet CIDRs automatically
  public_subnets  = [for i, az in var.availability_zones : cidrsubnet(var.cidr_block, 8, i)]
  private_subnets = [for i, az in var.availability_zones : cidrsubnet(var.cidr_block, 8, i + 100)]
  
  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
}

resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name = "${var.name}-vpc"
  })
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = merge(var.tags, {
    Name = "${var.name}-igw"
  })
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = local.public_subnets[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.name}-public-${var.availability_zones[count.index]}"
    Tier = "public"
  })
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = local.private_subnets[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(var.tags, {
    Name = "${var.name}-private-${var.availability_zones[count.index]}"
    Tier = "private"
  })
}

resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"

  tags = merge(var.tags, {
    Name = "${var.name}-nat-eip-${count.index}"
  })
}

resource "aws_nat_gateway" "main" {
  count = local.nat_gateway_count

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(var.tags, {
    Name = "${var.name}-nat-${count.index}"
  })

  depends_on = [aws_internet_gateway.main]
}

# Route tables omitted for brevity...
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "nat_gateway_ips" {
  description = "Public IPs of the NAT gateways"
  value       = aws_eip.nat[*].public_ip
}
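
Before applying, the cidrsubnet() arithmetic in the locals block is easy to verify interactively with terraform console:

```shell
$ terraform console
> cidrsubnet("10.0.0.0/16", 8, 0)
"10.0.0.0/24"
> cidrsubnet("10.0.0.0/16", 8, 100)
"10.0.100.0/24"
```

Adding 8 bits to a /16 yields /24 subnets, and the netnum fills those new bits, so public subnets occupy the third octets 0-2 while the +100 offset keeps private subnets well clear of them.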

Using the Module

# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  name               = "prod"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  enable_nat_gateway = true
  single_nat_gateway = false  # High availability in prod

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

State Management

Terraform state is the source of truth for your infrastructure. Mismanaging it leads to drift, conflicts, and data loss.

Remote State with S3 and DynamoDB

# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
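
One gotcha: the backend block cannot reference variables, so values like the bucket and key are hard-coded per environment. A common workaround is partial configuration, where the .tf file declares only an empty backend "s3" {} and the values are supplied at init time (shown here with the same values as the block above):

```shell
terraform init \
  -backend-config="bucket=mycompany-terraform-state" \
  -backend-config="key=prod/vpc/terraform.tfstate" \
  -backend-config="region=us-east-1"
```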
# Setting up the state bucket (run once, then import)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
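
As the comment above says, these bootstrap resources are created once (e.g. with a temporary local backend) and then imported so later plans don't try to recreate them. The resource addresses match the definitions above; the import IDs are simply the bucket and table names:

```shell
terraform import aws_s3_bucket.terraform_state mycompany-terraform-state
terraform import aws_dynamodb_table.terraform_locks terraform-locks
```

The versioning, encryption, and public-access-block resources also import by bucket name, or can be left for the next apply to reconcile.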

State File Best Practices

  1. Never store state locally in production — always use remote backends
  2. Enable versioning on your S3 bucket for disaster recovery
  3. Use state locking to prevent concurrent modifications
  4. Separate state files by environment and component
  5. Never commit state files to version control (they contain secrets)
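
Point 5 is easy to enforce with a .gitignore at the repository root (a minimal sketch; extend as needed):

```
# .gitignore
*.tfstate
*.tfstate.*
.terraform/
crash.log
```

Committed terraform.tfvars files are fine for non-secret environment configuration; just keep actual secrets out of them.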

State File Organization

terraform-state/
├── global/
│   ├── iam/terraform.tfstate
│   └── route53/terraform.tfstate
├── prod/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
└── dev/

Workspaces vs. Directories

Terraform workspaces and separate directories are two approaches to managing multiple environments.

When to Use Workspaces

Workspaces work well when environments are nearly identical:

# main.tf with workspaces
locals {
  env_config = {
    dev = {
      instance_type = "t3.small"
      min_nodes     = 1
      max_nodes     = 3
    }
    staging = {
      instance_type = "t3.medium"
      min_nodes     = 2
      max_nodes     = 5
    }
    prod = {
      instance_type = "t3.large"
      min_nodes     = 3
      max_nodes     = 10
    }
  }

  config = local.env_config[terraform.workspace]
}

resource "aws_eks_node_group" "main" {
  # ...
  instance_types = [local.config.instance_type]

  scaling_config {
    min_size = local.config.min_nodes
    max_size = local.config.max_nodes
  }
}
# Using workspaces
terraform workspace new staging
terraform workspace select staging
terraform apply

When to Use Separate Directories

Use directories when environments have significant differences or you need:

  • Different providers or provider versions
  • Different modules for each environment
  • Independent state files (recommended for production)

environments/
├── dev/
│   ├── main.tf          # Simpler setup
│   └── backend.tf
├── staging/
│   ├── main.tf          # Staging-specific resources
│   └── backend.tf
└── prod/
    ├── main.tf          # Full HA setup
    ├── backend.tf
    └── dr.tf            # Disaster recovery resources

Variable Management

Use .tfvars for Environment Configuration

# environments/prod/terraform.tfvars
environment = "prod"
region      = "us-east-1"

vpc_cidr           = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

eks_cluster_version     = "1.28"
eks_node_instance_types = ["t3.large", "t3.xlarge"]

rds_instance_class      = "db.r6g.large"
rds_multi_az            = true
rds_deletion_protection = true
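
Pair .tfvars values with validation blocks so bad inputs fail at plan time rather than at apply. A small sketch for the environment variable (the allowed values are an assumption):

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```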

Sensitive Variables

Never put secrets in .tfvars files; they are usually committed to version control. Instead, mark the variable as sensitive and inject the value at runtime:

# variables.tf
variable "database_password" {
  description = "RDS master password"
  type        = string
  sensitive   = true
}
# Pass via environment variable
export TF_VAR_database_password="$(aws secretsmanager get-secret-value --secret-id prod/rds/password --query SecretString --output text)"
terraform apply

Resource Naming and Tagging

Consistent naming and tagging are crucial for cost tracking, ownership, and resource management:

# locals.tf
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_name
    CostCenter  = var.cost_center
  }

  name_prefix = "${var.project_name}-${var.environment}"
}

# Usage
resource "aws_instance" "app" {
  # ...
  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-app-server"
    Role = "application"
  })
}
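
If everything runs through a single AWS provider, the provider's default_tags block can apply the common set automatically, leaving merge() only for per-resource additions like Name (a sketch; the region variable is an assumption):

```hcl
provider "aws" {
  region = var.region

  default_tags {
    tags = local.common_tags
  }
}
```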

Lifecycle Rules

Protect critical resources from accidental deletion:

resource "aws_db_instance" "main" {
  # ...
  deletion_protection = true

  lifecycle {
    prevent_destroy = true
    
    # Ignore changes made outside Terraform
    ignore_changes = [
      password,  # Rotated externally
    ]
  }
}

Data Sources for Existing Resources

Use data sources to reference resources managed outside Terraform:

# Reference existing VPC
data "aws_vpc" "existing" {
  tags = {
    Name = "legacy-vpc"
  }
}

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Look up subnets in the existing VPC (a VPC ID is not a valid subnet_id)
data "aws_subnets" "existing" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.existing.id]
  }
}

# Use in resources
resource "aws_instance" "app" {
  ami       = data.aws_ami.amazon_linux.id
  subnet_id = data.aws_subnets.existing.ids[0]
  # ...
}

Validation and Formatting

Always run these before committing:

# Format code
terraform fmt -recursive

# Validate syntax
terraform validate

# Preview changes
terraform plan -out=plan.tfplan

# Apply from plan file (recommended in CI/CD)
terraform apply plan.tfplan
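
In CI, the same sequence typically runs on every pull request. A minimal GitHub Actions sketch (workflow name and trigger are illustrative; hashicorp/setup-terraform is the official action):

```yaml
# .github/workflows/terraform.yml
name: terraform
on: [pull_request]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -input=false
      - run: terraform validate
      - run: terraform plan -input=false -out=plan.tfplan
```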

Key Takeaways

  1. Modularize everything — if you copy-paste, make it a module
  2. Remote state is mandatory — local state doesn’t scale
  3. Separate environments — use directories for production isolation
  4. Tag everything — your future self (and finance team) will thank you
  5. Use prevent_destroy on databases, S3 buckets, and anything stateful
  6. Plan before apply — always review what Terraform will do

“Infrastructure as Code is not about writing code. It’s about making infrastructure reproducible, testable, and versionable.” — Kief Morris