Terraform Best Practices: Modules, State, and Workspaces
A comprehensive guide to organizing Terraform code with reusable modules, managing state safely, and using workspaces for multi-environment deployments.
Terraform is the industry standard for Infrastructure as Code, but poorly organized Terraform can become a nightmare. This guide covers the practices that keep your infrastructure maintainable, secure, and scalable.
Project Structure
A well-organized Terraform project separates concerns and promotes reusability. Here’s a battle-tested structure:
infrastructure/
├── modules/                  # Reusable modules
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── eks-cluster/
│   └── rds-instance/
├── environments/             # Environment-specific configs
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── prod/
├── global/                   # Shared resources (IAM, DNS)
│   ├── iam/
│   └── route53/
└── terraform.tf              # Provider versions
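The root `terraform.tf` pins the Terraform and provider versions. A minimal sketch (the exact version constraints here are assumptions, not prescriptions):

```hcl
# terraform.tf — pin Terraform and provider versions
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```

Pinning with `~>` allows patch and minor upgrades while blocking surprise major-version changes.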
Writing Reusable Modules
Modules are the building blocks of maintainable Terraform. A good module is:
- Single-purpose: Does one thing well
- Configurable: Uses variables for all environment-specific values
- Documented: Has clear inputs/outputs and a README
Module Example: VPC
# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for all VPC resources"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of AZs to use for subnets"
  type        = list(string)
}

variable "enable_nat_gateway" {
  description = "Create NAT gateways for private subnets"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use a single NAT gateway (cheaper, less HA)"
  type        = bool
  default     = false
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}
# modules/vpc/main.tf
locals {
  # Calculate subnet CIDRs automatically
  public_subnets  = [for i, az in var.availability_zones : cidrsubnet(var.cidr_block, 8, i)]
  private_subnets = [for i, az in var.availability_zones : cidrsubnet(var.cidr_block, 8, i + 100)]

  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
}

resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name = "${var.name}-vpc"
  })
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = merge(var.tags, {
    Name = "${var.name}-igw"
  })
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = local.public_subnets[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.name}-public-${var.availability_zones[count.index]}"
    Tier = "public"
  })
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = local.private_subnets[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(var.tags, {
    Name = "${var.name}-private-${var.availability_zones[count.index]}"
    Tier = "private"
  })
}

resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"

  tags = merge(var.tags, {
    Name = "${var.name}-nat-eip-${count.index}"
  })
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(var.tags, {
    Name = "${var.name}-nat-${count.index}"
  })

  depends_on = [aws_internet_gateway.main]
}

# Route tables omitted for brevity...
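For completeness, the omitted route tables would typically look something like this — a sketch following the module's conventions, assuming `enable_nat_gateway = true` (with it disabled, the private default route would need to be made conditional):

```hcl
# Illustrative route tables for the VPC module above (a sketch)
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = merge(var.tags, { Name = "${var.name}-public-rt" })
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# One private route table per AZ; with single_nat_gateway, all point at NAT 0
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
  }

  tags = merge(var.tags, { Name = "${var.name}-private-rt-${count.index}" })
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```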
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "nat_gateway_ips" {
  description = "Public IPs of the NAT gateways"
  value       = aws_eip.nat[*].public_ip
}
Using the Module
# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  name               = "prod"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  enable_nat_gateway = true
  single_nat_gateway = false # High availability in prod

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
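When modules live in their own repository rather than alongside the environments, pin a version in the `source` so environments upgrade deliberately. A sketch with a hypothetical Git URL and tag:

```hcl
module "vpc" {
  # Generic Git source: repo URL, // subdirectory, ?ref= tag or commit
  source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=v1.4.0"

  # ... same inputs as the relative-path example above
}
```

Relative paths are fine within one repo; versioned sources matter once multiple teams or repos consume the module.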
State Management
Terraform state is the source of truth for your infrastructure. Mismanaging it leads to drift, conflicts, and data loss.
Remote State with S3 and DynamoDB
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
# Setting up the state bucket (run once, then import)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
State File Best Practices
- Never store state locally in production — always use remote backends
- Enable versioning on your S3 bucket for disaster recovery
- Use state locking to prevent concurrent modifications
- Separate state files by environment and component
- Never commit state files to version control (they contain secrets)
State File Organization
terraform-state/
├── global/
│   ├── iam/terraform.tfstate
│   └── route53/terraform.tfstate
├── prod/
│   ├── vpc/terraform.tfstate
│   ├── eks/terraform.tfstate
│   └── rds/terraform.tfstate
├── staging/
└── dev/
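With state split by component like this, one stack can consume another's outputs through the `terraform_remote_state` data source. A sketch reusing the bucket, key, and output names from the examples above:

```hcl
# environments/prod/eks/main.tf — read the VPC stack's outputs
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# e.g. place worker nodes in the VPC stack's private subnets:
# subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
```

This keeps the VPC and EKS stacks independently plannable while still sharing a single source of truth for subnet IDs.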
Workspaces vs. Directories
Terraform workspaces and separate directories are two approaches to managing multiple environments.
When to Use Workspaces
Workspaces work well when environments are nearly identical:
# main.tf with workspaces
locals {
  env_config = {
    dev = {
      instance_type = "t3.small"
      min_nodes     = 1
      max_nodes     = 3
    }
    staging = {
      instance_type = "t3.medium"
      min_nodes     = 2
      max_nodes     = 5
    }
    prod = {
      instance_type = "t3.large"
      min_nodes     = 3
      max_nodes     = 10
    }
  }

  config = local.env_config[terraform.workspace]
}

resource "aws_eks_node_group" "main" {
  # ...
  instance_types = [local.config.instance_type]

  scaling_config {
    min_size = local.config.min_nodes
    max_size = local.config.max_nodes
  }
}
# Using workspaces
terraform workspace new staging
terraform workspace select staging
terraform apply
When to Use Separate Directories
Use directories when environments have significant differences or you need:
- Different providers or provider versions
- Different modules for each environment
- Independent state files (recommended for production)
environments/
├── dev/
│   ├── main.tf          # Simpler setup
│   └── backend.tf
├── staging/
│   ├── main.tf          # Staging-specific resources
│   └── backend.tf
└── prod/
    ├── main.tf          # Full HA setup
    ├── backend.tf
    └── dr.tf            # Disaster recovery resources
Variable Management
Use .tfvars for Environment Configuration
# environments/prod/terraform.tfvars
environment        = "prod"
region             = "us-east-1"
vpc_cidr           = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

eks_cluster_version     = "1.28"
eks_node_instance_types = ["t3.large", "t3.xlarge"]

rds_instance_class      = "db.r6g.large"
rds_multi_az            = true
rds_deletion_protection = true
Sensitive Variables
Never put secrets in .tfvars files. Use:
# variables.tf
variable "database_password" {
  description = "RDS master password"
  type        = string
  sensitive   = true
}

# Pass via environment variable
export TF_VAR_database_password="$(aws secretsmanager get-secret-value --secret-id prod/rds/password --query SecretString --output text)"
terraform apply
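Alternatively, read the secret directly with a data source. Note the trade-off: the value still ends up in Terraform state, so the encrypted, access-controlled state bucket above is a prerequisite. A sketch reusing the secret ID from the example:

```hcl
# Fetch the current secret value at plan time
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/rds/password"
}

resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```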
Resource Naming and Tagging
Consistent naming and tagging are crucial for cost tracking and resource management:
# locals.tf
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Owner       = var.team_name
    CostCenter  = var.cost_center
  }

  name_prefix = "${var.project_name}-${var.environment}"
}

# Usage
resource "aws_instance" "app" {
  # ...
  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-app-server"
    Role = "application"
  })
}
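The AWS provider can also apply a tag set to every resource it creates via `default_tags`, which removes the need to merge `common_tags` into each resource by hand (per-resource `tags` still win on conflicts):

```hcl
provider "aws" {
  region = var.region

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = local.common_tags
  }
}
```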
Lifecycle Rules
Protect critical resources from accidental deletion:
resource "aws_db_instance" "main" {
  # ...
  deletion_protection = true

  lifecycle {
    prevent_destroy = true

    # Ignore changes made outside Terraform
    ignore_changes = [
      password, # Rotated externally
    ]
  }
}
Data Sources for Existing Resources
Use data sources to reference resources managed outside Terraform:
# Reference existing VPC
data "aws_vpc" "existing" {
  tags = {
    Name = "legacy-vpc"
  }
}

# Look up the subnets in that VPC
data "aws_subnets" "existing" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.existing.id]
  }
}

# Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Use in resources
resource "aws_instance" "app" {
  ami       = data.aws_ami.amazon_linux.id
  subnet_id = data.aws_subnets.existing.ids[0]
  # ...
}
Validation and Formatting
Always run these before committing:
# Format code
terraform fmt -recursive
# Validate syntax
terraform validate
# Preview changes
terraform plan -out=plan.tfplan
# Apply from plan file (recommended in CI/CD)
terraform apply plan.tfplan
Key Takeaways
- Modularize everything — if you copy-paste, make it a module
- Remote state is mandatory — local state doesn’t scale
- Separate environments — use directories for production isolation
- Tag everything — your future self (and finance team) will thank you
- Use prevent_destroy on databases, S3 buckets, and anything stateful
- Plan before apply — always review what Terraform will do
“Infrastructure as Code is not about writing code. It’s about making infrastructure reproducible, testable, and versionable.” — Kief Morris