AWS VPC Design: Multi-AZ and Multi-Region Patterns
Design production-ready AWS VPCs with proper subnet layouts, high availability across AZs, and multi-region architectures for disaster recovery.
A poorly designed VPC becomes a nightmare to change later: subnets that are too small, no room for new AZs, and security groups that are impossible to audit. This guide covers VPC design patterns that scale.
VPC Fundamentals
A VPC is your isolated network in AWS. Key components:
| Component | Purpose |
|---|---|
| VPC | Virtual network (e.g. 10.0.0.0/16) |
| Subnet | Network segment within an AZ |
| Route Table | Controls traffic routing |
| Internet Gateway | Connects VPC to internet |
| NAT Gateway | Outbound internet for private subnets |
| Security Group | Stateful instance-level firewall |
| NACL | Stateless subnet-level firewall |
CIDR Planning
Size Your VPC Appropriately
/16 = 65,536 IPs (recommended for production)
/20 = 4,096 IPs (small applications)
/24 = 256 IPs (too small for most use cases)
# AWS reserves 5 addresses in every subnet, so a /24 subnet has 251 usable IPs
Reserve IP Ranges
10.0.0.0/8 - Private range (large)
172.16.0.0/12 - Private range (medium)
192.168.0.0/16 - Private range (small)
100.64.0.0/10 - Carrier-grade NAT (avoid in VPCs)
Multi-VPC CIDR Strategy
Production: 10.0.0.0/16
Staging: 10.1.0.0/16
Development: 10.2.0.0/16
Shared: 10.100.0.0/16
# Leaves room for:
# - VPC peering (non-overlapping required)
# - Transit Gateway connections
# - On-prem connectivity
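The allocation above can also be derived instead of typed by hand. This is a minimal sketch, assuming every VPC is a /16 carved out of the 10.0.0.0/8 supernet. The local names (`supernet`, `environments`, `vpc_cidrs`) are illustrative and not part of the original configuration.

```hcl
locals {
  # Assumption: every VPC is a /16 carved from the 10.0.0.0/8 supernet.
  supernet     = "10.0.0.0/8"
  environments = ["production", "staging", "development"]

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /8 yields /16
  # blocks, and netnum selects the second octet.
  vpc_cidrs = merge(
    { for i, env in local.environments : env => cidrsubnet(local.supernet, 8, i) },
    { shared = cidrsubnet(local.supernet, 8, 100) }
  )
}

output "vpc_cidrs" {
  value = local.vpc_cidrs
}
```

With these inputs, `vpc_cidrs` maps production to 10.0.0.0/16, staging to 10.1.0.0/16, development to 10.2.0.0/16, and shared to 10.100.0.0/16. Because every block comes from the same function, two environments cannot be given overlapping ranges by accident.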
Standard Multi-AZ Architecture
Subnet Layout
VPC: 10.0.0.0/16
Public Subnets (internet-facing):
10.0.0.0/24 - us-east-1a (public-a)
10.0.1.0/24 - us-east-1b (public-b)
10.0.2.0/24 - us-east-1c (public-c)
Private Subnets (applications):
10.0.10.0/24 - us-east-1a (private-a)
10.0.11.0/24 - us-east-1b (private-b)
10.0.12.0/24 - us-east-1c (private-c)
Database Subnets (isolated):
10.0.20.0/24 - us-east-1a (database-a)
10.0.21.0/24 - us-east-1b (database-b)
10.0.22.0/24 - us-east-1c (database-c)
Terraform Implementation
# variables.tf
variable "vpc_cidr" {
default = "10.0.0.0/16"
}
variable "azs" {
default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
# main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "production-vpc"
}
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
tags = {
Name = "public-${var.azs[count.index]}"
Tier = "public"
}
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.azs[count.index]
tags = {
Name = "private-${var.azs[count.index]}"
Tier = "private"
}
}
# Database Subnets
resource "aws_subnet" "database" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
availability_zone = var.azs[count.index]
tags = {
Name = "database-${var.azs[count.index]}"
Tier = "database"
}
}
Routing
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
# NAT Gateways (one per AZ for HA)
resource "aws_eip" "nat" {
count = length(var.azs)
domain = "vpc"
}
resource "aws_nat_gateway" "main" {
count = length(var.azs)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
depends_on = [aws_internet_gateway.main]
}
# Public Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "public-rt"
}
}
resource "aws_route_table_association" "public" {
count = length(var.azs)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Private Route Tables (one per AZ)
resource "aws_route_table" "private" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = {
Name = "private-rt-${var.azs[count.index]}"
}
}
resource "aws_route_table_association" "private" {
count = length(var.azs)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
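The routing above never associates the database subnets with a route table, so they fall back to the VPC's main route table. One way to make their isolation explicit is a dedicated table with no default route, sketched below. The route table name `database` is an assumption; every other reference is a resource defined earlier.

```hcl
# Route table for the database tier: only the implicit local route exists,
# so there is no path to the internet gateway or to a NAT gateway.
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "database-rt"
  }
}

# Associate every database subnet explicitly instead of relying on the
# VPC's main route table.
resource "aws_route_table_association" "database" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}
```

An explicit association also protects the database tier if someone later adds a default route to the main route table.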
Security Groups
Layered Security
# ALB Security Group
resource "aws_security_group" "alb" {
name_prefix = "alb-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Application Security Group
resource "aws_security_group" "app" {
name_prefix = "app-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb.id] # Only from ALB
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Database Security Group
resource "aws_security_group" "database" {
name_prefix = "database-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.app.id] # Only from app
}
# No egress block: security groups are stateful, so replies to the app tier
# are allowed automatically, and Terraform removes the default allow-all
# egress rule when none is declared.
}
VPC Endpoints
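VPC endpoints keep traffic to AWS services on the AWS network, which avoids NAT Gateway data-processing charges and removes the need for an internet path: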
# Gateway Endpoints (free)
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.s3"
route_table_ids = concat(
[aws_route_table.public.id],
aws_route_table.private[*].id
)
}
resource "aws_vpc_endpoint" "dynamodb" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.dynamodb"
route_table_ids = concat(
[aws_route_table.public.id],
aws_route_table.private[*].id
)
}
# Interface Endpoints (cost per hour + data)
resource "aws_vpc_endpoint" "ecr_api" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.ecr.api"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
}
resource "aws_vpc_endpoint" "ecr_dkr" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.ecr.dkr"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
}
resource "aws_security_group" "vpc_endpoints" {
name_prefix = "vpc-endpoints-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
}
}
Multi-Region Architecture
Active-Passive DR
Primary Region (us-east-1) DR Region (us-west-2)
┌─────────────────────────┐ ┌─────────────────────────┐
│ VPC: 10.0.0.0/16 │ │ VPC: 10.1.0.0/16 │
│ │ │ │
│ ┌─────┐ ┌─────┐ │ │ ┌─────┐ ┌─────┐ │
│ │ App │ │ DB │ │ ------> │ │ App │ │ DB │ │
│ │(Act)│ │(Pri)│ │ Replica │ │(Stb)│ │(Rep)│ │
│ └─────┘ └─────┘ │ │ └─────┘ └─────┘ │
└─────────────────────────┘ └─────────────────────────┘
│ │
└───────────── Route 53 ───────────┘
(Failover Routing)
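The failover routing in the diagram could be sketched roughly as follows. This assumes each region exposes its application through a load balancer DNS name. The variables (`zone_id`, `primary_alb_dns_name`, `dr_alb_dns_name`), the record name `app.example.com`, and the `/health` path are all hypothetical placeholders, not values from the original setup.

```hcl
# Hypothetical inputs: none of these names appear in the original configuration.
variable "zone_id" {
  type        = string
  description = "Route 53 hosted zone ID"
}

variable "primary_alb_dns_name" {
  type = string
}

variable "dr_alb_dns_name" {
  type = string
}

# Health check that decides when traffic leaves the primary region.
resource "aws_route53_health_check" "primary" {
  provider          = aws.primary
  fqdn              = var.primary_alb_dns_name
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health" # assumed health endpoint
  failure_threshold = 3
  request_interval  = 30
}

# PRIMARY record: served while the health check passes.
resource "aws_route53_record" "primary" {
  provider        = aws.primary
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = [var.primary_alb_dns_name]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# SECONDARY record: served only when the primary is unhealthy.
resource "aws_route53_record" "dr" {
  provider       = aws.primary
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = [var.dr_alb_dns_name]
  set_identifier = "dr"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

A short TTL keeps failover time close to the health check's detection time, at the cost of more DNS queries.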
Terraform Multi-Region
# providers.tf
provider "aws" {
region = "us-east-1"
alias = "primary"
}
provider "aws" {
region = "us-west-2"
alias = "dr"
}
# main.tf (root module, calling modules/vpc)
module "vpc_primary" {
source = "./modules/vpc"
providers = {
aws = aws.primary
}
vpc_cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
module "vpc_dr" {
source = "./modules/vpc"
providers = {
aws = aws.dr
}
vpc_cidr = "10.1.0.0/16"
azs = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
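The peering and Transit Gateway snippets in this guide read values such as `module.vpc_primary.vpc_id` and `private_route_table_ids`, so the module has to export them. Here is a minimal sketch of what that might look like, assuming the module contains the `aws_vpc.main`, `aws_subnet.private`, and `aws_route_table.private` resources defined earlier. The file name is an assumption.

```hcl
# modules/vpc/outputs.tf (assumed file name)

output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.main.id
}

output "vpc_cidr" {
  description = "CIDR block of the VPC"
  value       = aws_vpc.main.cidr_block
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, one per AZ"
  value       = aws_subnet.private[*].id
}

output "private_route_table_ids" {
  description = "IDs of the private route tables, one per AZ"
  value       = aws_route_table.private[*].id
}
```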
VPC Peering (Cross-Region)
# Peering connection
resource "aws_vpc_peering_connection" "primary_to_dr" {
provider = aws.primary
vpc_id = module.vpc_primary.vpc_id
peer_vpc_id = module.vpc_dr.vpc_id
peer_region = "us-west-2"
auto_accept = false
tags = {
Name = "primary-to-dr-peering"
}
}
# Accept in DR region
resource "aws_vpc_peering_connection_accepter" "dr" {
provider = aws.dr
vpc_peering_connection_id = aws_vpc_peering_connection.primary_to_dr.id
auto_accept = true
}
# Routes in primary region
resource "aws_route" "primary_to_dr" {
provider = aws.primary
count = length(module.vpc_primary.private_route_table_ids)
route_table_id = module.vpc_primary.private_route_table_ids[count.index]
destination_cidr_block = module.vpc_dr.vpc_cidr
vpc_peering_connection_id = aws_vpc_peering_connection.primary_to_dr.id
}
# Routes in DR region
resource "aws_route" "dr_to_primary" {
provider = aws.dr
count = length(module.vpc_dr.private_route_table_ids)
route_table_id = module.vpc_dr.private_route_table_ids[count.index]
destination_cidr_block = module.vpc_primary.vpc_cidr
vpc_peering_connection_id = aws_vpc_peering_connection.primary_to_dr.id
}
Transit Gateway
For complex multi-VPC and hybrid architectures:
resource "aws_ec2_transit_gateway" "main" {
description = "Main Transit Gateway"
auto_accept_shared_attachments = "enable"
default_route_table_association = "enable"
default_route_table_propagation = "enable"
dns_support = "enable"
vpn_ecmp_support = "enable"
}
# Attach VPCs
resource "aws_ec2_transit_gateway_vpc_attachment" "production" {
subnet_ids = module.vpc_production.private_subnet_ids
transit_gateway_id = aws_ec2_transit_gateway.main.id
vpc_id = module.vpc_production.vpc_id
}
resource "aws_ec2_transit_gateway_vpc_attachment" "staging" {
subnet_ids = module.vpc_staging.private_subnet_ids
transit_gateway_id = aws_ec2_transit_gateway.main.id
vpc_id = module.vpc_staging.vpc_id
}
# Routes from VPCs to Transit Gateway
resource "aws_route" "production_to_tgw" {
count = length(module.vpc_production.private_route_table_ids)
route_table_id = module.vpc_production.private_route_table_ids[count.index]
destination_cidr_block = "10.0.0.0/8" # All 10.x.x.x networks; the VPC's own, more specific local route still wins
transit_gateway_id = aws_ec2_transit_gateway.main.id
}
VPC Flow Logs
Monitor network traffic:
resource "aws_cloudwatch_log_group" "flow_logs" {
name = "/aws/vpc/flow-logs"
retention_in_days = 30
}
resource "aws_iam_role" "flow_logs" {
name = "vpc-flow-logs-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "vpc-flow-logs.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy" "flow_logs" {
role = aws_iam_role.flow_logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams"
]
Resource = "*"
}]
})
}
resource "aws_flow_log" "main" {
iam_role_arn = aws_iam_role.flow_logs.arn
log_destination = aws_cloudwatch_log_group.flow_logs.arn
traffic_type = "ALL"
vpc_id = aws_vpc.main.id
}
Cost Optimization
NAT Gateway Costs
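NAT Gateways are expensive (roughly $32/month each, plus $0.045 per GB processed), so a one-per-AZ layout across three AZs triples the fixed cost. Ways to reduce it: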
- Use VPC Endpoints — S3, DynamoDB endpoints are free
- Single NAT for non-prod — One NAT per VPC for dev/staging
- NAT Instance — EC2 NAT instance for very low traffic
# Single NAT for non-production
resource "aws_nat_gateway" "single" {
count = var.environment == "production" ? length(var.azs) : 1
# ...
}
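With a variable number of NAT Gateways, the private route tables can no longer index the gateways one-to-one. The sketch below shows one way to handle that, assuming a hypothetical `environment` variable and a `nat_count` local. It would replace the earlier `aws_eip.nat` and `aws_route_table.private` definitions rather than sit alongside them.

```hcl
# Hypothetical input, not defined in the original configuration.
variable "environment" {
  type    = string
  default = "development"
}

locals {
  # One NAT Gateway per AZ in production, a single shared one elsewhere.
  nat_count = var.environment == "production" ? length(var.azs) : 1
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc"
}

resource "aws_nat_gateway" "single" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "private" {
  count  = length(var.azs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    # count.index % nat_count selects the AZ's own NAT when there is one per
    # AZ, and always index 0 when only a single NAT exists.
    nat_gateway_id = aws_nat_gateway.single[count.index % local.nat_count].id
  }

  tags = {
    Name = "private-rt-${var.azs[count.index]}"
  }
}
```

In the single-NAT case, private subnets in the other AZs send outbound traffic across AZ boundaries, and the NAT's AZ becomes a single point of failure. That trade-off is usually acceptable outside production.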
Interface Endpoint Consolidation
# Commonly needed endpoints
locals {
interface_endpoints = var.environment == "production" ? [
"ecr.api", "ecr.dkr", "logs", "secretsmanager",
"ssm", "ssmmessages", "ec2messages"
] : [
"ecr.api", "ecr.dkr" # Minimal for non-prod
]
}
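The list above can drive a single `for_each` resource instead of one block per endpoint. This is a sketch under a few assumptions: `var.region` is a hypothetical variable standing in for the hard-coded `us-east-1`, and the resource would replace the individual `ecr_api` and `ecr_dkr` endpoints shown earlier, since two private-DNS endpoints for the same service cannot coexist in one VPC.

```hcl
# Hypothetical input so the same code works in the primary and DR regions.
variable "region" {
  type    = string
  default = "us-east-1"
}

resource "aws_vpc_endpoint" "interface" {
  # toset() gives each endpoint a stable key such as "ecr.api", so removing
  # one service from the list does not disturb the others.
  for_each = toset(local.interface_endpoints)

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "endpoint-${each.value}"
  }
}
```

Individual endpoints are then addressed by key, for example `aws_vpc_endpoint.interface["ecr.api"].id`.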
When NOT to Separate Subnets
Sometimes simpler is better:
- Very small applications — Single public/private subnet tier
- Serverless architectures — Lambda in VPC is often unnecessary
- Cost-sensitive development — Single NAT gateway is fine
Key Takeaways
- Size your VPC for growth — /16 gives room for expansion
- Plan CIDR ranges across accounts — Prevent peering conflicts
- Use 3 AZs minimum — AWS recommends 3 for production
- One NAT per AZ in production — AZ isolation prevents cascading failures
- Use VPC Endpoints — Save money, improve security
- Enable Flow Logs — Essential for debugging and security
- Layer security groups — ALB → App → DB, each only talks to adjacent layer
VPC design is foundational. Getting it right from the start saves painful migrations later.