A poorly designed VPC becomes a nightmare to change later: subnets that are too small, no room for new AZs, or security groups that are impossible to audit. This guide covers VPC design patterns that scale.

VPC Fundamentals

A VPC is your isolated network in AWS. Key components:

Component          Purpose
VPC                Virtual network (10.0.0.0/16)
Subnet             Network segment within an AZ
Route Table        Controls traffic routing
Internet Gateway   Connects VPC to internet
NAT Gateway        Outbound internet for private subnets
Security Group     Stateful instance-level firewall
NACL               Stateless subnet-level firewall

CIDR Planning

Size Your VPC Appropriately

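/16 = 65,536 IPs (recommended for production; also the largest block AWS allows for a VPC)
/20 = 4,096 IPs (small applications)
/24 = 256 IPs (too small for most use cases)

# AWS reserves 5 addresses in every subnet, so a /24 subnet has 251 usable IPs.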
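To catch an undersized VPC at plan time, a `validation` block on the CIDR variable can enforce a prefix range. This is a minimal sketch assuming you only want to allow /16 through /20 (an illustrative policy; AWS itself only caps VPCs at /16). It would replace the plain `vpc_cidr` declaration in variables.tf:

```hcl
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"

  validation {
    # cidrnetmask() errors on malformed IPv4 CIDRs and regex() errors on no match;
    # can() turns each error into false. Both checks are wrapped because
    # Terraform's && does not short-circuit.
    condition     = can(cidrnetmask(var.vpc_cidr)) && can(regex("/(1[6-9]|20)$", var.vpc_cidr))
    error_message = "The vpc_cidr value must be a valid IPv4 CIDR with a prefix between /16 and /20."
  }
}
```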

Private and Reserved IP Ranges

10.0.0.0/8     - Private range (large)
172.16.0.0/12  - Private range (medium)
192.168.0.0/16 - Private range (small)
100.64.0.0/10  - Carrier-grade NAT (avoid in VPCs)

Multi-VPC CIDR Strategy

Production:  10.0.0.0/16
Staging:     10.1.0.0/16
Development: 10.2.0.0/16
Shared:      10.100.0.0/16

# Leaves room for:
# - VPC peering (non-overlapping required)
# - Transit Gateway connections
# - On-prem connectivity
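One way to encode this plan in Terraform is a lookup map keyed by environment. This is a sketch assuming a hypothetical `environment` variable; the map values simply mirror the ranges above:

```hcl
variable "environment" {
  type = string # hypothetical: "production", "staging", "development", or "shared"
}

locals {
  # One non-overlapping /16 per environment, matching the plan above
  env_cidrs = {
    production  = "10.0.0.0/16"
    staging     = "10.1.0.0/16"
    development = "10.2.0.0/16"
    shared      = "10.100.0.0/16"
  }

  # Indexing fails at plan time if the environment has no entry in the map
  vpc_cidr = local.env_cidrs[var.environment]
}
```

Keeping every range in one map makes overlaps visible in code review instead of surfacing later as a peering error.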

Standard Multi-AZ Architecture

Subnet Layout

VPC: 10.0.0.0/16

Public Subnets (internet-facing):
  10.0.0.0/24   - us-east-1a (public-a)
  10.0.1.0/24   - us-east-1b (public-b)
  10.0.2.0/24   - us-east-1c (public-c)

Private Subnets (applications):
  10.0.10.0/24  - us-east-1a (private-a)
  10.0.11.0/24  - us-east-1b (private-b)
  10.0.12.0/24  - us-east-1c (private-c)

Database Subnets (isolated):
  10.0.20.0/24  - us-east-1a (database-a)
  10.0.21.0/24  - us-east-1b (database-b)
  10.0.22.0/24  - us-east-1c (database-c)

Terraform Implementation

# variables.tf
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "azs" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

# main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "production-vpc"
  }
}

# Public Subnets
resource "aws_subnet" "public" {
  count                   = length(var.azs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-${var.azs[count.index]}"
    Tier = "public"
  }
}

# Private Subnets
resource "aws_subnet" "private" {
  count             = length(var.azs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.azs[count.index]

  tags = {
    Name = "private-${var.azs[count.index]}"
    Tier = "private"
  }
}

# Database Subnets
resource "aws_subnet" "database" {
  count             = length(var.azs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 20)
  availability_zone = var.azs[count.index]

  tags = {
    Name = "database-${var.azs[count.index]}"
    Tier = "database"
  }
}
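RDS places instances through a DB subnet group rather than individual subnets. A minimal sketch wiring the database tier into one (the `main-database` name is illustrative):

```hcl
# DB Subnet Group spanning the isolated database subnets in all AZs
resource "aws_db_subnet_group" "main" {
  name       = "main-database" # illustrative; must be lowercase
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "main-database"
  }
}
```

An `aws_db_instance` then selects it with `db_subnet_group_name = aws_db_subnet_group.main.name`.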

Routing

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# NAT Gateways (one per AZ for HA)
resource "aws_eip" "nat" {
  count  = length(var.azs)
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = length(var.azs)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  depends_on = [aws_internet_gateway.main]
}

# Public Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private Route Tables (one per AZ)
resource "aws_route_table" "private" {
  count  = length(var.azs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "private-rt-${var.azs[count.index]}"
  }
}

resource "aws_route_table_association" "private" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
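The database subnets have no explicit route table association, so they fall back to the VPC's main route table. That table only holds the local route by default, but any route later added to it would silently apply to the database tier as well. A sketch that pins the database subnets to a dedicated table with nothing beyond the implicit local route:

```hcl
# Database Route Table (implicit local route only; no internet or NAT path)
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "database-rt"
  }
}

resource "aws_route_table_association" "database" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}
```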

Security Groups

Layered Security

# ALB Security Group
resource "aws_security_group" "alb" {
  name_prefix = "alb-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Application Security Group
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]  # Only from ALB
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Database Security Group
resource "aws_security_group" "database" {
  name_prefix = "database-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]  # Only from app
  }

  # No egress rules: security groups are stateful, so responses to allowed
  # inbound connections still flow. Terraform removes AWS's default
  # allow-all egress rule when no egress block is defined.
}
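Inline `ingress` blocks work, but recent AWS provider versions also offer one resource per rule, which gives every rule its own description and keeps plan diffs small and auditable. A sketch of the database rule in that style; it would replace the inline `ingress` block above, because the provider documentation warns against mixing inline and standalone rules on the same group:

```hcl
# PostgreSQL from the app tier, managed as its own resource
resource "aws_vpc_security_group_ingress_rule" "database_from_app" {
  security_group_id            = aws_security_group.database.id
  referenced_security_group_id = aws_security_group.app.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "PostgreSQL from the app tier only"
}
```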

VPC Endpoints

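VPC endpoints let private subnets reach AWS services without a NAT Gateway, which cuts NAT data-processing charges and removes the need for an internet path: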

# Gateway Endpoints (S3 and DynamoDB only; free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"

  route_table_ids = concat(
    [aws_route_table.public.id],
    aws_route_table.private[*].id
  )
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.dynamodb"

  route_table_ids = concat(
    [aws_route_table.public.id],
    aws_route_table.private[*].id
  )
}

# Interface Endpoints (billed per hour per AZ, plus per GB processed)
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}
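Gateway endpoints also accept an endpoint policy that limits what can be reached through them. A sketch restricting the S3 endpoint to a single bucket, where `my-app-artifacts` is a hypothetical bucket name. If the private subnets pull images from ECR, the policy must also allow ECR's layer bucket, because ECR stores image layers in S3:

```hcl
resource "aws_vpc_endpoint_policy" "s3" {
  vpc_endpoint_id = aws_vpc_endpoint.s3.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::my-app-artifacts",  # hypothetical bucket
        "arn:aws:s3:::my-app-artifacts/*"
      ]
    }]
  })
}
```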

Multi-Region Architecture

Active-Passive DR

Primary Region (us-east-1)          DR Region (us-west-2)
┌─────────────────────────┐         ┌─────────────────────────┐
│  VPC: 10.0.0.0/16       │         │  VPC: 10.1.0.0/16       │
│                         │         │                         │
│  ┌─────┐   ┌─────┐      │         │  ┌─────┐   ┌─────┐      │
│  │ App │   │ DB  │      │ ------> │  │ App │   │ DB  │      │
│  │(Act)│   │(Pri)│      │ Replica │  │(Stb)│   │(Rep)│      │
│  └─────┘   └─────┘      │         │  └─────┘   └─────┘      │
└─────────────────────────┘         └─────────────────────────┘
         │                                   │
         └───────────── Route 53 ────────────┘
                    (Failover Routing)
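The Route 53 failover in the diagram can be expressed as a pair of records plus a health check on the primary. This sketch assumes a hypothetical `zone_id` variable and hypothetical hostnames (`app.example.com`, `primary.example.com`, `dr.example.com`), and uses the `aws.primary` alias from providers.tf since Route 53 is a global service:

```hcl
variable "zone_id" {
  type = string # hypothetical hosted zone ID
}

resource "aws_route53_health_check" "primary" {
  provider          = aws.primary
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

# Answered while the primary health check passes
resource "aws_route53_record" "app_primary" {
  provider        = aws.primary
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = ["primary.example.com"]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Answered only when the primary is unhealthy
resource "aws_route53_record" "app_secondary" {
  provider       = aws.primary
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["dr.example.com"]
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

The short TTL keeps clients from caching the primary answer long after a failover.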

Terraform Multi-Region

# providers.tf
provider "aws" {
  region = "us-east-1"
  alias  = "primary"
}

provider "aws" {
  region = "us-west-2"
  alias  = "dr"
}

# main.tf (root module; calls the module in ./modules/vpc)
module "vpc_primary" {
  source = "./modules/vpc"
  
  providers = {
    aws = aws.primary
  }
  
  vpc_cidr = "10.0.0.0/16"
  azs      = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "vpc_dr" {
  source = "./modules/vpc"
  
  providers = {
    aws = aws.dr
  }
  
  vpc_cidr = "10.1.0.0/16"
  azs      = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
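The peering and Transit Gateway examples read `vpc_id`, `vpc_cidr`, `private_subnet_ids`, and `private_route_table_ids` from the module, so the module has to export them. A sketch of modules/vpc/outputs.tf, assuming the module body is the single-region configuration from earlier in this guide:

```hcl
# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "vpc_cidr" {
  value = aws_vpc.main.cidr_block
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "private_route_table_ids" {
  value = aws_route_table.private[*].id
}
```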

VPC Peering (Cross-Region)

# Peering connection
resource "aws_vpc_peering_connection" "primary_to_dr" {
  provider    = aws.primary
  vpc_id      = module.vpc_primary.vpc_id
  peer_vpc_id = module.vpc_dr.vpc_id
  peer_region = "us-west-2"
  auto_accept = false

  tags = {
    Name = "primary-to-dr-peering"
  }
}

# Accept in DR region
resource "aws_vpc_peering_connection_accepter" "dr" {
  provider                  = aws.dr
  vpc_peering_connection_id = aws_vpc_peering_connection.primary_to_dr.id
  auto_accept               = true
}

# Routes in primary region
# Both routes reference the accepter (whose ID is the peering connection ID),
# so they are created only after the connection has been accepted.
resource "aws_route" "primary_to_dr" {
  provider                  = aws.primary
  count                     = length(module.vpc_primary.private_route_table_ids)
  route_table_id            = module.vpc_primary.private_route_table_ids[count.index]
  destination_cidr_block    = module.vpc_dr.vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.dr.id
}

# Routes in DR region
resource "aws_route" "dr_to_primary" {
  provider                  = aws.dr
  count                     = length(module.vpc_dr.private_route_table_ids)
  route_table_id            = module.vpc_dr.private_route_table_ids[count.index]
  destination_cidr_block    = module.vpc_primary.vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.dr.id
}

Transit Gateway

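VPC peering is non-transitive: if A peers with B and B peers with C, A still cannot reach C through B. For complex multi-VPC and hybrid (VPN or Direct Connect) architectures, a Transit Gateway acts as a central hub instead: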

resource "aws_ec2_transit_gateway" "main" {
  description                     = "Main Transit Gateway"
  auto_accept_shared_attachments  = "enable"
  default_route_table_association = "enable"
  default_route_table_propagation = "enable"
  dns_support                     = "enable"
  vpn_ecmp_support                = "enable"
}

# Attach VPCs
resource "aws_ec2_transit_gateway_vpc_attachment" "production" {
  subnet_ids         = module.vpc_production.private_subnet_ids
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.vpc_production.vpc_id
}

resource "aws_ec2_transit_gateway_vpc_attachment" "staging" {
  subnet_ids         = module.vpc_staging.private_subnet_ids
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.vpc_staging.vpc_id
}

# Routes from VPCs to Transit Gateway
resource "aws_route" "production_to_tgw" {
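  count                  = length(module.vpc_production.private_route_table_ids)
  route_table_id         = module.vpc_production.private_route_table_ids[count.index]
  # Covers every VPC in the 10.x plan; the VPC's own /16 local route is
  # more specific, so traffic inside the VPC stays local
  destination_cidr_block = "10.0.0.0/8"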
  transit_gateway_id     = aws_ec2_transit_gateway.main.id
}
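An attachment alone does not change VPC routing; every attached VPC needs its own route toward the Transit Gateway. A matching sketch for staging, assuming the hypothetical `vpc_staging` module exports the same outputs as the production one:

```hcl
# Routes from the staging VPC to the Transit Gateway
resource "aws_route" "staging_to_tgw" {
  count                  = length(module.vpc_staging.private_route_table_ids)
  route_table_id         = module.vpc_staging.private_route_table_ids[count.index]
  destination_cidr_block = "10.0.0.0/8"
  transit_gateway_id     = aws_ec2_transit_gateway.main.id

  # Create the route only once the VPC is attached to the Transit Gateway
  depends_on = [aws_ec2_transit_gateway_vpc_attachment.staging]
}
```

With `default_route_table_association` and `default_route_table_propagation` enabled, all attachments share one Transit Gateway route table, so production and staging can reach each other. Isolating them requires separate Transit Gateway route tables.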

VPC Flow Logs

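Flow Logs record metadata about IP traffic in the VPC (source and destination addresses, ports, protocol, byte counts, and whether the traffic was accepted or rejected). They do not capture packet contents: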

resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/aws/vpc/flow-logs"
  retention_in_days = 30
}

resource "aws_iam_role" "flow_logs" {
  name = "vpc-flow-logs-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "vpc-flow-logs.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "flow_logs" {
  role = aws_iam_role.flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ]
      Resource = "*"
    }]
  })
}

resource "aws_flow_log" "main" {
  iam_role_arn    = aws_iam_role.flow_logs.arn
  log_destination = aws_cloudwatch_log_group.flow_logs.arn
  traffic_type    = "ALL"
  vpc_id          = aws_vpc.main.id
}
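Flow Logs can also be delivered straight to S3, which is often cheaper for long retention and needs no IAM role. A sketch assuming a hypothetical bucket resource `aws_s3_bucket.flow_logs` defined elsewhere:

```hcl
resource "aws_flow_log" "s3" {
  log_destination      = aws_s3_bucket.flow_logs.arn # hypothetical bucket
  log_destination_type = "s3"
  traffic_type         = "REJECT" # only denied traffic, to reduce volume
  vpc_id               = aws_vpc.main.id
}
```

Capturing only `REJECT` traffic is a common way to focus on security-relevant events; switch back to `ALL` when debugging connectivity.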

Cost Optimization

NAT Gateway Costs

NAT Gateways are expensive: in us-east-1, roughly $32/month per gateway ($0.045/hour) plus $0.045 per GB processed. Ways to cut the bill:

  1. Use VPC Endpoints — S3, DynamoDB endpoints are free
  2. Single NAT for non-prod — One NAT per VPC for dev/staging
  3. NAT Instance — EC2 NAT instance for very low traffic

# Single NAT for non-production (the count for aws_nat_gateway.main)
resource "aws_nat_gateway" "main" {
  count = var.environment == "production" ? length(var.azs) : 1
  # ...
}
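When the NAT Gateway count drops to 1 outside production, the per-AZ private route tables have to share that gateway; a reference like `aws_nat_gateway.main[count.index]` would be out of range for the second and third AZs. A sketch of the adjusted private route table, assuming the same hypothetical `environment` variable:

```hcl
resource "aws_route_table" "private" {
  count  = length(var.azs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    # Production: one NAT per AZ. Otherwise every AZ routes through NAT 0.
    nat_gateway_id = aws_nat_gateway.main[var.environment == "production" ? count.index : 0].id
  }

  tags = {
    Name = "private-rt-${var.azs[count.index]}"
  }
}
```

The `aws_eip.nat` count should use the same conditional so no unused Elastic IPs are allocated.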

Interface Endpoint Consolidation

# Commonly needed endpoints
locals {
  interface_endpoints = var.environment == "production" ? [
    "ecr.api", "ecr.dkr", "logs", "secretsmanager", 
    "ssm", "ssmmessages", "ec2messages"
  ] : [
    "ecr.api", "ecr.dkr"  # Minimal for non-prod
  ]
}
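The list takes effect once a resource consumes it. A sketch that creates one interface endpoint per entry with `for_each`, assuming a hypothetical `region` variable and reusing the `vpc_endpoints` security group from the VPC Endpoints section. It would replace the individual `ecr_api` and `ecr_dkr` resources, since two endpoints for the same service with private DNS enabled conflict in one VPC:

```hcl
variable "region" {
  type = string # hypothetical, e.g. "us-east-1"
}

resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_endpoints)

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id # one ENI (and one hourly charge) per AZ
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${each.value}-endpoint"
  }
}
```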

When NOT to Separate Subnets

Sometimes simpler is better:

  • Very small applications — Single public/private subnet tier
  • Serverless architectures — Lambda in VPC is often unnecessary
  • Cost-sensitive development — Single NAT gateway is fine

Key Takeaways

  1. Size your VPC for growth — /16 gives room for expansion
  2. Plan CIDR ranges across accounts — Prevent peering conflicts
  3. Use 3 AZs in production — Losing one AZ still leaves two, so redundancy survives an AZ outage
  4. One NAT per AZ in production — AZ isolation prevents cascading failures
  5. Use VPC Endpoints — Save money, improve security
  6. Enable Flow Logs — Essential for debugging and security
  7. Layer security groups — ALB → App → DB, with each layer accepting traffic only from the layer in front of it

VPC design is foundational. Getting it right from the start saves painful migrations later.