Multi-Cloud Strategy: When and How
Should you go multi-cloud? Learn when multi-cloud makes sense, common patterns, pitfalls to avoid, and how to implement a practical multi-cloud architecture.
Multi-cloud is the new enterprise default — 89% of organizations use multiple cloud providers. But most are doing it wrong. They’re not getting redundancy or negotiating leverage; they’re getting complexity and higher costs. Let’s cut through the hype.
When Multi-Cloud Makes Sense
Valid Reasons
✅ Regulatory requirements (data residency)
✅ M&A - inherited different clouds
✅ Best-of-breed services (BigQuery for analytics, AWS for ML)
✅ Genuine disaster recovery across providers
✅ Vendor lock-in mitigation (for specific workloads)
Invalid Reasons
❌ "Avoid vendor lock-in" (you'll just be locked into 2 vendors)
❌ "Negotiate better pricing" (volume discounts work better)
❌ "All eggs in one basket" (within-cloud redundancy is fine)
❌ "Future flexibility" (YAGNI - You Aren't Gonna Need It)
Multi-Cloud Patterns
Pattern 1: Workload Isolation
Different workloads on different clouds, minimal cross-cloud communication.
┌─────────────────────┐ ┌─────────────────────┐
│ AWS │ │ GCP │
│ ┌───────────────┐ │ │ ┌───────────────┐ │
│ │ Web App │ │ │ │ Analytics │ │
│ │ (ECS) │ │ │ │ (BigQuery) │ │
│ └───────────────┘ │ │ └───────────────┘ │
│ ┌───────────────┐ │ │ ┌───────────────┐ │
│ │ API │ │ │ │ ML Training │ │
│ │ (Lambda) │ │ │ │ (Vertex AI) │ │
│ └───────────────┘ │ │ └───────────────┘ │
└─────────────────────┘ └─────────────────────┘
│ │
└─────── Data Sync ─────────┘
(async, batch)
Terraform for Workload Isolation
# providers.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-east-1"
alias = "primary"
}
provider "google" {
project = var.gcp_project
region = "us-central1"
alias = "analytics"
}
# data_sync.tf - S3 to GCS replication
resource "aws_s3_bucket" "events" {
provider = aws.primary
bucket = "my-app-events"
}
resource "google_storage_bucket" "analytics" {
provider = google.analytics
name = "my-app-analytics-${var.gcp_project}"
location = "US"
}
# Use a Cloud Function to sync data (the `functions` bucket and `sync_code`
# object below are assumed to be defined elsewhere in this configuration)
resource "google_cloudfunctions2_function" "sync" {
provider = google.analytics
name = "s3-sync"
location = "us-central1"
build_config {
runtime = "python312"
entry_point = "sync_from_s3"
source {
storage_source {
bucket = google_storage_bucket.functions.name
object = google_storage_bucket_object.sync_code.name
}
}
}
service_config {
max_instance_count = 10
available_memory = "256M"
timeout_seconds = 300
environment_variables = {
S3_BUCKET = aws_s3_bucket.events.id
GCS_BUCKET = google_storage_bucket.analytics.name
AWS_REGION = "us-east-1"
}
}
}
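The `entry_point` above refers to a function the Terraform config assumes exists. A minimal sketch of what `sync_from_s3` might look like — the batching helper, credential handling, and incremental-sync logic here are illustrative, not a complete implementation:

```python
# Hypothetical sketch of the sync_from_s3 entry point referenced by the
# Cloud Function config. Assumes boto3 and google-cloud-storage are bundled
# with the function and AWS credentials are injected (e.g. via Secret
# Manager); incremental-sync markers are omitted for brevity.

def batch_keys(keys, batch_size=100):
    """Split object keys into fixed-size batches so one invocation copies
    a bounded amount of data and stays under timeout_seconds."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

def sync_from_s3(request):
    import os
    import boto3
    from google.cloud import storage

    s3 = boto3.client("s3", region_name=os.environ["AWS_REGION"])
    dest = storage.Client().bucket(os.environ["GCS_BUCKET"])

    # List objects in the source bucket (pagination omitted for brevity)
    listing = s3.list_objects_v2(Bucket=os.environ["S3_BUCKET"])
    keys = [obj["Key"] for obj in listing.get("Contents", [])]

    for batch in batch_keys(keys):
        for key in batch:
            body = s3.get_object(Bucket=os.environ["S3_BUCKET"], Key=key)["Body"].read()
            dest.blob(key).upload_from_string(body)
    return f"synced {len(keys)} objects"
```

Keeping the sync asynchronous and batched is what makes this Pattern 1: neither cloud blocks on the other in the request path.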
Pattern 2: Active-Active Multi-Region
Same workload running on multiple clouds for true redundancy.
┌─────────────┐
│ Global │
│ Load │
│ Balancer │
└──────┬──────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ AWS │ │ GCP │ │ Azure │
│ us-east-1 │ │ us-central1 │ │ eastus │
│ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │
│ │ App │ │ │ │ App │ │ │ │ App │ │
│ └───────┘ │ │ └───────┘ │ │ └───────┘ │
│ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │
│ │ DB │◄─┼─┼──│ DB │──┼─┼─►│ DB │ │
│ └───────┘ │ │ └───────┘ │ │ └───────┘ │
└─────────────┘ └─────────────┘ └─────────────┘
Using Cloudflare as Global Load Balancer
resource "cloudflare_load_balancer" "multi_cloud" {
zone_id = var.cloudflare_zone_id
name = "api.example.com"
fallback_pool_id = cloudflare_load_balancer_pool.aws.id
default_pool_ids = [
cloudflare_load_balancer_pool.aws.id,
cloudflare_load_balancer_pool.gcp.id,
cloudflare_load_balancer_pool.azure.id,
]
session_affinity = "cookie"
adaptive_routing {
failover_across_pools = true
}
rules {
name = "geo-steering"
condition = "http.request.headers[\"cf-ipcountry\"][0] in {\"US\" \"CA\" \"MX\"}"
overrides {
default_pools = [cloudflare_load_balancer_pool.aws.id]
}
}
rules {
name = "geo-steering-eu"
condition = "http.request.headers[\"cf-ipcountry\"][0] in {\"DE\" \"FR\" \"GB\"}"
overrides {
default_pools = [cloudflare_load_balancer_pool.gcp.id]
}
}
}
resource "cloudflare_load_balancer_pool" "aws" {
name = "aws-pool"
origins {
name = "aws-primary"
address = aws_lb.main.dns_name
enabled = true
}
monitor = cloudflare_load_balancer_monitor.http.id
}
resource "cloudflare_load_balancer_pool" "gcp" {
name = "gcp-pool"
origins {
name = "gcp-primary"
address = google_compute_global_address.main.address
enabled = true
}
monitor = cloudflare_load_balancer_monitor.http.id
}
resource "cloudflare_load_balancer_monitor" "http" {
type = "http"
expected_body = "healthy"
expected_codes = "200"
method = "GET"
path = "/health"
interval = 60
retries = 2
timeout = 5
}
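The monitor above requires every origin — on every cloud — to answer `GET /health` with HTTP 200 and the literal body `healthy`. A framework-free stdlib sketch of such an endpoint (for illustration; in practice this lives inside your app server):

```python
# Minimal health endpoint matching the Cloudflare monitor above:
# it must return HTTP 200 with body "healthy" at /health on every origin.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"healthy"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

def serve(port: int = 8080) -> HTTPServer:
    """Bind the health endpoint; call serve_forever() on the result."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Keep the check cheap and dependency-free: if `/health` touches the database, a database blip fails all origins at once and the load balancer has nowhere to fail over to.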
Pattern 3: Cloud-Agnostic with Kubernetes
# Application runs identically on any cloud
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
labels:
app: api
spec:
replicas: 3
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: api
image: myregistry.io/api:v1.2.3
ports:
- containerPort: 8080
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-credentials
key: url
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Multi-Cluster Kubernetes with Submariner
# Install Submariner for cross-cluster networking
subctl deploy-broker --kubeconfig broker-kubeconfig
# Join clusters
subctl join --kubeconfig aws-kubeconfig broker-info.subm \
--clusterid aws-cluster \
--natt=false
subctl join --kubeconfig gcp-kubeconfig broker-info.subm \
--clusterid gcp-cluster \
--natt=false
# Export services across clusters so they resolve from the other joined clusters
subctl export service my-service --namespace default
Abstraction Layers
Database Abstraction
# Abstract database interface
import os
from abc import ABC, abstractmethod
class DatabaseClient(ABC):
@abstractmethod
def get(self, key: str) -> dict:
pass
@abstractmethod
def put(self, key: str, item: dict) -> None:
pass
class DynamoDBClient(DatabaseClient):
def __init__(self):
import boto3
self.table = boto3.resource('dynamodb').Table(os.environ['TABLE_NAME'])
def get(self, key: str) -> dict:
response = self.table.get_item(Key={'pk': key})
return response.get('Item')
def put(self, key: str, item: dict) -> None:
item['pk'] = key
self.table.put_item(Item=item)
class FirestoreClient(DatabaseClient):
def __init__(self):
from google.cloud import firestore
self.db = firestore.Client()
self.collection = self.db.collection(os.environ['COLLECTION_NAME'])
def get(self, key: str) -> dict:
doc = self.collection.document(key).get()
return doc.to_dict() if doc.exists else None
def put(self, key: str, item: dict) -> None:
self.collection.document(key).set(item)
# Factory
def get_database() -> DatabaseClient:
provider = os.environ.get('CLOUD_PROVIDER', 'aws')
if provider == 'aws':
return DynamoDBClient()
elif provider == 'gcp':
return FirestoreClient()
else:
raise ValueError(f"Unknown provider: {provider}")
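A third, in-memory implementation of the same interface is handy for local development and tests, since it needs no cloud credentials. A sketch (the `InMemoryClient` name is ours, not part of the factory above; the ABC is repeated so the snippet is self-contained):

```python
from abc import ABC, abstractmethod

class DatabaseClient(ABC):  # same interface as above, repeated for a runnable snippet
    @abstractmethod
    def get(self, key: str) -> dict: ...
    @abstractmethod
    def put(self, key: str, item: dict) -> None: ...

class InMemoryClient(DatabaseClient):
    """Drop-in fake: satisfies the abstract interface without any cloud SDK."""
    def __init__(self):
        self._items = {}

    def get(self, key: str) -> dict:
        return self._items.get(key)

    def put(self, key: str, item: dict) -> None:
        # Copy so later mutation of the caller's dict doesn't leak into storage
        self._items[key] = dict(item)
```

Wiring this into the factory behind, say, `CLOUD_PROVIDER=local` keeps unit tests fast and hermetic — which is most of the payoff of abstracting the database in the first place.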
Storage Abstraction
# Use fsspec for cloud-agnostic storage (requires the s3fs, gcsfs, or
# adlfs package installed for the corresponding backend)
import json
import os

import fsspec
def get_filesystem(cloud: str) -> fsspec.AbstractFileSystem:
if cloud == 'aws':
return fsspec.filesystem('s3')
elif cloud == 'gcp':
return fsspec.filesystem('gcs')
elif cloud == 'azure':
return fsspec.filesystem('abfs', account_name=os.environ['AZURE_ACCOUNT'])
else:
return fsspec.filesystem('file')
# Usage
fs = get_filesystem(os.environ['CLOUD_PROVIDER'])
with fs.open('bucket/path/file.json', 'r') as f:
data = json.load(f)
Cost Management
Unified Cost Visibility
# Export costs to BigQuery for unified analysis
resource "aws_cur_report_definition" "cost_report" {
report_name = "multi-cloud-costs"
time_unit = "HOURLY"
format = "Parquet"
compression = "Parquet"
additional_schema_elements = ["RESOURCES"]
s3_bucket = aws_s3_bucket.cost_reports.id
s3_prefix = "aws-costs"
s3_region = "us-east-1"
}
# GCP billing export destination (the billing-account-to-dataset link itself
# is enabled in the Cloud Billing console; it has no Terraform resource)
resource "google_bigquery_dataset" "billing" {
dataset_id = "billing_export"
project = var.billing_project
location = "US"
}
# Azure Cost Export
resource "azurerm_cost_management_scheduled_action" "export" {
name = "daily-export"
display_name = "Daily Cost Export"
view_id = azurerm_cost_management_view.main.id
email_addresses = [var.finops_email]
email_subject = "Daily Azure Costs"
message = "See attached cost report"
frequency = "Daily"
start_date = timestamp()
end_date = timeadd(timestamp(), "8760h")
}
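Before the three exports land in one table, each provider's rows need to be mapped into a common schema. A hedged sketch of that normalization step — the AWS column names follow the CUR convention, but the GCP and Azure field names here are illustrative, not the exact export columns:

```python
# Hypothetical normalizer: map each provider's cost rows into one common
# schema before loading them into a single BigQuery table. Verify the
# column names against your actual exports before relying on them.
def normalize(provider: str, row: dict) -> dict:
    mapping = {
        "aws":   ("lineItem/UsageStartDate", "lineItem/UnblendedCost", "product/ProductName"),
        "gcp":   ("usage_start_time", "cost", "service.description"),
        "azure": ("date", "costInBillingCurrency", "serviceName"),
    }
    date_key, cost_key, service_key = mapping[provider]
    return {
        "provider": provider,
        "date": row[date_key],
        "cost_usd": float(row[cost_key]),
        "service": row[service_key],
    }
```

With every row reduced to `(provider, date, cost_usd, service)`, a single query can answer "what did we spend, everywhere, on what" — the whole point of unified visibility.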
Pitfalls to Avoid
1. Data Egress Costs
AWS → GCP: $0.09/GB
GCP → AWS: $0.12/GB
Cross-cloud traffic at 1TB/day = ~$3,000/month!
Solution: Minimize cross-cloud data movement, use compression, batch transfers.
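The egress arithmetic above is worth keeping in an explicit cost model so estimates stay honest as volumes change. A one-line sketch (rates taken from the table above; check current provider pricing):

```python
def monthly_egress_cost(gb_per_day: float, rate_per_gb: float, days: int = 30) -> float:
    """Cross-cloud egress spend per month at a steady daily transfer volume."""
    return gb_per_day * rate_per_gb * days

# 1 TB/day out of AWS at $0.09/GB is roughly $2,700/month;
# the reverse direction at $0.12/GB is roughly $3,600/month.
aws_to_gcp = monthly_egress_cost(1000, 0.09)
gcp_to_aws = monthly_egress_cost(1000, 0.12)
```

Note that compression applies before the rate: cutting payloads 5x with gzip or Parquet cuts this bill 5x, which is usually the cheapest optimization available.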
2. Operational Complexity
# Before: 1 cloud, 1 monitoring stack
monitoring:
- CloudWatch
# After: 3 clouds, 3+ monitoring stacks
monitoring:
- CloudWatch
- Cloud Monitoring
- Azure Monitor
- Datadog (unified) # Now you need this too
3. Security Surface Area
# Attack surface multiplies
identity_providers:
- AWS IAM
- GCP IAM
- Azure AD
secrets_managers:
- AWS Secrets Manager
- GCP Secret Manager
- Azure Key Vault
network_boundaries:
- AWS VPCs
- GCP VPCs
- Azure VNets
- Cross-cloud connections
Decision Framework
┌─────────────────────────────────────────────────────────┐
│ Need multi-cloud? │
└──────────────────────────┬──────────────────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
Regulatory/M&A Technical Choice
requirement? │
│ ┌────────┴────────┐
│ ▼ ▼
│ Best-of-breed "Avoid lock-in"
│ services? │
│ │ ▼
│ │ Just use one
│ │ cloud well
▼ ▼
┌───────────────────────────────────────┐
│ Multi-cloud with workload isolation │
│ (Pattern 1) │
└───────────────────────────────────────┘
Key Takeaways
- Default to single cloud — multi-cloud should be intentional, not accidental
- Workload isolation > Active-active — minimize cross-cloud coupling
- Abstract at the right level — Kubernetes, not custom wrappers everywhere
- Egress will kill you — design for minimal cross-cloud data transfer
- Unified observability is essential — one dashboard, not three
“Multi-cloud is like polygamy — theoretically possible, but you’ll spend all your time managing relationships instead of getting things done.”