Terraform state is the map to your infrastructure. Lose it, corrupt it, or lock it wrong, and your team stops deploying. Yet most teams treat remote state as an afterthought: throw it in S3, add a lock table, and hope for the best.

This guide covers the real decisions: which backend fits your team, what breaks, how to detect corruption, and how to recover when state goes sideways.


Local State vs Remote: Why Remote Matters

Local State Risks

# Every developer has their own state file
terraform apply
# State now lives only on this machine
ls -la terraform.tfstate

Problems:

  • No single source of truth — Alice’s state diverges from Bob’s
  • No audit trail — who changed what and when?
  • No shared locking: concurrent applies from different machines can overwrite each other’s state
  • No backup — one rm command deletes everything
  • No versioning for secrets: sensitive values in each copy drift apart, with no history to reconcile them

Remote State Benefits

terraform {
  required_version = ">= 1.5"
  
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
    acl            = "private"
  }
}

With remote state:

  • Centralized: one source of truth
  • Locked: DynamoDB prevents concurrent applies
  • Versioned: S3 versioning tracks state history
  • Encrypted: at-rest and in-transit encryption
  • Auditable: CloudTrail can log state access once S3 data events are enabled for the bucket

S3 + DynamoDB: The Team Standard

This is the most common production pattern for teams on AWS that run their own backend. It’s cheap, simple, and you control the credentials.

1. Create the State Bucket

# backend-setup/main.tf
# Run this ONCE with local state, then migrate everything to it

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# State bucket
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-org-terraform-state"
}

# Enable versioning — critical for recovery
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block public access — never, ever, ever expose state
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Server-side encryption with managed keys
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Lifecycle policy — delete old versions after 90 days to save costs
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    id     = "delete-old-versions"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket
    filter {}

    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}

# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name             = "terraform-locks"
  billing_mode     = "PAY_PER_REQUEST"
  hash_key         = "LockID"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name    = "Terraform State Locks"
    Purpose = "State locking and diagnosing deadlocks"
  }
}

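# S3 server access logging for audit (separate from CloudTrail data events)
# Logs go to a dedicated bucket (example name): a bucket that logs into
# itself records its own log deliveries. The log bucket also needs a policy
# allowing logging.s3.amazonaws.com to write to it.
resource "aws_s3_bucket" "terraform_state_logs" {
  bucket = "my-org-terraform-state-logs"
}

resource "aws_s3_bucket_logging" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  target_bucket = aws_s3_bucket.terraform_state_logs.id
  target_prefix = "logs/"
}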

output "state_bucket" {
  value = aws_s3_bucket.terraform_state.id
}

output "locks_table" {
  value = aws_dynamodb_table.terraform_locks.name
}

2. Migrate Local State to S3

# Step 1: Add remote backend config to your working Terraform directory
cat >> main.tf <<'EOF'
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
EOF

# Step 2: Initialize — Terraform asks if you want to migrate
terraform init
# Output: Do you want to copy existing state to the new backend?
# Answer: yes

# Step 3: Verify state is now remote
terraform state list
# Should succeed — proof that state moved

# Step 4: Delete local state (only after verification!)
rm terraform.tfstate terraform.tfstate.backup
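
Step 4 is only safe once the remote copy is known to match what you had locally. The check can be sketched in Python, assuming you copied the local file aside before running terraform init and saved the remote state with terraform state pull. The function names and the comparison rules (same lineage, serial not lower, same number of resources) are illustrative choices, not something Terraform prescribes.

```python
import json


def summarize(state_text: str) -> dict:
    """Parse a state document and keep only the fields worth comparing."""
    state = json.loads(state_text)
    return {
        "lineage": state.get("lineage"),
        "serial": state.get("serial", 0),
        "resource_count": len(state.get("resources", [])),
    }


def migration_looks_complete(before_text: str, remote_text: str) -> bool:
    """Compare a pre-migration copy of local state with the pulled remote state."""
    before = summarize(before_text)
    remote = summarize(remote_text)
    return (
        before["lineage"] is not None
        and before["lineage"] == remote["lineage"]  # same state history
        and remote["serial"] >= before["serial"]    # remote is not older
        and remote["resource_count"] == before["resource_count"]
    )
```

To use it, read both files as text and pass them in. A False result means the local files should stay where they are until the difference is explained.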

3. IAM Policy for Backend Access

Never give root credentials to Terraform. Use a dedicated IAM user or role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3StateAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions",
        "s3:GetBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::my-org-terraform-state"
    },
    {
      "Sid": "S3StateObjectAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-org-terraform-state/*"
    },
    {
      "Sid": "DynamoDBLocking",
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT-ID:table/terraform-locks"
    }
  ]
}

Avoiding State Lock Deadlocks

Deadlocked state is the most common failure mode. An apply that is killed before it can clean up leaves its lock in DynamoDB, and no one can deploy.

How Locks Work

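# When you run terraform apply, Terraform:
# 1. Writes a lock item to DynamoDB (there is no TTL on it)
terraform apply
# DynamoDB now holds the lock item, keyed by <bucket>/<key>:
# {
#   "LockID": { "S": "my-org-terraform-state/prod/terraform.tfstate" },
#   "Info": { "S": "{\"ID\":\"xyz123\",\"Who\":\"alice@laptop\",\"Created\":\"2026-04-25T10:30:00Z\"}" }
# }
# A separate item, whose LockID ends in -md5, stores the state checksum in "Digest"

# 2. Does the plan and apply
# 3. Deletes the lock item when done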

If step 3 never runs (the network dies, Terraform crashes), the lock stays in DynamoDB until someone removes it.

Detect Deadlocks

# Check what locks exist
aws dynamodb scan --table-name terraform-locks \
  --region us-east-1 \
  --output table

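# What to look for in the output:
# - Items whose LockID is <bucket>/<key> are locks. Their Info attribute is a
#   JSON string holding the lock ID (for example xyz123), who took it, and when.
# - Items whose LockID ends in -md5 are state checksums, not locks.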
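
Reading scan output by eye gets tedious with many state files. Here is a minimal Python sketch that flags locks older than a threshold, assuming the scan was saved with --output json and that each lock item's Info attribute is a JSON string with ID, Who, and Created fields. The helper names and the one-hour threshold used in the usage note are hypothetical.

```python
import json
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})"
)


def parse_created(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, trimming digits beyond microseconds."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base, fraction, zone = match.groups()
    micros = (fraction or "0")[:6].ljust(6, "0")
    zone = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{base}.{micros}{zone}")


def stale_locks(scan_output: dict, max_age: timedelta, now: datetime = None) -> list:
    """Return lock items whose Created time is older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for item in scan_output.get("Items", []):
        lock_id = item.get("LockID", {}).get("S", "")
        info_raw = item.get("Info", {}).get("S")
        if lock_id.endswith("-md5") or not info_raw:
            continue  # checksum item, not a lock
        info = json.loads(info_raw)
        age = now - parse_created(info["Created"])
        if age > max_age:
            stale.append(
                {"LockID": lock_id, "ID": info.get("ID"), "Who": info.get("Who"), "age": age}
            )
    return stale
```

Calling stale_locks(json.load(open("scan.json")), timedelta(hours=1)) lists candidates only. An old lock can still belong to a long-running apply, so confirm with the person named in Who before unlocking.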

Force-Unlock (Carefully)

# ONLY do this if you're sure the lock holder crashed
# Verify no one is actually running terraform apply right now

# Option 1: terraform force-unlock (safest)
# The argument is the lock ID from the error message or the item's Info field
terraform force-unlock xyz123

# Option 2: Delete from DynamoDB (nuclear)
# The table key is <bucket>/<key>, not the lock ID used above
LOCK_KEY="my-org-terraform-state/prod/terraform.tfstate"

aws dynamodb delete-item \
  --table-name terraform-locks \
  --key "{\"LockID\": {\"S\": \"$LOCK_KEY\"}}" \
  --region us-east-1

# Verify it's gone
aws dynamodb get-item \
  --table-name terraform-locks \
  --key "{\"LockID\": {\"S\": \"$LOCK_KEY\"}}" \
  --region us-east-1
# Returns empty if successful

Prevent Deadlocks

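# Keep the backend's validation checks on (false is the default for each),
# so credential problems surface during init rather than partway through an apply.
# Merge these settings into your existing backend "s3" block.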
terraform {
  backend "s3" {
    skip_credentials_validation = false
    skip_metadata_api_check     = false
    skip_requesting_account_id  = false
  }
}

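# In CI/CD, set a job timeout and a lock timeout
# .github/workflows/deploy.yml
jobs:
  terraform:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # Stops a hung job; it does not release the lock by itself
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v2

      - name: Apply
        run: |
          terraform init -input=false
          # -lock-timeout waits for a held lock instead of failing immediately.
          # A failed apply normally releases its own lock, so don't force-unlock
          # automatically: the lock may belong to another run that is still working.
          terraform apply -auto-approve -input=false -lock-timeout=5m || {
            echo "Apply failed; if a stale lock remains, inspect it before force-unlocking"
            exit 1
          }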

State Corruption and Recovery

State files are JSON. Corruption is rare but catastrophic.

Detect Corruption

# Download and inspect state locally
aws s3 cp s3://my-org-terraform-state/prod/terraform.tfstate - | jq . > state.json

# Check for obvious signs:
# - Incomplete JSON (unclosed bracket, etc.)
# - Missing required fields (version, resources, etc.)
# - Null or garbage in sensitive values

# Terraform also detects corruption on init
terraform init
# Error: Error reading state file
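
The manual checks above (complete JSON, required fields present) can be automated. Below is a minimal Python sketch, assuming a version 4 state document with top-level version, serial, lineage, and resources fields. The function name check_state is hypothetical, and passing this check does not prove the state is semantically correct.

```python
import json

REQUIRED_FIELDS = ("version", "serial", "lineage", "resources")


def check_state(text: str) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    try:
        state = json.loads(text)
    except json.JSONDecodeError as err:
        # Truncated uploads usually fail here, with the position of the break
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(state, dict):
        return ["top level is not a JSON object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in state
    ]
    if not isinstance(state.get("resources", []), list):
        problems.append("resources is not a list")
    return problems
```

Run it against the downloaded copy, for example check_state(open("state.json").read()), before deciding whether a restore is needed.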

Recover from Corruption

# Option 1: Restore from S3 versioning (best case)
# List versions
aws s3api list-object-versions \
  --bucket my-org-terraform-state \
  --prefix prod/terraform.tfstate

# Output:
# {
#   "Versions": [
#     { "VersionId": "abc123", "LastModified": "2026-04-25T10:00:00Z", "Size": 50000 },
#     { "VersionId": "def456", "LastModified": "2026-04-24T15:00:00Z", "Size": 50000 }
#   ]
# }

# Restore an earlier version (def456 here; abc123 is the latest, corrupted one)
aws s3api get-object \
  --bucket my-org-terraform-state \
  --key prod/terraform.tfstate \
  --version-id def456 \
  prod-terraform-backup.tfstate

# Validate the backup
jq . prod-terraform-backup.tfstate | head -20

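# Put it back (make sure no one is applying!)
aws s3 cp prod-terraform-backup.tfstate \
  s3://my-org-terraform-state/prod/terraform.tfstate

# Verify
terraform init
# If Terraform reports a checksum mismatch, the Digest stored in the DynamoDB item
# my-org-terraform-state/prod/terraform.tfstate-md5 still describes the replaced
# object; update it to the value shown in the error message
terraform plan  # Should show drift if state is now older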
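
Choosing which version to restore is easy to get wrong, because the newest entry in the listing is the corrupted object itself. As a rough Python sketch, assuming the list-object-versions output was saved as JSON and that its LastModified strings share one format (so they sort chronologically as text), this picks the most recent non-current version. The function name is hypothetical.

```python
def previous_version_id(listing: dict, key: str):
    """Return the VersionId of the newest non-current version of key, or None."""
    candidates = [
        version
        for version in listing.get("Versions", [])
        # --prefix also matches keys that merely start with the prefix,
        # so compare the full key
        if version.get("Key") == key and not version.get("IsLatest", False)
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda version: version["LastModified"])
    return newest["VersionId"]
```

The result is only a starting point: that version may itself be damaged, so validate the downloaded file before putting it back.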

Drift After Recovery

After restoring an old state, resources you created between the old state and now won’t be in the state file. Terraform will try to recreate them.

# After restoring an older state version:
terraform plan

# Output:
# Plan: 3 to add, 0 to change, 5 to destroy

# This is the "drift" — your actual infrastructure doesn't match the restored state
# Options:
# 1. Apply and let Terraform fix it (risky — might delete prod resources)
# 2. Run terraform apply -refresh-only and reconcile manually
# 3. Import missing resources back into state

# Option 3: re-import resources
terraform import aws_instance.web i-0123456789abcdef0
terraform plan  # Repeat the import for each missing resource until the plan is clean
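
Before choosing between these options, it helps to see exactly which addresses the plan would create or destroy. One way to sketch this in Python, assuming the plan was saved with terraform plan -out=plan.out and exported with terraform show -json plan.out > plan.json. The function name summarize_plan is hypothetical.

```python
def summarize_plan(plan: dict) -> dict:
    """Group resource addresses from a JSON plan by what the plan would do."""
    summary = {"create": [], "delete": [], "replace": []}
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if actions == ["create"]:
            # In config but not in the restored state: likely import candidates
            summary["create"].append(change["address"])
        elif actions == ["delete"]:
            # In the restored state but no longer in config: review before applying
            summary["delete"].append(change["address"])
        elif sorted(actions) == ["create", "delete"]:
            summary["replace"].append(change["address"])
    return summary
```

Addresses under "create" are the ones to check against real infrastructure. If the resource already exists, import it instead of letting Terraform build a duplicate.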

Terraform Cloud / Terraform Enterprise

For teams that want hosted backends with RBAC, audit, and cost estimation.

Setup

terraform {
  cloud {
    organization = "my-org"
    
    workspaces {
      name = "prod"
    }
  }
}

.terraformrc

credentials "app.terraform.io" {
  token = "..."  # From https://app.terraform.io/app/settings/tokens
}

Advantages

  • No S3 setup: HashiCorp handles encryption and backups
  • RBAC — per-workspace permissions, cost centers
  • Audit — all runs are logged with who/when/what
  • Cost estimation — Terraform Cloud estimates costs before apply
  • Drift detection — continuous compliance monitoring
  • State versioning — built-in

Disadvantages

  • Vendor lock-in — state tied to Terraform Cloud
  • Network dependency — offline applies are harder
  • Cost — free tier limited, paid plans add up
  • Data residency: state lives in HashiCorp’s data centers

Multi-Environment Pattern

Most teams manage dev, staging, prod with separate state files.

# terraform/dev/main.tf
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# terraform/prod/main.tf
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

This isolates state and prevents dev changes from affecting prod. Each developer can use cd terraform/prod && terraform plan independently.

Terraform workspaces allow multiple state files in one directory:

terraform workspace list
# default
# staging
# prod

terraform workspace select prod
terraform apply

Avoid this for multi-environment setups. Workspaces are easy to misuse (apply to wrong workspace), and state isolation is less clear. Separate directories are safer.


Secrets in State

State files contain sensitive values: passwords, API keys, database credentials.

# When you do this:
resource "aws_db_instance" "main" {
  password = "super-secret-123"
}

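# It ends up in state as plaintext. The CLI masks sensitive attributes,
# but the raw state JSON does not:
terraform state pull | jq '.resources[] | select(.type=="aws_db_instance") | .instances[0].attributes.password'
# "super-secret-123"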

Mitigation

  1. Never hardcode secrets: use AWS Secrets Manager or similar

# Bad
resource "aws_db_instance" "main" {
  password = "super-secret-123"  # NEVER
}

# Good: the secret stays out of your code and version control.
# The value is still written to state, so the steps below still apply.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db-password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

  2. Encrypt state at rest (already done with S3 SSE)

  3. Mark outputs as sensitive

output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true  # Redacted in CLI output; still stored in state
}

  4. Rotate credentials regularly: Secrets Manager can automate this

  5. Audit state access: CloudTrail data events record who read or wrote state
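
To see how much secret material a given state file actually holds, a rough heuristic scan can help. The Python sketch below assumes a version 4 state document and only looks at top-level attribute names. The name pattern and the function name are hypothetical, and false positives and misses are both expected. It reports where a value sits, never the value itself.

```python
import re

# Attribute names that often hold credentials (illustrative, not exhaustive)
SUSPECT = re.compile(r"password|secret|token|private_key", re.IGNORECASE)


def find_secret_attributes(state: dict) -> list:
    """Return (resource address, attribute name) pairs that look sensitive."""
    findings = []
    for resource in state.get("resources", []):
        address = f"{resource.get('type')}.{resource.get('name')}"
        for instance in resource.get("instances", []):
            attributes = instance.get("attributes") or {}
            for key, value in attributes.items():
                # Only non-empty strings count; the value is never recorded
                if SUSPECT.search(key) and isinstance(value, str) and value:
                    findings.append((address, key))
    return findings
```

Each hit is a credential to rotate if the state file is ever exposed.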


Inspecting State Safely

Sometimes you need to look at raw state to debug issues.

# Download state (keep it local, never commit it!)
aws s3 cp s3://my-org-terraform-state/prod/terraform.tfstate - > prod.tfstate

# View a specific resource
terraform state show aws_instance.web

# View raw JSON
jq '.resources[] | select(.type=="aws_instance")' prod.tfstate

# Count resources by type
jq '[.resources[].type] | group_by(.) | map({type: .[0], count: length})' prod.tfstate

# Find resources with a specific tag
jq '.resources[] | select(.instances[0].attributes.tags.Name=="prod-db")' prod.tfstate

# Clean up — never leave state files lying around
rm prod.tfstate

When State Goes Sideways: A Checklist

  1. Lock deadlock → terraform force-unlock, check CI/CD timeout
  2. Corruption → restore from S3 versioning, import missing resources
  3. Drift → terraform apply -refresh-only (the replacement for the deprecated terraform refresh), terraform import, re-sync
  4. Secrets exposed → rotate them immediately, check CloudTrail for access
  5. Unauthorized access → check IAM, review CloudTrail logs, re-encrypt state
  6. Lost state → if no backup, reconstruct from terraform import (painful)

Key Takeaways

  1. Use S3 + DynamoDB — it’s the standard for multi-person teams
  2. Enable versioning — recovery from corruption depends on it
  3. Use force-unlock sparingly — only after verifying no one is applying
  4. Test migration — never move state to a new backend without backup
  5. Audit state access — CloudTrail tells you who touched what
  6. Separate environments by directory — not by workspace
  7. Encrypt state at rest and in transit — credentials live there
  8. Never commit state files: add terraform.tfstate* to .gitignore
  9. Document your backend setup — recovery is hard without docs
  10. Have a disaster recovery plan — test it before you need it

“Your infrastructure is only as reliable as your state file. Treat it like your source code: version it, back it up, audit it, and never expose it.”