Terraform Modules That Don't Suck: A Design Guide

2026-05-04 · Terraform, IaC, Architecture

Every organization that has used Terraform for more than a year has accumulated a graveyard of modules. Modules that take 47 parameters. Modules that are deeply coupled to one specific team's needs. Modules that wrap a single resource for no apparent reason. Modules that work in development but fail mysteriously when applied to production. The Terraform ecosystem has matured enormously, but module design remains an under-discussed discipline. This guide collects principles that have produced modules our teams actually want to use.

The Two Types of Modules

The first conceptual clarification: not all modules serve the same purpose. We classify them into two categories, with very different design rules.

Primitive modules wrap a small number of related resources to provide sensible defaults and reduce boilerplate. An S3 bucket module that configures encryption, versioning, and lifecycle policies is a primitive. These should have minimal opinions, expose most provider options, and be composable.
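A minimal sketch of such a primitive, assuming AWS provider 5.x. The file path, variable names, and the 90-day default are hypothetical. Encryption and versioning are fixed, and the lifecycle window is the one knob exposed:

```hcl
# modules/s3-bucket/main.tf (hypothetical path); assumes AWS provider ~> 5.x

variable "bucket_name" {
  type        = string
  description = "Globally unique bucket name."
}

variable "noncurrent_version_expiration_days" {
  type        = number
  default     = 90 # hypothetical default
  description = "Days before noncurrent object versions are deleted."
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

# Encryption is always on; this primitive does not let consumers disable it.
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Expires old object versions so versioning does not grow storage forever.
resource "aws_s3_bucket_lifecycle_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  # Versioning must exist before a noncurrent-version rule is meaningful.
  depends_on = [aws_s3_bucket_versioning.this]

  rule {
    id     = "expire-noncurrent-versions"
    status = "Enabled"

    filter {} # empty filter: applies to every object in the bucket

    noncurrent_version_expiration {
      noncurrent_days = var.noncurrent_version_expiration_days
    }
  }
}

output "bucket_arn" {
  value = aws_s3_bucket.this.arn
}
```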

Pattern modules implement an architectural pattern: a complete VPC with subnets and routing, a database with backup and monitoring, a service deployment with load balancer and certificates. These should be highly opinionated, expose few options, and represent an organizational standard.
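From the consumer's side, a pattern module should read almost like a request form. In this sketch the `database` path matches the repository used elsewhere in this guide, but the input names (including the `size` tier) and the subnet IDs are hypothetical. Backups, monitoring, and encryption are assumed to be fixed inside the module:

```hcl
module "orders_db" {
  source = "git::https://git.example.com/infra/modules.git//database?ref=v2.4.1"

  name        = "orders"
  environment = "prod"
  size        = "medium" # hypothetical tier; the module maps it to an instance class
  subnet_ids  = ["subnet-0a1b2c3d", "subnet-4e5f6a7b"] # hypothetical IDs
}
```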

Mixing these categories is the source of most module suffering. A module that tries to be both flexible enough to use anywhere and opinionated enough to enforce standards ends up doing neither well.

Parameter Count as a Smell

If your module accepts more than ten input variables, it is probably trying to do too much. The instinct when designing a module is to expose every possible knob "in case someone needs it." This produces modules that are simultaneously rigid (because the knobs do not quite line up with what people actually want to configure) and complex (because every consumer must understand the full surface).

Our heuristic: design the module to handle the 80% case with three to five required inputs. For the 20% case, either provide a separate module variant or expect consumers to use resources directly. The escape hatch of "use the underlying resource" is healthier than a module with 30 parameters that occasionally produces invalid configurations.
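One way to keep the surface small is to make only the genuinely consumer-specific inputs required and to group optional tuning into a single object with defaults. A sketch with hypothetical variable names; `optional()` with a default value requires Terraform 1.3 or later:

```hcl
# Required: the few decisions every consumer must make.
variable "name" {
  type        = string
  description = "Service name, used as a prefix for all resources."
}

variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# Optional tuning grouped into one object; omitted attributes take these defaults.
variable "scaling" {
  type = object({
    min_size = optional(number, 2)
    max_size = optional(number, 6)
  })
  default = {}

  # Reject invalid combinations at plan time instead of at apply time.
  validation {
    condition     = var.scaling.min_size <= var.scaling.max_size
    error_message = "scaling.min_size must not exceed scaling.max_size."
  }
}
```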

Outputs Are Part of Your API

Module outputs receive far less design attention than they deserve. Once consumers start referencing your outputs, you cannot rename or remove them without breaking changes. Some principles:

Output the resource attributes that consumers will need to reference from elsewhere. Output IDs, ARNs, endpoints -- not implementation details.
Use stable names that describe the conceptual thing, not the underlying resource. An output named database_endpoint survives a migration from RDS to Aurora; one named rds_instance_endpoint does not.
Resist the urge to output everything. If consumers do not need it, omit it. Adding outputs later is easy; removing them is painful.
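Mark sensitive outputs with sensitive = true. Terraform then redacts the value in plan and apply output, which protects against accidental logging at essentially no cost. It is not encryption: the value is still stored in plain text in the state file.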
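A sketch of an outputs file for a database pattern module that follows these rules. It assumes `aws_db_instance.this` and `random_password.master` are defined elsewhere in the module; both names are hypothetical. Consumers would reference the first output as `module.<name>.database_endpoint`:

```hcl
# Conceptual names: nothing here says "rds", so the backing resource can change.
output "database_endpoint" {
  description = "Connection endpoint in host:port form."
  value       = aws_db_instance.this.endpoint
}

output "database_arn" {
  value = aws_db_instance.this.arn
}

# random_password.result is itself marked sensitive, so Terraform rejects
# this output unless it is also declared sensitive.
output "master_password" {
  description = "Initial master password."
  value       = random_password.master.result
  sensitive   = true
}
```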

Version Pinning, Both Ways

Modules should pin their provider version requirements, and consumers should pin their module versions. Both directions matter.

terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

The ~> operator allows minor and patch updates while preventing major version jumps. This is the right tradeoff for most providers: bug fixes flow through, breaking changes require explicit action.
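The precision of the right-hand side decides which component may float. The comments below spell out the ranges for the constraint used above and for its stricter three-part form:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.40"        allows >= 5.40.0, < 6.0.0   (minor and patch updates)
      # "~> 5.40.0"      allows >= 5.40.0, < 5.41.0  (patch updates only)
      # ">= 5.40, < 6.0" is the same range as "~> 5.40", written out explicitly
      version = "~> 5.40"
    }
  }
}
```

In root configurations, the `.terraform.lock.hcl` dependency lock file records the exact provider version selected within this range. Committing it means provider upgrades happen only when someone runs `terraform init -upgrade`.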

For module consumers, use exact version pinning in production:

module "database" {
  source  = "git::https://git.example.com/infra/modules.git//database?ref=v2.4.1"
  # ... module inputs
}

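Pinning ref=v2.4.1 ensures that a new module release does not silently change behavior on the next terraform apply. We have seen modules introduce subtle behavioral changes in patch releases. Pinning means upgrades are deliberate. Git tags can be moved after they are created, so where that is a concern, pin ref to a full commit SHA instead.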
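For modules published to a registry, exact pinning uses the version argument instead of a ref, because Terraform honors version only for registry sources. A minimal sketch; the private registry path is hypothetical:

```hcl
module "database" {
  source  = "app.terraform.io/example-org/database/aws" # hypothetical registry path
  version = "2.4.1"                                     # exact version: no range operator

  # ... module inputs
}
```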

The Mutable Default Trap

A subtle source of bugs: modules that compute defaults dynamically from data sources. Consider a module that defaults its subnet selection to "all subnets in the VPC":

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
}

locals {
  subnet_ids = length(var.subnet_ids) > 0 ? var.subnet_ids : data.aws_subnets.default.ids
}

This works correctly the first time. But when someone adds a new subnet to the VPC for unrelated reasons, the data source returns a longer list, and the next terraform apply proposes updating (and, for some resource types, replacing) every resource that consumed it. Often this is fine. Sometimes it causes unexpected downtime.

Better: require explicit subnet IDs as input. Make the consumer's choice visible in the configuration, not computed dynamically. This is a recurring pattern -- prefer explicit input over implicit discovery.
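The explicit version replaces the data-source fallback with a required input. In this sketch, the "at least two subnets" rule is a hypothetical policy added to show where such checks belong:

```hcl
variable "subnet_ids" {
  type        = list(string)
  description = "Subnets to deploy into, listed explicitly by the caller."
  nullable    = false # no default, so the caller must always supply a list

  # Hypothetical policy: require at least two subnets for availability.
  validation {
    condition     = length(var.subnet_ids) >= 2
    error_message = "Provide at least two subnet IDs."
  }
}

locals {
  # No discovery fallback: adding a subnet to the VPC changes nothing here.
  subnet_ids = var.subnet_ids
}
```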

State Boundaries Matter More Than Module Boundaries

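A common anti-pattern: organizing Terraform code by module, with each module having its own state file. This makes refactoring painful, because moving a resource between modules then means moving it between state files, which requires manual state operations (terraform state mv, or a removed block in one configuration paired with an import block in the other).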

Better: organize state files by lifecycle, not by module. Resources that are created and destroyed together should share state. Resources that have wildly different lifecycles (network foundations vs application deployments) should be separate.
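When the resources share a state file, moving one into a module becomes a declarative change. Terraform 1.1 and later support moved blocks, which record that an object changed address, so the plan shows a move rather than a destroy and create. A sketch that assumes a hypothetical local module exposing a bucket_name input and containing aws_s3_bucket.this:

```hcl
# Before: resource "aws_s3_bucket" "logs" was declared directly in this configuration.
# After: the same bucket is created inside module.logs_bucket.
module "logs_bucket" {
  source      = "../modules/s3-bucket" # hypothetical local path
  bucket_name = "example-prod-logs"    # must match the existing bucket's name
}

# Works only within a single state; cross-state moves still need manual steps.
moved {
  from = aws_s3_bucket.logs
  to   = module.logs_bucket.aws_s3_bucket.this
}
```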

A typical layout that has worked for us:

environments/
  prod/
    foundation/        # VPC, IAM, KMS keys - rarely changes
    data/              # Databases, caches - careful changes
    platform/          # Kubernetes clusters, ingress - moderate change
    applications/      # Per-service deployments - frequent change

Each directory has its own state file but may consume multiple modules. The boundaries reflect blast radius and change frequency, not implementation structure.
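Layers still need values from each other. One option is for the upper layer to read the lower layer's root outputs through the terraform_remote_state data source. A sketch in which the bucket, key, region, output names, and service module path are all hypothetical:

```hcl
# environments/prod/applications/main.tf (hypothetical)
data "terraform_remote_state" "foundation" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # hypothetical state bucket
    key    = "prod/foundation/terraform.tfstate"
    region = "us-east-1"
  }
}

module "orders_service" {
  source = "git::https://git.example.com/infra/modules.git//service?ref=v1.2.0" # hypothetical

  # Only root-level outputs of the foundation layer are visible here.
  vpc_id     = data.terraform_remote_state.foundation.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.foundation.outputs.private_subnet_ids
}
```

Reading remote state requires read access to the whole foundation state file. Teams that want a narrower interface often publish the few shared values to a separate store, such as SSM parameters, instead.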

Testing: Beyond terraform plan

Terraform's built-in testing tools have matured considerably. The terraform test command (stable since Terraform 1.6) provides a real testing framework rather than the ad-hoc validation we used to do with shell scripts.

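For primitive modules, write unit-style tests that run with command = plan, so Terraform computes a plan without creating any real infrastructure. The provider is still configured as usual, so these tests typically need credentials unless the provider is mocked: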

run "validate_default_encryption" {
  command = plan

  variables {
    bucket_name = "test-bucket-name"
  }

  # rule may be a set-typed block, which cannot be indexed with [0]; one() accepts a set or a list
  assert {
    condition     = one(aws_s3_bucket_server_side_encryption_configuration.this.rule).apply_server_side_encryption_by_default[0].sse_algorithm == "AES256"
    error_message = "Default encryption must be AES256"
  }
}
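To run plan-only tests without cloud credentials, Terraform 1.7 added mock_provider blocks to test files. A sketch, assuming the module under test declares aws_s3_bucket_versioning.this with versioning enabled; the file name is hypothetical:

```hcl
# tests/defaults.tftest.hcl (hypothetical file name); requires Terraform 1.7+
mock_provider "aws" {}

run "versioning_enabled_by_default" {
  command = plan

  variables {
    bucket_name = "test-bucket-name"
  }

  # Values set in configuration stay as written; only computed attributes are mocked.
  assert {
    condition     = one(aws_s3_bucket_versioning.this.versioning_configuration).status == "Enabled"
    error_message = "Versioning must be enabled by default"
  }
}
```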

For pattern modules, integration tests against real cloud accounts are unavoidable. We use Terratest for this, deploying into an isolated test account with strict budget alerts. Tests are slow (deploying a VPC takes time) but they catch real integration issues that plan-only tests miss.

Documentation as a First-Class Output

Modules without good documentation are worse than no modules. They impose the cost of figuring out what they do without the benefit of standard patterns. terraform-docs generates the input and output reference automatically, but that is the minimum. Good module documentation also includes:

What problem the module solves and what it does not.
A minimal complete example that actually works.
Common variations (high availability, multi-region) with example configurations.
Known limitations and gotchas.
Upgrade guides for breaking changes.
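"A minimal complete example that actually works" usually means a small root configuration kept in the module repository, so it can be planned, or tested, like any other configuration. A sketch assuming a hypothetical examples/basic layout and a module with a bucket_name input and a bucket_arn output:

```hcl
# examples/basic/main.tf (hypothetical layout)
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

module "bucket" {
  source      = "../.." # the module being documented
  bucket_name = "example-basic-bucket-1234" # hypothetical; bucket names are global
}

output "bucket_arn" {
  value = module.bucket.bucket_arn
}
```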

We enforce this through PR templates and module review checklists. A module without these sections does not get merged.

When to Stop Writing Modules

The hardest discipline is recognizing when not to create a module. If a resource configuration appears in only one place, do not abstract it. If a module would have one consumer, do not write it. If a "module" is really a stylized resource definition that adds no value beyond the underlying provider, delete it.
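A typical candidate for deletion looks like the following hypothetical wrapper. Every input maps one-to-one onto a resource argument, and no default, validation, or related resource is added, so replacing it with a direct aws_sqs_queue resource loses nothing:

```hcl
# Hypothetical pass-through "module": no defaults, no validation, no extra resources.
variable "name" { type = string }
variable "visibility_timeout_seconds" { type = number }
variable "message_retention_seconds" { type = number }

resource "aws_sqs_queue" "this" {
  name                       = var.name
  visibility_timeout_seconds = var.visibility_timeout_seconds
  message_retention_seconds  = var.message_retention_seconds
}

output "arn" {
  value = aws_sqs_queue.this.arn
}
```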

The marginal value of additional modules drops sharply once the obvious patterns are captured. Our team's most valuable modules are perhaps fifteen pattern modules and twenty primitive modules. Anything beyond that tends to be deprecated within a year of creation.

Modules are infrastructure abstractions, and like all abstractions, they have costs as well as benefits. Designed well, they accelerate teams and enforce standards. Designed poorly, they become technical debt that nobody wants to refactor. The principles above will not produce perfect modules, but they will avoid the worst failure modes that consume disproportionate engineering time.