BRO SRE

Reliability practices, infrastructure, automation


OpenTelemetry Collector Pipelines: A Production-Ready Architecture

2026-01-15 · OpenTelemetry, Observability, Architecture

When we first adopted OpenTelemetry, we started with a single collector instance receiving traces from a handful of services. Within six months, we were processing over two million spans per second across four clusters. The journey from that initial deployment to a production-grade pipeline taught us hard lessons about architecture, sampling, and the subtle art of not drowning in telemetry data.

Agent vs. Gateway: Why We Use Both

The OpenTelemetry Collector can be deployed in two primary patterns: as an agent (a sidecar or DaemonSet running alongside your applications) and as a gateway (a standalone, centralized service). After experimenting with both in isolation, we settled on a two-tier architecture that combines the strengths of each.

The agent tier runs as a Kubernetes DaemonSet on every node. Its job is minimal: receive spans over OTLP from local pods, attach resource attributes (node name, cluster, availability zone), and forward everything to the gateway tier with minimal processing. Keeping the agent lightweight is critical. A memory-hungry agent competes with your workloads for node resources, and a crashing agent means you lose telemetry for every pod on that node.

The gateway tier is where the real work happens. We run it as a Deployment with horizontal pod autoscaling, fronted by an internal load balancer. This tier handles sampling decisions, span filtering, attribute enrichment, and multi-backend export. Centralizing these operations means we change pipeline logic in one place rather than rolling out DaemonSet updates across every node.
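For reference, a minimal sketch of what that autoscaling setup could look like (the names, namespace, and thresholds here are assumptions, not our exact manifests):

```yaml
# Hypothetical HPA for the gateway Deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway
  namespace: observability
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway
  minReplicas: 6
  maxReplicas: 24
  metrics:
    - type: Resource
      resource:
        name: cpu             # CPU-based scaling; memory is capped by memory_limiter anyway
        target:
          type: Utilization
          averageUtilization: 70
```

One caveat: each scale event reshuffles which gateway instance owns which trace ID, so aggressive scaling can briefly degrade tail-sampling coherence. Conservative thresholds and a generous stabilization window help.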

# Agent DaemonSet - minimal config
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  resourcedetection:
    detectors: [env, system, gcp]
    timeout: 5s
  batch:
    send_batch_size: 1024
    timeout: 200ms
exporters:
  otlp:
    endpoint: otel-gateway.observability.svc:4317
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca.crt
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp]

Pipeline Configuration: Receivers, Processors, Exporters

A collector pipeline is a directed graph with three stages. Getting the order of processors right matters more than most documentation suggests.

Our gateway pipeline processes spans in this order: memory_limiter first (always), then k8sattributes for pod metadata enrichment, filter to drop noise, tail_sampling for intelligent retention decisions, transform for attribute normalization, and finally batch before export. The memory limiter must come first because it is the safety valve that prevents the collector from being OOM-killed under load spikes.

# Gateway pipeline - processing chain
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 4096
    spike_limit_mib: 512
  k8sattributes:
    auth_type: "serviceAccount"
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
  filter/drop-health:
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'attributes["http.route"] == "/readyz"'
        - 'attributes["http.route"] == "/metrics"'
  tail_sampling:
    decision_wait: 10s
    num_traces: 200000
    policies:
      - name: errors-always
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 2000}
      - name: baseline-sample
        type: probabilistic
        probabilistic: {sampling_percentage: 5}

Tail-Based Sampling: The Hard Parts

Head-based sampling (deciding at the start of a trace whether to keep it) is simple but blind: at trace creation time you cannot know whether this request will be the one that fails or takes 30 seconds. Tail-based sampling holds a trace's spans until it is complete (or a decision timeout elapses), examines the whole trace, and then decides.

The cost is memory. The collector must buffer all spans for a trace until the decision_wait timer expires. At two million spans per second with a 10-second wait window, that is roughly 20 million spans in memory. We set num_traces to 200,000 concurrent traces, which in our traffic pattern covers the p99 trace duration with headroom.

Our sampling policy is layered. All error traces are kept. All traces with any span exceeding two seconds are kept. Everything else is sampled at 5%. This gives us complete visibility into failures and performance outliers while keeping storage costs manageable. Before implementing this strategy, our Tempo storage was growing at 800 GB per day. After: 60 GB per day, with no loss of diagnostic value.

Filtering Noisy Spans

Health check endpoints, readiness probes, and Prometheus scrapes generate enormous volumes of spans with zero diagnostic value. The filter processor drops these before they reach the sampling stage, which both reduces memory pressure on the sampler and avoids polluting your trace data.

We also discovered that certain client libraries generate internal spans for connection pool management and DNS resolution. These are useful during initial debugging but become noise at scale. We filter spans where span.kind is INTERNAL and the span name matches patterns like dns.resolve or connection.checkout.
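A sketch of that filter using OTTL conditions (the name patterns here are illustrative, not the exact ones we match on):

```yaml
# Drop client-library bookkeeping spans - patterns are examples
filter/drop-internal:
  error_mode: ignore
  traces:
    span:
      - kind == SPAN_KIND_INTERNAL and IsMatch(name, "dns\\.resolve.*")
      - kind == SPAN_KIND_INTERNAL and IsMatch(name, "connection\\.checkout.*")
```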

Multi-Backend Export: Jaeger and Grafana Tempo

We export traces to two backends simultaneously. Grafana Tempo serves as our long-term storage (30-day retention, S3-backed, cost-effective for high-cardinality queries via TraceQL). Jaeger runs with a 48-hour retention window for real-time debugging during incidents. The dual-export setup uses two named instances of the otlp exporter, one per backend, attached to the same pipeline. (The collector's dedicated jaeger exporter has been removed; Jaeger ingests OTLP natively, so plain otlp exporters cover both.)

exporters:
  otlp/tempo:
    endpoint: tempo-distributor.observability.svc:4317
    tls:
      insecure: true
  otlp/jaeger:
    endpoint: jaeger-collector.observability.svc:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors:
        - memory_limiter
        - k8sattributes
        - filter/drop-health
        - tail_sampling
        - transform
        - batch
      exporters: [otlp/tempo, otlp/jaeger]

Resource Detection and Attribute Enrichment

Raw spans from application code typically carry service name and little else. The resourcedetection processor on the agent tier automatically discovers cloud provider metadata (project ID, region, availability zone on GCP; account ID, region on AWS). The k8sattributes processor on the gateway tier adds Kubernetes context: pod name, namespace, deployment, and labels.

This enrichment is what makes traces queryable. Without it, finding all traces from a specific deployment in a specific namespace requires application developers to manually inject those attributes, which they will not consistently do.
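One subtlety in a two-tier setup: because the gateway receives spans from agents rather than directly from pods, associating spans to pods by connection IP would resolve to the agent, not the workload. A sketch of pod_association that prefers an explicit attribute (assuming something upstream, such as the SDK or agent, stamps k8s.pod.ip):

```yaml
k8sattributes:
  auth_type: "serviceAccount"
  pod_association:
    # prefer an attribute stamped upstream; the connection fallback
    # only works when the collector talks to pods directly
    - sources:
        - from: resource_attribute
          name: k8s.pod.ip
    - sources:
        - from: connection
  extract:
    metadata:
      - k8s.pod.name
      - k8s.namespace.name
      - k8s.deployment.name
```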

Scaling to 2M Spans per Second

At high throughput, the gateway tier becomes the bottleneck. We addressed this with several changes. First, we enabled gRPC with compression (gzip) between agents and gateways, cutting network bandwidth by roughly 60%. Second, we tuned the batch processor to send batches of 2048 spans every 200 milliseconds, reducing the number of export calls. Third, we ran the gateway as 12 replicas behind a headless service, using the loadbalancing exporter on agents to distribute traces by trace ID. This ensures all spans for a given trace land on the same gateway instance, which is a requirement for tail-based sampling to work correctly.

# Agent exporter for trace-aware load balancing
exporters:
  loadbalancing:
    protocol:
      otlp:
        compression: gzip  # gzip between agents and gateways
        tls:
          insecure: false
          ca_file: /etc/ssl/certs/ca.crt
    resolver:
      dns:
        hostname: otel-gateway-headless.observability.svc
        port: 4317

We monitor collector health with dedicated metrics: otelcol_exporter_sent_spans, otelcol_processor_dropped_spans, otelcol_receiver_refused_spans, and memory utilization. A drop in sent_spans or a spike in refused_spans triggers an alert. The collectors themselves export Prometheus metrics on port 8888, scraped by our existing monitoring stack.
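As a sketch, alerting on those signals might look like the following Prometheus rules (thresholds, durations, and label names are assumptions, not our production values):

```yaml
groups:
  - name: otel-collector
    rules:
      - alert: OtelCollectorRefusingSpans
        expr: sum(rate(otelcol_receiver_refused_spans[5m])) > 0
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Collector refusing spans (backpressure or memory_limiter engaged)"
      - alert: OtelCollectorExportStalled
        expr: sum(rate(otelcol_exporter_sent_spans[5m])) == 0
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Collector export throughput dropped to zero"
```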

Lessons Learned

Three things we wish we had known from the start. First, always set memory_limiter as the first processor. We learned this after an OOM kill cascade took out the entire gateway tier during a traffic spike. Second, tail-based sampling and the loadbalancing exporter are inseparable at scale; without trace-aware routing, sampling decisions are made on incomplete traces and produce incoherent results. Third, invest in filtering before sampling. Every noisy span that reaches the sampler consumes memory and budget that could be spent on spans that actually matter.

The OpenTelemetry Collector is not a deploy-and-forget component. It is infrastructure that requires the same rigor you apply to your databases and load balancers. Treat it accordingly, and it will serve as the backbone of an observability platform that scales with your organization.