Lesson 15: Deploying Distributed Systems — Docker Compose, Tunnels, and Cross-Network Connectivity

Prerequisites: Lessons 10 (The Sidecar Pattern), 12 (State Management Without a Database)

Introduction

Building a distributed system and deploying it are fundamentally different problems. This lesson traces the deployment spectrum for chatixia-mesh: from terminal windows to Docker Compose, Cloudflare Tunnel, and cross-network WebRTC connectivity.

1. The Deployment Spectrum

Each step solves problems the previous step could not handle.

Single machine (development): Run every component in separate terminals. No networking to debug, but no reproducibility, no isolation, and no way to share the setup.

Docker Compose (local multi-service): One declarative file defines all services, dependencies, and environment. docker compose up --build gives anyone with Docker the same environment. Limitation: everything runs on one machine.

Cross-network (tunnels): When agents span different networks, the registry must be internet-reachable via port forwarding, VPN, or a reverse tunnel like Cloudflare Tunnel. Agents themselves do not need to accept inbound connections: WebRTC's ICE negotiation handles NAT traversal, falling back to a TURN relay when no direct path exists.

Cloud-native (Kubernetes): Adds auto-scaling, rolling deployments, and self-healing. chatixia-mesh does not ship K8s manifests yet. The key challenge: sidecar and agent must stay co-located (same pod) because they share an IPC socket.

Each step is justified by a concrete problem. Docker Compose was added (ADR-015) because onboarding required 3+ commands in different terminals — not because the system needed horizontal scaling.

2. Docker Compose for Multi-Service Systems

The chatixia-mesh docker-compose.yml defines four services: registry, sidecar, agent, and coturn (optional).

Key design patterns

Service DNS: Services reference each other by name. ws://registry:8080/ws resolves via Docker’s built-in DNS — no hardcoded IPs.

Health checks and dependency ordering: The registry health check hits /api/registry/agents every 10 seconds. The sidecar waits for service_healthy on the registry. The agent waits for a healthy registry and a started sidecar (service_started, since the sidecar has no HTTP endpoint).
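
The waiting behavior behind service_healthy can be sketched in a few lines of Python. This is an illustration of the polling idea, not Compose's actual implementation; the function names, the retry count, and the 2xx success rule are assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    interval_s: float = 10.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `retries` attempts are used up."""
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            pass
        if attempt < retries:
            sleep(interval_s)
    return False


def http_probe(url: str, timeout_s: float = 3.0) -> Callable[[], bool]:
    """Build a probe that reports healthy on any 2xx response (assumed rule)."""

    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300

    return probe
```

A dependent service would only start once something like wait_until_healthy(http_probe("http://registry:8080/api/registry/agents")) returns True. A service with no HTTP endpoint, like the sidecar, has nothing to probe, which is why the agent falls back to service_started for it.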

Named volumes for IPC: The sidecar and agent share a Unix socket via a named volume mounted at /run/chatixia in both containers. This gives IPC without coupling container lifecycles — either can restart independently.

Why not share a network namespace? Because that couples lifecycles: restarting the sidecar tears down the agent’s network stack. Named volumes give IPC without coupling, following the same separation-of-concerns principle as the sidecar pattern itself.
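
To make the shared-socket idea concrete, here is a minimal sketch of two parties talking over a Unix socket that lives in a shared directory. A temporary directory stands in for the /run/chatixia volume, and the socket file name, the message format, and the function names are all hypothetical; the real sidecar protocol is not shown in this lesson.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock_path: str, ready: threading.Event) -> None:
    """Stand-in for the sidecar: accept one connection and acknowledge it."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # a stale socket file from a previous run blocks bind()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(sock_path)
        server.listen(1)
        ready.set()  # signal that the socket file now exists
        conn, _ = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ack:" + data)


def send_message(sock_path: str, payload: bytes) -> bytes:
    """Stand-in for the agent: connect by path, send, and read the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(payload)
        return client.recv(1024)


def demo() -> bytes:
    # The temporary directory plays the role of the shared named volume.
    with tempfile.TemporaryDirectory() as shared_dir:
        sock_path = os.path.join(shared_dir, "sidecar.sock")
        ready = threading.Event()
        server = threading.Thread(target=serve_once, args=(sock_path, ready))
        server.start()
        ready.wait(timeout=5)
        reply = send_message(sock_path, b"ping")
        server.join(timeout=5)
        return reply
```

The only thing the two sides share is a filesystem path. Neither needs the other's network namespace, which is what lets either container restart without disturbing the other's network stack.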

Profiles for optional services: coturn is declared with profiles: [turn] and only starts with docker compose --profile turn up, keeping the default stack simple.

Environment variables with defaults: ${VAR:-default} syntax means docker compose up works out of the box, while production deployments override via .env or shell variables.
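
The substitution rule is small enough to model directly. The sketch below mimics how ${VAR:-default} resolves, assuming the usual shell-style semantics where the default applies when the variable is unset or empty. It covers only this one form, and the variable names in the usage note are hypothetical.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; other interpolation forms are ignored.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(template: str, env: Mapping[str, str]) -> str:
    """Expand ${VAR} and ${VAR:-default} references against `env`."""

    def substitute(match: re.Match) -> str:
        value = env.get(match.group("name"), "")
        if value:
            return value
        # Unset or empty: use the default if one was given, else empty string.
        return match.group("default") or ""

    return _PATTERN.sub(substitute, template)
```

With an empty environment, expand("ws://${REGISTRY_HOST:-registry}:${PORT:-8080}/ws", {}) yields ws://registry:8080/ws, and passing {"PORT": "9090"} overrides only the port.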

3. Multi-Stage Docker Builds

Multi-stage builds use one image for building and a minimal image for running. Only compiled artifacts are copied forward.

The registry: three stages

Stage 1 (Node.js): Installs dependencies with pnpm install --frozen-lockfile, so the lockfile is honored exactly, then builds the hub static assets with pnpm build.

Stage 2 (Rust): Compiles the registry binary. Uses a dependency caching trick: copy only Cargo.toml files first and build with stub main.rs files, so Docker caches the dependency layer. Subsequent builds with only source changes skip dependency compilation entirely (5+ minutes down to 30-60 seconds).
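
Why the ordering matters can be shown with a toy model of layer caching. It assumes each layer's cache key depends on the parent layer, the instruction text, and the contents of any files the step copies in. That is a simplification of Docker's real cache logic, and the step list is a hypothetical outline rather than the project's actual Dockerfile.

```python
import hashlib
from typing import Dict, List, Tuple

Step = Tuple[str, List[str]]  # (instruction text, files the step copies in)


def layer_keys(steps: List[Step], files: Dict[str, str]) -> List[str]:
    """Compute a chained cache key per step: parent + instruction + inputs."""
    keys: List[str] = []
    parent = ""
    for instruction, inputs in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in sorted(inputs):
            h.update(name.encode())
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(before: List[str], after: List[str]) -> int:
    """Count leading layers whose keys match, i.e. layers served from cache."""
    count = 0
    for old, new in zip(before, after):
        if old != new:
            break
        count += 1
    return count


# Hypothetical outline of the dependency-first ordering described above.
STEPS: List[Step] = [
    ("COPY Cargo.toml", ["Cargo.toml"]),
    ("RUN cargo build --release (stub main.rs, dependencies only)", []),
    ("COPY src", ["src/main.rs"]),
    ("RUN cargo build --release", []),
]
```

In this model, editing src/main.rs leaves the first two keys unchanged, so the dependency build is reused. Editing Cargo.toml changes the first key and therefore every key after it.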

Stage 3 (Runtime): debian:bookworm-slim with only the compiled binary (~15 MB), hub assets (~2 MB), ca-certificates, and curl (for health checks). Result: ~50-80 MB, vs ~1.7 GB if build tools were included.

Small images are also a security benefit: every package in the runtime image is attack surface. Build tools like gcc and make are exactly what an attacker would reach for after gaining container access.

The agent: uv for Python

The agent Dockerfile uses COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv to grab the uv binary from a pre-built image, then uv sync --frozen --no-dev for reproducible dependency installation.

4. Exposing Services with Cloudflare Tunnel

Cloudflare Tunnel creates an outbound connection from your machine to Cloudflare’s edge. Traffic flows: Remote agent --> Cloudflare edge (HTTPS) --> Tunnel --> localhost:8080. No inbound ports, no firewall rules, no dynamic DNS.

Quick tunnel (temporary): cloudflared tunnel --url http://localhost:8080 prints a temporary, randomly generated trycloudflare.com URL. No account needed, but the URL changes on every run.

Persistent tunnel: Create a named tunnel, configure ingress rules mapping hostnames to local services, create a DNS record, and run as a systemd service for boot persistence.

Critical limitation: no UDP

Cloudflare Tunnel proxies HTTP/WebSocket (layer 7) but not raw UDP (layer 4). WebSocket signaling works through the tunnel. TURN relay does not — coturn needs UDP on port 3478 and must be on a host with a public IP or port-forwarded UDP. The registry can hide behind a tunnel, but the TURN relay cannot.

5. Cross-Network Connectivity

When agents span different networks, the transport layer encounters NAT and firewalls. chatixia-mesh handles this with three connectivity tiers that degrade gracefully:

Tier	Path	Latency	When used
1	Direct P2P DataChannel	<100ms	Both peers have open UDP (same LAN, permissive NAT)
2	TURN relay	~50-200ms	NAT/firewall blocks direct UDP, TURN available
3	HTTP task queue via registry	3-15s	All UDP blocked, no TURN configured

The system degrades rather than fails: as long as both agents can reach the registry over HTTP, tasks are still delivered, only more slowly. ICE negotiation tries Tier 1 and falls back to Tier 2 if TURN is configured; the application layer falls back to Tier 3 if no DataChannel forms.

Enterprise VPNs typically land on Tier 2 (with TURN) or Tier 3 (without). Home-to-home connections often achieve Tier 1.
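
The fallback order in the table can be written as a small decision function. This is a simplified sketch: real ICE negotiation checks candidate pairs rather than consulting three booleans, and the names below are illustrative rather than taken from the chatixia-mesh codebase.

```python
from enum import IntEnum


class Tier(IntEnum):
    DIRECT_P2P = 1       # DataChannel over a direct UDP path
    TURN_RELAY = 2       # DataChannel relayed through TURN
    HTTP_TASK_QUEUE = 3  # no DataChannel; tasks go via the registry


def select_tier(
    direct_udp_works: bool,
    turn_configured: bool,
    turn_reachable: bool,
) -> Tier:
    """Return the best tier available under the given network conditions."""
    if direct_udp_works:
        return Tier.DIRECT_P2P
    if turn_configured and turn_reachable:
        return Tier.TURN_RELAY
    # No DataChannel can form, so the application layer takes over.
    return Tier.HTTP_TASK_QUEUE
```

For an enterprise VPN that blocks direct UDP, select_tier(False, True, True) gives Tier 2, while the same network without TURN, select_tier(False, False, False), gives Tier 3.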

6. TURN Relay Setup

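coturn in the Compose file uses --use-auth-secret for ephemeral credentials. The registry generates time-limited TURN credentials by computing an HMAC-SHA1, keyed with a shared secret, over a username that embeds an expiry timestamp. Clients receive only the derived username and password, never the secret itself.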
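
The credential scheme that --use-auth-secret expects can be sketched as follows: the username is an expiry timestamp followed by a user label, and the password is the base64-encoded HMAC-SHA1 of that username, keyed with the shared secret. The function name, the one-hour default lifetime, and the user label are assumptions for illustration; the registry itself is written in Rust, so this is not its actual code.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def make_turn_credentials(
    shared_secret: str,
    user_id: str,
    ttl_s: int = 3600,
    now: Optional[float] = None,
) -> Tuple[str, str]:
    """Derive a (username, password) pair that expires after `ttl_s` seconds."""
    issued_at = now if now is not None else time.time()
    expiry = int(issued_at) + ttl_s
    # The TURN server reads the expiry back out of the username.
    username = f"{expiry}:{user_id}"
    digest = hmac.new(
        shared_secret.encode(), username.encode(), hashlib.sha1
    ).digest()
    password = base64.b64encode(digest).decode()
    return username, password
```

Because coturn holds the same secret, it can recompute the HMAC from the username alone and reject credentials whose timestamp has passed, with no per-user state stored on the relay.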

coturn must be reachable on UDP 3478 from both peers, meaning a public IP or port-forwarded UDP. Managed alternatives (Metered.ca, Xirsys, Twilio) work with the registry’s credential generation if they support use-auth-secret mode.

If all agents are on the same LAN or you accept Tier 3 latency, you can skip TURN entirely.

Summary

Level	Method	Solves	Does not solve
Dev	Manual terminals	Quick iteration	Reproducibility
Local	Docker Compose	Reproducibility, ordering, isolation	Cross-network
Cross-network	Compose + Tunnel + TURN	Internet-reachable registry, NAT traversal	Auto-scaling
Production	Kubernetes	All above + orchestration	Complexity budget

Key takeaways: Multi-stage builds shrink images from gigabytes to tens of megabytes. Health checks prevent startup race conditions. Named volumes share IPC without coupling lifecycles. Cloudflare Tunnel exposes the registry without opening inbound ports but cannot proxy UDP. Connectivity degrades gracefully through three tiers. Each deployment step is justified by a concrete problem.

Previous: Lesson 14: Threat Modeling | Next: Lesson 16: Architecture Decision Records
