Lesson 08: Authentication in Distributed Systems

API Keys, JWTs, and Device Pairing

Prerequisites: Lesson 05 — Signaling Protocol Design, Lesson 07 — Application Protocol Design

Key source files:

registry/src/auth.rs — AuthState, JWT issuance and validation, TURN credential generation
registry/src/pairing.rs — invite code generation, redemption, approval pipeline
registry/src/main.rs — WebSocket upgrade with JWT validation

1. Authentication vs Authorization

Every distributed system must answer two questions per request:

Authentication (AuthN): Who are you? Prove your identity.
Authorization (AuthZ): What are you allowed to do?

In a mesh network, the stakes are high — every authenticated peer can communicate with every other peer. Without authentication, attackers join the mesh. Without authorization, a legitimate peer can submit tasks to agents it should not control or exfiltrate data.

chatixia-mesh implements authentication at the HTTP and WebSocket layers but has intentional gaps in authorization. Understanding where those gaps are is a core objective of this lesson.

2. API Key to JWT Exchange

Agents authenticate via a two-step process: present a long-lived credential, receive a short-lived JWT.

Agent                              Registry
  |  POST /api/token                  |
  |  Header: X-API-Key: ak_dev_001    |
  |---------------------------------->|
  |                                   |  Look up key -> peer_id + role
  |                                   |  Sign JWT (exp = now + 300s)
  |  { token, peer_id, role }         |
  |<----------------------------------|
  |  GET /ws?token=eyJ...             |
  |---------------------------------->|  Validate JWT on upgrade
  |  <websocket established>          |

The handler (registry/src/auth.rs) checks X-API-Key first, then X-Device-Token (for paired agents). If neither is valid, it returns 401.
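The lookup order can be sketched as a small function over in-memory tables. This is a minimal illustration, not the registry's code: `Identity`, `CredentialStore`, and `resolve_credential` are hypothetical names, header names are assumed to be already lowercased, and returning a bare `u16` status stands in for a real HTTP error response.

```rust
use std::collections::HashMap;

/// Who a credential belongs to (hypothetical; mirrors the { peer_id, role } response).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub peer_id: String,
    pub role: String,
}

/// Hypothetical credential tables; the real registry may store these differently.
pub struct CredentialStore {
    pub api_keys: HashMap<String, Identity>,      // "ak_..." -> identity
    pub device_tokens: HashMap<String, Identity>, // "dt_..." -> identity (approved devices)
}

impl CredentialStore {
    /// Checks X-API-Key first, then X-Device-Token; Err(401) if neither is valid.
    /// Assumes header names were lowercased by the caller.
    pub fn resolve_credential(&self, headers: &HashMap<String, String>) -> Result<Identity, u16> {
        if let Some(key) = headers.get("x-api-key") {
            if let Some(id) = self.api_keys.get(key) {
                return Ok(id.clone());
            }
        }
        if let Some(token) = headers.get("x-device-token") {
            if let Some(id) = self.device_tokens.get(token) {
                return Ok(id.clone());
            }
        }
        Err(401) // neither header carried a valid credential
    }
}
```

In this sketch an invalid API key falls through to the device-token check, so the request fails only when neither credential is valid.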

Short-Lived Tokens

The JWT has a 5-minute TTL (exp = now + 300). If a token is intercepted from logs or network traffic, the attacker can use it for at most 5 minutes. The sidecar transparently exchanges its long-lived credential for a fresh JWT when the current one expires.
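One way to make the re-exchange transparent is to cache the JWT together with its expiry and fetch a new one shortly before `exp`. In this sketch `TokenCache`, the refresh margin, and the `fetch` closure (standing in for the `POST /api/token` call) are all assumptions, not the sidecar's actual code.

```rust
/// Hypothetical sidecar-side cache for the short-lived JWT.
pub struct TokenCache {
    token: Option<String>,
    exp: u64,            // Unix seconds, copied from the token's exp claim
    refresh_margin: u64, // refresh this many seconds before expiry
}

impl TokenCache {
    pub fn new(refresh_margin: u64) -> Self {
        TokenCache { token: None, exp: 0, refresh_margin }
    }

    /// Returns a usable token, calling `fetch` only when none is cached or expiry is near.
    /// `fetch` stands in for POST /api/token and returns (token, exp).
    pub fn get(&mut self, now: u64, fetch: impl FnOnce() -> (String, u64)) -> String {
        let stale = self.token.is_none() || now + self.refresh_margin >= self.exp;
        if stale {
            let (token, exp) = fetch();
            self.token = Some(token);
            self.exp = exp;
        }
        self.token.clone().expect("token was just set or already cached")
    }
}
```

Refreshing a little early avoids presenting a token that expires while the WebSocket upgrade is in flight.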

Claims

pub struct Claims {
    pub sub: String,  // peer_id
    pub role: String, // "agent"
    pub exp: usize,   // expiry time, Unix seconds (iat + 300)
    pub iat: usize,   // issued-at time, Unix seconds
}

The sub field is critical for sender verification: the registry checks that every signaling message’s peer_id matches the JWT’s sub claim.

3. JWT Mechanics

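A JWT is three Base64url-encoded parts joined by dots (header.payload.signature): the Header names the algorithm, the Payload carries the claims, and the Signature is computed over the encoded header and payload.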
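Because the parts are only encoded, not encrypted, anyone holding a token can read its header and claims; the signature only prevents modification. The sketch below builds the `header.payload` signing input and decodes a payload back, using a hand-rolled unpadded Base64url codec so it needs no crates. The function names and JSON strings are illustrative; a real implementation would rely on a JWT library.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Base64url encoding without '=' padding, as used for JWT segments.
pub fn b64url_encode(input: &[u8]) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // 3 input bytes -> 4 chars; a final chunk of k bytes -> k + 1 chars.
        for i in 0..=chunk.len() {
            out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
        }
    }
    out
}

/// Inverse of `b64url_encode`; returns None for characters outside the alphabet.
pub fn b64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut bits: u32 = 0;
    let mut nbits = 0;
    let mut out = Vec::new();
    for c in input.bytes() {
        let v = ALPHABET.iter().position(|&a| a == c)? as u32;
        bits = (bits << 6) | v;
        nbits += 6;
        if nbits >= 8 {
            nbits -= 8;
            out.push((bits >> nbits) as u8); // emit the oldest 8 buffered bits
            bits &= (1 << nbits) - 1;
        }
    }
    Some(out)
}

/// The bytes an HS256 signature covers: base64url(header) "." base64url(payload).
pub fn signing_input(header_json: &str, payload_json: &str) -> String {
    format!("{}.{}", b64url_encode(header_json.as_bytes()), b64url_encode(payload_json.as_bytes()))
}
```

Encoding the standard header `{"alg":"HS256","typ":"JWT"}` yields the familiar `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9` prefix seen on HS256 tokens.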

chatixia-mesh uses HMAC-SHA256 (symmetric): the same SIGNALING_SECRET signs and verifies tokens. Only the registry holds this secret. If it leaks, anyone can forge tokens.

On WebSocket upgrade, the registry validates:

Signature — HMAC valid?
Expiration — exp in the future?
Structure — deserializes to Claims?
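The three checks can be ordered so that nothing in the payload is trusted before the signature passes. In this sketch, `verify` stands in for the HMAC-SHA256 comparison with SIGNALING_SECRET and `decode` stands in for Base64url decoding plus JSON deserialization; in practice a JWT library performs both, so these closures, `TokenError`, and `validate_token` are hypothetical.

```rust
/// Claims as defined in this lesson; exp and iat are Unix seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Claims {
    pub sub: String,
    pub role: String,
    pub exp: usize,
    pub iat: usize,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    Malformed,
    BadSignature,
    Expired,
}

/// Hypothetical validator showing the order of checks on WebSocket upgrade.
pub fn validate_token(
    token: &str,
    now: usize,
    verify: impl Fn(&str, &str) -> bool,       // (signing_input, signature) -> valid?
    decode: impl Fn(&str) -> Option<Claims>,   // payload segment -> claims
) -> Result<Claims, TokenError> {
    // Structure (part 1): exactly three dot-separated segments.
    let parts: Vec<&str> = token.split('.').collect();
    if parts.len() != 3 {
        return Err(TokenError::Malformed);
    }
    // Signature: checked over "header.payload" before any claim is trusted.
    let signing_input = format!("{}.{}", parts[0], parts[1]);
    if !verify(&signing_input, parts[2]) {
        return Err(TokenError::BadSignature);
    }
    // Structure (part 2): the payload must deserialize into Claims.
    let claims = decode(parts[1]).ok_or(TokenError::Malformed)?;
    // Expiration: exp must lie strictly in the future.
    if claims.exp <= now {
        return Err(TokenError::Expired);
    }
    Ok(claims)
}
```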

The token is passed as a query parameter (/ws?token=...) because browser WebSocket APIs do not support custom headers. The trade-off is that tokens appear in server and proxy logs. The 5-minute TTL mitigates this: a token recovered from a log has most likely expired by the time anyone reads it.
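Extracting the token from the upgrade request is ordinary query-string parsing, which a web framework's query extractor would normally handle. The std-only function below is a hypothetical sketch; it relies on the fact that Base64url characters and `.` are all URL-safe, so a well-formed JWT needs no percent-decoding.

```rust
/// Extracts the `token` query parameter from a request target such as "/ws?token=eyJ...".
/// Returns None if there is no query string or the parameter is missing or empty.
pub fn token_from_target(target: &str) -> Option<&str> {
    let (_path, query) = target.split_once('?')?;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "token")
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}
```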

After upgrade, sender verification continues on every message:

if sm.peer_id != peer_id {
    error!("[WS] peer_id mismatch: expected={}, got={}", peer_id, sm.peer_id);
    continue;
}

4. Device Pairing

API keys require manual provisioning. Device pairing provides a zero-configuration onboarding flow using a 6-digit invite code.

The Flow

Admin                  Registry                New Agent
  |                       |                        |
  | POST /generate-code   |                        |
  |---------------------->|                        |
  | { code: "482917" }    |                        |
  |<----------------------|                        |
  |                       |                        |
  | (tell code to agent)  |                        |
  |                       | POST /pair             |
  |                       | { code, agent_name }   |
  |                       |<-----------------------|
  |                       | rate limit + validate  |
  |                       | -> pending_approval    |
  |                       |----------------------->|
  |                       |                        |
  | POST /approve         |                        |
  |---------------------->|                        |
  |                       | generate device_token  |
  | { device_token }      |                        |
  |<----------------------|                        |
  |                       |                        |
  | (deliver token)       | POST /api/token        |
  |                       | X-Device-Token: dt_... |
  |                       |<-----------------------|
  |                       | { JWT, peer_id }       |
  |                       |----------------------->|

Step 1: Admin generates a 6-digit code (300s TTL, single-use).
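Issuance and single-use redemption can be modeled with a map from code to expiry. `InviteCodes` and its methods are hypothetical, and randomness is passed in so the sketch stays dependency-free; a real registry would draw it from a cryptographically secure generator.

```rust
use std::collections::HashMap;

const CODE_TTL_SECS: u64 = 300;

/// Hypothetical store of outstanding invite codes: code -> expiry (Unix seconds).
pub struct InviteCodes {
    codes: HashMap<String, u64>,
}

impl InviteCodes {
    pub fn new() -> Self {
        InviteCodes { codes: HashMap::new() }
    }

    /// Formats a 6-digit code from caller-supplied randomness. Zero-padding keeps codes
    /// like 000417 at six digits; the modulo leaves a slight bias toward low codes.
    pub fn generate(&mut self, random: u32, now: u64) -> String {
        let code = format!("{:06}", random % 1_000_000);
        self.codes.insert(code.clone(), now + CODE_TTL_SECS);
        code
    }

    /// Single-use: the code is removed on first redemption, whether or not it expired.
    pub fn redeem(&mut self, code: &str, now: u64) -> bool {
        match self.codes.remove(code) {
            Some(expiry) => now < expiry,
            None => false,
        }
    }
}
```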

Step 2: Agent redeems the code via the unauthenticated /pair endpoint. Three checks: rate limiting (5 attempts/IP/60s), format validation, code consumption. On success, a peer_id is assigned and status is pending_approval.
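A per-IP sliding-window limiter matching the stated policy (5 attempts per IP per 60 seconds) might look like the following. `PairingLimiter` is a hypothetical name, and the registry may use a different windowing scheme.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::IpAddr;

/// Hypothetical sliding-window limiter: at most `max` attempts per IP in any `window` seconds.
pub struct PairingLimiter {
    max: usize,
    window: u64,
    attempts: HashMap<IpAddr, VecDeque<u64>>, // timestamps of allowed attempts per IP
}

impl PairingLimiter {
    pub fn new(max: usize, window: u64) -> Self {
        PairingLimiter { max, window, attempts: HashMap::new() }
    }

    /// Returns true (and records the attempt) if `ip` is still under its limit at `now`.
    pub fn check(&mut self, ip: IpAddr, now: u64) -> bool {
        let (max, window) = (self.max, self.window);
        let log = self.attempts.entry(ip).or_default();
        // Forget attempts that have slid out of the window.
        while log.front().map_or(false, |&t| now.saturating_sub(t) >= window) {
            log.pop_front();
        }
        if log.len() >= max {
            return false; // rejected attempts are not recorded
        }
        log.push_back(now);
        true
    }
}
```

A production limiter would also evict entries for idle IPs so the map does not grow without bound.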

Step 3: Admin approves. The registry generates a device token (dt_ + 32 hex chars = 128 bits of randomness).
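The token format is easy to reproduce: 16 random bytes rendered as 32 hex characters give the 128 bits stated above. `format_device_token` is a hypothetical helper, and the caller is assumed to supply bytes from a cryptographically secure generator.

```rust
/// Formats a device token as "dt_" followed by 32 lowercase hex characters.
/// The 16 bytes (16 * 8 = 128 bits) must come from a CSPRNG; they are passed in
/// so this sketch has no dependencies.
pub fn format_device_token(random_bytes: [u8; 16]) -> String {
    let mut token = String::with_capacity(3 + 32);
    token.push_str("dt_");
    for byte in random_bytes.iter() {
        token.push_str(&format!("{:02x}", byte)); // two hex digits per byte
    }
    token
}
```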

Step 4: Agent exchanges the device token for JWTs, same as API key agents.

Lifecycle States

pending_approval --approve--> approved --revoke--> revoked
                 \--reject--> rejected

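Revocation takes effect at the next JWT exchange: the registry rejects the revoked device token, so no new JWTs are issued for it. A JWT issued before revocation remains usable until its exp, because validation checks only the signature, expiry, and structure.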
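Encoding the lifecycle as an enum makes the allowed transitions explicit, and the JWT exchange then only has to check for the approved state. `DeviceState`, `Action`, and the method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeviceState {
    PendingApproval,
    Approved,
    Rejected, // terminal
    Revoked,  // terminal
}

#[derive(Debug, Clone, Copy)]
pub enum Action {
    Approve,
    Reject,
    Revoke,
}

impl DeviceState {
    /// Applies an admin action; any transition not in the diagram is refused.
    pub fn apply(self, action: Action) -> Result<DeviceState, &'static str> {
        match (self, action) {
            (DeviceState::PendingApproval, Action::Approve) => Ok(DeviceState::Approved),
            (DeviceState::PendingApproval, Action::Reject) => Ok(DeviceState::Rejected),
            (DeviceState::Approved, Action::Revoke) => Ok(DeviceState::Revoked),
            _ => Err("transition not allowed"),
        }
    }

    /// Only approved devices may exchange their device token for a JWT.
    pub fn may_exchange_token(self) -> bool {
        self == DeviceState::Approved
    }
}
```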

Rate Limiting

The pairing endpoint is the only unauthenticated path to mesh access. With 5 attempts per IP per 60 seconds and a 5-minute code TTL, an attacker using a single IP address gets at most 25 guesses against 1,000,000 possible codes (a 0.0025% success probability). Even a correct guess only reaches pending_approval, so admin approval is still required (although Section 7 lists the approve endpoint itself as unauthenticated).
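The 25-guess bound is the per-window limit multiplied by the number of windows in a code's lifetime. This assumes one source IP and a sliding 60-second window; with fixed windows that do not line up with code creation, a 300-second lifetime can touch six windows (up to 30 guesses, 0.003%), and an attacker with k addresses gets k times as many guesses.

```latex
\underbrace{5}_{\text{attempts per window}} \times \underbrace{\frac{300\ \text{s}}{60\ \text{s}}}_{\text{windows per code lifetime}} = 25\ \text{guesses},
\qquad
P(\text{success}) \le \frac{25}{10^{6}} = 2.5 \times 10^{-5} = 0.0025\%
```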

5. Ephemeral TURN Credentials

When peers cannot connect directly, they relay through a TURN server. chatixia-mesh uses the coturn use-auth-secret pattern to generate short-lived credentials without a user database.

The registry and TURN server share TURN_SECRET. The registry generates credentials:

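// Assumes the hmac, sha1, and base64 (0.21+) crates; now_secs() returns Unix seconds.
use base64::{engine::general_purpose::STANDARD, Engine as _};
use hmac::{Hmac, Mac};
use sha1::Sha1;

fn generate_turn_credentials(secret: &str, ttl_secs: u64) -> (String, String) {
    let expiry = now_secs() + ttl_secs;
    let username = format!("{}:mesh", expiry); // "<expiry>:<label>", as coturn expects
    let mut mac = Hmac::<Sha1>::new_from_slice(secret.as_bytes()).expect("HMAC accepts any key length");
    mac.update(username.as_bytes());
    let password = STANDARD.encode(mac.finalize().into_bytes()); // Base64(HMAC-SHA1(secret, username))
    (username, password)
}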

The TURN server validates independently: parse expiry from the username, check it is not past, recompute HMAC-SHA1, compare. No shared database needed.
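The server-side check needs only the shared secret and the username. In the sketch below the HMAC computation is injected as a closure so the example stays dependency-free; `validate_turn_credentials` is a hypothetical name, since coturn performs this check internally rather than through any Rust API.

```rust
/// Constant-time byte comparison, so response timing does not reveal how much of a guess matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Sketch of the TURN-side check. `expected_password` is a hypothetical stand-in for
/// Base64(HMAC-SHA1(TURN_SECRET, username)), the same computation the registry performs.
pub fn validate_turn_credentials(
    username: &str,
    password: &str,
    now: u64,
    expected_password: impl Fn(&str) -> String,
) -> bool {
    // 1. Parse the expiry timestamp from "<expiry>:<label>".
    let expiry = match username.split_once(':').and_then(|(ts, _)| ts.parse::<u64>().ok()) {
        Some(ts) => ts,
        None => return false,
    };
    // 2. Reject expired credentials.
    if expiry <= now {
        return false;
    }
    // 3-4. Recompute the password from the username and compare.
    constant_time_eq(expected_password(username).as_bytes(), password.as_bytes())
}
```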

The 24-hour TTL balances security (leaked credentials expire) against operational convenience (agents do not need frequent re-fetches).

6. End-to-End Encryption via DTLS

WebRTC DataChannels are encrypted by DTLS (the UDP equivalent of TLS), providing confidentiality, integrity, and authentication — all without a PKI. Each sidecar generates a self-signed certificate on startup; the fingerprint is included in the SDP exchanged during signaling.
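The fingerprint ties signaling to the encrypted channel: the SDP carries an `a=fingerprint:` attribute, and after the handshake each side checks that the certificate its peer presented hashes to that value. A WebRTC library performs this check itself; the sketch below only illustrates the comparison, with `presented_fp` as a hypothetical input standing in for the hash the DTLS stack computes, and it accepts only sha-256 for simplicity. The check is only as trustworthy as the SDP it reads, which is why substituting fingerprints during signaling enables a man-in-the-middle.

```rust
/// Extracts (hash algorithm, fingerprint) from an SDP line such as
/// "a=fingerprint:sha-256 AB:CD:...".
pub fn sdp_fingerprint(sdp: &str) -> Option<(String, String)> {
    sdp.lines()
        .find_map(|line| line.trim().strip_prefix("a=fingerprint:"))
        .and_then(|rest| rest.split_once(' '))
        .map(|(alg, fp)| (alg.to_ascii_lowercase(), fp.trim().to_ascii_uppercase()))
}

/// True if the fingerprint advertised in the SDP matches the hash of the certificate
/// the peer actually presented (sha-256 only in this sketch).
pub fn certificate_matches(sdp: &str, presented_fp: &str) -> bool {
    match sdp_fingerprint(sdp) {
        Some((alg, fp)) => alg == "sha-256" && fp == presented_fp.to_ascii_uppercase(),
        None => false,
    }
}
```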

The critical architectural property: the registry cannot read DataChannel messages. It relays SDP/ICE during setup but never sees the symmetric keys negotiated during the DTLS handshake. Even a compromised registry cannot passively decrypt data-plane traffic, although it could mount a man-in-the-middle attack by substituting fingerprints during signaling (threat T3).

This contrasts with HTTP/gRPC models where the server terminates TLS and sees all plaintext. WebRTC’s end-to-end encryption means the system’s own infrastructure cannot inspect the data plane.

7. Threat Modeling

chatixia-mesh uses the STRIDE model. Key threats and their status:

Implemented Mitigations

Threat	Mitigation
Spoofing (T1): Unauthorized signaling access	JWT on WebSocket upgrade + sender verification
Tampering (T5): Task queue poisoning	Skill-based assignment + TTL expiration
Information Disclosure: DataChannel eavesdropping	DTLS end-to-end encryption

Known Gaps

Threat	Gap
T9: Information Disclosure	Registry GET endpoints (`/api/registry/agents`, `/api/registry/route`) are unauthenticated
T4: Denial of Service	No rate limiting except on pairing endpoint
T8: Tampering	`DELETE /api/registry/agents/{id}` is unauthenticated — any client can deregister any agent
T5: Authorization	Any authenticated agent can submit tasks to any other agent
T7: Elevation of Privilege	No sanitization of task payloads before LLM processing (prompt injection risk)

Authentication Boundary Summary

Authenticated:                  Unauthenticated:
- POST /api/token (key/token)   - POST /pair (rate-limited)
- GET /ws (JWT required)        - GET /api/registry/agents
- POST /generate-code (API key) - DELETE /api/registry/agents/{id}
                                - POST /api/hub/tasks
                                - POST /api/pairing/{id}/approve
                                - GET /api/hub/topology

The unauthenticated column is the current attack surface. POST /pair is unauthenticated by design (a new agent has no credential yet) and is protected by rate limiting instead.

Summary

Core authentication flow: Long-lived credentials (API keys or device tokens) are exchanged for short-lived JWTs (5-minute TTL) that authenticate WebSocket connections and enable sender verification.

Device pairing provides zero-config onboarding: 6-digit invite code (rate-limited, single-use) leads to admin approval, then a device token that works like an API key.

TURN credentials use HMAC-SHA1 shared-secret validation with a 24-hour TTL, requiring no user database.

DTLS provides end-to-end encryption on DataChannels without PKI — even the registry cannot read data-plane traffic.

STRIDE threat modeling reveals both implemented protections (JWT auth, rate-limited pairing, DTLS) and gaps (unauthenticated APIs, no task submission ACLs, prompt injection risk).

Previous: Lesson 07: Application Protocol Design | Next: Lesson 09: AI Agent Architecture