Lesson 03 — WebRTC Fundamentals: The Protocol Stack for Real-Time P2P

Prerequisites: Lesson 02 (Networking Foundations)

1. What is WebRTC?

WebRTC (Web Real-Time Communication) is an open standard for peer-to-peer communication. It defines three core APIs:

getUserMedia — captures audio/video from local devices. Not relevant to chatixia-mesh.
RTCPeerConnection — manages the full lifecycle of a P2P connection: ICE negotiation, DTLS encryption, and multiplexing of streams.
RTCDataChannel — sends arbitrary data (text, binary) between peers. This is the API chatixia-mesh uses.

WebRTC is not just for browsers. Server-side implementations exist in Go (Pion), Rust (webrtc-rs), Python (aiortc), and C++ (libwebrtc). chatixia-mesh uses webrtc-rs inside its Rust sidecar. The Python agent never touches WebRTC directly — it communicates with the sidecar over a Unix socket using JSON-line IPC.

2. The Protocol Stack

For DataChannels, WebRTC layers several protocols on top of UDP:

+---------------------------------------------+
|            Application (MeshMessage)         |   Your JSON messages
+---------------------------------------------+
|              DataChannel API                 |   Named channels, open/close events
+---------------------------------------------+
|                   SCTP                       |   Reliable/unreliable delivery,
|         (Stream Control Transmission)        |   message framing, flow control
+---------------------------------------------+
|                   DTLS                       |   Encryption + mutual authentication
+---------------------------------------------+
|                   ICE                        |   NAT traversal, connectivity checks
+---------------------------------------------+
|              STUN / TURN                     |   Discover public IP / relay fallback
+---------------------------------------------+
|                   UDP                        |   Unreliable datagrams
+---------------------------------------------+

UDP — The transport foundation. WebRTC chose UDP over TCP because real-time media cannot tolerate TCP’s head-of-line blocking. SCTP adds reliability back when needed.

ICE — Gathers candidates (local, STUN-discovered, TURN relay), exchanges them via signaling, and runs connectivity checks on candidate pairs in priority order to select the best working path.

DTLS — Encryption for datagrams. Negotiates keys and authenticates peers using self-signed certificates with fingerprints verified through signaling. No certificate authority needed.

SCTP — Provides reliable delivery, message framing, and flow control on top of DTLS. Supports multiple independent streams; each DataChannel maps to one stream.

DataChannel API — Each DataChannel has a label and configuration (ordered, maxRetransmits). Applications send/receive messages without worrying about lower layers.

3. SDP: Session Description Protocol

Before two peers connect, they negotiate capabilities through SDP exchanged via the offer/answer pattern:

Peer A creates an offer (SDP blob), sets it as its local description, and sends it via signaling.
Peer B receives the offer, sets it as its remote description, creates an answer, and sets the answer as its local description.
Peer B sends the answer back through signaling, and Peer A sets it as its remote description.
Both peers exchange ICE candidates (trickled as they are discovered).
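The exchange above can be sketched as a small state machine. This is an illustration only, not a real WebRTC API: the `Peer` class and its methods are hypothetical, and the SDP blobs are placeholder strings. The state names mirror the signaling states exposed by the browser API (`stable`, `have-local-offer`, `have-remote-offer`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Hypothetical stand-in for one side of an offer/answer exchange."""
    name: str
    state: str = "stable"
    local: Optional[str] = None
    remote: Optional[str] = None

    def create_offer(self) -> str:
        # A real stack would generate SDP here; this is a placeholder string.
        return f"offer-from-{self.name}"

    def create_answer(self) -> str:
        if self.state != "have-remote-offer":
            raise RuntimeError("cannot answer before receiving an offer")
        return f"answer-from-{self.name}"

    def set_local(self, sdp: str) -> None:
        self.local = sdp
        # A local offer leaves us waiting for the answer;
        # a local answer completes the exchange.
        self.state = "have-local-offer" if self.state == "stable" else "stable"

    def set_remote(self, sdp: str) -> None:
        self.remote = sdp
        self.state = "have-remote-offer" if self.state == "stable" else "stable"


def negotiate(a: Peer, b: Peer) -> None:
    offer = a.create_offer()
    a.set_local(offer)     # A: stable -> have-local-offer
    b.set_remote(offer)    # B: stable -> have-remote-offer (delivered by signaling)
    answer = b.create_answer()
    b.set_local(answer)    # B: have-remote-offer -> stable
    a.set_remote(answer)   # A: have-local-offer -> stable (delivered by signaling)


a, b = Peer("A"), Peer("B")
negotiate(a, b)
print(a.state, b.state)
```

Both peers end in `stable`, each holding one local and one remote description. Calling `create_answer` out of order raises, which is the kind of sequencing mistake the offer/answer pattern is meant to rule out.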

Key SDP fields:

ice-ufrag / ice-pwd — Short-term credentials for STUN connectivity checks.
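fingerprint — SHA-256 hash of the peer’s self-signed DTLS certificate. This is how peers verify identity without a CA. It only works if the signaling channel is trusted, because anyone who can rewrite the SDP in transit can substitute their own fingerprint.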
setup — DTLS role negotiation (actpass = willing to be client or server).
m=application — Declares a DataChannel session.
candidate lines — Each is a potential network path (host, srflx, relay). ICE tries them all.
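To make these fields concrete, here is a minimal sketch that pulls them out of an SDP blob with plain string handling. The sample SDP is made up for illustration: the addresses come from documentation ranges, the credentials are dummies, and the fingerprint is shortened to three bytes. `parse_sdp` is a hypothetical helper, not part of any WebRTC library, and it ignores everything a real parser would need beyond these attributes.

```python
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
t=0 0
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
a=ice-ufrag:abcd
a=ice-pwd:0123456789abcdefghijkl
a=fingerprint:sha-256 AA:BB:CC
a=setup:actpass
a=candidate:1 1 udp 2130706431 192.168.1.10 50000 typ host
a=candidate:2 1 udp 1694498815 203.0.113.5 50001 typ srflx raddr 192.168.1.10 rport 50000
a=candidate:3 1 udp 16777215 198.51.100.7 3478 typ relay raddr 203.0.113.5 rport 50001
"""


def parse_sdp(sdp: str) -> dict:
    """Extract ICE credentials, DTLS fingerprint, setup role, and candidates."""
    info = {"candidates": []}
    for line in sdp.splitlines():
        if not line.startswith("a="):
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, _, value = line[2:].partition(":")
        if key == "candidate":
            # Layout: foundation component transport priority address port "typ" type ...
            parts = value.split()
            info["candidates"].append({
                "type": parts[7],
                "ip": parts[4],
                "port": int(parts[5]),
                "priority": int(parts[3]),
            })
        elif key == "fingerprint":
            algorithm, _, digest = value.partition(" ")
            info["fingerprint"] = (algorithm, digest)
        elif key in ("ice-ufrag", "ice-pwd", "setup"):
            info[key] = value
    return info


parsed = parse_sdp(SAMPLE_SDP)
for cand in parsed["candidates"]:
    print(cand["type"], cand["ip"], cand["port"])
print(parsed["fingerprint"], parsed["setup"])
```

In practice the WebRTC library parses SDP for you; reading the raw text like this is mostly useful when debugging a connection that fails to establish.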

Candidate types

Type	Name	Priority
host	Host candidate (local IP)	Highest
srflx	Server-reflexive (STUN)	Medium
prflx	Peer-reflexive (discovered during checks)	Medium
relay	Relay candidate (TURN)	Lowest
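The ordering in this table comes from a numeric priority attached to each candidate. A minimal sketch of the formula recommended in RFC 8445, using its recommended type preferences (host 126, prflx 110, srflx 100, relay 0): implementations are free to choose other values, so treat the numbers as defaults rather than guarantees. The function name and the default local preference of 65535 are choices made for this example.

```python
# Type preferences recommended by RFC 8445; implementations may differ.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)"""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


# Print candidate types from most to least preferred.
for cand_type in sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True):
    print(f"{cand_type:6} {candidate_priority(cand_type)}")
```

With these defaults a peer-reflexive candidate ranks slightly above a server-reflexive one, and a relay candidate always sorts last, which is why TURN is only used when nothing more direct works.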

4. DTLS: Encryption without PKI

Traditional TLS relies on certificate authorities. This is impractical for ephemeral agents with no domain names. WebRTC solves this with DTLS using self-signed certificates and fingerprint verification via signaling:

Each peer generates a self-signed certificate and computes its SHA-256 fingerprint.
The fingerprint is embedded in the SDP and sent through signaling.
During the DTLS handshake, each peer presents its certificate and verifies the remote certificate’s fingerprint matches the SDP.
If fingerprints match, the connection is authenticated and encrypted.
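The fingerprint check in these steps amounts to hashing the certificate bytes and comparing the result with the value from the SDP. A minimal sketch using only the standard library: the byte strings below are placeholders standing in for DER-encoded certificates, and both function names are hypothetical. A real DTLS stack performs this check inside the handshake.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes) -> str:
    """SHA-256 of the certificate, formatted as colon-separated uppercase hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(cert_der: bytes, advertised: str) -> bool:
    """Compare the presented certificate against the fingerprint from the SDP."""
    # compare_digest avoids leaking where the first mismatch occurs.
    return hmac.compare_digest(sdp_fingerprint(cert_der), advertised.upper())


# Placeholder bytes; a real peer would use its DER-encoded certificate.
peer_a_cert = b"placeholder certificate bytes for peer A"
attacker_cert = b"placeholder certificate bytes for an attacker"

advertised = sdp_fingerprint(peer_a_cert)  # what Peer A puts in its SDP
print(fingerprint_matches(peer_a_cert, advertised))
print(fingerprint_matches(attacker_cert, advertised))
```

The second check fails because the attacker cannot present a certificate that hashes to Peer A's advertised fingerprint. The guarantee disappears if the attacker can also change the fingerprint in the SDP, which is why signaling integrity is the trust anchor.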

Property	TLS (web)	DTLS (WebRTC)
Certificate authority	Required	Not needed
Identity verification	Domain name	Fingerprint in SDP
Transport	TCP	UDP
Mutual authentication	Optional (mTLS)	Always mutual
Trust anchor	CA root certificates	Signaling channel integrity

The critical security assumption: the signaling channel must be trustworthy. In chatixia-mesh, signaling goes through the registry over WebSocket with JWT authentication.

5. SCTP and DataChannels

SCTP runs on top of DTLS, providing features UDP lacks: message framing (complete messages, not byte streams), reliable delivery (configurable per stream), ordered delivery (also configurable), multiple independent streams (up to 65,535), and flow control.

DataChannel delivery modes

Mode	Ordered	Reliable	Use case
Reliable-ordered	Yes	Yes	RPC, task delegation (chatixia-mesh default)
Reliable-unordered	No	Yes	File transfer chunks
Unreliable-ordered	Yes	No	Latest-value sensors
Unreliable-unordered	No	No	Game state, live telemetry
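Each row of the table corresponds to a combination of two channel options. The sketch below models that mapping; `ChannelConfig` is a hypothetical type for illustration, with field names following the `ordered` and `maxRetransmits` options of the DataChannel API. It assumes "unreliable" means zero retransmissions. The real API also allows a nonzero retransmit limit or a packet lifetime for partial reliability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical model of the two options that select a delivery mode."""
    ordered: bool = True
    max_retransmits: Optional[int] = None  # None: retransmit until delivered

    @property
    def reliable(self) -> bool:
        return self.max_retransmits is None


MODES = {
    "reliable-ordered": ChannelConfig(ordered=True),
    "reliable-unordered": ChannelConfig(ordered=False),
    "unreliable-ordered": ChannelConfig(ordered=True, max_retransmits=0),
    "unreliable-unordered": ChannelConfig(ordered=False, max_retransmits=0),
}

for name, cfg in MODES.items():
    print(f"{name:22} ordered={cfg.ordered!s:5} reliable={cfg.reliable}")
```

Leaving both options at their defaults yields the reliable-ordered mode.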

chatixia-mesh uses reliable-ordered for its "mesh" DataChannel. For small JSON payloads (typically under 1 KB), the overhead is negligible and the guarantees are essential.

Head-of-line blocking is the trade-off: if packet N is lost, packets N+1 onward are buffered until N is retransmitted. For chatixia-mesh this is acceptable because payloads are small, ordering matters for request/response correlation, and the mesh operates over networks where packet loss is rare.
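A toy model makes the effect visible. The sketch below assumes four messages where message 2 is lost and its retransmission arrives at 250 ms; the timings are invented for illustration and `delivery_times` is a hypothetical helper, not a model of SCTP internals.

```python
from typing import Dict


def delivery_times(arrivals: Dict[int, float], ordered: bool) -> Dict[int, float]:
    """Map sequence number to the time the application sees the message."""
    if not ordered:
        # Unordered: each message is handed over as soon as it arrives.
        return dict(arrivals)
    delivered = {}
    latest = 0.0
    for seq in sorted(arrivals):
        # Ordered: a message cannot be delivered before all earlier ones.
        latest = max(latest, arrivals[seq])
        delivered[seq] = latest
    return delivered


# Arrival times in milliseconds; message 2 arrives late after a retransmission.
arrivals_ms = {1: 10.0, 2: 250.0, 3: 30.0, 4: 40.0}

print("ordered:  ", delivery_times(arrivals_ms, ordered=True))
print("unordered:", delivery_times(arrivals_ms, ordered=False))
```

In the ordered case messages 3 and 4 wait until 250 ms even though they arrived much earlier. In the unordered case they are delivered at 30 ms and 40 ms, at the cost of the application having to tolerate out-of-order arrival.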

6. The Connection Lifecycle

The complete sequence from “no connection” to “DataChannel open”:

Create RTCPeerConnection with ICE servers.
Create DataChannel ("mesh", reliable-ordered).
Create offer (SDP), set local description.
Send offer through signaling (WebSocket to registry).
Remote peer creates answer, sends back through signaling.
Exchange ICE candidates (trickled in both directions).
ICE connectivity checks — test candidate pairs, select best.
DTLS handshake — verify certificate fingerprints.
SCTP association established over the DTLS tunnel.
DataChannel open — on_open fires, IPC peer_connected sent to Python agent.

Why does this take 5-10 seconds?

The bottleneck is ICE. Gathering candidates requires STUN/TURN queries (network round trips). Candidate pairs are then tested in priority order until a working pair is selected. On a LAN this takes 1-2 seconds; across the internet with TURN fallback, 5-10 seconds. By comparison, a TCP + TLS 1.3 handshake typically completes in 50-100 ms.

But the 5-10 second cost is one-time per peer pair. Once established, messages flow with sub-millisecond latency on LAN. For chatixia-mesh, agents maintain long-lived connections and exchange many messages, so the setup cost is amortized across the session.
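The amortization argument is simple arithmetic. The sketch below assumes 1 ms per DataChannel message (a conservative stand-in for "sub-millisecond") and 50 ms per HTTP request as the alternative. Both figures and the function names are illustrative assumptions, not measurements from chatixia-mesh.

```python
import math


def amortized_ms(setup_s: float, per_message_ms: float, messages: int) -> float:
    """Average cost per message once the one-time setup is spread over the session."""
    if messages <= 0:
        raise ValueError("messages must be positive")
    return setup_s * 1000.0 / messages + per_message_ms


def break_even_messages(setup_s: float, per_message_ms: float, http_ms: float) -> int:
    """Smallest message count at which the amortized cost is at or below the HTTP cost."""
    if http_ms <= per_message_ms:
        raise ValueError("the persistent connection never catches up")
    return math.ceil(setup_s * 1000.0 / (http_ms - per_message_ms))


for setup_s in (5.0, 10.0):
    n = break_even_messages(setup_s, per_message_ms=1.0, http_ms=50.0)
    per_msg = amortized_ms(setup_s, 1.0, 10_000)
    print(f"setup {setup_s:.0f} s: break-even at {n} messages, "
          f"{per_msg:.1f} ms/message over 10,000 messages")
```

Under these assumptions the persistent connection pays for itself after a few hundred messages, and over a long session the setup cost becomes a small fraction of the per-message cost.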

In chatixia-mesh

The WebRTC stack lives entirely in the Rust sidecar:

Concept	Implementation
WebRTC library	`webrtc = "0.17"` crate in `sidecar/Cargo.toml`
Peer connection	`webrtc_peer.rs` — ICE server config, offer/answer
ICE candidate exchange	`webrtc_peer.rs` — `setup_ice_forwarding()` via signaling
DataChannel	`pc.create_data_channel("mesh", None)` (reliable, ordered)
Message format	`protocol.rs` — `MeshMessage` (JSON over DataChannel)
IPC to Python	`ipc.rs` — JSON lines over Unix socket

The Python agent receives peer_connected/peer_disconnected events but never participates in ICE, DTLS, or SCTP. From the agent’s perspective, the mesh is a simple message bus: send JSON, receive JSON.
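From the agent's side, JSON-line IPC could look roughly like the sketch below. It uses `socket.socketpair()` to stand in for the Unix socket between sidecar and agent. The event names come from this lesson, but the `type` and `peer_id` field names and both helper functions are assumptions made for illustration; the actual wire format is defined in `ipc.rs`.

```python
import json
import socket
from typing import Iterator


def encode_event(event: dict) -> bytes:
    # json.dumps without indent emits a single line, so "\n" can delimit messages.
    return json.dumps(event).encode("utf-8") + b"\n"


def read_events(sock: socket.socket) -> Iterator[dict]:
    """Yield one decoded JSON object per newline-terminated line until EOF."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break  # the other side closed the connection
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line:
                yield json.loads(line)


# A connected socket pair stands in for the sidecar <-> agent Unix socket.
sidecar_end, agent_end = socket.socketpair()
sidecar_end.sendall(encode_event({"type": "peer_connected", "peer_id": "agent-b"}))
sidecar_end.sendall(encode_event({"type": "peer_disconnected", "peer_id": "agent-b"}))
sidecar_end.close()

events = list(read_events(agent_end))
agent_end.close()
for event in events:
    print(event["type"], event["peer_id"])
```

Buffering until a newline matters because a stream socket can deliver a message in pieces or several messages in one read. The framing, not the `recv` call, defines message boundaries.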

Exercises

Label the protocol stack. For each layer (Application, DataChannel, SCTP, DTLS, ICE, STUN/TURN, UDP), write one sentence describing its primary responsibility.
Read an SDP offer. Given an SDP with host, srflx, and relay candidates: list each candidate’s type, IP, and port. Identify the DTLS fingerprint and hash algorithm. If Peer B’s NAT blocks UDP, which candidate type will ICE fall back to?
DataChannel delivery modes. Explain reliable-ordered vs reliable-unordered vs unreliable-unordered. Why does chatixia-mesh use reliable-ordered? If you added live telemetry (CPU/memory every 100ms), which mode would you choose?
Connection setup cost. Break down where time goes in WebRTC setup. For a 5-agent full mesh, calculate total connections and sequential setup time. Calculate amortized per-message overhead of 5-10s setup over 500 messages vs. 50ms HTTP per request.

Previous: Lesson 02: Peer-to-Peer Networking | Next: Lesson 04: Async Programming Patterns