WebRTC Fundamentals: The Protocol Stack for Real-Time P2P
Lesson 03 — WebRTC Fundamentals: The Protocol Stack for Real-Time P2P
Prerequisites: Lesson 02 (Networking Foundations)
1. What is WebRTC?
WebRTC (Web Real-Time Communication) is an open standard for peer-to-peer communication. It defines three core APIs:
- getUserMedia — captures audio/video from local devices. Not relevant to chatixia-mesh.
- RTCPeerConnection — manages the full lifecycle of a P2P connection: ICE negotiation, DTLS encryption, and multiplexing of streams.
- RTCDataChannel — sends arbitrary data (text, binary) between peers. This is the API chatixia-mesh uses.
WebRTC is not just for browsers. Server-side implementations exist in Go (Pion), Rust (webrtc-rs), Python (aiortc), and C++ (libwebrtc). chatixia-mesh uses webrtc-rs inside its Rust sidecar. The Python agent never touches WebRTC directly — it communicates with the sidecar over a Unix socket using JSON-line IPC.
2. The Protocol Stack
WebRTC builds a deep stack on top of UDP for DataChannels:
+---------------------------------------------+
| Application (MeshMessage)                   |  Your JSON messages
+---------------------------------------------+
| DataChannel API                             |  Named channels, open/close events
+---------------------------------------------+
| SCTP                                        |  Reliable/unreliable delivery,
| (Stream Control Transmission)               |  message framing, flow control
+---------------------------------------------+
| DTLS                                        |  Encryption + mutual authentication
+---------------------------------------------+
| ICE                                         |  NAT traversal, connectivity checks
+---------------------------------------------+
| STUN / TURN                                 |  Discover public IP / relay fallback
+---------------------------------------------+
| UDP                                         |  Unreliable datagrams
+---------------------------------------------+
UDP — The transport foundation. WebRTC chose UDP over TCP because real-time media cannot tolerate TCP’s head-of-line blocking. SCTP adds reliability back when needed.
ICE — Gathers candidates (local, STUN-discovered, TURN relay), exchanges them via signaling, and runs connectivity checks on every pair to pick the best working path.
DTLS — Encryption for datagrams. Negotiates keys and authenticates peers using self-signed certificates with fingerprints verified through signaling. No certificate authority needed.
SCTP — Provides reliable delivery, message framing, and flow control on top of DTLS. Supports multiple independent streams; each DataChannel maps to one stream.
DataChannel API — Each DataChannel has a label and configuration (ordered, maxRetransmits). Applications send/receive messages without worrying about lower layers.
3. SDP: Session Description Protocol
Before two peers connect, they negotiate capabilities through SDP exchanged via the offer/answer pattern:
- Peer A creates an offer (SDP blob), sets it as local description, sends via signaling.
- Peer B receives the offer, sets it as remote description, creates an answer, sets as local description.
- Peer B sends the answer back through signaling; Peer A sets it as its remote description.
- Both peers start exchanging ICE candidates (trickled as discovered).
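The offer/answer steps above can be sketched as a toy state machine. This is a minimal illustration, not a real WebRTC implementation: the `Peer` class and `negotiate` function are hypothetical, the SDP bodies are placeholders, and direct function calls stand in for the signaling channel. The state names (`stable`, `have-local-offer`, `have-remote-offer`) follow the signaling states defined by the WebRTC API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Toy stand-in for one side of an RTCPeerConnection (no networking)."""
    name: str
    signaling_state: str = "stable"
    local_description: Optional[dict] = None
    remote_description: Optional[dict] = None

    def create_offer(self) -> dict:
        # Placeholder SDP; a real offer carries ICE credentials, a fingerprint, etc.
        return {"type": "offer", "sdp": f"<sdp from {self.name}>"}

    def create_answer(self) -> dict:
        if self.signaling_state != "have-remote-offer":
            raise RuntimeError("an answer requires a remote offer first")
        return {"type": "answer", "sdp": f"<sdp from {self.name}>"}

    def set_local_description(self, desc: dict) -> None:
        self.local_description = desc
        self.signaling_state = "have-local-offer" if desc["type"] == "offer" else "stable"

    def set_remote_description(self, desc: dict) -> None:
        self.remote_description = desc
        self.signaling_state = "have-remote-offer" if desc["type"] == "offer" else "stable"


def negotiate(a: Peer, b: Peer) -> None:
    """Run one offer/answer round; direct calls stand in for signaling."""
    offer = a.create_offer()
    a.set_local_description(offer)      # A: stable -> have-local-offer
    b.set_remote_description(offer)     # B: stable -> have-remote-offer
    answer = b.create_answer()
    b.set_local_description(answer)     # B: back to stable
    a.set_remote_description(answer)    # A: back to stable


peer_a, peer_b = Peer("A"), Peer("B")
negotiate(peer_a, peer_b)
print(peer_a.signaling_state, peer_b.signaling_state)
```

Both peers end in the `stable` state with a local and a remote description set, which is the precondition for ICE checks and the DTLS handshake to proceed.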
Key SDP fields:
- ice-ufrag / ice-pwd — Short-term credentials for STUN connectivity checks.
- fingerprint — SHA-256 hash of the peer’s self-signed DTLS certificate. This is how peers verify identity without a CA. The signaling channel must be trusted.
- setup — DTLS role negotiation (actpass = willing to be client or server).
- m=application — Declares a DataChannel session.
- candidate lines — Each is a potential network path (host, srflx, relay). ICE tries them all.
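To make these fields concrete, here is a small sketch that pulls them out of an SDP blob with plain string handling. The SDP text is a made-up sample (documentation-range IP addresses, an invented fingerprint and credentials), and `parse_sdp` is a hypothetical helper, not part of any WebRTC library; production code would normally use the SDP parser that ships with the WebRTC stack.

```python
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=ice-ufrag:EsAw
a=ice-pwd:P2uYro0UCOQ4zxjKXaWCBui1
a=fingerprint:sha-256 00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF:00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF
a=setup:actpass
a=candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host
a=candidate:2 1 udp 1694498815 198.51.100.7 50001 typ srflx raddr 192.0.2.10 rport 50000
a=candidate:3 1 udp 16777215 203.0.113.5 3478 typ relay raddr 198.51.100.7 rport 50001
"""


def parse_sdp(sdp: str) -> dict:
    """Extract the fields discussed above from an SDP string (illustrative only)."""
    info = {"media": [], "candidates": []}
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("m="):
            info["media"].append(line[2:].split()[0])
        elif line.startswith("a=candidate:"):
            # Layout: foundation component transport priority address port "typ" type ...
            parts = line[len("a=candidate:"):].split()
            info["candidates"].append({
                "type": parts[7],
                "ip": parts[4],
                "port": int(parts[5]),
                "priority": int(parts[3]),
            })
        elif line.startswith("a=") and ":" in line:
            key, value = line[2:].split(":", 1)
            if key in ("ice-ufrag", "ice-pwd", "setup"):
                info[key] = value
            elif key == "fingerprint":
                algorithm, digest = value.split(" ", 1)
                info["fingerprint"] = {"algorithm": algorithm, "digest": digest}
    return info


parsed = parse_sdp(SAMPLE_SDP)
for cand in parsed["candidates"]:
    print(cand["type"], cand["ip"], cand["port"])
print(parsed["fingerprint"]["algorithm"], parsed["setup"])
```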
Candidate types
| Type | Name | Priority |
|---|---|---|
| host | Host candidate (local IP) | Highest |
| prflx | Peer-reflexive (discovered during checks) | High (ranked above srflx by the RFC 8445 recommended type preferences) |
| srflx | Server-reflexive (STUN) | Medium |
| relay | Relay candidate (TURN) | Lowest |
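The ordering in this table comes from a numeric priority that each ICE agent computes per candidate. The sketch below uses the formula and the recommended type preference values from RFC 8445 (host 126, prflx 110, srflx 100, relay 0); these numbers come from the RFC rather than from chatixia-mesh, and an implementation is free to pick different preferences. The function name and defaults are illustrative.

```python
# Recommended type preferences from RFC 8445 (implementations may choose others).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)"""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


priorities = {t: candidate_priority(t) for t in TYPE_PREFERENCE}
for cand_type, value in sorted(priorities.items(), key=lambda kv: -kv[1]):
    print(f"{cand_type:6} {value}")
```

With the default local preference and component 1, a host candidate gets 2130706431 and a relay candidate 16777215, which is why a working direct path is preferred over a TURN relay.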
4. DTLS: Encryption without PKI
Traditional TLS relies on certificate authorities. This is impractical for ephemeral agents with no domain names. WebRTC solves this with DTLS using self-signed certificates and fingerprint verification via signaling:
- Each peer generates a self-signed certificate and computes its SHA-256 fingerprint.
- The fingerprint is embedded in the SDP and sent through signaling.
- During the DTLS handshake, each peer presents its certificate and verifies the remote certificate’s fingerprint matches the SDP.
- If fingerprints match, the connection is authenticated and encrypted.
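The fingerprint check in these steps amounts to hashing the certificate the peer presents and comparing it with the value received through signaling. The following is a minimal sketch using only the standard library; the byte strings are placeholders standing in for DER-encoded certificates, and `sdp_fingerprint` and `verify_peer` are hypothetical helper names. In a real stack the DTLS library performs this comparison during the handshake.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format a SHA-256 digest the way SDP writes it: uppercase hex pairs joined by colons."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def verify_peer(presented_cert_der: bytes, fingerprint_from_sdp: str) -> bool:
    """Compare the presented certificate's fingerprint with the one from signaling."""
    expected = fingerprint_from_sdp.upper()
    return hmac.compare_digest(sdp_fingerprint(presented_cert_der), expected)


# Placeholder bytes; a real peer would use its DER-encoded self-signed certificate.
peer_cert = b"placeholder certificate bytes for peer A"
advertised = sdp_fingerprint(peer_cert)          # value that would travel inside the SDP

print(verify_peer(peer_cert, advertised))                     # same certificate
print(verify_peer(b"some other certificate", advertised))     # different certificate
```

The first call prints `True` and the second `False`: a party that cannot alter the fingerprint in the signaling channel cannot substitute its own certificate.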
| Property | TLS (web) | DTLS (WebRTC) |
|---|---|---|
| Certificate authority | Required | Not needed |
| Identity verification | Domain name | Fingerprint in SDP |
| Transport | TCP | UDP |
| Mutual authentication | Optional (mTLS) | Always mutual |
| Trust anchor | CA root certificates | Signaling channel integrity |
The critical security assumption: the signaling channel must be trustworthy. In chatixia-mesh, signaling goes through the registry over WebSocket with JWT authentication.
5. SCTP and DataChannels
SCTP runs on top of DTLS, providing features UDP lacks: message framing (complete messages, not byte streams), reliable delivery (configurable per stream), ordered delivery (also configurable), multiple independent streams (up to 65,535), and flow control.
DataChannel delivery modes
| Mode | Ordered | Reliable | Use case |
|---|---|---|---|
| Reliable-ordered | Yes | Yes | RPC, task delegation (chatixia-mesh default) |
| Reliable-unordered | No | Yes | File transfer chunks |
| Unreliable-ordered | Yes | No | Latest-value sensors |
| Unreliable-unordered | No | No | Game state, live telemetry |
chatixia-mesh uses reliable-ordered for its "mesh" DataChannel. For small JSON payloads (typically under 1 KB), the overhead is negligible and the guarantees are essential.
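The four modes in the table reduce to two settings, which the DataChannel API exposes as ordered and maxRetransmits. The sketch below models that mapping with a hypothetical `ChannelConfig` class; it is not the webrtc-rs or browser type, and it leaves out the alternative time-based limit (maxPacketLifeTime) for brevity. Using 0 retransmits for the unreliable modes is one possible choice, not a requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical mirror of the DataChannel options discussed above."""
    ordered: bool = True
    max_retransmits: Optional[int] = None   # None means retransmit until delivered

    @property
    def reliable(self) -> bool:
        return self.max_retransmits is None


MODES = {
    "reliable-ordered": ChannelConfig(ordered=True),
    "reliable-unordered": ChannelConfig(ordered=False),
    "unreliable-ordered": ChannelConfig(ordered=True, max_retransmits=0),
    "unreliable-unordered": ChannelConfig(ordered=False, max_retransmits=0),
}

for name, cfg in MODES.items():
    print(f"{name:22} ordered={cfg.ordered} reliable={cfg.reliable}")
```

Leaving both options at their defaults yields the reliable-ordered mode, which matches passing no explicit configuration when the channel is created.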
Head-of-line blocking is the trade-off: if packet N is lost, packets N+1 onward are buffered until N is retransmitted. For chatixia-mesh this is acceptable because payloads are small, ordering matters for request/response correlation, and the mesh operates over networks where packet loss is rare.
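Head-of-line blocking is easy to see in a small simulation. The sketch below is a simplified model of an ordered receiver, not SCTP itself: sequence numbers arrive in some order, and a message is released to the application only once every earlier message has been released. The function name is illustrative.

```python
def deliver_in_order(arrivals):
    """For each arriving sequence number, record which messages become deliverable."""
    buffered = set()
    next_seq = 0
    timeline = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        # Release the longest contiguous run starting at the next expected number.
        while next_seq in buffered:
            buffered.remove(next_seq)
            released.append(next_seq)
            next_seq += 1
        timeline.append((seq, released))
    return timeline


# Packet 1 is lost at first and only arrives (retransmitted) after 2, 3, and 4.
timeline = deliver_in_order([0, 2, 3, 4, 1])
for arrived, released in timeline:
    print(f"arrived {arrived}: delivered {released}")
```

Messages 2, 3, and 4 sit in the buffer and are delivered together only when the retransmitted message 1 arrives. An unordered channel would hand each one to the application immediately.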
6. The Connection Lifecycle
The complete sequence from “no connection” to “DataChannel open”:
- Create RTCPeerConnection with ICE servers.
- Create DataChannel ("mesh", reliable-ordered).
- Create offer (SDP), set local description.
- Send offer through signaling (WebSocket to registry).
- Remote peer creates answer, sends back through signaling.
- Exchange ICE candidates (trickled in both directions).
- ICE connectivity checks — test candidate pairs, select best.
- DTLS handshake — verify certificate fingerprints.
- SCTP association established over the DTLS tunnel.
- DataChannel open — on_open fires, IPC peer_connected sent to Python agent.
Why does this take 5-10 seconds?
The bottleneck is ICE. Gathering candidates requires STUN/TURN queries (network round trips), and then the candidate pairs must be tested with connectivity checks. On a LAN this takes 1-2 seconds; across the internet with TURN fallback, 5-10 seconds. By comparison, a TCP + TLS 1.3 handshake typically completes in 50-100 ms.
But the 5-10 second cost is one-time per peer pair. Once established, messages flow with sub-millisecond latency on LAN. For chatixia-mesh, agents maintain long-lived connections and exchange many messages, so the setup cost is amortized across the session.
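The amortization argument can be written as a one-line formula: the effective cost per message is the setup time divided by the number of messages, plus the per-message latency. The numbers below are illustrative assumptions (a 10-second setup and 0.5 ms per message), not measurements from chatixia-mesh.

```python
def amortized_ms(setup_ms: float, per_message_ms: float, messages: int) -> float:
    """Effective per-message cost once the one-time setup is spread over a session."""
    return setup_ms / messages + per_message_ms


SETUP_MS = 10_000.0      # assumed worst-case setup time (10 s)
PER_MESSAGE_MS = 0.5     # assumed per-message latency on an established channel

short_session = amortized_ms(SETUP_MS, PER_MESSAGE_MS, messages=10)
long_session = amortized_ms(SETUP_MS, PER_MESSAGE_MS, messages=10_000)
print(short_session, long_session)
```

Under these assumptions a 10-message session costs about a second per message, while a 10,000-message session costs 1.5 ms per message, which is why long-lived connections make the setup cost tolerable.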
In chatixia-mesh
The WebRTC stack lives entirely in the Rust sidecar:
| Concept | Implementation |
|---|---|
| WebRTC library | webrtc = "0.17" crate in sidecar/Cargo.toml |
| Peer connection | webrtc_peer.rs — ICE server config, offer/answer |
| ICE candidate exchange | webrtc_peer.rs — setup_ice_forwarding() via signaling |
| DataChannel | pc.create_data_channel("mesh", None) (reliable, ordered) |
| Message format | protocol.rs — MeshMessage (JSON over DataChannel) |
| IPC to Python | ipc.rs — JSON lines over Unix socket |
The Python agent receives peer_connected/peer_disconnected events but never participates in ICE, DTLS, or SCTP. From the agent’s perspective, the mesh is a simple message bus: send JSON, receive JSON.
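From the agent's side, JSON-line IPC means one JSON object per newline-terminated line. The sketch below illustrates that framing with a connected socket pair standing in for the sidecar's Unix socket. The event names `peer_connected` and `peer_disconnected` come from the text above; the `type` and `peer_id` field names, and both helper functions, are assumptions made for illustration and may not match the actual wire format in ipc.rs.

```python
import json
import socket


def send_event(sock: socket.socket, event: dict) -> None:
    """Write one JSON object followed by a newline (JSON-lines framing)."""
    sock.sendall(json.dumps(event).encode("utf-8") + b"\n")


def read_events(sock: socket.socket):
    """Yield one decoded JSON object per line until the other side closes."""
    with sock.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            if line.strip():
                yield json.loads(line)


# A connected socket pair stands in for the sidecar <-> agent Unix socket.
sidecar_end, agent_end = socket.socketpair()

# Hypothetical event shapes; only the event names appear in the lesson text.
send_event(sidecar_end, {"type": "peer_connected", "peer_id": "agent-b"})
send_event(sidecar_end, {"type": "peer_disconnected", "peer_id": "agent-b"})
sidecar_end.close()

events = list(read_events(agent_end))
agent_end.close()
print([event["type"] for event in events])
```

Because `json.dumps` never emits a raw newline with its default settings, a newline is a safe message delimiter, and the reader needs no length prefix.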
Exercises
- Label the protocol stack. For each layer (Application, DataChannel, SCTP, DTLS, ICE, STUN/TURN, UDP), write one sentence describing its primary responsibility.
- Read an SDP offer. Given an SDP with host, srflx, and relay candidates: list each candidate’s type, IP, and port. Identify the DTLS fingerprint and hash algorithm. If Peer B’s NAT blocks UDP, which candidate type will ICE fall back to?
- DataChannel delivery modes. Explain reliable-ordered vs reliable-unordered vs unreliable-unordered. Why does chatixia-mesh use reliable-ordered? If you added live telemetry (CPU/memory every 100ms), which mode would you choose?
- Connection setup cost. Break down where time goes in WebRTC setup. For a 5-agent full mesh, calculate total connections and sequential setup time. Calculate amortized per-message overhead of 5-10s setup over 500 messages vs. 50ms HTTP per request.