Peer-to-Peer Networking: When Servers Get Out of the Way
Prerequisites
- Lesson 01: Networking Foundations — TCP/IP, DNS, ports, sockets, HTTP basics.
What You’ll Learn
- Why most internet traffic uses client-server and when P2P is a better fit
- What NATs are, why they exist, and how they block inbound connections
- How STUN, TURN, and UDP hole-punching overcome NAT barriers
- How ICE orchestrates candidate gathering, connectivity checks, and path selection
1. Client-Server vs Peer-to-Peer
The vast majority of internet traffic follows the client-server model: your browser sends a request to a server with a stable, publicly routable IP address, and the server responds. Clients can be anywhere — behind home routers, on cellular networks, on corporate VPNs.
Peer-to-peer (P2P) removes the central server from the data path. Peers talk directly to each other. P2P is worth the added complexity when:
- Latency matters. Routing through a server adds an extra hop to every message. In video calls and real-time collaboration, that added delay is noticeable.
- Bandwidth is expensive at the center. BitTorrent moves terabytes without any single server paying the bandwidth bill.
- Privacy or trust. P2P with end-to-end encryption (DTLS in WebRTC) means only the communicating peers see the plaintext.
- Resilience. With no central server in the data path, established connections have no single point of failure.
| Use Case | Protocol | Why P2P |
|---|---|---|
| File sharing | BitTorrent | Bandwidth distributed across peers |
| Video/voice calls | WebRTC | Low latency, end-to-end encryption |
| Blockchain | Bitcoin/Ethereum | No central authority |
| Agent mesh | chatixia-mesh | Direct agent-to-agent, registry not in data path |
The trade-off is complexity. Establishing a direct connection between two peers behind NATs is substantially harder than connecting to a known server.
2. The NAT Problem
Why NATs exist
IPv4 provides roughly 4.3 billion addresses (2^32), which is not enough for the modern internet. Network Address Translation (NAT) is the widely deployed workaround: your home router assigns private IPs to local devices (e.g., 192.168.1.x) and shares a single public IP for all outbound traffic. The router maintains a mapping table to route replies back to the correct device.
Why NAT breaks inbound connections
NAT works for client-server traffic: the client initiates the connection, the NAT creates a mapping, and replies flow back. But if an external device tries to initiate a connection to your machine, there is no mapping — the router drops the packet.
This is the fundamental P2P problem: both peers are typically behind NATs, and neither can accept incoming connections from the other.
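The mapping-table behavior described above can be sketched as a toy model. This is an illustration only, not how any particular router is implemented; the class name `SimpleNat`, the addresses, and the starting port are all assumptions chosen for the example.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)


class SimpleNat:
    """Toy NAT: one public IP, one mapping per internal (ip, port)."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.out: Dict[Addr, int] = {}   # internal addr -> public port
        self.back: Dict[int, Addr] = {}  # public port -> internal addr

    def outbound(self, src: Addr) -> Addr:
        """Translate an outgoing packet's source, creating a mapping if needed."""
        if src not in self.out:
            self.out[src] = self.next_port
            self.back[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out[src])

    def inbound(self, public_port: int) -> Optional[Addr]:
        """Return the internal destination, or None if the packet is dropped."""
        return self.back.get(public_port)


nat = SimpleNat("203.0.113.7")
print(nat.inbound(40000))                    # None: no mapping yet, packet dropped
print(nat.outbound(("192.168.1.10", 5000)))  # ('203.0.113.7', 40000)
print(nat.inbound(40000))                    # ('192.168.1.10', 5000)
```

The first `inbound` call is the P2P problem in miniature: until the internal host sends something out, the router has nowhere to deliver an unsolicited packet.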
Types of NAT
NATs differ in how strictly they filter inbound packets:
- Full Cone (least restrictive) — Once a mapping is created, any external host can send to that mapped port. Easiest to traverse.
- Restricted Cone — Only accepts inbound from IP addresses the internal host has previously sent to.
- Port-Restricted Cone — Only the exact IP:port pair the internal host contacted can send back.
- Symmetric (most restrictive) — A new mapping is created for every unique destination, with different external ports. STUN sees a different port than a peer would, making hole-punching very difficult. Common in enterprise networks.
- Carrier-Grade NAT (CGNAT) — Not a separate filtering behavior but a deployment pattern: the ISP adds a second layer of NAT in front of the home router. Two levels of translation make hole-punching unreliable; TURN is often required.
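The three cone behaviors differ only in the filter applied to inbound packets, which a small function can make concrete. This is a simplified sketch: the function name and the string labels are hypothetical, and symmetric NAT is left out because its difference lies in how mappings are allocated per destination, not in this filter.

```python
from typing import Set, Tuple

Addr = Tuple[str, int]  # (ip, port)


def accepts_inbound(nat_type: str, contacted: Set[Addr], source: Addr) -> bool:
    """Would a cone NAT forward a packet from `source` to an existing mapping?

    `contacted` holds the remote (ip, port) pairs the internal host has
    already sent to through this mapping.
    """
    if nat_type == "full_cone":
        return True  # any external host may use the mapping
    if nat_type == "restricted_cone":
        return source[0] in {ip for ip, _ in contacted}  # IP must match
    if nat_type == "port_restricted_cone":
        return source in contacted  # IP and port must both match
    raise ValueError(f"unknown NAT type: {nat_type}")


contacted = {("198.51.100.5", 3478)}
same_ip_other_port = ("198.51.100.5", 9999)
stranger = ("192.0.2.44", 3478)
for t in ("full_cone", "restricted_cone", "port_restricted_cone"):
    print(t, accepts_inbound(t, contacted, same_ip_other_port),
          accepts_inbound(t, contacted, stranger))
# full_cone True True
# restricted_cone True False
# port_restricted_cone False False
```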
3. NAT Traversal Techniques
UDP Hole-Punching
Both peers send UDP packets to each other’s public address at roughly the same time. Each NAT creates a mapping for its own outbound traffic, and subsequent packets flow through. The trick: both peers need to know each other’s public IP and port first. Each peer learns its own public address from a STUN server, and the two addresses are then exchanged through a signaling server.
Hole-punching works with full cone, restricted cone, and port-restricted cone NATs. It generally fails with symmetric NAT because the port assigned for the STUN query differs from the port assigned for the peer.
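To see why the exchange of outbound packets opens a path, here is a toy simulation of two port-restricted cone NATs. It models only the filtering rule, with no real sockets; the class name and addresses are assumptions made for the example.

```python
from typing import Set, Tuple

Addr = Tuple[str, int]  # (ip, port)


class PortRestrictedNat:
    """Toy port-restricted cone NAT in front of a single internal host."""

    def __init__(self, public_addr: Addr) -> None:
        self.public_addr = public_addr
        self.allowed: Set[Addr] = set()  # remote endpoints we have sent to

    def send(self, dest: Addr) -> Addr:
        """Outbound packet: remember the destination, return our public source."""
        self.allowed.add(dest)
        return self.public_addr

    def receive(self, source: Addr) -> bool:
        """Inbound packet: forwarded only if we previously sent to `source`."""
        return source in self.allowed


nat_a = PortRestrictedNat(("203.0.113.7", 40000))
nat_b = PortRestrictedNat(("198.51.100.9", 50000))

# A's first packet opens a hole in A's NAT but is dropped by B's NAT.
src_a = nat_a.send(nat_b.public_addr)
first = nat_b.receive(src_a)

# B's packet opens a hole in B's NAT and passes A's NAT, which expects B.
src_b = nat_b.send(nat_a.public_addr)
second = nat_a.receive(src_b)

# A's retry now passes B's NAT as well: the path is open in both directions.
third = nat_b.receive(src_a)
print(first, second, third)  # False True True
```

The dropped first packet is expected and harmless, which is why hole-punching implementations keep retransmitting for a short while instead of giving up after one attempt.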
STUN: Session Traversal Utilities for NAT
STUN (RFC 8489) answers one question: “What is my public IP address and port?” A peer sends a Binding Request to a STUN server on the public internet; the server returns the source IP:port it sees after NAT translation. This gives the peer its server-reflexive candidate.
STUN servers are cheap to operate (small request/response pairs, no relay). Google operates free STUN servers (stun:stun.l.google.com:19302). STUN alone fails when the peer is behind symmetric NAT or UDP is entirely blocked.
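A Binding exchange is small enough to build and parse by hand. The sketch below constructs a request and decodes the IPv4 XOR-MAPPED-ADDRESS attribute from a response, using the header layout and magic cookie from RFC 8489. The function names are hypothetical, and the demo runs offline against a hand-built response instead of contacting a server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """20-byte STUN header with no attributes (message length 0)."""
    assert len(transaction_id) == 12
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(message: bytes):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top 16 bits
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


# Offline demo: hand-build the response a server would send for 203.0.113.7:40000.
tid = os.urandom(12)
request = build_binding_request(tid)
xport = 40000 ^ (MAGIC_COOKIE >> 16)
xaddr = struct.unpack("!I", socket.inet_aton("203.0.113.7"))[0] ^ MAGIC_COOKIE
attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
response = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr
print(len(request), parse_xor_mapped_address(response))  # 20 ('203.0.113.7', 40000)
```

To try it against a real server, the same request bytes could be sent over a UDP socket to a public STUN server and the reply passed to `parse_xor_mapped_address`. A production client would also verify that the transaction ID in the reply matches the request.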
TURN: Traversal Using Relays around NAT
TURN (RFC 8656) is the fallback when direct connectivity is impossible. A TURN server allocates a public relay address and forwards all traffic between peers through itself. It is expensive (relays all data) and partially negates the P2P advantage, but preserves end-to-end encryption (DTLS) and ensures connectivity when nothing else works.
TURN servers typically use ephemeral credentials: the signaling server generates time-limited username/credential pairs via HMAC-SHA1, which the TURN server validates without a database lookup.
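A minimal sketch of that credential scheme follows, assuming the shared-secret convention used by coturn: the username is an expiry timestamp plus a user label, and the credential is the base64-encoded HMAC-SHA1 of that username. The function names, the secret, and the user label are hypothetical.

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(secret: str, user: str, ttl_seconds: int = 3600, now=None):
    """Return (username, credential) valid for ttl_seconds."""
    expiry = int((time.time() if now is None else now) + ttl_seconds)
    username = f"{expiry}:{user}"
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def verify(secret: str, username: str, credential: str, now=None) -> bool:
    """Server-side check: recompute the HMAC and compare, then check the expiry."""
    expiry = int(username.split(":", 1)[0])
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    current = time.time() if now is None else now
    return hmac.compare_digest(expected, credential) and current < expiry


user, cred = turn_credentials("s3cret", "agent-a", ttl_seconds=60, now=1_700_000_000)
print(user)                                             # 1700000060:agent-a
print(verify("s3cret", user, cred, now=1_700_000_030))  # True
print(verify("s3cret", user, cred, now=1_700_000_090))  # False (expired)
```

Because the expiry is part of the signed username, the server needs only the shared secret to validate a credential, which is why no database lookup is required.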
In chatixia-mesh
The registry serves ICE server configuration via GET /api/config, always including STUN and optionally TURN. The sidecar reads TURN_URL and TURN_SECRET from environment variables and generates ephemeral credentials using the same HMAC-SHA1 algorithm as coturn.
4. ICE: Interactive Connectivity Establishment
ICE (RFC 8445) ties STUN, TURN, and direct connectivity together. It is an orchestration protocol that gathers all possible ways to reach a peer, tests them, and selects the best one.
Candidate gathering
ICE gathers three types of candidates:
| Type | Name | Source | Priority |
|---|---|---|---|
| host | Host candidate | Local network interface | Highest |
| srflx | Server-reflexive | STUN server response | Medium |
| relay | Relay candidate | TURN server allocation | Lowest |
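The priority column corresponds to a numeric formula in RFC 8445: `priority = 2^24 * type_preference + 2^8 * local_preference + (256 - component_id)`. The sketch below uses the type preference values the RFC recommends; the function name is hypothetical, and `prflx` (peer-reflexive, a fourth type discovered during connectivity checks) is included only for completeness.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


for t in ("host", "srflx", "relay"):
    print(t, candidate_priority(t))
# host 2130706431
# srflx 1694498815
# relay 16777215
```

Because the type preference occupies the highest bits, any host candidate outranks any server-reflexive candidate, which in turn outranks any relay candidate, regardless of interface or component.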
Connectivity checks
Once both peers have exchanged candidates via signaling, ICE forms candidate pairs (every local candidate paired with every compatible remote candidate) and tests them with STUN Binding Requests sent directly between the peers. Pairs are tested in priority order, so higher-priority paths get the first chance to succeed.
Path selection
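ICE ranks pairs by the priorities of their two candidates, which depend on candidate type (host > srflx > relay), a per-interface local preference, and component ID. The controlling agent nominates a pair whose connectivity check succeeded, and that pair becomes the selected pair; because checks run in priority order, this is usually the best path that works. If the path degrades later, an ICE restart can gather fresh candidates and switch to a better pair.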
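Pair ordering can be illustrated with the pair-priority formula from RFC 8445, where G is the controlling agent's candidate priority and D is the controlled agent's: `2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)`. This sketch only sorts a checklist; it does not run connectivity checks, and the candidate priorities are example values for one candidate of each type.

```python
from itertools import product


def pair_priority(controlling: int, controlled: int) -> int:
    """2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


# Example candidate priorities for one candidate of each type on both sides.
local = {"host": 2130706431, "srflx": 1694498815, "relay": 16777215}
remote = {"host": 2130706431, "srflx": 1694498815, "relay": 16777215}

# Every local candidate paired with every remote one, highest priority first.
checklist = sorted(
    product(local, remote),
    key=lambda pair: pair_priority(local[pair[0]], remote[pair[1]]),
    reverse=True,
)
print(checklist[0], checklist[-1])  # ('host', 'host') ('relay', 'relay')
```

The `min` term dominates, so a pair is only as good as its weaker candidate: a host candidate paired with a relay candidate still sorts near the bottom of the checklist.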
In chatixia-mesh
The sidecar’s webrtc_peer.rs creates an RTCPeerConnection with ICE servers from the environment. As candidates are gathered, each is sent to the remote peer through the signaling server. The full flow:
Sidecar A            Registry (signaling)              Sidecar B
|                              |                               |
|-- register ----------------->|<---------------- register ----|
|<-- peer_list [B] ------------|-- peer_list [A] ------------->|
|-- SDP offer (target: B) ---->|-- SDP offer ----------------->|
|<-------------- SDP answer ---|<-- SDP answer (target: A) ----|
|-- ICE candidates ----------->|-- ICE candidates ------------>|
|<-------------- ICE cand. ----|<-- ICE candidates ------------|
|                              |                               |
|<======= DataChannel (direct P2P, DTLS) =====================>|
5. Signaling: The Bootstrap Problem
To establish a P2P connection, two peers need to exchange SDP offers/answers and ICE candidates. But they cannot exchange information until they have a connection. This is the bootstrap problem.
A signaling server is a rendezvous point where peers discover each other and exchange connection metadata. It is not part of the data path — once the P2P connection is up, the signaling server could shut down and existing connections would continue.
The signaling mechanism is intentionally left unspecified by WebRTC. Any transport works: WebSocket, HTTP long-polling, carrier pigeon.
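Because the transport is unspecified, the essential job of a signaling server reduces to routing opaque messages by target. The following in-memory sketch shows that core; it is not the chatixia-mesh registry, and the JSON field names (`type`, `target`, `from`, `sdp`) as well as the class name are assumptions made for illustration.

```python
import json
from typing import Callable, Dict

RELAYED = {"sdp_offer", "sdp_answer", "ice_candidate"}


class SignalingHub:
    """In-memory stand-in for a signaling server: routes messages by target id."""

    def __init__(self) -> None:
        self.peers: Dict[str, Callable[[str], None]] = {}

    def register(self, peer_id: str, deliver: Callable[[str], None]) -> None:
        """Remember how to deliver messages to a peer (e.g. its WebSocket send)."""
        self.peers[peer_id] = deliver

    def handle(self, sender: str, raw: str) -> bool:
        """Forward a relayable message to its target; return False if not delivered."""
        msg = json.loads(raw)
        deliver = self.peers.get(msg.get("target"))
        if msg.get("type") not in RELAYED or deliver is None:
            return False
        msg["from"] = sender  # the hub stamps the sender; the payload stays opaque
        deliver(json.dumps(msg))
        return True


inbox_b = []
hub = SignalingHub()
hub.register("a", lambda m: None)
hub.register("b", inbox_b.append)
hub.handle("a", json.dumps({"type": "sdp_offer", "target": "b", "sdp": "v=0"}))
print(json.loads(inbox_b[0])["from"])  # a
```

The hub never inspects the SDP or candidate payload, which is what allows the same pattern to run over WebSocket, long-polling, or any other channel.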
In chatixia-mesh
The registry (signaling.rs) serves as the signaling server. Sidecars connect via WebSocket, authenticated with JWT. The registry tracks peers, sends peer_list messages, and relays sdp_offer, sdp_answer, and ice_candidate messages between peers. The three connectivity tiers degrade gracefully:
| Tier | Path | Latency | When used |
|---|---|---|---|
| 1 | Direct P2P DataChannel | <100ms | Open UDP path (same LAN or cooperating NATs) |
| 2 | TURN relay | 50-200ms | NAT/firewall blocks direct UDP |
| 3 | HTTP task queue (via registry) | 3-15s | All UDP blocked, no TURN configured |
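As long as the registry stays reachable, a blocked path costs latency rather than connectivity: each tier falls back to the next instead of failing outright.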
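The tiered fallback can be sketched as trying transports in order of preference. This is a generic illustration of the pattern and not the chatixia-mesh implementation; the transport names and the use of `ConnectionError` to signal an unavailable tier are assumptions.

```python
from typing import Callable, List, Tuple

Transport = Tuple[str, Callable[[bytes], None]]  # (name, send function)


def send_with_fallback(transports: List[Transport], payload: bytes) -> str:
    """Try each transport in order; return the name of the first that works."""
    for name, send in transports:
        try:
            send(payload)
            return name
        except ConnectionError:
            continue  # this tier is unavailable, fall through to the next
    raise ConnectionError("no transport available")


def blocked(_payload: bytes) -> None:
    """Stand-in for a tier whose path is blocked."""
    raise ConnectionError("UDP blocked")


queued: List[bytes] = []
tiers = [("datachannel", blocked), ("turn", blocked), ("http_queue", queued.append)]
print(send_with_fallback(tiers, b"task"))  # http_queue
```

Note that the last tier in this sketch can still fail if nothing is reachable, in which case the caller sees a `ConnectionError`.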
Exercises
- Determine your NAT type. Use a WebRTC test page or STUN client to query stun.l.google.com:19302. What is your public IP:port? Does the mapped port change when you query a second STUN server from the same local port (a sign of symmetric NAT)?
- ICE on the same LAN. Trace the ICE sequence for two agents on 192.168.1.x. Which candidate pair succeeds first? Does STUN/TURN play any role?
- ICE across networks. Trace ICE for one agent behind a home NAT and another on a VPS with a public IP. Which candidate pair wins? When would TURN be needed?
- Why TURN is necessary. Give two network configurations where hole-punching fails. Compare running a TURN server vs. accepting Tier 3 HTTP latency for different workload types.
Related Lessons
- Lesson 01: Networking Foundations — prerequisite.
- Lesson 03: WebRTC Deep Dive — DTLS, SCTP, DataChannels, the full WebRTC stack.
- Lesson 04: The Sidecar Pattern — why chatixia-mesh separates WebRTC into a Rust sidecar.
Further Reading
- RFC 8445 — ICE. RFC 8489 — STUN. RFC 8656 — TURN. RFC 8866 — SDP.
- WebRTC for the Curious — free online book covering WebRTC internals.
- ICE, STUN, and TURN explanation (MDN)