Lesson 04: Async Programming — Concurrency Without Threads

Prerequisites: None (parallel to Lessons 01—03).

1. The Problem with Threads

The simplest model for a network server is thread-per-connection: spawn one thread for every client, each blocking on I/O. This works for a handful of clients but breaks at scale:

Stack memory. Each thread gets 1—8 MB of stack. 1,000 threads means 1—8 GB of stack space alone.
Context switching. Saving/restoring registers, updating page tables, and flushing caches costs 1—6 microseconds per switch. At high thread counts, the CPU spends more time switching than working.
Scheduling overhead. The OS scheduler, designed for dozens of threads, collapses under thousands.
Shared mutable state. Multiple threads accessing the same data requires locks, introducing contention and deadlock risks.

This is the C10K problem (Dan Kegel, 1999): how do you handle 10,000 concurrent connections? The answer is to stop mapping one connection to one thread. Instead, use a small number of threads to handle many connections by only paying attention to connections with data ready.

2. Event Loops and async/await

The reactor pattern

Instead of blocking a thread per connection, the reactor pattern uses a single thread (or small pool) with an event loop:

Register interest in I/O events (“notify me when socket A has data”).
Call the OS to wait for any event (Linux: epoll, macOS: kqueue).
Run the handler for whichever event fires.
Repeat.

A single thread handles thousands of connections because it never blocks on any one of them.

Cooperative vs preemptive multitasking

Threads use preemptive multitasking: the OS forcibly interrupts after a time slice. Async uses cooperative multitasking: each task voluntarily yields at I/O wait points. This means no OS context-switch overhead, no locks needed between yield points — but a task that computes without yielding blocks all other tasks on that thread.

async/await syntax

Modern languages wrap the reactor pattern in async/await:

async function handle_connection(socket):
    data = await socket.read()     // yields here
    result = process(data)
    await socket.write(result)     // yields here too

The await keyword marks yield points. The runtime parks the current task and runs another that is ready. When I/O completes, the original task resumes exactly where it left off.

3. Rust Async: Tokio

Rust’s async model uses futures: values representing a computation that will complete later. Unlike JavaScript promises, Rust futures are lazy — nothing happens until you .await them. The Tokio runtime is the standard async runtime; chatixia-mesh uses it in both the registry and sidecar.

tokio::spawn — concurrent tasks

tokio::spawn runs a future as an independent task. From the registry (main.rs):

// Background tasks run concurrently alongside the HTTP server
tokio::spawn(async move { reg.health_check_loop().await });
tokio::spawn(async move { hub.expire_tasks_loop().await });
tokio::spawn(async move { pairing.cleanup_loop().await });

let listener = tokio::net::TcpListener::bind(addr).await?;
axum::serve(listener, app).await?;

Three background tasks and the HTTP server all run on the same Tokio runtime with no explicitly created threads.

tokio::select! — multiplexing

tokio::select! waits on multiple async operations and runs whichever completes first. From the registry’s WebSocket handler:

loop {
    tokio::select! {
        Some(msg) = rx.recv() => {
            // Message from another handler -- forward to this peer's WebSocket
            socket.send(Message::Text(msg.into())).await;
        }
        msg = socket.recv() => {
            // Peer sent a message -- parse and relay
            match msg {
                Some(Ok(Message::Text(text))) => { /* handle */ }
                Some(Ok(Message::Close(_))) | None => break,
                _ => {}
            }
        }
    }
}

Without select!, you would need two separate tasks and coordination logic. select! collapses this into a single clear loop.

Channels: tokio::sync::mpsc

Channels are async-safe queues. Tokio’s mpsc (multi-producer, single-consumer) channels decouple message production from delivery. In chatixia-mesh, the registry creates one channel per WebSocket peer. To relay a signaling message, it pushes into the target peer’s channel; the WebSocket handler drains from the other end.

4. Python Async: asyncio

Python’s asyncio provides an event loop, coroutines, and tasks for cooperative multitasking. chatixia-mesh uses it in the Python agent.

Coroutines, tasks, and the event loop

async def fetch_data(host, port):
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()
    return data.decode()

asyncio.create_task() schedules a coroutine to run concurrently. asyncio.run() creates the event loop and runs to completion.

How the agent uses asyncio

From the agent runner (runner.py):

async def run_agent(config):
    client = MeshClient(socket_path=config.sidecar.socket)
    await client.start()
    client.on("message", _handle_p2p_message)

    while True:
        resp = requests.post(f"{registry}/api/hub/heartbeat", ...)
        for task in resp.json().get("pending_tasks", []):
            asyncio.create_task(
                _execute_task(registry, api_key, task, mesh_client=client)
            )
        await asyncio.sleep(15)

Several things run concurrently on a single thread: the IPC listener inside MeshClient, the heartbeat loop, and any spawned task execution coroutines.

5. Channels for Message Passing

Instead of sharing data structures with locks, pass messages through channels. Each task owns its own data and communicates by sending and receiving messages (“share memory by communicating”).

The sidecar has three concurrent subsystems (signaling, IPC, WebRTC) that exchange messages via channels:

DataChannel handler --> [mpsc channel] --> IPC writer task --> Unix socket to Python

The to_agent_tx sender is passed into WebRTC handlers. When a DataChannel message arrives, the handler calls to_agent_tx.send(msg). The IPC task receives and writes to the Unix socket. No shared mutable state, no locks.

The same pattern exists on the registry side: each WebSocket peer gets its own channel for relaying signaling messages.

6. Concurrent Data Structures

The problem with HashMap + Mutex

Wrapping a HashMap in a Mutex works but serializes all access. With 100 concurrent WebSocket handlers, most time is spent waiting for the lock.

DashMap: sharded concurrent HashMap

DashMap partitions keys into shards, each with its own lock. Operations on keys in different shards proceed in parallel. The registry uses five DashMap instances:

State module	Key	Value	Purpose
`SignalingState`	peer_id	`UnboundedSender<String>`	WebSocket sender per peer
`RegistryState`	agent_id	`AgentRecord`	Agent registry with health
`HubState`	task_id	`Task`	Task queue lifecycle
`PairingState`	invite code	`InviteCode`	Ephemeral invite codes
`PairingState`	entry_id	`OnboardingEntry`	Onboarding lifecycle

Each is wrapped in Arc and shared across handlers. A heartbeat handler can update RegistryState while a WebSocket handler reads SignalingState with no contention.

When to use what

Approach	Use when
`Mutex<HashMap>`	Low contention, simple patterns
`RwLock<HashMap>`	Many readers, few writers
`DashMap`	High contention, many concurrent readers/writers on different keys
Channels	Data flows one direction between tasks (most contention-free)

Exercises

Tokio select! with two channels. Write a program with two mpsc channels. One producer sends an i32 every 500ms, another sends a String every 700ms. Use select! to print messages from both for 3 seconds.
asyncio concurrent coroutines. Write a Python program with a timer (prints “tick” every 1s, 5 times), a consumer (reads from a Queue), and a producer (puts 3 messages at 0.5s intervals then “STOP”). Run all three concurrently.
Why create_task instead of await? The agent heartbeat loop uses create_task(_execute_task(...)). What would happen with await instead? If 3 tasks each take 10s, when does the next heartbeat fire under each approach?
DashMap vs HashMap+Mutex. The registry’s health_check_loop iterates all agents every 15s. How does this behave under DashMap vs Mutex<HashMap>? Could RwLock<HashMap> work instead?

Previous: Lesson 03: WebRTC Fundamentals | Next: Lesson 05: Signaling Protocol Design

Async Programming -- Concurrency Without Threads