Async Programming -- Concurrency Without Threads
The simplest model for a network server is to spawn one thread for every client connection. A chat server using this model might look like:
On this page
- 1. The Problem with Threads
- 2. Event Loops and async/await
- The reactor pattern
- Cooperative vs preemptive multitasking
- async/await syntax
- 3. Rust Async: Tokio
- tokio::spawn — concurrent tasks
- tokio::select! — multiplexing
- Channels: tokio::sync::mpsc
- 4. Python Async: asyncio
- Coroutines, tasks, and the event loop
- How the agent uses asyncio
- 5. Channels for Message Passing
- 6. Concurrent Data Structures
- The problem with HashMap + Mutex
- DashMap: sharded concurrent HashMap
- When to use what
- Exercises
Lesson 04: Async Programming — Concurrency Without Threads
Prerequisites: None (parallel to Lessons 01—03).
1. The Problem with Threads
The simplest model for a network server is thread-per-connection: spawn one thread for every client, each blocking on I/O. This works for a handful of clients but breaks at scale:
- Stack memory. Each thread gets 1—8 MB of stack. 1,000 threads means 1—8 GB of stack space alone.
- Context switching. Saving/restoring registers, updating page tables, and flushing caches costs 1—6 microseconds per switch. At high thread counts, the CPU spends more time switching than working.
- Scheduling overhead. The OS scheduler, designed for dozens of threads, collapses under thousands.
- Shared mutable state. Multiple threads accessing the same data requires locks, introducing contention and deadlock risks.
This is the C10K problem (Dan Kegel, 1999): how do you handle 10,000 concurrent connections? The answer is to stop mapping one connection to one thread. Instead, use a small number of threads to handle many connections by only paying attention to connections with data ready.
2. Event Loops and async/await
The reactor pattern
Instead of blocking a thread per connection, the reactor pattern uses a single thread (or small pool) with an event loop:
- Register interest in I/O events (“notify me when socket A has data”).
- Call the OS to wait for any event (Linux:
epoll, macOS:kqueue). - Run the handler for whichever event fires.
- Repeat.
A single thread handles thousands of connections because it never blocks on any one of them.
Cooperative vs preemptive multitasking
Threads use preemptive multitasking: the OS forcibly interrupts after a time slice. Async uses cooperative multitasking: each task voluntarily yields at I/O wait points. This means no OS context-switch overhead, no locks needed between yield points — but a task that computes without yielding blocks all other tasks on that thread.
async/await syntax
Modern languages wrap the reactor pattern in async/await:
async function handle_connection(socket):
data = await socket.read() // yields here
result = process(data)
await socket.write(result) // yields here too
The await keyword marks yield points. The runtime parks the current task and runs another that is ready. When I/O completes, the original task resumes exactly where it left off.
3. Rust Async: Tokio
Rust’s async model uses futures: values representing a computation that will complete later. Unlike JavaScript promises, Rust futures are lazy — nothing happens until you .await them. The Tokio runtime is the standard async runtime; chatixia-mesh uses it in both the registry and sidecar.
tokio::spawn — concurrent tasks
tokio::spawn runs a future as an independent task. From the registry (main.rs):
// Background tasks run concurrently alongside the HTTP server
tokio::spawn(async move { reg.health_check_loop().await });
tokio::spawn(async move { hub.expire_tasks_loop().await });
tokio::spawn(async move { pairing.cleanup_loop().await });
let listener = tokio::net::TcpListener::bind(addr).await?;
axum::serve(listener, app).await?;
Three background tasks and the HTTP server all run on the same Tokio runtime with no explicitly created threads.
tokio::select! — multiplexing
tokio::select! waits on multiple async operations and runs whichever completes first. From the registry’s WebSocket handler:
loop {
tokio::select! {
Some(msg) = rx.recv() => {
// Message from another handler -- forward to this peer's WebSocket
socket.send(Message::Text(msg.into())).await;
}
msg = socket.recv() => {
// Peer sent a message -- parse and relay
match msg {
Some(Ok(Message::Text(text))) => { /* handle */ }
Some(Ok(Message::Close(_))) | None => break,
_ => {}
}
}
}
}
Without select!, you would need two separate tasks and coordination logic. select! collapses this into a single clear loop.
Channels: tokio::sync::mpsc
Channels are async-safe queues. Tokio’s mpsc (multi-producer, single-consumer) channels decouple message production from delivery. In chatixia-mesh, the registry creates one channel per WebSocket peer. To relay a signaling message, it pushes into the target peer’s channel; the WebSocket handler drains from the other end.
4. Python Async: asyncio
Python’s asyncio provides an event loop, coroutines, and tasks for cooperative multitasking. chatixia-mesh uses it in the Python agent.
Coroutines, tasks, and the event loop
async def fetch_data(host, port):
reader, writer = await asyncio.open_connection(host, port)
writer.write(b"GET / HTTP/1.0\r\n\r\n")
await writer.drain()
data = await reader.read()
return data.decode()
asyncio.create_task() schedules a coroutine to run concurrently. asyncio.run() creates the event loop and runs to completion.
How the agent uses asyncio
From the agent runner (runner.py):
async def run_agent(config):
client = MeshClient(socket_path=config.sidecar.socket)
await client.start()
client.on("message", _handle_p2p_message)
while True:
resp = requests.post(f"{registry}/api/hub/heartbeat", ...)
for task in resp.json().get("pending_tasks", []):
asyncio.create_task(
_execute_task(registry, api_key, task, mesh_client=client)
)
await asyncio.sleep(15)
Several things run concurrently on a single thread: the IPC listener inside MeshClient, the heartbeat loop, and any spawned task execution coroutines.
5. Channels for Message Passing
Instead of sharing data structures with locks, pass messages through channels. Each task owns its own data and communicates by sending and receiving messages (“share memory by communicating”).
The sidecar has three concurrent subsystems (signaling, IPC, WebRTC) that exchange messages via channels:
DataChannel handler --> [mpsc channel] --> IPC writer task --> Unix socket to Python
The to_agent_tx sender is passed into WebRTC handlers. When a DataChannel message arrives, the handler calls to_agent_tx.send(msg). The IPC task receives and writes to the Unix socket. No shared mutable state, no locks.
The same pattern exists on the registry side: each WebSocket peer gets its own channel for relaying signaling messages.
6. Concurrent Data Structures
The problem with HashMap + Mutex
Wrapping a HashMap in a Mutex works but serializes all access. With 100 concurrent WebSocket handlers, most time is spent waiting for the lock.
DashMap: sharded concurrent HashMap
DashMap partitions keys into shards, each with its own lock. Operations on keys in different shards proceed in parallel. The registry uses five DashMap instances:
| State module | Key | Value | Purpose |
|---|---|---|---|
SignalingState | peer_id | UnboundedSender<String> | WebSocket sender per peer |
RegistryState | agent_id | AgentRecord | Agent registry with health |
HubState | task_id | Task | Task queue lifecycle |
PairingState | invite code | InviteCode | Ephemeral invite codes |
PairingState | entry_id | OnboardingEntry | Onboarding lifecycle |
Each is wrapped in Arc and shared across handlers. A heartbeat handler can update RegistryState while a WebSocket handler reads SignalingState with no contention.
When to use what
| Approach | Use when |
|---|---|
Mutex<HashMap> | Low contention, simple patterns |
RwLock<HashMap> | Many readers, few writers |
DashMap | High contention, many concurrent readers/writers on different keys |
| Channels | Data flows one direction between tasks (most contention-free) |
Exercises
-
Tokio select! with two channels. Write a program with two
mpscchannels. One producer sends an i32 every 500ms, another sends a String every 700ms. Useselect!to print messages from both for 3 seconds. -
asyncio concurrent coroutines. Write a Python program with a timer (prints “tick” every 1s, 5 times), a consumer (reads from a Queue), and a producer (puts 3 messages at 0.5s intervals then “STOP”). Run all three concurrently.
-
Why create_task instead of await? The agent heartbeat loop uses
create_task(_execute_task(...)). What would happen withawaitinstead? If 3 tasks each take 10s, when does the next heartbeat fire under each approach? -
DashMap vs HashMap+Mutex. The registry’s
health_check_loopiterates all agents every 15s. How does this behave underDashMapvsMutex<HashMap>? CouldRwLock<HashMap>work instead?
Previous: Lesson 03: WebRTC Fundamentals | Next: Lesson 05: Signaling Protocol Design