Application Protocol Design -- MeshMessage and Task Lifecycle
Lesson 07: Application Protocol Design — MeshMessage and Task Lifecycle
Prerequisites: Lesson 05: Signaling Protocol Design, Lesson 06: Inter-Process Communication
Key source files:
- sidecar/src/protocol.rs: MeshMessage struct and message type constants
- registry/src/hub.rs: Task struct, TaskSubmission, HubState, task lifecycle
- agent/chatixia/core/mesh_skills.py: handle_delegate with P2P-first and HTTP fallback
- agent/chatixia/core/mesh_client.py: MeshClient.request() for correlated request/response
Introduction
Lessons 05 and 06 covered how chatixia-mesh establishes connections (signaling) and bridges WebRTC to Python (IPC). Both are transport protocols. This lesson covers what those bytes actually mean — the application protocol that gives messages their semantics.
1. Layered Protocols
Every message travels through multiple layers:
+-----------------------------------------------------------+
| Application: MeshMessage | IpcMessage  | SignalingMessage |
+-----------------------------------------------------------+
| Transport:   DataChannel | Unix Socket | WebSocket        |
+-----------------------------------------------------------+
| Network:     UDP         | Filesystem  | TCP              |
+-----------------------------------------------------------+
A MeshMessage from Agent A to Agent B crosses two channels: wrapped in an IpcMessage over the Unix socket to Sidecar A, sent as raw JSON over the DataChannel to Sidecar B, then wrapped in another IpcMessage to reach Agent B. Each component only understands its own protocol — the Python agent never deals with WebRTC, the sidecar never interprets task payloads.
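The wrapping described above can be sketched in a few lines. This is only an illustration: the IPC envelope keys used here ("type", "target_peer", "message") are assumptions made for the example, not the actual IpcMessage schema, and both function names are hypothetical.

```python
import json


def wrap_for_ipc(mesh_message: dict, target_peer: str) -> str:
    """Agent side: wrap a MeshMessage in a hypothetical IPC envelope."""
    # The envelope keys below are assumptions for illustration only.
    envelope = {"type": "send", "target_peer": target_peer, "message": mesh_message}
    return json.dumps(envelope)


def sidecar_forward(ipc_frame: str) -> str:
    """Sidecar side: strip the IPC envelope and forward the inner JSON as-is."""
    envelope = json.loads(ipc_frame)
    # Only the routing information is read; the task payload is never inspected.
    return json.dumps(envelope["message"])


mesh_message = {"type": "task_request", "payload": {"message": "hi", "skill": "echo"}}
frame = wrap_for_ipc(mesh_message, "peer-b")   # crosses the Unix socket
on_the_wire = sidecar_forward(frame)           # crosses the DataChannel
```

The inner message comes out of the sidecar byte-for-byte equivalent to what the agent put in, which is what lets each layer stay ignorant of the others.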
2. MeshMessage Format
The MeshMessage is the single envelope for all agent-to-agent communication. Five fields:
// sidecar/src/protocol.rs
pub struct MeshMessage {
    #[serde(rename = "type")]
    pub msg_type: String,
    #[serde(default)]
    pub request_id: String,
    #[serde(default)]
    pub source_agent: String,
    #[serde(default)]
    pub target_agent: String,
    #[serde(default)]
    pub payload: serde_json::Value,
}
| Field | Required | Purpose |
|---|---|---|
| type | Yes | Determines how the receiver interprets the message |
| request_id | No | Correlates requests with responses (12-char UUID hex) |
| source_agent | No | Sender identity for routing and attribution |
| target_agent | No | Intended recipient; "*" for broadcasts |
| payload | No | Arbitrary JSON, contents depend on type |
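Every field except type defaults to an empty value, so a ping is just {"type": "ping"}. The #[serde(default)] annotation fills a missing field with its default instead of rejecting the message, so senders can omit fields they do not use and the envelope can evolve without breaking peers that leave a field out.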
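To make the defaulting behaviour concrete, here is a minimal Python mirror of the envelope. It is a sketch, not the agent's actual MeshMessage class: the from_json helper is hypothetical, and mapping the serde_json::Value default (JSON null) to None is an assumption about how the Python side would represent it.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class MeshMessage:
    msg_type: str            # serialized as "type"; the only required field
    request_id: str = ""
    source_agent: str = ""
    target_agent: str = ""
    payload: Any = None      # stands in for serde_json::Value's default (null)

    @classmethod
    def from_json(cls, raw: str) -> "MeshMessage":
        data = json.loads(raw)
        return cls(
            msg_type=data["type"],  # raises KeyError if "type" is missing
            request_id=data.get("request_id", ""),
            source_agent=data.get("source_agent", ""),
            target_agent=data.get("target_agent", ""),
            payload=data.get("payload"),
        )


ping = MeshMessage.from_json('{"type": "ping"}')
```

Keys the parser does not know about are simply ignored here, so a message carrying an extra field still parses.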
Message Types
Connectivity: ping / pong — lightweight DataChannel heartbeat.
Task delegation (request/response):
- task_request: “Execute this task.” Carries request_id; payload includes message and skill.
- task_response: “Here is the result.” Carries the matching request_id; payload includes result or error.
- task_stream_chunk: Streaming partial results with the same request_id.
Skill discovery: skill_query / skill_response — “What can you do?”
Agent communication (fire-and-forget):
- agent_status: Broadcast of skills, health, load.
- agent_prompt: Direct message or broadcast, no response expected.
- agent_response / agent_stream_chunk: Optional replies.
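One way a receiver might branch on the type field is sketched below. The handle function and its echo "result" are hypothetical stand-ins; the real agent's dispatch logic is not shown in this lesson, so treat this only as an illustration of which types produce a reply and which do not.

```python
from typing import Optional


def handle(msg: dict, agent_id: str) -> Optional[dict]:
    """Return a reply message, or None when the type expects no response."""
    kind = msg.get("type")
    sender = msg.get("source_agent", "")

    if kind == "ping":
        # Connectivity: answer every ping with a pong.
        return {"type": "pong", "source_agent": agent_id, "target_agent": sender}

    if kind == "task_request":
        # Request/response: the reply must carry the same request_id.
        text = msg.get("payload", {}).get("message", "")
        return {
            "type": "task_response",
            "request_id": msg.get("request_id", ""),
            "source_agent": agent_id,
            "target_agent": sender,
            "payload": {"result": f"echo: {text}"},  # placeholder result
        }

    # agent_status, agent_prompt, and unknown types: fire-and-forget.
    return None


pong = handle({"type": "ping", "source_agent": "a"}, "b")
```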
3. Task Lifecycle
When an agent delegates work, it is tracked as a task with a four-state lifecycle (registry/src/hub.rs):
        submit_task
             |
             v
         [pending]
             |
   get_pending_for_agent()
             |
             v
        [assigned]
         /        \
update_task()    update_task()
  completed         failed
      |                |
      v                v
 [completed]       [failed]
| Transition | Trigger | What happens |
|---|---|---|
| -> pending | submit_task() HTTP handler | New task created with UUID, timestamped |
| pending -> assigned | get_pending_for_agent() | Agent claims task by skill match; assigned_agent_id set |
| assigned -> completed | update_task() | Agent POSTs result |
| assigned -> failed | update_task() | Agent POSTs error |
| any -> failed | expire_tasks_loop() | TTL exceeded (default 300s, checked every 30s) |
Terminal states (completed/failed) are permanent — no retry mechanism. The source agent decides whether to resubmit.
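The transition table can be modelled as a small state machine. This is a sketch in Python rather than the registry's Rust code; the Task class, ALLOWED map, and method names are hypothetical, and it assumes that TTL expiry only affects tasks that have not yet reached a terminal state, which is how "terminal states are permanent" is read here.

```python
import time
from dataclasses import dataclass, field

# Allowed next states for each state; terminal states allow none.
ALLOWED = {
    "pending": {"assigned", "failed"},
    "assigned": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class Task:
    state: str = "pending"
    ttl: float = 300.0  # seconds, matching the default TTL above
    created_at: float = field(default_factory=time.monotonic)

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def expire_if_stale(self, now: float) -> bool:
        """Mark a non-terminal task as failed once its TTL has elapsed."""
        if self.state in ("pending", "assigned") and now - self.created_at > self.ttl:
            self.state = "failed"
            return True
        return False


task = Task()
task.transition("assigned")
task.transition("completed")
```

Because completed and failed have no outgoing edges, a retry has to be a new task submitted by the source agent, as the text says.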
4. The Dual Execution Path
The core design pattern: every operation tries the fast P2P DataChannel first, then falls back to the slower HTTP task queue.
Path 1: P2P (fast, <100ms typical)
Taken when the mesh client is connected and the target peer is reachable:
msg = MeshMessage(
    msg_type="task_request",
    source_agent=agent_id,
    target_agent=target_agent_id,
    payload={"message": message, "skill": skill},
)
response = await _mesh_client.request(target_peer, msg, timeout=120.0)
The request() method generates a request_id, registers an asyncio.Future, sends the message through IPC -> DataChannel -> remote sidecar -> remote agent, and resolves the future when a task_response with the matching ID arrives.
Total hops: 6 (3 each direction). Network crossings: 2 DataChannel messages.
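The correlation step inside request() can be sketched as follows. This is not the actual MeshClient code: the Correlator class, its send callback, and the plain-dict messages are assumptions chosen to keep the example self-contained. The 12-character ID follows the "12-char UUID hex" note in the field table.

```python
import asyncio
import uuid


class Correlator:
    """Match task_response messages to pending requests by request_id."""

    def __init__(self, send):
        self._send = send     # async callable that ships a dict to the peer
        self._pending = {}    # request_id -> asyncio.Future

    async def request(self, msg: dict, timeout: float) -> dict:
        request_id = uuid.uuid4().hex[:12]
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send({**msg, "request_id": request_id})
            # Raises asyncio.TimeoutError if no response arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always drop the entry so timed-out requests do not leak.
            self._pending.pop(request_id, None)

    def on_message(self, msg: dict) -> None:
        """Call for every incoming message; resolves the matching future."""
        if msg.get("type") != "task_response":
            return
        future = self._pending.get(msg.get("request_id", ""))
        if future is not None and not future.done():
            future.set_result(msg)
```

A response whose request_id matches nothing in the pending map is silently ignored, which also covers late replies that arrive after a timeout.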
Path 2: HTTP Fallback (3-15s typical)
Used when P2P is unavailable:
result = _post(f"{registry}/api/hub/tasks", {
    "skill": skill, "target_agent_id": target_agent_id,
    "source_agent_id": agent_id, "payload": {"message": message}, "ttl": 300,
})
task_id = result.get("task_id")
# Poll every 3s until completed, failed, or 120s deadline
while time_remaining:
    await asyncio.sleep(3)
    status = _get(f"{registry}/api/hub/tasks/{task_id}")
    if status["state"] in ("completed", "failed"):
        return status
Minimum 4 HTTP requests per task. Latency dominated by the 3-second polling interval.
Comparison
| Aspect | P2P DataChannel | HTTP Task Queue |
|---|---|---|
| Latency | <100ms | 3-15s |
| Registry load | None | 4+ requests/task |
| Encryption | DTLS (end-to-end) | TLS to registry (registry sees content) |
| Reliability | Requires active DataChannel | Works if registry is reachable |
5. Graceful Degradation
The dual path is part of a three-tier transport hierarchy:
| Tier | Transport | Latency | When used |
|---|---|---|---|
| 1 | Direct P2P (UDP) | <100ms | Same LAN, public IPs, or STUN-assisted NAT traversal |
| 2 | TURN Relay (UDP) | 100-500ms | Symmetric NAT or restrictive firewall |
| 3 | HTTP Fallback | 3-15s | No WebRTC connectivity at all |
Tier selection is implicit: ICE negotiation tries direct then TURN; if all WebRTC fails, handle_delegate falls through to HTTP. From the agent’s perspective, delegate always returns a result — only latency varies.
This design serves four goals: latency (P2P is 30-100x faster), privacy (DTLS means the registry never sees data-plane content), scalability (P2P offloads registry traffic), and resilience (established DataChannels survive registry downtime).
6. Fire-and-Forget vs Request/Response
Fire-and-Forget: mesh_send and mesh_broadcast
Uses agent_prompt type and MeshClient.send():
msg = MeshMessage(
    msg_type="agent_prompt",
    source_agent=agent_id,
    target_agent=target_agent_id,
    payload={"message": message},
)
await _mesh_client.send(target_peer, msg)  # returns immediately
No request_id, no response expected. Appropriate for status announcements, notifications, and broadcasts.
Request/Response: delegate
Uses task_request/task_response types and MeshClient.request():
msg = MeshMessage(
    msg_type="task_request",
    source_agent=agent_id,
    target_agent=target_agent_id,
    payload={"message": message, "skill": skill},
)
response = await _mesh_client.request(target_peer, msg, timeout=120.0)
request() generates a request_id and suspends the calling coroutine until the response arrives or the timeout expires. Appropriate for task delegation where the sender needs the result.
The wait parameter on handle_delegate converts it to fire-and-forget when wait=False — sends a task_request but does not await the response.
Summary
MeshMessage is a five-field JSON envelope that carries all agent-to-agent communication. Its minimal design makes it easy to implement across languages and extend with new types.
Tasks follow a four-state lifecycle (pending -> assigned -> completed | failed) with TTL-based expiration preventing abandoned tasks from accumulating.
The dual execution path tries P2P first (fast, private, decentralized) and falls back to HTTP (slower, but available whenever the registry is reachable). request_id enables correlation over DataChannels; task_id serves the same purpose over HTTP polling.
Graceful degradation across three tiers means the system keeps working as long as one path remains, whether a DataChannel or the registry; it slows down rather than failing.
Previous: Lesson 06: Inter-Process Communication | Next: Lesson 08: Authentication and Security