chatixia blog
Deep Dive March 15, 2026 · 6 min read

Application Protocol Design -- MeshMessage and Task Lifecycle


Tags: protocol-design, messaging, task-lifecycle

Lesson 07: Application Protocol Design — MeshMessage and Task Lifecycle

Prerequisites: Lesson 05: Signaling Protocol Design, Lesson 06: Inter-Process Communication

Key source files:

  • sidecar/src/protocol.rs — MeshMessage struct and message type constants
  • registry/src/hub.rs — Task struct, TaskSubmission, HubState, task lifecycle
  • agent/chatixia/core/mesh_skills.py — handle_delegate with P2P-first and HTTP fallback
  • agent/chatixia/core/mesh_client.py — MeshClient.request() for correlated request/response

Introduction

Lessons 05 and 06 covered how chatixia-mesh establishes connections (signaling) and bridges WebRTC to Python (IPC). Both are transport protocols: they move bytes. This lesson covers what those bytes mean, that is, the application protocol that gives messages their semantics.


1. Layered Protocols

Every message travels through multiple layers:

+---------------------------------------------------------------+
|  Application:  MeshMessage | IpcMessage | SignalingMessage     |
+---------------------------------------------------------------+
|  Transport:    DataChannel | Unix Socket | WebSocket           |
+---------------------------------------------------------------+
|  Network:      UDP         | Filesystem  | TCP                 |
+---------------------------------------------------------------+

A MeshMessage from Agent A to Agent B takes three hops over two kinds of channel: it is wrapped in an IpcMessage over the Unix socket to Sidecar A, sent as raw JSON over the DataChannel to Sidecar B, then wrapped in another IpcMessage to reach Agent B. Each component understands only its own protocol: the Python agent never deals with WebRTC, and the sidecar never interprets task payloads.
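The wrapping and unwrapping described above can be sketched as follows. This is a minimal illustration, not the project's code: the envelope keys (`kind`, `peer`, `body`) and the three function names are hypothetical, since the real IpcMessage format is defined in Lesson 06. What it shows is that the MeshMessage body passes through both sidecars unchanged.

```python
import json

# NOTE: the envelope keys "kind", "peer", and "body" are assumptions for this sketch.

def agent_wrap(mesh_message: dict, target_peer: str) -> str:
    """Agent side: place a MeshMessage inside a hypothetical IPC envelope."""
    return json.dumps({"kind": "send", "peer": target_peer, "body": mesh_message})

def sidecar_unwrap(ipc_frame: str) -> tuple:
    """Sending sidecar: strip the envelope, return (peer, raw JSON for the DataChannel).

    The body is re-serialized as-is; its payload is never inspected.
    """
    envelope = json.loads(ipc_frame)
    return envelope["peer"], json.dumps(envelope["body"])

def sidecar_wrap(datachannel_text: str, from_peer: str) -> str:
    """Receiving sidecar: wrap the raw JSON in a new envelope for the local agent."""
    body = json.loads(datachannel_text)
    return json.dumps({"kind": "message", "peer": from_peer, "body": body})

# Agent A -> Sidecar A -> Sidecar B -> Agent B
original = {
    "type": "task_request",
    "source_agent": "agent-a",
    "target_agent": "agent-b",
    "payload": {"message": "hi", "skill": "echo"},
}
peer, wire = sidecar_unwrap(agent_wrap(original, "agent-b"))
delivered = json.loads(sidecar_wrap(wire, "agent-a"))["body"]
```

Only the outer envelope changes at each hop, which is why the agent and the sidecar can evolve their own protocols separately.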


2. MeshMessage Format

The MeshMessage is the single envelope for all agent-to-agent communication. Five fields:

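// sidecar/src/protocol.rs
use serde::{Deserialize, Serialize};

// The derives are what make the #[serde(...)] attributes below take effect.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MeshMessage {
    // Serialized as "type"; the only field without a default.
    #[serde(rename = "type")]
    pub msg_type: String,
    // Correlates a response with its request; empty for fire-and-forget.
    #[serde(default)]
    pub request_id: String,
    #[serde(default)]
    pub source_agent: String,
    // "*" addresses a broadcast.
    #[serde(default)]
    pub target_agent: String,
    // Arbitrary JSON; defaults to null when omitted.
    #[serde(default)]
    pub payload: serde_json::Value,
}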

Field        | Required | Purpose
-------------|----------|------------------------------------------------------
type         | Yes      | Determines how the receiver interprets the message
request_id   | No       | Correlates requests with responses (12-char UUID hex)
source_agent | No       | Sender identity for routing and attribution
target_agent | No       | Intended recipient; "*" for broadcasts
payload      | No       | Arbitrary JSON, contents depend on type

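Every field except type defaults to an empty value when omitted, so a ping is just {"type": "ping"}. The #[serde(default)] attribute fills in missing fields during deserialization, so senders can leave out whatever they do not need. Serde also ignores unknown fields unless told otherwise, which is what keeps the protocol forward-compatible when new fields are added.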
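To make the defaulting behavior concrete, here is a minimal Python mirror of the envelope. It is a sketch for illustration only, not the project's MeshMessage class: the `from_json` and `to_json` helpers are hypothetical names. It assumes the same rules as the Rust struct: `type` is required, the other four fields fall back to empty values, and unrecognized keys are ignored.

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class MeshMessage:
    """Illustrative mirror of the five-field envelope (not the project's class)."""
    msg_type: str
    request_id: str = ""
    source_agent: str = ""
    target_agent: str = ""
    payload: Any = None  # JSON null when omitted

    @classmethod
    def from_json(cls, text: str) -> "MeshMessage":
        data = json.loads(text)
        return cls(
            msg_type=data["type"],  # required: raises KeyError if absent
            request_id=data.get("request_id", ""),
            source_agent=data.get("source_agent", ""),
            target_agent=data.get("target_agent", ""),
            payload=data.get("payload"),
        )  # any other keys in `data` are silently ignored

    def to_json(self) -> str:
        return json.dumps({
            "type": self.msg_type,
            "request_id": self.request_id,
            "source_agent": self.source_agent,
            "target_agent": self.target_agent,
            "payload": self.payload,
        })

ping = MeshMessage.from_json('{"type": "ping"}')
newer = MeshMessage.from_json('{"type": "pong", "some_future_field": 1}')
```

Both calls succeed: the first relies on defaults for the missing fields, the second on unknown keys being dropped.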

Message Types

Connectivity: ping / pong — lightweight DataChannel heartbeat.

Task delegation (request/response):

  • task_request — “Execute this task.” Carries request_id, payload includes message and skill.
  • task_response — “Here is the result.” Carries matching request_id, payload includes result or error.
  • task_stream_chunk — Streaming partial results with the same request_id.

Skill discovery: skill_query / skill_response — “What can you do?”

Agent communication (fire-and-forget):

  • agent_status — Broadcast of skills, health, load.
  • agent_prompt — Direct message or broadcast, no response expected.
  • agent_response / agent_stream_chunk — Optional replies.
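Because `type` alone decides how a message is interpreted, a receiver can be organized as a lookup table from type to handler. The sketch below is one plausible shape under that assumption. The handler names and the echo behavior are hypothetical, and it covers only three of the types listed above.

```python
def handle_ping(msg: dict):
    # Heartbeat: reply with a bare pong.
    return {"type": "pong"}

def handle_task_request(msg: dict):
    # Request/response: the reply must carry the same request_id.
    text = msg.get("payload", {}).get("message", "")
    return {
        "type": "task_response",
        "request_id": msg.get("request_id", ""),
        "payload": {"result": f"echo: {text}"},  # placeholder for real skill execution
    }

def handle_agent_prompt(msg: dict):
    # Fire-and-forget: nothing is sent back.
    return None

HANDLERS = {
    "ping": handle_ping,
    "task_request": handle_task_request,
    "agent_prompt": handle_agent_prompt,
}

def dispatch(msg: dict):
    """Route a message by its type; unknown types are ignored rather than rejected."""
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        return None
    return handler(msg)
```

Ignoring unknown types instead of raising is a design choice in this sketch; it lets older agents coexist with peers that send newer message types.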

3. Task Lifecycle

When an agent delegates work, it is tracked as a task with a four-state lifecycle (registry/src/hub.rs):

              submit_task
                  |
                  v
             [pending]
                  |
         get_pending_for_agent()
                  |
                  v
             [assigned]
              /        \
   update_task()    update_task()
   completed        failed
        |               |
        v               v
  [completed]      [failed]

Transition                 | Trigger                    | What happens
---------------------------|----------------------------|-------------------------------------------------------
-> pending                 | submit_task() HTTP handler | New task created with UUID, timestamped
pending -> assigned        | get_pending_for_agent()    | Agent claims task by skill match; assigned_agent_id set
assigned -> completed      | update_task()              | Agent POSTs result
assigned -> failed         | update_task()              | Agent POSTs error
pending/assigned -> failed | expire_tasks_loop()        | TTL exceeded (default 300s, checked every 30s)

Terminal states (completed/failed) are permanent — no retry mechanism. The source agent decides whether to resubmit.
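The lifecycle can be modeled as a small state machine. This is a sketch in Python rather than the registry's Rust code, and the `Task` class, `ALLOWED` table, and method names here are hypothetical. It assumes that TTL expiry only affects tasks that have not yet reached a terminal state, in line with terminal states being permanent.

```python
import time

# Allowed transitions; terminal states have no outgoing edges.
ALLOWED = {
    "pending": {"assigned", "failed"},
    "assigned": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

class Task:
    """Illustrative task record with a TTL (default 300s, as in the text)."""

    def __init__(self, ttl: float = 300.0, now: float = None):
        self.state = "pending"
        self.ttl = ttl
        self.created_at = time.monotonic() if now is None else now

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def expire_if_stale(self, now: float) -> bool:
        """Mark a non-terminal task failed once its TTL has passed."""
        if self.state in ("pending", "assigned") and now - self.created_at > self.ttl:
            self.state = "failed"
            return True
        return False
```

A periodic loop, like the 30-second check mentioned in the table, would call `expire_if_stale` on every stored task.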


4. The Dual Execution Path

The core design pattern: every operation tries the fast P2P DataChannel first, then falls back to the slower HTTP task queue.

Path 1: P2P (fast, <100ms typical)

Taken when the mesh client is connected and the target peer is reachable:

msg = MeshMessage(
    msg_type="task_request",
    source_agent=agent_id,
    target_agent=target_agent_id,
    payload={"message": message, "skill": skill},
)
response = await _mesh_client.request(target_peer, msg, timeout=120.0)

The request() method generates a request_id, registers an asyncio.Future, sends the message through IPC -> DataChannel -> remote sidecar -> remote agent, and resolves the future when a task_response with the matching ID arrives.
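The correlation mechanism can be sketched with plain asyncio. This is an illustration of the pattern and not the actual MeshClient code: the `Correlator` class, its `on_message` hook, and the `send` callable are hypothetical stand-ins. The 12-character hex ID follows the field description earlier in this lesson.

```python
import asyncio
import uuid

class Correlator:
    """Matches task_response messages to in-flight requests by request_id."""

    def __init__(self):
        self._pending = {}  # request_id -> asyncio.Future

    async def request(self, send, msg: dict, timeout: float) -> dict:
        request_id = uuid.uuid4().hex[:12]
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await send({**msg, "request_id": request_id})
            # Raises a timeout error if no matching response arrives in time.
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always clean up so late responses cannot leak entries.
            self._pending.pop(request_id, None)

    def on_message(self, msg: dict) -> None:
        """Called for every inbound message; resolves the matching future, if any."""
        if msg.get("type") != "task_response":
            return
        future = self._pending.get(msg.get("request_id", ""))
        if future is not None and not future.done():
            future.set_result(msg)

async def demo() -> dict:
    correlator = Correlator()

    async def fake_send(msg: dict) -> None:
        # Pretend the remote agent answers on the next loop iteration.
        reply = {"type": "task_response", "request_id": msg["request_id"],
                 "payload": {"result": "ok"}}
        asyncio.get_running_loop().call_soon(correlator.on_message, reply)

    return await correlator.request(
        fake_send, {"type": "task_request", "payload": {"message": "hi"}}, timeout=1.0)
```

The `finally` block matters: without it, a request that times out would leave its future in the table indefinitely.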

Total hops: 6 (3 each direction). Network crossings: 2 DataChannel messages.

Path 2: HTTP Fallback (3-15s typical)

Used when P2P is unavailable:

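import asyncio
import time

# Submit the task to the registry queue (_post and _get are the module's HTTP helpers)
result = _post(f"{registry}/api/hub/tasks", {
    "skill": skill, "target_agent_id": target_agent_id,
    "source_agent_id": agent_id, "payload": {"message": message}, "ttl": 300,
})
task_id = result.get("task_id")

# Poll every 3s until completed, failed, or the 120s deadline
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
    await asyncio.sleep(3)
    status = _get(f"{registry}/api/hub/tasks/{task_id}")
    if status["state"] in ("completed", "failed"):
        return status
# Deadline passed without a terminal state (this return value is illustrative)
return {"state": "failed", "error": "timed out waiting for task result"}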

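At least 4 HTTP requests per task: the submit, the target agent's claim, its result update, and at least one status poll by the source. Latency is dominated by the 3-second polling interval.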

Comparison

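Aspect        | P2P DataChannel             | HTTP Task Queue
--------------|-----------------------------|----------------------------------------
Latency       | <100ms                      | 3-15s
Registry load | None                        | 4+ requests/task
Encryption    | DTLS (end-to-end)           | TLS to registry (registry sees content)
Reliability   | Requires active DataChannel | Works if registry is reachable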
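Put together, the P2P-first pattern amounts to a try-then-fall-back wrapper. The sketch below is a simplified stand-in for handle_delegate, not its real body: the `delegate` function and both callables are hypothetical. It assumes an unreachable peer surfaces as a `ConnectionError`, and it deliberately does not fall back on a timeout, since the remote agent may already be running the task.

```python
import asyncio

async def delegate(p2p_request, http_fallback, message: str, skill: str) -> dict:
    """Try the DataChannel first; use the HTTP task queue if P2P is unavailable."""
    if p2p_request is not None:  # None models "mesh client not connected"
        try:
            return {"path": "p2p", "result": await p2p_request(message, skill)}
        except ConnectionError:
            pass  # peer unreachable: fall through to the HTTP queue
    return {"path": "http", "result": await http_fallback(message, skill)}

# Fake transports for demonstration only.
async def fake_p2p_ok(message, skill):
    return f"p2p:{message}"

async def fake_p2p_down(message, skill):
    raise ConnectionError("no DataChannel to peer")

async def fake_http(message, skill):
    return f"http:{message}"
```

The caller sees the same return shape on either path, which is what lets the choice of transport stay hidden from the skill code.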

5. Graceful Degradation

The dual path is part of a three-tier transport hierarchy:

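Tier | Transport          | Latency   | When used
-----|--------------------|-----------|----------------------------------------------------
1    | Direct P2P (UDP)   | <100ms    | Same LAN, public IPs, or STUN-assisted NAT traversal
2    | TURN Relay (UDP)   | 100-500ms | Symmetric NAT or restrictive firewall
3    | HTTP Fallback      | 3-15s     | No WebRTC connectivity at all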

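Tier selection is implicit: ICE negotiation tries a direct connection first, then TURN; if WebRTC fails entirely, handle_delegate falls through to HTTP. From the agent's perspective, delegate behaves the same on every tier and only the latency differs, although a call can still fail or hit its timeout if no tier is reachable.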

This design serves four goals: latency (at under 100ms, P2P is at least 30x faster than the 3-15s HTTP path), privacy (with DTLS on the DataChannel, the registry never sees data-plane content sent over P2P), scalability (P2P offloads registry traffic), and resilience (established DataChannels survive registry downtime).


6. Fire-and-Forget vs Request/Response

Fire-and-Forget: mesh_send and mesh_broadcast

Uses agent_prompt type and MeshClient.send():

msg = MeshMessage(msg_type="agent_prompt", source_agent=agent_id,
                  target_agent=target_agent_id, payload={"message": message})
await _mesh_client.send(target_peer, msg)  # returns immediately

No request_id, no response expected. Appropriate for status announcements, notifications, and broadcasts.

Request/Response: delegate

Uses task_request/task_response types and MeshClient.request():

msg = MeshMessage(msg_type="task_request", source_agent=agent_id,
                  target_agent=target_agent_id, payload={"message": message, "skill": skill})
response = await _mesh_client.request(target_peer, msg, timeout=120.0)

Generates request_id, blocks until response or timeout. Appropriate for task delegation where the sender needs the result.

The wait parameter on handle_delegate converts it to fire-and-forget when wait=False — sends a task_request but does not await the response.
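The effect of the wait flag can be sketched as a branch between the two client methods. This is an assumption about the shape of the logic, not the actual handle_delegate body: `delegate_sketch` is a hypothetical name, and `RecordingClient` is a stand-in for MeshClient that only records which method was called.

```python
import asyncio

class RecordingClient:
    """Stand-in for MeshClient that records which method was used."""

    def __init__(self):
        self.calls = []

    async def send(self, peer, msg):
        self.calls.append("send")

    async def request(self, peer, msg, timeout):
        self.calls.append("request")
        return {"type": "task_response", "payload": {"result": "done"}}

async def delegate_sketch(client, peer, msg, wait=True, timeout=120.0):
    """wait=True: correlated request/response. wait=False: send and return at once."""
    if wait:
        return await client.request(peer, msg, timeout=timeout)
    await client.send(peer, msg)  # still a task_request, but no reply is awaited
    return None

task = {"type": "task_request", "payload": {"message": "hi", "skill": "echo"}}
waiting_client = RecordingClient()
waited = asyncio.run(delegate_sketch(waiting_client, "peer-b", task, wait=True))
detached_client = RecordingClient()
detached = asyncio.run(delegate_sketch(detached_client, "peer-b", task, wait=False))
```

With wait=False the caller gets nothing back, so any result the remote agent produces has to be delivered or observed some other way.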


Summary

MeshMessage is a five-field JSON envelope that carries all agent-to-agent communication. Its minimal design makes it easy to implement across languages and extend with new types.

Tasks follow a four-state lifecycle (pending -> assigned -> completed | failed) with TTL-based expiration preventing abandoned tasks from accumulating.

The dual execution path tries P2P first (fast, private, decentralized) and falls back to HTTP (slower, but available whenever the registry is reachable). request_id enables correlation over DataChannels; task_id serves the same purpose over HTTP polling.

Graceful degradation across three tiers means that losing a faster transport slows the system down rather than stopping it, as long as some tier remains reachable.


Previous: Lesson 06: Inter-Process Communication | Next: Lesson 08: Authentication and Security
