Lesson 13: Building Monitoring Dashboards

Prerequisites: Lesson 07 — Application Protocol Design, Lesson 12 — State Management Without a Database

1. Why Dashboards Matter

A distributed system without a dashboard is a distributed system you cannot operate. Dashboards answer the operator’s first questions: Is the system healthy? Who is connected? What work is queued? What does the mesh topology look like? These are operational awareness questions that should be answerable in under two seconds.

chatixia-mesh’s hub dashboard focuses on metrics and current system state, viewed by an operator on demand. It does not perform continuous automated monitoring (observation with retained history) or alerting (automated notification). Both would require time-series storage and notification infrastructure.

2. Polling vs. Push

The hub uses the simplest strategy: poll the registry’s REST API every 5 seconds.

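const refresh = useCallback(async () => {
  try {
    // Issue all four requests concurrently; Promise.all rejects if any one fails.
    const [a, t, topo, pending] = await Promise.all([
      fetchAgents(), fetchTasks(), fetchTopology(), fetchPendingApprovals(),
    ])
    // Fall back to empty arrays if a response is not the expected list.
    setAgents(Array.isArray(a) ? a : [])
    setTasks(Array.isArray(t) ? t : [])
    setTopology(topo)
    setPendingApprovals(Array.isArray(pending) ? pending : [])
  } catch {
    // A failed poll keeps the last known state; the next tick retries.
  }
}, [])

useEffect(() => {
  refresh()                                    // load once immediately
  const interval = setInterval(refresh, 5000)  // then poll every 5 seconds
  return () => clearInterval(interval)         // stop polling on unmount
}, [refresh])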

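Promise.all runs the four API calls concurrently and resolves only when all of them succeed; if any one rejects, the whole batch rejects and none of the state setters run. The Array.isArray guards fall back to an empty array when a response has an unexpected shape, such as an error object. Because useCallback has an empty dependency array, refresh keeps a stable identity, so the effect does not re-run and reset the interval on every render. The effect calls refresh once immediately, then every 5 seconds, and its cleanup function clears the interval when the component unmounts.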
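The same load-then-poll loop can be sketched without React. This is a minimal sketch, not the hub's code: `startPolling` is a hypothetical helper, and its in-flight guard (skipping a tick while the previous poll is still running) is an assumed refinement that the hub's snippet does not show.

```typescript
// Hypothetical framework-free poller mirroring the useEffect + setInterval loop.
function startPolling(
  refresh: () => Promise<void>,
  intervalMs = 5000,
): () => void {
  let inFlight = false

  const run = async (): Promise<void> => {
    // Assumed refinement: skip this tick if the previous poll has not settled.
    if (inFlight) return
    inFlight = true
    try {
      await refresh()
    } catch {
      // Swallow the error so the caller keeps its last known state.
    } finally {
      inFlight = false
    }
  }

  void run() // load immediately instead of waiting one full interval
  const timer = setInterval(() => void run(), intervalMs)
  return () => clearInterval(timer) // equivalent of the effect's cleanup
}
```

In a React component, `startPolling(refresh)` would be called inside `useEffect`, with its returned stop function used as the effect's cleanup.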

Polling advantages: No persistent connection to manage, no reconnection logic, simple server implementation, works through any HTTP proxy.

Polling disadvantages: Data up to 5 seconds stale, most polls return unchanged data, fixed frequency for all data types.

WebSocket push eliminates staleness but adds connection management, subscription tracking, and fan-out complexity. chatixia-mesh chose polling because the hub is an operator tool with one or two simultaneous users: 5-second staleness is acceptable, and the polling load is negligible (four requests every 5 seconds per open dashboard, about 1.6 requests per second with two users).

3. Canvas-Based Topology Visualization

The NetworkTopology.tsx component renders agents as circles and DataChannel connections as edges on an HTML5 canvas. Canvas was chosen over DOM elements or SVG for its imperative, pixel-level drawing control, which makes gradients, glowing dots, dashed curves, and precise positioning straightforward to compose.

Layout Strategy

Small meshes (1-4 agents) use a hierarchical layout with the registry at top center and agents spread horizontally below. Large meshes (5+ agents) use a circular arrangement with the registry at center:

const angle = (2 * Math.PI * i) / nodes.length - Math.PI / 2
return {
  x: hubX + Math.cos(angle) * circleRadius,
  y: hubY + Math.sin(angle) * circleRadius,
}

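The - Math.PI / 2 offset places the first node at 12 o’clock. An angle of 0 would point right (3 o’clock); because canvas y-coordinates grow downward, an angle of -π/2 gives an offset of (0, -circleRadius), directly above the hub. Subsequent nodes then proceed clockwise.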
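Both strategies can be combined into one position function. This is a sketch under assumptions: the 4-agent threshold and the angle formula come from the text, but the names (`layoutAgents`, `Point`) and the proportions used for spacing and radius are hypothetical, not taken from NetworkTopology.tsx.

```typescript
interface Point { x: number; y: number }

// Hypothetical layout helper: returns the hub position and one point per agent.
function layoutAgents(
  count: number,
  width: number,
  height: number,
): { hub: Point; agents: Point[] } {
  if (count <= 4) {
    // Hierarchical: hub at top center, agents spread evenly across a row below.
    const hub = { x: width / 2, y: height * 0.2 }
    const agents = Array.from({ length: count }, (_, i) => ({
      x: (width * (i + 1)) / (count + 1),
      y: height * 0.7,
    }))
    return { hub, agents }
  }
  // Circular: hub at center, agents evenly spaced on a ring around it.
  const hub = { x: width / 2, y: height / 2 }
  const circleRadius = Math.min(width, height) * 0.35
  const agents = Array.from({ length: count }, (_, i) => {
    // -π/2 rotates the start from 3 o'clock to 12 o'clock (canvas y grows downward).
    const angle = (2 * Math.PI * i) / count - Math.PI / 2
    return {
      x: hub.x + Math.cos(angle) * circleRadius,
      y: hub.y + Math.sin(angle) * circleRadius,
    }
  })
  return { hub, agents }
}
```

Switching layouts by count keeps small meshes readable as a tree while letting larger meshes share the ring evenly.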

Drawing and Mesh Edges

The canvas draws in layers, back to front, so each layer paints over the ones before it: background, hub-to-agent edges, hub node (gradient circle with glow), agent nodes (with health-colored dots), and mesh edges on top.

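Mesh edges between agents use quadratic Bézier curves that bow toward the hub. The curve's control point is offset perpendicular to the edge; since every edge has two perpendicular directions, a dot product with the direction toward the hub selects the one that bows the curve hubward. The curvature visually distinguishes peer-to-peer connections from hub-to-agent signaling lines.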
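A minimal sketch of that control-point computation, assuming the offset is applied at the edge midpoint and scales with edge length; `meshControlPoint` and the `bow` factor are hypothetical names, not the component's own.

```typescript
interface Point { x: number; y: number }

// Hypothetical: control point for a quadratic Bézier between two agents,
// bowed toward the hub.
function meshControlPoint(a: Point, b: Point, hub: Point, bow = 0.25): Point {
  const mid = { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 }
  const dx = b.x - a.x
  const dy = b.y - a.y
  const len = Math.hypot(dx, dy) || 1 // avoid dividing by zero for coincident nodes

  // One of the two unit normals to the edge.
  let nx = -dy / len
  let ny = dx / len

  // Dot product with the midpoint-to-hub vector: negative means the normal
  // points away from the hub, so flip it.
  if (nx * (hub.x - mid.x) + ny * (hub.y - mid.y) < 0) {
    nx = -nx
    ny = -ny
  }

  // Offset proportional to edge length so longer edges bow more.
  const offset = len * bow
  return { x: mid.x + nx * offset, y: mid.y + ny * offset }
}
```

The result would feed the standard canvas call: `ctx.moveTo(a.x, a.y)` followed by `ctx.quadraticCurveTo(cp.x, cp.y, b.x, b.y)`.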

High-DPI rendering scales the canvas bitmap by window.devicePixelRatio while keeping CSS size at logical pixels, preventing blurriness on Retina displays.
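The scaling can be sketched as a small setup function. Two assumptions are labeled here: the structural `CanvasLike` type stands in for `HTMLCanvasElement` so the sketch runs without DOM typings, and the `setTransform` step (letting drawing code keep using logical coordinates) is a common companion technique that the lesson does not confirm the hub uses.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface Ctx2D {
  setTransform(a: number, b: number, c: number, d: number, e: number, f: number): void
}
interface CanvasLike {
  width: number
  height: number
  style: { width: string; height: string }
  getContext(contextId: "2d"): Ctx2D | null
}

// Hypothetical helper: size the bitmap in device pixels, the element in CSS pixels.
function setupHiDpiCanvas(
  canvas: CanvasLike,
  cssWidth: number,
  cssHeight: number,
  dpr: number,
): Ctx2D | null {
  const ratio = dpr > 0 ? dpr : 1
  // Backing bitmap at device resolution...
  canvas.width = Math.round(cssWidth * ratio)
  canvas.height = Math.round(cssHeight * ratio)
  // ...while the element keeps its logical size on the page.
  canvas.style.width = `${cssWidth}px`
  canvas.style.height = `${cssHeight}px`
  const ctx = canvas.getContext("2d")
  // Assumed step: map logical coordinates onto the larger bitmap.
  // setTransform (rather than scale) does not compound on repeated calls.
  ctx?.setTransform(ratio, 0, 0, ratio, 0, 0)
  return ctx
}
```

In a browser, the real canvas element and `window.devicePixelRatio` would be passed in, typically again on resize.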

4. Design Tokens

Every visual value in the hub comes from theme.ts — no component defines its own colors or spacing. This centralized design system prevents drift and enables theme-wide changes from a single file.

The Atmospheric Luminescence system uses:

Tonal surface layering — subtle color shifts (#ffffff to #e5e9eb) instead of hard borders
Semantic color names — color.active (green), color.stale (amber), color.offline (red) describe purpose, not appearance
Three font families — Space Grotesk (headings), Manrope (body), JetBrains Mono (code)
Glassmorphism — semi-transparent backgrounds with backdrop-filter: blur() at three intensities
Ambient shadows — soft, large-radius, low-opacity shadows that create a floating effect, with a cyan-tinted primaryGlow for CTAs
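A token file in this style might look like the excerpt below. Only the two surface hex values and the three font families come from the lesson; every other key and value (including the status hexes and blur radii) is a placeholder, and `healthColor` is a hypothetical helper.

```typescript
// Hypothetical excerpt in the style of theme.ts; values marked placeholder are invented.
const theme = {
  color: {
    surface: "#ffffff",
    surfaceSunken: "#e5e9eb",
    active: "#22c55e", // placeholder green
    stale: "#f59e0b", // placeholder amber
    offline: "#ef4444", // placeholder red
  },
  font: {
    heading: "'Space Grotesk', sans-serif",
    body: "'Manrope', sans-serif",
    mono: "'JetBrains Mono', monospace",
  },
  glass: {
    background: "rgba(255, 255, 255, 0.6)", // placeholder translucency
    blur: { light: "blur(8px)", medium: "blur(16px)", heavy: "blur(24px)" }, // placeholder radii
  },
} as const

type Health = "active" | "stale" | "offline"

// Components ask for a color by meaning, never by hex value.
function healthColor(health: Health): string {
  return theme.color[health]
}

// An inline style object assembled entirely from tokens.
const glassPanel = {
  background: theme.glass.background,
  backdropFilter: theme.glass.blur.medium,
  fontFamily: theme.font.body,
}
```

Because components only reference names like `theme.color.stale`, changing the amber in one place changes it everywhere.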

5. Component Architecture

The hub uses top-down data flow. App.tsx owns all state and passes it to child components as props. Components do not fetch their own data — they receive it, render it, and optionally callback to the parent.
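Under this model, values shown by child components are derived in App from state it already owns. For example, the four StatCard metrics (agents online, total agents, pending tasks, awaiting approval) could be computed as below. The field names and the choice to count only `active` agents as online are assumptions, not the hub's actual code.

```typescript
// Hypothetical shapes; the real api.ts interfaces may differ.
interface Agent { agent_id: string; health: "active" | "stale" | "offline" }
interface Task { id: string; state: "pending" | "assigned" | "completed" | "failed" }
interface Stat { label: string; value: number }

// Derived in the parent, then passed to each StatCard as props.
function computeStats(agents: Agent[], tasks: Task[], pendingApprovals: unknown[]): Stat[] {
  return [
    { label: "Agents online", value: agents.filter((a) => a.health === "active").length },
    { label: "Total agents", value: agents.length },
    { label: "Pending tasks", value: tasks.filter((t) => t.state === "pending").length },
    { label: "Awaiting approval", value: pendingApprovals.length },
  ]
}
```

Keeping the derivation in the parent means StatCard stays a pure display component with no knowledge of agents or tasks.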

StatCard — Metric display (label + large number in accent color). Four form a grid: agents online, total agents, pending tasks, awaiting approval.

AgentCards — Responsive grid showing agent ID, health dot with glow, hostname, endpoint, peer ID, and skill count. Clickable to select an agent for chat. Empty state: “Waiting for agent heartbeats…”

TaskQueue — Table with expandable detail rows. State badges use color-coded text with 8% opacity tinted backgrounds (pending=amber, assigned=blue, completed=green, failed=red).

ApprovalQueue — Renders only when entries exist (returns null when empty). Shows agent name, ID, and age, with Approve/Reject buttons that are disabled while their request is in flight.

AgentChat — Intervention interface that submits a user_intervention task to the selected agent via the existing task queue. Not real-time chat — it is a task submission form.
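The TaskQueue badge treatment (text in the state color over an 8%-opacity tint of the same color) can be sketched as follows. The hex values are placeholders for theme tokens, and `tint` and `badgeStyle` are hypothetical helpers.

```typescript
type TaskState = "pending" | "assigned" | "completed" | "failed"

// Placeholder hex values; in the hub these would come from theme.ts.
const stateColor: Record<TaskState, string> = {
  pending: "#f59e0b", // amber
  assigned: "#3b82f6", // blue
  completed: "#22c55e", // green
  failed: "#ef4444", // red
}

// Convert "#rrggbb" to an rgba() string with the given alpha.
function tint(hex: string, alpha: number): string {
  const n = parseInt(hex.slice(1), 16)
  const r = (n >> 16) & 0xff
  const g = (n >> 8) & 0xff
  const b = n & 0xff
  return `rgba(${r}, ${g}, ${b}, ${alpha})`
}

// Full-strength text color on an 8%-opacity background of the same hue.
function badgeStyle(state: TaskState): { color: string; background: string } {
  const c = stateColor[state]
  return { color: c, background: tint(c, 0.08) }
}
```

Deriving the background from the text color keeps each badge's two colors in sync if a token changes.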

The API Client

api.ts defines TypeScript interfaces and thin fetch wrappers. The base URL is empty (same-origin), with Vite proxying in development. No error handling at this layer — errors propagate to the refresh function.
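A thin wrapper in this style might look like the sketch below. The endpoint path and the `AgentInfo` fields are hypothetical. Note a consequence of having no error handling here: `fetch` rejects only on network failures, so an HTTP error response with a JSON body resolves as ordinary data, while a non-JSON body makes `r.json()` reject. The Array.isArray guards in refresh cover the first case.

```typescript
// Hypothetical interface; field names are assumptions, not the registry's schema.
interface AgentInfo {
  agent_id: string
  hostname: string
  health: "active" | "stale" | "offline"
}

// Empty base URL: requests are same-origin; in development, Vite's proxy forwards them.
const BASE = ""

// No status check and no catch: failures propagate to the caller (refresh).
function getJson<T>(path: string): Promise<T> {
  return fetch(`${BASE}${path}`).then((r) => r.json() as Promise<T>)
}

// Hypothetical endpoint path.
const fetchAgents = (): Promise<AgentInfo[]> => getJson<AgentInfo[]>("/api/registry/agents")
```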

6. Practical Considerations

Stale data handling. The registry classifies agents by heartbeat age: active (< 90s), stale (< 270s), and offline (≥ 270s). The dashboard trusts this classification rather than recomputing it. If a poll fails, the dashboard silently retains its last known state. This is a simplicity trade-off: if the registry becomes unreachable, the displayed data can stay stale indefinitely without any warning.
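The classification itself runs in the registry, not the dashboard. As an illustration only, the thresholds restate in TypeScript as below; `classifyHealth` is a hypothetical name, and Unix-seconds inputs are assumed.

```typescript
type Health = "active" | "stale" | "offline"

// Thresholds from the lesson, in seconds.
const ACTIVE_MAX_S = 90
const STALE_MAX_S = 270

// Hypothetical restatement of the registry's rule; inputs are Unix seconds.
function classifyHealth(lastHeartbeat: number, now: number): Health {
  const age = now - lastHeartbeat
  if (age < ACTIVE_MAX_S) return "active"
  if (age < STALE_MAX_S) return "stale"
  return "offline"
}
```

Both boundaries are exclusive, so an agent exactly 90 s since its last heartbeat is already stale.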

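Empty states. Every component handles “nothing to show” explicitly; AgentCards, for example, shows “Waiting for agent heartbeats…”. ApprovalQueue takes a different approach: it returns null when empty, so it occupies zero layout space and appears only when an approval needs attention.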

Relative timestamps. formatAge converts a Unix timestamp into a compact age relative to now, such as 5s, 3m, or 2h. It is deliberately simple, with no days or weeks, because task TTLs are measured in minutes.
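A function with that behavior can be sketched as follows. This is a hypothetical re-implementation, not the hub's formatAge: it assumes timestamps in seconds and clamps negative ages (small clock skew) to zero.

```typescript
// Hypothetical re-implementation; assumes Unix timestamps in seconds.
function formatAge(epochSeconds: number, nowSeconds: number = Date.now() / 1000): string {
  // Clamp so a timestamp slightly in the future renders as "0s".
  const age = Math.max(0, Math.floor(nowSeconds - epochSeconds))
  if (age < 60) return `${age}s`
  if (age < 3600) return `${Math.floor(age / 60)}m`
  // No days or weeks: 30 hours renders as "30h".
  return `${Math.floor(age / 3600)}h`
}
```

Truncating with Math.floor means 119 seconds shows as "1m", which errs toward understating age.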

Live clock. A 1-second ticker in the header serves as a liveness indicator (frozen = something wrong) and time reference for relative timestamps. Uses 24-hour format for operational clarity.
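A framework-free sketch of such a ticker follows. `formatClock` and `startClock` are hypothetical names; in the React header, `startClock` would run inside `useEffect` with `render` calling a state setter and the returned stop function used as cleanup.

```typescript
// Zero-padded 24-hour HH:MM:SS in local time.
function formatClock(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0")
  return `${pad(d.getHours())}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`
}

// Hypothetical ticker: renders immediately, then once per second.
// Returns a function that stops the ticker.
function startClock(render: (text: string) => void): () => void {
  render(formatClock(new Date()))
  const timer = setInterval(() => render(formatClock(new Date())), 1000)
  return () => clearInterval(timer)
}
```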

Summary

The chatixia-mesh hub dashboard demonstrates patterns common in monitoring interfaces:

Polling with setInterval + Promise.all provides simple, reliable freshness without WebSocket complexity
Canvas rendering gives full control over topology visualization
Centralized design tokens enforce visual consistency
Top-down data flow keeps state management predictable
Explicit empty states prevent confusion when data is absent

Each trade-off — polling over push, canvas over SVG, inline styles over CSS — is defensible for a single-operator dashboard but might need revisiting for a multi-user monitoring tool.

Previous: Lesson 12: State Management Without a Database | Next: Lesson 14: Threat Modeling