Files
webui/ARCHITECTURE.md
Hermes 7019c25021 Hermes Web UI — Sprints 11-14: multi-provider models, settings, session QoL, alerts, polish
Sprint 11 (v0.13): multi-provider model support, streaming smoothness
- Dynamic model dropdown populated from configured API keys (OpenAI, Anthropic,
  Google, DeepSeek, GLM, Kimi, MiniMax, OpenRouter, Nous Portal)
- Scroll pinning during streaming (no forced scroll when user has scrolled up)
- All route handlers extracted to api/routes.py (server.py now ~76 lines)

Sprint 12 (v0.14): settings panel, SSE reconnect, session QoL
- Settings panel (gear icon) -- persist default model and workspace server-side
- SSE auto-reconnect on network blips
- Pin/star sessions to top of sidebar
- Import session from JSON export

Sprint 13 (v0.15): cron alerts, background errors, session duplicate, tab title
- Cron completion alerts: toast per completion + unread badge on Tasks tab
- Background agent error banner when a non-active session errors mid-stream
- Session duplicate button
- Browser tab title reflects active session name

Sprint 14 (v0.16): Mermaid diagrams, file ops, session archive/tags, timestamps
- Mermaid diagram rendering inline (dark theme, lazy CDN load)
- File rename (double-click in file tree) and create folder
- Session archive (hide without deleting, toggle to show)
- Session tags -- #hashtag in title becomes colored chip + click-to-filter
- Message timestamps (HH:MM on hover, full date as tooltip)

Test suite: 224 tests across 14 sprint files + regression gate, 0 failures.
2026-03-31 07:02:47 +00:00

73 KiB

Hermes Web UI: Developer and Architecture Guide

This document is the canonical reference for anyone (human or agent) working on the Hermes Web UI. It covers the exact current state of the code, every design decision and quirk discovered during development, and a phased architecture improvement roadmap that runs in parallel with the feature roadmap in ROADMAP.md.

Keep this document updated as architecture changes are made.


1. Overview and Purpose

The Hermes Web UI is a lightweight web application that gives you a browser-based interface to the Hermes agent that is functionally equivalent to the CLI. It is modeled on the Claude-style interface: a three-panel layout with a sidebar for session management, a central chat area, and a right panel for workspace file browsing.

The design philosophy is deliberately minimal. There is no build step, no bundler, no frontend framework. The Python server is split into a routing shell (server.py) and business logic modules (api/). The frontend is six vanilla JS modules loaded from static/. This makes the code easy to modify from a terminal or by an agent.


2. File Inventory

<repo>/
server.py              Thin routing shell + HTTP Handler. ~76 lines. Pure Python.
                       Delegates all route handling to api/routes.py.
start.sh               Discovery script: finds agent dir, Python, starts server.
api/
  __init__.py          Package marker
  routes.py            All GET + POST route handlers (~802 lines)
  config.py            Shared configuration, constants, global state, model discovery (~453 lines)
  helpers.py           HTTP helpers: j(), bad(), require(), safe_resolve() (~57 lines)
  models.py            Session model + CRUD (~114 lines)
  workspace.py         File ops: list_dir, read_file_content, workspace helpers (~77 lines)
  upload.py            Multipart parser, file upload handler (~77 lines)
  streaming.py         SSE engine, run_agent integration, cancel support (~218 lines)
static/
  index.html           HTML template (served from disk)
  style.css            All CSS
  ui.js                DOM helpers, renderMd, tool cards, model dropdown (~671 lines)
  workspace.js         File tree, preview, file ops (~168 lines)
  sessions.js          Session CRUD, list rendering, search (~206 lines)
  messages.js          send(), SSE event handlers, approval, transcript (~310 lines)
  panels.js            Cron, skills, memory, workspace, todo, switchPanel (~600 lines)
  boot.js              Event wiring + boot IIFE (~154 lines)
tests/
  conftest.py          Isolated test server (port 8788, separate HERMES_HOME) (~240 lines)
  test_sprint1-11.py   Feature tests per sprint (13 files)
  test_regressions.py  Permanent regression gate
AGENTS.md              Instruction file for agents working in this directory.
ROADMAP.md             Feature and product roadmap document.
SPRINTS.md             Forward sprint plan with CLI + Claude parity targets.
ARCHITECTURE.md        THIS FILE.
TESTING.md             Manual browser test plan and automated coverage reference.
CHANGELOG.md           Release notes per sprint.
PORTABILITY.md         Portability design spec for download-and-run installs.
requirements.txt       Python dependencies.
.env.example           Sample environment variable overrides.

State directory (runtime data, separate from source):

~/.hermes/webui-mvp/
sessions/          One JSON file per session: {session_id}.json
workspaces.json    Registered workspaces list
last_workspace.txt Last-used workspace path
settings.json      (future) User settings

Log file:

/tmp/webui-mvp.log   stdout/stderr from the background server process

3. Runtime Environment

  • Python interpreter: /venv/bin/python
  • The venv has all Hermes agent dependencies (run_agent, tools/, cron/)
  • Server binds to 127.0.0.1:8787 (localhost only, not public internet)
  • Access from Mac: SSH tunnel: ssh -N -L 8787:127.0.0.1:8787 @
  • The server imports Hermes modules via sys.path.insert(0, parent_dir)

Environment variables controlling behavior:

HERMES_WEBUI_HOST              Bind address (default: 127.0.0.1)
HERMES_WEBUI_PORT              Port (default: 8787)
HERMES_WEBUI_DEFAULT_WORKSPACE Default workspace path for new sessions
HERMES_WEBUI_STATE_DIR         Where sessions/ folder lives
HERMES_CONFIG_PATH             Path to ~/.hermes/config.yaml
HERMES_WEBUI_DEFAULT_MODEL     Default LLM model string

Test isolation environment variables (set by conftest.py):

HERMES_WEBUI_PORT=8788                           Isolated test port
HERMES_WEBUI_STATE_DIR=~/.hermes/webui-mvp-test  Isolated test state
HERMES_WEBUI_DEFAULT_WORKSPACE=.../test-workspace Isolated test workspace

Tests NEVER talk to the production server (port 8787). The test state dir is wiped before each test session and deleted after. See: /tests/conftest.py

Per-request environment variables (set by chat handler, restored after):

TERMINAL_CWD         Set to session.workspace before running agent.
                     The terminal tool reads this to default cwd.
HERMES_EXEC_ASK      Set to "1" to enable approval gate for dangerous commands.
HERMES_SESSION_KEY   Set to session_id. The approval tool keys pending entries
                     by this value, enabling per-session approval state.

WARNING: These env vars are process-global. Two concurrent chat requests will clobber each other. This is safe only for single-user, single-concurrent-request use. See Architecture Phase B for the fix.


4. Server Architecture: Current State

4.1 HTTP Server Layer

Python stdlib ThreadingHTTPServer (from http.server). Each HTTP request runs in its own thread. The Handler class subclasses BaseHTTPRequestHandler with two methods:

do_GET    Routes: /, /health, /api/session, /api/sessions, /api/list,
                  /api/chat/stream, /api/file, /api/approval/pending
do_POST   Routes: /api/upload, /api/session/new, /api/session/update,
                  /api/session/delete, /api/chat/start, /api/chat,
                  /api/approval/respond

Routing is a flat if/elif chain inside each method. No routing framework.

Helper functions used by all handlers:

j(handler, payload, status=200)     Sends JSON response with correct headers
t(handler, payload, status=200, ct) Sends plain text or HTML response
read_body(handler)                  Reads and JSON-parses the POST body

CRITICAL ORDERING RULE in do_POST: The /api/upload check MUST appear BEFORE calling read_body(). read_body() calls handler.rfile.read() which consumes the HTTP body stream. The upload handler also needs rfile (to read the multipart payload). If read_body() runs first on a multipart request, the upload handler receives an empty body and the upload silently fails.

4.2 Session Model

Session is a plain Python class (not a dataclass, not SQLAlchemy):

Fields:
  session_id    hex string, 12 chars (uuid4().hex[:12])
  title         string, auto-set from first user message
  workspace     absolute path string, resolved at creation
  model         OpenRouter model ID string (e.g. "anthropic/claude-sonnet-4.6")
  messages      list of OpenAI-format message dicts
  created_at    float Unix timestamp
  updated_at    float Unix timestamp, updated on every save()

Key methods:
  path (property)  Returns SESSION_DIR/{session_id}.json
  save()           Writes __dict__ as pretty JSON to path, updates updated_at
  load(cls, sid)   Class method: reads JSON from disk, returns Session or None
  compact()        Returns metadata-only dict (no messages) for the session list

In-memory cache:
  SESSIONS = {}    dict: session_id -> Session object
  LOCK = threading.Lock()   defined but NOT currently used around SESSIONS access

get_session(sid): checks SESSIONS cache, loads from disk on miss, raises KeyError
new_session(workspace, model): creates Session, caches in SESSIONS, saves, returns
all_sessions(): scans SESSION_DIR/*.json + SESSIONS, deduplicates, sorts by updated_at,
                returns list of compact() dicts

all_sessions() does a full directory scan on every call.
With 10 sessions: negligible. With 1000+: will be slow.
See Architecture Phase C for the index file fix.

title_from(): takes messages list, finds first user message, returns first 64 chars. Called after run_conversation() completes to set the session title retroactively.

4.3 SSE Streaming Engine

This is the most architecturally interesting part. Two endpoints cooperate:

POST /api/chat/start     Receives the user message. Creates a queue.Queue, stores it
                         in STREAMS[stream_id], spawns a daemon thread running
                         _run_agent_streaming(), returns {stream_id} immediately.

GET  /api/chat/stream    Long-lived SSE connection. Reads from STREAMS[stream_id]
                         and forwards events to the browser until 'done' or 'error'.

Queue registry:

STREAMS = {}               dict: stream_id -> queue.Queue
STREAMS_LOCK = threading.Lock()

SSE event types and their data shapes:

token       {"text": "..."}                         LLM token delta
tool        {"name": "...", "preview": "..."}       Tool invocation started
approval    {"command": "...", "description": "...", "pattern_keys": [...]}
done        {"session": {compact_fields + messages}} Agent finished successfully
error       {"message": "...", "trace": "..."}       Agent threw exception

The SSE handler loop: - Blocks on queue.get(timeout=30) - On timeout (no events in 30s): sends a heartbeat comment (": heartbeat

") to keep the connection alive through proxies and firewalls - On 'done' or 'error' event: breaks the loop and returns - Catches BrokenPipeError and ConnectionResetError silently (browser disconnected)

Stream cleanup: _run_agent_streaming() pops its stream_id from STREAMS in a finally block. If the browser disconnects mid-stream, the daemon thread runs to completion and then cleans up. The queue fills and the put_nowait() calls fail silently (queue.Full is caught).

Fallback sync endpoint: POST /api/chat still exists and holds the connection open until the agent finishes. The frontend never uses it but it can be useful for debugging.

4.4 Agent Invocation (_run_agent_streaming)

def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id):
  1. Fetches session from SESSIONS (not from disk -- session was just updated by /api/chat/start)
  2. Sets TERMINAL_CWD, HERMES_EXEC_ASK, HERMES_SESSION_KEY env vars
  3. Creates AIAgent with:
    • model=model, platform='cli', quiet_mode=True
    • enabled_toolsets=CLI_TOOLSETS (from config.yaml or hardcoded default)
    • session_id=session_id
    • stream_delta_callback=on_token (fires per token)
    • tool_progress_callback=on_tool (fires per tool invocation)
  4. Calls agent.run_conversation(user_message=msg_text, conversation_history=s.messages, task_id=session_id) NOTE: keyword is task_id NOT session_id (common mistake, documented in skill)
  5. On return: updates s.messages, calls title_from(), saves session
  6. Puts ('done', {session: ...}) into queue
  7. Finally block: restores env vars, pops stream_id from STREAMS

on_token callback: if text is None: return # end-of-stream sentinel from AIAgent put('token', {'text': text})

on_tool callback: put('tool', {'name': name, 'preview': preview}) # Also immediately surface any pending approval: if has_pending(session_id): with _lock: p = dict(_pending.get(session_id, {})) if p: put('approval', p)

The approval surface-on-tool logic means approvals appear immediately after the tool fires (within the same SSE stream), without waiting for the next poll cycle.

4.5 Approval System Integration

The approval system uses the existing Hermes gateway module at tools/approval.py. All state lives in module-level variables in that file:

_pending = {}        dict: session_key -> pending_entry_dict
_lock = Lock()       protects _pending
_permanent_approved  set of permanently approved pattern keys

Because server.py imports tools.approval at module load time and everything runs in the same process, this state IS shared between HTTP threads and agent daemon threads.

Important: this only works because Python imports are cached (sys.modules). The same module object is used everywhere. If the approval module were ever imported in a subprocess or via importlib.reload(), this would break.

GET /api/approval/pending: - Peeks at _pending[sid] without removing it - Returns {pending: entry} or {pending: null} - Called by the browser every 1500ms while S.busy is true (polling fallback)

POST /api/approval/respond: - Pops _pending[sid] (removes it) - For choice "once" or "session": calls approve_session(sid, pattern_key) for each key - For choice "always": calls approve_session + approve_permanent + save_permanent_allowlist - For choice "deny": just pops, does nothing (agent gets denied result) - Returns {ok: true, choice: choice}

4.6 File Upload Parser

parse_multipart(rfile, content_type, content_length): - Reads all content_length bytes from rfile into memory (up to MAX_UPLOAD_BYTES = 20MB) - Extracts boundary from Content-Type header - Splits raw bytes on b'--' + boundary - For each part: parses MIME headers via email.parser.HeaderParser - Returns (fields, files) where fields is {name: value} and files is {name: (filename, bytes)}

handle_upload(handler): - Calls parse_multipart() - Validates: file field present, filename present, session exists - Sanitizes filename: replaces non-word chars with _, truncates to 200 chars - Writes bytes to session.workspace / safe_name - Returns {filename, path, size}

Why not cgi.FieldStorage: - Deprecated in Python 3.11+ - Broken for binary files (silently corrupts or throws) - The manual parser handles all file types correctly

4.7 File System Operations

safe_resolve(root, requested): - Resolves requested path relative to root - Calls .relative_to(root) to assert the result is inside root - Raises ValueError on path traversal (../../etc/passwd)

list_dir(workspace, rel='.'): - Calls safe_resolve, then iterdir() - Sorts: directories first, then files, case-insensitive alpha within each group - Returns up to 200 entries with {name, path, type, size}

read_file_content(workspace, rel): - Calls safe_resolve - Enforces MAX_FILE_BYTES = 200KB size limit - Reads as UTF-8 with errors='replace' (binary files show replacement chars) - Returns {path, content, size, lines}


5. Frontend Architecture: Current State

5.1 Structure

The frontend is served from static/ as separate files: one HTML template, one CSS file, and six JavaScript modules (~2,025 lines total). External dependency: Prism.js from CDN (syntax highlighting, loaded async/deferred).

Six JS modules loaded in order at end of :

  1. ui.js (~589 lines) DOM helpers, renderMd, tool card rendering, global state
  2. workspace.js (~168 lines) File tree, preview, file operations
  3. sessions.js (~206 lines) Session CRUD, list rendering, search
  4. messages.js (~310 lines) send(), SSE event handlers, approval, transcript
  5. panels.js (~600 lines) Cron, skills, memory, workspace, todo, switchPanel
  6. boot.js (~152 lines) Event wiring + boot IIFE

Three-panel layout (in static/index.html):

<aside class="sidebar">    Left panel: session list, nav tabs, model selector
<main class="main">        Center: topbar, messages area, approval card, composer
<aside class="rightpanel"> Right panel: workspace file tree and file preview

5.2 Global State

const S = {
  session:      null,   // current Session compact dict (includes model, workspace, title)
  messages:     [],     // full messages array for current session
  entries:      [],     // current directory listing
  busy:         false,  // true while agent is running (disables Send button)
  pendingFiles: []      // File objects queued for upload with next message
}

const INFLIGHT = {}
// keyed by session_id while a request is in-flight for that session
// value: {messages: [...snapshot...], uploaded: [...filenames...]}
// Purpose: if user switches sessions while a request is pending,
//   switching back shows the in-progress state instead of the saved state

5.3 Key Functions Reference

Session management: newSession() POST /api/session/new, update S.session, save to localStorage loadSession(sid) GET /api/session?session_id=X, check INFLIGHT first, update S deleteSession(sid) POST /api/session/delete, handle active/inactive cases correctly renderSessionList() GET /api/sessions, rebuild #sessionList DOM

Chat: send() Main action: upload files, POST /api/chat/start, open EventSource uploadPendingFiles() Upload each file in S.pendingFiles, return filenames array appendThinking() Adds three-dot animation to message list removeThinking() Removes thinking dots (called on first token or on error)

Rendering: renderMessages() Full rebuild of #msgInner from S.messages renderMd(raw) Homegrown markdown renderer (see 5.4 for known gaps) syncTopbar() Updates topbar title, meta, model chip, workspace chip renderTray() Updates attach tray showing pending files

Approval: showApprovalCard(p) Shows the approval card with command/description text hideApprovalCard() Hides approval card, clears text respondApproval(ch) POST /api/approval/respond, hide card startApprovalPolling setInterval 1500ms GET /api/approval/pending stopApprovalPolling clearInterval

UI helpers: setStatus(t) Updates #statusText in composer footer setBusy(v) Sets S.busy, disables/enables Send button, clears status on false showToast(msg, ms) Bottom-center fade toast (default 2800ms) autoResize() Auto-resize #msg textarea up to 200px

Files: loadDir(path) GET /api/list, rebuild #fileTree openFile(path) GET /api/file, show in #previewArea

Transcript: transcript() Builds markdown string from S.messages for download

Boot IIFE: localStorage key 'hermes-webui-session' stores last session_id On load: try to loadSession(saved), fall back to empty state if missing or fails NEVER auto-creates a session on boot

5.4 Markdown Renderer (renderMd)

A hand-rolled regex chain. Processes in this order:

  1. Code blocks (lang ...) ->
     with language header
  2. Inline code (...) ->
  3. Bold+italic (..) ->
  4. Bold (...) ->
  5. Italic (...) ->
  6. Headings (# ## ###) ->

  7. Horizontal rules (---+) ->
  8. Blockquotes (> ...) ->
  9. Unordered lists (- or * or + at line start) ->