Hermes Web UI — Sprints 11-14: multi-provider models, settings, session QoL, alerts, polish

Sprint 11 (v0.13): multi-provider model support, streaming smoothness
- Dynamic model dropdown populated from configured API keys (OpenAI, Anthropic,
  Google, DeepSeek, GLM, Kimi, MiniMax, OpenRouter, Nous Portal)
- Scroll pinning during streaming (no forced scroll when user has scrolled up)
- All route handlers extracted to api/routes.py (server.py now ~76 lines)

Sprint 12 (v0.14): settings panel, SSE reconnect, session QoL
- Settings panel (gear icon) -- persist default model and workspace server-side
- SSE auto-reconnect on network blips
- Pin/star sessions to top of sidebar
- Import session from JSON export

Sprint 13 (v0.15): cron alerts, background errors, session duplicate, tab title
- Cron completion alerts: toast per completion + unread badge on Tasks tab
- Background agent error banner when a non-active session errors mid-stream
- Session duplicate button
- Browser tab title reflects active session name

Sprint 14 (v0.16): Mermaid diagrams, file ops, session archive/tags, timestamps
- Mermaid diagram rendering inline (dark theme, lazy CDN load)
- File rename (double-click in file tree) and create folder
- Session archive (hide without deleting, toggle to show)
- Session tags -- #hashtag in title becomes colored chip + click-to-filter
- Message timestamps (HH:MM on hover, full date as tooltip)

Test suite: 224 tests across 14 sprint files + regression gate, 0 failures.
Hermes
2026-03-31 07:02:47 +00:00
parent 732d227b97
commit 7019c25021
29 changed files with 2871 additions and 1122 deletions


@@ -11,39 +11,63 @@
## 1. Overview and Purpose
The Hermes Web UI is a lightweight, single-file web application that gives you
a browser-based interface to the Hermes agent that is functionally equivalent to the CLI.
It is modeled on the Claude interface: a three-panel layout with a sidebar for
session management, a central chat area, and a right panel for workspace file browsing.
The Hermes Web UI is a lightweight web application that gives you a browser-based
interface to the Hermes agent that is functionally equivalent to the CLI. It is modeled on
the Claude-style interface: a three-panel layout with a sidebar for session management,
a central chat area, and a right panel for workspace file browsing.
The design philosophy is deliberately minimal. There is no build step, no bundler, no
frontend framework. Everything ships from a single Python file. This makes the code easy
to modify from a terminal or by an agent, but it creates architectural debt that grows as
the feature set expands.
frontend framework. The Python server is split into a routing shell (server.py) and
business logic modules (api/). The frontend is six vanilla JS modules loaded from static/.
This makes the code easy to modify from a terminal or by an agent.
---
## 2. File Inventory
<agent-dir>/webui-mvp/
server.py Main server file. ~1150 lines. Pure Python.
HTTP server, all API handlers, Session model, SSE engine,
approval wiring, file upload parser. No inline HTML/CSS/JS.
(Phase A+E complete: HTML/CSS/JS all extracted to static/)
server.py.bak Backup from a prior iteration. Kept for reference.
server_new.py Intermediate ~900-line draft. Superseded by server.py.
Safe to delete once Wave 1 begins.
start.sh Convenience script: kills running instance, starts server.py
via nohup, writes stdout/stderr to /tmp/webui-mvp.log
AGENTS.md Instruction file for agents working in this directory.
ROADMAP.md Feature and product roadmap document.
ARCHITECTURE.md THIS FILE.
<repo>/
server.py Thin routing shell + HTTP Handler. ~76 lines. Pure Python.
Delegates all route handling to api/routes.py.
start.sh Discovery script: finds agent dir, Python, starts server.
api/
__init__.py Package marker
routes.py All GET + POST route handlers (~802 lines)
config.py Shared configuration, constants, global state, model discovery (~453 lines)
helpers.py HTTP helpers: j(), bad(), require(), safe_resolve() (~57 lines)
models.py Session model + CRUD (~114 lines)
workspace.py File ops: list_dir, read_file_content, workspace helpers (~77 lines)
upload.py Multipart parser, file upload handler (~77 lines)
streaming.py SSE engine, run_agent integration, cancel support (~218 lines)
static/
index.html HTML template (served from disk)
style.css All CSS
ui.js DOM helpers, renderMd, tool cards, model dropdown (~671 lines)
workspace.js File tree, preview, file ops (~168 lines)
sessions.js Session CRUD, list rendering, search (~206 lines)
messages.js send(), SSE event handlers, approval, transcript (~310 lines)
panels.js Cron, skills, memory, workspace, todo, switchPanel (~600 lines)
boot.js Event wiring + boot IIFE (~154 lines)
tests/
conftest.py Isolated test server (port 8788, separate HERMES_HOME) (~240 lines)
test_sprint1-11.py Feature tests per sprint (13 files)
test_regressions.py Permanent regression gate
AGENTS.md Instruction file for agents working in this directory.
ROADMAP.md Feature and product roadmap document.
SPRINTS.md Forward sprint plan with CLI + Claude parity targets.
ARCHITECTURE.md THIS FILE.
TESTING.md Manual browser test plan and automated coverage reference.
CHANGELOG.md Release notes per sprint.
PORTABILITY.md Portability design spec for download-and-run installs.
requirements.txt Python dependencies.
.env.example Sample environment variable overrides.
State directory (runtime data, separate from source):
~/.hermes/webui-mvp/
sessions/ One JSON file per session: {session_id}.json
test-workspace/ Default empty workspace used during development
workspaces.json Registered workspaces list
last_workspace.txt Last-used workspace path
settings.json (future) User settings
Log file:
@@ -301,13 +325,21 @@ read_file_content(workspace, rel):
### 5.1 Structure
The entire frontend is ~750 lines inside the HTML Python raw string.
Structure: <head> with CSS only (no external stylesheets), <body> with three-panel layout,
<script> with all JavaScript (no external libraries).
The frontend is served from static/ as separate files: one HTML template, one CSS file,
and six JavaScript modules (~2,025 lines total). External dependency: Prism.js from CDN
(syntax highlighting, loaded async/deferred).
Three-panel layout:
Six JS modules loaded in order at end of <body>:
1. ui.js (~589 lines) DOM helpers, renderMd, tool card rendering, global state
2. workspace.js (~168 lines) File tree, preview, file operations
3. sessions.js (~206 lines) Session CRUD, list rendering, search
4. messages.js (~310 lines) send(), SSE event handlers, approval, transcript
5. panels.js (~600 lines) Cron, skills, memory, workspace, todo, switchPanel
6. boot.js (~152 lines) Event wiring + boot IIFE
<aside class="sidebar"> Left panel: session list, model selector, workspace path
Three-panel layout (in static/index.html):
<aside class="sidebar"> Left panel: session list, nav tabs, model selector
<main class="main"> Center: topbar, messages area, approval card, composer
<aside class="rightpanel"> Right panel: workspace file tree and file preview
@@ -477,13 +509,17 @@ Step-by-step trace of what happens when you type a message and press Send:
## 7. Dependency Map
Direct imports in server.py:
server.py imports from api/ modules (config, helpers, models, workspace, upload, streaming).
The api/ modules in turn import Hermes internals:
run_agent.AIAgent Main agent class. Wraps LLM + tool execution.
tools.approval.* Module-level approval state.
yaml Config loading.
Standard library: json, os, re, sys, threading, time, traceback, uuid,
http.server, pathlib, urllib.parse, email.parser, queue
api/streaming.py imports:
run_agent.AIAgent Main agent class. Wraps LLM + tool execution.
api/config.py imports:
yaml Config loading.
server.py imports:
tools.approval.* Module-level approval state (with graceful fallback).
Standard library across all modules: json, os, re, sys, threading, time, traceback,
uuid, http.server, pathlib, urllib.parse, email.parser, queue, collections
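
The "graceful fallback" noted for tools.approval can be read as an import guard. A minimal sketch, assuming a hypothetical symbol name (`has_pending_approval`) and a hypothetical flag (`APPROVAL_AVAILABLE`); the names actually exported by tools.approval are not shown in this document:

```python
# Import guard with a no-op fallback.
# has_pending_approval is a hypothetical name, not confirmed from tools.approval.
try:
    from tools.approval import has_pending_approval  # hypothetical symbol
    APPROVAL_AVAILABLE = True
except ImportError:
    APPROVAL_AVAILABLE = False

    def has_pending_approval(session_key: str) -> bool:
        """Fallback used when the approval module cannot be imported."""
        return False
```

With a guard like this the server can still start outside a full Hermes checkout; approval-related routes would simply report that nothing is pending.
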
AIAgent constructor parameters used:
@@ -561,52 +597,33 @@ restriction from the UI yet (see ROADMAP.md Wave 4 for the plan).
These phases run in parallel with the feature roadmap. Each phase targets software
quality: testability, resilience, maintainability, and modularity.
### Phase A: File Separation (Priority: High, Effort: Medium)
### Phase A: File Separation -- COMPLETE
Split server.py into a proper package.
Split server.py into a proper package. Completed across Sprints 4-10.
Target structure:
Current structure:
webui-mvp/
server.py Entry point: starts server, imports api/
<repo>/
server.py Entry point + HTTP Handler routing (~704 lines)
api/
__init__.py
handlers.py do_GET / do_POST routing and dispatch
session_store.py Session class, get_session, new_session, all_sessions, SESSIONS
streaming.py _run_agent_streaming, STREAMS, STREAMS_LOCK, _sse()
upload.py parse_multipart, handle_upload
files.py safe_resolve, list_dir, read_file_content
approval.py Thin wrapper around tools.approval for the HTTP API
config.py Configuration loading (env vars, config.yaml)
config.py Configuration, constants, global state (~273 lines)
helpers.py HTTP helpers: j(), bad(), require(), safe_resolve() (~57 lines)
models.py Session model + CRUD (~114 lines)
workspace.py File ops, workspace management (~77 lines)
upload.py Multipart parser, file upload handler (~77 lines)
streaming.py SSE engine, run_agent, cancel support (~218 lines)
static/
index.html HTML document (served directly from disk)
index.html HTML document (served from disk)
style.css All CSS
[app.js deleted] Replaced by 6 modules: ui.js, workspace.js, sessions.js,
messages.js, panels.js, boot.js
ui.js, workspace.js, sessions.js, messages.js, panels.js, boot.js
tests/
test_session_crud.py
test_upload.py
test_streaming.py
test_approval.py
test_files.py
frontend/
test_markdown.html
test_session_state.html
conftest.py Isolated test server on port 8788
test_sprint1-10.py Feature tests per sprint (12 files)
test_regressions.py Permanent regression gate
Implementation steps:
1. Extract CSS and HTML to static/style.css and static/index.html. No content changes.
Server serves index.html from disk: handler reads Path('static/index.html').read_text()
2. Extract JS to 6 static modules (complete -- app.js deleted Sprint 9)
Add GET /static/* handler in do_GET.
3. Extract Session class and helpers to api/session_store.py
4. Extract _run_agent_streaming and SSE helpers to api/streaming.py
5. Extract parse_multipart and handle_upload to api/upload.py
6. Extract list_dir and friends to api/files.py
7. Refactor handlers.py to import from the above modules
8. server.py becomes: config setup, start server, import Handler from handlers.py
Benefit: Each file is under ~200 lines. Agents can read and modify individual files
without loading the full 1100-line blob.
Remaining: server.py still has all 49 route handlers in one do_GET/do_POST class.
Sprint 11 plans extracting these to api/routes.py, making server.py a ~50-line shell.
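
A routing shell of the kind described here could be as small as a path-to-handler table. The sketch below is an assumption about shape only: `GET_ROUTES`, `dispatch_get`, and both handler functions are hypothetical names, and the handlers return stub payloads rather than real data.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse

# Hypothetical handler signature: takes the raw query string,
# returns (HTTP status, JSON-serializable payload).
RouteHandler = Callable[[str], Tuple[int, dict]]


def health(query: str) -> Tuple[int, dict]:
    return 200, {"status": "ok"}


def session_export(query: str) -> Tuple[int, dict]:
    # Stub payload; a real handler would load the session named in the query.
    return 200, {"messages": []}


# In the layout described above, a table like this would live in api/routes.py.
GET_ROUTES: Dict[str, RouteHandler] = {
    "/health": health,
    "/api/session/export": session_export,
}


def dispatch_get(raw_path: str) -> Tuple[int, dict]:
    """Look up the handler for a request path; unknown paths yield 404."""
    parsed = urlparse(raw_path)
    handler = GET_ROUTES.get(parsed.path)
    if handler is None:
        return 404, {"error": f"Not found: {parsed.path}"}
    return handler(parsed.query)
```

With this shape, do_GET in server.py only needs to call `dispatch_get(self.path)` and serialize the result, which is what keeps the shell short.
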
### Phase B: Thread-Safe Request Context (Priority: Critical, Effort: Medium)
@@ -637,72 +654,36 @@ Option 3 (interim, safe for single-user): Wrap the env var block in a per-sessio
Phase B also includes: review all other os.environ reads/writes in the codebase for
similar thread-safety issues.
### Phase C: Session Store Improvements (Priority: Medium, Effort: Medium)
### Phase C: Session Store Improvements -- COMPLETE
Three problems to fix:
All three problems fixed in Sprint 5:
1. Unbounded SESSIONS cache:
Replace dict with functools.lru_cache wrapper or a simple OrderedDict with max size.
Evict LRU entries when size exceeds 100.
1. SESSIONS cache: OrderedDict with LRU cap of 100, oldest evicted automatically.
2. LOCK: all SESSIONS dict reads/writes wrapped with LOCK (from Sprint 1).
3. Session index: `sessions/_index.json` maintained on every save/delete.
`all_sessions()` reads the index file (a single file read) instead of scanning all JSONs.
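
Items 1 and 2 can be pictured together as a lock-guarded OrderedDict. This is a minimal sketch, not the code in api/models.py: `SESSIONS`, `LOCK`, and the cap of 100 come from the text above, while `cache_get` and `cache_put` are hypothetical helper names and sessions are represented as plain dicts.

```python
import threading
from collections import OrderedDict

MAX_SESSIONS = 100  # LRU cap stated above
SESSIONS: "OrderedDict[str, dict]" = OrderedDict()
LOCK = threading.Lock()


def cache_put(sid: str, session: dict) -> None:
    """Insert or refresh a session, evicting least-recently-used entries."""
    with LOCK:
        SESSIONS[sid] = session
        SESSIONS.move_to_end(sid)         # mark as most recently used
        while len(SESSIONS) > MAX_SESSIONS:
            SESSIONS.popitem(last=False)  # drop the oldest entry


def cache_get(sid: str):
    """Return the cached session (or None) and refresh its recency."""
    with LOCK:
        session = SESSIONS.get(sid)
        if session is not None:
            SESSIONS.move_to_end(sid)
        return session
```

On a cache miss the caller would still load `{session_id}.json` from disk and then call `cache_put`, so eviction only costs a later re-read.
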
2. No locking around SESSIONS:
Wrap all SESSIONS dict reads and writes with LOCK (already defined, just unused).
Pattern: with LOCK: s = SESSIONS.get(sid)
### Phase D: Input Validation and Error Handling -- COMPLETE
3. O(n) directory scan in all_sessions():
Add an index file: SESSION_DIR/index.json
Contents: list of compact() dicts, sorted by updated_at
Maintained on every Session.save() and every delete.
all_sessions() reads index.json (one file read) instead of scanning all JSONs.
get_session() still loads the full {session_id}.json on cache miss.
Index rebuild tool: a function that regenerates index.json from all *.json files.
Completed in Sprints 4-6:
### Phase D: Input Validation and Error Handling (Priority: Medium, Effort: Low)
1. `require()` and `bad()` helpers in `api/helpers.py` for parameter validation.
2. All endpoints return clean 400/404 responses instead of tracebacks.
3. Structured JSON request logging via `log_request()` override (Sprint 1).
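
How `require()` and `bad()` might fit together is sketched below. The signatures are assumptions: only the `j(handler, payload, status=...)` call shape appears elsewhere in this document, and `missing_fields` is a hypothetical helper added so the check can be exercised without an HTTP handler.

```python
import json


def j(handler, payload, status=200):
    """Write a JSON response through a BaseHTTPRequestHandler-like object."""
    body = json.dumps(payload).encode("utf-8")
    handler.send_response(status)
    handler.send_header("Content-Type", "application/json")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


def bad(handler, message, status=400):
    """Send a clean JSON error instead of letting a traceback escape."""
    j(handler, {"error": message}, status=status)


def missing_fields(body, *fields):
    """Return the required fields that are absent or empty in the request body."""
    return [f for f in fields if not body.get(f)]


def require(body, *fields):
    """Raise ValueError naming any required field that is missing."""
    missing = missing_fields(body, *fields)
    if missing:
        raise ValueError(f"Missing required fields: {missing}")


def handle_chat_start(handler, body):
    """Hypothetical endpoint body showing the validate-then-respond pattern."""
    try:
        require(body, "session_id", "message")
    except ValueError as exc:
        return bad(handler, str(exc))
    return j(handler, {"ok": True})
```

Keeping validation in one helper means every endpoint rejects malformed input with the same 400 shape.
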
1. Add a validate() helper:
def validate(body, *required_fields):
missing = [f for f in required_fields if not body.get(f)]
if missing: raise ValueError(f"Missing required fields: {missing}")
### Phase E: Frontend Modularization -- COMPLETE
2. Refine the outer try/except in do_GET and do_POST:
except ValueError as e:
return j(self, {'error': str(e)}, status=400)
except KeyError as e:
return j(self, {'error': f'Not found: {e}'}, status=404)
except Exception as e:
log.exception('Unhandled error')
return j(self, {'error': 'Internal server error'}, status=500)
# Never expose tracebacks to the client (security risk even on localhost)
Completed across Sprints 5, 6, and 9:
3. Add request duration logging:
Log at INFO level: {method} {path} -> {status} in {duration}ms
1. HTML extracted to `static/index.html` (Sprint 6).
2. CSS extracted to `static/style.css` (Sprint 4).
3. `app.js` deleted Sprint 9, replaced by 6 focused modules:
`ui.js`, `workspace.js`, `sessions.js`, `messages.js`, `panels.js`, `boot.js`.
Loaded as standard `<script>` tags (not ES modules) in dependency order.
4. Prism.js added for syntax highlighting (Sprint 8) via CDN, deferred load.
### Phase E: Frontend Modularization (Priority: Medium, Effort: High)
After Phase A splits the HTML/JS into files, Phase E improves the JavaScript itself.
1. Switch to ES Modules (type="module"):
app.js deleted Sprint 9 -- replaced by 6 modules:
- state.js: export S, INFLIGHT
- sessions.js: session CRUD functions
- chat.js: send(), SSE handling
- files.js: loadDir(), openFile()
- upload.js: uploadPendingFiles(), addFiles(), renderTray()
- approval.js: approval card and polling
- markdown.js: renderMd()
- ui.js: setStatus, setBusy, showToast, syncTopbar
Each module imports what it needs from state.js and other modules.
2. Replace renderMd with marked.js:
CDN: https://cdn.jsdelivr.net/npm/marked/marked.min.js
No bundler needed, ~50KB, handles tables, nested lists, HTML sanitization.
Usage: marked.parse(raw) -- drop-in replacement.
Add DOMPurify alongside for XSS sanitization of rendered HTML.
3. Add Prism.js for syntax highlighting:
CDN: https://cdn.jsdelivr.net/npm/prismjs
Apply after renderMd: Prism.highlightAllUnder(element)
Supports 200+ languages with auto-detection.
Remaining: renderMd() is still a hand-rolled regex chain. Tables partially supported.
Replacing with marked.js + DOMPurify is a future improvement (not blocking).
### Phase F: API Design Cleanup (Priority: Low, Effort: Medium)
@@ -719,16 +700,11 @@ After Phase A splits the HTML/JS into files, Phase E improves the JavaScript its
4. Consistent naming: use snake_case for all JSON keys.
### Phase G: Observability (Priority: Low, Effort: Low)
### Phase G: Observability -- MOSTLY COMPLETE
1. Structured JSON logging to /tmp/webui-mvp.log:
{"ts": "...", "method": "POST", "path": "/api/chat/start", "status": 200, "ms": 12}
2. Enhanced /health response:
{"status": "ok", "sessions": 10, "active_streams": 2, "uptime_s": 3600, "version": "0.3"}
3. GET /api/debug/stats (localhost only):
{"sessions_cached": N, "streams_active": M, "memory_mb": X}
1. Structured JSON logging: COMPLETE (Sprint 1). Per-request JSON to /tmp/webui-mvp.log.
2. Enhanced /health: COMPLETE (Sprint 7). Returns `active_streams`, `uptime_seconds`.
3. GET /api/debug/stats: NOT YET IMPLEMENTED. Low priority.
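
Per-request JSON logging can be done by overriding `log_request()` on the standard-library handler. A minimal sketch under stated assumptions: the field names follow the example log line in this section, `format_request_log` is a hypothetical helper, and `_started_at` is a hypothetical attribute that do_GET/do_POST would set when a request begins.

```python
import json
import sys
import time
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler


def format_request_log(method, path, status, started_at, now=None):
    """Build one JSON log line: timestamp, method, path, status, duration in ms."""
    now = time.time() if now is None else now
    return json.dumps({
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        "method": method,
        "path": path,
        "status": status,
        "ms": round((now - started_at) * 1000),
    })


class Handler(BaseHTTPRequestHandler):
    def log_request(self, code="-", size="-"):
        # http.server may pass an HTTPStatus member; log the plain integer.
        status = code.value if isinstance(code, HTTPStatus) else code
        # _started_at is hypothetical; fall back to "now" if it was never set.
        started = getattr(self, "_started_at", time.time())
        line = format_request_log(self.command, self.path, status, started)
        # Written to stderr so a shell redirect can collect it in a log file.
        print(line, file=sys.stderr, flush=True)
```

Because the line is plain JSON, the log can be filtered with standard tools, for example by status or by slow `ms` values.
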
### Phase H: Authentication (Priority: Low, Effort: Medium)
@@ -740,29 +716,15 @@ Optional password gate for non-SSH-tunnel deployments.
4. All API endpoints check cookie if HERMES_WEBUI_PASSWORD is set
5. Cookie validity: 30 days from last activity
### Phase I: Test Infrastructure (Priority: High, Effort: High)
### Phase I: Test Infrastructure -- COMPLETE
No tests exist today. This is the highest-risk technical debt.
190 tests across 12 test files + regression gate. Isolated test server on port 8788
with separate HERMES_HOME, wiped per run. Production data never touched.
1. Python unit tests (pytest):
- tests/test_session_crud.py: Session class, get_session, new_session, all_sessions
- tests/test_upload.py: parse_multipart directly with known byte payloads
- tests/test_files.py: safe_resolve, list_dir, read_file_content with tmp dirs
- tests/test_streaming.py: mock AIAgent, verify event sequence
- tests/test_approval.py: approval state machine
Test files: `test_sprint1.py` through `test_sprint10.py`, `test_regressions.py`.
Fixtures in `conftest.py`: auto-cleanup, cron isolation, workspace reset.
2. HTTP integration tests:
- Start a test server on a random port
- Drive it with httpx or requests
- Verify all API endpoints return correct shapes and status codes
3. Frontend tests (no build step):
- tests/frontend/test_markdown.html: known input -> expected HTML output assertions
- Run via: python3 -m http.server and open in browser, or use playwright
4. CI (GitHub Actions):
- .github/workflows/test.yml: on push, run pytest + ruff lint
- Target: zero test failures before merging any feature branch
Remaining: no CI (GitHub Actions), no frontend tests (browser-based).
### Phase J: Performance (Priority: Low, Effort: High)
@@ -985,7 +947,7 @@ Resolution: Phase B replaces with thread-local or explicit parameter passing.
Bug fix: Escape from file editor now cancels edits
New endpoints: POST /api/crons/create, GET /api/session/export
Tests: 16 new, 106/106 total
v0.0.6 Sprint 8 (March 31, 2026):
v0.10 Sprint 8 (March 31, 2026):
Features: edit+regenerate messages, regenerate last response, clear conversation,
Prism.js syntax highlighting, message queue (MSG_QUEUE + drain on idle),
INFLIGHT-first loadSession (message persists on switch-away/back)
@@ -994,22 +956,22 @@ Resolution: Phase B replaces with thread-local or explicit parameter passing.
Tests: 14 new, 139/139 total
JS: MSG_QUEUE global, updateQueueBadge(), setBusy drain logic, send() queues when busy,
loadSession checks INFLIGHT before server fetch
v0.1.0 Concurrency sweeps (March 31, 2026):
v0.12.2 Concurrency sweeps (March 31, 2026):
R10-R15: approval cross-session, activity bar per-session, live card
restore on switch-back, settled cards after done, model source,
newSession card clear. 190/190 tests.
v0.0.8 Sprint 10 (March 31, 2026):
v0.12 Sprint 10 (March 31, 2026):
Arch: server.py split into api/ modules (config, helpers, models, workspace, upload, streaming)
Features: background task cancel, cron run history, tool card UX polish
Post-sprint fixes: SSE cancel event breaks loop, Cancel button always hidden on setBusy(false),
S.activeStreamId initialized, tool-card show-more uses data attributes, version label v0.0.8,
S.activeStreamId initialized, tool-card show-more uses data attributes, version label v0.12,
Session.__init__ **kwargs forward-compat, test cron isolation via HERMES_HOME,
last_workspace reset in conftest between tests, tool cards grouped by assistant turn
Tests: 18 new, 167/167 total
Regressions fixed: uuid, AIAgent, has_pending, SSE cancel loop, Session.__init__ tool_calls
test_regressions.py: 10 tests -- one per introduced bug, permanent regression gate
Total after fixes: 177/177
v0.0.7 Sprint 9 (March 31, 2026):
v0.11 Sprint 9 (March 31, 2026):
Arch: app.js deleted; replaced by ui.js, workspace.js, sessions.js, messages.js, panels.js, boot.js
Features: tool call cards (inline collapsible, live + history), attachment persistence,
todo list panel (parses tool results from session history)