Brings all documentation up to date after Sprint 16, PRs #18-25: - CHANGELOG: add v0.18 (Sprint 16), v0.17.3 (bug fixes), update footer - ROADMAP: Sprint 16 in history, custom model discovery feature, updated line counts and architecture table, fix Wave 3 checkboxes - SPRINTS: bump version to v0.18, update parity percentages, fix reasoning display sprint reference - ARCHITECTURE: update all file line counts to match current source, add Session model fields (pinned, archived, project_id, tool_calls), document ICONS constant and sessions.js overlay pattern, update Phase A/I sections - BUGS: restructure with open/fixed sections, add v0.17.3 fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
457 lines
21 KiB
Markdown
457 lines
21 KiB
Markdown
# Hermes Web UI -- Forward Sprint Plan
|
|
|
|
> Current state: v0.18 | 237 tests | Daily driver ready
|
|
> This document plans the path from here to two targets:
|
|
>
|
|
> Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the
|
|
> terminal, you can do from the browser)
|
|
>
|
|
> Target B: 1:1 parity with Claude's reproducible features (the full Claude
|
|
> browser UI experience, minus things only Anthropic can build)
|
|
>
|
|
> Sprints are ordered by impact. Each builds on the one before.
|
|
> Past sprint history lives in CHANGELOG.md.
|
|
|
|
---
|
|
|
|
## Where we are now (v0.18)
|
|
|
|
**CLI parity: ~85% complete.** Core agent loop, all tools visible, workspace
|
|
file ops, cron/skills/memory CRUD, session management, streaming, cancel,
|
|
multi-provider models, custom endpoint discovery -- all solid. Gaps are
|
|
subagent visibility, toolset control, and code execution.
|
|
|
|
**Claude parity: ~65% complete.** Chat, streaming, file browser, session
|
|
management, tool cards, syntax highlighting, model switching, projects,
|
|
settings, Mermaid diagrams, mobile layout -- all present. Gaps are
|
|
artifacts, voice, reasoning display, sharing.
|
|
|
|
---
|
|
|
|
## Sprint 11 -- Multi-Provider Models + Streaming Smoothness (COMPLETED)
|
|
|
|
**Theme:** Use any Hermes-supported model provider from the UI, and make
|
|
heavy agentic work feel fast and fluid.
|
|
|
|
**Why now:** Two high-impact gaps converge here. First, the model dropdown is
|
|
hardcoded to ~10 OpenRouter model strings. If Hermes is configured with direct
|
|
Anthropic, OpenAI, Google, or other API providers, the web UI can't use them.
|
|
This means users who set up Hermes with native API keys are locked out of
|
|
their own models in the browser. Second, the streaming render path rebuilds
|
|
the entire message list on every tool event, causing visible flicker during
|
|
heavy agentic work.
|
|
|
|
### Track A: Bugs
|
|
- Tool card DOM thrash: renderMessages() rebuilds all cards on each tool event.
|
|
Switch to incremental append (append new card to existing group, no full rebuild).
|
|
- Scroll position lost on re-render during streaming (messages jump).
|
|
|
|
### Track B: Features
|
|
- **Multi-provider model support:** Query Hermes agent's configured providers
|
|
and available models at startup via a new `GET /api/models` endpoint. The
|
|
model dropdown populates dynamically from whatever providers the user has
|
|
configured (OpenRouter, direct OpenAI, direct Anthropic, Google, DeepSeek,
|
|
etc.). Group by provider. Fall back to the current hardcoded list if the
|
|
agent query fails. This ensures the web UI can use any model the CLI can.
|
|
- **Incremental tool card streaming:** Instead of renderMessages() on each
|
|
tool event, maintain a live card group element per turn and append/update
|
|
cards in place. The assistant text row below the cards also updates
|
|
incrementally (already does via assistantBody.innerHTML).
|
|
- **Smooth scroll:** Pin scroll to bottom during streaming unless user has
|
|
manually scrolled up (read-back mode). Resume pinning when user scrolls
|
|
back to bottom.
|
|
|
|
### Track C: Architecture
|
|
- `api/routes.py`: extract the 49 if/elif route handlers from server.py's
|
|
Handler class into a dedicated routes module. server.py becomes a true
|
|
~50-line shell: imports, Handler stub that delegates to routes, main().
|
|
Completes the server split started in Sprint 10.
|
|
|
|
**Tests:** ~15 new. Total: ~205.
|
|
**Hermes CLI parity impact:** High (model provider parity is a major CLI gap)
|
|
**Claude parity impact:** Low (streaming smoothness)
|
|
|
|
---
|
|
|
|
## Sprint 12 -- Settings Panel + Reliability + Session QoL
|
|
|
|
**Theme:** Persist your preferences, survive network blips, and organize sessions.
|
|
|
|
**Why now:** Three daily-driver friction points converge. First, default model
|
|
and workspace aren't persisted server-side -- every restart loses them. Second,
|
|
SSH tunnel hiccups during long agent runs silently kill the response with no
|
|
recovery. Third, after 50+ sessions the flat chronological list makes it hard
|
|
to keep important conversations accessible.
|
|
|
|
### Track A: Bugs
|
|
- Workspace validation on add doesn't check symlinks (shows as invalid when
|
|
it's actually a valid symlink to a directory).
|
|
|
|
### Track B: Features
|
|
- **Settings panel:** A gear icon in the topbar opens a slide-in settings panel.
|
|
Sections: Default Model, Default Workspace. Persisted server-side in
|
|
`~/.hermes/webui-mvp/settings.json`. Server reads settings on startup and
|
|
uses them as defaults. `GET /api/settings` + `POST /api/settings` endpoints.
|
|
- **SSE auto-reconnect:** When the EventSource connection drops mid-stream
|
|
(network blip, SSH tunnel hiccup), auto-reconnect once using the same
|
|
`stream_id`. The server-side queue holds undelivered events. If reconnect
|
|
fails after 5s, show error banner. This is the #1 reliability gap for
|
|
remote VPS usage.
|
|
- **Pin sessions:** A star icon on any session in the sidebar. Pinned sessions
|
|
float to the top of the list above date groups. Persisted on the session
|
|
JSON as `pinned: true`. Toggle on click. Simple and high quality-of-life.
|
|
- **Import session from JSON:** Drag a `.json` export file into the sidebar
|
|
(or click an import button) to restore it as a new session. Mirrors the
|
|
existing JSON export. Useful for moving sessions between machines.
|
|
|
|
### Track C: Architecture
|
|
- Settings schema: `settings.json` with typed fields, validated on load, with
|
|
sane defaults. Served via `GET /api/settings`, written via `POST /api/settings`.
|
|
- SSE reconnect: server keeps `STREAMS[stream_id]` alive for 60s after
|
|
client disconnect, allowing reconnect with the same stream_id.
|
|
|
|
**Tests:** ~15 new. Total: ~216.
|
|
**Hermes CLI parity impact:** Medium (settings persistence, reliability)
|
|
**Claude parity impact:** Medium (settings panel, pinned conversations)
|
|
|
|
---
|
|
|
|
## Sprint 13 -- Alerts, Session QoL, Polish
|
|
|
|
**Theme:** Know what Hermes is doing, and small quality-of-life wins.
|
|
|
|
**Why now:** Cron jobs run silently. Background errors surface nowhere. You have
|
|
no way to know a long-running task finished (or failed) while you were on another
|
|
tab. Meanwhile, a few small UX gaps (no session duplicate, no tab title) add up
|
|
to daily friction.
|
|
|
|
### Track A: Bugs
|
|
- Symlink workspace validation — confirmed already fixed (`.resolve()` follows
|
|
symlinks before `is_dir()` check).
|
|
|
|
### Track B: Features
|
|
- **Cron completion alerts:** `GET /api/crons/recent?since=TIMESTAMP` endpoint.
|
|
UI polls every 30s (only when tab is focused). Toast notification on each
|
|
completion. Red badge count on Tasks nav tab, cleared when tab is opened.
|
|
- **Background agent error alerts:** When a streaming session errors out and
|
|
the user is on a different session, show a persistent red banner above the
|
|
message area: "Session X encountered an error." Click "View" to navigate,
|
|
"Dismiss" to clear.
|
|
- **Session duplicate:** Copy icon on each session in the sidebar (visible on
|
|
hover). Creates a new session with same workspace/model, titled "(copy)".
|
|
- **Browser tab title:** `document.title` updates to show the active session
|
|
title (e.g. "My Task — Hermes"). Resets to "Hermes" when no session active.
|
|
|
|
**Tests:** ~10 new. Total: ~221.
|
|
**Hermes CLI parity impact:** Medium (cron visibility, error surfacing)
|
|
**Claude parity impact:** Low
|
|
|
|
---
|
|
|
|
## Sprint 14 -- Visual Polish + Workspace Ops + Session Organization (COMPLETED)
|
|
|
|
**Theme:** Polish the visual experience, close workspace file gaps, and
|
|
organize sessions properly.
|
|
|
|
### Track B: Features
|
|
- **Mermaid diagram rendering:** Code blocks tagged `mermaid` render as
|
|
diagrams inline. Mermaid.js loaded lazily from CDN. Dark theme. Falls
|
|
back to code block on parse error.
|
|
- **Message timestamps:** Subtle HH:MM time next to each role label. Full
|
|
date/time on hover. User messages tagged with `_ts` on send.
|
|
- **Date grouping fix:** Session list uses `created_at` for groups instead
|
|
of `updated_at`. Prevents sessions jumping between groups on auto-title.
|
|
- **File rename:** Double-click any filename in the workspace panel to
|
|
rename inline (same pattern as session rename). `POST /api/file/rename`.
|
|
- **Folder create:** Folder icon button in workspace panel header.
|
|
`POST /api/file/create-dir`. Prompt for folder name.
|
|
- **Session tags:** Add `#tag` to session titles. Tags extracted and shown
|
|
as colored chips in the sidebar. Click a tag to filter the session list.
|
|
- **Session archive:** Archive button on each session (box icon). Archived
|
|
sessions hidden from sidebar by default. "Show N archived" toggle at top
|
|
of list. `POST /api/session/archive` endpoint.
|
|
|
|
**Tests:** ~12 new. Total: ~233.
|
|
**Hermes CLI parity impact:** Medium (file rename, folder create)
|
|
**Claude parity impact:** Medium (Mermaid, tags, archive)
|
|
|
|
---
|
|
|
|
## Sprint 15 -- Session Projects + Code Copy + Tool Card Toggle (COMPLETED)
|
|
|
|
**Theme:** Organize work the way you think, not just chronologically.
|
|
Plus two quick UX wins for code and agentic workflows.
|
|
|
|
**Why now:** After 100+ sessions the sidebar is a flat chronological list.
|
|
Finding sessions from 2 weeks ago, or keeping work separated by project,
|
|
requires the search box. Session projects are the single biggest remaining
|
|
organizational gap vs. Claude's project folders.
|
|
|
|
### Track A: Bugs
|
|
- None.
|
|
|
|
### Track B: Features
|
|
- **Session projects:** Named groups for organizing sessions. A project
|
|
filter bar (subtle chips) sits between the search input and the session
|
|
list. Each project has a name and color. Click a chip to filter sessions
|
|
to that project; "All" shows everything. Create projects inline (+
|
|
button), rename (double-click chip), delete (right-click). Assign
|
|
sessions via folder icon button (hover-reveal) with a dropdown picker.
|
|
Projects stored in `projects.json`. Session model gains `project_id`
|
|
field (null = unassigned). Fully backward-compatible with existing
|
|
sessions. Endpoints: `GET /api/projects`, `POST /api/projects/create`,
|
|
`POST /api/projects/rename`, `POST /api/projects/delete`,
|
|
`POST /api/session/move`.
|
|
- **Code block copy button:** Every code block gets a "Copy" button.
|
|
Positioned in the language header bar (or top-right corner for plain
|
|
code blocks). Click copies code to clipboard, shows "Copied!" for 1.5s.
|
|
- **Tool card expand/collapse:** When a message has 2+ tool cards, an
|
|
"Expand all / Collapse all" toggle appears above the card group.
|
|
Scoped per message group, not global.
|
|
|
|
### Track C: Architecture
|
|
- `projects.json` flat file storage for project list (same pattern as
|
|
`workspaces.json` and `settings.json`).
|
|
- `project_id` field on Session model with backward-compatible null default.
|
|
- `_index.json` includes `project_id` for fast client-side filtering.
|
|
|
|
**Tests:** 13 new. Total: ~237.
|
|
**Hermes CLI parity impact:** Low (CLI has no session organization)
|
|
**Claude parity impact:** Very High (projects are a core Claude concept)
|
|
|
|
### Candidates for later sprints
|
|
- Artifacts + code execution (HTML/SVG preview, inline Python execution)
|
|
- Voice input via Whisper
|
|
- Subagent delegation cards (enhanced tool card rendering)
|
|
|
|
---
|
|
|
|
## Sprint 16 -- Session Sidebar Visual Polish (COMPLETED)
|
|
|
|
**Theme:** Make the session list feel high-quality and delightful.
|
|
|
|
**Why now:** The session sidebar had two visible UX bugs: titles truncated
|
|
unnecessarily because action icons reserved space even when hidden, and
|
|
the project folder icon felt "sticky" and awkward. Emoji icons rendered
|
|
inconsistently across platforms. These were the most common visual complaints.
|
|
|
|
### Track A: Bugs (from BUGS.md)
|
|
- **Session title truncation.** Action icons (pin, move, archive, dup, trash)
|
|
were always in the DOM with `flex-shrink:0`, reserving ~30px even when
|
|
invisible. Fix: wrapped all actions in a `.session-actions` overlay
|
|
container with `position:absolute`. Titles now use full available width.
|
|
Actions appear on hover with a gradient fade from the right edge.
|
|
- **Folder button feels sticky.** Replaced `.has-project` persistent blue
|
|
button with a colored left border matching the project color. The folder
|
|
button now only appears in the hover overlay like all other actions.
|
|
|
|
### Track B: Features
|
|
- **SVG action icons.** Replaced all emoji HTML entities (★, 📂, 📦, ⊕, 🗑)
|
|
with monochrome SVG line icons that inherit `currentColor`. Consistent
|
|
rendering across macOS, Linux, and Windows. Icons: pin (star), folder,
|
|
archive (box), duplicate (overlapping squares), trash (bin with lines).
|
|
- **Pin indicator.** Small gold filled-star icon rendered inline before the
|
|
title only when the session is actually pinned. Unpinned sessions get
|
|
full title width with zero space reservation.
|
|
- **Project border indicator.** Sessions assigned to a project show a
|
|
colored left border matching the project color, replacing the old
|
|
always-visible blue folder button.
|
|
- **Hover overlay polish.** Actions container uses a gradient background
|
|
that fades from transparent to the sidebar color, creating a smooth
|
|
emergence effect. Overlay hides automatically during inline rename.
|
|
|
|
### Deferred to Sprint 17
|
|
- Slash commands (basic set with `commands.js` module)
|
|
- Thinking/reasoning display for extended-thinking models
|
|
- Slash command autocomplete popup
|
|
|
|
**Tests:** 0 new (pure CSS/DOM changes). Total: 237.
|
|
**Hermes CLI parity impact:** Low
|
|
**Claude parity impact:** Medium (sidebar polish matches Claude's quality bar)
|
|
|
|
---
|
|
|
|
## Sprint 17 -- Voice + Multimodal Input
|
|
|
|
**Theme:** Input beyond the keyboard.
|
|
|
|
**Why now:** Voice is a meaningful quality-of-life feature for longer sessions
|
|
and is achievable with Whisper. Image input closes the last modality gap with
|
|
Claude (Claude accepts image paste natively -- we do too, but only as
|
|
file uploads, not clipboard screenshots into the conversation directly).
|
|
|
|
### Track A: Bugs
|
|
- Image paste currently requires a click-to-attach flow. Direct paste into the
|
|
message textarea should embed the image inline (as a preview chip) and queue
|
|
it for upload on Send. (Partially works -- clean up edge cases.)
|
|
- Large image uploads (>5MB) time out the upload step silently.
|
|
|
|
### Track B: Features
|
|
- **Voice input (Whisper):** A microphone icon in the composer. Hold to record,
|
|
release to transcribe via `POST /api/transcribe` (calls local Whisper or
|
|
OpenAI Whisper API). Transcribed text appears in the message input, editable
|
|
before send. Supports the full "voice -> text -> Hermes response" loop.
|
|
- **TTS playback:** A speaker icon on assistant messages. Calls a TTS endpoint
|
|
(ElevenLabs or OpenAI TTS) and plays the audio. Toggle per-message. Optional
|
|
auto-play mode in settings.
|
|
- **Vision input improvements:** Paste a screenshot directly from clipboard into
|
|
the conversation (not just the tray). Shows as an inline preview chip with
|
|
the image thumbnail. On Send, uploads and includes in the message.
|
|
|
|
### Track C: Architecture
|
|
- Audio pipeline: `POST /api/transcribe` streams audio bytes, returns transcript.
|
|
`GET /api/tts?text=...` returns audio/mpeg. Both use lazy import of Whisper
|
|
and TTS libraries to keep cold start fast.
|
|
|
|
**Tests:** ~12 new. Total: ~271.
|
|
**Hermes CLI parity impact:** Medium (voice not in CLI, but adds capability)
|
|
**Claude parity impact:** High (Claude has native voice mode)
|
|
|
|
---
|
|
|
|
## Sprint 18 -- Subagent Visibility + Agentic Transparency
|
|
|
|
**Theme:** Watch Hermes think, not just respond.
|
|
|
|
**Why now:** When Hermes delegates to subagents (delegate_task, spawns parallel
|
|
workstreams), the UI shows nothing. On long multi-agent tasks you have no idea
|
|
what's happening. This is the last major "CLI feels better" gap for power users.
|
|
|
|
### Track A: Bugs
|
|
- Tool cards for delegate_task show no information about what the subagent was
|
|
asked to do or what it returned.
|
|
- The activity bar text truncates at 55 chars -- tool previews for long terminal
|
|
commands show nothing useful.
|
|
|
|
### Track B: Features
|
|
- **Subagent delegation cards:** When `delegate_task` fires, show an expandable
|
|
card with the subagent's goal, status (pending/running/done), and result
|
|
summary. Multiple subagents from one call appear as a card group. Uses the
|
|
existing tool card infrastructure.
|
|
- **Background task monitor:** A "Tasks" indicator in the topbar (separate from
|
|
the cron Tasks panel). Shows count of active agent threads. Click opens a
|
|
popover listing all in-flight streams with session names and elapsed times.
|
|
Cancel any individual thread. This is the full job queue visibility the CLI
|
|
implicitly has via `ps aux`.
|
|
- **Thinking/reasoning display:** When the model emits reasoning tokens (o3,
|
|
Claude extended thinking), show them in a collapsible "Reasoning" card above
|
|
the response. Collapsed by default. This matches Claude's reasoning display.
|
|
|
|
### Track C: Architecture
|
|
- Task registry: extend STREAMS to include session name, start time, and task
|
|
description. New `GET /api/tasks/active` endpoint returns all running streams
|
|
with metadata.
|
|
|
|
**Tests:** ~14 new. Total: ~285.
|
|
**Hermes CLI parity impact:** Very High (subagent and task visibility is the
|
|
last major CLI gap)
|
|
**Claude parity impact:** High (Claude shows reasoning, tool use visibly)
|
|
|
|
---
|
|
|
|
## Sprint 19 -- Auth, HTTPS, and Production Hardening
|
|
|
|
**Theme:** Make this safe to leave running.
|
|
|
|
**Why now:** Everything else is done. This is the sprint you run when you want
|
|
to expose the UI beyond localhost -- to a team, a mobile device, or a public
|
|
address.
|
|
|
|
### Track A: Bugs
|
|
- Server has no request size limit on non-upload endpoints (potential DoS).
|
|
- Session JSON files have no size cap (a runaway agent could write GBs).
|
|
|
|
### Track B: Features
|
|
- **Password authentication:** A login page with a configurable password
|
|
(HERMES_WEBUI_PASSWORD env var). Signed cookie session (24h expiry).
|
|
Single-user model -- no accounts, no registration.
|
|
- **HTTPS / reverse proxy guide:** A one-page `DEPLOY.md` with instructions
|
|
for running behind nginx + Let's Encrypt on a VPS. Configuration snippets
|
|
for systemd service, nginx config, certbot.
|
|
- **Mobile responsive layout:** Collapsible sidebar (hamburger). Touch-friendly
|
|
session list (swipe to delete, tap to navigate). Composer expands on focus.
|
|
Right panel hidden by default on mobile, accessible via a Files tab.
|
|
- **Rate limiting:** Simple per-IP token bucket on the chat/start endpoint
|
|
(configurable, default 10 req/min) to prevent accidental hammering.
|
|
|
|
### Track C: Architecture
|
|
- Helmet headers: X-Content-Type-Options, X-Frame-Options, HSTS (when served
|
|
over HTTPS). Simple middleware in the Handler.
|
|
|
|
**Tests:** ~12 new. Total: ~297.
|
|
**Hermes CLI parity impact:** Low (CLI has no auth/HTTPS concerns)
|
|
**Claude parity impact:** Very High (Claude is authenticated, HTTPS only)
|
|
|
|
---
|
|
|
|
## Feature Parity Summary
|
|
|
|
### After Sprint 18 (Hermes CLI parity: complete)
|
|
|
|
| CLI Feature | Status |
|
|
|-------------|--------|
|
|
| Chat / agent loop | Done (v0.3) |
|
|
| Streaming responses | Done (v0.5) |
|
|
| Tool call visibility | Done (v0.11) |
|
|
| File ops (read/write/search/patch) | Done (v0.6) |
|
|
| Terminal commands | Done via workspace |
|
|
| Cron job management | Done (v0.9) |
|
|
| Skills management | Done (v0.9) |
|
|
| Memory read/write | Done (v0.9) |
|
|
| Session history | Done (v0.3) |
|
|
| Workspace switching | Done (v0.7) |
|
|
| Model selection | Done (v0.3) |
|
|
| Multi-provider model support | Done (Sprint 11) |
|
|
| Toolset control | Sprint 12 |
|
|
| Settings persistence | Done (Sprint 12) |
|
|
| Subagent visibility | Sprint 18 |
|
|
| Background task monitor | Sprint 18 |
|
|
| Code execution (Jupyter) | Sprint 17+ |
|
|
| Cron completion alerts | Done (Sprint 13) |
|
|
| Virtual scroll (perf) | Deferred |
|
|
|
|
### After Sprint 19 (Claude parity: ~90% complete)
|
|
|
|
| Claude Feature | Status |
|
|
|----------------|--------|
|
|
| Dark theme, 3-panel layout | Done (v0.1) |
|
|
| Streaming chat | Done (v0.5) |
|
|
| Model switching | Done (v0.3) |
|
|
| File attachments | Done (v0.6) |
|
|
| Syntax highlighting | Done (v0.10) |
|
|
| Tool use visibility | Done (v0.11) |
|
|
| Edit/regenerate messages | Done (v0.10) |
|
|
| Session management | Done (v0.6) |
|
|
| Artifacts (HTML/SVG preview) | Sprint 17+ |
|
|
| Code execution inline | Sprint 17+ |
|
|
| Mermaid diagrams | Done (Sprint 14) |
|
|
| Projects / folders | Done (Sprint 15) |
|
|
| Pinned/starred sessions | Done (Sprint 12) |
|
|
| Reasoning display | Sprint 18 |
|
|
| Voice input | Sprint 17 |
|
|
| TTS playback | Sprint 17 |
|
|
| Notifications | Done (Sprint 13) |
|
|
| Settings panel | Done (Sprint 12) |
|
|
| Auth / login | Sprint 19 |
|
|
| HTTPS | Sprint 19 |
|
|
| Mobile layout | Done (v0.16.1) |
|
|
| Sharing / public URLs | Not planned (requires server infra) |
|
|
| Claude-specific features | Not replicable (Projects AI, artifacts sync) |
|
|
|
|
### What is intentionally not planned
|
|
|
|
- **Sharing / public conversation URLs:** Requires a hosted backend with access
|
|
control and CDN. Out of scope for a personal VPS deployment.
|
|
- **Claude-specific model features:** Claude-native Projects memory, extended
|
|
artifacts sync, Anthropic's proprietary reasoning UI. These are Anthropic
|
|
infrastructure, not reproducible.
|
|
- **Real-time collaboration:** Multiple users in the same session simultaneously.
|
|
Single-user assumption throughout.
|
|
- **Plugin marketplace:** Hermes skills cover this use case already.
|
|
|
|
---
|
|
|
|
*Last updated: April 2, 2026*
|
|
*Current version: v0.18 | 237 tests*
|
|
*Next sprint: Sprint 17 (Voice + Multimodal Input)*
|