408 lines
18 KiB
Markdown
408 lines
18 KiB
Markdown
# Hermes WebUI -- Forward Sprint Plan
|
|
|
|
> Current state: v0.1.0 | 190 tests | Daily driver ready
|
|
> This document plans the path from here to two targets:
|
|
>
|
|
> Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the
|
|
> terminal, you can do from the browser)
|
|
>
|
|
> Target B: 1:1 parity with Claude's reproducible features (the full Claude
|
|
> browser UI experience, minus things only Anthropic can build)
|
|
>
|
|
> Sprints are ordered by impact. Each builds on the one before.
|
|
> Past sprint history lives in CHANGELOG.md.
|
|
|
|
---
|
|
|
|
## Where we are now (v0.1.0)
|
|
|
|
**CLI parity: ~80% complete.** Core agent loop, all tools visible, workspace
|
|
file ops, cron/skills/memory CRUD, session management, streaming, cancel --
|
|
all solid. Gaps are configuration, subagent visibility, and runtime controls.
|
|
|
|
**Claude parity: ~55% complete.** Chat, streaming, file browser,
|
|
session management, tool cards, syntax highlighting, model switching -- all
|
|
present. Gaps are project organization, artifacts, voice, sharing, mobile.
|
|
|
|
---
|
|
|
|
## Sprint 11 -- Streaming Smoothness + Tool Card Incremental Render
|
|
|
|
**Theme:** Make heavy agentic work feel fast and fluid.
|
|
|
|
**Why now:** The biggest remaining daily friction point. During a 20-step task,
|
|
every tool event triggers a full renderMessages() re-render of the entire
|
|
message list. On fast tasks you can see flicker. This is the last thing that
|
|
makes the UI feel noticeably worse than watching the CLI.
|
|
|
|
### Track A: Bugs
|
|
- Tool card DOM thrash: renderMessages() rebuilds all cards on each tool event.
|
|
Switch to incremental append (append new card to existing group, no full rebuild).
|
|
- Scroll position lost on re-render during streaming (messages jump).
|
|
|
|
### Track B: Features
|
|
- **Incremental tool card streaming:** Instead of renderMessages() on each
|
|
tool event, maintain a live card group element per turn and append/update
|
|
cards in place. The assistant text row below the cards also updates
|
|
incrementally (already does via assistantBody.innerHTML).
|
|
- **Tool card collapse-all / expand-all:** A small toggle in the topbar or
|
|
per-message to collapse all tool cards in a response. Useful when a response
|
|
has 10+ tool calls.
|
|
- **Smooth scroll:** Pin scroll to bottom during streaming unless user has
|
|
manually scrolled up (read-back mode). Resume pinning when user scrolls
|
|
back to bottom.
|
|
|
|
### Track C: Architecture
|
|
- `api/routes.py`: extract the 49 if/elif route handlers from server.py's
|
|
Handler class into a dedicated routes module. server.py becomes a true
|
|
~50-line shell: imports, Handler stub that delegates to routes, main().
|
|
Completes the server split started in Sprint 10.
|
|
|
|
**Tests:** ~12 new. Total: ~196.
|
|
**Hermes CLI parity impact:** Low (smoothness, not features)
|
|
**Claude parity impact:** Low
|
|
|
|
---
|
|
|
|
## Sprint 12 -- Settings Panel + Toolset Control
|
|
|
|
**Theme:** Configuration you can actually reach from the UI.
|
|
|
|
**Why now:** Last remaining thing that forces a trip to the CLI or config files
|
|
for basic setup. The model dropdown works but defaults aren't persisted
|
|
server-side. Toolsets can't be toggled per session.
|
|
|
|
### Track A: Bugs
|
|
- Model dropdown doesn't sync when a session was created with a model not in
|
|
the current dropdown list (edge case from model additions).
|
|
- Workspace validation on add doesn't check symlinks (shows as invalid when
|
|
it's actually a valid symlink to a directory).
|
|
|
|
### Track B: Features
|
|
- **Settings panel:** A gear icon in the topbar opens a slide-in settings panel.
|
|
Sections: Default Model (writes HERMES_WEBUI_DEFAULT_MODEL to a settings file),
|
|
Default Workspace (writes HERMES_WEBUI_DEFAULT_WORKSPACE), UI preferences
|
|
(font size, message density). Persisted server-side in `~/.hermes/webui-mvp/settings.json`.
|
|
- **Toolset control per session:** A "Tools" chip in the session topbar opens
|
|
a popover listing all available toolsets (terminal, web, file, memory, etc.)
|
|
with toggles. Selected toolsets stored on the session and passed to AIAgent.
|
|
Matches the `--tools` flag behavior in the CLI.
|
|
- **Rename file / Create folder:** Two small file tree ops that close the last
|
|
workspace management gap. Inline rename on double-click (same pattern as
|
|
session rename). Create folder via + menu next to the existing + file button.
|
|
|
|
### Track C: Architecture
|
|
- Settings schema: `settings.json` with typed fields, validated on load, with
|
|
sane defaults. Served via `GET /api/settings`, written via `POST /api/settings`.
|
|
|
|
**Tests:** ~15 new. Total: ~211.
|
|
**Hermes CLI parity impact:** High (toolset control is the last major CLI feature)
|
|
**Claude parity impact:** Medium (settings exist in Claude as a panel)
|
|
|
|
---
|
|
|
|
## Sprint 13 -- Notification System + Background Visibility
|
|
|
|
**Theme:** Know what Hermes is doing even when you're not watching.
|
|
|
|
**Why now:** Cron jobs run silently. Background errors surface nowhere. You have
|
|
no way to know a long-running task finished (or failed) while you were on another
|
|
tab. This is a meaningful daily driver gap for anyone using cron heavily.
|
|
|
|
### Track A: Bugs
|
|
- Cron "Run now" button shows no feedback if the job errors immediately.
|
|
- Sessions with very long message histories (100+ messages) cause noticeable
|
|
render lag on load (no virtual scroll yet).
|
|
|
|
### Track B: Features
|
|
- **Cron completion alerts:** When a cron job finishes (success or error), push
|
|
a toast notification to the UI. Use a polling endpoint (`GET /api/crons/status`)
|
|
that the UI checks every 30s while the window is focused. Badge count on the
|
|
Tasks tab icon when there are unread completions.
|
|
- **Background agent error alerts:** When a streaming session errors out (network
|
|
drop, model error, tool failure), and the user is not currently viewing that
|
|
session, show a persistent banner: "Session X encountered an error." Clicking
|
|
it navigates to that session.
|
|
- **Virtual scroll for session list:** Session list currently renders all sessions
|
|
in the DOM. Above ~100 sessions, the sidebar gets slow. Implement simple virtual
|
|
scroll: render only ~20 visible rows, reuse DOM nodes on scroll.
|
|
|
|
### Track C: Architecture
|
|
- SSE reconnect: if the SSE connection drops mid-stream, auto-reconnect once
|
|
(with the same stream_id). Currently a network blip ends the response silently.
|
|
|
|
**Tests:** ~14 new. Total: ~225.
|
|
**Hermes CLI parity impact:** High (cron visibility, error surfacing)
|
|
**Claude parity impact:** Medium (Claude has notification panel)
|
|
|
|
---
|
|
|
|
## Sprint 14 -- Project Organization + Session Management
|
|
|
|
**Theme:** Organize work the way you think, not just chronologically.
|
|
|
|
**Why now:** After 100+ sessions the sidebar is a flat chronological list.
|
|
Finding sessions from 2 weeks ago, or keeping a "MyProject" workspace separate
|
|
from personal work, requires the search box. This is the biggest remaining
|
|
daily organizational gap vs. Claude's project folders.
|
|
|
|
### Track A: Bugs
|
|
- Session search content scan (depth=5) is slow on large session histories.
|
|
Add server-side caching of search index.
|
|
- Date group headers ("Today / Yesterday / Earlier") use updated_at which can
|
|
be misleading for sessions touched by automated title-setting. Use created_at
|
|
for initial grouping, updated_at for sort order.
|
|
|
|
### Track B: Features
|
|
- **Session folders / projects:** A "Projects" section above the session list.
|
|
Each project is a named group. Sessions can be dragged into projects or
|
|
assigned via right-click. Stored in `projects.json`. Projects collapse/expand.
|
|
This is the single biggest Claude parity feature missing.
|
|
- **Pin sessions:** Star icon on any session to pin it to the top of the list
|
|
above date groups. Persisted on the session JSON as `pinned: true`.
|
|
- **Session tags:** Inline `#tag` syntax in session titles gets extracted and
|
|
shown as colored chips. Clicking a tag filters the list. No backend change
|
|
needed -- parsed client-side from title text.
|
|
- **Archive sessions:** A "More" overflow menu on each session (right-click or
|
|
long-press) with: Archive (hides from main list, accessible via filter),
|
|
Duplicate (new session with same workspace/model), Export JSON.
|
|
- **Import session from JSON:** Drag a `.json` export file into the sidebar to
|
|
restore it as a new session. Mirrors the existing JSON export.
|
|
|
|
### Track C: Architecture
|
|
- Session index v2: extend `_index.json` to include `tags`, `pinned`, and
|
|
`project_id` fields. Rebuild on session save. Enables fast client-side
|
|
filtering without disk reads.
|
|
|
|
**Tests:** ~16 new. Total: ~241.
|
|
**Hermes CLI parity impact:** Low (CLI has no session organization)
|
|
**Claude parity impact:** Very High (projects are a core Claude concept)
|
|
|
|
---
|
|
|
|
## Sprint 15 -- Artifacts + Code Execution
|
|
|
|
**Theme:** See outputs, not just text.
|
|
|
|
**Why now:** Claude's most distinctive feature is the artifact panel --
|
|
code runs inline, HTML renders in a sandboxed iframe, SVGs show as images.
|
|
This is the largest single capability gap between what we have and what Claude
|
|
feels like. It also directly enables the Hermes "code execution cell" feature
|
|
(Jupyter-style in-browser execution).
|
|
|
|
### Track A: Bugs
|
|
- Prism.js autoloader makes one CDN request per language encountered. On a
|
|
code-heavy session this causes noticeable latency. Bundle the top 10 languages
|
|
(Python, JS, bash, JSON, SQL, YAML, TypeScript, CSS, HTML, Rust) locally.
|
|
- Code blocks in long responses sometimes re-highlight on every renderMessages()
|
|
call. Debounce highlightCode() with requestAnimationFrame.
|
|
|
|
### Track B: Features
|
|
- **Artifact panel:** When Hermes produces a code block tagged as `html`, `svg`,
|
|
or `react`, a "Preview" button appears on that code block. Clicking it opens
|
|
a sandboxed `<iframe>` in the right panel showing the rendered output. The
|
|
preview updates live if Hermes edits the artifact in a follow-up.
|
|
- **Code execution cell:** A "Run" button on Python code blocks. Sends the code
|
|
to a new server endpoint (`POST /api/execute`) which runs it in a subprocess
|
|
with a 30-second timeout and streams stdout/stderr back as SSE. Output appears
|
|
below the code block inline. This is the Jupyter cell experience without
|
|
needing a kernel.
|
|
- **Mermaid diagram rendering:** Mermaid.js CDN (deferred). Code blocks tagged
|
|
as `mermaid` render as flow/sequence/gantt diagrams inline.
|
|
|
|
### Track C: Architecture
|
|
- Sandbox safety: `/api/execute` runs in a restricted subprocess (no network,
|
|
limited filesystem via a temp directory). Returns exit code, stdout, stderr,
|
|
and execution time.
|
|
- Artifact state: artifacts are tracked in `S.artifacts = {}` (code block hash
|
|
-> rendered content). Persisted in session JSON as `artifacts` array.
|
|
|
|
**Tests:** ~18 new. Total: ~259.
|
|
**Hermes CLI parity impact:** High (code execution closes the Jupyter gap)
|
|
**Claude parity impact:** Very High (artifacts are Claude's signature feature)
|
|
|
|
---
|
|
|
|
## Sprint 16 -- Voice + Multimodal Input
|
|
|
|
**Theme:** Input beyond the keyboard.
|
|
|
|
**Why now:** Voice is a meaningful quality-of-life feature for longer sessions
|
|
and is achievable with Whisper. Image input closes the last modality gap with
|
|
Claude (Claude accepts image paste natively -- we do too, but only as
|
|
file uploads, not clipboard screenshots into the conversation directly).
|
|
|
|
### Track A: Bugs
|
|
- Image paste currently requires a click-to-attach flow. Direct paste into the
|
|
message textarea should embed the image inline (as a preview chip) and queue
|
|
it for upload on Send. (Partially works -- clean up edge cases.)
|
|
- Large image uploads (>5MB) time out the upload step silently.
|
|
|
|
### Track B: Features
|
|
- **Voice input (Whisper):** A microphone icon in the composer. Hold to record,
|
|
release to transcribe via `POST /api/transcribe` (calls local Whisper or
|
|
OpenAI Whisper API). Transcribed text appears in the message input, editable
|
|
before send. Supports the full "voice -> text -> Hermes response" loop.
|
|
- **TTS playback:** A speaker icon on assistant messages. Calls a TTS endpoint
|
|
(ElevenLabs or OpenAI TTS) and plays the audio. Toggle per-message. Optional
|
|
auto-play mode in settings.
|
|
- **Vision input improvements:** Paste a screenshot directly from clipboard into
|
|
the conversation (not just the tray). Shows as an inline preview chip with
|
|
the image thumbnail. On Send, uploads and includes in the message.
|
|
|
|
### Track C: Architecture
|
|
- Audio pipeline: `POST /api/transcribe` streams audio bytes, returns transcript.
|
|
`GET /api/tts?text=...` returns audio/mpeg. Both use lazy import of Whisper
|
|
and TTS libraries to keep cold start fast.
|
|
|
|
**Tests:** ~12 new. Total: ~271.
|
|
**Hermes CLI parity impact:** Medium (voice not in CLI, but adds capability)
|
|
**Claude parity impact:** High (Claude has native voice mode)
|
|
|
|
---
|
|
|
|
## Sprint 17 -- Subagent Visibility + Agentic Transparency
|
|
|
|
**Theme:** Watch Hermes think, not just respond.
|
|
|
|
**Why now:** When Hermes delegates to subagents (delegate_task, spawns parallel
|
|
workstreams), the UI shows nothing. On long multi-agent tasks you have no idea
|
|
what's happening. This is the last major "CLI feels better" gap for power users.
|
|
|
|
### Track A: Bugs
|
|
- Tool cards for delegate_task show no information about what the subagent was
|
|
asked to do or what it returned.
|
|
- The activity bar text truncates at 55 chars -- tool previews for long terminal
|
|
commands show nothing useful.
|
|
|
|
### Track B: Features
|
|
- **Subagent delegation cards:** When `delegate_task` fires, show an expandable
|
|
card with the subagent's goal, status (pending/running/done), and result
|
|
summary. Multiple subagents from one call appear as a card group. Uses the
|
|
existing tool card infrastructure.
|
|
- **Background task monitor:** A "Tasks" indicator in the topbar (separate from
|
|
the cron Tasks panel). Shows count of active agent threads. Click opens a
|
|
popover listing all in-flight streams with session names and elapsed times.
|
|
Cancel any individual thread. This is the full job queue visibility the CLI
|
|
implicitly has via `ps aux`.
|
|
- **Thinking/reasoning display:** When the model emits reasoning tokens (o3,
|
|
Claude extended thinking), show them in a collapsible "Reasoning" card above
|
|
the response. Collapsed by default. This matches Claude's reasoning display.
|
|
|
|
### Track C: Architecture
|
|
- Task registry: extend STREAMS to include session name, start time, and task
|
|
description. New `GET /api/tasks/active` endpoint returns all running streams
|
|
with metadata.
|
|
|
|
**Tests:** ~14 new. Total: ~285.
|
|
**Hermes CLI parity impact:** Very High (subagent and task visibility is the
|
|
last major CLI gap)
|
|
**Claude parity impact:** High (Claude shows reasoning, tool use visibly)
|
|
|
|
---
|
|
|
|
## Sprint 18 -- Auth, HTTPS, and Production Hardening
|
|
|
|
**Theme:** Make this safe to leave running.
|
|
|
|
**Why now:** Everything else is done. This is the sprint you run when you want
|
|
to expose the UI beyond localhost -- to a team, a mobile device, or a public
|
|
address.
|
|
|
|
### Track A: Bugs
|
|
- Server has no request size limit on non-upload endpoints (potential DoS).
|
|
- Session JSON files have no size cap (a runaway agent could write GBs).
|
|
|
|
### Track B: Features
|
|
- **Password authentication:** A login page with a configurable password
|
|
(HERMES_WEBUI_PASSWORD env var). Signed cookie session (24h expiry).
|
|
Single-user model -- no accounts, no registration.
|
|
- **HTTPS / reverse proxy guide:** A one-page `DEPLOY.md` with instructions
|
|
for running behind nginx + Let's Encrypt on a VPS. Configuration snippets
|
|
for systemd service, nginx config, certbot.
|
|
- **Mobile responsive layout:** Collapsible sidebar (hamburger). Touch-friendly
|
|
session list (swipe to delete, tap to navigate). Composer expands on focus.
|
|
Right panel hidden by default on mobile, accessible via a Files tab.
|
|
- **Rate limiting:** Simple per-IP token bucket on the chat/start endpoint
|
|
(configurable, default 10 req/min) to prevent accidental hammering.
|
|
|
|
### Track C: Architecture
|
|
- Helmet headers: X-Content-Type-Options, X-Frame-Options, HSTS (when served
|
|
over HTTPS). Simple middleware in the Handler.
|
|
|
|
**Tests:** ~12 new. Total: ~297.
|
|
**Hermes CLI parity impact:** Low (CLI has no auth/HTTPS concerns)
|
|
**Claude parity impact:** Very High (Claude is authenticated, HTTPS only)
|
|
|
|
---
|
|
|
|
## Feature Parity Summary
|
|
|
|
### After Sprint 17 (Hermes CLI parity: complete)
|
|
|
|
| CLI Feature | Status |
|
|
|-------------|--------|
|
|
| Chat / agent loop | Done (v0.3) |
|
|
| Streaming responses | Done (v0.5) |
|
|
| Tool call visibility | Done (v0.0.7) |
|
|
| File ops (read/write/search/patch) | Done (v0.6) |
|
|
| Terminal commands | Done via workspace |
|
|
| Cron job management | Done (v0.9) |
|
|
| Skills management | Done (v0.9) |
|
|
| Memory read/write | Done (v0.9) |
|
|
| Session history | Done (v0.3) |
|
|
| Workspace switching | Done (v0.7) |
|
|
| Model selection | Done (v0.3) |
|
|
| Toolset control | Sprint 12 |
|
|
| Settings persistence | Sprint 12 |
|
|
| Subagent visibility | Sprint 17 |
|
|
| Background task monitor | Sprint 17 |
|
|
| Code execution (Jupyter) | Sprint 15 |
|
|
| Cron completion alerts | Sprint 13 |
|
|
| Virtual scroll (perf) | Sprint 13 |
|
|
|
|
### After Sprint 18 (Claude parity: ~90% complete)
|
|
|
|
| Claude Feature | Status |
|
|
|----------------|--------|
|
|
| Dark theme, 3-panel layout | Done (v0.1) |
|
|
| Streaming chat | Done (v0.5) |
|
|
| Model switching | Done (v0.3) |
|
|
| File attachments | Done (v0.6) |
|
|
| Syntax highlighting | Done (v0.0.6) |
|
|
| Tool use visibility | Done (v0.0.7) |
|
|
| Edit/regenerate messages | Done (v0.0.6) |
|
|
| Session management | Done (v0.6) |
|
|
| Artifacts (HTML/SVG preview) | Sprint 15 |
|
|
| Code execution inline | Sprint 15 |
|
|
| Mermaid diagrams | Sprint 15 |
|
|
| Projects / folders | Sprint 14 |
|
|
| Pinned/starred sessions | Sprint 14 |
|
|
| Reasoning display | Sprint 17 |
|
|
| Voice input | Sprint 16 |
|
|
| TTS playback | Sprint 16 |
|
|
| Notifications | Sprint 13 |
|
|
| Settings panel | Sprint 12 |
|
|
| Auth / login | Sprint 18 |
|
|
| HTTPS | Sprint 18 |
|
|
| Mobile layout | Sprint 18 |
|
|
| Sharing / public URLs | Not planned (requires server infra) |
|
|
| Claude-specific features | Not replicable (Projects AI, artifacts sync) |
|
|
|
|
### What is intentionally not planned
|
|
|
|
- **Sharing / public conversation URLs:** Requires a hosted backend with access
|
|
control and CDN. Out of scope for a personal VPS deployment.
|
|
- **Claude-specific model features:** Claude-native Projects memory, extended
|
|
artifacts sync, Anthropic's proprietary reasoning UI. These are Anthropic
|
|
infrastructure, not reproducible.
|
|
- **Real-time collaboration:** Multiple users in the same session simultaneously.
|
|
Single-user assumption throughout.
|
|
- **Plugin marketplace:** Hermes skills cover this use case already.
|
|
|
|
---
|
|
|
|
*Last updated: March 31, 2026*
|
|
*Current version: v0.1.0 | 190 tests*
|
|
*Next sprint: Sprint 11 (streaming smoothness + api/routes.py split)*
|