Hermes WebUI v0.1.0 — initial public release

2026-03-30 20:40:19 -07:00
commit a4e2174c29
41 changed files with 11380 additions and 0 deletions
--- a/SPRINTS.md
+++ b/SPRINTS.md
@@ -0,0 +1,407 @@
+# Hermes WebUI -- Forward Sprint Plan
+
+> Current state: v0.1.0 | 190 tests | Daily driver ready
+> This document plans the path from here to two targets:
+>
+> Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the
+>           terminal, you can do from the browser)
+>
+> Target B: 1:1 parity with Claude's reproducible features (the full Claude
+>           browser UI experience, minus things only Anthropic can build)
+>
+> Sprints are ordered by impact. Each builds on the one before.
+> Past sprint history lives in CHANGELOG.md.
+
+---
+
+## Where we are now (v0.1.0)
+
+**CLI parity: ~80% complete.** Core agent loop, all tools visible, workspace
+file ops, cron/skills/memory CRUD, session management, streaming, cancel --
+all solid. Gaps are configuration, subagent visibility, and runtime controls.
+
+**Claude parity: ~55% complete.** Chat, streaming, file browser,
+session management, tool cards, syntax highlighting, model switching -- all
+present. Gaps are project organization, artifacts, voice, sharing, mobile.
+
+---
+
+## Sprint 11 -- Streaming Smoothness + Tool Card Incremental Render
+
+**Theme:** Make heavy agentic work feel fast and fluid.
+
+**Why now:** The biggest remaining daily friction point. During a 20-step task,
+every tool event triggers a full renderMessages() re-render of the entire
+message list. On fast tasks you can see flicker. This is the last thing that
+makes the UI feel noticeably worse than watching the CLI.
+
+### Track A: Bugs
+- Tool card DOM thrash: renderMessages() rebuilds all cards on each tool event.
+  Switch to incremental append (append new card to existing group, no full rebuild).
+- Scroll position lost on re-render during streaming (messages jump).
+
+### Track B: Features
+- **Incremental tool card streaming:** Instead of renderMessages() on each
+  tool event, maintain a live card group element per turn and append/update
+  cards in place. The assistant text row below the cards also updates
+  incrementally (already does via assistantBody.innerHTML).
+- **Tool card collapse-all / expand-all:** A small toggle in the topbar or
+  per-message to collapse all tool cards in a response. Useful when a response
+  has 10+ tool calls.
+- **Smooth scroll:** Pin scroll to bottom during streaming unless user has
+  manually scrolled up (read-back mode). Resume pinning when user scrolls
+  back to bottom.
+
+### Track C: Architecture
+- `api/routes.py`: extract the 49 if/elif route handlers from server.py's
+  Handler class into a dedicated routes module. server.py becomes a true
+  ~50-line shell: imports, Handler stub that delegates to routes, main().
+  Completes the server split started in Sprint 10.
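Review note: one way the if/elif chain could collapse into a routes module is a lookup table keyed by method and path. This is a minimal sketch, not the project's actual code; `ROUTES`, `route`, `dispatch`, the handler signature, and the `/api/health` path are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical handler signature: request body in, (status, payload) out.
RouteHandler = Callable[[dict], Tuple[int, dict]]

# (METHOD, path) -> handler. Replaces one if/elif branch per entry.
ROUTES: Dict[Tuple[str, str], RouteHandler] = {}


def route(method: str, path: str):
    """Register a handler for an exact (method, path) pair."""
    def decorator(fn: RouteHandler) -> RouteHandler:
        ROUTES[(method.upper(), path)] = fn
        return fn
    return decorator


@route("GET", "/api/health")  # illustrative path, not taken from the plan
def health(body: dict) -> Tuple[int, dict]:
    return 200, {"ok": True}


def dispatch(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, dict]:
    """What a thin Handler stub could delegate to."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, {"error": "not found"}
    return handler(body or {})
```

With a table like this, the Handler in server.py only needs to parse the request and call `dispatch`, which is what keeps it near the ~50-line target.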
+
+**Tests:** ~6 new. Total: ~196.
+**Hermes CLI parity impact:** Low (smoothness, not features)
+**Claude parity impact:** Low
+
+---
+
+## Sprint 12 -- Settings Panel + Toolset Control
+
+**Theme:** Configuration you can actually reach from the UI.
+
+**Why now:** Last remaining thing that forces a trip to the CLI or config files
+for basic setup. The model dropdown works but defaults aren't persisted
+server-side. Toolsets can't be toggled per session.
+
+### Track A: Bugs
+- Model dropdown doesn't sync when a session was created with a model not in
+  the current dropdown list (edge case from model additions).
+- Workspace validation on add doesn't resolve symlinks, so a valid symlink to
+  a directory is rejected as invalid.
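Review note: the symlink bug could be addressed by resolving the path before checking it. A minimal sketch, assuming validation is a pure path check; `is_valid_workspace` is a hypothetical name, not the existing function.

```python
from pathlib import Path


def is_valid_workspace(path_str: str) -> bool:
    """Accept real directories and symlinks that resolve to a directory."""
    try:
        # strict=True raises if the path (or a symlink target) does not exist.
        resolved = Path(path_str).expanduser().resolve(strict=True)
    except (OSError, RuntimeError):
        # Missing path, dangling symlink, or symlink loop.
        return False
    return resolved.is_dir()
```

A dangling symlink still fails, which is the desired outcome: only links that end at an existing directory pass.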
+
+### Track B: Features
+- **Settings panel:** A gear icon in the topbar opens a slide-in settings panel.
+  Sections: Default Model (writes HERMES_WEBUI_DEFAULT_MODEL to a settings file),
+  Default Workspace (writes HERMES_WEBUI_DEFAULT_WORKSPACE), UI preferences
+  (font size, message density). Persisted server-side in `~/.hermes/webui-mvp/settings.json`.
+- **Toolset control per session:** A "Tools" chip in the session topbar opens
+  a popover listing all available toolsets (terminal, web, file, memory, etc.)
+  with toggles. Selected toolsets stored on the session and passed to AIAgent.
+  Matches the `--tools` flag behavior in the CLI.
+- **Rename file / Create folder:** Two small file tree ops that close the last
+  workspace management gap. Inline rename on double-click (same pattern as
+  session rename). Create folder via + menu next to the existing + file button.
+
+### Track C: Architecture
+- Settings schema: `settings.json` with typed fields, validated on load, with
+  sane defaults. Served via `GET /api/settings`, written via `POST /api/settings`.
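Review note: "typed fields, validated on load, with sane defaults" could look roughly like the sketch below. The field names, default values, and allowed ranges are assumptions for illustration; the plan only names default model, default workspace, font size, and message density.

```python
import json
from pathlib import Path

# Assumed field names and defaults; the default's type doubles as the schema.
DEFAULTS = {
    "default_model": "",
    "default_workspace": "",
    "font_size": 14,
    "message_density": "comfortable",
}
ALLOWED_DENSITY = {"compact", "comfortable"}  # assumed options


def validate_settings(raw) -> dict:
    """Return a complete settings dict, falling back to defaults per field."""
    clean = dict(DEFAULTS)
    if not isinstance(raw, dict):
        return clean
    for key, default in DEFAULTS.items():
        value = raw.get(key, default)
        # Exact type match, so True is not accepted as an int font size.
        if type(value) is type(default):
            clean[key] = value
    if clean["message_density"] not in ALLOWED_DENSITY:
        clean["message_density"] = DEFAULTS["message_density"]
    if not 10 <= clean["font_size"] <= 24:  # assumed sane range
        clean["font_size"] = DEFAULTS["font_size"]
    return clean


def load_settings(path: Path) -> dict:
    """Read settings.json; a missing or corrupt file yields the defaults."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        raw = {}
    return validate_settings(raw)
```

Unknown keys are dropped and bad values are replaced rather than rejected, so a hand-edited file can never stop the server from starting.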
+
+**Tests:** ~15 new. Total: ~211.
+**Hermes CLI parity impact:** High (toolset control is the last major CLI feature)
+**Claude parity impact:** Medium (settings exist in Claude as a panel)
+
+---
+
+## Sprint 13 -- Notification System + Background Visibility
+
+**Theme:** Know what Hermes is doing even when you're not watching.
+
+**Why now:** Cron jobs run silently. Background errors surface nowhere. You have
+no way to know a long-running task finished (or failed) while you were on another
+tab. This is a meaningful daily driver gap for anyone using cron heavily.
+
+### Track A: Bugs
+- Cron "Run now" button shows no feedback if the job errors immediately.
+- Sessions with very long message histories (100+ messages) cause noticeable
+  render lag on load (no virtual scroll yet).
+
+### Track B: Features
+- **Cron completion alerts:** When a cron job finishes (success or error), push
+  a toast notification to the UI. Use a polling endpoint (`GET /api/crons/status`)
+  that the UI checks every 30s while the window is focused. Badge count on the
+  Tasks tab icon when there are unread completions.
+- **Background agent error alerts:** When a streaming session errors out (network
+  drop, model error, tool failure), and the user is not currently viewing that
+  session, show a persistent banner: "Session X encountered an error." Clicking
+  it navigates to that session.
+- **Virtual scroll for session list:** Session list currently renders all sessions
+  in the DOM. Above ~100 sessions, the sidebar gets slow. Implement simple virtual
+  scroll: render only ~20 visible rows, reuse DOM nodes on scroll.
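Review note: the core of the virtual scroll item is the window arithmetic. The real implementation would live in the frontend JavaScript; the sketch below states the same calculation in Python for consistency with the server code. `visible_range` and its `overscan` parameter are hypothetical.

```python
def visible_range(scroll_top: int, row_height: int, viewport_height: int,
                  total_rows: int, overscan: int = 2) -> tuple[int, int]:
    """Return the half-open [first, last) range of rows that need DOM nodes."""
    if row_height <= 0:
        raise ValueError("row_height must be positive")
    first_visible = scroll_top // row_height
    rows_in_view = -(-viewport_height // row_height)  # ceiling division
    first = max(0, first_visible - overscan)
    last = min(total_rows, first_visible + rows_in_view + overscan)
    return first, last
```

For an 800px sidebar with 40px rows, this keeps about 20 rows plus a small overscan buffer mounted, regardless of how many sessions exist.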
+
+### Track C: Architecture
+- SSE reconnect: if the SSE connection drops mid-stream, auto-reconnect once
+  (with the same stream_id). Currently a network blip ends the response silently.
+
+**Tests:** ~14 new. Total: ~225.
+**Hermes CLI parity impact:** High (cron visibility, error surfacing)
+**Claude parity impact:** Medium (Claude has notification panel)
+
+---
+
+## Sprint 14 -- Project Organization + Session Management
+
+**Theme:** Organize work the way you think, not just chronologically.
+
+**Why now:** After 100+ sessions the sidebar is a flat chronological list.
+Finding sessions from 2 weeks ago, or keeping a "MyProject" workspace separate
+from personal work, requires the search box. This is the biggest remaining
+daily organizational gap vs. Claude's project folders.
+
+### Track A: Bugs
+- Session search content scan (depth=5) is slow on large session histories.
+  Add server-side caching of search index.
+- Date group headers ("Today / Yesterday / Earlier") use updated_at which can
+  be misleading for sessions touched by automated title-setting. Use created_at
+  for initial grouping, updated_at for sort order.
+
+### Track B: Features
+- **Session folders / projects:** A "Projects" section above the session list.
+  Each project is a named group. Sessions can be dragged into projects or
+  assigned via right-click. Stored in `projects.json`. Projects collapse/expand.
+  This is the single biggest Claude parity feature missing.
+- **Pin sessions:** Star icon on any session to pin it to the top of the list
+  above date groups. Persisted on the session JSON as `pinned: true`.
+- **Session tags:** Inline `#tag` syntax in session titles gets extracted and
+  shown as colored chips. Clicking a tag filters the list. No backend change
+  needed -- parsed client-side from title text.
+- **Archive sessions:** A "More" overflow menu on each session (right-click or
+  long-press) with: Archive (hides from main list, accessible via filter),
+  Duplicate (new session with same workspace/model), Export JSON.
+- **Import session from JSON:** Drag a `.json` export file into the sidebar to
+  restore it as a new session. Mirrors the existing JSON export.
+
+### Track C: Architecture
+- Session index v2: extend `_index.json` to include `tags`, `pinned`, and
+  `project_id` fields. Rebuild on session save. Enables fast client-side
+  filtering without disk reads.
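Review note: a possible shape for a v2 index entry, including `#tag` extraction from the title. The `session_id` and `title` keys and the tag grammar (letters, digits, `_`, `-`) are assumptions; only `tags`, `pinned`, and `project_id` come from the plan.

```python
import re

# A tag is '#' at the start of a word, so "Issue#42" is not treated as a tag.
TAG_RE = re.compile(r"(?<!\S)#([A-Za-z0-9_-]+)")


def extract_tags(title: str) -> list[str]:
    """Lower-cased tags in order of first appearance, without duplicates."""
    seen: list[str] = []
    for tag in TAG_RE.findall(title):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def index_entry(session: dict) -> dict:
    """Build the lightweight record stored in the index for one session."""
    title = session.get("title", "")
    return {
        "session_id": session["session_id"],  # assumed key name
        "title": title,
        "tags": extract_tags(title),
        "pinned": bool(session.get("pinned", False)),
        "project_id": session.get("project_id"),
    }
```

Rebuilding the entry on every session save keeps the index authoritative, so the sidebar can filter by tag, pin, or project without opening session files.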
+
+**Tests:** ~16 new. Total: ~241.
+**Hermes CLI parity impact:** Low (CLI has no session organization)
+**Claude parity impact:** Very High (projects are a core Claude concept)
+
+---
+
+## Sprint 15 -- Artifacts + Code Execution
+
+**Theme:** See outputs, not just text.
+
+**Why now:** Claude's most distinctive feature is the artifact panel --
+code runs inline, HTML renders in a sandboxed iframe, SVGs show as images.
+This is the largest single capability gap between what we have and what Claude
+feels like. It also directly enables the Hermes "code execution cell" feature
+(Jupyter-style in-browser execution).
+
+### Track A: Bugs
+- Prism.js autoloader makes one CDN request per language encountered. On a
+  code-heavy session this causes noticeable latency. Bundle the top 10 languages
+  (Python, JS, bash, JSON, SQL, YAML, TypeScript, CSS, HTML, Rust) locally.
+- Code blocks in long responses sometimes re-highlight on every renderMessages()
+  call. Debounce highlightCode() with requestAnimationFrame.
+
+### Track B: Features
+- **Artifact panel:** When Hermes produces a code block tagged as `html`, `svg`,
+  or `react`, a "Preview" button appears on that code block. Clicking it opens
+  a sandboxed `<iframe>` in the right panel showing the rendered output. The
+  preview updates live if Hermes edits the artifact in a follow-up.
+- **Code execution cell:** A "Run" button on Python code blocks. Sends the code
+  to a new server endpoint (`POST /api/execute`) which runs it in a subprocess
+  with a 30-second timeout and streams stdout/stderr back as SSE. Output appears
+  below the code block inline. This is the Jupyter cell experience without
+  needing a kernel.
+- **Mermaid diagram rendering:** Mermaid.js CDN (deferred). Code blocks tagged
+  as `mermaid` render as flow/sequence/gantt diagrams inline.
+
+### Track C: Architecture
+- Sandbox safety: `/api/execute` runs in a restricted subprocess (no network,
+  limited filesystem via a temp directory). Returns exit code, stdout, stderr,
+  and execution time.
+- Artifact state: artifacts are tracked in `S.artifacts = {}` (code block hash
+  -> rendered content). Persisted in session JSON as `artifacts` array.
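Review note: the subprocess side of `/api/execute` could start from something like the sketch below. It covers only the temp-directory working dir, the timeout, and the result fields named above. It does not block network access or restrict the filesystem, both of which need OS-level sandboxing, and it returns output in one piece rather than streaming it as SSE. `run_python` is a hypothetical name.

```python
import subprocess
import sys
import tempfile
import time


def run_python(code: str, timeout_s: float = 30.0) -> dict:
    """Run code in a child interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        start = time.monotonic()
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores env vars and the user site dir).
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            exit_code, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            exit_code, stdout = None, ""
            stderr = f"timed out after {timeout_s} seconds"
        elapsed = time.monotonic() - start
    return {
        "exit_code": exit_code,  # None means the timeout fired
        "stdout": stdout,
        "stderr": stderr,
        "elapsed_s": round(elapsed, 3),
    }
```

Usage: `run_python("print(6 * 7)")` returns a dict whose `stdout` holds the printed line and whose `exit_code` is 0.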
+
+**Tests:** ~18 new. Total: ~259.
+**Hermes CLI parity impact:** High (code execution closes the Jupyter gap)
+**Claude parity impact:** Very High (artifacts are Claude's signature feature)
+
+---
+
+## Sprint 16 -- Voice + Multimodal Input
+
+**Theme:** Input beyond the keyboard.
+
+**Why now:** Voice is a meaningful quality-of-life feature for longer sessions
+and is achievable with Whisper. Image input closes the last modality gap with
+Claude: Claude accepts pasted images natively, whereas we accept images only as
+file uploads, not as clipboard screenshots pasted directly into the conversation.
+
+### Track A: Bugs
+- Image paste currently requires a click-to-attach flow. Direct paste into the
+  message textarea should embed the image inline (as a preview chip) and queue
+  it for upload on Send. (Partially works -- clean up edge cases.)
+- Large image uploads (>5MB) time out the upload step silently.
+
+### Track B: Features
+- **Voice input (Whisper):** A microphone icon in the composer. Hold to record,
+  release to transcribe via `POST /api/transcribe` (calls local Whisper or
+  OpenAI Whisper API). Transcribed text appears in the message input, editable
+  before send. Supports the full "voice -> text -> Hermes response" loop.
+- **TTS playback:** A speaker icon on assistant messages. Calls a TTS endpoint
+  (ElevenLabs or OpenAI TTS) and plays the audio. Toggle per-message. Optional
+  auto-play mode in settings.
+- **Vision input improvements:** Paste a screenshot directly from clipboard into
+  the conversation (not just the tray). Shows as an inline preview chip with
+  the image thumbnail. On Send, uploads and includes in the message.
+
+### Track C: Architecture
+- Audio pipeline: `POST /api/transcribe` streams audio bytes, returns transcript.
+  `GET /api/tts?text=...` returns audio/mpeg. Both use lazy import of Whisper
+  and TTS libraries to keep cold start fast.
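Review note: the lazy-import idea can be as small as a cached wrapper around `importlib.import_module`, so a heavy dependency loads on the first request that needs it instead of at server start. `lazy_module` is a hypothetical helper; the module names for Whisper and TTS would depend on which libraries are chosen.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_module(name: str):
    """Import a module on first use and reuse the same object afterwards."""
    return importlib.import_module(name)
```

A transcribe handler would call `lazy_module(...)` inside the request path, so servers that never use voice never pay the import cost.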
+
+**Tests:** ~12 new. Total: ~271.
+**Hermes CLI parity impact:** Medium (voice not in CLI, but adds capability)
+**Claude parity impact:** High (Claude has native voice mode)
+
+---
+
+## Sprint 17 -- Subagent Visibility + Agentic Transparency
+
+**Theme:** Watch Hermes think, not just respond.
+
+**Why now:** When Hermes delegates to subagents (via delegate_task) or spawns
+parallel workstreams, the UI shows nothing. On long multi-agent tasks you have no
+idea what's happening. This is the last major "CLI feels better" gap for power users.
+
+### Track A: Bugs
+- Tool cards for delegate_task show no information about what the subagent was
+  asked to do or what it returned.
+- The activity bar text truncates at 55 chars -- tool previews for long terminal
+  commands show nothing useful.
+
+### Track B: Features
+- **Subagent delegation cards:** When `delegate_task` fires, show an expandable
+  card with the subagent's goal, status (pending/running/done), and result
+  summary. Multiple subagents from one call appear as a card group. Uses the
+  existing tool card infrastructure.
+- **Background task monitor:** A "Tasks" indicator in the topbar (separate from
+  the cron Tasks panel). Shows count of active agent threads. Click opens a
+  popover listing all in-flight streams with session names and elapsed times.
+  Cancel any individual thread. This is the full job queue visibility the CLI
+  implicitly has via `ps aux`.
+- **Thinking/reasoning display:** When the model emits reasoning tokens (o3,
+  Claude extended thinking), show them in a collapsible "Reasoning" card above
+  the response. Collapsed by default. This matches Claude's reasoning display.
+
+### Track C: Architecture
+- Task registry: extend STREAMS to include session name, start time, and task
+  description. New `GET /api/tasks/active` endpoint returns all running streams
+  with metadata.
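Review note: a sketch of what the extended registry behind `GET /api/tasks/active` might hold. `STREAMS` is named in the plan, but its current structure is not shown, so the dict layout, the lock, and the function names here are assumptions.

```python
import threading
import time
from typing import Optional

# Assumed layout: stream_id -> metadata for one in-flight agent stream.
STREAMS: dict[str, dict] = {}
STREAMS_LOCK = threading.Lock()


def register_stream(stream_id: str, session_name: str, description: str,
                    now: Optional[float] = None) -> None:
    """Record a stream when it starts."""
    started = time.time() if now is None else now
    with STREAMS_LOCK:
        STREAMS[stream_id] = {
            "session_name": session_name,
            "description": description,
            "started_at": started,
        }


def unregister_stream(stream_id: str) -> None:
    """Drop a stream when it finishes, errors, or is cancelled."""
    with STREAMS_LOCK:
        STREAMS.pop(stream_id, None)


def active_tasks(now: Optional[float] = None) -> list[dict]:
    """Payload for the endpoint: running streams, oldest first."""
    current = time.time() if now is None else now
    with STREAMS_LOCK:
        snapshot = [dict(meta, stream_id=sid) for sid, meta in STREAMS.items()]
    snapshot.sort(key=lambda task: task["started_at"])
    for task in snapshot:
        task["elapsed_s"] = round(current - task["started_at"], 1)
    return snapshot
```

The lock matters because streams are registered from agent threads while the HTTP handler reads the registry.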
+
+**Tests:** ~14 new. Total: ~285.
+**Hermes CLI parity impact:** Very High (subagent and task visibility is the
+  last major CLI gap)
+**Claude parity impact:** High (Claude shows reasoning, tool use visibly)
+
+---
+
+## Sprint 18 -- Auth, HTTPS, and Production Hardening
+
+**Theme:** Make this safe to leave running.
+
+**Why now:** Everything else is done. This is the sprint you run when you want
+to expose the UI beyond localhost -- to a team, a mobile device, or a public
+address.
+
+### Track A: Bugs
+- Server has no request size limit on non-upload endpoints (potential DoS).
+- Session JSON files have no size cap (a runaway agent could write GBs).
+
+### Track B: Features
+- **Password authentication:** A login page with a configurable password
+  (HERMES_WEBUI_PASSWORD env var). Signed cookie session (24h expiry).
+  Single-user model -- no accounts, no registration.
+- **HTTPS / reverse proxy guide:** A one-page `DEPLOY.md` with instructions
+  for running behind nginx + Let's Encrypt on a VPS. Configuration snippets
+  for systemd service, nginx config, certbot.
+- **Mobile responsive layout:** Collapsible sidebar (hamburger). Touch-friendly
+  session list (swipe to delete, tap to navigate). Composer expands on focus.
+  Right panel hidden by default on mobile, accessible via a Files tab.
+- **Rate limiting:** Simple per-IP token bucket on the chat/start endpoint
+  (configurable, default 10 req/min) to prevent accidental hammering.
+
+### Track C: Architecture
+- Security headers (in the style of Node's Helmet middleware): X-Content-Type-Options,
+  X-Frame-Options, HSTS (when served over HTTPS). Implemented as simple middleware
+  in the Handler.
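Review note: the header set could be produced by one small function that the Handler applies to every response. The header values are assumptions; in particular, `SAMEORIGIN` is chosen here over `DENY` on the guess that the Sprint 15 artifact preview may frame same-origin content.

```python
def security_headers(is_https: bool) -> dict[str, str]:
    """Headers to add to every response; HSTS only when served over HTTPS."""
    headers = {
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "SAMEORIGIN",  # assumed value, see note above
    }
    if is_https:
        # One year, in seconds. Browsers ignore HSTS sent over plain HTTP.
        headers["Strict-Transport-Security"] = "max-age=31536000"
    return headers
```

Behind a reverse proxy, `is_https` would typically be derived from a forwarded-protocol header set by the proxy rather than from the local socket.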
+
+**Tests:** ~12 new. Total: ~297.
+**Hermes CLI parity impact:** Low (CLI has no auth/HTTPS concerns)
+**Claude parity impact:** Very High (Claude is authenticated, HTTPS only)
+
+---
+
+## Feature Parity Summary
+
+### After Sprint 17 (Hermes CLI parity: complete)
+
+| CLI Feature | Status |
+|-------------|--------|
+| Chat / agent loop | Done (v0.3) |
+| Streaming responses | Done (v0.5) |
+| Tool call visibility | Done (v0.0.7) |
+| File ops (read/write/search/patch) | Done (v0.6) |
+| Terminal commands | Done via workspace |
+| Cron job management | Done (v0.9) |
+| Skills management | Done (v0.9) |
+| Memory read/write | Done (v0.9) |
+| Session history | Done (v0.3) |
+| Workspace switching | Done (v0.7) |
+| Model selection | Done (v0.3) |
+| Toolset control | Sprint 12 |
+| Settings persistence | Sprint 12 |
+| Subagent visibility | Sprint 17 |
+| Background task monitor | Sprint 17 |
+| Code execution (Jupyter) | Sprint 15 |
+| Cron completion alerts | Sprint 13 |
+| Virtual scroll (perf) | Sprint 13 |
+
+### After Sprint 18 (Claude parity: ~90% complete)
+
+| Claude Feature | Status |
+|----------------|--------|
+| Dark theme, 3-panel layout | Done (v0.1) |
+| Streaming chat | Done (v0.5) |
+| Model switching | Done (v0.3) |
+| File attachments | Done (v0.6) |
+| Syntax highlighting | Done (v0.0.6) |
+| Tool use visibility | Done (v0.0.7) |
+| Edit/regenerate messages | Done (v0.0.6) |
+| Session management | Done (v0.6) |
+| Artifacts (HTML/SVG preview) | Sprint 15 |
+| Code execution inline | Sprint 15 |
+| Mermaid diagrams | Sprint 15 |
+| Projects / folders | Sprint 14 |
+| Pinned/starred sessions | Sprint 14 |
+| Reasoning display | Sprint 17 |
+| Voice input | Sprint 16 |
+| TTS playback | Sprint 16 |
+| Notifications | Sprint 13 |
+| Settings panel | Sprint 12 |
+| Auth / login | Sprint 18 |
+| HTTPS | Sprint 18 |
+| Mobile layout | Sprint 18 |
+| Sharing / public URLs | Not planned (requires server infra) |
+| Claude-specific features | Not replicable (Projects memory, artifacts sync) |
+
+### What is intentionally not planned
+
+- **Sharing / public conversation URLs:** Requires a hosted backend with access
+  control and CDN. Out of scope for a personal VPS deployment.
+- **Claude-specific model features:** Claude-native Projects memory, extended
+  artifacts sync, Anthropic's proprietary reasoning UI. These are Anthropic
+  infrastructure, not reproducible.
+- **Real-time collaboration:** Multiple users in the same session simultaneously.
+  Single-user assumption throughout.
+- **Plugin marketplace:** Hermes skills cover this use case already.
+
+---
+
+*Last updated: March 31, 2026*
+*Current version: v0.1.0 | 190 tests*
+*Next sprint: Sprint 11 (streaming smoothness + api/routes.py split)*