Hermes WebUI v0.1.0 — initial public release
407
SPRINTS.md
Normal file
@@ -0,0 +1,407 @@
# Hermes WebUI -- Forward Sprint Plan

> Current state: v0.1.0 | 190 tests | Daily driver ready
>
> This document plans the path from here to two targets:
>
> Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the
> terminal, you can do from the browser)
>
> Target B: 1:1 parity with Claude's reproducible features (the full Claude
> browser UI experience, minus things only Anthropic can build)
>
> Sprints are ordered by impact. Each builds on the one before.
> Past sprint history lives in CHANGELOG.md.

---

## Where we are now (v0.1.0)

**CLI parity: ~80% complete.** Core agent loop, all tools visible, workspace
file ops, cron/skills/memory CRUD, session management, streaming, cancel --
all solid. Gaps are configuration, subagent visibility, and runtime controls.

**Claude parity: ~55% complete.** Chat, streaming, file browser,
session management, tool cards, syntax highlighting, model switching -- all
present. Gaps are project organization, artifacts, voice, sharing, mobile.

---

## Sprint 11 -- Streaming Smoothness + Tool Card Incremental Render

**Theme:** Make heavy agentic work feel fast and fluid.

**Why now:** The biggest remaining daily friction point. During a 20-step task,
every tool event triggers a full `renderMessages()` re-render of the entire
message list. On fast tasks you can see flicker. This is the last thing that
makes the UI feel noticeably worse than watching the CLI.

### Track A: Bugs
- Tool card DOM thrash: `renderMessages()` rebuilds all cards on each tool event.
  Switch to incremental append (append new card to existing group, no full rebuild).
- Scroll position lost on re-render during streaming (messages jump).

### Track B: Features
- **Incremental tool card streaming:** Instead of calling `renderMessages()` on each
  tool event, maintain a live card group element per turn and append/update
  cards in place. The assistant text row below the cards already updates
  incrementally (via `assistantBody.innerHTML`).
- **Tool card collapse-all / expand-all:** A small toggle in the topbar or
  per-message to collapse all tool cards in a response. Useful when a response
  has 10+ tool calls.
- **Smooth scroll:** Pin scroll to bottom during streaming unless the user has
  manually scrolled up (read-back mode). Resume pinning when the user scrolls
  back to bottom.

### Track C: Architecture
- `api/routes.py`: extract the 49 if/elif route handlers from `server.py`'s
  `Handler` class into a dedicated routes module. `server.py` becomes a true
  ~50-line shell: imports, `Handler` stub that delegates to routes, `main()`.
  Completes the server split started in Sprint 10.
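One possible shape for the routes module, as a minimal sketch: a table keyed by `(method, path)` replaces the if/elif chain, and the `Handler` stub only calls `dispatch()`. The handler signature, the `route` decorator, and the `/api/health` path are illustrative assumptions, not the project's actual API.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical handler signature: takes a parsed request body and returns
# (status_code, payload). The real handlers in server.py may differ.
HandlerFn = Callable[[dict], Tuple[int, dict]]

ROUTES: Dict[Tuple[str, str], HandlerFn] = {}


def route(method: str, path: str) -> Callable[[HandlerFn], HandlerFn]:
    """Register a handler for an exact (method, path) pair."""
    def register(fn: HandlerFn) -> HandlerFn:
        ROUTES[(method.upper(), path)] = fn
        return fn
    return register


@route("GET", "/api/health")  # illustrative path, not taken from the plan
def health(body: dict) -> Tuple[int, dict]:
    return 200, {"ok": True}


def dispatch(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, dict]:
    """Look up the handler; unknown routes return 404 instead of falling through."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, {"error": f"no route for {method} {path}"}
    return handler(body or {})
```

With this layout, adding a route is one decorated function, and `server.py` never needs to change.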

**Tests:** ~12 new. Total: ~196.
**Hermes CLI parity impact:** Low (smoothness, not features)
**Claude parity impact:** Low

---

## Sprint 12 -- Settings Panel + Toolset Control

**Theme:** Configuration you can actually reach from the UI.

**Why now:** Last remaining thing that forces a trip to the CLI or config files
for basic setup. The model dropdown works but defaults aren't persisted
server-side. Toolsets can't be toggled per session.

### Track A: Bugs
- Model dropdown doesn't sync when a session was created with a model not in
  the current dropdown list (edge case from model additions).
- Workspace validation on add doesn't follow symlinks (a valid symlink to a
  directory is reported as invalid).
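For the symlink bug, a minimal sketch of a validation helper that resolves the path before checking it. The function name `is_valid_workspace` is hypothetical; the existing validator may be structured differently.

```python
from pathlib import Path


def is_valid_workspace(path: str) -> bool:
    """Accept real directories and symlinks that resolve to directories."""
    try:
        # strict=True raises if the target is missing or the link is broken.
        resolved = Path(path).expanduser().resolve(strict=True)
    except (OSError, RuntimeError):
        # Missing target, broken link, or a symlink loop.
        return False
    return resolved.is_dir()
```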

### Track B: Features
- **Settings panel:** A gear icon in the topbar opens a slide-in settings panel.
  Sections: Default Model (writes `HERMES_WEBUI_DEFAULT_MODEL` to a settings file),
  Default Workspace (writes `HERMES_WEBUI_DEFAULT_WORKSPACE`), UI preferences
  (font size, message density). Persisted server-side in `~/.hermes/webui-mvp/settings.json`.
- **Toolset control per session:** A "Tools" chip in the session topbar opens
  a popover listing all available toolsets (terminal, web, file, memory, etc.)
  with toggles. Selected toolsets are stored on the session and passed to `AIAgent`.
  Matches the `--tools` flag behavior in the CLI.
- **Rename file / Create folder:** Two small file tree ops that close the last
  workspace management gap. Inline rename on double-click (same pattern as
  session rename). Create folder via + menu next to the existing + file button.

### Track C: Architecture
- Settings schema: `settings.json` with typed fields, validated on load, with
  sane defaults. Served via `GET /api/settings`, written via `POST /api/settings`.
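A rough sketch of "typed fields, validated on load, with sane defaults" using a dataclass. The field names and default values here are assumptions for illustration; the real schema is not specified in this plan.

```python
from dataclasses import dataclass, asdict, fields
from typing import Any, Dict


@dataclass
class Settings:
    # Field names and defaults are illustrative assumptions.
    default_model: str = ""
    default_workspace: str = ""
    font_size: int = 14
    message_density: str = "comfortable"


def load_settings(raw: Dict[str, Any]) -> Settings:
    """Build Settings from untrusted JSON, keeping defaults for bad or unknown fields."""
    settings = Settings()
    for f in fields(Settings):
        value = raw.get(f.name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, f.type) and not isinstance(value, bool):
            setattr(settings, f.name, value)
    return settings
```

`asdict(load_settings(raw))` would then be the payload for `GET /api/settings`. Unknown keys in the file are ignored rather than rejected, which keeps older settings files loadable.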

**Tests:** ~15 new. Total: ~211.
**Hermes CLI parity impact:** High (toolset control is the last major CLI feature)
**Claude parity impact:** Medium (settings exist in Claude as a panel)

---

## Sprint 13 -- Notification System + Background Visibility

**Theme:** Know what Hermes is doing even when you're not watching.

**Why now:** Cron jobs run silently. Background errors surface nowhere. You have
no way to know a long-running task finished (or failed) while you were on another
tab. This is a meaningful daily driver gap for anyone using cron heavily.

### Track A: Bugs
- Cron "Run now" button shows no feedback if the job errors immediately.
- Sessions with very long message histories (100+ messages) cause noticeable
  render lag on load (no virtual scroll yet).

### Track B: Features
- **Cron completion alerts:** When a cron job finishes (success or error), show
  a toast notification in the UI. Use a polling endpoint (`GET /api/crons/status`)
  that the UI checks every 30s while the window is focused. Show a badge count on
  the Tasks tab icon when there are unread completions.
- **Background agent error alerts:** When a streaming session errors out (network
  drop, model error, tool failure) and the user is not currently viewing that
  session, show a persistent banner: "Session X encountered an error." Clicking
  it navigates to that session.
- **Virtual scroll for session list:** The session list currently renders all sessions
  in the DOM. Above ~100 sessions, the sidebar gets slow. Implement simple virtual
  scroll: render only ~20 visible rows, reuse DOM nodes on scroll.
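The polling endpoint for cron completion alerts could be backed by a small in-memory log with a cursor, so each poll returns only completions the UI has not seen. This is a sketch under assumptions: the class name, event fields, and cursor scheme are all hypothetical.

```python
import threading
from typing import Dict, List


class CronStatusLog:
    """In-memory log of finished cron runs, read by a polling endpoint."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: List[Dict] = []

    def record(self, job_id: str, ok: bool, message: str = "") -> int:
        """Append a completion event and return its sequence number."""
        with self._lock:
            seq = len(self._events) + 1
            self._events.append(
                {"seq": seq, "job_id": job_id, "ok": ok, "message": message}
            )
            return seq

    def since(self, cursor: int) -> Dict:
        """Return events newer than `cursor` plus the cursor to send next time."""
        with self._lock:
            unread = [e for e in self._events if e["seq"] > cursor]
            return {"events": unread, "cursor": len(self._events)}
```

The UI would keep the returned `cursor` and send it with the next poll. The length of `events` is the badge count.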

### Track C: Architecture
- SSE reconnect: if the SSE connection drops mid-stream, auto-reconnect once
  (with the same `stream_id`). Currently a network blip ends the response silently.

**Tests:** ~14 new. Total: ~225.
**Hermes CLI parity impact:** High (cron visibility, error surfacing)
**Claude parity impact:** Medium (Claude has a notification panel)

---

## Sprint 14 -- Project Organization + Session Management

**Theme:** Organize work the way you think, not just chronologically.

**Why now:** After 100+ sessions the sidebar is a flat chronological list.
Finding sessions from 2 weeks ago, or keeping a "MyProject" workspace separate
from personal work, requires the search box. This is the biggest remaining
daily organizational gap vs. Claude's project folders.

### Track A: Bugs
- Session search content scan (depth=5) is slow on large session histories.
  Add server-side caching of the search index.
- Date group headers ("Today / Yesterday / Earlier") use `updated_at`, which can
  be misleading for sessions touched by automated title-setting. Use `created_at`
  for initial grouping, `updated_at` for sort order.
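The grouping fix can be sketched as two independent steps: bucket by `created_at`, sort by `updated_at`. The function names are hypothetical, and the sketch is in Python for consistency with the server code even though the sidebar may do this client-side.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List


def date_group(created_at: datetime, today: date) -> str:
    """Bucket a session by the calendar day it was created."""
    created_day = created_at.date()
    if created_day == today:
        return "Today"
    if created_day == today - timedelta(days=1):
        return "Yesterday"
    return "Earlier"


def sort_sessions(sessions: List[Dict]) -> List[Dict]:
    """Most recent activity first, independent of the group a session lands in."""
    return sorted(sessions, key=lambda s: s["updated_at"], reverse=True)
```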

### Track B: Features
- **Session folders / projects:** A "Projects" section above the session list.
  Each project is a named group. Sessions can be dragged into projects or
  assigned via right-click. Stored in `projects.json`. Projects collapse/expand.
  This is the single biggest Claude parity feature missing.
- **Pin sessions:** Star icon on any session to pin it to the top of the list
  above date groups. Persisted on the session JSON as `pinned: true`.
- **Session tags:** Inline `#tag` syntax in session titles gets extracted and
  shown as colored chips. Clicking a tag filters the list. No backend change
  needed -- parsed client-side from title text.
- **Archive sessions:** A "More" overflow menu on each session (right-click or
  long-press) with: Archive (hides from main list, accessible via filter),
  Duplicate (new session with same workspace/model), Export JSON.
- **Import session from JSON:** Drag a `.json` export file into the sidebar to
  restore it as a new session. Mirrors the existing JSON export.

### Track C: Architecture
- Session index v2: extend `_index.json` to include `tags`, `pinned`, and
  `project_id` fields. Rebuild on session save. Enables fast client-side
  filtering without disk reads.
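A sketch of what one v2 index entry might look like when rebuilt on save. The tag regex, the `id` and `title` keys, and the helper names are assumptions. Deriving `tags` from the title server-side is one option; the features list above describes parsing them client-side.

```python
import re
from typing import Any, Dict, List

# A tag is '#' at the start of the title or after whitespace, followed by
# letters, digits, '_' or '-'. "issue#12" is therefore not a tag.
TAG_RE = re.compile(r"(?<!\S)#([A-Za-z0-9_-]+)")


def extract_tags(title: str) -> List[str]:
    """Return unique lowercase tags in first-seen order."""
    seen: List[str] = []
    for tag in TAG_RE.findall(title):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def index_entry(session: Dict[str, Any]) -> Dict[str, Any]:
    """Build the index record for one session (hypothetical shape)."""
    title = session.get("title", "")
    return {
        "id": session["id"],
        "title": title,
        "tags": extract_tags(title),
        "pinned": bool(session.get("pinned", False)),
        "project_id": session.get("project_id"),
    }
```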

**Tests:** ~16 new. Total: ~241.
**Hermes CLI parity impact:** Low (CLI has no session organization)
**Claude parity impact:** Very High (projects are a core Claude concept)

---

## Sprint 15 -- Artifacts + Code Execution

**Theme:** See outputs, not just text.

**Why now:** Claude's most distinctive feature is the artifact panel --
code runs inline, HTML renders in a sandboxed iframe, SVGs show as images.
This is the largest single capability gap between what we have and what Claude
feels like. It also directly enables the Hermes "code execution cell" feature
(Jupyter-style in-browser execution).

### Track A: Bugs
- Prism.js autoloader makes one CDN request per language encountered. On a
  code-heavy session this causes noticeable latency. Bundle the top 10 languages
  (Python, JS, bash, JSON, SQL, YAML, TypeScript, CSS, HTML, Rust) locally.
- Code blocks in long responses sometimes re-highlight on every `renderMessages()`
  call. Debounce `highlightCode()` with `requestAnimationFrame`.

### Track B: Features
- **Artifact panel:** When Hermes produces a code block tagged as `html`, `svg`,
  or `react`, a "Preview" button appears on that code block. Clicking it opens
  a sandboxed `<iframe>` in the right panel showing the rendered output. The
  preview updates live if Hermes edits the artifact in a follow-up.
- **Code execution cell:** A "Run" button on Python code blocks. Sends the code
  to a new server endpoint (`POST /api/execute`) which runs it in a subprocess
  with a 30-second timeout and streams stdout/stderr back as SSE. Output appears
  below the code block inline. This is the Jupyter cell experience without
  needing a kernel.
- **Mermaid diagram rendering:** Load Mermaid.js from a CDN (deferred). Code blocks
  tagged as `mermaid` render as flow/sequence/gantt diagrams inline.

### Track C: Architecture
- Sandbox safety: `/api/execute` runs in a restricted subprocess (no network,
  limited filesystem via a temp directory). Returns exit code, stdout, stderr,
  and execution time.
- Artifact state: artifacts are tracked in `S.artifacts = {}` (code block hash
  -> rendered content). Persisted in session JSON as `artifacts` array.
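A minimal sketch of the core of `/api/execute`: run the code in a child interpreter with a timeout and a throwaway working directory, then return exit code, stdout, stderr, and execution time. The function name `run_python` is hypothetical. This sketch collects output at the end rather than streaming it as SSE, and it does not block network access, which would need OS-level isolation on top.

```python
import subprocess
import sys
import tempfile
import time
from typing import Dict


def run_python(code: str, timeout_s: float = 30.0) -> Dict[str, object]:
    """Run `code` in a child interpreter inside a temporary working directory."""
    started = time.monotonic()
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            exit_code, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            exit_code, stdout, stderr = None, "", f"timed out after {timeout_s}s"
    return {
        "exit_code": exit_code,
        "stdout": stdout,
        "stderr": stderr,
        "elapsed_s": round(time.monotonic() - started, 3),
    }
```

The temp directory only sets the working directory. The child can still read any path the server user can, so "limited filesystem" needs more than this.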

**Tests:** ~18 new. Total: ~259.
**Hermes CLI parity impact:** High (code execution closes the Jupyter gap)
**Claude parity impact:** Very High (artifacts are Claude's signature feature)

---

## Sprint 16 -- Voice + Multimodal Input

**Theme:** Input beyond the keyboard.

**Why now:** Voice is a meaningful quality-of-life feature for longer sessions
and is achievable with Whisper. Image input closes the last modality gap with
Claude (Claude accepts image paste natively -- we do too, but only as
file uploads, not clipboard screenshots into the conversation directly).

### Track A: Bugs
- Image paste currently requires a click-to-attach flow. Direct paste into the
  message textarea should embed the image inline (as a preview chip) and queue
  it for upload on Send. (Partially works -- clean up edge cases.)
- Large image uploads (>5MB) time out the upload step silently.

### Track B: Features
- **Voice input (Whisper):** A microphone icon in the composer. Hold to record,
  release to transcribe via `POST /api/transcribe` (calls local Whisper or
  OpenAI Whisper API). Transcribed text appears in the message input, editable
  before send. Supports the full "voice -> text -> Hermes response" loop.
- **TTS playback:** A speaker icon on assistant messages. Calls a TTS endpoint
  (ElevenLabs or OpenAI TTS) and plays the audio. Toggle per-message. Optional
  auto-play mode in settings.
- **Vision input improvements:** Paste a screenshot directly from clipboard into
  the conversation (not just the tray). Shows as an inline preview chip with
  the image thumbnail. On Send, uploads and includes in the message.

### Track C: Architecture
- Audio pipeline: `POST /api/transcribe` accepts streamed audio bytes and returns
  the transcript. `GET /api/tts?text=...` returns audio/mpeg. Both use lazy import
  of Whisper and TTS libraries to keep cold start fast.
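The lazy-import idea can be sketched with `importlib` and a cache, so the heavy library is loaded on the first request instead of at server start. The helper name `lazy_module` is hypothetical, and using the open-source `whisper` package as the local backend is an assumption.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_module(name: str) -> ModuleType:
    """Import `name` on first use and reuse the cached module afterwards."""
    return importlib.import_module(name)


def transcribe(audio_path: str) -> str:
    # Assumption: the open-source `whisper` package is installed locally.
    # It is only imported when the first transcription request arrives.
    whisper = lazy_module("whisper")
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]
```

A real endpoint would probably cache the loaded model as well, since loading it per request is slow.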

**Tests:** ~12 new. Total: ~271.
**Hermes CLI parity impact:** Medium (voice not in CLI, but adds capability)
**Claude parity impact:** High (Claude has native voice mode)

---

## Sprint 17 -- Subagent Visibility + Agentic Transparency

**Theme:** Watch Hermes think, not just respond.

**Why now:** When Hermes delegates to subagents (`delegate_task`, spawns parallel
workstreams), the UI shows nothing. On long multi-agent tasks you have no idea
what's happening. This is the last major "CLI feels better" gap for power users.

### Track A: Bugs
- Tool cards for `delegate_task` show no information about what the subagent was
  asked to do or what it returned.
- The activity bar text truncates at 55 chars -- tool previews for long terminal
  commands show nothing useful.

### Track B: Features
- **Subagent delegation cards:** When `delegate_task` fires, show an expandable
  card with the subagent's goal, status (pending/running/done), and result
  summary. Multiple subagents from one call appear as a card group. Uses the
  existing tool card infrastructure.
- **Background task monitor:** A "Tasks" indicator in the topbar (separate from
  the cron Tasks panel). Shows count of active agent threads. Click opens a
  popover listing all in-flight streams with session names and elapsed times.
  Cancel any individual thread. This is the full job queue visibility the CLI
  implicitly has via `ps aux`.
- **Thinking/reasoning display:** When the model emits reasoning tokens (o3,
  Claude extended thinking), show them in a collapsible "Reasoning" card above
  the response. Collapsed by default. This matches Claude's reasoning display.

### Track C: Architecture
- Task registry: extend `STREAMS` to include session name, start time, and task
  description. New `GET /api/tasks/active` endpoint returns all running streams
  with metadata.
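A sketch of the extended registry, assuming `STREAMS` is a dict keyed by stream id. The `StreamInfo` type, its field names, and the response shape are illustrative, not the existing data structure.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StreamInfo:
    session_name: str
    description: str
    started_at: float = field(default_factory=time.time)


# Assumption: keyed by stream id; the existing STREAMS may hold other data too.
STREAMS: Dict[str, StreamInfo] = {}


def active_tasks(now: Optional[float] = None) -> List[dict]:
    """Payload for GET /api/tasks/active: running streams, longest-running first."""
    current = time.time() if now is None else now
    tasks = [
        {
            "stream_id": stream_id,
            "session_name": info.session_name,
            "description": info.description,
            "elapsed_s": round(current - info.started_at, 1),
        }
        for stream_id, info in STREAMS.items()
    ]
    return sorted(tasks, key=lambda t: t["elapsed_s"], reverse=True)
```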

**Tests:** ~14 new. Total: ~285.
**Hermes CLI parity impact:** Very High (subagent and task visibility is the
last major CLI gap)
**Claude parity impact:** High (Claude shows reasoning, tool use visibly)

---

## Sprint 18 -- Auth, HTTPS, and Production Hardening

**Theme:** Make this safe to leave running.

**Why now:** Everything else is done. This is the sprint you run when you want
to expose the UI beyond localhost -- to a team, a mobile device, or a public
address.

### Track A: Bugs
- Server has no request size limit on non-upload endpoints (potential DoS).
- Session JSON files have no size cap (a runaway agent could write GBs).

### Track B: Features
- **Password authentication:** A login page with a configurable password
  (`HERMES_WEBUI_PASSWORD` env var). Signed cookie session (24h expiry).
  Single-user model -- no accounts, no registration.
- **HTTPS / reverse proxy guide:** A one-page `DEPLOY.md` with instructions
  for running behind nginx + Let's Encrypt on a VPS. Configuration snippets
  for systemd service, nginx config, certbot.
- **Mobile responsive layout:** Collapsible sidebar (hamburger). Touch-friendly
  session list (swipe to delete, tap to navigate). Composer expands on focus.
  Right panel hidden by default on mobile, accessible via a Files tab.
- **Rate limiting:** Simple per-IP token bucket on the chat/start endpoint
  (configurable, default 10 req/min) to prevent accidental hammering.
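The per-IP token bucket from the rate limiting item can be sketched as follows. The class name and the injectable clock are assumptions made so the logic is easy to test. Each IP starts with a full bucket of 10 tokens and gains one token every 6 seconds at the default rate.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-IP token bucket: bursts up to `rate_per_min`, refilled continuously."""

    def __init__(
        self,
        rate_per_min: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = rate_per_min
        self.refill_per_s = rate_per_min / 60.0
        self.clock = clock
        # ip -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str) -> bool:
        """Consume one token for `ip` if available; return whether to proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[ip] = (tokens, now)
        return allowed
```

A rejected request would typically get a 429 response. This sketch never evicts idle IPs, which is acceptable for a single-user deployment but not for a public one.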

### Track C: Architecture
- Security headers (Helmet-style): `X-Content-Type-Options`, `X-Frame-Options`,
  HSTS (when served over HTTPS). Simple middleware in the `Handler`.
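The header set could be produced by one small function that the `Handler` calls before ending its headers. The specific values below (`SAMEORIGIN`, a one-year HSTS max-age) are assumptions, not decisions recorded in this plan.

```python
from typing import Dict


def security_headers(is_https: bool) -> Dict[str, str]:
    """Headers to add to every response; HSTS only makes sense over HTTPS."""
    headers = {
        "X-Content-Type-Options": "nosniff",
        # SAMEORIGIN rather than DENY so same-origin iframes keep working
        # (assumption: the artifact preview may load content from this server).
        "X-Frame-Options": "SAMEORIGIN",
    }
    if is_https:
        headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    return headers
```

With the standard library's `BaseHTTPRequestHandler`, each pair can be passed to `self.send_header(name, value)` before `self.end_headers()`.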

**Tests:** ~12 new. Total: ~297.
**Hermes CLI parity impact:** Low (CLI has no auth/HTTPS concerns)
**Claude parity impact:** Very High (Claude is authenticated, HTTPS only)

---

## Feature Parity Summary

### After Sprint 17 (Hermes CLI parity: complete)

| CLI Feature | Status |
|-------------|--------|
| Chat / agent loop | Done (v0.3) |
| Streaming responses | Done (v0.5) |
| Tool call visibility | Done (v0.0.7) |
| File ops (read/write/search/patch) | Done (v0.6) |
| Terminal commands | Done via workspace |
| Cron job management | Done (v0.9) |
| Skills management | Done (v0.9) |
| Memory read/write | Done (v0.9) |
| Session history | Done (v0.3) |
| Workspace switching | Done (v0.7) |
| Model selection | Done (v0.3) |
| Toolset control | Sprint 12 |
| Settings persistence | Sprint 12 |
| Subagent visibility | Sprint 17 |
| Background task monitor | Sprint 17 |
| Code execution (Jupyter) | Sprint 15 |
| Cron completion alerts | Sprint 13 |
| Virtual scroll (perf) | Sprint 13 |

### After Sprint 18 (Claude parity: ~90% complete)

| Claude Feature | Status |
|----------------|--------|
| Dark theme, 3-panel layout | Done (v0.1) |
| Streaming chat | Done (v0.5) |
| Model switching | Done (v0.3) |
| File attachments | Done (v0.6) |
| Syntax highlighting | Done (v0.0.6) |
| Tool use visibility | Done (v0.0.7) |
| Edit/regenerate messages | Done (v0.0.6) |
| Session management | Done (v0.6) |
| Artifacts (HTML/SVG preview) | Sprint 15 |
| Code execution inline | Sprint 15 |
| Mermaid diagrams | Sprint 15 |
| Projects / folders | Sprint 14 |
| Pinned/starred sessions | Sprint 14 |
| Reasoning display | Sprint 17 |
| Voice input | Sprint 16 |
| TTS playback | Sprint 16 |
| Notifications | Sprint 13 |
| Settings panel | Sprint 12 |
| Auth / login | Sprint 18 |
| HTTPS | Sprint 18 |
| Mobile layout | Sprint 18 |
| Sharing / public URLs | Not planned (requires server infra) |
| Claude-specific features | Not replicable (Projects AI, artifacts sync) |

### What is intentionally not planned

- **Sharing / public conversation URLs:** Requires a hosted backend with access
  control and CDN. Out of scope for a personal VPS deployment.
- **Claude-specific model features:** Claude-native Projects memory, extended
  artifacts sync, Anthropic's proprietary reasoning UI. These are Anthropic
  infrastructure, not reproducible.
- **Real-time collaboration:** Multiple users in the same session simultaneously.
  Single-user assumption throughout.
- **Plugin marketplace:** Hermes skills cover this use case already.

---

*Last updated: March 31, 2026*
*Current version: v0.1.0 | 190 tests*
*Next sprint: Sprint 11 (streaming smoothness + api/routes.py split)*