diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index f3e69b7..754acc5 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -18,7 +18,7 @@ a central chat area, and a right panel for workspace file browsing. The design philosophy is deliberately minimal. There is no build step, no bundler, no frontend framework. The Python server is split into a routing shell (server.py) and -business logic modules (api/). The frontend is six vanilla JS modules loaded from static/. +business logic modules (api/). The frontend is seven vanilla JS modules loaded from static/. This makes the code easy to modify from a terminal or by an agent. --- @@ -26,38 +26,40 @@ This makes the code easy to modify from a terminal or by an agent. ## 2. File Inventory / - server.py Thin routing shell + HTTP Handler. ~76 lines. Pure Python. + server.py Thin routing shell + HTTP Handler + auth middleware. ~79 lines. Delegates all route handling to api/routes.py. start.sh Discovery script: finds agent dir, Python, starts server. api/ __init__.py Package marker - routes.py All GET + POST route handlers (~1016 lines) - config.py Shared configuration, constants, global state, model discovery (~640 lines) - helpers.py HTTP helpers: j(), bad(), require(), safe_resolve() (~57 lines) + auth.py Optional password authentication, signed cookies (~149 lines) + routes.py All GET + POST route handlers (~1109 lines) + config.py Shared configuration, constants, global state, model discovery (~654 lines) + helpers.py HTTP helpers: j(), bad(), require(), safe_resolve(), security headers (~71 lines) models.py Session model + CRUD (~132 lines) workspace.py File ops: list_dir, read_file_content, workspace helpers (~77 lines) upload.py Multipart parser, file upload handler (~77 lines) streaming.py SSE engine, run_agent integration, cancel support (~222 lines) static/ index.html HTML template (served from disk) - style.css All CSS - ui.js DOM helpers, renderMd, tool cards, model dropdown (~846 lines) - workspace.js File tree, preview, file ops (~169 lines) + style.css All CSS (~590 lines) + ui.js DOM helpers, renderMd, tool cards, model dropdown, file tree (~957 lines) + workspace.js File preview, file ops, loadDir, clearPreview (~185 lines) sessions.js Session CRUD, list rendering, search, SVG icons, overlay actions (~532 lines) - messages.js send(), SSE event handlers, approval, transcript (~293 lines) - panels.js Cron, skills, memory, workspace, todo, switchPanel (~771 lines) - boot.js Event wiring + boot IIFE (~175 lines) + messages.js send(), SSE event handlers, approval, transcript (~297 lines) + panels.js Cron, skills, memory, workspace, todo, switchPanel, settings (~813 lines) + commands.js Slash command registry, parser, autocomplete dropdown (~156 lines) + boot.js Event wiring, keydown handlers, boot IIFE (~208 lines) tests/ conftest.py Isolated test server (port 8788, separate HERMES_HOME) (~240 lines) - test_sprint1-16.py Feature tests per sprint (14 files, Sprints 1-11 + 16) - test_regressions.py Permanent regression gate + test_sprint{1-19}.py Feature tests per sprint (17 files, 327 test functions) + test_regressions.py Permanent regression gate (23 tests) AGENTS.md Instruction file for agents working in this directory. ROADMAP.md Feature and product roadmap document. SPRINTS.md Forward sprint plan with CLI + Claude parity targets. ARCHITECTURE.md THIS FILE. TESTING.md Manual browser test plan and automated coverage reference. CHANGELOG.md Release notes per sprint. - PORTABILITY.md Portability design spec for download-and-run installs. + BUGS.md Bug backlog and fixed items tracker. requirements.txt Python dependencies. .env.example Sample environment variable overrides. @@ -67,7 +69,8 @@ State directory (runtime data, separate from source): sessions/ One JSON file per session: {session_id}.json workspaces.json Registered workspaces list last_workspace.txt Last-used workspace path - settings.json (future) User settings + settings.json User settings (default model, workspace, send key, password hash) + projects.json Session project groups (name, color, id) Log file: @@ -91,6 +94,7 @@ Environment variables controlling behavior: HERMES_WEBUI_STATE_DIR Where sessions/ folder lives HERMES_CONFIG_PATH Path to ~/.hermes/config.yaml HERMES_WEBUI_DEFAULT_MODEL Default LLM model string + HERMES_WEBUI_PASSWORD Optional: enable password auth (off by default) Test isolation environment variables (set by conftest.py): diff --git a/CHANGELOG.md b/CHANGELOG.md index 5ffb22a..83193a9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,33 @@ --- +## [v0.21] Sprint 19 -- Auth + Security Hardening +*April 3, 2026 | 327 tests* + +### Features +- **Password authentication (Issue #23).** Optional password auth, off by default. + Enable via `HERMES_WEBUI_PASSWORD` env var or Settings panel. Password-only + (single-user app). Signed HMAC HTTP-only cookie with 24h TTL. Minimal dark-themed + login page at `/login`. API calls without auth return 401; page loads redirect. + New `api/auth.py` module with hashing, verification, session management. +- **Security headers.** All responses now include `X-Content-Type-Options: nosniff`, + `X-Frame-Options: DENY`, `Referrer-Policy: same-origin`. +- **POST body size limit.** Non-upload POST bodies capped at 20MB via `read_body()`. +- **Settings panel additions.** "Access Password" field and "Sign Out" button + (only visible when auth is active). + +### Architecture +- New `api/auth.py`: password hashing (SHA-256 + STATE_DIR salt), signed cookies, + auth middleware, public path allowlist. +- Auth check in `server.py` do_GET/do_POST before routing. +- `password_hash` added to `_SETTINGS_DEFAULTS`. + +### Tests +- 9 new tests in `test_sprint19.py`: auth status, login flow, security headers, + cache-control, settings password field. Total: **327 tests (304 passing)**. + +--- + ## [v0.20] Sprint 18 -- File Preview Auto-Close + Thinking Display + Workspace Tree *April 3, 2026 | 318 tests* diff --git a/ROADMAP.md b/ROADMAP.md index b143802..9f47f65 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -3,8 +3,8 @@ > Goal: Full 1:1 parity with the Hermes CLI experience via a clean dark web UI. > Everything you can do from the CLI terminal, you can do from this UI. > -> Last updated: Sprint 17 / v0.19 (April 3, 2026) -> Tests: 294 passing +> Last updated: Sprint 19 / v0.21 (April 3, 2026) +> Tests: 327 total (304 passing, 23 pre-existing failures) > Source: / --- @@ -32,8 +32,10 @@ | Sprint 13 | Alerts + polish | Cron completion alerts (polling + badge), background error banner, session duplicate, browser tab title | 221 | | Sprint 14 | Visual polish + workspace ops | Mermaid diagrams, message timestamps, file rename, folder create, session tags, session archive | 233 | | Sprint 15 | Session projects + code copy | Session projects/folders, code block copy button, tool card expand/collapse toggle | 237 | -| Sprint 16 | Session sidebar visual polish | SVG action icons, overlay hover actions, pin indicator, project border, custom model discovery, GLM-5.1 | 237 | -| Sprint 17 | Workspace polish + slash commands + settings | Breadcrumb navigation, slash command autocomplete, send key setting (#26) | 294 | +| Sprint 16 | Session sidebar visual polish | SVG action icons, overlay hover actions, pin indicator, project border, safe HTML rendering | 289 | +| Sprint 17 | Workspace polish + slash commands + settings | Breadcrumb navigation, slash command autocomplete, send key setting (#26) | 318 | +| Sprint 18 | Thinking display + workspace tree | File preview auto-close, thinking/reasoning cards, expandable directory tree (#22) | 318 | +| Sprint 19 | Auth + security hardening | Password auth (off by default), login page, security headers, 20MB body limit (#23) | 327 | --- @@ -41,10 +43,10 @@ | Layer | Location | Status | |-------|----------|--------| -| Python server | /server.py (~76 lines) + api/ modules (~2145 lines) | Thin shell + business logic in api/ | +| Python server | /server.py (~79 lines) + api/ modules (~2491 lines) | Thin shell + auth middleware + business logic in api/ | | HTML template | /static/index.html | Served from disk | -| CSS | /static/style.css (~560 lines) | Served from disk | -| JavaScript | /static/{ui,workspace,sessions,messages,panels,boot,commands}.js | 7 modules, ~2990 lines total | +| CSS | /static/style.css (~590 lines) | Served from disk | +| JavaScript | /static/{ui,workspace,sessions,messages,panels,boot,commands}.js | 7 modules, ~3148 lines total | | Runtime state | ~/.hermes/webui-mvp/sessions/ | Session JSON files | | Test server | Port 8788, state dir ~/.hermes/webui-mvp-test/ | Isolated, wiped per run | | Production server | Port 8787 | SSH tunnel from Mac | @@ -149,22 +151,42 @@ ### Configuration - [x] Settings panel (default model, default workspace) (Sprint 12) +- [x] Send key preference (Enter or Ctrl+Enter) (Sprint 17) +- [x] Password authentication (Sprint 19) - [ ] Enable/disable toolsets per session (deferred) ### Notifications - [x] Cron job completion alerts (Sprint 13) - [x] Background agent error alerts (Sprint 13) +### Workspace +- [x] Breadcrumb navigation in subdirectories (Sprint 17) +- [x] Workspace tree view with expand/collapse (Sprint 18, Issue #22) +- [x] File preview auto-close on directory navigation (Sprint 18) + +### Slash Commands +- [x] Command registry + autocomplete dropdown (Sprint 17) +- [x] Built-in: /help, /clear, /model, /workspace, /new (Sprint 17) + +### Security +- [x] Password auth with signed cookies (Sprint 19, Issue #23) +- [x] Security headers (X-Content-Type-Options, X-Frame-Options) (Sprint 19) +- [x] POST body size limit (20MB) (Sprint 19) + +### Thinking / Reasoning +- [x] Collapsible thinking cards for extended-thinking models (Sprint 18) + ### Advanced / Future -- [ ] Voice input via Whisper (Wave 6) -- [ ] TTS playback of responses (Wave 6) -- [ ] Subagent delegation cards (Wave 6) +- [ ] Voice input via Whisper (Sprint 20) +- [ ] TTS playback of responses (Sprint 20) +- [ ] Subagent delegation cards (deferred) - [x] Background task cancel (activity bar Cancel button) -- [ ] Code execution cell (Wave 6) -- [ ] Password authentication (Wave 7) -- [ ] HTTPS / reverse proxy (Wave 7) -- [ ] Mobile responsive layout (Wave 7) -- [ ] Virtual scroll for large lists (Wave 7) +- [ ] Code execution cell (deferred) +- [ ] Mobile responsive layout (Sprint 21) +- [ ] Multi-profile support (Sprint 22, Issue #28) +- [ ] Desktop application (Sprint 23) +- [ ] Extended slash command / skill integration (Sprint 24) +- [ ] Virtual scroll for large lists (deferred) --- diff --git a/SPRINTS.md b/SPRINTS.md index 1255775..f99acc5 100644 --- a/SPRINTS.md +++ b/SPRINTS.md @@ -1,6 +1,6 @@ # Hermes Web UI -- Forward Sprint Plan -> Current state: v0.20 | 318 tests | Daily driver ready +> Current state: v0.21 | 327 tests (304 passing) | Daily driver ready > This document plans the path from here to two targets: > > Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the @@ -14,17 +14,19 @@ --- -## Where we are now (v0.18) +## Where we are now (v0.21) -**CLI parity: ~85% complete.** Core agent loop, all tools visible, workspace -file ops, cron/skills/memory CRUD, session management, streaming, cancel, -multi-provider models, custom endpoint discovery -- all solid. Gaps are -subagent visibility, toolset control, and code execution. +**CLI parity: ~90% complete.** Core agent loop, all tools visible, workspace +file ops with tree view, cron/skills/memory CRUD, session management, streaming, +cancel, multi-provider models, custom endpoint discovery, slash commands, +thinking/reasoning display, password auth -- all solid. Gaps are subagent +visibility, toolset control, and code execution. -**Claude parity: ~65% complete.** Chat, streaming, file browser, session +**Claude parity: ~70% complete.** Chat, streaming, file browser, session management, tool cards, syntax highlighting, model switching, projects, -settings, Mermaid diagrams, mobile layout -- all present. Gaps are -artifacts, voice, reasoning display, sharing. +settings, Mermaid diagrams, mobile layout, breadcrumb workspace nav, slash +commands, thinking display, auth -- all present. Gaps are artifacts, voice, +TTS, sharing, mobile-optimized layout. --- @@ -323,122 +325,136 @@ handler for slash command autocomplete. --- -## Sprint 18 -- Voice + Multimodal Input +## Sprint 18 -- Thinking Display + Workspace Tree + Preview Fix (COMPLETED) -**Theme:** Input beyond the keyboard. +**Theme:** Show the model's reasoning, improve workspace navigation, fix UX bug. -**Why now:** Voice is a meaningful quality-of-life feature for longer sessions -and is achievable with Whisper. Image input closes the last modality gap with -Claude (Claude accepts image paste natively -- we do too, but only as -file uploads, not clipboard screenshots into the conversation directly). +**Why now:** Thinking/reasoning display was deferred twice (Sprint 16 → 17 → 18). +Workspace tree view was the #1 community request (Issue #22). File preview +staying open on directory navigation was a daily-driver annoyance. ### Track A: Bugs -- Image paste currently requires a click-to-attach flow. Direct paste into the - message textarea should embed the image inline (as a preview chip) and queue - it for upload on Send. (Partially works -- clean up edge cases.) -- Large image uploads (>5MB) time out the upload step silently. +- **File preview auto-close.** When viewing a file in the right panel and + navigating directories (breadcrumbs, up button, folder clicks), the preview + stayed visible with stale content. Fix: extracted `clearPreview()` as a named + function in boot.js and call it from `loadDir()` in workspace.js. ### Track B: Features -- **Voice input (Whisper):** A microphone icon in the composer. Hold to record, - release to transcribe via `POST /api/transcribe` (calls local Whisper or - OpenAI Whisper API). Transcribed text appears in the message input, editable - before send. Supports the full "voice -> text -> Hermes response" loop. -- **TTS playback:** A speaker icon on assistant messages. Calls a TTS endpoint - (ElevenLabs or OpenAI TTS) and plays the audio. Toggle per-message. Optional - auto-play mode in settings. -- **Vision input improvements:** Paste a screenshot directly from clipboard into - the conversation (not just the tray). Shows as an inline preview chip with - the image thumbnail. On Send, uploads and includes in the message. +- **Thinking/reasoning display.** Assistant messages with structured content + arrays containing `type:'thinking'` or `type:'reasoning'` blocks now render + as collapsible gold-themed cards above the response text. Collapsed by + default, click header to expand. Works with Claude extended thinking and + o3 reasoning tokens when preserved in the message array. +- **Workspace tree view (Issue #22).** Directories expand/collapse in-place + with toggle arrows. Single-click toggles, double-click navigates (breadcrumb + view). Subdirectory contents fetched lazily and cached in `S._dirCache`. + Nesting depth shown via indentation. Empty directories show "(empty)". -### Track C: Architecture -- Audio pipeline: `POST /api/transcribe` streams audio bytes, returns transcript. - `GET /api/tts?text=...` returns audio/mpeg. Both use lazy import of Whisper - and TTS libraries to keep cold start fast. - -**Tests:** ~12 new. Total: ~271. -**Hermes CLI parity impact:** Medium (voice not in CLI, but adds capability) -**Claude parity impact:** High (Claude has native voice mode) +**Tests:** 0 new (pure CSS/DOM changes). Total: 318. +**Hermes CLI parity impact:** Low +**Claude parity impact:** High (reasoning display matches Claude's UI) --- -## Sprint 18 -- Subagent Visibility + Agentic Transparency +## Sprint 19 -- Auth + Security Hardening (COMPLETED) -**Theme:** Watch Hermes think, not just respond. +**Theme:** Make this safe to leave running beyond localhost. -**Why now:** When Hermes delegates to subagents (delegate_task, spawns parallel -workstreams), the UI shows nothing. On long multi-agent tasks you have no idea -what's happening. This is the last major "CLI feels better" gap for power users. +**Why now:** Issue #23 requested authentication. Auth is the last production +hardening feature before the app is safe to expose to a network. ### Track A: Bugs -- Tool cards for delegate_task show no information about what the subagent was - asked to do or what it returned. -- The activity bar text truncates at 55 chars -- tool previews for long terminal - commands show nothing useful. +- **No request size limit.** POST bodies were unbounded (DoS risk). Added 20MB + cap in `read_body()`. ### Track B: Features -- **Subagent delegation cards:** When `delegate_task` fires, show an expandable - card with the subagent's goal, status (pending/running/done), and result - summary. Multiple subagents from one call appear as a card group. Uses the - existing tool card infrastructure. -- **Background task monitor:** A "Tasks" indicator in the topbar (separate from - the cron Tasks panel). Shows count of active agent threads. Click opens a - popover listing all in-flight streams with session names and elapsed times. - Cancel any individual thread. This is the full job queue visibility the CLI - implicitly has via `ps aux`. -- **Thinking/reasoning display:** When the model emits reasoning tokens (o3, - Claude extended thinking), show them in a collapsible "Reasoning" card above - the response. Collapsed by default. This matches Claude's reasoning display. +- **Password authentication (Issue #23).** Off by default — zero friction for + localhost. Enable via `HERMES_WEBUI_PASSWORD` env var or Settings panel. + Password-only (no username — single-user app). Signed HMAC HTTP-only cookie + with 24h TTL. Minimal dark-themed login page at `/login`. API calls without + auth return 401; page loads redirect to `/login`. Settings panel gains + "Access Password" field and "Sign Out" button. +- **Security headers.** All responses now include `X-Content-Type-Options: nosniff`, + `X-Frame-Options: DENY`, `Referrer-Policy: same-origin`. ### Track C: Architecture -- Task registry: extend STREAMS to include session name, start time, and task - description. New `GET /api/tasks/active` endpoint returns all running streams - with metadata. +- New `api/auth.py` module: password hashing (SHA-256 + STATE_DIR salt), signed + session cookies, auth middleware, public path allowlist. +- Auth check in `server.py` do_GET/do_POST before routing. +- `password_hash` added to `_SETTINGS_DEFAULTS` in config.py. +- `_set_password` special field in save_settings for secure password updates. -**Tests:** ~14 new. Total: ~285. -**Hermes CLI parity impact:** Very High (subagent and task visibility is the - last major CLI gap) -**Claude parity impact:** High (Claude shows reasoning, tool use visibly) +**Tests:** 9 new. Total: 327. +**Hermes CLI parity impact:** Low (CLI has no auth concerns) +**Claude parity impact:** High (Claude is authenticated) --- -## Sprint 19 -- Auth, HTTPS, and Production Hardening +## Sprint 20 -- Voice + TTS (PLANNED) -**Theme:** Make this safe to leave running. +**Theme:** Input and output beyond the keyboard. -**Why now:** Everything else is done. This is the sprint you run when you want -to expose the UI beyond localhost -- to a team, a mobile device, or a public -address. - -### Track A: Bugs -- Server has no request size limit on non-upload endpoints (potential DoS). -- Session JSON files have no size cap (a runaway agent could write GBs). +**Why now:** Voice works in the Hermes CLI. Mirror that capability in the web UI. +TTS playback makes long responses more accessible. Both are achievable with +existing Whisper and TTS APIs. ### Track B: Features -- **Password authentication:** A login page with a configurable password - (HERMES_WEBUI_PASSWORD env var). Signed cookie session (24h expiry). - Single-user model -- no accounts, no registration. -- **HTTPS / reverse proxy guide:** A one-page `DEPLOY.md` with instructions - for running behind nginx + Let's Encrypt on a VPS. Configuration snippets - for systemd service, nginx config, certbot. -- **Mobile responsive layout:** Collapsible sidebar (hamburger). Touch-friendly - session list (swipe to delete, tap to navigate). Composer expands on focus. - Right panel hidden by default on mobile, accessible via a Files tab. -- **Rate limiting:** Simple per-IP token bucket on the chat/start endpoint - (configurable, default 10 req/min) to prevent accidental hammering. +- **Voice input (Whisper).** Microphone icon in composer. Hold to record, + release to transcribe. Transcribed text editable before send. +- **TTS playback.** Speaker icon on assistant messages. Audio playback via + OpenAI TTS or ElevenLabs API. Optional auto-play in settings. -### Track C: Architecture -- Helmet headers: X-Content-Type-Options, X-Frame-Options, HSTS (when served - over HTTPS). Simple middleware in the Handler. +--- -**Tests:** ~12 new. Total: ~297. -**Hermes CLI parity impact:** Low (CLI has no auth/HTTPS concerns) -**Claude parity impact:** Very High (Claude is authenticated, HTTPS only) +## Sprint 21 -- Mobile Responsive (PLANNED) + +**Theme:** A genuinely good mobile experience, not just responsive CSS. + +### Track B: Features +- **Collapsible sidebar.** Hamburger menu replaces the always-visible sidebar. +- **Touch-friendly session list.** Tap to navigate, swipe gestures. +- **Right panel as tab.** Files panel hidden by default, accessible via tab. +- **Composer focus behavior.** Expands on focus, keyboard-aware. +- Consider a separate mobile-optimized layout rather than just media queries. + +--- + +## Sprint 22 -- Multi-Profile Support (PLANNED, Issue #28) + +**Theme:** Switch between Hermes agent profiles seamlessly. + +### Track B: Features +- **Profile picker.** Sidebar or topbar dropdown to switch profiles. +- **Per-profile config.** Each profile has its own skills, memory, config.yaml. +- **Seamless switching.** No restart required. + +--- + +## Sprint 23 -- Desktop Application (PLANNED) + +**Theme:** Native desktop experience. + +### Track B: Features +- **Electron or Tauri wrapper.** Native window, menu bar, notifications. +- **Auto-start option.** Launch on login. +- **Packaged distribution.** .dmg (macOS), .exe (Windows). + +--- + +## Sprint 24 -- Extended Command Support (PLANNED) + +**Theme:** Deeper slash command and skill integration. + +### Track B: Features +- **Skill-aware autocomplete.** `/skill-name` triggers installed skills. +- **Command chaining.** Compose multi-step commands. +- **Agent tool exposure.** Surface agent capabilities as slash commands. --- ## Feature Parity Summary -### After Sprint 18 (Hermes CLI parity: complete) +### Hermes CLI Parity (as of Sprint 19) | CLI Feature | Status | |-------------|--------| @@ -454,15 +470,18 @@ address. | Workspace switching | Done (v0.7) | | Model selection | Done (v0.3) | | Multi-provider model support | Done (Sprint 11) | -| Toolset control | Sprint 12 | | Settings persistence | Done (Sprint 12) | -| Subagent visibility | Sprint 18 | -| Background task monitor | Sprint 18 | -| Code execution (Jupyter) | Sprint 17+ | | Cron completion alerts | Done (Sprint 13) | +| Slash commands | Done (Sprint 17) | +| Thinking/reasoning display | Done (Sprint 18) | +| Auth / login | Done (Sprint 19) | +| Voice input | Sprint 20 | +| Subagent visibility | Deferred | +| Code execution (Jupyter) | Deferred | +| Toolset control | Deferred | | Virtual scroll (perf) | Deferred | -### After Sprint 19 (Claude parity: ~90% complete) +### Claude Parity (as of Sprint 19) | Claude Feature | Status | |----------------|--------| @@ -474,19 +493,21 @@ address. | Tool use visibility | Done (v0.11) | | Edit/regenerate messages | Done (v0.10) | | Session management | Done (v0.6) | -| Artifacts (HTML/SVG preview) | Sprint 17+ | -| Code execution inline | Sprint 17+ | | Mermaid diagrams | Done (Sprint 14) | | Projects / folders | Done (Sprint 15) | | Pinned/starred sessions | Done (Sprint 12) | -| Reasoning display | Sprint 18 | -| Voice input | Sprint 17 | -| TTS playback | Sprint 17 | | Notifications | Done (Sprint 13) | | Settings panel | Done (Sprint 12) | -| Auth / login | Sprint 19 | -| HTTPS | Sprint 19 | -| Mobile layout | Done (v0.16.1) | +| Reasoning display | Done (Sprint 18) | +| Auth / login | Done (Sprint 19) | +| Mobile layout (basic) | Done (v0.16.1) | +| Workspace tree view | Done (Sprint 18) | +| Slash commands | Done (Sprint 17) | +| Voice input | Sprint 20 | +| TTS playback | Sprint 20 | +| Artifacts (HTML/SVG preview) | Deferred | +| Code execution inline | Deferred | +| Mobile-optimized layout | Sprint 21 | | Sharing / public URLs | Not planned (requires server infra) | | Claude-specific features | Not replicable (Projects AI, artifacts sync) | @@ -504,5 +525,5 @@ address. --- *Last updated: April 3, 2026* -*Current version: v0.19 | 318 tests* -*Next sprint: Sprint 18 (Voice + Multimodal Input)* +*Current version: v0.21 | 327 tests (304 passing)* +*Next sprint: Sprint 20 (Voice + TTS)* diff --git a/TESTING.md b/TESTING.md index 3c3d508..4f8a92d 100644 --- a/TESTING.md +++ b/TESTING.md @@ -1,12 +1,15 @@ # Hermes Web UI: Browser Testing Plan > This document is for manual browser testing by you or by a Claude browser agent. -> It covers every user-facing feature of the UI through Sprint 2. +> It covers user-facing features of the UI through Sprint 19 (v0.21). > Each section is written as a step-by-step test procedure with expected outcomes. > A browser agent (e.g. Claude with Chrome access) can execute this plan directly. > > Prerequisites: SSH tunnel is active on port 8787. Open http://localhost:8787 in browser. > Server health check: curl http://127.0.0.1:8787/health should return {"status":"ok"}. +> +> Automated tests: 327 total (304 passing, 23 pre-existing failures). +> Run: `pytest tests/ -v --timeout=60` --- @@ -1593,8 +1596,81 @@ FAIL: User message gone, blank chat, response lands in wrong session. --- -*Last updated: Post-Sprint 10 concurrency sweeps, March 31, 2026* -*Total automated tests: 190/190* -*Regression gate: tests/test_regressions.py (23 tests, one per introduced bug)* -*Run: python -m pytest tests/ -v* +--- + +## Sections Added Post-Sprint 10 (Sprints 11-19) + +The following features were added in Sprints 11-19 and need manual browser testing. +Each has automated API-level tests in `tests/test_sprint{N}.py`. + +### Sprint 11: Multi-Provider Models +- Open model dropdown. Verify models grouped by provider (OpenAI, Anthropic, Google, etc.) +- If custom `base_url` configured in config.yaml, verify local models appear in dropdown. +- Switch model. Send a message. Verify response uses selected model. + +### Sprint 12: Settings + Pin + Import +- Click gear icon. Settings overlay opens. +- Change default model, save. Restart server. Verify setting persisted. +- Pin a session (star icon in hover overlay). Verify it floats to top of list. +- Export session as JSON. Import it back. Verify messages restored. + +### Sprint 13: Alerts + Session QoL +- Duplicate a session (copy icon in hover overlay). Verify "(copy)" title. +- Browser tab title updates to active session name. Switch sessions — title changes. + +### Sprint 14: Visual Polish + Workspace Ops +- Create a mermaid code block in a response. Verify diagram renders inline. +- Message timestamps visible next to role labels (hover for full date). +- Double-click a file in workspace panel to rename. Enter saves, Escape cancels. +- Create a folder via folder icon in workspace header. +- Add `#tag` to session title. Verify tag chip appears in sidebar. Click to filter. +- Archive a session. Verify it disappears. Toggle "Show archived" to see it. + +### Sprint 15: Session Projects +- Click "+" in project bar to create a project. Type name, Enter. +- Click a project chip to filter sessions. +- Hover a session → click folder icon → assign to project via picker. +- Verify colored left border appears on assigned session. +- Double-click project chip to rename. Right-click to delete. +- Code blocks have a "Copy" button. Click → "Copied!" feedback. +- Messages with 2+ tool cards show "Expand all / Collapse all" toggle. + +### Sprint 16: Sidebar Visual Polish +- Session titles use full sidebar width (no truncated space for hidden icons). +- Hover a session → action buttons appear from right with gradient fade. +- All icons are monochrome SVGs (not emoji). Consistent across platforms. +- Pinned sessions show small gold star inline. Unpinned = no star, full title width. +- Active session has gold highlight (not blue). Overlay gradient matches. +- Double-click to rename → overlay hides during rename. + +### Sprint 17: Workspace + Slash Commands + Send Key +- Navigate into a subdirectory. Breadcrumb bar appears with clickable segments. +- Up button in panel header navigates to parent. Hidden at root. +- Type `/` in composer → autocomplete dropdown appears. Arrow keys navigate. +- Type `/help` → lists all commands. `/clear` clears conversation. `/model` switches. +- Settings panel: change send key to Ctrl+Enter. Verify Enter inserts newline. + +### Sprint 18: Thinking + Tree View + Preview Fix +- View a file in workspace. Click a breadcrumb or folder → preview closes automatically. +- Click a directory toggle arrow (▸) → expands in-place showing children. +- Click again (▾) → collapses. Double-click navigates into it (breadcrumb view). +- If model returns thinking blocks (Claude extended thinking), verify collapsible gold card appears above response. + +### Sprint 19: Auth + Security +- No password set: everything works as normal. No login page. +- Set `HERMES_WEBUI_PASSWORD=test` env var. Restart. All pages redirect to `/login`. +- Login page: minimal card, password field, "Sign in" button. +- Enter correct password → redirected to `/`. Cookie set (24h). +- Enter wrong password → error message, stay on login page. +- Settings panel: set password via "Access Password" field. Auth activates. +- "Sign Out" button visible when auth active. Click → redirected to /login. +- API calls without auth cookie → 401 JSON response. +- Check response headers: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`. + +--- + +*Last updated: Sprint 19 / v0.21, April 3, 2026* +*Total automated tests: 327 (304 passing, 23 pre-existing failures in Sprint 3/5/7)* +*Regression gate: tests/test_regressions.py (23 tests)* +*Run: pytest tests/ -v --timeout=60* *Source: /*