diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 3e60456..5c31119 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -33,14 +33,14 @@ jobs: username: ${{ github.actor }} password: ${{ secrets.GITHUB_TOKEN }} - # Extract semver tags: e.g. v0.28 -> 0.28, latest + # Extract tags from the git ref (supports vX.Y and vX.Y.Z formats) - name: Extract metadata id: meta uses: docker/metadata-action@v5 with: images: ghcr.io/${{ github.repository }} tags: | - type=match,pattern=v(.*),group=1 + type=match,pattern=v(\d+\.\d+(?:\.\d+)?),group=1 type=raw,value=latest # Build and push multi-arch image (amd64 + arm64) diff --git a/CHANGELOG.md b/CHANGELOG.md index ebc7542..9ac08e1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,8 +5,8 @@ --- -## [v0.29] CLI Session Bridge (Community: @thadreber-web) -*April 3, 2026 | 426 tests* +## [v0.30] CLI Session Bridge (Community: @thadreber-web) +*April 4, 2026 | 426 tests* ### Features - **CLI session bridge.** The WebUI now reads sessions from the hermes-agent's @@ -32,6 +32,104 @@ --- +## [v0.29] Sprint 23: Agentic Transparency + Polish +*April 4, 2026 | 424 tests* + +### Features + +- **Token/cost display.** Agent usage (input tokens, output tokens, estimated + cost) is now read after each conversation and persisted on the session. + A muted badge appears below the last assistant message when enabled. + Off by default — toggle via the Settings panel checkbox or `/usage` slash + command. Persists server-side across refreshes. + +- **Subagent delegation cards.** `subagent_progress` events now render with + a 🔀 icon and a blue indented left border to visually distinguish child + tool activity from parent tool calls. `delegate_task` cards display as + "Delegate task" with cleaner formatting. + +- **Skill picker in cron create form.** The "New Job" form now has a search + input + tag chip picker for attaching skills to cron jobs. Skills fetched + from `/api/skills`, filtered on keyup, added/removed as tag chips. + `submitCronCreate()` sends `skills` array in the POST body. Backend already + supported the field — this was a pure frontend gap. + +- **Skill linked files viewer.** Skill preview panel now renders a "Linked + Files" section below SKILL.md content when a skill has `references/`, + `templates/`, `scripts/`, or `assets/` subdirectories. Clicking a file + loads it in the preview panel with syntax highlighting. + New `file` query param on `GET /api/skills/content` serves linked files + with path traversal protection. + +- **Workspace tree state persists across refreshes.** Expanded directory + paths are saved to `localStorage` keyed by workspace path + (`hermes-webui-expanded:{path}`). On every root load (page refresh, + session switch), the saved state is restored and previously-expanded + directories are pre-fetched so the tree renders fully on first paint. + +- **Timestamps fixed.** `api/streaming.py` now stamps `timestamp` on every + message that lacks one at conversation completion. The `done` SSE event + also stamps `_ts` on the last assistant message immediately. Timestamps + were already rendered in the UI (Sprint 14, hover-to-reveal) but most + messages had no timestamp field, so nothing ever showed. + +- **`/usage` slash command.** Instant toggle for token usage display. + Shows a toast, persists to server, updates the Settings checkbox if open, + re-renders immediately. + +### Bug Fixes + +- **XSS via inline onclick + esc().** Skill names and file paths embedded in + `onclick` HTML attributes used `esc()` for encoding. `esc()` converts `'` + to `'` (HTML-safe) but browsers decode it back before executing JS, + allowing skill names with apostrophes to break out of string literals. + Fixed by switching to `data-*` attributes + `addEventListener`. + +- **rglob wildcard injection.** The `name` query param for + `/api/skills/content?file=` was passed directly to `SKILLS_DIR.rglob()`, + which accepts glob patterns. `name=*` would match an arbitrary directory + and use it as the trust base for path traversal checking. + Fixed by rejecting names containing `* ? [ ]` metacharacters with 400. + +- **`_fmtTokens(null)` returned "null".** `String(null)` = `"null"` would + appear in the usage badge for sessions missing fields. Fixed with a + `!n || n < 0` guard returning `'0'`. + +- **Usage badge on wrong row.** Badge used `:last-child` which could target + a user message row. Fixed by adding `data-role` to message rows and + scanning backwards for the last `assistant` row. + +- **Tool name resolution.** Tool call entries in session JSON sometimes + stored the literal string `"tool"` as the name when the call ID couldn't + be resolved. Fixed: defaults to empty string and skips unresolvable entries. + +- **Inline import inside loop.** `import json as _j2` inside the done-handler + loop in `streaming.py` moved to module-level. + +### Session Model + +- Added `input_tokens`, `output_tokens`, `estimated_cost` fields to Session + (defaults: 0, 0, None). Included in `compact()`, session JSON, and all + API responses. Backward-compatible via `**kwargs`. + +- Added `args` capture to `tool_calls` session JSON entries (truncated + snapshot of tool inputs, up to 6 keys / 120 chars each). + +### Settings + +- New `show_token_usage` boolean setting (default: `false`). Stored in + `settings.json`, loaded on boot alongside `send_key`. + +### Tests + +- Renamed `test_sprint24.py` → `test_sprint23.py`. +- Strengthened session usage assertions (explicit field presence checks). +- Added: path traversal rejection test, wildcard name rejection test, + cron create with skills array test. +- Total: 424 tests (up from 415). + +--- + ## [v0.28.1] CI Pipeline + Multi-Arch Docker Builds *April 3, 2026 | 426 tests* @@ -944,4 +1042,4 @@ Three-panel layout: sessions sidebar, chat area, workspace panel. --- -*Last updated: v0.29, April 3, 2026 | Tests: 426* +*Last updated: v0.30, April 4, 2026 | Tests: 426* diff --git a/ROADMAP.md b/ROADMAP.md index a603d53..b237e84 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -3,8 +3,8 @@ > Goal: Full 1:1 parity with the Hermes CLI experience via a clean dark web UI. > Everything you can do from the CLI terminal, you can do from this UI. > -> Last updated: v0.28.1 (April 3, 2026) -> Tests: 426 total (403 passing, 23 pre-existing failures) +> Last updated: v0.29 (April 4, 2026) +> Tests: 424 total (401 passing, 23 pre-existing failures) > Source: / --- @@ -39,6 +39,7 @@ | Sprint 20 | Voice input + send button | Voice input (Web Speech API), send button icon-circle with pop-in animation | 415 | | Sprint 21 | Mobile responsive + Docker | Hamburger sidebar, bottom nav, files slide-over, Docker support (#21, #7) | 415 | | Sprint 22 | Multi-profile support | Profile picker, management panel, seamless switching, per-session tracking (#28) | 415 | +| Sprint 23 | Agentic transparency | Token/cost display, subagent cards, skill picker in cron, skill linked files, workspace tree persistence, timestamp fixes | 424 | --- @@ -76,7 +77,7 @@ - [x] Copy message to clipboard (hover icon on each bubble) - [x] Edit last user message and regenerate - [ ] Branch/fork conversation (Wave 3) -- [ ] Token/cost estimate per message (Wave 3) +- [x] Token/cost estimate per message (Sprint 23) ### Tool Visibility - [x] Tool progress in activity bar (moved out of composer footer) @@ -137,7 +138,7 @@ - [x] Edit existing cron job - [x] Delete cron job - [x] View full cron run history (expandable per job) -- [ ] Skill picker in cron create form (Wave 3) +- [x] Skill picker in cron create form (Sprint 23) ### Skills - [x] List all skills grouped by category (Skills sidebar tab) @@ -146,7 +147,7 @@ - [x] Create skill - [x] Edit skill - [x] Delete skill -- [ ] View skill linked files (Wave 3) +- [x] View skill linked files (Sprint 23) ### Memory - [x] View personal notes (MEMORY.md) rendered as markdown (Memory tab) diff --git a/SPRINTS.md b/SPRINTS.md index 9ced12b..00b24a1 100644 --- a/SPRINTS.md +++ b/SPRINTS.md @@ -1,6 +1,6 @@ # Hermes Web UI -- Forward Sprint Plan -> Current state: v0.29 | 426 tests | Daily driver ready +> Current state: v0.30 | 426 tests | Daily driver ready > This document plans the path from here to two targets: > > Target A: 1:1 feature parity with the Hermes CLI (everything you can do from the @@ -567,25 +567,260 @@ and switchToProfile() didn't refresh workspaces or sessions. --- -## Sprint 24 -- Desktop Application (PLANNED) +## Sprint 24 -- Web Polish + Bug Fix Pass (PLANNED) -**Theme:** Native desktop experience. +**Theme:** Stabilize, harden, and close the last meaningful web UI gaps before +shifting focus to distribution. Goal is a release that's genuinely ready for +wider user adoption -- no rough edges, no obvious missing pieces. + +**Why now:** Sprint 23 completed the core agentic transparency features. The +remaining web roadmap items are diminishing-returns polish. Rather than +grinding through marginal features, this sprint cleans up what's there, fixes +bugs users will actually hit, and closes a few real gaps before recommending +the app to others. + +### Track A: Bug Fixes +- **Cron edit form has no skill picker.** Sprint 23 added skill picker to the + create form but not the edit form. cronEditSave() doesn't include skills in + the update body, so existing skills survive an edit but can't be changed. + Fix: add the same skill picker UI to the inline edit form and include + `skills` in the update POST body. +- **S.lastUsage dead code.** messages.js sets `S.lastUsage` from `d.usage` at + done-time, but nothing reads it. The usage badge reads cumulative session + totals from `S.session.input_tokens` instead. Either wire `S.lastUsage` into + a per-turn display or remove the dead assignment. +- **_cronSkillsCache never invalidated.** Skills picker shows stale data if + skills are added/removed mid-session. Add a cache-bust when the skills panel + is opened or a skill is saved/deleted. +- **Tool args not shown on session reload.** Tool call cards in history show + name and result snippet but not the args (args only exist in the live SSE + event). Sprint 23 added args to the session JSON -- verify they're actually + rendering in the settled history cards. ### Track B: Features -- **Electron or Tauri wrapper.** Native window, menu bar, notifications. -- **Auto-start option.** Launch on login. -- **Packaged distribution.** .dmg (macOS), .exe (Windows). +- **Cron edit: skill picker parity.** As above -- make create and edit forms + identical in capability. +- **Per-turn cost display.** The current usage badge shows cumulative session + totals attached to the last message, which is misleading. Either: (a) show + per-turn cost from `S.lastUsage` immediately after each response instead of + cumulative, or (b) show cumulative in the session topbar/header instead of + attached to a message bubble. Pick the cleaner UX. +- **Virtual scroll for long session/skill lists.** When session count or skill + count gets large (100+), the sidebar becomes sluggish. Add a simple virtual + scroll or windowed render -- only render visible items + a buffer above/below. + CSS `contain: strict` + IntersectionObserver approach, no library needed. + +### Track C: Code Quality +- **SPRINTS.md + ROADMAP.md + CHANGELOG.md updated** to reflect Sprint 23 + completion (agentic transparency) and correct test counts. +- **Remove stale Sprint 23 description** from SPRINTS.md (the "Profile/Workspace + coherence" text is from an older plan; Sprint 23 actually shipped agentic + transparency features). +- **CHANGELOG entry for v0.29** covering Sprint 23 deliverables. + +**Estimated tests:** ~10 new. Target total: ~435. +**Hermes CLI parity impact:** Low +**Claude parity impact:** Low +**User-facing value:** Medium -- removes rough edges that would bother new users --- -## Sprint 24 -- Extended Command Support (PLANNED) +## Sprint 25 -- macOS Desktop Application (PLANNED) -**Theme:** Deeper slash command and skill integration. +**Theme:** Native Mac desktop app. Single download, runs entirely offline, +feels like a real application -- not a browser tab. -### Track B: Features -- **Skill-aware autocomplete.** `/skill-name` triggers installed skills. -- **Command chaining.** Compose multi-step commands. -- **Agent tool exposure.** Surface agent capabilities as slash commands. +**Why this matters:** The web UI requires an SSH tunnel or a server setup to +use. A .app bundle that a user can double-click and immediately have a working +Hermes interface is genuinely differentiating. No other open-source Hermes +interface ships as a native Mac app. This is the highest-leverage remaining +investment for user adoption. + +**Approach: Swift + WKWebView (not Electron)** + +The right architecture is a thin native Swift shell (~300-500 lines) that: +1. Bundles the existing Python server and all api/ modules inside the .app +2. Spawns the server as a subprocess on a random local port at launch +3. Opens a WKWebView window pointed at that localhost port +4. Handles Mac app lifecycle natively (dock icon, cmd+Q, window management, + app menu, about box) +5. Bridges a small set of native Mac capabilities that WKWebView can't do + +**Why not Electron:** WKWebView is Safari's engine -- dramatically lighter than +Chromium. No 200MB node_modules. No separate update daemon. The .app is ~30MB +including the Python runtime, vs 150MB+ for Electron. + +**Why not full native Swift UI:** Would require rewriting the entire frontend +from scratch. The web UI is already fast, dark-themed, and feature-complete. +The thin shell approach gets 95% of the benefit at 5% of the cost. + +### Track A: Swift App Shell + +**Files to create:** +``` +desktop/ + HermesApp.swift -- @main entry point, NSApp delegate + AppDelegate.swift -- lifecycle: start server on launch, stop on quit + WindowController.swift -- NSWindow + WKWebView setup, cmd shortcuts + ServerManager.swift -- spawn/monitor Python subprocess, pick free port + MenuBuilder.swift -- native app menu (File, Edit, View, Window, Help) + Info.plist -- bundle ID, display name, version, icon + Assets.xcassets/ -- app icon (1024x1024 + all required sizes) + HermesApp.xcodeproj/ -- Xcode project file +``` + +**ServerManager.swift responsibilities:** +- Find Python: check bundled runtime first, fall back to system python3 +- Pick a free port (bind to :0, read assigned port, close, use it) +- Spawn: `python3 server.py --port {port}` as a child Process +- Monitor: if server crashes, show an error sheet and offer restart +- Shutdown: SIGTERM on app quit, wait up to 3s, then SIGKILL + +**WKWebView configuration:** +- `allowsBackForwardNavigationGestures = false` (it's a single-page app) +- `WKUserContentController` for JS bridge (native notifications, file picker) +- Wait for server health check before loading (poll /health, show loading + spinner in the native window while waiting, typically <1s) +- `userAgent` override so the server can detect desktop app context + +**Native menu items (beyond defaults):** +- File > New Session (Cmd+N) -- calls JS `newSession()` +- File > New Window (Cmd+Shift+N) -- opens second window with its own WKWebView +- View > Toggle Sidebar (Cmd+Shift+S) +- Window > Zoom, Minimize (standard) +- Help > About Hermes, Check for Updates (links to GitHub releases page) + +### Track B: Python Bundling + +Two options, in order of preference: + +**Option A: Require system Python (simpler, recommended for v1)** +- Check for `python3` at known paths: `/usr/bin/python3`, homebrew paths, + pyenv paths +- If not found: show a one-time setup sheet with instructions +- Pros: tiny download (~5MB for the Swift app + web assets), no bundling complexity +- Cons: user needs Python installed (most developers do; target audience does too) + +**Option B: Bundle python-standalone (self-contained, larger)** +- Use `python-build-standalone` (from Astral/uv project): pre-built Python + 3.11 binaries, ~30MB compressed, no Xcode toolchain needed to build +- Extract to `~/Library/Application Support/Hermes/python/` on first launch +- Install `requirements.txt` via bundled pip into a local venv +- Pros: zero dependencies, works on a clean Mac +- Cons: first launch takes ~10-20s for extraction + pip install; ~30MB download + +**Recommendation:** Ship v1 with Option A. Add Option B as an optional +"standalone" download for non-developers. + +### Track C: Distribution + +**GitHub Releases (primary):** +- Build with `xcodebuild -scheme HermesApp -configuration Release -archivePath` +- `xcodebuild -exportArchive` to produce a .app bundle +- `hdiutil create` to produce a .dmg with drag-to-Applications installer UI +- Upload .dmg as a GitHub Release asset via `gh release create` +- CI: add `.github/workflows/mac-release.yml` -- trigger on `vX.Y.Z-mac` tag + +**Code signing:** +- Without an Apple Developer account: distribute as unsigned, users must + right-click > Open on first launch (standard for open-source Mac apps) +- With a free Apple Developer account: ad-hoc signing removes the Gatekeeper + warning without paying $99/year (no notarization, but much better UX) +- With paid account ($99/year): full notarization, no warnings, direct download + +**Recommended for v1:** ad-hoc signing (free, good enough for early adopters). +Document the right-click > Open workaround in the README for unsigned builds. + +**Universal binary (Intel + Apple Silicon):** +```bash +xcodebuild archive -scheme HermesApp -destination "generic/platform=macOS" +``` +Both architectures in one .app. No separate downloads needed. + +### Track D: Native Integrations (v1 scope) + +**System notifications for cron completion:** +- The web UI polls `/api/cron/alerts` and shows in-page banners +- The Mac app can additionally post `UNUserNotificationCenter` notifications +- JS bridge: `window.webkit.messageHandlers.notify.postMessage({title, body})` +- Swift handler: posts a native notification with the cron job name and output + summary -- appears in Notification Center, works even when app is in background + +**File picker for workspace add:** +- Currently: user types a path string into the workspace add form +- Mac app: intercept workspace-add form submission, open `NSOpenPanel` instead, + return the selected path to the JS via `evaluateJavaScript` +- Much better UX -- standard Mac folder picker, no typing paths + +**Dock badge for pending approvals:** +- When an agent approval is waiting, set `NSApp.dockTile.badgeLabel = "1"` +- Clear badge when approval is resolved +- JS bridge fires when approval card appears/disappears + +**Menu bar mode (optional, v2):** +- A small status bar item (⚗️ icon in menu bar) that opens a compact popover +- Popover shows current session status, last message, quick-compose field +- Useful for running Hermes in the background without a full window + +### Track E: Testing + +Since the Swift app is thin glue, most testing remains in the existing pytest +suite (server still runs identically). New Swift-specific tests: +- `ServerManagerTests.swift`: verify port picking, process spawn, health wait +- UI tests via `XCUITest`: launch app, wait for WKWebView to load, verify + title bar shows "Hermes", verify /health responds +- Smoke test in CI: `xcodebuild test -scheme HermesApp` + +### Implementation Order + +1. `ServerManager.swift` + basic `AppDelegate` -- get Python server spawning + and health-check working from Swift +2. `WindowController.swift` -- WKWebView loading, loading spinner while + server starts +3. App icon + Info.plist -- make it look like a real app +4. `MenuBuilder.swift` -- native menus + keyboard shortcuts +5. JS bridge for notifications -- most impactful native integration +6. DMG build script + GitHub Actions CI +7. (Optional) File picker bridge, dock badge + +### What to NOT do in v1 + +- Windows or Linux wrapper (different toolchain; do Mac first, assess demand) +- Full Swift/SwiftUI rewrite of the frontend (months of work, wrong tradeoff) +- App Store submission (sandboxing breaks local server; not worth the effort) +- Auto-update mechanism (GitHub releases + manual download is fine for v1) +- Menu bar mode (cool but not v1 scope) + +### Files to create in the repo + +``` +desktop/mac/ + HermesApp/ + HermesApp.swift + AppDelegate.swift + WindowController.swift + ServerManager.swift + MenuBuilder.swift + Assets.xcassets/ + Info.plist + HermesApp.xcodeproj/ + README.md -- build instructions, requirements, signing notes +.github/workflows/ + mac-release.yml -- build + sign + upload DMG on tag push +``` + +The server code (`server.py`, `api/`, `static/`, `requirements.txt`) is +referenced from the repo root -- no duplication. The .app bundle copies them +at build time. + +**Estimated effort:** 2-3x a typical web sprint (new language, new toolchain, +bundling complexity). Realistic for a focused weekend or a dedicated agent run +with clear instructions. + +**Hermes CLI parity impact:** N/A (different distribution channel) +**Claude parity impact:** Medium (Claude.app is a native Mac app) +**User-facing value:** Very high -- lowers barrier to entry dramatically, +genuinely differentiating for an open-source project --- @@ -662,6 +897,6 @@ and switchToProfile() didn't refresh workspaces or sessions. --- -*Last updated: April 3, 2026* -*Current version: v0.29 | 426 tests* -*Next sprint: Sprint 24 (Desktop Application)* +*Last updated: April 4, 2026* +*Current version: v0.30 | 426 tests* +*Next sprint: Sprint 24 (Web Polish + Bug Fix Pass)* diff --git a/TESTING.md b/TESTING.md index 35c704b..90f9070 100644 --- a/TESTING.md +++ b/TESTING.md @@ -8,7 +8,7 @@ > Prerequisites: SSH tunnel is active on port 8787. Open http://localhost:8787 in browser. > Server health check: curl http://127.0.0.1:8787/health should return {"status":"ok"}. > -> Automated tests: 415 total (392 passing, 23 pre-existing failures). +> Automated tests: 424 total (401 passing, 23 pre-existing failures). > Run: `pytest tests/ -v --timeout=60` --- diff --git a/static/index.html b/static/index.html index c2e4003..93e40e4 100644 --- a/static/index.html +++ b/static/index.html @@ -13,7 +13,7 @@