From d0e08fee88b167d593e3dd7e8adcba502e63e55b Mon Sep 17 00:00:00 2001
From: nesquena-hermes
Date: Mon, 13 Apr 2026 11:40:15 -0700
Subject: [PATCH] feat: KaTeX math rendering for LaTeX in chat + workspace previews (#352)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat: KaTeX math rendering for $..$ and $$..$$ in chat and previews (fixes #347)

- Stash math delimiters before the markdown pipeline, restore them as .katex-block/.katex-inline elements
- KaTeX JS lazy-loaded from CDN on first math block (mirrors the mermaid pattern)
- KaTeX CSS loaded eagerly in index.html to prevent layout shift
- SRI hashes on both CDN tags
- throwOnError:false — bad LaTeX renders as error-colored source text instead of throwing; other render errors fall back to a code span
- Supports $$, $, \\(...\\), \\[...\\] delimiters
- 18 new tests, 831/831 passing

* fix: remove invalid \' escape sequences in math stash lines

Lines 311, 314, 316, 317 had \' (backslash-quote) instead of plain ' in the arrow function bodies. This is a JS syntax error — node --check fails with 'Invalid or unexpected token'. Likely caused by a serialization artifact during code generation.

Co-Authored-By: Claude Opus 4.6 (1M context)

* fix: swap stash order (fence before math) to protect code spans; add renderKatexBlocks to workspace preview

- static/ui.js: fence_stash now runs BEFORE math_stash so dollar signs inside backtick code spans are not extracted as math. Previously `$x$` would render as KaTeX inside a code span instead of showing the literal string $x$.
- static/workspace.js: add requestAnimationFrame(renderKatexBlocks) after the markdown preview renders so math works in workspace file previews, not only in chat messages.
* feat: KaTeX math rendering + stash order fix + workspace wiring (#352)

- tests/test_issue347.py: 11 new tests (29 total) covering fence-before-math ordering, workspace.js renderKatexBlocks call, stash token distinctness, false-positive prevention, safe-tags boundary check
- CHANGELOG.md: v0.50.15 entry; 870 tests total (up from 841)

* fix: use literal null byte (\x00M) in math stash token — matches restore regex

The original PR's second commit (fix: remove invalid \' escapes) accidentally doubled the backslash in the math stash tokens: '\\x00M' is a 5-char string (backslash + x + 0 + 0 + M) but the restore regex /\x00M/ expects a null byte. Result: $...$ in messages produced visible \x00M0\x00 tokens instead of KaTeX spans.

Changed all 4 math stash return statements to use '\x00M' (single backslash = null byte, same convention as fence_stash's '\x00F'). Also updates test_stash_tokens_distinct to check for the correct pattern.

* fix: add null-byte token test; update CHANGELOG to v0.50.15 with fixes

- tests/test_issue347.py: add test_math_stash_token_uses_single_backslash_null_byte to catch the \\x00M double-backslash regression; 30 tests total (up from 29)
- CHANGELOG.md: v0.50.15 entry documents all fixes including the token bug and workspace preview wiring; 871 tests total

---------

Co-authored-by: Nathan Esquenazi
Co-authored-by: Claude Opus 4.6 (1M context)
---
 CHANGELOG.md           |  13 ++
 static/index.html      |   2 +
 static/style.css       |   6 +
 static/ui.js           |  67 +++++++-
 static/workspace.js    |   3 +-
 tests/test_issue347.py | 348 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 436 insertions(+), 3 deletions(-)
 create mode 100644 tests/test_issue347.py

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f9fb466..a8e5331 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,19 @@
 
 ---
 
+## [v0.50.15] KaTeX math rendering for LaTeX in chat and workspace previews (fixes #347)
+
+- **LaTeX / KaTeX math now renders in chat messages and workspace file previews** (`static/ui.js`, `static/workspace.js`, `static/style.css`, `static/index.html`): Inline math (`$...$`, `\(...\)`) and display math (`$$...$$`, `\[...\]`) are rendered via KaTeX instead of displaying as raw text. Follows the existing mermaid lazy-load pattern: delimiters are stashed before markdown processing, placeholder elements are emitted, and KaTeX JS is loaded from CDN on first use — no KaTeX JS is loaded unless math is present.
+  - `$$...$$` and `\[...\]` → centered display math (`.katex-block` div)
+  - `$...$` and `\(...\)` → inline math (`.katex-inline` span); requires non-space at `$` boundaries to avoid false positives on currency amounts like `$5`
+  - KaTeX JS lazy-loaded from jsdelivr CDN with SRI hash; KaTeX CSS loaded eagerly in `static/index.html` to prevent layout shift
+  - `throwOnError:false` — KaTeX renders invalid LaTeX as its source text in the error color instead of throwing, and any other render error falls back to a code span, so bad input cannot crash the message
+  - `trust:false` — blocks KaTeX commands such as `\href` and `\includegraphics` that could enable unsafe behavior (this is also KaTeX's default)
+  - `<span>` added to `SAFE_TAGS` allowlist for inline math spans (tag-name boundary check preserved)
+- **Fix: fence stash now runs before math stash** (`static/ui.js`): The original PR ran the math stash before the fence stash, so `` `$x$` `` inside backtick code spans was incorrectly extracted as math instead of being protected as code. Order corrected — fence_stash runs first so code spans protect their contents.
+- **Workspace file previews now render math** (`static/workspace.js`): Added `requestAnimationFrame(renderKatexBlocks)` after the markdown file preview renders, matching the chat message path. Without this, math placeholders appeared in previews but were never rendered.
+  - 30 tests in `tests/test_issue347.py` (18 original + 12 new covering stash ordering, workspace wiring, false-positive prevention, and the null-byte stash-token regression); 871 tests total (up from 841)
+
 ## [v0.50.14] Security fixes: B310 urlopen scheme validation, B324 MD5 usedforsecurity, B110 bare except logging + QuietHTTPServer (PR #354)
 
 - **B324 — MD5 no longer triggers crypto warnings** (`api/gateway_watcher.py`): `_snapshot_hash` uses MD5 only as a non-cryptographic change-detection hash. Added `usedforsecurity=False` so systems with strict crypto policies (FIPS mode etc.) don't reject the call.
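The `$...$` boundary rule in this entry can be checked directly against the inline-math pattern that the patch adds to `renderMd()` in `static/ui.js`. What follows is a minimal sketch for Node. `inlineMathCandidates` is a hypothetical helper that only lists what the pattern would stash; it is not part of `ui.js`.

```javascript
// Inline-math pattern copied from the renderMd() math stash in this patch.
const INLINE_MATH = /\$([^\s$\n][^$\n]*?[^\s$\n]|\S)\$/g;

// Hypothetical helper: return the captured source of every inline-math candidate.
function inlineMathCandidates(text) {
  return [...text.matchAll(INLINE_MATH)].map(m => m[1]);
}

console.log(inlineMathCandidates('costs $5 and $10'));        // []
console.log(inlineMathCandidates('spaced $ x $ is skipped')); // []
console.log(inlineMathCandidates('let $x$ be real'));         // [ 'x' ]
console.log(inlineMathCandidates('so $a + b = c$ holds'));    // [ 'a + b = c' ]
console.log(inlineMathCandidates('a range of $5/$10'));       // [ '5/' ]
```

The pattern rejects an opening `$` followed by whitespace and a closing `$` preceded by whitespace. That is what keeps `costs $5 and $10` intact, because the only candidate closing `$` (the one before `10`) follows a space. It is a heuristic, though: amounts written without surrounding spaces, such as `$5/$10`, still match.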
diff --git a/static/index.html b/static/index.html
index fdd42c9..3f3f1c6 100644
--- a/static/index.html
+++ b/static/index.html
@@ -6,6 +6,8 @@ Hermes + + 
diff --git a/static/style.css b/static/style.css
index 8ca8be2..8bf3445 100644
--- a/static/style.css
+++ b/static/style.css
@@ -375,6 +375,12 @@
 .msg-body th{background:rgba(255,255,255,.07);padding:6px 10px;text-align:left;font-weight:600;border:1px solid var(--border2);}
 .msg-body td{padding:5px 10px;border:1px solid rgba(255,255,255,.06);}
 .msg-body tr:nth-child(even){background:rgba(255,255,255,.03);}
+/* KaTeX math rendering */
+.katex-block{display:block;text-align:center;margin:12px 0;overflow-x:auto;}
+.katex-inline{display:inline;}
+.katex-block .katex-html{text-align:center;}
+.msg-body .katex{font-size:1.1em;}
+.msg-body .katex-display{margin:8px 0;}
 .msg-files{display:flex;flex-wrap:wrap;gap:6px;padding-left:30px;margin-bottom:10px;}
 .msg-file-badge{display:flex;align-items:center;gap:5px;background:rgba(124,185,255,0.1);border:1px solid rgba(124,185,255,0.25);border-radius:6px;padding:4px 9px;font-size:12px;color:var(--blue);}
 .thinking{display:flex;align-items:center;gap:5px;color:var(--muted);font-size:13px;padding-left:30px;}
diff --git a/static/ui.js b/static/ui.js
index a0caab0..45dd6fd 100644
--- a/static/ui.js
+++ b/static/ui.js
@@ -304,8 +304,21 @@ function renderMd(raw){
   // Only runs OUTSIDE fenced code blocks and backtick spans (stash + restore).
   // Unsafe tags (anything not in the allowlist) are left as-is and will be
   // HTML-escaped by esc() when they reach an innerHTML assignment -- no XSS risk.
+  // Fence stash: protect code blocks and backtick spans from all further processing
+  // Must run BEFORE math_stash so $..$ inside code spans is not extracted as math
   const fence_stash=[];
   s=s.replace(/(```[\s\S]*?```|`[^`\n]+`)/g,m=>{fence_stash.push(m);return '\x00F'+(fence_stash.length-1)+'\x00';});
+  // Math stash: protect $$..$$ and $..$ from markdown processing
+  // Runs AFTER fence_stash so backtick code spans protect their dollar-sign contents
+  const math_stash=[];
+  // Display math: $$...$$ (must come before inline to avoid mis-parsing)
+  s=s.replace(/\$\$([\s\S]+?)\$\$/g,(_,m)=>{math_stash.push({type:'display',src:m});return '\x00M'+(math_stash.length-1)+'\x00';});
+  // Inline math: $...$ — require non-space at boundaries to avoid false positives
+  // e.g. "costs $5 and $10" should not trigger (the candidate closing $ is preceded by a space)
+  s=s.replace(/\$([^\s$\n][^$\n]*?[^\s$\n]|\S)\$/g,(_,m)=>{math_stash.push({type:'inline',src:m});return '\x00M'+(math_stash.length-1)+'\x00';});
+  // Also stash \(...\) and \[...\] LaTeX delimiters
+  s=s.replace(/\\\\\((.+?)\\\\\)/g,(_,m)=>{math_stash.push({type:'inline',src:m});return '\x00M'+(math_stash.length-1)+'\x00';});
+  s=s.replace(/\\\\\[(.+?)\\\\\]/gs,(_,m)=>{math_stash.push({type:'display',src:m});return '\x00M'+(math_stash.length-1)+'\x00';});
   // Safe tag → markdown equivalent (these produce the same output as **text** etc.)
   s=s.replace(/([\s\S]*?)<\/strong>/gi,(_,t)=>'**'+t+'**');
   s=s.replace(/([\s\S]*?)<\/b>/gi,(_,t)=>'**'+t+'**');
@@ -382,7 +395,7 @@ function renderMd(raw){
   // Our pipeline only emits: ,,,
,,
    ,
      ,
    1. , // ,,,,
      ,,
      ,
      ,

      ,
      ,, //

      (mermaid/pre-header). Everything else is untrusted input. - const SAFE_TAGS=/^<\/?(strong|em|code|pre|h[1-6]|ul|ol|li|table|thead|tbody|tr|th|td|hr|blockquote|p|br|a|div)([\s>]|$)/i; + const SAFE_TAGS=/^<\/?(strong|em|code|pre|h[1-6]|ul|ol|li|table|thead|tbody|tr|th|td|hr|blockquote|p|br|a|div|span)([\s>]|$)/i; s=s.replace(/<\/?[a-z][^>]*>/gi,tag=>SAFE_TAGS.test(tag)?tag:esc(tag)); // Autolink: convert plain URLs to clickable links (not inside existing tags, not in code) s=s.replace(/(https?:\/\/[^\s<>"')\]]+)/g,(url)=>{ @@ -391,6 +404,15 @@ function renderMd(raw){ const clean=trail?url.slice(0,-1):url; return `${esc(clean)}${trail}`; }); + // Restore math stash → katex placeholder spans/divs + // These will be rendered by renderKatexBlocks() after DOM insertion + s=s.replace(/\x00M(\d+)\x00/g,(_,i)=>{ + const item=math_stash[+i]; + if(item.type==='display'){ + return `
      ${esc(item.src)}
      `; + } + return `${esc(item.src)}`; + }); const parts=s.split(/\n{2,}/); s=parts.map(p=>{p=p.trim();if(!p)return '';if(/^<(h[1-6]|ul|ol|pre|hr|blockquote)/.test(p))return p;return `

      ${p.replace(/\n/g,'
      ')}

      `;}).join('\n');
   return s;
@@ -963,7 +985,7 @@ function renderMessages(){
   }
   scrollToBottom();
   // Apply syntax highlighting after DOM is built
-  requestAnimationFrame(()=>{highlightCode();addCopyButtons();renderMermaidBlocks();});
+  requestAnimationFrame(()=>{highlightCode();addCopyButtons();renderMermaidBlocks();renderKatexBlocks();});
   // Refresh todo panel if it's currently open
   if(typeof loadTodos==='function' && document.getElementById('panelTodos') && document.getElementById('panelTodos').classList.contains('active')){
     loadTodos();
@@ -1237,6 +1259,47 @@ function renderMermaidBlocks(){
   });
 }
 
+let _katexLoading=false;
+let _katexReady=false;
+
+function renderKatexBlocks(){
+  const blocks=document.querySelectorAll('.katex-block:not([data-rendered]),.katex-inline:not([data-rendered])');
+  if(!blocks.length) return;
+  if(!_katexReady){
+    if(!_katexLoading){
+      _katexLoading=true;
+      const script=document.createElement('script');
+      script.src='https://cdn.jsdelivr.net/npm/katex@0.16.22/dist/katex.min.js';
+      script.integrity='sha384-cMkvdD8LoxVzGF/RPUKAcvmm49FQ0oxwDF3BGKtDXcEc+T1b2N+teh/OJfpU0jr6';
+      script.crossOrigin='anonymous';
+      script.onload=()=>{
+        if(typeof katex!=='undefined'){
+          _katexReady=true;
+          renderKatexBlocks();
+        }
+      };
+      document.head.appendChild(script);
+    }
+    return;
+  }
+  blocks.forEach(el=>{
+    el.dataset.rendered='true';
+    const src=el.textContent||'';
+    const displayMode=el.dataset.katex==='display';
+    try{
+      katex.render(src,el,{
+        displayMode,
+        throwOnError:false,
+        trust:false,
+        strict:'ignore',
+      });
+    }catch(e){
+      // Leave as raw text in a code span on failure
+      el.outerHTML=`${esc(src)}`;
+    }
+  });
+}
+
 function appendThinking(){
   $('emptyState').style.display='none';
   const row=document.createElement('div');row.className='msg-row';row.id='thinkingRow';
diff --git a/static/workspace.js b/static/workspace.js
index 6ef59de..05fb745 100644
--- a/static/workspace.js
+++ b/static/workspace.js
@@ -150,7 +150,7 @@ async function toggleEditMode(){
   _previewDirty=false;
   // Update read-only views
   if(_previewCurrentMode==='code') $('previewCode').textContent=content;
-  else $('previewMd').innerHTML=renderMd(content);
+  else { $('previewMd').innerHTML=renderMd(content); requestAnimationFrame(()=>{if(typeof renderKatexBlocks==='function')renderKatexBlocks();}); }
   $('previewEditArea').style.display='none';
   if(_previewCurrentMode==='code') $('previewCode').style.display='';
   else $('previewMd').style.display='';
@@ -215,6 +215,7 @@ async function openFile(path){
       showPreview('md');
       _previewRawContent = data.content;
       $('previewMd').innerHTML=renderMd(data.content);
+      requestAnimationFrame(()=>{if(typeof renderKatexBlocks==='function')renderKatexBlocks();});
     }catch(e){setStatus(t('file_open_failed'));}
   } else {
     // Plain code / text -- but fall back to download if server signals binary
diff --git a/tests/test_issue347.py b/tests/test_issue347.py
new file mode 100644
index 0000000..a2d278b
--- /dev/null
+++ b/tests/test_issue347.py
@@ -0,0 +1,348 @@
+"""
+Tests for GitHub issue #347: KaTeX / LaTeX math rendering in chat and workspace previews.
+
+Structural tests — no server required.
+Verify:
+- renderMd() stashes and restores $..$ and $$...$$ math delimiters
+- KaTeX lazy-load function exists and follows the mermaid pattern
+- KaTeX JS loaded from CDN with SRI integrity hash
+- KaTeX CSS loaded in index.html with SRI hash
+- CSS rules present for .katex-block and .katex-inline
+- SAFE_TAGS updated to allow <span> (for inline math)
+- renderKatexBlocks() is wired into the requestAnimationFrame call
+"""
+import pathlib
+import re
+
+REPO = pathlib.Path(__file__).parent.parent
+UI_JS = (REPO / 'static' / 'ui.js').read_text(encoding='utf-8')
+INDEX = (REPO / 'static' / 'index.html').read_text(encoding='utf-8')
+CSS = (REPO / 'static' / 'style.css').read_text(encoding='utf-8')
+
+
+# ── renderMd pipeline ──────────────────────────────────────────────────────────
+
+def test_display_math_stash_present():
+    """renderMd must stash $$...$$ display math before other processing."""
+    assert r'\$\$([\s\S]+?)\$\$' in UI_JS or '$$' in UI_JS, \
+        'Display math $$..$$ stash regex not found in ui.js'
+    # The stash uses \\x00M token
+    assert '\\x00M' in UI_JS, 'Math stash token \\x00M not found in renderMd'
+
+
+def test_inline_math_stash_present():
+    """renderMd must stash $..$ inline math."""
+    # Inline math regex must be present
+    assert 'math_stash' in UI_JS, 'math_stash array not found in renderMd'
+
+
+def test_katex_block_placeholder_emitted():
+    """renderMd restore pass must emit .katex-block divs for display math."""
+    assert 'katex-block' in UI_JS, \
+        '.katex-block placeholder div not emitted by renderMd restore pass'
+
+
+def test_katex_inline_placeholder_emitted():
+    """renderMd restore pass must emit .katex-inline spans for inline math."""
+    assert 'katex-inline' in UI_JS, \
+        '.katex-inline placeholder span not emitted by renderMd restore pass'
+
+
+def test_data_katex_attribute_present():
+    """Placeholders must carry data-katex attribute for display/inline distinction."""
+    assert 'data-katex' in UI_JS, \
+        'data-katex attribute not found — renderKatexBlocks cannot distinguish display from inline'
+
+
+# ── renderKatexBlocks() ────────────────────────────────────────────────────────
+
+def test_render_katex_blocks_function_exists():
+    """renderKatexBlocks() function must exist in ui.js."""
+    assert 'function renderKatexBlocks()' in UI_JS, \
+        'renderKatexBlocks() function not found in ui.js'
+
+
+def test_katex_lazy_load_follows_mermaid_pattern():
+    """KaTeX must use the same lazy-load pattern as mermaid (load on first use)."""
+    assert '_katexLoading' in UI_JS, '_katexLoading flag not found'
+    assert '_katexReady' in UI_JS, '_katexReady flag not found'
+
+
+def test_katex_js_loaded_from_cdn():
+    """KaTeX JS must be loaded from jsdelivr CDN."""
+    assert 'katex@0.16' in UI_JS, \
+        'KaTeX JS CDN URL not found in ui.js — expected katex@0.16.x'
+
+
+def test_katex_js_has_sri_hash():
+    """KaTeX JS CDN tag must have an SRI integrity hash."""
+    # The hash is in the script.integrity assignment
+    assert "script.integrity='sha384-" in UI_JS or 'script.integrity="sha384-' in UI_JS, \
+        'KaTeX JS SRI integrity hash not found in ui.js'
+
+
+def test_katex_display_mode_used():
+    """renderKatexBlocks must pass displayMode based on data-katex attribute."""
+    assert 'displayMode' in UI_JS, \
+        'displayMode not passed to katex.render() — display math will render inline'
+
+
+def test_katex_throw_on_error_false():
+    """KaTeX must be configured with throwOnError:false to degrade gracefully."""
+    assert 'throwOnError:false' in UI_JS, \
+        'throwOnError:false not set — bad LaTeX will throw and break the message'
+
+
+def test_render_katex_blocks_wired_into_raf():
+    """renderKatexBlocks() must be called in the same requestAnimationFrame as renderMermaidBlocks()."""
+    # Check that renderKatexBlocks appears somewhere near requestAnimationFrame
+    raf_idx = UI_JS.find('requestAnimationFrame')
+    # Find the rAF call that also contains renderKatexBlocks
+    has_katex_in_raf = any(
+        'renderKatexBlocks' in UI_JS[m.start():m.start()+200]
+        for m in re.finditer(r'requestAnimationFrame', UI_JS)
+    )
+    assert has_katex_in_raf, \
+        'renderKatexBlocks() not found in any requestAnimationFrame call — math will not render'
+
+
+# ── index.html ────────────────────────────────────────────────────────────────
+
+def test_katex_css_in_index_html():
+    """KaTeX CSS must be loaded in index.html."""
+    assert 'katex@0.16' in INDEX, \
+        'KaTeX CSS CDN link not found in index.html'
+
+
+def test_katex_css_has_sri_hash():
+    """KaTeX CSS link in index.html must have an SRI integrity hash."""
+    assert 'sha384-5TcZemv2l' in INDEX or 'integrity' in INDEX and 'katex' in INDEX, \
+        'KaTeX CSS SRI integrity hash not found in index.html'
+
+
+# ── style.css ─────────────────────────────────────────────────────────────────
+
+def test_katex_block_css_present():
+    """.katex-block CSS rule must exist for centered display math."""
+    assert '.katex-block' in CSS, \
+        '.katex-block CSS rule missing from style.css — display math will have no layout'
+
+
+def test_katex_inline_css_present():
+    """.katex-inline CSS rule must exist."""
+    assert '.katex-inline' in CSS, \
+        '.katex-inline CSS rule missing from style.css'
+
+
+def test_katex_block_text_align_center():
+    """.katex-block must be text-align:center for display math."""
+    assert 'text-align:center' in CSS, \
+        'text-align:center not found for .katex-block'
+
+
+# ── SAFE_TAGS ──────────────────────────────────────────────────────────────────
+
+def test_safe_tags_includes_span():
+    """SAFE_TAGS must include <span> to allow .katex-inline spans through the escape pass."""
+    # The SAFE_TAGS regex should contain 'span'
+    safe_tags_match = re.search(r'SAFE_TAGS\s*=\s*/.*?/i', UI_JS)
+    assert safe_tags_match, 'SAFE_TAGS pattern not found in ui.js'
+    assert 'span' in safe_tags_match.group(), \
+        '<span> not in SAFE_TAGS — inline math spans will be HTML-escaped and rendered as text'
+
+
+# ── Stash ordering: fence must protect code spans from math extraction ─────────
+
+WORKSPACE_JS = (REPO / 'static' / 'workspace.js').read_text(encoding='utf-8')
+
+
+def test_fence_stash_before_math_stash():
+    """fence_stash must be initialized and populated BEFORE math_stash in renderMd.
+
+    If math_stash runs first, dollar signs inside backtick code spans are extracted
+    as math, leaving placeholder tokens inside the stashed code string. The code span
+    then renders with KaTeX inside instead of the literal dollar-sign text.
+    """
+    fence_pos = UI_JS.find("const fence_stash=[]")
+    math_pos = UI_JS.find("const math_stash=[]")
+    assert fence_pos != -1, "fence_stash not found in renderMd"
+    assert math_pos != -1, "math_stash not found in renderMd"
+    assert fence_pos < math_pos, (
+        "fence_stash must be declared BEFORE math_stash in renderMd "
+        f"(fence at char {fence_pos}, math at char {math_pos}). "
+        "If math runs first, `$x$` inside backticks gets extracted as math instead of code."
+    )
+
+
+def test_fence_stash_populated_before_math_stash():
+    """The fence_stash s.replace call must appear before any math_stash s.replace calls."""
+    # Find the s.replace call that populates each stash
+    fence_replace_pos = UI_JS.find("fence_stash.push(m)")
+    math_replace_pos = UI_JS.find("math_stash.push(")
+    assert fence_replace_pos != -1, "fence_stash population call not found"
+    assert math_replace_pos != -1, "math_stash population call not found"
+    assert fence_replace_pos < math_replace_pos, (
+        "fence_stash must be populated before math_stash to protect code span contents"
+    )
+
+
+def test_math_stash_comment_says_after_fence():
+    """The math stash comment should explain it runs AFTER fence_stash, not before."""
+    # Should not have the old misleading comment
+    assert "Must run BEFORE fence_stash" not in UI_JS, (
+        "Old misleading comment still present. Math stash runs AFTER fence_stash. "
+        "The comment should say 'Runs AFTER fence_stash'."
+    )
+
+
+# ── Pipeline regression: code spans protect their contents ────────────────────
+
+def test_math_restore_after_fence_restore():
+    """Fence and math restores must be separate calls; with distinct tokens (\\x00F vs
+    \\x00M) their relative order does not matter, so only their presence is checked."""
+    fence_restore_pos = UI_JS.find("fence_stash[+i]")
+    math_restore_pos = UI_JS.find("math_stash[+i]")
+    assert fence_restore_pos != -1, "fence_stash restore not found"
+    assert math_restore_pos != -1, "math_stash restore not found"
+    # Both restores must exist; their relative order doesn't matter for correctness
+    # (they use different tokens: \x00F vs \x00M), but we assert both exist
+    assert fence_restore_pos != math_restore_pos, "fence and math restore must be separate calls"
+
+
+def test_stash_tokens_distinct():
+    """fence_stash and math_stash must use distinct sentinel tokens to avoid collisions."""
+    # fence uses \x00F, math uses \x00M (or similar unique prefix)
+    # ui.js spells them '\x00F' and '\x00M' (one backslash: a JS null-byte escape).
+    # The Python literal "'\\x00F'" matches that text; "'\\\\x00F'" would match the broken form.
+    assert "'\\x00F'" in UI_JS, (
+        "fence stash token (\\x00F) not found — must be distinct from math token"
+    )
+    assert "'\\x00M'" in UI_JS, (
+        "math stash token (\\x00M) not found — must be distinct from fence token"
+    )
+    # The two tokens must use different discriminator characters
+    assert 'x00F' in UI_JS and 'x00M' in UI_JS, (
+        "Both \\x00F (fence) and \\x00M (math) tokens must exist"
+    )
+
+
+# ── Workspace preview renderKatexBlocks wiring ────────────────────────────────
+
+def test_workspace_calls_render_katex_after_preview():
+    """workspace.js must call renderKatexBlocks() after setting previewMd.innerHTML.
+
+    Without this, math placeholders appear in workspace file previews but are never
+    rendered by KaTeX (renderKatexBlocks is only wired into renderMessages rAF).
+    """
+    assert "renderKatexBlocks" in WORKSPACE_JS, (
+        "workspace.js must call renderKatexBlocks() after renderMd() for file previews"
+    )
+
+
+def test_workspace_renders_katex_after_file_open():
+    """workspace.js renderKatexBlocks call must come after the renderMd(data.content) assignment."""
+    preview_md_pos = WORKSPACE_JS.find("renderMd(data.content)")
+    # Use the actual call string (not a stray regex match on 'M' characters)
+    katex_call_str = "renderKatexBlocks==='function'"
+    katex_call_pos = WORKSPACE_JS.find(katex_call_str)
+    assert preview_md_pos != -1, "renderMd(data.content) not found in workspace.js"
+    assert katex_call_pos != -1, (
+        "renderKatexBlocks guard (typeof renderKatexBlocks==='function') not found in workspace.js"
+    )
+    # The call after 'renderMd(data.content)' — find the LAST occurrence
+    # (there may be an earlier one in the save path at line ~153)
+    last_katex_pos = WORKSPACE_JS.rfind(katex_call_str)
+    assert last_katex_pos > preview_md_pos, (
+        "renderKatexBlocks must be called AFTER renderMd(data.content) in workspace.js "
+        f"(renderMd at {preview_md_pos}, last renderKatexBlocks at {last_katex_pos})"
+    )
+
+
+def test_workspace_katex_guarded_by_typeof():
+    """workspace.js renderKatexBlocks call must guard with typeof check for safety
+    in case KaTeX feature is not loaded (e.g. test environments, offline)."""
+    assert "typeof renderKatexBlocks" in WORKSPACE_JS, (
+        "workspace.js must guard renderKatexBlocks call with typeof check: "
+        "if(typeof renderKatexBlocks==='function')renderKatexBlocks()"
+    )
+
+
+# ── SAFE_TAGS: span addition should not expand attack surface ─────────────────
+
+def test_safe_tags_span_is_narrowly_scoped():
+    """SAFE_TAGS adding <span> is only a bypass if span carries dangerous attributes.
+    Verify the SAFE_TAGS regex tests the tag NAME only, not arbitrary attributes.
+    The rest of the pipeline uses esc() for user content, so attribute injection
+    into KaTeX spans isn't possible.
+    """
+    # The SAFE_TAGS regex must still require a word boundary / tag-end pattern
+    safe_tags_match = re.search(r"SAFE_TAGS\s*=\s*/(.+?)/i", UI_JS)
+    if not safe_tags_match:
+        safe_tags_match = re.search(r'SAFE_TAGS\s*=\s*/(.*?)/i', UI_JS)
+    assert safe_tags_match, "SAFE_TAGS regex not found"
+    pattern = safe_tags_match.group(1)
+    # Must have a trailing boundary check — ([\s>]|$) or similar
+    assert r"[\s>]" in pattern or r'[\s>]' in pattern, (
+        "SAFE_TAGS must enforce a boundary after the tag name to prevent "
+        " from matching when checking for "
+    )
+
+
+# ── False-positive prevention ─────────────────────────────────────────────────
+
+def test_inline_math_regex_requires_non_space_boundaries():
+    """The $...$ inline regex must require non-space at both boundaries.
+
+    This prevents 'costs $5 and $10' from matching: the only candidate closing
+    $ (the one before 10) is preceded by a space, so the pair is not treated as math.
+    """
+    # The inline math stash push is type:'inline' — find its containing replace() line
+    inline_push_idx = UI_JS.find("type:'inline',src:m")
+    assert inline_push_idx != -1, "Inline math stash push not found"
+    # Get the text from the start of that line back to find the regex
+    line_start = UI_JS.rfind('\n', 0, inline_push_idx) + 1
+    inline_line = UI_JS[line_start:inline_push_idx + 50]
+    # The regex must use \s (via [^\s...]) to exclude spaces at boundaries
+    assert '\\s' in inline_line or '[^' in inline_line, (
+        f"Inline math regex must exclude spaces at boundaries to prevent false "
+        f"positives on currency like $5. Found: {inline_line[:120]}"
+    )
+def test_display_math_stashed_before_inline():
+    """$$...$$ display math must be stashed before $...$ inline math.
+
+    If inline runs first on '$$x$$', it could match '$' + 'x' + '$' leaving
+    a stray outer '$', corrupting the output.
+    """
+    display_pos = UI_JS.find("type:'display',src:m")
+    inline_pos = UI_JS.find("type:'inline',src:m")
+    assert display_pos != -1, "display math stash not found"
+    assert inline_pos != -1, "inline math stash not found"
+    # First occurrence of display must be before first occurrence of inline
+    assert display_pos < inline_pos, (
+        "Display math ($$...$$) must be stashed before inline math ($...$) "
+        "to prevent $$ from being parsed as two adjacent inline delimiters"
+    )
+
+
+def test_math_stash_token_uses_single_backslash_null_byte():
+    """Math stash tokens must use the null-byte form (single backslash x00M).
+
+    The restore regex expects a null byte character. If the stash emits
+    a literal backslash+x00M (double backslash = 5-char string), the restore
+    regex never matches and the tokens appear verbatim in the rendered output.
+
+    The fence_stash correctly uses the null byte convention. Math stash must be consistent.
+    """
+    # In the source file, the correct form is: return '\x00M'
+    # The wrong form (double backslash) would be: return '\\x00M'
+    # Check that no double-backslash form exists in the math stash return statements
+    import re
+    bad_returns = re.findall(r"return\s+'\\\\x00M'", UI_JS)
+    assert not bad_returns, (
+        f"Found {len(bad_returns)} math stash return(s) using double-backslash \\\\x00M. "
+        "Must use single backslash '\\x00M' (null byte) to match the restore regex."
+    )
+    # Positive check: single-backslash form must exist
+    good_returns = re.findall(r"math_stash\.push.*?return '\\x00M'", UI_JS, re.DOTALL)
+    assert good_returns, (
+        "Math stash return must use single-backslash '\\x00M' (null byte convention)"
+    )
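The ordering and token checks in these tests can be reproduced outside the browser. Below is a simplified, hypothetical model of the `renderMd()` stash pipeline that runs in Node. `stashAndRestore` and its `mathFirst` option are illustrative names. The display and inline patterns are copied from the patch, and the fence pattern is equivalent to the patch's, written with `{3}`. The sketch makes three simplifications. First, KaTeX placeholder markup is replaced by `[display:...]`/`[inline:...]` markers. Second, markdown processing is omitted. Third, fences are restored before math, an assumption chosen to mirror the regression the commit message describes.

```javascript
// Hypothetical model of renderMd()'s stash/restore steps (not the real ui.js code).
const FENCE = /(`{3}[\s\S]*?`{3}|`[^`\n]+`)/g;       // fenced blocks or one-line backtick spans
const DISPLAY_MATH = /\$\$([\s\S]+?)\$\$/g;
const INLINE_MATH = /\$([^\s$\n][^$\n]*?[^\s$\n]|\S)\$/g;

function stashAndRestore(input, { mathFirst = false } = {}) {
  const fenceStash = [];
  const mathStash = [];
  // '\x00F' and '\x00M' are a real NUL character followed by a letter.
  const stashFences = s => s.replace(FENCE, m => {
    fenceStash.push(m);
    return '\x00F' + (fenceStash.length - 1) + '\x00';
  });
  const stashMath = s => s
    .replace(DISPLAY_MATH, (_, m) => {          // display math before inline math
      mathStash.push({ type: 'display', src: m });
      return '\x00M' + (mathStash.length - 1) + '\x00';
    })
    .replace(INLINE_MATH, (_, m) => {
      mathStash.push({ type: 'inline', src: m });
      return '\x00M' + (mathStash.length - 1) + '\x00';
    });

  // Patched order stashes fences first; mathFirst=true reproduces the original order.
  let s = mathFirst ? stashFences(stashMath(input)) : stashMath(stashFences(input));

  // Markdown processing would run here, on text whose code and math are hidden.
  s = s.replace(/\x00F(\d+)\x00/g, (_, i) => fenceStash[+i]);
  s = s.replace(/\x00M(\d+)\x00/g, (_, i) => {
    const item = mathStash[+i];
    return `[${item.type}:${item.src}]`;
  });
  return s;
}

console.log(stashAndRestore('code `$x$` stays literal'));                      // code `$x$` stays literal
console.log(stashAndRestore('code `$x$` stays literal', { mathFirst: true }));  // code `[inline:x]` stays literal
console.log(stashAndRestore('$$E=mc^2$$ and $a^2$'));                           // [display:E=mc^2] and [inline:a^2]

// The double-backslash token from the earlier commit is 5 visible characters,
// so the restore regex /\x00M/ can never match it.
console.log('\\x00M'.length, /\x00M/.test('\\x00M')); // 5 false
console.log('\x00M'.length, /\x00M/.test('\x00M'));   // 2 true
```

With the patched order, the backtick span is hidden before the inline pattern runs, so its `$` signs never reach the math stash. With `mathFirst: true`, the math placeholder is captured inside the code span and later restored as math. This matches the symptom the commit message reports for the original ordering.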