v0.50.207: batch of 10 PRs — TPS stat, SSE guard, session polish, cron UX, folder create, model errors, session speed, title gen (#1031)

* fix: remove orphaned i18n keys from top-level LOCALES object

  Three Traditional Chinese translation keys (cmd_status, memory_saved, profile_delete_title) were placed outside any locale block between the en and ru blocks in static/i18n.js. They became top-level properties of the LOCALES object, causing them to appear as invalid language options in the Settings > Preferences dropdown. The correct translations already exist in the zh-Hant locale block.

  Fixes #1008

* fix: block stale SSE events from polluting new session's DOM

  - appendThinking(): guard with !S.session||!S.activeStreamId to drop events from a previous session's SSE stream during a session switch
  - appendLiveToolCard(): same guard for consistency
  - finalizeThinkingCard(): scroll thinking-card-body to top when scroll is pinned, so the completed response is immediately visible
  - appendThinking(): auto-scroll thinking card body to bottom while streaming if the user is watching (scroll pinned)

* Fix empty agent sessions in sidebar

* fix: resolve cron UI UX issues: icon ambiguity, toast overlap, running status

  Fixes #995, which covers three sub-issues in the Cron Jobs UI:

  1. Dual play icons ambiguous: the Resume button now shows a distinct play+bar icon (play triangle + vertical line) instead of the identical triangle used by Run now.
  2. Toast notification overlapping header buttons: added position:relative; z-index:10 to .main-view-header so it stacks above the fixed toast (z-index:100 within its layer).
  3. No running status after trigger: after triggering a job, the status badge immediately shows 'running…' with a CSS spinner animation, and polls the cron list every 3s (up to 30s) to refresh when the job completes.

  - Added cron_status_running i18n key in all 6 locales (en, es, de, ru, zh, zh-Hant)
  - Added .detail-badge.running CSS class with spinner animation
  - New functions: _setCronDetailStatus(), _startCronRunningPoll()

* fix(#1011): address review feedback: poll cleanup, badge persistence, 30s fallback

  - _clearCronDetail() now clears the _cronRunningPoll interval on navigation
  - Poll re-applies the 'running' badge after loadCrons() re-render (prevents flicker)
  - When the poll ends (30s max), the detail re-renders with the actual status as a fallback

* feat: create folder and add space directly from UI (#782)

  - After creating a folder via the file tree New folder button, offer to add it as a space via a confirm dialog
  - Add a "Create folder if it doesn't exist" checkbox in the New Space form
  - Backend: support a create flag in /api/workspaces/add to mkdir before validation
  - i18n: 4 new keys (folder_add_as_space_title/msg/btn, workspace_auto_create_folder) in all 6 locales

* fix: validate workspace path before mkdir to prevent orphan directories

  Review feedback (critical): the previous code called mkdir() before validate_workspace_to_add(), which meant a rejected path (e.g. a system dir) would leave an orphan directory on disk.

  New flow:

  1. Resolve the path and check it against blocked system roots BEFORE any mutation
  2. mkdir() only if the path passes the blocklist check
  3. Full validation (exists, is_dir) after mkdir

  Also imports _workspace_blocked_roots for the pre-mutation blocklist check.

* fix(#1014): classify model-not-found errors with helpful message

  - Add model_not_found error type to the streaming.py exception classifier
  - Detect 404, 'not found', 'does not exist', 'invalid model' patterns
  - Strip HTML tags from provider error messages (nginx 404 pages, etc.)
  - Add model_not_found branch to the apperror handler in messages.js
  - Add i18n key model_not_found_label in all 6 locales
  - 15 tests covering detection, sanitization, frontend, and i18n

* feat(ui): add live TPS stat to header

  Adds a TPS (Tokens Per Second) chip to the right of the header title bar that updates live while AI output is streaming.

  Metering (api/metering.py)
  - Tracks per-session output + reasoning tokens via a GlobalMeter singleton
  - Per-session TPS = total_tokens / elapsed_time
  - Global TPS = average of active sessions' TPS values
  - HIGH/LOW are the max/min of global_tps snapshots over a 60-minute rolling window (only recorded when > 0, so idle periods are excluded)
  - Thread-safe with a single lock

  Metering events emitted from streaming.py
  - Throttled at 100ms from token/reasoning/tool callbacks so the display updates rapidly during fast token streams
  - 1Hz ticker as a fallback for slow streams (exits when there are no active sessions)
  - Final stats emitted on stream end

  Routes (api/routes.py)
  - Removed the POST /api/metering/interval endpoint (dynamic interval via focus/blur was replaced with a simple always-1s-when-active approach)

  UI (static/messages.js, index.html, style.css)
  - TPS chip in titlebar: shows 'N.N t/s . N.N high . N.N low'
  - Default: '0.0 t/s . 0.0 high' when idle
  - Display updates on every metering SSE event (throttled to 100ms)

* feat: session restore speed + title gen reasoning hardening (#1025, #1026)

  PR #1025 (@franksong2702): Speed up large session restore paths
  - GET /api/session?messages=0 now parses only the metadata before the messages array
  - Metadata-only loads no longer populate the full-session LRU cache
  - Frontend lazy fetch uses resolve_model=0 to avoid a cold model-catalog lookup
  - Hard reload no longer waits for populateModelDropdown() before restoring the session

  PR #1026 (@franksong2702): Harden auto title generation for reasoning models
  - Raises the title-gen completion budget to 512 tokens (reasoning-safe)
  - Retries once with 1024 tokens on empty content / finish_reason:length
  - Applies the retry to both the auxiliary and active-agent fallback routes
  - Preserves the underlying failure reason in title_status on local fallback

  Co-authored-by: Frank Song <franksong2702@gmail.com>

* feat: session attention indicators in right slot + last_message_at timestamps (#1024)

  PR #1024 (@franksong2702): Polish session attention indicators
  - Streaming spinners and unread dots now reuse the right-side actions slot
  - Running/unread rows hide timestamps; idle/read rows keep right-aligned timestamps
  - Date group carets point down when expanded, right when collapsed
  - Pinned group no longer repeats the pinned-star icon per row
  - Running indicators appear immediately after send (local busy state while /api/sessions catches up)
  - Sidebar sorting/grouping/timestamps now prefer last_message_at (derived from the last real message) so metadata-only saves don't make old sessions appear under Today

  Co-authored-by: Frank Song <franksong2702@gmail.com>

* docs: v0.50.207 release notes: 10 PRs, 2169 tests (+36)

---------

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
Co-authored-by: Josh <josh@fyul.link>
Co-authored-by: Frank Song <franksong2702@gmail.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-05-25 11:10:18 +00:00 · 2026-04-25 13:07:35 -07:00
parent 12a8c051fb
commit ad8e10304c
24 changed files with 1655 additions and 261 deletions
@@ -3,7 +3,24 @@
 ## [Unreleased]

 ### Fixed
- **Reasoning chip now appears after the model chip** in the composer toolbar — model is a more fundamental choice and should be stable in position regardless of whether reasoning is active. Order: Profile → Workspace → Model → Reasoning. (`static/index.html`)
+
+## v0.50.207 — 2026-04-25
+
+### Added
+- **Live TPS stat in header** — a monospace chip in the titlebar shows tokens per second during streaming, with HIGH watermark from the past hour. Emitted via SSE at 1 Hz during active streams; hidden when idle. (`api/metering.py`, `api/streaming.py`, `static/messages.js`, `static/style.css`) [#1005 @JKJameson]
+
+### Fixed
+- **Stale SSE events no longer pollute the new session's DOM on session switch** — `appendThinking()` and `appendLiveToolCard()` now guard against events from a prior session's stream arriving after the user has switched sessions. Thinking card also auto-scrolls to top on completion so the response is immediately visible. (`static/ui.js`) [#1006 @JKJameson]
+- **Show agent sessions no longer shows empty/unimportable rows** — `state.db` can contain agent session rows before any messages are written. The sidebar now filters those out consistently across both the regular `/api/sessions` path and the gateway SSE watcher. (`api/agent_sessions.py`, `api/gateway_watcher.py`, `api/models.py`) [#1009 @franksong2702]
+- **Three orphaned i18n keys removed from language dropdown** — `cmd_status`, `memory_saved`, and `profile_delete_title` were placed outside any locale block in `static/i18n.js`, causing them to appear as invalid language options. (`static/i18n.js`) [#1010 @bergeouss]
+- **Cron panel UX polish** — Resume button SVG now uses a ▶| icon to distinguish it from Run; toast overlap fixed with `z-index` on the header; running-state badge with spinner shows during active jobs; `_cronRunningPoll` clears correctly on panel close. (`static/panels.js`, `static/index.html`, `static/style.css`, `static/i18n.js`) [#1011 @bergeouss]
+- **Create Folder and Add as Space from the browser** — users can now create directories and immediately register them as workspace spaces without SSH access; server validates paths against blocked roots before `mkdir`. (`api/routes.py`, `static/ui.js`, `static/panels.js`, `static/i18n.js`) [#1018 @bergeouss]
+- **Model-not-found errors now show a helpful message** — when a provider returns a 404 (e.g. Qwen model not available), the error is classified and a user-friendly hint appears instead of a raw HTML page. All 6 locales covered. (`api/streaming.py`, `static/messages.js`, `static/i18n.js`) [#1022 @bergeouss]
+- **Session attention indicators moved to right-side actions slot** — streaming spinners and unread dots no longer sit before the session title, avoiding title shifts. Running/unread rows hide the timestamp; idle/read rows keep right-aligned timestamps. Date group carets now point down/right correctly. Pinned group no longer repeats the star icon per row. (`static/sessions.js`, `static/style.css`) [#1024 @franksong2702]
+- **Session sidebar dates now use the last real message time** — sorting, grouping, and relative timestamps prefer `last_message_at` derived from the last non-tool message instead of metadata-only `updated_at`, so changing session settings doesn't move old conversations to Today. (`api/models.py`, `api/routes.py`) [#1024 @franksong2702]
+- **Running indicators appear immediately after send** — the sidebar now treats the active local busy session and local in-flight sessions as streaming while `/api/sessions` catches up. (`static/messages.js`, `static/sessions.js`) [#1024 @franksong2702]
+- **Large session switching and reload no longer block on cold model-catalog resolution** — `GET /api/session?messages=0` now parses only the JSON metadata prefix; metadata-only loads skip the full-session LRU cache; the frontend lazy fetch passes `resolve_model=0`; hard reload no longer waits for `populateModelDropdown()`. (`api/models.py`, `api/routes.py`, `static/boot.js`, `static/sessions.js`, `static/ui.js`) [#1025 @franksong2702]
+- **Auto title generation hardened for reasoning models** — title generation now uses a 512-token reasoning-safe budget, retries once with 1024 tokens on empty content or `finish_reason: length`, and preserves the underlying failure reason in `title_status` when falling back to a local summary. (`api/streaming.py`) [#1026 @franksong2702]

 ## v0.50.206 — 2026-04-25

@@ -8,7 +8,7 @@
 > Prerequisites: SSH tunnel is active on port 8787. Open http://localhost:8787 in browser.
 > Server health check: curl http://127.0.0.1:8787/health should return {"status":"ok"}.
 >
-> Automated coverage: 2107 tests collected via `pytest tests/ --collect-only -q`. Includes onboarding coverage for bootstrap/static wizard presence, real provider config persistence (`config.yaml` + `.env`), the `/api/onboarding/*` backend, the onboarding skip/existing-config guard, and CSS regression coverage for smooth thinking/tool card disclosure animation.
+> Automated coverage: 2169 tests collected via `pytest tests/ --collect-only -q`. Includes onboarding coverage for bootstrap/static wizard presence, real provider config persistence (`config.yaml` + `.env`), the `/api/onboarding/*` backend, the onboarding skip/existing-config guard, and CSS regression coverage for smooth thinking/tool card disclosure animation.
 > Run: `pytest tests/ -v --timeout=60`
 >
 > Local regression focus: verify that a previously closed workspace panel stays visually closed from first paint through boot completion on desktop refresh; there should be no brief open-then-close flash.
@@ -0,0 +1,55 @@
+"""Shared helpers for reading Hermes Agent sessions from state.db."""
+import logging
+import sqlite3
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+def read_importable_agent_session_rows(db_path: Path, limit: int = 200, log=None) -> list[dict]:
+    """Return non-WebUI agent sessions that have readable message rows.
+
+    Hermes Agent can create rows in ``state.db.sessions`` before a session has
+    any messages. WebUI cannot import those rows, so both the regular
+    ``/api/sessions`` path and the gateway SSE watcher must filter them the
+    same way.
+    """
+    db_path = Path(db_path)
+    if not db_path.exists():
+        return []
+
+    log = log or logger
+    with sqlite3.connect(str(db_path)) as conn:
+        conn.row_factory = sqlite3.Row
+        cur = conn.cursor()
+
+        # Older Hermes Agent versions may not have source tracking. Without a
+        # source column we cannot safely distinguish WebUI rows from agent rows.
+        cur.execute("PRAGMA table_info(sessions)")
+        session_cols = {row[1] for row in cur.fetchall()}
+        if 'source' not in session_cols:
+            log.warning(
+                "agent session listing skipped: state.db at %s has no 'source' column "
+                "(older hermes-agent?). Agent sessions unavailable. "
+                "Upgrade hermes-agent to fix this.",
+                db_path,
+            )
+            return []
+
+        cur.execute(
+            """
+            SELECT s.id, s.title, s.model, s.message_count,
+                   s.started_at, s.source,
+                   COUNT(m.id) AS actual_message_count,
+                   MAX(m.timestamp) AS last_activity
+            FROM sessions s
+            LEFT JOIN messages m ON m.session_id = s.id
+            WHERE s.source IS NOT NULL AND s.source != 'webui'
+            GROUP BY s.id
+            HAVING COUNT(m.id) > 0
+            ORDER BY COALESCE(MAX(m.timestamp), s.started_at) DESC
+            LIMIT ?
+            """,
+            (int(limit),),
+        )
+        return [dict(row) for row in cur.fetchall()]
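
The filtering query in `read_importable_agent_session_rows()` can be tried against a throwaway database. The following is a minimal sketch, not the project's test code: the two-table schema and the `importable_rows` helper name are assumptions made for illustration, and only the columns the query needs are created.

```python
import sqlite3


def importable_rows(conn, limit=200):
    """Hypothetical stand-in for the helper above, reduced to the core query."""
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        """
        SELECT s.id, s.source,
               COUNT(m.id) AS actual_message_count,
               MAX(m.timestamp) AS last_activity
        FROM sessions s
        LEFT JOIN messages m ON m.session_id = s.id
        WHERE s.source IS NOT NULL AND s.source != 'webui'
        GROUP BY s.id
        HAVING COUNT(m.id) > 0
        ORDER BY COALESCE(MAX(m.timestamp), s.started_at) DESC
        LIMIT ?
        """,
        (int(limit),),
    )
    return [dict(row) for row in cur.fetchall()]


# Assumed minimal schema: one agent session without messages, one with
# two messages, and one WebUI session that must be excluded by source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (id TEXT PRIMARY KEY, source TEXT, started_at REAL);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, timestamp REAL);
    INSERT INTO sessions VALUES ('empty', 'cli', 10.0);
    INSERT INTO sessions VALUES ('chatty', 'cli', 5.0);
    INSERT INTO sessions VALUES ('web', 'webui', 20.0);
    INSERT INTO messages (session_id, timestamp)
        VALUES ('chatty', 6.0), ('chatty', 7.0), ('web', 21.0);
    """
)
rows = importable_rows(conn)
print(rows)
```

Only the `chatty` row survives: `empty` is dropped by `HAVING COUNT(m.id) > 0` and `web` by the `source` filter.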
@@ -13,12 +13,12 @@ import json
 import logging
 import os
 import queue
-import sqlite3
 import threading
 import time
 from pathlib import Path

 from api.config import HOME
+from api.agent_sessions import read_importable_agent_session_rows

 logger = logging.getLogger(__name__)

@@ -55,33 +55,18 @@ def _get_agent_sessions_from_db() -> list:
        return []

    try:
-        with sqlite3.connect(str(db_path)) as conn:
-            conn.row_factory = sqlite3.Row
-            cur = conn.cursor()
-            cur.execute("""
-                SELECT s.id, s.title, s.model, s.message_count,
-                       s.started_at, s.source,
-                       MAX(m.timestamp) AS last_activity
-                FROM sessions s
-                LEFT JOIN messages m ON m.session_id = s.id
-                WHERE s.source IS NOT NULL AND s.source != 'webui'
-                GROUP BY s.id
-                HAVING COUNT(m.id) > 0
-                ORDER BY COALESCE(MAX(m.timestamp), s.started_at) DESC
-                LIMIT 200
-            """)
-            sessions = []
-            for row in cur.fetchall():
-                sessions.append({
-                    'session_id': row['id'],
-                    'title': row['title'] or 'Agent Session',
-                    'model': row['model'] or None,
-                    'message_count': row['message_count'] or 0,
-                    'created_at': row['started_at'],
-                    'updated_at': row['last_activity'] or row['started_at'],
-                    'source': row['source'] or 'cli',
-                })
-            return sessions
+        sessions = []
+        for row in read_importable_agent_session_rows(db_path, limit=200, log=logger):
+            sessions.append({
+                'session_id': row['id'],
+                'title': row['title'] or 'Agent Session',
+                'model': row['model'] or None,
+                'message_count': row['message_count'] or row['actual_message_count'] or 0,
+                'created_at': row['started_at'],
+                'updated_at': row['last_activity'] or row['started_at'],
+                'source': row['source'] or 'cli',
+            })
+        return sessions
    except Exception:
        return []

@@ -0,0 +1,187 @@
+"""
+Hermes Web UI -- Streaming performance metering.
+
+Tracks Tokens Per Second (TPS) across all active WebUI sessions, and the
+HIGH/LOW TPS values observed over the past 60 minutes.  Metering data is
+emitted via SSE events so the header label can update live during a stream.
+
+Architecture
+────────────
+Each streaming session is tracked independently.  TPS per session is:
+
+    session_tps = total_tokens / (last_token_ts - first_token_ts)
+
+The global tps is the average of all currently active sessions' TPS values.
+This correctly represents the system's real-time capacity regardless of how
+many sessions are running or how long each has been streaming.
+
+For HIGH/LOW tracking, every stats snapshot records the current global tps
+(only when > 0 — idle periods are skipped) into a rolling 60-minute history.
+The max and min of that history give the highest and lowest throughput over the past hour.
+
+The ticker in streaming.py calls get_interval() — it returns 1.0 when sessions
+are actively receiving tokens so the header updates at 1 Hz, and 10.0 when idle
+so the ticker exits and no idle readings are emitted.
+
+Usage from api/streaming.py
+─────────────────────────────
+  from api.metering import meter
+
+  meter().begin_session(stream_id)                     # stream starts
+  meter().record_token(stream_id, running_output)     # per output token
+  meter().record_reasoning(stream_id, running_reasoning_len)  # per reasoning token
+
+The SSE `metering` event payload:
+  {
+    "tps": 47.3,    # average TPS across active sessions (real-time)
+    "high": 52.1,   # highest average TPS observed in the past 60 minutes
+    "low":  31.4,   # lowest average TPS (excl. readings < 1 tps, to ignore idle)
+    "active": 1,    # sessions currently streaming
+  }
+"""
+
+from __future__ import annotations
+
+import threading
+import time
+from dataclasses import dataclass
+
+_HOUR_SECS = 3600.0   # rolling window for HIGH/LOW tracking
+_STALE_SECS = 60.0    # consider a session inactive after this
+
+
+@dataclass
+class _SessionMeter:
+    output_tokens: int = 0
+    reasoning_tokens: int = 0
+    first_token_ts: float = 0.0   # time.monotonic() of first token received
+    last_token_ts: float = 0.0    # time.monotonic() of last token received
+
+    def total_tokens(self) -> int:
+        return self.output_tokens + self.reasoning_tokens
+
+    def tps(self) -> float:
+        if self.first_token_ts == 0.0 or self.last_token_ts <= self.first_token_ts:
+            return 0.0
+        return self.total_tokens() / (self.last_token_ts - self.first_token_ts)
+
+
+class GlobalMeter:
+    """Thread-safe global streaming meter.
+
+    Tracks per-session TPS, averages them for a global tps, and maintains a
+    60-minute rolling history of global tps snapshots for HIGH/LOW reporting.
+    """
+
+    __slots__ = (
+        '_lock',
+        '_sessions',        # stream_id -> _SessionMeter
+        '_readings',        # [(monotonic_ts, tps), ...] rolling 60-minute history
+        '_window_start',    # monotonic ts of current window
+    )
+
+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self._sessions: dict[str, _SessionMeter] = {}
+        self._readings: list[tuple[float, float]] = []
+        self._window_start: float = time.monotonic()
+
+    # ── Public API ────────────────────────────────────────────────────────────
+
+    def begin_session(self, stream_id: str) -> None:
+        with self._lock:
+            self._sessions[stream_id] = _SessionMeter()
+
+    def get_interval(self) -> float:
+        """Return 1.0 when sessions are actively receiving tokens, 10.0 when idle.
+
+        Used by the streaming ticker to run at 1 Hz during work and exit when
+        there is nothing to measure.
+        """
+        now = time.monotonic()
+        with self._lock:
+            # Only count sessions that have received at least one token recently.
+            active_sids = {
+                sid for sid, s in self._sessions.items()
+                if s.first_token_ts > 0 and (now - s.last_token_ts) <= _STALE_SECS
+            }
+            return 1.0 if active_sids else 10.0
+
+    def record_token(self, stream_id: str, running_output_tokens: int) -> None:
+        now = time.monotonic()
+        with self._lock:
+            s = self._sessions.get(stream_id)
+            if s is None:
+                return
+            if s.first_token_ts == 0.0:
+                s.first_token_ts = now
+            s.last_token_ts = now
+            s.output_tokens = running_output_tokens
+
+    def record_reasoning(self, stream_id: str, running_reasoning_tokens: int) -> None:
+        now = time.monotonic()
+        with self._lock:
+            s = self._sessions.get(stream_id)
+            if s is None:
+                return
+            if s.first_token_ts == 0.0:
+                s.first_token_ts = now
+            s.last_token_ts = now
+            s.reasoning_tokens = running_reasoning_tokens
+
+    def end_session(self, stream_id: str, final_output_tokens: int, input_tokens: int = 0) -> None:
+        with self._lock:
+            self._sessions.pop(stream_id, None)
+
+    def get_stats(self) -> dict:
+        now = time.monotonic()
+        with self._lock:
+            # Prune stale sessions
+            stale = [
+                sid for sid, s in self._sessions.items()
+                if s.first_token_ts > 0 and (now - s.last_token_ts) > _STALE_SECS
+            ]
+            for sid in stale:
+                self._sessions.pop(sid, None)
+
+            # Reset window if everything went stale
+            if not self._sessions:
+                self._window_start = now
+
+            # Compute global tps: average of per-session TPS values
+            active = [s for s in self._sessions.values() if s.first_token_ts > 0]
+            if active:
+                global_tps = sum(s.tps() for s in active) / len(active)
+            else:
+                global_tps = 0.0
+
+            # Prune readings older than 1 hour
+            cutoff = now - _HOUR_SECS
+            self._readings = [(ts, v) for ts, v in self._readings if ts > cutoff]
+
+            # Only record this snapshot for HIGH/LOW if there is active work.
+            # This prevents idle periods from flooding the history and keeps
+            # HIGH/LOW meaningful for the past hour of actual throughput.
+            if global_tps > 0:
+                self._readings.append((now, global_tps))
+
+            # HIGH/LOW from the past hour (skip near-zero idle readings)
+            active_readings = [v for _, v in self._readings if v >= 1.0]
+            high = max(active_readings) if active_readings else 0.0
+            low = min(active_readings) if active_readings else 0.0
+
+            return {
+                'tps': round(global_tps, 1),
+                'high': round(high, 1),
+                'low': round(low, 1),
+                'active': len(self._sessions),
+            }
+
+
+# ── Module-level singleton ─────────────────────────────────────────────────────
+
+_meter = GlobalMeter()
+
+
+def meter() -> GlobalMeter:
+    return _meter
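
The arithmetic behind the `metering` payload can be worked through without threads or real clocks. This is a simplified sketch of the calculation described in the module docstring; `session_tps` and `snapshot` are hypothetical names, timestamps are passed in explicitly instead of read from `time.monotonic()`, and locking and stale-session pruning are left out.

```python
def session_tps(total_tokens, first_ts, last_ts):
    """Tokens per second between the first and last token of one stream."""
    if first_ts == 0.0 or last_ts <= first_ts:
        return 0.0
    return total_tokens / (last_ts - first_ts)


def snapshot(session_rates, history, now, window=3600.0):
    """Average the per-session rates and update the rolling HIGH/LOW history."""
    global_tps = sum(session_rates) / len(session_rates) if session_rates else 0.0
    # Drop readings that have fallen out of the rolling window.
    history[:] = [(ts, v) for ts, v in history if ts > now - window]
    # Idle snapshots (0 t/s) are never recorded.
    if global_tps > 0:
        history.append((now, global_tps))
    # Readings below 1 t/s are ignored when computing HIGH/LOW.
    usable = [v for _, v in history if v >= 1.0]
    return {
        "tps": round(global_tps, 1),
        "high": round(max(usable), 1) if usable else 0.0,
        "low": round(min(usable), 1) if usable else 0.0,
    }


rates = [
    session_tps(120, first_ts=100.0, last_ts=102.0),  # 60.0 t/s
    session_tps(30, first_ts=100.0, last_ts=103.0),   # 10.0 t/s
]
history = [(0.0, 99.0), (3900.0, 20.0)]  # the first reading is over an hour old
stats = snapshot(rates, history, now=4000.0)
print(stats)
```

With these sample numbers the global rate is the mean of 60.0 and 10.0, the 99.0 reading is pruned as too old, and the result is `tps` 35.0, `high` 35.0, `low` 20.0.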
@@ -15,6 +15,7 @@ from api.config import (
    get_effective_default_model,
 )
 from api.workspace import get_last_workspace
+from api.agent_sessions import read_importable_agent_session_rows

 logger = logging.getLogger(__name__)

@@ -193,6 +194,114 @@ def _active_stream_ids():
 def _is_streaming_session(active_stream_id, active_stream_ids):
    return bool(active_stream_id and active_stream_id in active_stream_ids)

+def _session_sort_timestamp(session):
+    if isinstance(session, dict):
+        return session.get('last_message_at') or session.get('updated_at') or 0
+    return _last_message_timestamp(getattr(session, 'messages', None)) or getattr(session, 'updated_at', 0) or 0
+
+
+def _message_timestamp(message):
+    if not isinstance(message, dict):
+        return None
+    raw = message.get('_ts') or message.get('timestamp')
+    try:
+        return float(raw) if raw is not None else None
+    except (TypeError, ValueError):
+        return None
+
+
+def _last_message_timestamp(messages):
+    if not isinstance(messages, list):
+        return None
+    for message in reversed(messages):
+        if isinstance(message, dict) and message.get('role') == 'tool':
+            continue
+        ts = _message_timestamp(message)
+        if ts:
+            return ts
+    return None
+
+
+def _find_top_level_json_key(text, key):
+    """Return the character offset of a top-level JSON object key, if present."""
+    depth = 0
+    i = 0
+    n = len(text)
+    while i < n:
+        ch = text[i]
+        if ch == '"':
+            start = i
+            i += 1
+            escaped = False
+            chars = []
+            while i < n:
+                c = text[i]
+                if escaped:
+                    chars.append(c)
+                    escaped = False
+                elif c == '\\':
+                    escaped = True
+                elif c == '"':
+                    break
+                else:
+                    chars.append(c)
+                i += 1
+            if i >= n:
+                return None
+            if depth == 1 and ''.join(chars) == key:
+                j = i + 1
+                while j < n and text[j] in ' \t\r\n':
+                    j += 1
+                if j < n and text[j] == ':':
+                    return start
+        elif ch in '{[':
+            depth += 1
+        elif ch in '}]':
+            depth -= 1
+        i += 1
+    return None
+
+
+def _read_metadata_json_prefix(path, max_prefix_bytes=65536):
+    """Read only the metadata portion before the top-level messages array."""
+    buf = ''
+    with open(path, 'r', encoding='utf-8') as f:
+        while len(buf.encode('utf-8')) < max_prefix_bytes:
+            chunk = f.read(4096)
+            if not chunk:
+                return None
+            buf += chunk
+            messages_pos = _find_top_level_json_key(buf, 'messages')
+            if messages_pos is None:
+                continue
+            prefix = buf[:messages_pos].rstrip()
+            if prefix.endswith(','):
+                prefix = prefix[:-1].rstrip()
+            return f'{prefix}\n}}'
+    return None
+
+
+def _lookup_index_message_count(session_id):
+    """Return the indexed message count without loading the full session file."""
+    try:
+        entries = json.loads(SESSION_INDEX_FILE.read_text(encoding='utf-8'))
+    except Exception:
+        return None
+    if not isinstance(entries, list):
+        return None
+    for entry in entries:
+        if entry.get('session_id') != session_id:
+            continue
+        count = entry.get('message_count')
+        if isinstance(count, int) and count >= 0:
+            return count
+        try:
+            count = int(count)
+        except (TypeError, ValueError):
+            return None
+        return count if count >= 0 else None
+    return None
+

 class Session:
    def __init__(self, session_id: str=None, title: str='Untitled',
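
The metadata-prefix idea used by `_read_metadata_json_prefix()` can be illustrated on an in-memory string. The sketch below is a simplified reimplementation under stated assumptions: `find_top_level_key` is a hypothetical helper that compares the raw (still escaped) key text, and the sample document is invented. It shows why the scan tracks nesting depth and string boundaries instead of searching for the substring `"messages"`.

```python
import json


def find_top_level_key(text, key):
    """Return the character offset of `"key":` at object depth 1, else None."""
    depth = 0
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            start = i
            i += 1
            # Walk to the closing quote, skipping escaped characters.
            while i < n and text[i] != '"':
                i += 2 if text[i] == '\\' else 1
            if i >= n:
                return None  # unterminated string: the caller needs more input
            if depth == 1 and text[start + 1:i] == key:
                j = i + 1
                while j < n and text[j] in ' \t\r\n':
                    j += 1
                if j < n and text[j] == ':':
                    return start
        elif ch in '{[':
            depth += 1
        elif ch in '}]':
            depth -= 1
        i += 1
    return None


# Invented sample: the word "messages" also appears inside a string value
# and as a nested key, and neither occurrence should match.
doc = json.dumps(
    {
        "session_id": "abc",
        "title": 'Notes on "messages"',
        "nested": {"messages": "decoy"},
        "messages": [{"role": "user", "content": "hi"}],
    },
    indent=2,
)
pos = find_top_level_key(doc, "messages")
prefix = doc[:pos].rstrip()
if prefix.endswith(","):
    prefix = prefix[:-1].rstrip()
meta = json.loads(prefix + "\n}")
print(meta)
```

The synthesized object contains every field written before the messages array and nothing after it, which is why this approach depends on `save()` writing metadata first.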
@@ -231,6 +340,7 @@ class Session:
        self.pending_started_at = pending_started_at
        self.compression_anchor_visible_idx = compression_anchor_visible_idx
        self.compression_anchor_message_key = compression_anchor_message_key
+        self._metadata_message_count = None

    @property
    def path(self):
@@ -255,7 +365,8 @@ class Session:
        meta['tool_calls'] = self.tool_calls
        # Fields not in METADATA_FIELDS (e.g. last_usage, message_count) go at the end
        extra = {k: v for k, v in self.__dict__.items()
-                 if k not in METADATA_FIELDS and k not in ('messages', 'tool_calls')}
+                 if k not in METADATA_FIELDS and k not in ('messages', 'tool_calls')
+                 and not k.startswith('_')}
        payload = json.dumps({**meta, **extra}, ensure_ascii=False, indent=2)
        tmp = self.path.with_suffix(f'.tmp.{os.getpid()}.{threading.current_thread().ident}')
        try:
@@ -288,10 +399,9 @@ class Session:
        """Load only the compact metadata fields, skipping the messages array.

        Session JSON files have metadata fields (session_id, title, model, etc.)
-        at the top level, before the large messages array. We read only the
-        first ~1KB — enough to capture all compact() fields — then parse just
-        that prefix. Falls back to load() if the prefix doesn't contain enough
-        fields or if the file is unexpectedly small.
+        at the top level, before the large messages array. Read only up to the
+        top-level "messages" field and synthesize a small metadata-only object.
+        Falls back to load() for legacy or unexpected file layouts.
        """
        if not sid or not all(c in '0123456789abcdefghijklmnopqrstuvwxyz_' for c in sid):
            return None
@@ -299,26 +409,18 @@ class Session:
        if not p.exists():
            return None
        try:
-            # Read just the first 1 KB — metadata comes before messages array
-            with open(p, 'r', encoding='utf-8') as f:
-                prefix = f.read(1024)
+            prefix = _read_metadata_json_prefix(p)
            if not prefix:
                return cls.load(sid)
            parsed = json.loads(prefix)
-            # Verify we got the essential fields.
-            # With metadata-first save() ordering, messages appears at byte ~567.
-            # For sessions <= ~512 bytes total the entire messages array fits in the
-            # first 1 KB and we get a valid list. For larger sessions json.loads
-            # fails on the truncated buffer (unterminated string), so we fall back
-            # to full load. The one exception is a truncation inside a string value
-            # that happens to produce valid JSON with a truncated string — guard
-            # against that by requiring messages to be a list.
            needed = {'session_id', 'title', 'created_at', 'updated_at'}
            if not needed.issubset(parsed.keys()):
                return cls.load(sid)
-            if not isinstance(parsed.get('messages'), list):
-                return cls.load(sid)
-            return cls(**parsed)
+            parsed['messages'] = []
+            parsed['tool_calls'] = []
+            session = cls(**parsed)
+            session._metadata_message_count = _lookup_index_message_count(sid)
+            return session
        except Exception:
            # Corrupt prefix or decode error — fall back to full load
            return cls.load(sid)
@@ -330,9 +432,14 @@ class Session:
            'title': self.title,
            'workspace': self.workspace,
            'model': self.model,
-            'message_count': len(self.messages),
+            'message_count': (
+                self._metadata_message_count
+                if self._metadata_message_count is not None
+                else len(self.messages)
+            ),
            'created_at': self.created_at,
            'updated_at': self.updated_at,
+            'last_message_at': _last_message_timestamp(self.messages) or self.updated_at,
            'pinned': self.pinned,
            'archived': self.archived,
            'project_id': self.project_id,
@@ -352,9 +459,10 @@ class Session:
 def get_session(sid, metadata_only=False):
    """Load a session, optionally with metadata only (skipping the messages array).

-    When metadata_only=True the session is still cached so the full load on the
-    next access is fast. Use this when you only need compact() metadata and not
-    the actual message history (e.g., for fast sidebar switching).
+    Metadata-only loads intentionally do not populate the full-session cache.
+    Otherwise a later full load could return a compact object with an empty
+    messages list. Use this when you only need compact() metadata and not the
+    actual message history (e.g., for fast sidebar switching).
    """
    with LOCK:
        if sid in SESSIONS:
@@ -362,6 +470,8 @@ def get_session(sid, metadata_only=False):
            return SESSIONS[sid]
    if metadata_only:
        s = Session.load_metadata_only(sid)
+        if s:
+            return s
    else:
        s = Session.load(sid)
    if s:
@@ -413,6 +523,18 @@ def all_sessions():
                s for s in index
                if _index_entry_exists(s.get('session_id'))
            ]
+            backfilled = []
+            for i, s in enumerate(index):
+                if 'last_message_at' not in s:
+                    full = Session.load(s.get('session_id'))
+                    if full:
+                        index[i] = full.compact()
+                        backfilled.append(full)
+            if backfilled:
+                try:
+                    _write_session_index(updates=backfilled)
+                except Exception:
+                    logger.debug("Failed to persist last_message_at backfill")
            for s in index:
                s['is_streaming'] = _is_streaming_session(
                    s.get('active_stream_id'),
@@ -426,7 +548,7 @@ def all_sessions():
                        include_runtime=True,
                        active_stream_ids=active_stream_ids,
                    )
-            result = sorted(index_map.values(), key=lambda s: (s.get('pinned', False), s['updated_at']), reverse=True)
+            result = sorted(index_map.values(), key=lambda s: (s.get('pinned', False), _session_sort_timestamp(s)), reverse=True)
            # Hide empty Untitled sessions from the UI (created by tests, page refreshes, etc.)
            # Exempt sessions younger than 60 s so a brand-new session stays visible (#789)
            _now = time.time()
@@ -454,7 +576,7 @@ def all_sessions():
            logger.debug("Failed to load session from %s", p)
    for s in SESSIONS.values():
        if all(s.session_id != x.session_id for x in out): out.append(s)
-    out.sort(key=lambda s: (getattr(s, 'pinned', False), s.updated_at), reverse=True)
+    out.sort(key=lambda s: (getattr(s, 'pinned', False), _session_sort_timestamp(s)), reverse=True)
    _now = time.time()
    result = [s.compact(include_runtime=True, active_stream_ids=active_stream_ids) for s in out if not (
        s.title == 'Untitled'
@@ -528,16 +650,11 @@ def get_cli_sessions() -> list:
    """Read CLI sessions from the agent's SQLite store and return them as
    dicts in a format the WebUI sidebar can render alongside local sessions.

-    Returns empty list if the SQLite DB is missing, the sqlite3 module is
-    unavailable, or any error occurs -- the bridge is purely additive and never
-    crashes the WebUI.
+    Returns empty list if the SQLite DB is missing or any error occurs -- the
+    bridge is purely additive and never crashes the WebUI.
    """
    import os
    cli_sessions = []
-    try:
-        import sqlite3
-    except ImportError:
-        return cli_sessions

    # Use the active WebUI profile's HERMES_HOME to find state.db.
    # The active profile is determined by what the user has selected in the UI
@@ -566,59 +683,30 @@ def get_cli_sessions() -> list:
        _cli_profile = None  # older agent -- fall back to no profile

    try:
-        with sqlite3.connect(str(db_path)) as conn:
-            conn.row_factory = sqlite3.Row
-            cur = conn.cursor()
-            # Introspect schema to handle older hermes-agent versions that
-            # may not have a 'source' column. Without this check the query raises
-            # OperationalError which is silently swallowed, causing the empty-list bug.
-            cur.execute("PRAGMA table_info(sessions)")
-            _session_cols = {row[1] for row in cur.fetchall()}
-            if 'source' not in _session_cols:
-                import logging as _logging
-                _logging.getLogger(__name__).warning(
-                    "get_cli_sessions(): state.db at %s has no 'source' column "
-                    "(older hermes-agent?). CLI sessions unavailable. "
-                    "Upgrade hermes-agent to fix this.",
-                    db_path,
-                )
-                return cli_sessions
+        for row in read_importable_agent_session_rows(db_path, limit=200, log=logger):
+            sid = row['id']
+            raw_ts = row['last_activity'] or row['started_at']
+            # The CLI DB has no profile column, so tag every row with the
+            # active CLI profile to keep sidebar filtering working.
+            profile = _cli_profile  # None on older agents without profile support

-            cur.execute("""
-                SELECT s.id, s.title, s.model, s.message_count,
-                       s.started_at, s.source,
-                       MAX(m.timestamp) AS last_activity
-                FROM sessions s
-                LEFT JOIN messages m ON m.session_id = s.id
-                WHERE s.source IS NOT NULL AND s.source != 'webui'
-                GROUP BY s.id
-                ORDER BY COALESCE(MAX(m.timestamp), s.started_at) DESC
-                LIMIT 200
-            """)
-            for row in cur.fetchall():
-                sid = row['id']
-                raw_ts = row['last_activity'] or row['started_at']
-                # Prefer the CLI session's own profile from the DB; fall back to
-                # the active CLI profile so sidebar filtering works either way.
-                profile = _cli_profile  # CLI DB has no profile column; use active profile
-
-                _source = row['source'] or 'cli'
-                _display_title = row['title'] or f'{_source.title()} Session'
-                cli_sessions.append({
-                    'session_id': sid,
-                    'title': _display_title,
-                    'workspace': str(get_last_workspace()),
-                    'model': row['model'] or None,
-                    'message_count': row['message_count'] or 0,
-                    'created_at': row['started_at'],
-                    'updated_at': raw_ts,
-                    'pinned': False,
-                    'archived': False,
-                    'project_id': None,
-                    'profile': profile,
-                    'source_tag': _source,
-                    'is_cli_session': True,
-                })
+            _source = row['source'] or 'cli'
+            _display_title = row['title'] or f'{_source.title()} Session'
+            cli_sessions.append({
+                'session_id': sid,
+                'title': _display_title,
+                'workspace': str(get_last_workspace()),
+                'model': row['model'] or None,
+                'message_count': row['message_count'] or row['actual_message_count'] or 0,
+                'created_at': row['started_at'],
+                'updated_at': raw_ts,
+                'pinned': False,
+                'archived': False,
+                'project_id': None,
+                'profile': profile,
+                'source_tag': _source,
+                'is_cli_session': True,
+            })
    except Exception as _cli_err:
        # DB schema changed, locked, or corrupted -- log warning so admins can diagnose.
        # Still degrade gracefully (don't crash the WebUI).
@@ -329,6 +329,7 @@ from api.workspace import (
    safe_resolve_ws,
    resolve_trusted_workspace,
    validate_workspace_to_add,
+    _workspace_blocked_roots,
 )
 from api.upload import handle_upload, handle_transcribe
 from api.streaming import _sse, _run_agent_streaming, cancel_stream
@@ -680,19 +681,26 @@ def handle_get(handler, parsed) -> bool:
        import time as _time
        _t0 = _time.monotonic()
        _debug_slow = os.environ.get("HERMES_DEBUG_SLOW", "")
-        sid = parse_qs(parsed.query).get("session_id", [""])[0]
+        query = parse_qs(parsed.query)
+        sid = query.get("session_id", [""])[0]
        if not sid:
            return j(handler, {"error": "session_id is required"}, status=400)
        # ?messages=0 skips the message payload for fast session switching.
        # The frontend uses this when switching conversations in the sidebar
        # (only needs metadata). The full message array is loaded lazily
        # via ?messages=1 when the message panel opens.
-        load_messages = parse_qs(parsed.query).get("messages", ["1"])[0] != "0"
+        load_messages = query.get("messages", ["1"])[0] != "0"
+        resolve_model_default = "1" if load_messages else "0"
+        resolve_model = query.get("resolve_model", [resolve_model_default])[0] != "0"
        try:
            _t1 = _time.monotonic()
            s = get_session(sid, metadata_only=(not load_messages))
            _t2 = _time.monotonic()
-            effective_model = _resolve_effective_session_model_for_display(s)
+            effective_model = (
+                _resolve_effective_session_model_for_display(s)
+                if resolve_model
+                else None
+            )
            _t3 = _time.monotonic()
            raw = s.compact() | {
                "messages": s.messages if load_messages else [],
@@ -735,6 +743,8 @@ def handle_get(handler, parsed) -> bool:
                    "message_count": len(msgs),
                    "created_at": (cli_meta or {}).get("created_at", 0),
                    "updated_at": (cli_meta or {}).get("updated_at", 0),
+                    "last_message_at": (cli_meta or {}).get("last_message_at")
+                    or (cli_meta or {}).get("updated_at", 0),
                    "pinned": False,
                    "archived": False,
                    "project_id": None,
@@ -783,7 +793,10 @@ def handle_get(handler, parsed) -> bool:
        else:
            deduped_cli = []
        merged = webui_sessions + deduped_cli
-        merged.sort(key=lambda s: s.get("updated_at", 0) or 0, reverse=True)
+        merged.sort(
+            key=lambda s: s.get("last_message_at") or s.get("updated_at", 0) or 0,
+            reverse=True,
+        )
        safe_merged = []
        for s in merged:
            item = dict(s)
@@ -3027,8 +3040,25 @@ def _handle_create_dir(handler, body):
 def _handle_workspace_add(handler, body):
    path_str = body.get("path", "").strip()
    name = body.get("name", "").strip()
+    auto_create = body.get("create", False)
    if not path_str:
        return bad(handler, "path is required")
+    # Validate the path is NOT a blocked system root BEFORE any filesystem mutation.
+    # This prevents creating orphan directories on rejected paths (#782 review).
+    candidate = Path(path_str).expanduser().resolve()
+    for blocked in _workspace_blocked_roots():
+        try:
+            candidate.relative_to(blocked)
+            return bad(handler, f"Path points to a system directory: {candidate}")
+        except ValueError:
+            pass
+    # Now safe to create the directory if requested
+    if auto_create:
+        try:
+            candidate.mkdir(parents=True, exist_ok=True)
+        except OSError as e:  # PermissionError is a subclass of OSError
+            return bad(handler, f"Could not create directory: {_sanitize_error(e)}")
+    # Full validation (exists, is_dir). A missing directory still fails here unless it was created above.
    try:
        p = validate_workspace_to_add(path_str)
    except ValueError as e:
@@ -25,6 +25,7 @@ from api.config import (
    resolve_model_provider,
 )
 from api.helpers import redact_session_data
+from api.metering import meter

 # Global lock for os.environ writes. Per-session locks (_agent_lock) prevent
 # concurrent runs of the SAME session, but two DIFFERENT sessions can still
@@ -292,9 +293,71 @@ def _aux_title_timeout(default: float = 15.0) -> float:
        return default

 def _title_completion_budget(provider: str = '', model: str = '', base_url: str = '') -> int:
-    if _is_minimax_route(provider, model, base_url):
-        return 384
-    return 160
+    # Title generation is a small auxiliary task, but reasoning models may
+    # spend a surprising amount of the completion budget before emitting final
+    # content.  Keep the budget high enough for MiniMax/Kimi-style reasoning
+    # responses without making title generation depend on provider-specific
+    # one-off branches.
+    return 512
+
+
+def _title_retry_completion_budget(provider: str = '', model: str = '', base_url: str = '') -> int:
+    return max(1024, _title_completion_budget(provider, model, base_url) * 2)
+
+
+def _title_retry_status(status: str) -> bool:
+    return status in {
+        'llm_length',
+        'llm_length_aux',
+        'llm_empty_reasoning',
+        'llm_empty_reasoning_aux',
+    }
+
+
+def _safe_obj_value(obj, key: str):
+    if obj is None:
+        return None
+    if isinstance(obj, dict):
+        return obj.get(key)
+    value = getattr(obj, key, None)
+    # Missing MagicMock attrs stringify as mock reprs and look truthy.  Treat
+    # them as absent so tests model real provider objects accurately.
+    if value.__class__.__module__.startswith('unittest.mock'):
+        return None
+    return value
+
+
+def _safe_text_value(value) -> str:
+    if value is None:
+        return ''
+    if value.__class__.__module__.startswith('unittest.mock'):
+        return ''
+    return str(value or '').strip()
+
+
+def _extract_title_response(resp, *, aux: bool = False) -> tuple[str, str]:
+    """Return (content, empty_status) from an OpenAI-compatible response."""
+    suffix = '_aux' if aux else ''
+    try:
+        choices = _safe_obj_value(resp, 'choices') or []
+        choice = choices[0] if choices else None
+        message = _safe_obj_value(choice, 'message')
+        content = _safe_text_value(_safe_obj_value(message, 'content'))
+        if content:
+            return content, ''
+        finish_reason = _safe_text_value(_safe_obj_value(choice, 'finish_reason')).lower()
+        reasoning = (
+            _safe_text_value(_safe_obj_value(message, 'reasoning'))
+            or _safe_text_value(_safe_obj_value(message, 'reasoning_content'))
+            or _safe_text_value(_safe_obj_value(message, 'thinking'))
+        )
+        if finish_reason == 'length':
+            return '', f'llm_length{suffix}'
+        if reasoning:
+            return '', f'llm_empty_reasoning{suffix}'
+        return '', f'llm_empty{suffix}'
+    except Exception:
+        return '', f'llm_empty{suffix}'


 def generate_title_raw_via_aux(
@@ -308,41 +371,43 @@ def generate_title_raw_via_aux(
    if not user_text or not assistant_text:
        return None, 'missing_exchange'
    qa, prompts = _title_prompts(user_text, assistant_text)
-    max_tokens = _title_completion_budget(provider, model, base_url)
+    base_max_tokens = _title_completion_budget(provider, model, base_url)
    reasoning_extra = {"reasoning": {"enabled": False}}
    if _is_minimax_route(provider, model, base_url):
        reasoning_extra["reasoning_split"] = True
    try:
        _timeout = _aux_title_timeout()
        from agent.auxiliary_client import call_llm
+        last_status = 'llm_error_aux'
        for idx, prompt in enumerate(prompts):
            messages = [
                {"role": "system", "content": prompt},
                {"role": "user", "content": qa},
            ]
+            budgets = [base_max_tokens]
            try:
-                resp = call_llm(
-                    task='title_generation',
-                    provider=provider or None,
-                    model=model or None,
-                    base_url=base_url or None,
-                    messages=messages,
-                    max_tokens=max_tokens,
-                    temperature=0.2,
-                    timeout=_timeout,
-                    extra_body=reasoning_extra,
-                )
-                raw = ''
-                try:
-                    raw = resp.choices[0].message.content or ''
-                except Exception:
-                    raw = ''
-                raw = str(raw or '').strip()
-                if raw:
-                    return raw, ('llm_aux' if idx == 0 else 'llm_aux_retry')
+                for budget_idx, max_tokens in enumerate(budgets):
+                    resp = call_llm(
+                        task='title_generation',
+                        provider=provider or None,
+                        model=model or None,
+                        base_url=base_url or None,
+                        messages=messages,
+                        max_tokens=max_tokens,
+                        temperature=0.2,
+                        timeout=_timeout,
+                        extra_body=reasoning_extra,
+                    )
+                    raw, empty_status = _extract_title_response(resp, aux=True)
+                    if raw:
+                        return raw, ('llm_aux' if idx == 0 and budget_idx == 0 else 'llm_aux_retry')
+                    last_status = empty_status or 'llm_empty_aux'
+                    if budget_idx == 0 and _title_retry_status(last_status):
+                        budgets.append(_title_retry_completion_budget(provider, model, base_url))
            except Exception as e:
+                last_status = 'llm_error_aux'
                logger.debug("Aux title generation attempt %s failed: %s", idx + 1, e)
-        return None, 'llm_error_aux'
+        return None, last_status
    except Exception as e:
        logger.debug("Aux title generation failed: %s", e)
        return None, 'llm_error_aux'
@@ -356,7 +421,7 @@ def generate_title_raw_via_agent(agent, user_text: str, assistant_text: str) ->
        return None, 'missing_agent'

    qa, prompts = _title_prompts(user_text, assistant_text)
-    max_tokens = _title_completion_budget(
+    base_max_tokens = _title_completion_budget(
        getattr(agent, 'provider', ''),
        getattr(agent, 'model', ''),
        getattr(agent, 'base_url', ''),
@@ -370,57 +435,70 @@ def generate_title_raw_via_agent(agent, user_text: str, assistant_text: str) ->
                {"role": "system", "content": prompt},
                {"role": "user", "content": qa},
            ]
+            budgets = [base_max_tokens]
            try:
-                raw = ""
-                if getattr(agent, 'api_mode', '') == 'codex_responses':
-                    codex_kwargs = agent._build_api_kwargs(api_messages)
-                    codex_kwargs.pop('tools', None)
-                    if 'max_output_tokens' in codex_kwargs:
-                        codex_kwargs['max_output_tokens'] = max_tokens
-                    resp = agent._run_codex_stream(codex_kwargs)
-                    assistant_message, _ = agent._normalize_codex_response(resp)
-                    raw = (assistant_message.content or '') if assistant_message else ''
-                elif getattr(agent, 'api_mode', '') == 'anthropic_messages':
-                    from agent.anthropic_adapter import build_anthropic_kwargs, normalize_anthropic_response
-                    ant_kwargs = build_anthropic_kwargs(
-                        model=agent.model,
-                        messages=api_messages,
-                        tools=None,
-                        max_tokens=max_tokens,
-                        reasoning_config=disabled_reasoning,
-                        is_oauth=getattr(agent, '_is_anthropic_oauth', False),
-                        preserve_dots=agent._anthropic_preserve_dots(),
-                        base_url=getattr(agent, '_anthropic_base_url', None),
-                    )
-                    resp = agent._anthropic_messages_create(ant_kwargs)
-                    assistant_message, _ = normalize_anthropic_response(
-                        resp, strip_tool_prefix=getattr(agent, '_is_anthropic_oauth', False)
-                    )
-                    raw = (assistant_message.content or '') if assistant_message else ''
-                else:
-                    api_kwargs = agent._build_api_kwargs(api_messages)
-                    api_kwargs.pop('tools', None)
-                    api_kwargs['temperature'] = 0.1
-                    api_kwargs['timeout'] = 15.0
-                    if _is_minimax_route(getattr(agent, 'provider', ''), getattr(agent, 'model', ''), getattr(agent, 'base_url', '')):
-                        extra_body = dict(api_kwargs.get('extra_body') or {})
-                        extra_body['reasoning_split'] = True
-                        api_kwargs['extra_body'] = extra_body
-                    if 'max_completion_tokens' in api_kwargs:
-                        api_kwargs['max_completion_tokens'] = max_tokens
+                last_status = 'llm_empty'
+                for budget_idx, max_tokens in enumerate(budgets):
+                    raw = ""
+                    empty_status = ''
+                    if getattr(agent, 'api_mode', '') == 'codex_responses':
+                        codex_kwargs = agent._build_api_kwargs(api_messages)
+                        codex_kwargs.pop('tools', None)
+                        if 'max_output_tokens' in codex_kwargs:
+                            codex_kwargs['max_output_tokens'] = max_tokens
+                        resp = agent._run_codex_stream(codex_kwargs)
+                        assistant_message, _ = agent._normalize_codex_response(resp)
+                        raw = (assistant_message.content or '') if assistant_message else ''
+                        if not raw:
+                            empty_status = 'llm_empty'
+                    elif getattr(agent, 'api_mode', '') == 'anthropic_messages':
+                        from agent.anthropic_adapter import build_anthropic_kwargs, normalize_anthropic_response
+                        ant_kwargs = build_anthropic_kwargs(
+                            model=agent.model,
+                            messages=api_messages,
+                            tools=None,
+                            max_tokens=max_tokens,
+                            reasoning_config=disabled_reasoning,
+                            is_oauth=getattr(agent, '_is_anthropic_oauth', False),
+                            preserve_dots=agent._anthropic_preserve_dots(),
+                            base_url=getattr(agent, '_anthropic_base_url', None),
+                        )
+                        resp = agent._anthropic_messages_create(ant_kwargs)
+                        assistant_message, _ = normalize_anthropic_response(
+                            resp, strip_tool_prefix=getattr(agent, '_is_anthropic_oauth', False)
+                        )
+                        raw = (assistant_message.content or '') if assistant_message else ''
+                        if not raw:
+                            empty_status = 'llm_empty'
                    else:
-                        api_kwargs['max_tokens'] = max_tokens
-                    resp = agent._ensure_primary_openai_client(reason='title_generation').chat.completions.create(
-                        **api_kwargs,
-                    )
-                    try:
-                        raw = resp.choices[0].message.content or ""
-                    except Exception:
-                        raw = ""
-                raw = str(raw or '').strip()
-                if raw:
-                    return raw, ('llm' if idx == 0 else 'llm_retry')
+                        api_kwargs = agent._build_api_kwargs(api_messages)
+                        api_kwargs.pop('tools', None)
+                        api_kwargs['temperature'] = 0.1
+                        api_kwargs['timeout'] = 15.0
+                        if _is_minimax_route(getattr(agent, 'provider', ''), getattr(agent, 'model', ''), getattr(agent, 'base_url', '')):
+                            extra_body = dict(api_kwargs.get('extra_body') or {})
+                            extra_body['reasoning_split'] = True
+                            api_kwargs['extra_body'] = extra_body
+                        if 'max_completion_tokens' in api_kwargs:
+                            api_kwargs['max_completion_tokens'] = max_tokens
+                        else:
+                            api_kwargs['max_tokens'] = max_tokens
+                        resp = agent._ensure_primary_openai_client(reason='title_generation').chat.completions.create(
+                            **api_kwargs,
+                        )
+                        raw, empty_status = _extract_title_response(resp)
+                    raw = str(raw or '').strip()
+                    if raw:
+                        return raw, ('llm' if idx == 0 and budget_idx == 0 else 'llm_retry')
+                    last_status = empty_status or 'llm_empty'
+                    if budget_idx == 0 and _title_retry_status(last_status):
+                        budgets.append(_title_retry_completion_budget(
+                            getattr(agent, 'provider', ''),
+                            getattr(agent, 'model', ''),
+                            getattr(agent, 'base_url', ''),
+                        ))
            except Exception as e:
+                last_status = 'llm_error'
                logger.debug(
                    "Agent title generation attempt %s failed: provider=%s model=%s error=%s",
                    idx + 1,
@@ -428,7 +506,7 @@ def generate_title_raw_via_agent(agent, user_text: str, assistant_text: str) ->
                    getattr(agent, 'model', None),
                    e,
                )
-        return None, 'llm_error'
+        return None, last_status
    except Exception as e:
        logger.debug("Agent title generation failed: %s", e)
        return None, 'llm_error'
@@ -611,6 +689,11 @@ def _run_background_title_update(session_id: str, user_text: str, assistant_text
            if next_title:
                logger.debug("Using local fallback for session title generation")
                source = 'fallback'
+        fallback_reason = (
+            f'local_summary:{llm_status}'
+            if source == 'fallback' and llm_status
+            else 'local_summary'
+        )
        wrote_title = False
        effective_title = current
        if next_title:
@@ -638,7 +721,7 @@ def _run_background_title_update(session_id: str, user_text: str, assistant_text

        if wrote_title:
            if source == 'fallback':
-                _put_title_status(put_event, session_id, source, 'local_summary', effective_title, raw_preview)
+                _put_title_status(put_event, session_id, source, fallback_reason, effective_title, raw_preview)
            else:
                _put_title_status(put_event, session_id, source, llm_status, effective_title, raw_preview)
            put_event('title', {'session_id': session_id, 'title': effective_title})
@@ -919,6 +1002,28 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
        CANCEL_FLAGS[stream_id] = cancel_event
        STREAM_PARTIAL_TEXT[stream_id] = ''  # start accumulating partial text (#893)

+    # Register this stream with the global streaming meter
+    meter().begin_session(stream_id)
+
+    # Metering ticker — emits a metering event at 1 Hz while sessions are active.
+    # When get_interval() returns >= 10.0 (no active sessions), the ticker exits
+    # so no idle readings are emitted and the SSE consumer sees nothing.
+    _metering_stop = threading.Event()
+
+    def _metering_ticker():
+        while True:
+            interval = meter().get_interval()
+            if interval >= 10.0:
+                break  # nothing active — stop the ticker
+            if _metering_stop.wait(interval):
+                break  # stream was cancelled or ended — exit
+            stats = meter().get_stats()
+            stats['session_id'] = stream_id
+            put('metering', stats)
+
+    _metering_thread = threading.Thread(target=_metering_ticker, daemon=True)
+    _metering_thread.start()
+
    def put(event, data):
        # If cancelled, drop all further events except the cancel event itself
        if cancel_event.is_set() and event not in ('cancel', 'error'):
@@ -1061,6 +1166,19 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
            _reasoning_text = ''  # accumulates reasoning/thinking trace for persistence
            _live_tool_calls = []  # tool progress fallback when final messages omit tool IDs

+            # Throttle: emit metering events at most once every 100 ms so the TPS label
+            # feels live during fast token streams without flooding the SSE channel.
+            _metering_last_emit = [time.monotonic() - 1]  # fire immediately on first token
+
+            def _emit_metering():
+                now = time.monotonic()
+                if now - _metering_last_emit[0] < 0.1:
+                    return
+                _metering_last_emit[0] = now
+                stats = meter().get_stats()
+                stats['session_id'] = stream_id
+                put('metering', stats)
+
            def on_token(text):
                nonlocal _token_sent
                if text is None:
@@ -1070,6 +1188,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
                if stream_id in STREAM_PARTIAL_TEXT:
                    STREAM_PARTIAL_TEXT[stream_id] += str(text)
                put('token', {'text': text})
+                # Update global throughput meter
+                meter().record_token(stream_id, len(STREAM_PARTIAL_TEXT.get(stream_id, '')))
+                _emit_metering()

            def on_reasoning(text):
                nonlocal _reasoning_text
@@ -1077,6 +1198,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
                    return
                _reasoning_text += str(text)
                put('reasoning', {'text': str(text)})
+                # Track reasoning tokens in the meter so TPS reflects all AI output
+                meter().record_reasoning(stream_id, len(_reasoning_text))
+                _emit_metering()

            # Pre-initialise the activity counter here so on_tool (which
            # closes over it) never captures an unbound name even if this
@@ -1084,6 +1208,7 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
            _checkpoint_activity = [0]

            def on_tool(*cb_args, **cb_kwargs):
+                nonlocal _reasoning_text
                event_type = None
                name = None
                preview = None
@@ -1103,7 +1228,10 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
                if event_type in ('reasoning.available', '_thinking'):
                    reason_text = preview if event_type == 'reasoning.available' else name
                    if reason_text:
+                        _reasoning_text += str(reason_text)
                        put('reasoning', {'text': str(reason_text)})
+                        meter().record_reasoning(stream_id, len(_reasoning_text))
+                        _emit_metering()
                    return

                args_snap = {}
@@ -1623,6 +1751,10 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
            # (reasoning trace already attached + saved above, before s.save())
            raw_session = s.compact() | {'messages': s.messages, 'tool_calls': tool_calls}
            put('done', {'session': redact_session_data(raw_session), 'usage': usage})
+            # Emit metering stats for the header TPS label
+            meter_stats = meter().get_stats()
+            meter_stats['session_id'] = session_id
+            put('metering', meter_stats)
            if _should_bg_title and _u0 and _a0:
                threading.Thread(
                    target=_run_background_title_update,
@@ -1635,6 +1767,8 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
                # activeSid = original session_id so they must match for stream_end to close.
                put('stream_end', {'session_id': session_id})
        finally:
+            # Stop the live metering ticker
+            _metering_stop.set()
            # Unregister the gateway approval callback and unblock any threads
            # still waiting on approval (e.g. stream cancelled mid-approval).
            if _approval_registered and _unreg_notify is not None:
@@ -1660,6 +1794,13 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
    except Exception as e:
        print('[webui] stream error:\n' + traceback.format_exc(), flush=True)
        err_str = str(e)
+        # Sanitize HTML from provider error responses — some providers return
+        # full HTML pages (e.g. nginx "404 page not found") instead of JSON errors.
+        # Strip HTML tags to avoid rendering raw markup in the chat message.
+        _stripped = re.sub(r'<[^>]+>', ' ', err_str)
+        _stripped = re.sub(r'\s+', ' ', _stripped).strip()
+        if _stripped != err_str:
+            err_str = _stripped
        _exc_lower = err_str.lower()
        # Classify before saving so the error message can be persisted to the session.
        # Check quota exhaustion first — OpenAI billing 429s use insufficient_quota which
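
Reviewer note: the sanitizing step in this hunk can be checked in isolation. The sketch below wraps the same two regular expressions in a helper; the name `strip_html` and the sample error body are hypothetical and exist only for illustration.

```python
import re


def strip_html(err_str: str) -> str:
    """Replace HTML tags with spaces, then collapse runs of whitespace."""
    stripped = re.sub(r'<[^>]+>', ' ', err_str)
    return re.sub(r'\s+', ' ', stripped).strip()


# Hypothetical provider error body, for illustration only.
sample = '<html><body><h1>404 page not found</h1></body></html>'
print(strip_html(sample))  # prints: 404 page not found
```

Two side effects are worth keeping in mind. The tag pattern also matches plain text that happens to contain `<` followed later by `>`, so `'x < 1 and y > 2'` becomes `'x 2'`. The whitespace pass runs on every message, so multi-line errors without any HTML are flattened to a single line as well.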
@@ -1683,6 +1824,16 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
            or 'invalid api key' in _exc_lower
            or 'no cookie auth credentials' in _exc_lower
        )
+        _exc_is_not_found = (
+            '404' in err_str
+            or 'not found' in _exc_lower
+            or 'does not exist' in _exc_lower
+            or 'model not found' in _exc_lower
+            or 'model_not_found' in _exc_lower
+            or 'invalid model' in _exc_lower
+            or 'does not match any known model' in _exc_lower
+            or 'unknown model' in _exc_lower
+        )
        if _exc_is_quota:
            _exc_label, _exc_type, _exc_hint = (
                'Out of credits', 'quota_exhausted',
@@ -1699,6 +1850,12 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
                'The selected model may not be supported by your configured provider. '
                'Run `hermes model` in your terminal to switch providers, then restart the WebUI.',
            )
+        elif _exc_is_not_found:
+            _exc_label, _exc_type, _exc_hint = (
+                'Model not found', 'model_not_found',
+                'The selected model was not found by the provider. '
+                'Check the model ID in Settings or run `hermes model` to verify it exists for your provider.',
+            )
        else:
            _exc_label, _exc_type, _exc_hint = 'Error', 'error', ''
        if s is not None:
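
Reviewer note: a minimal sketch of the not-found branch added above, useful for seeing which messages it catches. The helper names `is_model_not_found` and `classify_not_found` are hypothetical; the substrings are copied from the hunk. In the real handler this branch sits behind the quota and provider-mismatch checks in the `elif` chain, so it only applies when none of the earlier classifications match; that ordering is not reproduced here.

```python
# Substrings copied from the hunk above; matching is case-insensitive.
NOT_FOUND_PATTERNS = (
    'not found',
    'does not exist',
    'model not found',
    'model_not_found',
    'invalid model',
    'does not match any known model',
    'unknown model',
)


def is_model_not_found(err_str: str) -> bool:
    lower = err_str.lower()
    # '404' is checked against the original string, as in the hunk.
    return '404' in err_str or any(p in lower for p in NOT_FOUND_PATTERNS)


def classify_not_found(err_str: str):
    """Return (label, type) for the not-found branch only."""
    if is_model_not_found(err_str):
        return ('Model not found', 'model_not_found')
    return ('Error', 'error')
```

The patterns are deliberately broad: `'not found'` already covers `'model not found'`, and any message containing `404` or `not found` (for example a missing file reported by a tool) is labelled as a model error under this sketch.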
@@ -793,6 +793,7 @@ function applyBotName(){
    window._showThinking=s.show_thinking!==false;
    window._sidebarDensity=(s.sidebar_density==='detailed'?'detailed':'compact');
    window._botName=s.bot_name||'Hermes';
+    if(s.default_model) window._defaultModel=s.default_model;
    // Persist default workspace so the blank new-chat page can show it
    // and workspace actions (New file/folder) work before the first session (#804).
    if(s.default_workspace) S._profileDefaultWorkspace=s.default_workspace;
@@ -840,16 +841,20 @@ function applyBotName(){
  // Update profile chip label immediately
  const profileLabel=$('profileChipLabel');
  if(profileLabel) profileLabel.textContent=S.activeProfile||'default';
-  // Fetch available models from server and populate dropdown dynamically
-  await populateModelDropdown();
-  // Restore last-used model preference
-  const savedModel=localStorage.getItem('hermes-webui-model');
-  if(savedModel && $('modelSelect')){
-    $('modelSelect').value=savedModel;
-    // If the value didn't take (model not in list), clear the bad pref
-    if($('modelSelect').value!==savedModel) localStorage.removeItem('hermes-webui-model');
-    else if(typeof syncModelChip==='function') syncModelChip();
-  }
+  // Fetch available models without blocking session restore. The static HTML
+  // options are enough for first paint; the dynamic provider list can settle
+  // after the saved session is visible.
+  const _modelDropdownReady=populateModelDropdown().then(()=>{
+    const savedModel=localStorage.getItem('hermes-webui-model');
+    if(savedModel && $('modelSelect')){
+      $('modelSelect').value=savedModel;
+      // If the value didn't take (model not in list), clear the bad pref
+      if($('modelSelect').value!==savedModel) localStorage.removeItem('hermes-webui-model');
+      else if(typeof syncModelChip==='function') syncModelChip();
+    }
+    if(S.session) syncTopbar();
+  }).catch(()=>{});
+  window._modelDropdownReady=_modelDropdownReady;
  // Pre-load workspace list so sidebar name is correct from first render
  await loadWorkspaceList();
  await loadOnboardingWizard();
@@ -56,6 +56,7 @@ const LOCALES = {
    model_unavailable_title: 'This model is no longer in your current provider list',
    provider_mismatch_warning: (m,p)=>`"${m}" may not work with your configured provider (${p}). Send anyway, or run \`hermes model\` in your terminal to switch.`,
    provider_mismatch_label: 'Provider mismatch',
+    model_not_found_label: 'Model not found',
    model_custom_label: 'Custom model ID',
    model_custom_placeholder: 'e.g. openai/gpt-5.4',
    model_search_placeholder: 'Search models…',
@@ -196,6 +197,10 @@ const LOCALES = {
    new_folder_prompt: 'New folder name:',
    folder_created: 'Created folder ',
    folder_create_failed: 'Create folder failed: ',
+    workspace_auto_create_folder: 'Create folder if it doesn\'t exist',
+    folder_add_as_space_btn: 'Add as Space',
+    folder_add_as_space_msg: 'Add this folder as a new space in your workspace list?',
+    folder_add_as_space_title: 'Add as Space?',
    remove_title: 'Remove',
    empty_dir: '(empty)',
    upload_failed: 'Upload failed: ',
@@ -458,6 +463,7 @@ const LOCALES = {
    cron_status_paused: 'paused',
    cron_status_error: 'error',
    cron_status_active: 'active',
+    cron_status_running: 'running\u2026',
    cron_next: 'Next',
    cron_last: 'Last',
    cron_run_now: 'Run now',
@@ -605,9 +611,6 @@ const LOCALES = {
    profile_api_key_label: 'API key',
  },

-    cmd_status: '\u986f\u793a\u6703\u8a71\u8cc7\u8a0a',
-    memory_saved: '\u8a18\u61b6\u5df2\u5132\u5b58',
-    profile_delete_title: '\u522a\u9664\u6b64\u8a2d\u5b9a\u6a94',
  ru: {
    _lang: 'ru',
    _label: 'Русский',
@@ -652,6 +655,7 @@ const LOCALES = {
    provider_mismatch_warning: (m, p) =>
    `"${m}" может не работать с вашим настроенным провайдером (${p}). Всё равно отправить или запустите \`hermes model\` в терминале, чтобы переключиться.`,
    provider_mismatch_label: 'Несовпадение провайдера',
+    model_not_found_label: 'Модель не найдена',
    model_custom_label: 'Пользовательский ID модели',
    model_custom_placeholder: 'например, openai/gpt-5.4',
    cmd_help: 'Показать доступные команды',
@@ -745,6 +749,10 @@ const LOCALES = {
    new_folder_prompt: 'Имя новой папки:',
    folder_created: 'Папка создана ',
    folder_create_failed: 'Не удалось создать папку: ',
+    workspace_auto_create_folder: 'Создать папку, если она не существует',
+    folder_add_as_space_btn: 'Добавить',
+    folder_add_as_space_msg: 'Добавить эту папку как новое пространство?',
+    folder_add_as_space_title: 'Добавить как пространство?',
    remove_title: 'Удаление',
    empty_dir: '(пусто)',
    upload_failed: 'Не удалось загрузить: ',
@@ -975,6 +983,7 @@ const LOCALES = {
    cron_status_paused: 'на паузе',
    cron_status_error: 'ошибка',
    cron_status_active: 'активно',
+    cron_status_running: 'выполняется\u2026',
    cron_next: 'Следующий',
    cron_last: 'Последний',
    cron_run_now: 'Запустить сейчас',
@@ -1233,6 +1242,7 @@ const LOCALES = {
    model_unavailable_title: 'Este modelo ya no está en tu lista actual de proveedores',
    provider_mismatch_warning: (m,p)=>`"${m}" puede no funcionar con tu proveedor configurado (${p}). Envía de todas formas, o ejecuta \`hermes model\` en la terminal para cambiar.`,
    provider_mismatch_label: 'Proveedor incompatible',
+    model_not_found_label: 'Modelo no encontrado',
    model_custom_label: 'ID de modelo personalizado',
    model_custom_placeholder: 'p. ej. openai/gpt-5.4',
    model_search_placeholder: 'Buscar modelos…',
@@ -1309,6 +1319,10 @@ const LOCALES = {
    new_folder_prompt: 'Nombre de la carpeta nueva:',
    folder_created: 'Carpeta creada ',
    folder_create_failed: 'Error al crear la carpeta: ',
+    workspace_auto_create_folder: 'Crear carpeta si no existe',
+    folder_add_as_space_btn: 'Añadir como espacio',
+    folder_add_as_space_msg: '¿Añadir esta carpeta como un nuevo espacio?',
+    folder_add_as_space_title: '¿Añadir como espacio?',
    remove_title: 'Quitar',
    empty_dir: '(vacío)',
    upload_failed: 'Error al subir: ',
@@ -1532,6 +1546,7 @@ const LOCALES = {
    cron_status_paused: 'paused',
    cron_status_error: 'error',
    cron_status_active: 'active',
+    cron_status_running: 'running\u2026',
    cron_next: 'Next',
    cron_last: 'Last',
    cron_run_now: 'Run now',
@@ -1773,6 +1788,7 @@ const LOCALES = {
    model_unavailable_title: 'Dieses Modell ist nicht mehr in Ihrer aktuellen Provider-Liste',
    provider_mismatch_warning: (m,p)=>`"${m}" funktioniert möglicherweise nicht mit Ihrem konfigurierten Provider (${p}). Trotzdem senden, oder \`hermes model\` im Terminal ausführen.`,
    provider_mismatch_label: 'Provider-Konflikt',
+    model_not_found_label: 'Modell nicht gefunden',
    // commands.js
    cmd_help: 'Verfügbare Befehle auflisten',
    cmd_clear: 'Konversationsverlauf löschen',
@@ -1852,6 +1868,10 @@ const LOCALES = {
    new_folder_prompt: 'Neuer Ordnername:',
    folder_created: 'Ordner erstellt ',
    folder_create_failed: 'Ordner erstellen fehlgeschlagen: ',
+    workspace_auto_create_folder: 'Ordner erstellen, falls nicht vorhanden',
+    folder_add_as_space_btn: 'Als Bereich hinzufügen',
+    folder_add_as_space_msg: 'Diesen Ordner als neuen Bereich zur Liste hinzufügen?',
+    folder_add_as_space_title: 'Als Bereich hinzufügen?',
    remove_title: 'Entfernen',
    empty_dir: '(leer)',
    upload_failed: 'Upload fehlgeschlagen: ',
@@ -2097,6 +2117,7 @@ const LOCALES = {
    model_unavailable_title: '\u8fd9\u4e2a\u6a21\u578b\u5df2\u7ecf\u4e0d\u5728\u5f53\u524d provider \u5217\u8868\u4e2d',
    provider_mismatch_warning: (m,p)=>`\"${m}\" \u53ef\u80fd\u65e0\u6cd5\u5728\u5f53\u524d\u914d\u7f6e\u7684\u63d0\u4f9b\u5546 (${p}) \u4e0b\u5de5\u4f5c\u3002\u76f4\u63a5\u53d1\u9001\uff0c\u6216\u5728\u7ec8\u7aef\u8fd0\u884c \`hermes model\` \u5207\u6362\u3002`,
    provider_mismatch_label: '\u63d0\u4f9b\u5546\u4e0d\u5339\u914d',
+    model_not_found_label: '\u672a\u627e\u5230\u6a21\u578b',
    model_custom_label: '\u81ea\u5b9a\u4e49\u6a21\u578b ID',
    model_custom_placeholder: '\u4f8b\u5982 openai/gpt-5.4',
    model_search_placeholder: '\u641c\u7d22\u6a21\u578b\u2026',
@@ -2181,6 +2202,10 @@ const LOCALES = {
    new_folder_prompt: '\u65b0\u6587\u4ef6\u5939\u540d\u79f0\uff1a',
    folder_created: '\u5df2\u521b\u5efa\u6587\u4ef6\u5939 ',
    folder_create_failed: '\u521b\u5efa\u6587\u4ef6\u5939\u5931\u8d25\uff1a',
+    workspace_auto_create_folder: '\u5982\u679c\u6587\u4ef6\u5939\u4e0d\u5b58\u5728\u5219\u521b\u5efa',
+    folder_add_as_space_btn: '\u6dfb\u52a0\u4e3a\u5de5\u4f5c\u533a',
+    folder_add_as_space_msg: '\u662f\u5426\u5c06\u6b64\u6587\u4ef6\u5939\u6dfb\u52a0\u4e3a\u65b0\u7684\u5de5\u4f5c\u533a\uff1f',
+    folder_add_as_space_title: '\u6dfb\u52a0\u4e3a\u5de5\u4f5c\u533a\uff1f',
    remove_title: '\u79fb\u9664',
    empty_dir: '(\u7a7a)',
    upload_failed: '\u4e0a\u4f20\u5931\u8d25\uff1a',
@@ -2394,6 +2419,7 @@ const LOCALES = {
    cron_status_paused: '暂停',
    cron_status_error: '错误',
    cron_status_active: '运行中',
+    cron_status_running: '执行中\u2026',
    cron_next: '下次',
    cron_last: '上次',
    cron_run_now: '立即运行',
@@ -2635,6 +2661,7 @@ const LOCALES = {
    model_unavailable_title: '\u6b64\u6a21\u578b\u5df2\u7d93\u4e0d\u5728\u7576\u524d provider \u5217\u8868\u4e2d',
    provider_mismatch_warning: (m,p)=>`\"${m}\" \u53ef\u80fd\u7121\u6cd5\u5728\u7576\u524d\u914d\u7f6e\u7684\u63d0\u4f9b\u8005 (${p}) \u4e0b\u904b\u4f5c\u3002\u5c1a\u9001\uff0c\u6216\u5728\u7d42\u7aef\u57f7\u884c \`hermes model\` \u5207\u63db\u3002`,
    provider_mismatch_label: '\u63d0\u4f9b\u8005\u4e0d\u76f8\u7b26',
+    model_not_found_label: '\u672a\u627e\u5230\u6a21\u578b',
    // commands.js
    cmd_help: '\u67e5\u770b\u53ef\u7528\u547d\u4ee4',
    cmd_clear: '\u6e05\u7a7a\u7576\u524d\u5c0d\u8a71\u8a0a\u606f',
@@ -2707,6 +2734,10 @@ const LOCALES = {
    new_folder_prompt: '\u65b0\u6587\u4ef6\u593e\u540d\u7a31\uff1a',
    folder_created: '\u5df2\u5275\u5efa\u6587\u4ef6\u593e ',
    folder_create_failed: '\u5275\u5efa\u6587\u4ef6\u593e\u5931\u6557\uff1a',
+    workspace_auto_create_folder: '\u8cc7\u6599\u593e\u4e0d\u5b58\u5728\u6642\u5247\u5efa\u7acb',
+    folder_add_as_space_btn: '\u65b0\u589e\u70ba\u5de5\u4f5c\u5340',
+    folder_add_as_space_msg: '\u662f\u5426\u5c07\u6b64\u8cc7\u6599\u593e\u65b0\u589e\u70ba\u5de5\u4f5c\u5340\uff1f',
+    folder_add_as_space_title: '\u65b0\u589e\u70ba\u5de5\u4f5c\u5340\uff1f',
    remove_title: '\u79fb\u9664',
    empty_dir: '(空)',
    upload_failed: '上傳失敗：',
@@ -3125,6 +3156,7 @@ const LOCALES = {
    cron_schedule_required: '\u9700\u8981\u6392\u7a0b',
    cron_schedule_required_example: '\u9700\u8981\u6392\u7a0b\uff08\u4f8b\u5982 "0 9 * * *" \u6216 "every 1h"\uff09',
    cron_status_active: '\u6d3b\u8e8d\u4e2d',
+    cron_status_running: '\u57f7\u884c\u4e2d\u2026',
    cron_status_error: '\u932f\u8aa4',
    cron_status_off: '\u672a\u555f\u7528',
    cron_status_paused: '\u5df2\u66ab\u505c',
@@ -69,6 +69,7 @@
    </span>
    <span class="app-titlebar-title" id="appTitlebarTitle">Hermes</span>
    <span class="app-titlebar-sub" id="appTitlebarSub" hidden></span>
+    <div class="tps-chip" id="tpsStat" title="Tokens per second (current · high · low)">0.0 t/s · 0.0 high</div>
  </div>
  <div class="app-titlebar-spacer" aria-hidden="true"></div>
 </header>
@@ -456,7 +457,7 @@
        <div class="main-view-actions">
          <button id="btnRunTaskDetail" class="panel-head-btn" title="Run now" data-i18n-title="cron_run_now" onclick="runCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><polygon points="5 3 19 12 5 21 5 3"/></svg></button>
          <button id="btnPauseTaskDetail" class="panel-head-btn" title="Pause" data-i18n-title="cron_pause" onclick="pauseCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><rect x="6" y="4" width="4" height="16"/><rect x="14" y="4" width="4" height="16"/></svg></button>
-          <button id="btnResumeTaskDetail" class="panel-head-btn" title="Resume" data-i18n-title="cron_resume" onclick="resumeCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><polygon points="5 3 19 12 5 21 5 3"/></svg></button>
+          <button id="btnResumeTaskDetail" class="panel-head-btn" title="Resume" data-i18n-title="cron_resume" onclick="resumeCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><polygon points="5 3 19 12 5 21 5 3"/><line x1="22" y1="4" x2="22" y2="20"/></svg></button>
          <button id="btnEditTaskDetail" class="panel-head-btn" title="Edit" data-i18n-title="edit" onclick="editCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4L16.5 3.5z"/></svg></button>
          <button id="btnDeleteTaskDetail" class="panel-head-btn" title="Delete" data-i18n-title="delete_title" onclick="deleteCurrentCron()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><path d="M3 6h18"/><path d="M19 6v14a2 2 0 0 1-2 2H7a2 2 0 0 1-2-2V6"/><path d="M8 6V4a2 2 0 0 1 2-2h4a2 2 0 0 1 2 2v2"/></svg></button>
          <button id="btnCancelTaskDetail" class="panel-head-btn" title="Cancel" data-i18n-title="cancel" onclick="cancelCronForm()" style="display:none"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" aria-hidden="true"><line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/></svg></button>
@@ -77,6 +77,7 @@ async function send(){
  if(typeof saveInflightState==='function'){
    saveInflightState(activeSid,{streamId:null,messages:INFLIGHT[activeSid].messages,uploaded:uploadedNames,toolCalls:[]});
  }
+  if(typeof renderSessionListFromCache==='function') renderSessionListFromCache();
  startApprovalPolling(activeSid);
  startClarifyPolling(activeSid);
  S.activeStreamId = null;  // will be set after stream starts
@@ -754,6 +755,20 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
      }catch(err){}
    });

+    source.addEventListener('metering',e=>{
+      // TPS + HIGH/LOW stats for the header chip — emitted at 1 Hz during a stream,
+      // silenced entirely when no sessions are active (ticker exits when idle).
+      try{
+        const d=JSON.parse(e.data||'{}');
+        const el=$('tpsStat');
+        if(!el) return;
+        const tps=typeof d.tps==='number'?d.tps.toFixed(1):'0.0';
+        const high=typeof d.high==='number' && d.high>=0?d.high.toFixed(1)+' high':'—';
+        const low=typeof d.low==='number' && d.low>=0?d.low.toFixed(1)+' low':'';
+        el.textContent=`${tps} t/s · ${high}${low?' · '+low:''}`;
+      }catch(_){}
+    });
+
    source.addEventListener('apperror',e=>{
      _terminalStateReached=true;
      if(_persistTimer){clearTimeout(_persistTimer);_persistTimer=null;}
@@ -775,8 +790,9 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
          const isRateLimit=d.type==='rate_limit';
          const isQuotaExhausted=d.type==='quota_exhausted';
          const isAuthMismatch=d.type==='auth_mismatch';
+          const isModelNotFound=d.type==='model_not_found';
          const isNoResponse=d.type==='no_response';
-          const label=isQuotaExhausted?'Out of credits':isRateLimit?'Rate limit reached':isAuthMismatch?(typeof t==='function'?t('provider_mismatch_label'):'Provider mismatch'):isNoResponse?'No response received':'Error';
+          const label=isQuotaExhausted?'Out of credits':isRateLimit?'Rate limit reached':isAuthMismatch?(typeof t==='function'?t('provider_mismatch_label'):'Provider mismatch'):isModelNotFound?(typeof t==='function'?t('model_not_found_label'):'Model not found'):isNoResponse?'No response received':'Error';
          const hint=d.hint?`\n\n*${d.hint}*`:'';
          S.messages.push({role:'assistant',content:`**${label}:** ${d.message}${hint}`});
        }catch(_){
@@ -257,6 +257,7 @@ function openCronDetail(id, el){
 }

 function _clearCronDetail(){
+  if (_cronRunningPoll) { clearInterval(_cronRunningPoll); _cronRunningPoll = null; }
  _currentCronDetail = null;
  _cronMode = 'empty';
  const title = $('taskDetailTitle');
@@ -487,14 +488,55 @@ function _cronOutputSnippet(content) {
  return body.slice(0, 600) || '(empty)';
 }

+let _cronRunningPoll = null; // timer for polling job status after trigger
+
 async function cronRun(id) {
  try {
    await api('/api/crons/run', {method:'POST', body: JSON.stringify({job_id: id})});
    showToast(t('cron_job_triggered'));
-    setTimeout(() => { if (_currentCronDetail && _currentCronDetail.id === id) _loadCronDetailRuns(id); }, 5000);
+    // Immediately show "running" state in detail if this job is selected
+    if (_currentCronDetail && _currentCronDetail.id === id) {
+      _setCronDetailStatus('running');
+      _startCronRunningPoll(id);
+    }
  } catch(e) { showToast(t('failed_colon') + e.message, 4000); }
 }

+function _setCronDetailStatus(status) {
+  const badge = document.querySelector('#taskDetailBody .detail-badge');
+  if (!badge) return;
+  if (status === 'running') {
+    badge.className = 'detail-badge running';
+    badge.textContent = t('cron_status_running');
+  }
+}
+
+function _startCronRunningPoll(jobId) {
+  // Clear any existing poll
+  if (_cronRunningPoll) { clearInterval(_cronRunningPoll); _cronRunningPoll = null; }
+  let attempts = 0;
+  const maxAttempts = 10; // 10 * 3s = 30s max
+  _cronRunningPoll = setInterval(async () => {
+    attempts++;
+    if (!_currentCronDetail || _currentCronDetail.id !== jobId || attempts > maxAttempts) {
+      clearInterval(_cronRunningPoll);
+      _cronRunningPoll = null;
+      // Re-render detail with real status when poll ends (fallback from "running" indicator)
+      if (_currentCronDetail && _currentCronDetail.id === jobId) {
+        const refreshed = _cronList ? _cronList.find(j => j.id === jobId) : null;
+        if (refreshed) _renderCronDetail(refreshed);
+      }
+      return;
+    }
+    try {
+      await loadCrons();
+      // loadCrons() re-renders the detail which overwrites our "running" badge.
+      // Re-apply the running indicator if poll is still active.
+      if (_cronRunningPoll) _setCronDetailStatus('running');
+    } catch(e) { /* ignore */ }
+  }, 3000);
+}
+
 async function cronPause(id) {
  try {
    await api('/api/crons/pause', {method:'POST', body: JSON.stringify({job_id: id})});
@@ -1406,6 +1448,12 @@ function _renderWorkspaceForm({ name, path, isEdit }){
          </div>
          ${pathHint}
        </div>
+        ${!isEdit?`<div class="detail-form-row">
+          <label class="detail-form-check">
+            <input type="checkbox" id="workspaceFormAutoCreate">
+            ${esc(t('workspace_auto_create_folder')||'Create folder if it doesn\'t exist')}
+          </label>
+        </div>`:''}
        <div id="workspaceFormError" class="detail-form-error" style="display:none"></div>
      </form>
    </div>`;
@@ -1451,7 +1499,7 @@ async function saveWorkspaceForm(){
      openWorkspaceDetail(targetPath);
      return;
    }
-    const data = await api('/api/workspaces/add', { method:'POST', body: JSON.stringify({ path }) });
+    const data = await api('/api/workspaces/add', { method:'POST', body: JSON.stringify({ path, name, create: ($('workspaceFormAutoCreate')&&$('workspaceFormAutoCreate').checked)||false }) });
    _workspaceList = data.workspaces || [];
    _workspacePreFormDetail = null;
    // Apply rename if a friendly name was supplied
@@ -109,7 +109,7 @@ async function loadSession(sid){
  // Guard against network/server failures to prevent a permanently stuck loading state.
  let data;
  try {
-    data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=0`);
+    data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=0&resolve_model=0`);
  } catch(e) {
    const _msgInner = $('msgInner');
    if (_msgInner) {
@@ -119,6 +119,7 @@ async function loadSession(sid){
    return;
  }
  S.session=data.session;
+  S.session._modelResolutionDeferred=true;
  S.lastUsage={...(data.session.last_usage||{})};
  _setSessionViewedCount(S.session.session_id, Number(data.session.message_count || 0));
  localStorage.setItem('hermes-webui-session',S.session.session_id);
@@ -254,6 +255,23 @@ async function loadSession(sid){
      threshold_tokens:  _pick(u.threshold_tokens,  _s.threshold_tokens),
    });
  }
+  _resolveSessionModelForDisplaySoon(sid);
+}
+
+function _resolveSessionModelForDisplaySoon(sid){
+  if(!sid) return;
+  setTimeout(async()=>{
+    try{
+      const data=await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=0&resolve_model=1`);
+      const model=data&&data.session&&data.session.model;
+      if(!model||!S.session||S.session.session_id!==sid) return;
+      S.session.model=model;
+      S.session._modelResolutionDeferred=false;
+      syncTopbar();
+    }catch(_){
+      // Keep session switching non-blocking; the next load can try again.
+    }
+  },0);
 }

 // Load session messages if not already present.
@@ -266,7 +284,7 @@ async function _ensureMessagesLoaded(sid) {
    return;
  }
  // Fetch full session with messages
-  const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1`);
+  const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0`);
  const msgs = (data.session.messages || []).filter(m => m && m.role);
  // Check for tool-call metadata on messages (for tool-call card rendering)
  const hasMessageToolMetadata = msgs.some(m => {
@@ -282,6 +300,11 @@ async function _ensureMessagesLoaded(sid) {
  }
  clearLiveToolCards();
  S.messages = msgs;
+  if(S.session&&S.session.session_id===sid){
+    S.session.message_count=Number(data.session.message_count || msgs.length);
+    S.lastUsage={...(data.session.last_usage||S.lastUsage||{})};
+    _setSessionViewedCount(sid, Number(S.session.message_count || msgs.length));
+  }
 }

 let _allSessions = [];  // cached for search filter
@@ -621,7 +644,7 @@ function filterSessions(){
 }

 function _sessionTimestampMs(session) {
-  const raw = Number(session && (session.updated_at || session.created_at || 0));
+  const raw = Number(session && (session.last_message_at || session.updated_at || session.created_at || 0));
  return Number.isFinite(raw) ? raw * 1000 : 0;
 }

@@ -803,7 +826,7 @@ function renderSessionListFromCache(){
    hdr.className='session-date-header'+(g.isPinned?' pinned':'');
    const caret=document.createElement('span');
    caret.className='session-date-caret';
-    caret.textContent='\u25B8'; // right-pointing triangle
+    caret.textContent='\u25BE'; // down when expanded; rotated right when collapsed
    const label=document.createElement('span');
    label.textContent=g.label;
    hdr.appendChild(caret);hdr.appendChild(label);
@@ -818,18 +841,25 @@ function renderSessionListFromCache(){
      _saveCollapsed();
    };
    wrapper.appendChild(hdr);
-    for(const s of g.items){ body.appendChild(_renderOneSession(s)); }
+    for(const s of g.items){ body.appendChild(_renderOneSession(s, Boolean(g.isPinned))); }
    wrapper.appendChild(body);
    list.appendChild(wrapper);
  }
  // ── Render session items (extracted for group body use) ──
  // Note: declared after the groups loop but available via function hoisting.
-  function _renderOneSession(s){
+  function _renderOneSession(s, isPinnedGroup=false){
    const el=document.createElement('div');
    const isActive=S.session&&s.session_id===S.session.session_id;
-    const isStreaming=Boolean(s.is_streaming);
+    const isLocalStreaming=Boolean(
+      s.session_id
+      && (
+        (isActive&&S.busy)
+        || (typeof INFLIGHT==='object'&&INFLIGHT&&INFLIGHT[s.session_id])
+      )
+    );
+    const isStreaming=Boolean(s.is_streaming||isLocalStreaming);
    const hasUnread=_hasUnreadForSession(s)&&!isActive;
-    el.className='session-item'+(isActive?' active':'')+(isActive&&S.session&&S.session._flash?' new-flash':'')+(s.archived?' archived':'')+(isStreaming?' streaming':'');
+    el.className='session-item'+(isActive?' active':'')+(isActive&&S.session&&S.session._flash?' new-flash':'')+(s.archived?' archived':'')+(isStreaming?' streaming':'')+(hasUnread?' unread':'');
    if(isActive&&S.session&&S.session._flash)delete S.session._flash;
    const rawTitle=s.title||'Untitled';
    const tags=(rawTitle.match(/#[\w-]+/g)||[]);
@@ -842,23 +872,21 @@ function renderSessionListFromCache(){
    sessionText.className='session-text';
    const titleRow=document.createElement('div');
    titleRow.className='session-title-row';
-    if(s.pinned){
+    if(s.pinned&&!isPinnedGroup){
      const pinInd=document.createElement('span');
      pinInd.className='session-pin-indicator';
      pinInd.innerHTML=ICONS.pin;
      titleRow.appendChild(pinInd);
    }
-    const state=document.createElement('span');
-    state.className='session-state-indicator'+(isStreaming?' is-streaming':(hasUnread?' is-unread':''));
-    titleRow.appendChild(state); // always reserve slot — prevents title shift when indicator appears
    const title=document.createElement('span');
    title.className='session-title';
    title.textContent=cleanTitle||'Untitled';
    title.title='Double-click to rename';
    const tsMs=_sessionTimestampMs(s);
    const ts=document.createElement('span');
-    ts.className='session-time';
-    ts.textContent=_formatRelativeSessionTime(tsMs);
+    const hasAttentionState=isStreaming||hasUnread;
+    ts.className='session-time'+(hasAttentionState?' is-hidden':'');
+    ts.textContent=hasAttentionState?'':_formatRelativeSessionTime(tsMs);
    titleRow.appendChild(title);
    titleRow.appendChild(ts);
    sessionText.appendChild(titleRow);
@@ -942,6 +970,10 @@ function renderSessionListFromCache(){
      }
    }
    el.appendChild(sessionText);
+    const state=document.createElement('span');
+    state.className='session-attention-indicator session-state-indicator'+(isStreaming?' is-streaming':(hasUnread?' is-unread':''));
+    state.setAttribute('aria-hidden','true');
+    el.appendChild(state);
    // Single trigger button that opens a shared dropdown menu
    const actions=document.createElement('div');
    actions.className='session-actions';
@@ -207,7 +207,11 @@
  body{background:var(--bg);color:var(--text);height:100vh;height:100dvh;overflow:hidden;display:flex;flex-direction:column;}
  .layout{display:flex;width:100%;flex:1 1 auto;min-height:0;}
  .app-titlebar{display:flex;align-items:center;justify-content:space-between;height:38px;flex-shrink:0;background:var(--sidebar);border-bottom:1px solid var(--border);padding:0 12px;padding-top:env(safe-area-inset-top,0);padding-left:max(12px,env(safe-area-inset-left,0));padding-right:max(12px,env(safe-area-inset-right,0));box-sizing:content-box;font-size:12px;color:var(--muted);user-select:none;-webkit-app-region:drag;position:relative;z-index:20;}
-  .app-titlebar-inner{display:flex;align-items:center;gap:8px;min-width:0;max-width:100%;flex:1 1 auto;justify-content:center;}
+  .app-titlebar-inner{display:flex;align-items:center;gap:8px;min-width:0;max-width:100%;flex:1 1 auto;justify-content:space-between;}
+  .tps-chip{
+    font-size:11px;font-family:ui-monospace,'SF Mono',monospace;color:var(--muted);
+    white-space:nowrap;letter-spacing:.02em;flex-shrink:0;
+  }
  .app-titlebar-icon{display:inline-flex;align-items:center;color:var(--accent);}
  .app-titlebar-title{font-size:12px;font-weight:600;color:var(--text);letter-spacing:-.01em;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;max-width:60vw;}
  .app-titlebar-sub{font-size:10px;color:var(--muted);background:var(--hover-bg);padding:2px 7px;border-radius:4px;font-family:'SF Mono',ui-monospace,monospace;white-space:nowrap;flex-shrink:0;}
@@ -230,7 +234,8 @@
  .sidebar-search-icon{position:absolute;left:22px;top:50%;transform:translateY(-50%);width:14px;height:14px;color:var(--muted);opacity:.7;pointer-events:none;}
  /* Inline session title edit */
  .session-title-input{flex:1;background:var(--surface);border:1px solid var(--accent);border-radius:6px;color:var(--text);padding:3px 8px;font-size:13px;outline:none;min-width:0;box-shadow:0 0 0 2px var(--accent-bg-strong);font-family:inherit;}
-  .session-item{padding:8px 40px 8px 8px;margin-bottom:2px;border-radius:8px;cursor:pointer;font-size:13px;color:var(--muted);transition:background .15s,color .15s;display:flex;align-items:flex-start;gap:8px;min-width:0;position:relative;}
+  .session-item{padding:8px 86px 8px 8px;margin-bottom:2px;border-radius:8px;cursor:pointer;font-size:13px;color:var(--muted);transition:background .15s,color .15s;display:flex;align-items:flex-start;gap:8px;min-width:0;position:relative;}
+  .session-item.streaming,.session-item.unread{padding-right:40px;}
  .session-item:hover{background:var(--hover-bg);color:var(--text);}
  .session-item.active{background:var(--accent-bg);color:var(--accent);}
  .session-item.streaming .session-title{color:var(--accent);}
@@ -242,6 +247,8 @@
  .session-meta{font-size:11px;color:var(--muted);overflow:hidden;text-overflow:ellipsis;white-space:nowrap;}
  .session-item.active .session-meta{color:var(--accent-text);opacity:.8;}
  .session-state-indicator{display:inline-flex;align-items:center;justify-content:center;flex-shrink:0;width:10px;height:10px;color:var(--accent);visibility:hidden;}
+  .session-attention-indicator{position:absolute;right:6px;top:50%;transform:translateY(-50%);width:26px;height:26px;z-index:1;pointer-events:none;transition:opacity .15s ease,visibility .15s ease;}
+  .session-item:hover .session-attention-indicator,.session-item:focus-within .session-attention-indicator,.session-item.menu-open .session-attention-indicator{opacity:0;visibility:hidden;}
  .session-state-indicator.is-streaming,.session-state-indicator.is-unread{visibility:visible;}
  .session-state-indicator::before{content:"";display:block;flex-shrink:0;}
  .session-state-indicator.is-streaming::before{
@@ -259,17 +266,31 @@
    border-radius:50%;
    background:currentColor;
  }
+  .session-attention-indicator.is-streaming::before{
+    width:10px;
+    height:10px;
+  }
+  .session-attention-indicator.is-unread::before{
+    width:8px;
+    height:8px;
+  }
  .session-time{
    display:inline-flex;
-    margin-left:auto;
+    position:absolute;
+    right:10px;
+    top:50%;
+    transform:translateY(-50%);
    color:var(--muted);
    font-size:10px;
    white-space:nowrap;
    flex-shrink:0;
  }
+  .session-time.is-hidden{display:none;}
+  .session-item:hover .session-time,.session-item:focus-within .session-time,.session-item.menu-open .session-time{display:none;}
  /* ── Session action trigger + dropdown ── */
  .session-actions{position:absolute;right:6px;top:50%;transform:translateY(-50%);display:flex;align-items:center;justify-content:center;opacity:0;pointer-events:none;transition:opacity .15s ease;}
  .session-item:hover .session-actions,.session-item:focus-within .session-actions,.session-item.menu-open .session-actions{opacity:1;pointer-events:auto;}
+  .session-item.streaming:not(:hover):not(:focus-within):not(.menu-open) .session-actions,.session-item.unread:not(:hover):not(:focus-within):not(.menu-open) .session-actions{opacity:0;pointer-events:none;}
  .session-actions-trigger{width:26px;height:26px;border:1px solid transparent;border-radius:8px;background:transparent;color:var(--muted);cursor:pointer;padding:0;line-height:1;display:inline-flex;align-items:center;justify-content:center;transition:background .12s,color .12s,border-color .12s;}
  .session-actions-trigger:hover{background:var(--hover-bg);color:var(--text);}
  .session-actions-trigger.active{background:var(--accent-bg);border-color:var(--accent-bg-strong);color:var(--text);}
@@ -294,7 +315,7 @@
  .session-date-header{display:flex;align-items:center;gap:5px;font-size:10px;font-weight:700;text-transform:uppercase;letter-spacing:.08em;color:var(--muted);padding:8px 10px 4px;cursor:pointer;user-select:none;opacity:.8;transition:opacity .15s;}
  .session-date-header:hover{opacity:1;}
  .session-date-header.pinned{color:var(--accent);}
-  .session-date-caret{font-size:9px;transition:transform .2s;flex-shrink:0;display:inline-block;}
+  .session-date-caret{font-size:9px;transition:transform .2s;flex-shrink:0;display:inline-block;transform:rotate(0deg);}
  .session-date-caret.collapsed{transform:rotate(-90deg);}
  .app-dialog-overlay{position:fixed;inset:0;background:rgba(7,12,19,.62);backdrop-filter:blur(6px);z-index:1100;display:none;align-items:center;justify-content:center;padding:24px;}
  .app-dialog{width:min(460px,100%);background:linear-gradient(180deg,rgba(21,31,45,.98),rgba(13,20,31,.98));border:1px solid var(--accent-bg-strong);border-radius:18px;box-shadow:0 18px 60px rgba(0,0,0,.45);padding:18px 18px 16px;color:var(--text);}
@@ -597,7 +618,7 @@
  .composer-profile-chip.active{background:var(--accent-bg);border-color:var(--accent-bg-strong);}
  .composer-profile-icon,.composer-profile-chevron{display:inline-flex;align-items:center;justify-content:center;flex-shrink:0;line-height:1;}
  .composer-profile-label{min-width:0;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;}
-  .composer-ws-wrap{position:relative;flex:0 1 auto;min-width:0;}
+  .composer-ws-wrap{position:relative;flex:0 1 auto;min-width:0;display:flex;align-items:center;gap:4px;}
  .composer-workspace-group{display:inline-flex;align-items:stretch;max-width:240px;border-radius:999px;overflow:hidden;background-color:transparent;transition:background-color .15s;}
  .composer-workspace-group:hover{background-color:var(--hover-bg);}
  .composer-workspace-group:hover .composer-workspace-files-btn,
@@ -812,7 +833,8 @@
    .ctx-tooltip{right:-4px;min-width:190px;max-width:220px;}
    /* Touch targets — minimum 44px */
    .icon-btn,.mic-btn{min-width:44px;min-height:44px;}
-    .session-item{min-height:44px;padding:10px 40px 10px 12px;}
+    .session-item{min-height:44px;padding:10px 86px 10px 12px;}
+    .session-item.streaming,.session-item.unread{padding-right:40px;}
    .session-actions{opacity:1;pointer-events:auto;}
    /* Empty state */
    .empty-state h2{font-size:18px;}
@@ -2108,7 +2130,7 @@ main.main.showing-profiles > #mainProfiles{display:flex;}
 .main-view-header{
  display:flex;align-items:center;justify-content:space-between;gap:16px;
  min-height:41px;padding:8px 32px;border-bottom:1px solid var(--border);
-  flex-shrink:0;background:var(--bg);
+  flex-shrink:0;background:var(--bg);position:relative;z-index:10;
 }
 .main-view-title{font-size:18px;font-weight:600;letter-spacing:-.01em;color:var(--text);line-height:1.3;min-width:0;overflow:hidden;text-overflow:ellipsis;white-space:nowrap;}
 .main-view-actions{display:flex;align-items:center;gap:4px;flex-shrink:0;}
@@ -2150,6 +2172,9 @@ main.main > .main-view:not([id="mainChat"]):not([id="mainSettings"]) .main-view-
 .detail-badge.ok{background:rgba(100,200,100,.12);color:rgba(100,200,100,.95);border-color:rgba(100,200,100,.3);}
 .detail-badge.warn{background:rgba(220,140,60,.12);color:rgba(220,140,60,.95);border-color:rgba(220,140,60,.3);}
 .detail-badge.err{background:rgba(220,80,80,.12);color:rgba(220,80,80,.95);border-color:rgba(220,80,80,.3);}
+.detail-badge.running{background:rgba(59,130,246,.12);color:rgba(96,165,250,.95);border-color:rgba(96,165,250,.3);}
+.detail-badge.running::before{content:'';display:inline-block;width:10px;height:10px;border:2px solid rgba(96,165,250,.4);border-top-color:rgba(96,165,250,.95);border-radius:50%;margin-right:6px;vertical-align:middle;animation:cron-spinner .6s linear infinite;}
+@keyframes cron-spinner{to{transform:rotate(360deg);}}
 .detail-prompt{background:var(--sidebar);border:1px solid var(--border);border-radius:8px;padding:10px 12px;font-size:12px;white-space:pre-wrap;line-height:1.55;color:var(--text);font-family:'SF Mono',ui-monospace,monospace;max-height:240px;overflow-y:auto;}
 .detail-run-item{border-top:1px solid var(--border);padding:8px 0;}
 .detail-run-item:first-child{border-top:none;}
@@ -1428,6 +1428,7 @@ function syncTopbar(){
    // If the model isn't in the current provider list, silently reset to the
    // first available model so stale values don't pollute the picker (#829).
    if(!applied && currentModel){
+      const deferModelCorrection=Boolean(S.session._modelResolutionDeferred);
      // Stale session model not in the current provider catalog — reset to the
      // first available model rather than injecting an "(unavailable)" option
      // that visually appears under the wrong provider group (#829).
@@ -1435,13 +1436,15 @@ function syncTopbar(){
      const first=modelSel&&modelSel.querySelector('optgroup > option, option');
      if(first){
        modelSel.value=first.value;
-        S.session.model=first.value;
-        // Persist the correction so the session doesn't re-inject on next load.
-        fetch(new URL('api/session/update',location.href).href,{
-          method:'POST',credentials:'include',
-          headers:{'Content-Type':'application/json'},
-          body:JSON.stringify({session_id:S.session.id||S.session.session_id,model:first.value})
-        }).catch(()=>{});
+        if(!deferModelCorrection){
+          S.session.model=first.value;
+          // Persist the correction so the session doesn't re-inject on next load.
+          fetch(new URL('api/session/update',location.href).href,{
+            method:'POST',credentials:'include',
+            headers:{'Content-Type':'application/json'},
+            body:JSON.stringify({session_id:S.session.id||S.session.session_id,model:first.value})
+          }).catch(()=>{});
+        }
      }
    }
  }
@@ -2176,6 +2179,9 @@ function buildToolCard(tc){
 // message column eliminates the visible "jump" users saw when renderMessages
 // fired on the done event.
 function appendLiveToolCard(tc){
+  // Guard: drop the event when no session or stream is active, so stale tool
+  // events from a previous session's SSE stream cannot touch the new session's DOM.
+  if(!S.session||!S.activeStreamId) return;
  let turn=$('liveAssistantTurn');
  if(!turn){
    appendThinking();
@@ -2463,10 +2469,22 @@ function finalizeThinkingCard(){
    row.remove();
    return;
  }
+  // If the user was watching (scroll pinned = at bottom), scroll the thinking
+  // card back to the top so the completed response is visible underneath without
+  // the thinking content blocking it. If they scrolled up to read history,
+  // leave their scroll position intact.
+  if(_scrollPinned){
+    const body=row&&row.querySelector('.thinking-card-body');
+    if(body) body.scrollTop=0;
+  }
  row.removeAttribute('id');
  row.removeAttribute('data-thinking-active');
 }
 function appendThinking(text=''){
+  // Guard: ignore if session was switched during an async SSE stream.
+  // The old stream's reasoning events can still fire after switch;
+  // without this check they would pollute the new session's DOM.
+  if(!S.session||!S.activeStreamId) return;
  $('emptyState').style.display='none';
  let turn=$('liveAssistantTurn');
  if(!turn){
@@ -2496,6 +2514,12 @@ function appendThinking(text=''){
  row.className=(text&&String(text).trim())?'assistant-segment thinking-card-row':'assistant-segment';
  row.innerHTML=_thinkingMarkup(text);
  scrollIfPinned();
+  // Auto-scroll the thinking card body to bottom if the user is watching
+  // (scroll pinned). If the user scrolled up to read history, leave it alone.
+  if(_scrollPinned){
+    const body=row&&row.querySelector('.thinking-card-body');
+    if(body) body.scrollTop=body.scrollHeight;
+  }
 }
 function updateThinking(text=''){appendThinking(text);}
 function removeThinking(){
@@ -2759,6 +2783,24 @@ async function promptNewFolder(){
    await api('/api/file/create-dir',{method:'POST',body:JSON.stringify({session_id:S.session.session_id,path:relPath})});
    showToast(t('folder_created')+name.trim());
    await loadDir(S.currentDir);
+    // Offer to add the new folder as a space (#782)
+    const absPath=S.session.workspace?((S.currentDir==='.'?S.session.workspace:S.session.workspace+'/'+S.currentDir)+'/'+name.trim()):null;
+    if(absPath){
+      const addAsSpace=await showConfirmDialog({
+        title:t('folder_add_as_space_title'),
+        message:t('folder_add_as_space_msg'),
+        confirmLabel:t('folder_add_as_space_btn'),
+        focusCancel:true
+      });
+      if(addAsSpace){
+        try{
+          const data=await api('/api/workspaces/add',{method:'POST',body:JSON.stringify({path:absPath})});
+          if(typeof _workspaceList!=='undefined')_workspaceList=data.workspaces||_workspaceList||[];
+          if(typeof renderWorkspacesPanel==='function')renderWorkspacesPanel(_workspaceList);
+          showToast(t('workspace_added'));
+        }catch(e2){setStatus((t('error_prefix')||'Error: ')+e2.message);}
+      }
+    }
  }catch(e){setStatus(t('folder_create_failed')+e.message);}
 }

@@ -156,6 +156,79 @@ def test_gateway_sessions_appear_when_enabled():
        post('/api/settings', {'show_cli_sessions': False})


+def test_gateway_sessions_without_messages_are_hidden_from_sidebar():
+    """Regression: empty agent session rows must not appear as broken sidebar entries."""
+    conn = _ensure_state_db()
+    empty_sid = 'gw_empty_no_messages_001'
+    try:
+        conn.execute(
+            "INSERT OR REPLACE INTO sessions (id, source, title, model, started_at, message_count) "
+            "VALUES (?, ?, ?, ?, ?, ?)",
+            (empty_sid, 'cron', 'Cron Session', 'openai/gpt-5', time.time(), 0),
+        )
+        conn.execute("DELETE FROM messages WHERE session_id = ?", (empty_sid,))
+        conn.commit()
+
+        post('/api/settings', {'show_cli_sessions': True})
+
+        data, status = get('/api/sessions')
+        assert status == 200
+        sessions = data.get('sessions', [])
+        assert empty_sid not in {s.get('session_id') for s in sessions}, (
+            "Agent sessions with no readable message rows should be filtered before "
+            "they reach the sidebar; otherwise clicking them fails during import."
+        )
+    finally:
+        try:
+            _remove_test_sessions(conn, empty_sid)
+            conn.close()
+        except Exception:
+            pass
+        post('/api/settings', {'show_cli_sessions': False})
+
+
+def test_gateway_watcher_hides_sessions_without_messages(monkeypatch):
+    """Regression: SSE watcher must use the same importable-agent filter."""
+    conn = _ensure_state_db()
+    empty_sid = 'gw_empty_watcher_001'
+    live_sid = 'gw_live_watcher_001'
+    try:
+        conn.execute(
+            "INSERT OR REPLACE INTO sessions (id, source, title, model, started_at, message_count) "
+            "VALUES (?, ?, ?, ?, ?, ?)",
+            (empty_sid, 'cron', 'Empty Cron Session', 'openai/gpt-5', time.time(), 0),
+        )
+        conn.execute("DELETE FROM messages WHERE session_id = ?", (empty_sid,))
+        _insert_gateway_session(
+            conn,
+            session_id=live_sid,
+            source='cron',
+            title='Live Cron Session',
+            message_count=0,
+        )
+
+        import api.gateway_watcher as gateway_watcher
+
+        monkeypatch.setattr(gateway_watcher, '_get_state_db_path', _get_state_db_path)
+
+        sessions = gateway_watcher._get_agent_sessions_from_db()
+        ids = {s.get('session_id') for s in sessions}
+        live = next((s for s in sessions if s.get('session_id') == live_sid), None)
+
+        assert empty_sid not in ids
+        assert live is not None
+        assert live.get('message_count') == 2, (
+            "Watcher should fall back to actual message rows when stored "
+            "message_count is zero, matching the sidebar route."
+        )
+    finally:
+        try:
+            _remove_test_sessions(conn, empty_sid, live_sid)
+            conn.close()
+        except Exception:
+            pass
+
+
 def test_gateway_sessions_excluded_when_disabled():
    """Gateway sessions are NOT returned when show_cli_sessions is off."""
    conn = _ensure_state_db()
@@ -0,0 +1,198 @@
+"""
+Tests for issue #1014 — model-not-found error classification.
+
+Covers:
+  1. streaming.py: 404/model-not-found errors detected and classified as 'model_not_found'
+  2. streaming.py: HTML tags stripped from provider error messages before classification
+  3. static/messages.js: apperror handler has model_not_found branch
+  4. static/i18n.js: model_not_found_label key present in all locales
+  5. streaming.py: model_not_found checked after auth but before generic error
+"""
+import pathlib
+import re
+
+REPO_ROOT = pathlib.Path(__file__).parent.parent.resolve()
+
+
+def _read(rel_path: str) -> str:
+    return (REPO_ROOT / rel_path).read_text(encoding="utf-8")
+
+
+# ── 1. streaming.py: model-not-found error detection ─────────────────────────
+
+class TestStreamingModelNotFoundDetection:
+    """streaming.py must classify 404/model-not-found errors as model_not_found."""
+
+    def test_model_not_found_type_defined_in_streaming(self):
+        """'model_not_found' type must be emitted for 404 errors."""
+        src = _read("api/streaming.py")
+        assert "model_not_found" in src, (
+            "model_not_found type not found in streaming.py — "
+            "404 errors will not be surfaced with a helpful message"
+        )
+
+    def test_is_not_found_flag_defined(self):
+        """_exc_is_not_found variable must exist in the exception handler."""
+        src = _read("api/streaming.py")
+        assert "_exc_is_not_found" in src, (
+            "_exc_is_not_found flag not found in streaming.py"
+        )
+
+    def test_not_found_detects_404(self):
+        """'404' must be part of the model-not-found detection logic."""
+        src = _read("api/streaming.py")
+        idx = src.find("_exc_is_not_found")
+        assert idx != -1, "_exc_is_not_found not found"
+        block = src[idx:idx + 600]
+        assert "'404'" in block or '"404"' in block, (
+            "'404' not in model-not-found detection block"
+        )
+
+    def test_not_found_detects_not_found_string(self):
+        """'not found' must be part of the detection logic."""
+        src = _read("api/streaming.py")
+        idx = src.find("_exc_is_not_found")
+        block = src[idx:idx + 600]
+        assert "not found" in block.lower(), (
+            "'not found' not in model-not-found detection block"
+        )
+
+    def test_not_found_detects_does_not_exist(self):
+        """'does not exist' must be part of the detection logic."""
+        src = _read("api/streaming.py")
+        idx = src.find("_exc_is_not_found")
+        block = src[idx:idx + 600]
+        assert "does not exist" in block.lower(), (
+            "'does not exist' not in model-not-found detection block"
+        )
+
+    def test_not_found_detects_invalid_model(self):
+        """'invalid model' must be part of the detection logic."""
+        src = _read("api/streaming.py")
+        idx = src.find("_exc_is_not_found")
+        block = src[idx:idx + 600]
+        assert "invalid model" in block.lower(), (
+            "'invalid model' not in model-not-found detection block"
+        )
+
+    def test_not_found_hint_mentions_settings(self):
+        """The model_not_found hint must mention Settings or hermes model."""
+        src = _read("api/streaming.py")
+        idx = src.find("model_not_found")
+        block = src[idx:idx + 500]
+        assert "Settings" in block or "hermes model" in block, (
+            "model_not_found hint must mention Settings or hermes model command"
+        )
+
+    def test_not_found_check_order_after_auth(self):
+        """model_not_found must be checked after auth_mismatch (auth first)."""
+        src = _read("api/streaming.py")
+        auth_idx = src.find("elif _exc_is_auth")
+        nf_idx = src.find("elif _exc_is_not_found")
+        assert auth_idx != -1, "_exc_is_auth not found"
+        assert nf_idx != -1, "_exc_is_not_found not found"
+        assert auth_idx < nf_idx, (
+            "auth_mismatch should be checked before model_not_found — "
+            "auth errors must not be mistaken for not-found errors"
+        )
+
+
+# ── 2. streaming.py: HTML sanitization ───────────────────────────────────────
+
+class TestStreamingHtmlSanitization:
+    """Provider error messages containing HTML must be stripped."""
+
+    def test_html_strip_before_classification(self):
+        """HTML tags must be stripped before error classification."""
+        src = _read("api/streaming.py")
+        # Find the HTML sanitization block in the exception handler
+        # It should appear before _exc_lower = err_str.lower()
+        sanitize_idx = src.find("re.sub(r'<[^>]+>'")
+        exc_lower_idx = src.find("_exc_lower = err_str.lower()")
+        assert sanitize_idx != -1, (
+            "HTML tag stripping (re.sub) not found in streaming.py exception handler"
+        )
+        assert exc_lower_idx != -1, "_exc_lower not found"
+        assert sanitize_idx < exc_lower_idx, (
+            "HTML sanitization must happen before error classification"
+        )
+
+    def test_whitespace_normalization(self):
+        """Stripped HTML must have whitespace collapsed."""
+        src = _read("api/streaming.py")
+        sanitize_idx = src.find("re.sub(r'<[^>]+>'")
+        block = src[sanitize_idx:sanitize_idx + 300]
+        assert r"\s+" in block, (
+            "Whitespace normalization (\\s+) not found after HTML strip"
+        )
+
+
+# ── 3. static/messages.js: apperror handler ──────────────────────────────────
+
+class TestApperrorModelNotFound:
+    """messages.js apperror handler must handle model_not_found type."""
+
+    def test_model_not_found_type_handled(self):
+        """apperror handler must check for type='model_not_found'."""
+        src = _read("static/messages.js")
+        assert "model_not_found" in src, (
+            "model_not_found type not handled in messages.js apperror handler"
+        )
+
+    def test_model_not_found_label(self):
+        """'Model not found' label must appear in the error handling."""
+        src = _read("static/messages.js")
+        assert "Model not found" in src, (
+            "'Model not found' label not found in messages.js"
+        )
+
+    def test_is_model_not_found_variable(self):
+        """isModelNotFound variable must be defined."""
+        src = _read("static/messages.js")
+        assert "isModelNotFound" in src, (
+            "isModelNotFound variable not found in messages.js apperror handler"
+        )
+
+
+# ── 4. static/i18n.js: all locales ───────────────────────────────────────────
+
+class TestI18nModelNotFound:
+    """All locales must have model_not_found_label."""
+
+    REQUIRED_KEY = "model_not_found_label"
+
+    def _locale_names(self, src: str) -> list:
+        pattern = re.compile(
+            r"^\s{2}(?:'(?P<quoted>[A-Za-z0-9-]+)'|(?P<plain>[A-Za-z0-9-]+))\s*:\s*\{",
+            re.MULTILINE,
+        )
+        names = []
+        for match in pattern.finditer(src):
+            names.append(match.group("quoted") or match.group("plain"))
+        return names
+
+    def _count_key(self, src: str, key: str) -> int:
+        return len(re.findall(r'\b' + re.escape(key) + r'\b', src))
+
+    def test_all_locales_have_model_not_found_label(self):
+        """model_not_found_label must appear in all locales."""
+        src = _read("static/i18n.js")
+        locale_count = len(self._locale_names(src))
+        count = self._count_key(src, self.REQUIRED_KEY)
+        assert count >= locale_count, (
+            f"model_not_found_label found {count} times, expected >= {locale_count} "
+            f"(one per locale)"
+        )
+
+    def test_english_label_is_plain_string(self):
+        """English model_not_found_label must be a plain string, not a function."""
+        src = _read("static/i18n.js")
+        en_start = src.find("\n  en: {")
+        es_start = src.find("\n  es: {")
+        en_block = src[en_start:es_start]
+        assert self.REQUIRED_KEY in en_block, "Key not in en block"
+        idx = en_block.find(self.REQUIRED_KEY)
+        line = en_block[idx:idx + 200]
+        assert "=>" not in line, (
+            "model_not_found_label should be a plain string, not an arrow function"
+        )
@@ -14,7 +14,10 @@ import pathlib
 import re

 MODELS_PY = pathlib.Path(__file__).parent.parent / 'api' / 'models.py'
+AGENT_SESSIONS_PY = pathlib.Path(__file__).parent.parent / 'api' / 'agent_sessions.py'
 src = MODELS_PY.read_text(encoding='utf-8')
+agent_src = AGENT_SESSIONS_PY.read_text(encoding='utf-8')
+combined_src = src + "\n" + agent_src


 class TestCliSessionsErrorSurface:
@@ -22,16 +25,16 @@ class TestCliSessionsErrorSurface:

    def test_schema_introspection_present(self):
        """The function must check for the 'source' column before querying."""
-        assert "PRAGMA table_info(sessions)" in src
+        assert "PRAGMA table_info(sessions)" in combined_src

    def test_missing_source_column_logs_warning(self):
        """If 'source' column is absent, a warning is logged."""
        # The warning message must mention the missing column and how to fix it
-        assert "no 'source' column" in src or "has no 'source' column" in src
+        assert "no 'source' column" in combined_src or "has no 'source' column" in combined_src

    def test_missing_source_column_suggests_upgrade(self):
        """Warning message must suggest upgrading hermes-agent."""
-        assert "Upgrade hermes-agent" in src or "upgrade hermes-agent" in src.lower()
+        assert "Upgrade hermes-agent" in combined_src or "upgrade hermes-agent" in combined_src.lower()

    def test_exception_path_logs_warning(self):
        """The except clause must call logger.warning, not silently pass."""
@@ -67,8 +70,8 @@ class TestCliSessionsErrorSurface:

    def test_source_column_check_before_sql_query(self):
        """Schema check must happen before the main SQL SELECT."""
-        pragma_pos = src.find("PRAGMA table_info(sessions)")
-        select_pos = src.find("SELECT s.id, s.title, s.model")
+        pragma_pos = agent_src.find("PRAGMA table_info(sessions)")
+        select_pos = agent_src.find("SELECT s.id, s.title, s.model")
        assert pragma_pos != -1, "PRAGMA check not found"
        assert select_pos != -1, "SELECT query not found"
        assert pragma_pos < select_pos, \
@@ -11,6 +11,10 @@ def test_pinned_indicator_renders_inside_title_row():
    title_row_idx = SESSIONS_JS.find("titleRow.className='session-title-row';")
    assert title_row_idx != -1, "session title row construction not found"

+    assert "body.appendChild(_renderOneSession(s, Boolean(g.isPinned)))" in SESSIONS_JS
+    assert "function _renderOneSession(s, isPinnedGroup=false)" in SESSIONS_JS
+    assert "if(s.pinned&&!isPinnedGroup){" in SESSIONS_JS
+
    pin_idx = SESSIONS_JS.find("pinInd.className='session-pin-indicator';", title_row_idx)
    assert pin_idx != -1, "pinned indicator creation not found after title row"

@@ -32,26 +36,88 @@ def test_pinned_indicator_uses_fixed_indicator_box():
    assert "justify-content:center;" in css_block, "pin indicator should center the star inside its box"


-def test_state_indicator_always_appended_to_prevent_layout_shift():
-    """State span is always added to the DOM (visibility:hidden when inactive) to prevent
-    titles shifting left/right when the spinner or unread dot appears/disappears."""
+def test_state_indicator_uses_right_actions_slot_to_prevent_title_shift():
+    """State span reuses the right-side action slot so the title start position
+    does not shift when the spinner or unread dot appears/disappears."""
    title_row_idx = SESSIONS_JS.find("titleRow.className='session-title-row';")
    assert title_row_idx != -1, "title row construction not found"

-    # state span must be appended unconditionally (no surrounding if-check)
-    append_idx = SESSIONS_JS.find("titleRow.appendChild(state);", title_row_idx)
-    assert append_idx != -1, "state span must always be appended to titleRow"
-
-    # Verify CSS uses visibility:hidden to reserve the slot
-    assert "session-state-indicator{" in STYLE_CSS, "session-state-indicator CSS rule missing"
-    base_block_start = STYLE_CSS.find("session-state-indicator{")
-    base_block_end = STYLE_CSS.find("}", base_block_start)
-    base_block = STYLE_CSS[base_block_start:base_block_end]
-    assert "visibility:hidden;" in base_block, (
-        "session-state-indicator should default to visibility:hidden so it reserves slot "
-        "without being visible — prevents title layout shift on state changes"
+    title_row_append_idx = SESSIONS_JS.find("titleRow.appendChild(state);", title_row_idx)
+    assert title_row_append_idx == -1, (
+        "state indicator should not be inserted before the title; it should reuse "
+        "the right-side actions slot to avoid title shift"
    )

+    state_idx = SESSIONS_JS.find("state.className='session-attention-indicator session-state-indicator'")
+    assert state_idx != -1, "right-side attention indicator creation not found"
+
+    append_to_row_idx = SESSIONS_JS.find("el.appendChild(state);", state_idx)
+    assert append_to_row_idx != -1, "state indicator should be appended to the outer row"
+
+    actions_idx = SESSIONS_JS.find("actions.className='session-actions';", append_to_row_idx)
+    assert actions_idx != -1, "session actions should still be appended after attention indicator"
+
+    assert ".session-attention-indicator{" in STYLE_CSS, "attention indicator CSS rule missing"
+    css_block = STYLE_CSS[
+        STYLE_CSS.find(".session-attention-indicator{"):
+        STYLE_CSS.find(".session-item:hover .session-attention-indicator")
+    ]
+    assert "position:absolute;" in css_block, "attention indicator should be positioned in the row action slot"
+    assert "right:6px;" in css_block, "attention indicator should align with the actions trigger"
+    assert "width:26px;" in css_block, "attention indicator should use the same width as the actions trigger"
+    assert "height:26px;" in css_block, "attention indicator should use the same height as the actions trigger"
+    assert ".session-attention-indicator.is-streaming::before{" in STYLE_CSS
+    inner_spinner_block = STYLE_CSS[
+        STYLE_CSS.find(".session-attention-indicator.is-streaming::before{"):
+        STYLE_CSS.find(".session-attention-indicator.is-unread::before{")
+    ]
+    assert "width:10px;" in inner_spinner_block, "spinner glyph should stay 10px inside the 26px action slot"
+    assert "height:10px;" in inner_spinner_block, "spinner glyph should stay 10px inside the 26px action slot"
+
+    hover_rule = ".session-item:hover .session-attention-indicator"
+    assert hover_rule in STYLE_CSS, "hover rule should hide attention indicator when actions appear"
+
+
+def test_timestamp_hidden_when_attention_state_is_present():
+    assert "+(hasUnread?' unread':'')" in SESSIONS_JS
+    assert "const hasAttentionState=isStreaming||hasUnread;" in SESSIONS_JS
+    assert "ts.className='session-time'+(hasAttentionState?' is-hidden':'');" in SESSIONS_JS
+    assert "ts.textContent=hasAttentionState?'':_formatRelativeSessionTime(tsMs);" in SESSIONS_JS
+    assert ".session-time.is-hidden{display:none;}" in STYLE_CSS
+    assert ".session-item{padding:8px 86px 8px 8px;" in STYLE_CSS
+    assert ".session-item.streaming,.session-item.unread{padding-right:40px;}" in STYLE_CSS
+    assert ".session-item{min-height:44px;padding:10px 86px 10px 12px;}" in STYLE_CSS
+    session_time_block = STYLE_CSS[
+        STYLE_CSS.find(".session-time{"):
+        STYLE_CSS.find(".session-time.is-hidden")
+    ]
+    assert "position:absolute;" in session_time_block
+    assert "right:10px;" in session_time_block
+    assert ".session-item:hover .session-time" in STYLE_CSS
+    assert ".session-item.streaming:not(:hover):not(:focus-within):not(.menu-open) .session-actions" in STYLE_CSS
+    assert ".session-item.unread:not(:hover):not(:focus-within):not(.menu-open) .session-actions" in STYLE_CSS
+
+
+def test_sidebar_uses_local_inflight_state_for_immediate_spinner():
+    messages_js = (Path(__file__).resolve().parent.parent / "static" / "messages.js").read_text(encoding="utf-8")
+
+    assert "const isLocalStreaming=Boolean(" in SESSIONS_JS
+    assert "(isActive&&S.busy)" in SESSIONS_JS
+    assert "INFLIGHT[s.session_id]" in SESSIONS_JS
+    assert "const isStreaming=Boolean(s.is_streaming||isLocalStreaming);" in SESSIONS_JS
+    assert "if(typeof renderSessionListFromCache==='function') renderSessionListFromCache();" in messages_js
+
+
+def test_date_group_caret_expanded_down_collapsed_right():
+    assert "caret.textContent='\\u25BE';" in SESSIONS_JS
+    assert ".session-date-caret{" in STYLE_CSS
+    caret_block = STYLE_CSS[
+        STYLE_CSS.find(".session-date-caret{"):
+        STYLE_CSS.find(".session-date-caret.collapsed")
+    ]
+    assert "transform:rotate(0deg);" in caret_block
+    assert ".session-date-caret.collapsed{transform:rotate(-90deg);}" in STYLE_CSS
+

 def test_apperror_path_calls_render_session_list():
    """apperror handler must call renderSessionList() to clear the streaming indicator
@@ -65,6 +65,64 @@ def _read_index(index_file):
    return json.loads(index_file.read_text(encoding="utf-8"))


+def test_compact_exposes_last_message_at_from_message_timestamp():
+    s = Session(
+        session_id="sess_time",
+        title="Time",
+        updated_at=300.0,
+        messages=[
+            {"role": "user", "content": "old", "_ts": 100.0},
+            {"role": "tool", "content": "ignore", "timestamp": 400.0},
+            {"role": "assistant", "content": "latest", "timestamp": 200.0},
+        ],
+    )
+
+    compact = s.compact()
+
+    assert compact["updated_at"] == 300.0
+    assert compact["last_message_at"] == 200.0
+
+
+def test_all_sessions_backfills_last_message_at_for_legacy_index_rows():
+    index_file = models.SESSION_INDEX_FILE
+    s = Session(
+        session_id="sess_legacy_index",
+        title="Legacy Index",
+        updated_at=300.0,
+        messages=[{"role": "assistant", "content": "reply", "_ts": 100.0}],
+    )
+    s.path.write_text(json.dumps(s.__dict__, ensure_ascii=False, indent=2), encoding="utf-8")
+    _write_index_file(
+        index_file,
+        [
+            {
+                "session_id": s.session_id,
+                "title": s.title,
+                "updated_at": s.updated_at,
+                "workspace": s.workspace,
+                "model": s.model,
+                "message_count": 1,
+                "created_at": s.created_at,
+                "pinned": False,
+                "archived": False,
+            }
+        ],
+    )
+
+    rows = models.all_sessions()
+
+    assert rows[0]["session_id"] == s.session_id
+    assert rows[0]["last_message_at"] == 100.0
+
+    # The backfill must also be persisted to the index so that subsequent
+    # /api/sessions polls don't re-read every legacy session file. Without this,
+    # a 5-second poll cycle reloads every legacy session JSON on every tick
+    # until each session is independently saved.
+    persisted = _read_index(index_file)
+    assert persisted[0]["session_id"] == s.session_id
+    assert persisted[0].get("last_message_at") == 100.0
+
+
 # ── 6. test_incremental_patch_correctness ─────────────────────────────────

 def test_incremental_patch_correctness():
@@ -178,6 +236,61 @@ def test_incremental_update_prunes_stale_entries():
    assert "ghost_sid" not in ids, "stale entry with no backing file must be pruned"


+def test_load_metadata_only_does_not_parse_large_message_body():
+    """Large sessions must keep the metadata-only path cheap."""
+    s = Session(
+        session_id="sess_large",
+        title="Large Session",
+        messages=[{"role": "assistant", "content": "x" * 200_000}],
+        tool_calls=[{"id": "tool_1", "name": "read_file", "result": "y" * 10_000}],
+        input_tokens=123,
+        output_tokens=45,
+    )
+    s.save()
+
+    with patch.object(Session, "load", side_effect=AssertionError("full load should not run")):
+        meta = Session.load_metadata_only("sess_large")
+
+    assert meta is not None
+    assert meta.session_id == "sess_large"
+    assert meta.title == "Large Session"
+    assert meta.input_tokens == 123
+    assert meta.output_tokens == 45
+    assert meta.messages == []
+    assert meta.tool_calls == []
+    assert meta.compact()["message_count"] == 1
+
+
+def test_metadata_only_get_session_does_not_poison_full_session_cache():
+    s = Session(
+        session_id="sess_cache",
+        title="Cache Guard",
+        messages=[{"role": "user", "content": "hi"}],
+    )
+    s.save(skip_index=True)
+
+    meta = models.get_session("sess_cache", metadata_only=True)
+    assert meta.messages == []
+    assert "sess_cache" not in models.SESSIONS
+
+    full = models.get_session("sess_cache")
+    assert full.messages == [{"role": "user", "content": "hi"}]
+    assert models.SESSIONS["sess_cache"] is full
+
+
+def test_session_save_does_not_persist_metadata_message_count_hint():
+    s = Session(
+        session_id="sess_private_hint",
+        title="Private Hint",
+        messages=[{"role": "user", "content": "hi"}],
+    )
+    s._metadata_message_count = 10
+    s.save(skip_index=True)
+
+    payload = json.loads(s.path.read_text(encoding="utf-8"))
+    assert "_metadata_message_count" not in payload
+
+
 # ── 8. test_first_call_full_rebuild ──────────────────────────────────────

 def test_first_call_full_rebuild():
@@ -0,0 +1,51 @@
+import re
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parents[1]
+
+
+def test_messages_zero_skips_effective_model_resolution():
+    src = (ROOT / "api" / "routes.py").read_text(encoding="utf-8")
+
+    assert re.search(
+        r"effective_model\s*=\s*\(\s*"
+        r"_resolve_effective_session_model_for_display\(s\)\s*"
+        r"if resolve_model\s*else None\s*\)",
+        src,
+    ), "messages=0 metadata requests must not resolve the model catalog"
+    assert 'resolve_model_default = "1" if load_messages else "0"' in src
+
+
+def test_full_message_load_updates_viewed_count_after_metadata_fast_path():
+    src = (ROOT / "static" / "sessions.js").read_text(encoding="utf-8")
+
+    assert "_setSessionViewedCount(S.session.session_id, Number(data.session.message_count || 0));" in src
+    assert "_setSessionViewedCount(sid, Number(S.session.message_count || msgs.length));" in src
+
+
+def test_lazy_message_load_skips_model_resolution():
+    src = (ROOT / "static" / "sessions.js").read_text(encoding="utf-8")
+
+    assert "messages=1&resolve_model=0" in src
+
+
+def test_session_switch_defers_model_resolution_without_blocking():
+    src = (ROOT / "static" / "sessions.js").read_text(encoding="utf-8")
+    ui = (ROOT / "static" / "ui.js").read_text(encoding="utf-8")
+
+    assert "messages=0&resolve_model=0" in src
+    assert "function _resolveSessionModelForDisplaySoon" in src
+    assert "messages=0&resolve_model=1" in src
+    assert "_modelResolutionDeferred=true" in src
+    assert "deferModelCorrection" in ui
+    assert "if(!deferModelCorrection)" in ui
+
+
+def test_boot_does_not_block_session_restore_on_model_catalog():
+    src = (ROOT / "static" / "boot.js").read_text(encoding="utf-8")
+
+    assert "if(s.default_model) window._defaultModel=s.default_model;" in src
+    assert "const _modelDropdownReady=populateModelDropdown().then" in src
+    assert "window._modelDropdownReady=_modelDropdownReady" in src
+    assert "await populateModelDropdown()" not in src
@@ -29,6 +29,7 @@ def _run_session_time_case(script_body: str) -> dict:
    functions = "\n\n".join(
        _extract_function(SESSIONS_JS, name)
        for name in (
+            "_sessionTimestampMs",
            "_localDayOrdinal",
            "_sessionCalendarBoundaries",
            "_formatSessionDate",
@@ -65,6 +66,7 @@ def _run_session_time_case(script_body: str) -> dict:


 def test_session_sidebar_js_has_dynamic_relative_time_helpers():
+    assert "function _sessionTimestampMs" in SESSIONS_JS
    assert "function _sessionCalendarBoundaries" in SESSIONS_JS
    assert "function _formatRelativeSessionTime" in SESSIONS_JS
    assert "function _sessionTimeBucketLabel" in SESSIONS_JS
@@ -86,6 +88,22 @@ def test_session_sidebar_renders_relative_time_and_meta_rows():
    assert "const ONE_DAY=86400000;" not in SESSIONS_JS


+def test_session_timestamp_prefers_last_message_at_over_metadata_updated_at():
+    result = _run_session_time_case(
+        """
+        const session = {
+          created_at: 1776441348,
+          updated_at: 1777086443,
+          last_message_at: 1776441972,
+        };
+        process.stdout.write(JSON.stringify({
+          timestampMs: _sessionTimestampMs(session),
+        }));
+        """
+    )
+    assert result["timestampMs"] == 1776441972 * 1000
+
+
 def test_relative_time_uses_calendar_boundaries_and_year_for_old_sessions():
    result = _run_session_time_case(
        """
@@ -72,9 +72,14 @@ class TestGenerateTitleRawViaAuxTimeout(unittest.TestCase):
    def _run_with_config(self, tg_config, expected_timeout):
        from api.streaming import generate_title_raw_via_aux

-        mock_resp = MagicMock()
-        mock_resp.choices = [MagicMock()]
-        mock_resp.choices[0].message.content = 'Test Title'
+        mock_resp = types.SimpleNamespace(
+            choices=[
+                types.SimpleNamespace(
+                    message=types.SimpleNamespace(content='Test Title'),
+                    finish_reason='stop',
+                )
+            ]
+        )

        captured = {}

@@ -118,6 +123,153 @@ class TestGenerateTitleRawViaAuxTimeout(unittest.TestCase):
        )


+class TestReasoningModelTitleGeneration(unittest.TestCase):
+    """Regression coverage for reasoning models that spend output budget on reasoning."""
+
+    def test_title_budget_defaults_to_reasoning_safe_value(self):
+        """Title generation should not use a tiny output cap that starves final content."""
+        from api.streaming import _title_completion_budget, _title_retry_completion_budget
+
+        self.assertEqual(_title_completion_budget(), 512)
+        self.assertEqual(_title_retry_completion_budget(), 1024)
+
+    def test_aux_retries_empty_reasoning_length_response_with_larger_budget(self):
+        """If a reasoning model returns empty content at finish_reason=length, retry once."""
+        from api.streaming import generate_title_raw_via_aux
+
+        responses = [
+            {
+                'choices': [
+                    {
+                        'message': {'content': '', 'reasoning': 'long hidden reasoning'},
+                        'finish_reason': 'length',
+                    }
+                ]
+            },
+            {'choices': [{'message': {'content': 'Useful Session Title'}, 'finish_reason': 'stop'}]},
+        ]
+        captured_budgets = []
+
+        def fake_call_llm(**kwargs):
+            captured_budgets.append(kwargs.get('max_tokens'))
+            return responses.pop(0)
+
+        with _patch_tg_config({'provider': 'ollama', 'model': 'kimi-k2.6', 'base_url': 'https://ollama.com/v1'}):
+            with patch('agent.auxiliary_client.call_llm', side_effect=fake_call_llm, create=True):
+                result, status = generate_title_raw_via_aux(
+                    user_text='Hey nur ein kurzer Test',
+                    assistant_text='Alles klar, ich helfe dir dabei.',
+                )
+
+        self.assertEqual(result, 'Useful Session Title')
+        self.assertEqual(status, 'llm_aux_retry')
+        self.assertEqual(captured_budgets, [512, 1024])
+
+    def test_aux_returns_specific_status_when_reasoning_retry_still_empty(self):
+        """Diagnostics should expose the provider failure mode instead of generic llm_error_aux."""
+        from api.streaming import generate_title_raw_via_aux
+
+        def empty_length_response(**kwargs):
+            return {
+                'choices': [
+                    {
+                        'message': {'content': '', 'reasoning': 'still reasoning'},
+                        'finish_reason': 'length',
+                    }
+                ]
+            }
+
+        with _patch_tg_config({'provider': 'ollama', 'model': 'kimi-k2.6', 'base_url': 'https://ollama.com/v1'}):
+            with patch('agent.auxiliary_client.call_llm', side_effect=empty_length_response, create=True):
+                result, status = generate_title_raw_via_aux(
+                    user_text='Hey nur ein kurzer Test',
+                    assistant_text='Alles klar, ich helfe dir dabei.',
+                )
+
+        self.assertIsNone(result)
+        self.assertEqual(status, 'llm_length_aux')
+
+    def test_agent_route_retries_empty_reasoning_length_response(self):
+        """The active-agent route should get the same reasoning-model retry path as aux."""
+        from api.streaming import generate_title_raw_via_agent
+
+        responses = [
+            {
+                'choices': [
+                    {
+                        'message': {'content': '', 'reasoning': 'long hidden reasoning'},
+                        'finish_reason': 'length',
+                    }
+                ]
+            },
+            {'choices': [{'message': {'content': 'Agent Session Title'}, 'finish_reason': 'stop'}]},
+        ]
+        captured_budgets = []
+
+        def fake_create(**kwargs):
+            captured_budgets.append(kwargs.get('max_tokens') or kwargs.get('max_completion_tokens'))
+            return responses.pop(0)
+
+        client = types.SimpleNamespace(
+            chat=types.SimpleNamespace(
+                completions=types.SimpleNamespace(create=fake_create)
+            )
+        )
+        agent = MagicMock()
+        agent.api_mode = 'openai'
+        agent.provider = 'ollama'
+        agent.model = 'kimi-k2.6'
+        agent.base_url = 'https://ollama.com/v1'
+        agent.reasoning_config = None
+        agent._build_api_kwargs.return_value = {}
+        agent._ensure_primary_openai_client.return_value = client
+
+        result, status = generate_title_raw_via_agent(
+            agent,
+            user_text='Hey nur ein kurzer Test',
+            assistant_text='Alles klar, ich helfe dir dabei.',
+        )
+
+        self.assertEqual(result, 'Agent Session Title')
+        self.assertEqual(status, 'llm_retry')
+        self.assertEqual(captured_budgets, [512, 1024])
+        self.assertIsNone(agent.reasoning_config)
+
+    @patch('api.streaming._aux_title_configured', return_value=True)
+    @patch('api.streaming._generate_llm_session_title_via_aux')
+    @patch('api.streaming.get_session')
+    def test_fallback_title_status_keeps_underlying_llm_reason(
+        self, mock_get_session, mock_aux_title, mock_configured,
+    ):
+        """Local fallback should not hide that the LLM failed because it hit length."""
+        from api.streaming import _run_background_title_update
+
+        mock_session = MagicMock()
+        mock_session.title = 'Untitled'
+        mock_session.llm_title_generated = False
+        mock_session.messages = [
+            {'role': 'user', 'content': 'Hey nur ein kurzer Test'},
+            {'role': 'assistant', 'content': 'Alles klar, ich helfe dir dabei.'},
+        ]
+        mock_get_session.return_value = mock_session
+        mock_aux_title.return_value = (None, 'llm_length_aux', '')
+        events = []
+
+        _run_background_title_update(
+            session_id='reasoning-title-session',
+            user_text='Hey nur ein kurzer Test',
+            assistant_text='Alles klar, ich helfe dir dabei.',
+            placeholder_title='Untitled',
+            put_event=lambda event_type, data: events.append((event_type, data)),
+            agent=None,
+        )
+
+        title_status = [data for event_type, data in events if event_type == 'title_status']
+        self.assertTrue(title_status)
+        self.assertEqual(title_status[0]['status'], 'fallback')
+        self.assertEqual(title_status[0]['reason'], 'local_summary:llm_length_aux')
+
+
 class TestAuxTitleTimeoutEdgeCases(unittest.TestCase):
    """Comment 4: _aux_title_timeout must reject zero, negative, and non-numeric values."""