mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-24 18:50:15 +00:00
9f3f8ea902
Two concrete data-corruption vectors flagged in Opus review of PR #2041, both fixed atomically so the new repair-safe endpoint is safe for production: 1. Shared tmp filename under concurrent calls `tmp = target.with_suffix('.json.reconcile.tmp')` produced a fixed path per session ID. Two simultaneous repair-safe POSTs would interleave bytes in the same tmp file, then both rename → corrupted JSON. Now matches the `Session.save()` convention at api/models.py:484 with a pid+tid suffix. 2. TOCTOU between target.exists() check and tmp.replace(target) `os.replace()` overwrites unconditionally. If a concurrent Session.save() for the same SID materialized the live sidecar in the microsecond window between the existence check and the rename, the reconciliation would silently overwrite a live sidecar with a (lossier) state.db reconstruction. Switched to `os.link()` + `unlink(tmp)` which is atomic create-or-fail — on FileExistsError we record `skipped: sidecar_appeared_during_reconcile` and keep the live sidecar untouched. Plus a round-trip schema-parity test: materialize a sidecar from state.db, then load it back through `Session.load()` and assert the messages survive. Catches future schema drift between `_state_db_row_to_sidecar()` and `Session.__init__()`. Also adds a guard test confirming the .reconcile.tmp suffix includes pid+tid (regression guard for hazard #1). Tests: 23 passing across the recovery suite (was 21; +2 new in this commit). Co-authored-by: ai-ag2026 <261867348+ai-ag2026@users.noreply.github.com>