mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-21 03:39:54 +00:00
107de0321d
Third Windows-specific sandbox bug (after WinError 10106 and the UTF-8
file-write bug): user scripts that print non-ASCII to stdout crash with
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2192'
    in position N: character maps to <undefined>
Root cause: when stdout is a pipe and PYTHONIOENCODING is not set,
Python on Windows encodes sys.stdout with the ANSI code page (cp1252 on
US-locale installs), not UTF-8. LLM-generated scripts routinely print
arrows, emoji, and other characters that cp1252 can't encode; it covers
em-dashes and most Western European accents, but not '\u2192'.
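The encoding failure itself can be reproduced on any platform by encoding
to cp1252 directly. A minimal sketch (the helper name is hypothetical and
not part of the sandbox code):

```python
def cp1252_error(text: str) -> str:
    """Return the UnicodeEncodeError message for text, or '' if it encodes."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""

# U+2192 (rightwards arrow) has no cp1252 mapping, so encoding fails
# with the same 'charmap' message quoted in the traceback above.
arrow_message = cp1252_error("\u2192")

# Plain ASCII always encodes, so no error message is produced.
ascii_message = cp1252_error("hello")
```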
Fix: spawn the sandbox child with:
    PYTHONIOENCODING=utf-8  # sys.stdin/stdout/stderr all UTF-8
    PYTHONUTF8=1            # PEP 540 UTF-8 mode: open() defaults to UTF-8 too
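A rough sketch of what spawning a child with these overrides might look
like. The names utf8_child_env and run_child are hypothetical; the real
sandbox spawn code is not shown in this message and may differ:

```python
import os
import subprocess
import sys

def utf8_child_env(base=None):
    """Copy the environment and force UTF-8 stdio plus UTF-8 mode."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf-8"  # stdin/stdout/stderr encoding
    env["PYTHONUTF8"] = "1"            # PEP 540 UTF-8 mode
    return env

def run_child(code: str) -> subprocess.CompletedProcess:
    """Run code in a child interpreter and capture raw bytes from its pipes."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=utf8_child_env(),
        capture_output=True,
        timeout=30,
    )

# The child prints an arrow and an accented word through a pipe.
proc = run_child("print('\\u2192 caf\\u00e9')")
text = proc.stdout.decode("utf-8", errors="replace")
```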
PYTHONUTF8 is the belt-and-suspenders half: LLM scripts that call
open(path, 'w') without encoding= will now produce UTF-8 files by
default, matching what the sandbox already does for its own staging
files.
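The effect on open() can be checked with a small illustrative script
(CHILD_CODE and the temporary file name are assumptions made for this
sketch, not code from the repository):

```python
import os
import subprocess
import sys
import tempfile

# Child script: writes non-ASCII text without passing encoding= to open().
CHILD_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    f.write('caf\\u00e9 \\u2192')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    env = dict(os.environ, PYTHONUTF8="1")  # enable UTF-8 mode in the child
    subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        env=env,
        check=True,
        timeout=30,
    )
    # Read the raw bytes back to see which encoding the child used.
    with open(path, "rb") as f:
        raw = f.read()
```

With UTF-8 mode on, raw holds the UTF-8 bytes of the text regardless of
the platform's locale encoding.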
The parent side already decodes child stdout/stderr as UTF-8 with
errors='replace' (lines 1345-1347) so the end-to-end chain is clean.
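For illustration, decoding with errors='replace' behaves as follows; the
function name is hypothetical and only mirrors the behavior described
above:

```python
def decode_child_output(raw: bytes) -> str:
    """Decode captured pipe bytes as UTF-8, never raising on bad input."""
    return raw.decode("utf-8", errors="replace")

# Well-formed UTF-8 round-trips unchanged.
good = decode_child_output("\u2192 ok".encode("utf-8"))

# A stray cp1252 byte (0xE9, e-acute) is not valid UTF-8 here, so it is
# replaced with U+FFFD instead of raising UnicodeDecodeError.
bad = decode_child_output(b"caf\xe9 ok")
```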
On POSIX these values usually match the locale default already, so
setting them is harmless there. They also cover C/POSIX-locale
containers and minimal base images, where the locale does not specify
UTF-8.
Tests added (4) — total file now at 28 passed, 1 skipped on Windows:
- test_popen_env_sets_pythonioencoding_utf8 (source grep)
- test_popen_env_sets_pythonutf8_mode (source grep)
- test_live_child_can_print_non_ascii (cross-platform live test)
- test_windows_child_without_utf8_env_would_fail (Windows negative
  control: actually reproduces the bug without our env overrides,
  proving the fix is load-bearing on this system)
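The idea behind the live test and the negative control can be sketched
in a platform-independent way by forcing the child's stdio encoding to
cp1252, which simulates the Windows pipe default. This is an
illustration under that assumption, not the test code from the
repository, and run_with_stdio_encoding is a hypothetical helper:

```python
import os
import subprocess
import sys

def run_with_stdio_encoding(encoding: str) -> subprocess.CompletedProcess:
    """Run a child that prints U+2192 with the given stdio encoding."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", "print('\\u2192')"],
        env=env,
        capture_output=True,
        timeout=30,
    )

# Negative control: cp1252 stdio cannot encode the arrow, so the child
# exits with a traceback on stderr.
broken = run_with_stdio_encoding("cp1252")

# With UTF-8 stdio the same script succeeds.
fixed = run_with_stdio_encoding("utf-8")
```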