| Severity | critical |
| Status | fixed |
| Found | 2026-04-18 |
| Fixed | 2026-04-18 |
| Area | gateway/session |
| Commit | 77603da |
Symptom
Sessions that hit the compaction threshold during a tool-heavy turn broke permanently. The very next request to the model failed with a hard 400 error, and the session stayed marked compacted=true, so even a fresh turn would
rebuild the same broken message list. Only manual intervention
(resetting the flag) recovered the session.
Root cause
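splitHead in gateway/src/session/compaction.ts:62-71 (pre-fix) cut the
list by message count: the last keepTail messages stayed verbatim and
everything before them became the head. A tool-using turn, however, is
user + tool_use + tool_result + possibly
more tool pairs + final-assistant. A three-tool turn is 8 messages. A
ten-tool turn is 22.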
With keepTail = 4, the cut landed mid-turn at least half the time when
a turn was long enough to cross the boundary. The common failing shape:
a tool_result in the visible message list (tail) that
references a tool_use id that doesn’t appear anywhere, because the head was
replaced by the summary paragraph. The result is a hard 400 error, unrecoverable
without touching the DB.
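The failure is easy to reproduce on paper. The sketch below is a hypothetical reconstruction, not the pre-fix code: the `Msg` shape, `buildTurn`, `naiveSplit`, and `orphanedResults` are names invented for illustration. It builds a short turn followed by a three-tool turn, cuts four messages from the end, and reports which tool_result ids in the tail have lost their tool_use.

```typescript
// Hypothetical message shape; the gateway's real types are not shown in this write-up.
type Kind = "user" | "tool_use" | "tool_result" | "assistant";
interface Msg { kind: Kind; toolId?: string }

// One turn: user + n tool pairs + final assistant (n = 3 gives 8 messages).
function buildTurn(nTools: number, prefix: string): Msg[] {
  const turn: Msg[] = [{ kind: "user" }];
  for (let i = 0; i < nTools; i++) {
    const toolId = `${prefix}-${i}`;
    turn.push({ kind: "tool_use", toolId }, { kind: "tool_result", toolId });
  }
  turn.push({ kind: "assistant" });
  return turn;
}

// Count-based split: exactly keepTail messages stay verbatim, the rest is summarized.
function naiveSplit(messages: Msg[], keepTail: number): { head: Msg[]; tail: Msg[] } {
  const cut = Math.max(0, messages.length - keepTail);
  return { head: messages.slice(0, cut), tail: messages.slice(cut) };
}

// tool_result ids in the tail whose tool_use does not appear earlier in the tail.
function orphanedResults(tail: Msg[]): string[] {
  const seen = new Set<string>();
  const orphans: string[] = [];
  for (const m of tail) {
    if (m.kind === "tool_use" && m.toolId) seen.add(m.toolId);
    if (m.kind === "tool_result" && m.toolId && !seen.has(m.toolId)) orphans.push(m.toolId);
  }
  return orphans;
}

// 4-message turn "a" followed by 8-message turn "b": 12 messages in total.
const sessionMessages = [...buildTurn(1, "a"), ...buildTurn(3, "b")];
const { tail } = naiveSplit(sessionMessages, 4);
const orphans = orphanedResults(tail);
// tail starts with the tool_result for "b-1"; orphans is ["b-1"].
```

The tail here begins with a tool_result whose tool_use sits in the summarized head, which is the shape the provider rejects.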
Mental model that led to the bug: keepTail was thought of as “keep the
last N messages verbatim,” with N chosen to be roughly one short turn.
The assumption was that a turn is 2–3 messages. Reality is 8–22 for
tool-heavy turns, which were the whole point of this gateway.
Fix
Walk left from the naive cut point until the message at the split is a user message. Since a user message starts a new turn by definition,
the invariant is that every assistant tool_use is in the same half
as its tool_result.
keepTail is now a MINIMUM, not an exact count. Tail may be longer
than requested — never shorter.
See commit 77603da and five regression cases in
gateway/tests/session/compaction-split.test.ts covering tool-heavy
turns, single-unbroken-turn histories, and boundary cases.
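A minimal sketch of the walk-left rule, under stated assumptions: the `Msg` shape, `turn`, and `splitAtTurnBoundary` are hypothetical names, and the code in commit 77603da may differ. It treats `minKeepTail` as a lower bound and moves the cut left until it sits on a user message.

```typescript
// Hypothetical message shape; the gateway's real types are not shown in this write-up.
type Kind = "user" | "tool_use" | "tool_result" | "assistant";
interface Msg { kind: Kind; toolId?: string }

// Helper for the example: user + n tool pairs + final assistant.
function turn(nTools: number, prefix: string): Msg[] {
  const out: Msg[] = [{ kind: "user" }];
  for (let i = 0; i < nTools; i++) {
    const toolId = `${prefix}-${i}`;
    out.push({ kind: "tool_use", toolId }, { kind: "tool_result", toolId });
  }
  out.push({ kind: "assistant" });
  return out;
}

// Split so the tail holds at least minKeepTail messages and starts on a user message.
// The tail may be longer than requested, never shorter.
function splitAtTurnBoundary(
  messages: Msg[],
  minKeepTail: number,
): { head: Msg[]; tail: Msg[] } {
  // Naive cut point: exactly minKeepTail messages from the end.
  let cut = Math.max(0, messages.length - minKeepTail);
  // Walk left until the message at the split starts a turn.
  // cut === messages.length (empty tail) is already a valid boundary.
  while (cut > 0 && cut < messages.length && messages[cut]?.kind !== "user") {
    cut--;
  }
  return { head: messages.slice(0, cut), tail: messages.slice(cut) };
}

// 4-message turn followed by an 8-message turn; the naive cut (index 8) is mid-turn.
const sessionMessages = [...turn(1, "a"), ...turn(3, "b")];
const split = splitAtTurnBoundary(sessionMessages, 4);
// split.head has 4 messages, split.tail has 8 and starts with the user message.

// A history that is one unbroken turn: the cut walks back to 0 and the head is empty.
const singleTurn = splitAtTurnBoundary(turn(3, "c"), 4);
```

When the head comes back empty there is nothing safe to summarize, so a caller would need to skip compaction for that pass. Some provider formats carry tool results inside user-role messages, so a real implementation would have to test for a genuine user message rather than the role alone.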
Why the tests didn’t catch it
splitHead was an un-exported helper and had no unit tests. Compaction
as a whole had no tests. The bug only fires when:
- Token budget crosses the threshold during a session — a statistical event that depends on session content.
- The cut point lands inside a turn — another statistical event.
The rule that was broken is also an external one: tool_use + tool_result must be paired in the
visible message list, and that rule lives in Anthropic’s and OpenAI’s API
docs. Our tests cover our code’s behavior; they don’t cover our
compliance with external contracts.
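One way to bring the external contract under test is to encode it as a checker and run it against every possible cut point instead of waiting for the statistical case. The sketch below is illustrative only: `Msg`, `turn`, and `findPairViolation` are hypothetical names, and treating an unanswered tool_use as a violation is an assumption of this sketch.

```typescript
// Hypothetical message shape; the gateway's real types are not shown in this write-up.
type Kind = "user" | "tool_use" | "tool_result" | "assistant";
interface Msg { kind: Kind; toolId?: string }

// Helper for the example: user + n tool pairs + final assistant.
function turn(nTools: number, prefix: string): Msg[] {
  const out: Msg[] = [{ kind: "user" }];
  for (let i = 0; i < nTools; i++) {
    const toolId = `${prefix}-${i}`;
    out.push({ kind: "tool_use", toolId }, { kind: "tool_result", toolId });
  }
  out.push({ kind: "assistant" });
  return out;
}

// Returns a description of the first pairing violation, or null if the list is valid.
function findPairViolation(visible: Msg[]): string | null {
  const pending = new Set<string>(); // tool_use ids still waiting for a result
  for (const m of visible) {
    if (m.kind === "tool_use" && m.toolId) pending.add(m.toolId);
    if (m.kind === "tool_result") {
      if (!m.toolId || !pending.has(m.toolId)) {
        return `tool_result ${m.toolId ?? "?"} has no preceding tool_use`;
      }
      pending.delete(m.toolId);
    }
  }
  if (pending.size > 0) return `tool_use ${Array.from(pending)[0]} has no tool_result`;
  return null;
}

// Try every cut point in one three-tool turn and record the tails that break the rule.
const oneTurn = turn(3, "t");
const badCuts: number[] = [];
for (let cut = 0; cut <= oneTurn.length; cut++) {
  if (findPairViolation(oneTurn.slice(cut)) !== null) badCuts.push(cut);
}
// badCuts is [2, 4, 6]: every cut that lands between a tool_use and its tool_result.
```

A regression test could run the real split over generated histories and assert that the checker returns null for the resulting tail.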
Class of bug — where else to watch
“Index arithmetic on a list with semantic boundaries”: any time code slices, splits, or truncates a message list by count, it must respect turn boundaries (and, more strictly, tool-pair boundaries). Candidates for similar bugs in the codebase:
- gateway/src/session/index.ts:listMessages paginates at 1000 rows. If a long session’s boundary falls mid-turn, and some code path uses the truncated list directly with the LLM, same failure mode. Tracked in known-issues.md under list-messages-pagination-silent.
- Revert (gateway/src/session/revert.ts) cuts by created_at timestamp. Timestamps generally align with turn starts (user messages precede assistant responses), but there’s no enforced invariant. Worth auditing.
- Any future “replay from turn N” or “export last K turns” feature must slice on user-message boundaries, not message indices.
tool_result.
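For features that slice by turn, one possible shape is a helper that counts user messages from the end rather than counting messages. This is a sketch with hypothetical names (`Msg`, `turn`, `lastTurns`); nothing like it is confirmed to exist in the codebase. It also drops a leading partial turn, which is what a paginated or truncated list would present.

```typescript
// Hypothetical message shape; the gateway's real types are not shown in this write-up.
type Kind = "user" | "tool_use" | "tool_result" | "assistant";
interface Msg { kind: Kind; toolId?: string }

// Helper for the example: user + n tool pairs + final assistant.
function turn(nTools: number, prefix: string): Msg[] {
  const out: Msg[] = [{ kind: "user" }];
  for (let i = 0; i < nTools; i++) {
    const toolId = `${prefix}-${i}`;
    out.push({ kind: "tool_use", toolId }, { kind: "tool_result", toolId });
  }
  out.push({ kind: "assistant" });
  return out;
}

// Return up to the last k complete turns, always starting on a user message.
function lastTurns(messages: Msg[], k: number): Msg[] {
  if (k <= 0) return [];
  let remaining = k;
  let start = -1; // index of the earliest user message accepted so far
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.kind === "user") {
      start = i;
      if (--remaining === 0) break;
    }
  }
  // No user message at all means no complete turn to return.
  return start === -1 ? [] : messages.slice(start);
}

// Turns of 4, 8, and 2 messages: 14 messages in total.
const sessionMessages = [...turn(1, "a"), ...turn(3, "b"), ...turn(0, "c")];
const lastTwo = lastTurns(sessionMessages, 2); // 10 messages, starts at turn "b"

// A list truncated mid-turn (first two messages missing) loses the partial turn "a".
const fromTruncated = lastTurns(sessionMessages.slice(2), 5); // also 10 messages
```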
A minor related lesson: keepTail as an exact count was the wrong
vocabulary. The name suggested strict guarantees that were unsafe.
Renaming it to minKeepTail in the fix makes the looser contract
visible in the API. When a parameter semantically means “at least this
many,” name it minX — don’t rely on docstrings.