Bug #1: the summary machine is set for a bigger tank
Slug: context-window-hardcoded
Picture a fuel tank. Each language model comes with its own size. Claude Opus has a big one — 200,000 tokens. GPT-4o-mini has a smaller one — 128,000. Gemini Flash has a huge one — on the order of a million.
The gateway has a feature called compaction — a little robot that watches the conversation grow and, when the history gets too long to fit in the tank, summarizes the older turns to make room. Good idea. Saves you from crashing into the tank wall.
Here’s the story. You set up the gateway with Claude Opus. Everything works. A few weeks later you decide to save some money and switch the model to GPT-4o-mini. Restart. Run a few long conversations. Suddenly the model throws an error: “too much context, I can’t fit this.”
You check the logs, expecting to see compaction kick in. It didn’t. Why?
Because the compaction robot’s threshold is hardcoded to 200,000 — Claude Opus’s size. It was watching for a tank wall the new model doesn’t have. GPT-4o-mini’s wall is at 128,000 and the conversation sailed right past it.
The bad number lives in one spot — gateway/src/session/overflow.ts — and it’s just the literal 200_000. Any smaller model blows past the real wall. Any bigger model triggers compaction way too early, making you pay for summaries you didn’t need.
What you can do today. When you call compactIfNeeded, pass an explicit threshold.contextWindow that matches the model you're actually running. Not elegant, but it works.
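A minimal sketch of what the workaround looks like. The names here (shouldCompact, CompactionThreshold, the 80% headroom factor) are illustrative assumptions, not the gateway's real API; the point is that the window comes from the model you run, not a literal.

```typescript
// Hypothetical sketch: the real logic lives in gateway/src/session/overflow.ts.
interface CompactionThreshold {
  contextWindow: number; // tokens
}

// Per-model context windows you'd pass explicitly (values from the text above).
const CONTEXT_WINDOWS: Record<string, number> = {
  "claude-opus": 200_000,
  "gpt-4o-mini": 128_000,
};

function shouldCompact(historyTokens: number, threshold: CompactionThreshold): boolean {
  // Compact once history crosses ~80% of the model's real window,
  // leaving headroom for the next response. (The 80% is an assumption.)
  return historyTokens > threshold.contextWindow * 0.8;
}

// A 120k-token history on gpt-4o-mini should compact; against the
// hardcoded 200k it never would.
shouldCompact(120_000, { contextWindow: CONTEXT_WINDOWS["gpt-4o-mini"] }); // true
shouldCompact(120_000, { contextWindow: 200_000 });                        // false
```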
The real fix is already in. A function called thresholdForModel() now looks up the right number based on which model you’re using — Anthropic 4.x gets 200k, GPT-4o family gets 128k, GPT-4.1 gets about 1M, o3 gets 200k. If the model is something we haven’t heard of, it picks 128k as a safe guess — better to compact early than to crash into a tank wall we didn’t expect. There are fourteen tests making sure this stays right.
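The lookup described above might look roughly like this. The exact model-name prefixes and the fallback constant are assumptions mirroring the text, not the shipped implementation:

```typescript
// Sketch of thresholdForModel(); numbers come from the description above,
// prefix matching is an assumption about how model IDs are named.
const DEFAULT_CONTEXT_WINDOW = 128_000; // safe guess for unknown models

function thresholdForModel(model: string): number {
  if (model.startsWith("claude-opus-4") || model.startsWith("claude-sonnet-4")) return 200_000;
  if (model.startsWith("gpt-4.1")) return 1_000_000; // "about 1M"
  if (model.startsWith("gpt-4o")) return 128_000;
  if (model.startsWith("o3")) return 200_000;
  // Unknown model: compact early rather than crash into an unseen wall.
  return DEFAULT_CONTEXT_WINDOW;
}
```

The fallback is the interesting design choice: compacting too early costs a summary you may not have needed, while compacting too late costs the whole request.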
Bug #2: the phone call nobody can hang up
Slug: poll-budget-unbounded-wall-clock
Imagine you call customer service. You get put on hold. You wait. And wait. You hang up. Except — imagine the customer service line stays on hold after you hang up. Nobody on your end is listening anymore, but the line is still ringing in a back room somewhere, forever.
That’s this bug.
Here’s what’s happening. Your user sends a /plan request. The gateway needs to ask a peer agent to do one thing as part of it. The peer is busy — maybe it’s waiting on something slow, maybe it’s genuinely stuck — and it isn’t finishing.
The gateway has a polling loop that keeps checking: is it done yet? is it done yet? It tries up to 60 times, with gaps that grow from a tiny delay to 10 seconds. Worst case: five minutes of polling for one tool call.
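The shape of that loop, as a sketch. The names (pollUntilDone, maxPolls, the delay parameters) are assumptions, not the actual API in gateway/src/bindu/client/poll.ts; what matters is that the only limit is a poll count, with no wall-clock cap and no way to cancel from outside:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `check` until it returns a value, with a gap that doubles from a
// tiny initial delay up to a 10-second ceiling. Nothing here knows or
// cares whether anyone is still waiting for the answer.
async function pollUntilDone<T>(
  check: () => Promise<T | undefined>,
  { maxPolls = 60, initialDelayMs = 250, maxDelayMs = 10_000 } = {},
): Promise<T> {
  let delay = initialDelayMs;
  for (let i = 0; i < maxPolls; i++) {
    const result = await check();
    if (result !== undefined) return result;
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // grow toward the 10s ceiling
  }
  throw new Error(`peer did not finish after ${maxPolls} polls`);
}
```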
Meanwhile, on the user’s end: their connection to the gateway is still open, nothing is happening on their screen, and eventually their browser gives up and closes the tab. User is gone.
But the gateway doesn’t know that. The polling loop keeps running in the background. It’s still calling the peer, still checking, still waiting. Nobody told it to stop.
If your plan needs ten tool calls and two of them hang, you’ve just turned one user request into ten extra minutes of background work on a connection that doesn’t exist anymore. If all ten hang, fifty minutes.
The code is at gateway/src/bindu/client/poll.ts. The deeper problem is that there’s no time limit on the whole /plan — just limits on each individual phone call. One stuck peer can hold the whole thing hostage.
What you can do today. When you set up the Bindu client, pass in a smaller maxPolls number or a shorter wait schedule. Knowing that your user might disconnect won’t help you directly; that’s a separate bug (see it on the medium page). The easiest real-world timeout comes from the outside: put a client-side timeout on the SSE request, so the gateway could at least notice if we taught it to listen.
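One way to do the outside timeout. The helper below is a generic wall-clock cap; the commented /plan usage is illustrative (the URL, body shape, and two-minute budget are assumptions, not the gateway's real API):

```typescript
// Wrap any promise with a wall-clock limit. Rejects with an error once
// `ms` elapses, whether or not the underlying work ever finishes.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Illustrative usage: cap the /plan SSE request at two minutes. Passing
// AbortSignal.timeout() also tears down the HTTP connection itself,
// rather than just abandoning the promise.
// const res = await fetch("/plan", {
//   method: "POST",
//   headers: { Accept: "text/event-stream" },
//   body: JSON.stringify({ sessionId, goal }),
//   signal: AbortSignal.timeout(120_000),
// });
```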
Bug #3: two people writing in the same notebook
Slug: no-session-concurrency-guard
Alice has your app open in two browser tabs. Same conversation in both. She sends a message from tab 1. Before it finishes processing, she sends another message from tab 2.
Two /plan calls, same session, running at the same time.
Here’s where it goes sideways. Both planners start writing to the conversation history at the same time. There’s no rule saying “one at a time, please.” Tab 1 writes half of a tool call. Tab 2, looking at the history, sees the half-written tool call — but without its matching result. The sequence is now broken.
The LLM sees the broken sequence and either hallucinates something wrong or just errors out. That’s bad.
Worse: the broken sequence is now saved to disk. Alice closing the tabs and refreshing won’t fix it. The conversation is permanently tangled.
We had a related bug a while back where compaction itself could race — two compaction attempts fighting over the same session. We fixed that. You can read the postmortem if you want (compaction-concurrent-races). But that fix only protects compaction. It doesn’t stop two plans from stepping on each other in the first place.
What you can do today. Make sure your client doesn’t send two /plan calls for the same session at the same time. If you need two things to happen in parallel, give each one its own session ID and stitch them together yourself afterward.
The proper fix is a per-session lock inside the gateway — so the second /plan waits for the first to finish. It’s on the list.
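A per-session lock in a single-process Node gateway can be as small as a promise chain. This is a sketch of the idea, not the planned implementation: each session keeps a "tail" promise, and every new plan chains after it, so the second /plan waits for the first:

```typescript
// Minimal per-session serialization: tasks for the same sessionId run
// one at a time; different sessions don't block each other.
class SessionLocks {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(sessionId) ?? Promise.resolve();
    // Chain after the previous task, ignoring its outcome, so a failed
    // plan doesn't wedge the session forever.
    const next = prev.catch(() => {}).then(task);
    // Keep the stored tail always-resolving for the same reason.
    this.tails.set(sessionId, next.catch(() => {}));
    return next;
  }
}
```

With this in place, Alice's tab 2 request simply queues behind tab 1's instead of interleaving writes into the same history. It doesn't help across multiple gateway processes; that would need a shared lock, which is beyond this sketch.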