Before we dive in, a quick picture of what the gateway even is. The gateway is a traffic director. Your app talks to it. It talks to the Bindu agents. It keeps track of the conversation, figures out when to summarize it (so the language model doesn’t drown in history), and passes tool calls to whichever agent is supposed to handle them. When something weird happens end-to-end — the agent gives a broken answer, or the request hangs forever, or the history suddenly makes no sense — it’s often because of something on this page. There are three bugs here I want to walk you through. Each one has its own story. If you’re debugging and something here feels familiar, you’ve probably found your match.

Bug #1: the summary machine is set for a bigger tank

Slug: context-window-hardcoded

Picture a fuel tank. Language models each have their own size. Claude Opus has a big one — 200,000 tokens. GPT-4o-mini has a smaller one — 128,000. Gemini Flash has a huge one. The gateway has a feature called compaction — a little robot that watches the conversation growing and, when the history gets too long to fit in the tank, summarizes the older turns to make room. Good idea. Saves you from crashing into the tank wall.

Here’s the story. You set up the gateway with Claude Opus. Everything works. A few weeks later you decide to save some money and switch the model to GPT-4o-mini. Restart. Run a few long conversations. Suddenly the model throws an error: “too much context, I can’t fit this.” You check the logs, expecting to see compaction kick in. It didn’t.

Why? Because the compaction robot’s threshold is hardcoded to 200,000 — Claude Opus’s size. It was watching for a tank wall the new model doesn’t have. GPT-4o-mini’s wall is at 128,000 and the conversation sailed right past it. The bad number lives in one spot — gateway/src/session/overflow.ts — and it’s just the literal 200_000. Any smaller model blows past the real wall. Any bigger model triggers compaction way too early, making you pay for summaries you didn’t need.

What you can do today. When you call compactIfNeeded, pass in an explicit threshold.contextWindow that matches the model you’re actually running. Not elegant, but it works.

The real fix is already in. A function called thresholdForModel() now looks up the right number based on which model you’re using — Anthropic 4.x gets 200k, the GPT-4o family gets 128k, GPT-4.1 gets about 1M, o3 gets 200k. If the model is something we haven’t heard of, it picks 128k as a safe guess — better to compact early than to crash into a tank wall we didn’t expect. There are fourteen tests making sure this stays right.
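To make the fix concrete, here is a minimal sketch of what a thresholdForModel() lookup can look like. This is not the gateway's actual code: the model-id prefixes I match on are assumptions; the numbers are the ones from the write-up above.

```typescript
// Sketch: map a model id to its context window instead of hardcoding 200_000.
// Prefix strings are illustrative guesses, not the gateway's real matching rules.
function thresholdForModel(model: string): number {
  if (model.startsWith("claude-")) return 200_000;   // Anthropic 4.x
  if (model.startsWith("gpt-4.1")) return 1_000_000; // GPT-4.1 family, ~1M
  if (model.startsWith("gpt-4o")) return 128_000;    // GPT-4o family
  if (model.startsWith("o3")) return 200_000;        // o3
  return 128_000; // unknown model: compact early rather than crash into the wall
}
```

The fallback branch is the important design choice: an unknown model gets the smallest common window, so the failure mode is a slightly-too-early summary instead of a context overflow error.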

Bug #2: the phone call nobody can hang up

Slug: poll-budget-unbounded-wall-clock

Imagine you call customer service. You get put on hold. You wait. And wait. You hang up. Except — imagine the customer service line stays on hold after you hang up. Nobody on your end is listening anymore, but the line is still ringing in a back room somewhere, forever. That’s this bug.

Here’s what’s happening. Your user sends a /plan request. The gateway needs to ask a peer agent to do one thing as part of it. The peer is busy — maybe it’s waiting on something slow, maybe it’s genuinely stuck — and it isn’t finishing. The gateway has a polling loop that keeps checking: is it done yet? is it done yet? It tries up to 60 times, with gaps that grow from a tiny delay to 10 seconds. Worst case: five minutes of polling for one tool call.

Meanwhile on the user’s end: their connection to the gateway is still open, their screen is doing nothing, and eventually their browser gives up and closes the tab. The user is gone. But the gateway doesn’t know that. The polling loop keeps running in the background. It’s still calling the peer, still checking, still waiting. Nobody told it to stop. If your plan needs ten tool calls and two of them hang, you’ve just turned one user request into ten minutes of background work on a connection that doesn’t exist anymore.

The code is at gateway/src/bindu/client/poll.ts. The deeper problem is that there’s no time limit on the whole /plan — just limits on each individual phone call. One stuck peer can hold the whole thing hostage.

What you can do today. When you set up the Bindu client, pass in a smaller maxPolls number, or a shorter wait schedule. Knowing that your user might disconnect won’t help you directly — that’s a separate bug (see it on the medium page). The easiest real-world timeout is from the outside: put a client-side timeout on the SSE request so the gateway at least could notice if we taught it to listen.
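Here is a sketch of what a poll loop with both limits looks like: a per-call poll budget plus an overall wall-clock deadline, so one stuck peer can’t consume minutes on its own. The function and option names (pollWithDeadline, initialDelayMs, and so on) are illustrative, not the Bindu client’s actual API.

```typescript
// Sketch: a poll loop bounded two ways. The per-attempt budget (maxPolls)
// mirrors what the gateway already has; the wall-clock deadline (deadlineMs)
// is the missing piece described above. Names are hypothetical.
async function pollWithDeadline(
  isDone: () => Promise<boolean>,
  opts: { maxPolls: number; initialDelayMs: number; maxDelayMs: number; deadlineMs: number },
): Promise<boolean> {
  const start = Date.now();
  let delay = opts.initialDelayMs;
  for (let attempt = 0; attempt < opts.maxPolls; attempt++) {
    if (await isDone()) return true; // peer finished
    const elapsed = Date.now() - start;
    if (elapsed + delay > opts.deadlineMs) return false; // deadline: stop waiting
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, opts.maxDelayMs); // exponential backoff, capped
  }
  return false; // poll budget exhausted
}
```

For the outside-in timeout on the SSE request itself, the standard tool is an abort signal on the fetch, e.g. `fetch(url, { signal: AbortSignal.timeout(120_000) })`, which closes the connection after two minutes whether or not the gateway is still polling.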

Bug #3: two people writing in the same notebook

Slug: no-session-concurrency-guard

Alice has your app open in two browser tabs. Same conversation in both. She sends a message from tab 1. Before it finishes processing, she sends another message from tab 2. Two /plan calls, same session, running at the same time.

Here’s where it goes sideways. Both planners start writing to the conversation history at the same time. There’s no rule saying “one at a time, please.” Tab 1 writes half of a tool call. Tab 2, looking at the history, sees the half-written tool call — but without its matching result. The sequence is now broken. The LLM sees the broken sequence and either hallucinates something wrong or just errors out. That’s bad. Worse: the broken sequence is now saved to disk. Alice closing the tabs and refreshing won’t fix it. The conversation is permanently tangled.

We had a related bug a while back where compaction itself could race — two compaction attempts fighting over the same session. We fixed that. You can read the postmortem if you want (compaction-concurrent-races). But that fix only protects compaction. It doesn’t stop two plans from stepping on each other in the first place.

What you can do today. Make sure your client doesn’t send two /plan calls for the same session at the same time. If you need two things to happen in parallel, give each one its own session ID and stitch them together yourself afterward. The proper fix is a per-session lock inside the gateway — so the second /plan waits for the first to finish. It’s on the list.
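A per-session lock in a single Node process can be as small as a promise chain keyed by session ID. This is an illustrative sketch, not the gateway’s planned implementation; withSessionLock and sessionLocks are names I made up, and it skips cleanup (the map keeps one resolved promise per session ID).

```typescript
// Sketch: a promise-chain mutex. Each session ID maps to the tail of a chain
// of pending work; a new call waits for everything queued before it.
const sessionLocks = new Map<string, Promise<void>>();

async function withSessionLock<T>(sessionId: string, fn: () => Promise<T>): Promise<T> {
  const previous = sessionLocks.get(sessionId) ?? Promise.resolve();
  let release!: () => void;
  const gate = new Promise<void>((resolve) => (release = resolve));
  sessionLocks.set(sessionId, previous.then(() => gate)); // queue behind earlier calls
  await previous; // wait for every /plan already running on this session
  try {
    return await fn(); // exclusive access to the session's history
  } finally {
    release(); // let the next queued call proceed, even if fn threw
  }
}
```

With this in place, Alice’s second tab would simply wait its turn: the second /plan starts from a history that includes the first one’s completed tool call and its result, so the sequence can’t be observed half-written.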

That’s the three

Those are the gateway bugs big enough to write stories about. The rest — medium, low, nits — are on their own pages, one click away. Medium is worth skimming before you ship. Low is “things to know.” Nits are cleanup for rainy afternoons.