Before we dive in, a quick picture of what the gateway even is. The gateway is a traffic director. Your app talks to it. It talks to the Bindu agents. It keeps track of the conversation, figures out when to summarize it (so the language model doesn’t drown in history), and passes tool calls to whichever agent is supposed to handle them. When something weird happens end-to-end (the agent gives a broken answer, the request hangs forever, or the history suddenly makes no sense), it’s often because of something on this page.

There are two bugs here I want to walk you through. Each one has its own story. If you’re debugging and something here feels familiar, you’ve probably found your match.

This page used to have a third entry — poll-budget-unbounded-wall-clock, where a stuck peer could stall a /plan for five minutes per tool call. Fixed in April 2026. sendAndPoll is now abort-aware, and preferences.timeout_ms on /plan sets a plan-wide deadline (default 30 min, ceiling 6 h). Same fix closed the related medium-severity abort-signal-not-propagated-to-bindu-client. Read the postmortem if you want the story.
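
If you want to set that deadline yourself, here is a minimal sketch of building the preferences part of a /plan body. The only things taken from this page are the field name preferences.timeout_ms, the 30 minute default, and the 6 hour ceiling. The helper name planTimeoutMs and the client-side clamping are assumptions for illustration, and how the gateway itself treats out-of-range values is not stated here.

```typescript
// Values from this page: default 30 min, ceiling 6 h, both in milliseconds.
const DEFAULT_PLAN_TIMEOUT_MS = 30 * 60 * 1000; // 1_800_000
const MAX_PLAN_TIMEOUT_MS = 6 * 60 * 60 * 1000; // 21_600_000

// Hypothetical client-side helper: falls back to the default for missing or
// nonsensical input, and never asks for more than the documented ceiling.
function planTimeoutMs(requestedMs?: number): number {
  if (requestedMs === undefined || !Number.isFinite(requestedMs) || requestedMs <= 0) {
    return DEFAULT_PLAN_TIMEOUT_MS;
  }
  return Math.min(Math.floor(requestedMs), MAX_PLAN_TIMEOUT_MS);
}

// Only the preferences part of a /plan body is shown; other fields are omitted.
const planPreferences = {
  preferences: { timeout_ms: planTimeoutMs(10 * 60 * 1000) }, // ask for 10 minutes
};
```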

Bug #1: the summary machine is set for a bigger tank

Slug: context-window-hardcoded

Picture a fuel tank. Language models each have their own size. Claude Opus has a big one: 200,000 tokens. GPT-4o-mini has a smaller one: 128,000. Gemini Flash has a huge one.

The gateway has a feature called compaction: a little robot that watches the conversation growing and, when the history gets too long to fit in the tank, summarizes the older turns to make room. Good idea. Saves you from crashing into the tank wall.

Here’s the story. You set up the gateway with Claude Opus. Everything works. A few weeks later you decide to save some money and switch the model to GPT-4o-mini. Restart. Run a few long conversations. Suddenly the model throws an error: “too much context, I can’t fit this.” You check the logs, expecting to see compaction kick in. It didn’t.

Why? Because the compaction robot’s threshold is hardcoded to 200,000, which is Claude Opus’s size. It was watching for a tank wall the new model doesn’t have. GPT-4o-mini’s wall is at 128,000 and the conversation sailed right past it.

The bad number lives in one spot, gateway/src/session/overflow.ts, and it’s just the literal 200_000. Any smaller model blows past the real wall. Any bigger model triggers compaction way too early, making you pay for summaries you didn’t need.

What you can do today. When you call compactIfNeeded, pass in an explicit threshold.contextWindow that matches the model you’re actually running. Not elegant, but it works.

The real fix is already in. A function called thresholdForModel() now looks up the right number based on which model you’re using: Anthropic 4.x gets 200k, GPT-4o family gets 128k, GPT-4.1 gets about 1M, o3 gets 200k. If the model is something we haven’t heard of, it picks 128k as a safe guess, because it is better to compact early than to crash into a tank wall we didn’t expect. There are fourteen tests making sure this stays right.
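
To make the lookup idea concrete, here is a rough sketch of a per-model context window table with a 128k fallback. This is not the gateway’s actual thresholdForModel() source. The function names contextWindowFor and exceedsWindow, the model ID prefixes, and the prefix-matching approach are all assumptions for illustration. Only the token counts come from this page, and "about 1M" is rounded to 1,000,000.

```typescript
// Illustrative only: prefixes and structure are assumptions, not gateway code.
const FALLBACK_CONTEXT_WINDOW = 128_000; // safe guess for unknown models

const CONTEXT_WINDOWS: ReadonlyArray<readonly [string, number]> = [
  ["claude-opus-4", 200_000],
  ["claude-sonnet-4", 200_000],
  ["gpt-4.1", 1_000_000], // "about 1M" on this page; exact figure assumed
  ["gpt-4o", 128_000], // also matches gpt-4o-mini
  ["o3", 200_000],
];

// Returns the context window (in tokens) for a model ID, or the fallback
// when no prefix in the table matches.
function contextWindowFor(modelId: string): number {
  const id = modelId.toLowerCase();
  for (const [prefix, tokens] of CONTEXT_WINDOWS) {
    if (id.startsWith(prefix)) return tokens;
  }
  return FALLBACK_CONTEXT_WINDOW;
}

// True when the history no longer fits in the given window.
function exceedsWindow(historyTokens: number, contextWindow: number): boolean {
  return historyTokens >= contextWindow;
}

// The bug in miniature: a 150k-token history on GPT-4o-mini.
const historyTokens = 150_000;
const seenByHardcodedCheck = exceedsWindow(historyTokens, 200_000); // false: sails past
const seenByPerModelCheck = exceedsWindow(historyTokens, contextWindowFor("gpt-4o-mini")); // true
```

The same number is what you would pass as threshold.contextWindow in the workaround above. The fallback deliberately errs small, for the reason the story gives: compacting early costs a summary, while overshooting costs a failed request.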

Bug #2: two people writing in the same notebook

Slug: no-session-concurrency-guard

Alice has your app open in two browser tabs. Same conversation in both. She sends a message from tab 1. Before it finishes processing, she sends another message from tab 2. Two /plan calls, same session, running at the same time.

Here’s where it goes sideways. Both planners start writing to the conversation history at the same time. There’s no rule saying “one at a time, please.” Tab 1 writes half of a tool call. Tab 2, looking at the history, sees the half-written tool call, but without its matching result. The sequence is now broken.

The LLM sees the broken sequence and either hallucinates something wrong or just errors out. That’s bad. Worse: the broken sequence is now saved to disk. Alice closing the tabs and refreshing won’t fix it. The conversation is permanently tangled.

We had a related bug a while back where compaction itself could race: two compaction attempts fighting over the same session. We fixed that. You can read the postmortem if you want (compaction-concurrent-races). But that fix only protects compaction. It doesn’t stop two plans from stepping on each other in the first place.

What you can do today. Make sure your client doesn’t send two /plan calls for the same session at the same time. If you need two things to happen in parallel, give each one its own session ID and stitch them together yourself afterward.

The proper fix is a per-session lock inside the gateway, so the second /plan waits for the first to finish. It’s on the list.
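
Until that lock exists, one way to follow the "one at a time" rule on your side is to queue calls per session ID. Below is a minimal sketch using a promise chain per session. The name withSessionLock is hypothetical and this is not gateway code. It also only serializes calls made from a single process, so it would not help Alice’s two tabs unless both go through the same backend that runs it.

```typescript
// Hypothetical helper: runs async tasks for the same session ID one after
// another, while tasks for different session IDs still run in parallel.
const sessionTails = new Map<string, Promise<void>>();

async function withSessionLock<T>(
  sessionId: string,
  task: () => Promise<T>,
): Promise<T> {
  // Whatever is currently last in line for this session (or nothing).
  const previous = sessionTails.get(sessionId) ?? Promise.resolve();

  // A promise we resolve by hand once our own task has finished.
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });

  // The new tail settles only after everything before us AND our task are done.
  const tail = previous.then(() => current);
  sessionTails.set(sessionId, tail);

  await previous; // wait for our turn
  try {
    return await task();
  } finally {
    release(); // let the next caller in, even if the task threw
    // Drop the map entry if nobody queued up behind us.
    if (sessionTails.get(sessionId) === tail) sessionTails.delete(sessionId);
  }
}
```

You would wrap each /plan request in withSessionLock(sessionId, () => sendPlan(...)), where sendPlan stands in for whatever function your client already uses. A failed task still releases the lock, so one bad request does not wedge the session.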

That’s the two

Those are the gateway bugs big enough to write stories about. The rest — medium, low, nits — are on their own pages, one click away. Medium is worth skimming before you ship. Low is “things to know.” Nits are cleanup for rainy afternoons.