/plan endpoint and grades the SSE stream. This is the reproducible setup docs/GATEWAY.md uses for its walkthrough — each agent is deliberately narrow so the planner has to pick the right one for each query.
Fleet overview
| Agent | Port | Skill id | What it does |
|---|---|---|---|
| joke_agent | 3773 | tell_joke | Tells short jokes. Declines anything off-topic. |
| math_agent | 3775 | solve_math | Solves math problems step-by-step. |
| poet_agent | 3776 | write_poem | Writes 4-line poems. Hydra auth enabled. |
| research_agent | 3777 | web_research | Web search + cited summaries via DuckDuckGo. |
| faq_agent (Bindu docs) | 3778 | bindu_docs_qa | Answers Bindu doc questions with citations. Hydra auth enabled. |
3774. All agents run openai/gpt-4o-mini via OpenRouter.
Code
joke_agent.py
math_agent.py
poet_agent.py
research_agent.py
faq_agent.py
Fleet scripts
start_fleet.sh
Boots all five agents under uv run, each on its assigned port. Inherits examples/.env, writes pid files to pids/<agent>.pid, and tails each agent’s /health endpoint to harvest its DID. The DIDs land in a sibling .fleet.env you source into your shell.
BINDU_PORT overrides the port baked into each Python file — that’s why operational ports are 3xxx even though the Python docstrings reference the 5xxx defaults.
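The override described above can be pictured with a minimal sketch. It assumes the agent falls back to its baked-in default when BINDU_PORT is unset; resolve_port and the 5001 default are hypothetical names and values for illustration, not taken from the Bindu source.

```python
import os


def resolve_port(default_port, env=None):
    """Pick the listening port: BINDU_PORT if set, else the baked-in default.

    Hypothetical helper; how Bindu itself reads BINDU_PORT may differ.
    """
    env = os.environ if env is None else env
    raw = env.get("BINDU_PORT")
    if raw is None or raw.strip() == "":
        return default_port
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"BINDU_PORT out of range: {port}")
    return port


if __name__ == "__main__":
    # 5001 stands in for a 5xxx docstring default (assumed value).
    print(resolve_port(5001, {"BINDU_PORT": "3773"}))
    print(resolve_port(5001, {}))
```

Passing the environment as a plain dict keeps the helper easy to exercise without touching the real process environment.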
stop_fleet.sh
run_matrix.sh
The 13-case query matrix. Each case is a bash function that prints a JSON body; the runner POSTs to ${GATEWAY_URL}/plan (default http://localhost:3774), captures the SSE stream into logs/<case_id>.sse, and grades it on the presence of plan/final/done/error events.
run_dup_check idempotency probe).
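The grading step that run_matrix.sh performs in bash can be approximated in Python. This is a sketch under assumptions: the exact pass/fail rules of the real runner are not shown here, so the three verdicts below ("pass", "error", "incomplete") and the function names are illustrative only.

```python
def parse_sse_event_names(raw):
    """Return the event names of an SSE stream, in order of appearance."""
    names = []
    for line in raw.splitlines():
        # Per the SSE wire format, an event's name is on an "event:" line.
        if line.startswith("event:"):
            names.append(line[len("event:"):].strip())
    return names


def grade_stream(raw):
    """Grade a captured stream on the presence of plan/final/done/error.

    Assumed rule: any error event fails the case; otherwise all of
    plan, final, and done must be present.
    """
    names = parse_sse_event_names(raw)
    if "error" in names:
        return "error"
    if all(required in names for required in ("plan", "final", "done")):
        return "pass"
    return "incomplete"


if __name__ == "__main__":
    # Hand-written sample, not a captured gateway response.
    sample = (
        "event: session\ndata: {}\n\n"
        "event: plan\ndata: {}\n\n"
        "event: final\ndata: {}\n\n"
        "event: done\ndata: {}\n\n"
    )
    print(grade_stream(sample))
```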
How It Works
Per-agent registration. Each agent declares its skills in config["skills"] and bindufy() registers it with the local Bindu runtime. With AUTH__ENABLED=true, the agent also auto-registers its DID with Hydra on first boot and persists OAuth client credentials under <cwd>/.bindu/oauth_credentials.json. The start_fleet.sh script polls /health after boot to extract each agent’s DID and exports them to .fleet.env.
Gateway planning. The gateway’s POST /plan endpoint accepts a question, an agents roster (each entry has endpoint, auth, and skills), and optional preferences (timeout_ms, max_steps, session_id). It returns an SSE stream emitting these events in order: session, plan, task.started (one per agent step), task.finished, final, done — or error on failure.
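To make the request shape concrete, here is a minimal sketch that assembles a /plan body from the fields named above. The top-level keys (question, agents, preferences) and the per-agent keys (endpoint, auth, skills) come from the description; the nested shapes, the "none" auth type, and build_plan_request itself are assumptions for illustration.

```python
import json


def build_plan_request(question, agents, timeout_ms=None, max_steps=None,
                       session_id=None):
    """Assemble a /plan request body, omitting preferences that are unset."""
    body = {"question": question, "agents": agents}
    candidates = {
        "timeout_ms": timeout_ms,
        "max_steps": max_steps,
        "session_id": session_id,
    }
    preferences = {k: v for k, v in candidates.items() if v is not None}
    if preferences:
        body["preferences"] = preferences
    return body


# One roster entry for the joke agent from the fleet table.
joke_agent = {
    "endpoint": "http://localhost:3773",
    "auth": {"type": "none"},         # assumed placeholder for the open-port path
    "skills": [{"id": "tell_joke"}],  # assumed shape of a skill entry
}

if __name__ == "__main__":
    body = build_plan_request("Tell me a joke", [joke_agent], timeout_ms=30000)
    print(json.dumps(body, indent=2))
```

Dropping unset preferences keeps the body minimal, so the gateway's own defaults apply wherever the caller has no opinion.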
The 13-case test matrix. Each case exercises a different failure mode or routing scenario:
- Q1–Q2: single-agent routing. Q1 is a perfect match for the joke agent; Q2 deliberately mismatches (asks for math but offers only a joke agent, which should produce a polite decline, not an error).
- Q3, Q_MULTIHOP: multi-step chains. Q_MULTIHOP forces research → math → poet in order, each consuming the previous artifact.
- Q4: ambiguity. “make me smile” could route to joke or poet.
- Q5–Q6: gibberish and empty inputs. Q6 must be rejected at the API boundary with HTTP 400 (listed in EXPECT_400).
- Q7: unreachable peer (endpoint at localhost:39999). The planner must surface the connect error, not hang.
- Q8: bad auth (bogus bearer token). The planner must surface the 401 cleanly.
- Q9: missing skill (nonexistent_skill on the joke agent).
- Q10: timeout test with preferences.timeout_ms = 30000.
- Q11: large payload (~10KB of lorem ipsum context). Verifies no silent truncation.
- Q12: full-roster planning. All five agents are available and one factual question must be routed correctly.
- Q_INBOX_REPRO_A: regression for a real inbox bug. A single compound message that needs two agents.
- Q_INBOX_REPRO_B: turn-2 routing under multi-recipient roster. Turn 1 was math; turn 2 must re-route to joke alone.
- Q_INBOX_REPRO_C: duplicate-submit idempotency. Fires the same body twice in parallel via run_dup_check. Currently fails because /plan has no idempotency layer yet. Excluded from ALL_CASES so the matrix stays green; run it explicitly.
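The EXPECT_400 rule for Q6 can be sketched as a status check that runs alongside the SSE grading. This is an illustration only: it assumes every case outside EXPECT_400 should return HTTP 200 and report failures inside the stream, which the real runner may not enforce; status_matches is a hypothetical name.

```python
# Cases that must be rejected at the API boundary. Q6 is the only one
# named in the matrix description.
EXPECT_400 = {"Q6"}


def status_matches(case_id, http_status):
    """Return True when the HTTP status is what the case expects.

    Assumption: cases outside EXPECT_400 should open a stream with 200.
    """
    expected = 400 if case_id in EXPECT_400 else 200
    return http_status == expected


if __name__ == "__main__":
    for case_id, status in [("Q1", 200), ("Q6", 400), ("Q6", 200)]:
        print(case_id, status, status_matches(case_id, status))
```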
"auth": { "type": "did_signed" } per agent — peer calls from the gateway sign each body with Ed25519 over a canonical {body, did, timestamp} payload (base58-encoded). The full round-trip — Hydra bearer + DID signature — is independently smoke-tested by hydra_smoke_test.sh.
Dependencies / Setup
- The Bindu Gateway running on localhost:3774 (with GATEWAY_API_KEY=dev-key-change-me in dev; override with the env var if yours differs).
- A reachable Hydra at the URL configured in examples/.env (only required for poet_agent, faq_agent, and did_signed auth).
- examples/.env with OPENROUTER_API_KEY set. Set AUTH__ENABLED=true there to flip the whole fleet into Hydra-protected mode; leave it false for the open-port path.
Run
examples/gateway_test_fleet/logs/:
- logs/<case_id>.sse: raw Server-Sent Events stream
- logs/<case_id>.status: HTTP status code
- logs/<agent>.log: stdout/stderr for each agent process
hydra_smoke_test.sh — it walks through public endpoint (200), protected endpoint without bearer (401), fetching a Hydra token via client_credentials, DID-signing the body, and a successful POST with bearer + signature.
Example API Calls
Q1 request body to POST /plan
Q1 SSE response (raw stream)
Q1 final plan (stripped JSON)
Talk to one agent directly (auth off)
Frontend Setup
localhost:3774 to drive the fleet from the UI.