Stop the joke agent (Ctrl-C in its terminal). We’ll start it again, along with four more, using a helper script:
./examples/gateway_test_fleet/start_fleet.sh
Expected output:
  [joke_agent]      started, pid=64945
  [math_agent]      started, pid=64958
  [poet_agent]      started, pid=64969
  [research_agent]  started, pid=64980
  [faq_agent]       started, pid=64993
Five agents now, each on its own port:
Agent            Port   Does
joke_agent       3773   Tells jokes
math_agent       3775   Solves math problems step-by-step
poet_agent       3776   Writes short poems
research_agent   3777   Web search + summarize a factual question
faq_agent        3778   Answers from a canned FAQ
Each is ~60 lines of Python. Open any one — say joke_agent.py — and you’ll see a small configuration that wires a language model (openai/gpt-4o-mini) to a few lines of instructions (“tell jokes, refuse other requests”). Narrow scope on purpose so mistakes are visible.
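The shape of that configuration can be sketched in a few lines. This is illustrative only — `AgentConfig` is a hypothetical name, and the real `joke_agent.py` may use a different API — but it captures the idea: one model, one narrow instruction, one port.

```python
from dataclasses import dataclass

# Illustrative shape only -- the fleet's actual config API may differ.
@dataclass
class AgentConfig:
    name: str
    model: str
    instructions: str
    port: int

joke_agent = AgentConfig(
    name="joke_agent",
    model="openai/gpt-4o-mini",
    instructions="Tell jokes. Politely refuse any other request.",
    port=3773,
)
```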
The gateway is already running from the previous chapter; don’t restart it.

A three-agent question

Paste this into your curl terminal. It asks something that genuinely needs three agents to answer:
curl -N http://localhost:3774/plan \
  -H "Authorization: Bearer ${GATEWAY_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "First research the current approximate population of Tokyo. Then compute what exactly 0.5% of that population is. Finally write a 4-line poem celebrating that number of people.",
    "agents": [
      {
        "name": "research", "endpoint": "http://localhost:3777",
        "auth": { "type": "none" },
        "skills": [{ "id": "web_research", "description": "Web search and summarize a factual question" }]
      },
      {
        "name": "math", "endpoint": "http://localhost:3775",
        "auth": { "type": "none" },
        "skills": [{ "id": "solve", "description": "Solve math problems step-by-step" }]
      },
      {
        "name": "poet", "endpoint": "http://localhost:3776",
        "auth": { "type": "none" },
        "skills": [{ "id": "write_poem", "description": "Write a short poem" }]
      }
    ]
  }'
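If you'd rather drive `/plan` from code than from curl, the stream is standard SSE: `event:` and `data:` lines, with a blank line terminating each event. A minimal stdlib-only sketch of the parsing (the commented request is hypothetical usage and requires the gateway to be running):

```python
def iter_sse_events(lines):
    """Group raw SSE lines into (event, data) pairs.
    Assumes standard 'event:' / 'data:' framing; a blank
    line terminates each event."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield event, "\n".join(data)
            event, data = None, []

# Hypothetical usage against the gateway:
#   import json, urllib.request
#   req = urllib.request.Request(
#       "http://localhost:3774/plan",
#       data=json.dumps(payload).encode(),
#       headers={"Authorization": f"Bearer {api_key}",
#                "Content-Type": "application/json"})
#   with urllib.request.urlopen(req) as resp:
#       for event, data in iter_sse_events(l.decode() for l in resp):
#           print(event, data[:80])
```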
This takes around 15 seconds and produces three task.started events, in order — research first, then math, then poet. Abbreviated output from a real run:
task.started  → research called with "What is the current population of Tokyo?"
task.artifact → "Tokyo's metropolitan area has approximately 36.95 million people..."
task.finished → completed

task.started  → math called with "Compute 0.5% of 36,950,000"
task.artifact → "0.005 × 36,950,000 = 184,750"
task.finished → completed

task.started  → poet called with "Write a 4-line poem about 184,750 people"
task.artifact → "In Tokyo's heart, where dreams align, / 184,750 souls brightly shine, / ..."
task.finished → completed

text.delta    → "Step 1 — Population: 36.95 million..."
...
final
done
The gateway chose the order, extracted the right number from each reply, and passed it to the next agent — all without you writing a single line of glue code. That’s the whole point.

How it chose

The planner saw three tools available (one per agent-skill combination):
Tool name                    Description
call_research_web_research   Web search and summarize a factual question
call_math_solve              Solve math problems step-by-step
call_poet_write_poem         Write a short poem
Where do those tool names come from? The gateway builds them automatically from the name and skills[].id fields in your request: call_<agent-name>_<skill-id>.
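The mapping is mechanical. A sketch of the derivation (`build_tool_names` is an illustrative name, not the gateway's actual code):

```python
def build_tool_names(agents):
    """Derive planner tool names from an agents[] catalog:
    call_<agent-name>_<skill-id>, one tool per agent-skill pair."""
    return [
        f"call_{agent['name']}_{skill['id']}"
        for agent in agents
        for skill in agent["skills"]
    ]

catalog = [
    {"name": "research", "skills": [{"id": "web_research"}]},
    {"name": "math",     "skills": [{"id": "solve"}]},
    {"name": "poet",     "skills": [{"id": "write_poem"}]},
]
# build_tool_names(catalog)
# → ['call_research_web_research', 'call_math_solve', 'call_poet_write_poem']
```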
Then the planner read the question: “First research… Then compute… Finally write a 4-line poem…” The word “First” strongly suggests research is step 1, and the LLM picked call_research_web_research. It waited for the reply, re-read the question with the new context, decided the next step was math, picked call_math_solve, and so on. All of this happens inside one HTTP request. The SSE stream is the gateway narrating what the planner decided.

What if you added a fourth agent it doesn’t need?

Try it. Add the joke agent to the catalog above and re-run:
{
  "name": "joke", "endpoint": "http://localhost:3773",
  "auth": { "type": "none" },
  "skills": [{ "id": "tell_joke", "description": "Tell a joke" }]
}
The SSE output is the same — three task.started events for research, math, poet. The joke tool sat there unused.
The planner only calls what it needs. This matters in production: you can hand the gateway a catalog of 50 agents, and only the 2 or 3 relevant to a given question will actually be invoked.

What is the planner, actually?

Inside the gateway, there’s a single agent configuration file called gateway/agents/planner.md. It’s a markdown file with some frontmatter:
---
name: planner
model: openrouter/anthropic/claude-sonnet-4.6
steps: 10
permission:
  ...
---

# System prompt body — the planner's own instructions.
The body is the system prompt. On each /plan request, the gateway does this:
1. Read the planner's system prompt. Loaded fresh from disk on each request — no cache.
2. Add the user's question as a new user message, plus any history from the session if resuming.
3. Build the tool list from your agents[] catalog. One tool per agent-skill pair.
4. Hand all of that to OpenRouter with streamText(). Claude (or whatever model you picked) drives the loop.
5. Stream the output back to you as SSE: text deltas, tool calls, and tool results.
Inside OpenRouter, Claude runs its agentic loop — text → tool call → tool result → more text → another tool call → final text. The gateway’s job is just to execute the tool calls against your real agents and plumb the results back.
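That loop is simple to sketch. In the sketch below, `call_model` and `execute_tool` are hypothetical stand-ins for the OpenRouter call and the HTTP call to a real agent — this is the shape of the loop, not the gateway's actual implementation:

```python
def planner_loop(system_prompt, question, tools,
                 call_model, execute_tool, max_steps=10):
    """Minimal agentic loop: each turn, the model either emits a
    tool call (executed against a real agent, result fed back) or
    final text (which ends the loop)."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)        # hypothetical model call
        if reply.get("tool_call"):
            result = execute_tool(reply["tool_call"])  # HTTP to the agent
            messages.append({"role": "tool", "content": result})
        else:
            return reply["text"]                   # final answer
    return None  # step budget exhausted
```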
Open gateway/agents/planner.md and read the body. Those are the instructions the coordinator AI follows. You can edit the file, and the next plan will pick up the changes — it is loaded on every request, not cached.
Next up: teach the planner reusable patterns without editing its system prompt. Recipes →