You have three research agents on the network. One is good, one is overloaded, and one charges 5x more than the others. Static routing (“always send research tasks to Agent A”) works until Agent A goes down or starts dropping quality on a topic outside its training.
Negotiation flips the question. Instead of the orchestrator guessing who’s best, it asks each candidate to score itself against the task. The agent looks at its own skills, queue depth, latency profile, and (if x402 is enabled) price, and replies with an accepted flag, a numeric score, and a per-skill reasoning trail. The orchestrator ranks the replies and picks one.
The endpoint is POST /agent/negotiation. It’s public — no auth, no DID handshake — so any orchestrator on the network can probe an agent before sending a real task.
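As a rough illustration of what a probe body might look like, the sketch below builds the JSON payload for that endpoint. Only `task_summary` is required according to this page; the helper name `build_assessment_request`, the example values, and the `weights` keys are assumptions made for illustration.

```python
import json


def build_assessment_request(task_summary, **optional):
    """Build a negotiation request body; optional fields set to None are dropped."""
    if not task_summary:
        raise ValueError("task_summary is required")
    body = {"task_summary": task_summary}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


payload = build_assessment_request(
    "Summarize recent papers on protein folding",
    max_latency_ms=5000,
    max_cost_amount=None,                        # omitted from the body
    weights={"skill_match": 0.7, "load": 0.3},   # hypothetical weight keys
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary is what an orchestrator would serialize and POST to each candidate's `/agent/negotiation` route.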
Bindu ships the client side only: each agent self-assesses against an incoming
negotiation request. The broadcast-and-rank logic lives in the orchestrator, which is
out of scope for the runtime. See Custom orchestrator below for
a minimal implementation.
How Bindu Negotiation Works
The agent owns the scoring. It knows its skills, its queue, its latency history, its prices. The orchestrator just asks the question and ranks the answers.
The Negotiation Model
| Static routing | Negotiation |
|---|---|
| Orchestrator hardcodes “send X to agent A” | Orchestrator asks “who can do X?” |
| Breaks when A is overloaded or down | Live load and latency feed the score |
| New agents need router config changes | New agents advertise skills and join the pool |
| Single point of failure | Per-request fan-out across candidates |
| No visibility into why A was chosen | Response includes subscores and matched skills |
Adaptive
Selection can change per request instead of being fixed in advance.
Explainable
Responses include per-component subscores and per-skill match reasoning, not a
black-box yes or no.
Efficient
Hard constraints reject early. Only candidates that survive the gate compute a full
score.
The Lifecycle: Broadcast, Assess, Select
Broadcast
The orchestrator POSTs the same assessment request to every candidate URL. Only
task_summary is required — everything else (IO types, tools, latency budget, cost
cap, weights) is optional and turns into either a hard gate or a soft score.
Assess
The agent runs CapabilityCalculator.calculate(). The pipeline is fixed:
- Empty skill list -> reject with no_skills_advertised.
- Hard constraints -> check IO modes, required tools, forbidden tools. First failure exits with the matching rejection_reason.
- Skill match -> hybrid score per skill (see Capability matching).
- Cost -> if x402 says the agent’s price exceeds max_cost_amount, exit with cost_exceeds_budget.
- Latency -> estimate from the skills’ performance.avg_processing_time_ms. If the estimate exceeds max_latency_ms * 2, exit with latency_exceeds_constraint.
- Combine -> weighted average of five subscores.
- Accept if final_score >= min_score and at least one skill matched.
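The ordering of the steps above can be sketched roughly as follows. This is not Bindu's `CapabilityCalculator`; it is a simplified stand-in that takes already-computed inputs. The function name `assess`, its parameters, and the shape of the returned dictionary are assumptions; only the rejection reasons, the 2x latency gate, and the accept rule come from the list above.

```python
from typing import Optional


def assess(
    skills: list,
    hard_failure: Optional[str],   # rejection_reason from the IO/tool checks, or None
    subscores: dict,               # component name -> value in [0, 1]
    weights: dict,                 # same keys as subscores, positive total assumed
    matched_skills: int,
    price: Optional[float] = None,
    max_cost_amount: Optional[float] = None,
    est_latency_ms: Optional[float] = None,
    max_latency_ms: Optional[float] = None,
    min_score: float = 0.0,
) -> dict:
    def reject(reason: str) -> dict:
        return {"accepted": False, "rejection_reason": reason}

    # Gates run in a fixed order; the first failure ends the assessment.
    if not skills:
        return reject("no_skills_advertised")
    if hard_failure is not None:
        return reject(hard_failure)
    if price is not None and max_cost_amount is not None and price > max_cost_amount:
        return reject("cost_exceeds_budget")
    if (est_latency_ms is not None and max_latency_ms is not None
            and est_latency_ms > 2 * max_latency_ms):
        return reject("latency_exceeds_constraint")

    # Weighted average of the subscores, then the accept rule.
    total = sum(weights.values())
    final_score = sum(subscores[k] * w for k, w in weights.items()) / total
    accepted = final_score >= min_score and matched_skills > 0
    # The reason reported for a below-threshold score is not covered by this sketch.
    return {"accepted": accepted, "score": final_score, "rejection_reason": None}
```

With `max_latency_ms=5000`, an estimate of 9000 ms still reaches the scoring step, while 10001 ms is rejected at the gate.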
Default weights come from the agent’s settings (app_settings.negotiation). A weights block supplied in the request is normalized to sum to 1.0, so you don’t have to pre-balance them.
Capability matching
Skill match is the largest weight (55%) and the most interesting part of the pipeline. It runs as a hybrid of embedding similarity and keyword overlap.
Embeddings come from OpenRouter (text-embedding-3-small by default). Skill embeddings are computed once on first request and cached on the agent. Task embeddings are cached in a per-instance LRU (max 1000 entries) so identical follow-ups don’t re-hit the API.
What the agent embeds per skill
The SkillEmbedder builds a single string from each skill’s name, description, tags, assessment.keywords, and the keys of capabilities_detail. That string is what gets compared against the embedded task_summary + task_details.
Keyword extraction
Both the task and each skill are tokenized with re.split(r"[^a-z0-9]+", ...), lowercased, and filtered to tokens of length 2-100. Jaccard similarity (|A ∩ B| / |A ∪ B|) gives a deterministic, no-network fallback.
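A minimal sketch of that keyword path, assuming the text is lowercased before it is split (otherwise uppercase letters would act as separators for this pattern). The helper names `tokenize` and `jaccard` and the sample strings are illustrative, not Bindu's own.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase, split on non-alphanumerics, keep tokens of length 2-100."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return {t for t in tokens if 2 <= len(t) <= 100}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


task = tokenize("Translate a technical manual into German")
skill = tokenize("technical translation: manuals, datasheets, German, French")
print(round(jaccard(task, skill), 2))  # 0.22 (2 shared tokens out of 9 distinct)
```

Note that "manual" and "manuals" do not match here: plain Jaccard has no stemming, which is the paraphrase gap the embedding side is there to cover.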
Confidence
A separate signal from score. Confidence starts at 0.5 and gains:
- +0.2 if best skill match > 0.3, else +0.1 if any match exists
- +0.1 each for IO constraints, latency constraint, and queue depth being present in the request
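The rule above is small enough to write out directly. The sketch below is one reading of it; the function name, parameter names, the final rounding, and the cap at 1.0 are assumptions.

```python
def confidence(best_match: float, any_match: bool,
               has_io: bool, has_latency: bool, has_queue_depth: bool) -> float:
    """Start at 0.5, add a match bonus, then 0.1 per signal present in the request."""
    value = 0.5
    if best_match > 0.3:
        value += 0.2
    elif any_match:
        value += 0.1
    value += 0.1 * sum([has_io, has_latency, has_queue_depth])
    return round(min(value, 1.0), 2)


# Strong match, IO and latency constraints supplied, no queue depth.
print(confidence(0.45, True, True, True, False))  # 0.9
```

Under this reading the value tops out at 1.0, reached when the best match exceeds 0.3 and all three signals are present.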
Hard constraints and rejection
Hard constraints reject before any scoring happens.
The latency gate uses a 2x multiplier. Saying max_latency_ms: 5000 rejects only when
the agent’s estimated latency exceeds 10000 ms. Everything between 5000 and 10000 ms
passes the gate but contributes a lower performance subscore.
Configuration
Enable negotiation on an agent
When capabilities.negotiation is True, Bindu reads OPENROUTER_API_KEY from the environment and injects it into negotiation.embedding_api_key if you haven’t set it explicitly. With no key configured, the calculator silently falls back to keyword-only matching.
The /agent/negotiation route is always mounted; it doesn’t require the capability
flag to respond. The flag controls config enrichment (auto-loading the OpenRouter key).
An agent that omits the flag but sets the key directly still gets full embedding-backed
matching.
Authoring skills for good matches
Skill metadata is what the agent has to score against. Two fields move the needle most:
The assessment block is the lever. Keywords beef up the keyword side of the hybrid score, specializations give you targeted boosts when a domain phrase shows up in the task, and anti_patterns are how you opt out of tasks you’d technically match on tags but actually can’t do well.
Environment variables
Asking “who can do X?”
curl
Python orchestrator
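The following is a sketch of such an orchestrator using only the standard library. It assumes each reply is a JSON object with an `accepted` flag and a numeric `score`, as described on this page; the function names `http_probe` and `negotiate`, the timeout, and the choice to treat unreachable agents as non-answers are assumptions.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_probe(agent_url: str, request: dict, timeout: float = 5.0) -> dict:
    """POST the assessment request to one agent and decode the JSON reply."""
    req = urllib.request.Request(
        agent_url.rstrip("/") + "/agent/negotiation",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def negotiate(agent_urls, request, probe=http_probe):
    """Fan out, drop rejections and failures, return (url, reply) with the top score."""
    def safe_probe(url):
        try:
            return url, probe(url, request)
        except Exception:  # an unreachable or broken agent counts as a non-answer
            return url, None

    with ThreadPoolExecutor(max_workers=max(1, len(agent_urls))) as pool:
        replies = list(pool.map(safe_probe, agent_urls))

    accepted = [(u, r) for u, r in replies if r and r.get("accepted")]
    if not accepted:
        return None
    return max(accepted, key=lambda pair: pair[1].get("score", 0.0))
```

A call would look like `negotiate(["http://agent-a:3773", "http://agent-b:3773"], {"task_summary": "..."})`, where the URLs are placeholders. The `probe` parameter is injectable so the ranking logic can be exercised without a network.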
A minimum-viable orchestrator fans out, filters rejections, ranks, and returns the winner.
Real-world use cases
Multi-agent translation
Query every translation agent in parallel. The specialist with matching domain
keywords (e.g. technical_translation in specializations) bubbles up over the
generalist even with a deeper queue.
Cost-aware routing with x402
When an agent advertises a price via the x402 capability, the cost subscore
becomes meaningful. Set max_cost_amount to your budget and the agent rejects with
cost_exceeds_budget if it can’t fit. Among the survivors, pick by your own rule:
cheapest, fastest, highest score.
Custom orchestrator
The orchestrator is yours to write. Bindu doesn’t ship one — it ships the side of the
protocol that lets any agent answer the question honestly. Re-rank by custom
business rules without changing agents:
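One possible shape for such a re-ranking step is sketched below. The rules themselves (a blocklist and a fixed bonus for preferred agents) are invented for illustration, as are the function and parameter names; the input is assumed to be the accepted `(agent_url, reply)` pairs from a fan-out.

```python
def rerank(replies, preferred=(), blocked=(), bonus=0.1):
    """Apply orchestrator-side rules on top of the agents' self-reported scores."""
    scored = []
    for url, reply in replies:
        if url in blocked:
            continue  # policy decision made entirely on the orchestrator side
        score = reply.get("score", 0.0) + (bonus if url in preferred else 0.0)
        scored.append((score, url))
    scored.sort(reverse=True)
    return [url for _, url in scored]


replies = [
    ("https://a.example", {"accepted": True, "score": 0.80}),
    ("https://b.example", {"accepted": True, "score": 0.75}),
    ("https://c.example", {"accepted": True, "score": 0.90}),
]
order = rerank(replies, preferred={"https://b.example"}, blocked={"https://c.example"})
print(order)  # ['https://b.example', 'https://a.example']
```

Because the rules live in the orchestrator, agents keep answering the same question and need no redeployment when the policy changes.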
Gateway routing
The Bindu gateway is stateless and forwards by agent ID — it does not orchestrate
negotiation. If you want capability-based routing in front of the gateway, run a
thin orchestrator that calls /agent/negotiation on each candidate and forwards the
real task to the winner via the gateway as usual.
Edge cases & FAQ
What happens when no agent accepts?
Each agent returns accepted: false with a specific rejection_reason. The
orchestrator’s job is to either fall back to a relaxed query (drop min_score, drop
required_tools), pick the highest-scoring rejection anyway and accept the risk, or
surface the failure to the caller. The agent itself takes no further action.
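The first of those options, retrying with a relaxed query, might look like the sketch below. The helper name `negotiate_with_fallback` and the `ask` callable (which takes a request and returns `(agent_url, reply)` pairs from every candidate) are assumptions; only the two dropped fields come from the text above.

```python
def negotiate_with_fallback(request, ask):
    """Try the strict request first, then one relaxed retry, then give up."""
    relaxed = {k: v for k, v in request.items()
               if k not in ("min_score", "required_tools")}
    attempts = [request] if relaxed == request else [request, relaxed]
    for attempt in attempts:
        accepted = [(u, r) for u, r in ask(attempt) if r.get("accepted")]
        if accepted:
            return max(accepted, key=lambda pair: pair[1].get("score", 0.0))
    return None  # nothing accepted: surface the failure to the caller
```

Returning `None` keeps the final decision with the caller, who can still choose to pick the highest-scoring rejection and accept the risk.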
What if I send no skills?
The calculator returns accepted: false, rejection_reason: "no_skills_advertised"
immediately. No scoring runs. Negotiation requires that the agent has declared at
least one Skill in its manifest.
What if I don't have an OpenRouter key?
use_embeddings stays on, but the lazy load fails the first time it’s needed, logs
a warning, and flips the flag off. The pipeline degrades to pure keyword (Jaccard)
matching. You lose paraphrase resilience but the endpoint stays functional.
How is queue_depth measured?
The endpoint asks the task manager’s storage layer for all tasks and counts the ones
whose status.state is in app_settings.agent.non_terminal_states. So a busy agent
actually scores lower on load: the formula is 1 / (1 + queue_depth), which goes
from 1.0 at idle to 0.5 at one in-flight to 0.33 at two, and so on.
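The count and the formula can be sketched in a few lines. The task dictionaries and the state names below are made up for illustration; the real values come from the storage layer and from `app_settings.agent.non_terminal_states`.

```python
def queue_depth(tasks, non_terminal_states) -> int:
    """Count tasks whose status.state is still in flight."""
    return sum(1 for t in tasks if t["status"]["state"] in non_terminal_states)


def load_score(depth: int) -> float:
    """1 / (1 + queue_depth): 1.0 when idle, falling toward 0 as work piles up."""
    return 1.0 / (1.0 + depth)


tasks = [
    {"status": {"state": "working"}},
    {"status": {"state": "completed"}},
    {"status": {"state": "submitted"}},
]
depth = queue_depth(tasks, {"submitted", "working"})  # hypothetical state names
print(depth, round(load_score(depth), 2))  # 2 0.33
```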
Can the agent lie?
Yes. Self-assessment is a trust contract, not a proof. The orchestrator should track
actual outcomes (latency, success rate) and either down-rank dishonest agents or
publish reputation back into selection. Bindu does not enforce honesty in the
response — the protocol just exposes the agent’s claim.
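One simple way an orchestrator could act on this is to scale each claimed score by the agent's observed success rate. The sketch below is an illustration of that idea only; the `Reputation` class, its methods, and the multiplicative adjustment are assumptions, not part of Bindu.

```python
from collections import defaultdict


class Reputation:
    """Track observed outcomes per agent and discount self-reported scores."""

    def __init__(self):
        self._stats = defaultdict(lambda: {"runs": 0, "ok": 0})

    def record(self, agent_url: str, success: bool) -> None:
        stats = self._stats[agent_url]
        stats["runs"] += 1
        stats["ok"] += int(success)

    def success_rate(self, agent_url: str, prior: float = 1.0) -> float:
        stats = self._stats.get(agent_url)
        if not stats or stats["runs"] == 0:
            return prior  # no history yet: take the claim at face value
        return stats["ok"] / stats["runs"]

    def adjusted(self, agent_url: str, claimed_score: float) -> float:
        return claimed_score * self.success_rate(agent_url)


rep = Reputation()
rep.record("https://a.example", True)
rep.record("https://a.example", False)
print(rep.adjusted("https://a.example", 0.9))    # 0.45
print(rep.adjusted("https://new.example", 0.8))  # 0.8
```

Ranking by `adjusted` instead of the raw score means an agent that overstates its fit loses ground as failures accumulate.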
Is /agent/negotiation authenticated?
No. It’s in the public-endpoints allowlist alongside /agent/info and
/.well-known/agent.json. Anyone reachable on the network can probe. If that’s not
OK for your deployment, front the agent with a gateway or proxy that gates the
route.
Design principles
Honest
Agents score themselves with real data — queue depth from storage, latency from
skill metadata, price from the x402 extension. Inflating scores is technically
possible but observably wrong over time.
Weighted
Orchestrators tune weights per request. Same agent, same skills, different ranking
depending on whether you want speed, fit, or cost.
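To see how the same agents can rank differently, the sketch below normalizes two weight blocks and recomputes a weighted average for each agent. In practice each agent does this itself when it receives the request; the subscore names, values, and helper names here are invented for the example.

```python
def normalize(weights: dict) -> dict:
    """Scale weights so they sum to 1.0; callers need not pre-balance them."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {k: v / total for k, v in weights.items()}


def combine(subscores: dict, weights: dict) -> float:
    w = normalize(weights)
    return sum(subscores.get(k, 0.0) * v for k, v in w.items())


agents = {
    "specialist": {"skill_match": 0.9, "load": 0.25},  # strong fit, busy
    "generalist": {"skill_match": 0.6, "load": 1.0},   # weaker fit, idle
}
fit_first = {"skill_match": 8, "load": 2}
speed_first = {"skill_match": 2, "load": 8}

for name, weights in (("fit_first", fit_first), ("speed_first", speed_first)):
    best = max(agents, key=lambda a: combine(agents[a], weights))
    print(name, "->", best)
# fit_first -> specialist
# speed_first -> generalist
```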
Composable
The endpoint is one POST with one JSON body. Build the orchestrator however you
like — broadcast, two-phase, hedged requests, reputation-weighted.