## Why Health And Metrics Matter
An agent does not just need to be online. It needs to be ready, observable, and measurable while it works.

| No health instrumentation | Bindu health and metrics |
|---|---|
| Hard to tell if the process is actually ready | Explicit readiness and runtime state |
| Failures surface only through user-facing symptoms | Health endpoint exposes system condition early |
| Limited insight into request behavior | Metrics reveal traffic, latency, and task load |
| Harder to automate monitoring | Prometheus-friendly output fits existing tooling |
| Debugging starts too late | Visibility begins from the first request |
If an agent is accepting requests but cannot actually process them safely, a plain
process heartbeat is not enough. You need visibility into readiness, runtime
dependencies, and traffic behavior.
## How Health And Metrics Work
Bindu exposes two complementary views:

- `/health` answers "is this agent ready right now?"
- `/metrics` answers "how has this agent been behaving over time?"
- **Readable**: Health responses expose runtime, application, and system state in one place.
- **Queryable**: Metrics can be scraped by Prometheus-compatible systems for long-term monitoring.
- **Operational**: Together they help teams detect readiness issues and performance drift early.
## Health Endpoint

### Check Health
The health endpoint answers the simplest operational question first: is the agent
running and ready?

Response fields:
| Field | Description |
|---|---|
| status | Overall health status (ok or degraded) |
| ready | Whether agent is ready to accept requests |
| uptime_seconds | Time since agent started |
| version | Bindu version number |
| health | Health status (healthy or degraded) |
| runtime.storage_backend | Storage backend type (e.g., PostgresStorage) |
| runtime.scheduler_backend | Scheduler backend type (e.g., RedisScheduler) |
| runtime.task_manager_running | Whether task manager is running |
| runtime.strict_ready | Strict readiness check status |
| application.penguin_id | Unique agent instance identifier |
| application.agent_did | Agent’s Decentralized Identifier |
| system.python_version | Python version |
| system.platform | Operating system platform |
| system.environment | Deployment environment |
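As a concrete illustration, a health response carrying the fields above might look like the following. This is a hypothetical payload with invented values; the actual shape and values come from your running agent.

```python
import json

# A hypothetical /health payload illustrating the fields in the table above.
# All values here are invented for illustration.
sample_health = json.loads("""
{
  "status": "ok",
  "ready": true,
  "uptime_seconds": 3600,
  "version": "0.1.0",
  "health": "healthy",
  "runtime": {
    "storage_backend": "PostgresStorage",
    "scheduler_backend": "RedisScheduler",
    "task_manager_running": true,
    "strict_ready": true
  },
  "application": {
    "penguin_id": "penguin-123",
    "agent_did": "did:example:agent-1"
  },
  "system": {
    "python_version": "3.12.0",
    "platform": "linux",
    "environment": "production"
  }
}
""")

print(sample_health["status"])                           # ok
print(sample_health["runtime"]["task_manager_running"])  # True
```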
### Inspect Runtime State
The health response is not just a liveness check. It exposes enough context to tell
whether the agent is actually safe to receive work.

Readiness is often a dependency question. If storage, scheduling, or the task
manager is degraded, the agent may be present but not truly ready.
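One way to act on that distinction is a readiness gate that only admits work when the top-level flag and the runtime dependency fields agree. A minimal sketch, using field names from the health table above; the helper name `is_truly_ready` is our own, not a Bindu API:

```python
def is_truly_ready(health: dict) -> bool:
    """Treat the agent as ready only when the top-level flags AND
    the runtime dependencies agree (hypothetical helper)."""
    runtime = health.get("runtime", {})
    return (
        health.get("ready", False)
        and health.get("status") == "ok"
        and runtime.get("task_manager_running", False)
    )

# A live-but-degraded agent: the process answers, but the task manager is down.
degraded = {"status": "ok", "ready": True, "runtime": {"task_manager_running": False}}
print(is_truly_ready(degraded))  # False
```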
## Metrics Endpoint
The metrics endpoint exposes Prometheus-compatible time-series data so you can watch traffic, latency, concurrency, and active task load over time.

| Metric | Type | Description |
|---|---|---|
http_requests_total | counter | Total HTTP requests by method, endpoint, status |
http_request_duration_seconds | histogram | Request latency |
agent_tasks_active | gauge | Currently active tasks |
agent_tasks_completed_total | counter | Total completed tasks by agent and status |
http_response_size_bytes | summary | Response body size |
http_requests_in_flight | gauge | Current requests being processed |
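Prometheus exposes these series in its plain-text exposition format, which is simple enough to inspect by hand. A sketch of a minimal spot-check parser, run against an invented scrape sample (the label values shown are made up):

```python
def parse_metrics(text: str) -> dict:
    """Parse simple Prometheus exposition lines into {series: value}.
    Skips comments; good enough for spot checks, not a full parser."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

# Invented scrape output for illustration.
sample = """
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="POST",endpoint="/run",status="200"} 42
agent_tasks_active 3
"""

metrics = parse_metrics(sample)
print(metrics["agent_tasks_active"])  # 3.0
```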
## Real-World Use Cases
### Kubernetes or container readiness checks
A deployment platform can call `/health` before routing traffic to the agent,
making sure requests only hit instances that are actually ready.

### Prometheus-based monitoring
A monitoring stack can scrape `/metrics` continuously to track request counts,
latency changes, and current task volume.

### Debugging runtime dependencies
If an agent appears live but behaves incorrectly, the health response can reveal
which runtime pieces are active and whether strict readiness is being met.
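The readiness-check use case boils down to a simple decision: only pass the probe when the HTTP status and the health payload both agree. A sketch of that decision, assuming the probe already has the status code and response body in hand; `probe_ok` is an illustrative name, not part of Bindu:

```python
import json

def probe_ok(status_code: int, body: str) -> bool:
    """Readiness decision a platform probe might make: require a 200
    response AND an affirmative ready flag in the health payload."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return payload.get("ready") is True

print(probe_ok(200, '{"ready": true}'))   # True
print(probe_ok(200, '{"ready": false}'))  # False
print(probe_ok(503, '{"ready": true}'))   # False
```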
## Best Practices

### Monitor Readiness, Not Just Liveness
Use the full `/health` response to validate agent readiness instead of relying only
on process existence.

### Track Metrics Continuously
Scrape `/metrics` over time so latency, concurrency, and traffic issues are visible
before they become outages.
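Continuous scraping is what turns snapshots into drift detection. Because http_request_duration_seconds is a histogram, its cumulative `_sum` and `_count` series yield average latency over any interval between two scrapes. A sketch with invented scrape values:

```python
def interval_avg_latency(prev_sum, prev_count, cur_sum, cur_count):
    """Average request latency between two scrapes, derived from a
    histogram's cumulative _sum (seconds) and _count series."""
    requests = cur_count - prev_count
    if requests <= 0:
        return None  # no traffic in the interval
    return (cur_sum - prev_sum) / requests

# Two invented scrapes: 50 new requests took 6.0s in total.
avg = interval_avg_latency(prev_sum=120.0, prev_count=1000,
                           cur_sum=126.0, cur_count=1050)
print(f"{avg:.3f}s")  # 0.120s
```

Comparing this interval average across scrape windows is what surfaces latency drift before it becomes an outage.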