Observability Overview
Bindu provides comprehensive observability through OpenTelemetry, enabling you to monitor, trace, and analyze your agent's performance and behavior in real time.
What is Observability?
Observability gives you deep insights into your agent's internal state through:
- Traces - Follow requests through the entire system
- Metrics - Track performance and resource usage
- Logs - Capture detailed execution information
- Events - Monitor state transitions and key moments
Why Observability Matters
Performance Analysis
Identify bottlenecks and optimize agent response times
Error Debugging
Quickly diagnose and fix issues with detailed traces
State Visibility
Track task state transitions and agent behavior
Production Monitoring
Monitor agent health and performance in production
Architecture
Bindu implements distributed tracing across the entire task execution lifecycle:
Trace Flow
- TaskManager - Creates root span for operations
- Scheduler - Propagates trace context to workers
- Worker - Restores context and creates child spans
- Agent - Tracks execution time and state changes
Key Features
Distributed Tracing
Track requests across the entire system:
- ✅ End-to-end visibility - From API request to agent response
- ✅ Span propagation - Maintains context across async boundaries
- ✅ Parent-child relationships - Clear hierarchy of operations
- ✅ Timing information - Precise duration of each operation
Rich Attributes
Comprehensive metadata on every span:
- bindu.operation - Operation name (e.g., "send_message")
- bindu.task_id - Task UUID
- bindu.context_id - Conversation context
- bindu.agent.name - Agent identifier
- bindu.agent.did - Agent DID
- bindu.agent.execution_time - Agent processing time
State Transition Events
Timeline markers for key moments:
- Task state changes (working → completed)
- Error occurrences with stack traces
- Input/auth requirements
- Custom agent events
Performance Metrics
Automatic metric collection:
- bindu_tasks_total - Counter of tasks processed
- bindu_task_duration_seconds - Histogram of task durations
- bindu_active_tasks - Current number of active tasks
- bindu_contexts_total - Contexts managed
Supported Backends
Bindu works with any OpenTelemetry-compatible backend:
Open Source
- Jaeger - Distributed tracing platform
- Grafana Tempo - High-scale distributed tracing
- Zipkin - Distributed tracing system
- SigNoz - Full-stack observability platform
Commercial
- Honeycomb - Observability for production systems
- New Relic - Full-stack observability
- Datadog - Monitoring and analytics
- Lightstep - Observability for microservices
Quick Start
1. Start Jaeger
2. Configure Agent - add the observability settings to your agent config
3. Run Agent
4. View Traces - open http://localhost:16686 and select your service
Configuration Options
Agent Config (Recommended)
Environment Variables
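For example, using the standard OpenTelemetry SDK environment variables (these are OTel-spec variables, not Bindu-specific; adjust the endpoint to your backend):

```shell
# Standard OpenTelemetry SDK environment variables
export OTEL_SERVICE_NAME="my-agent"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
```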
Agent config takes precedence over environment variables.
Example Trace
Here's what a complete trace looks like:
Trace Attributes
| Attribute | Description | Example |
|---|---|---|
| bindu.operation | Operation type | send_message |
| bindu.task_id | Task identifier | task-456 |
| bindu.agent.name | Agent name | my-agent |
| bindu.agent.execution_time | Processing time (seconds) | 0.200 |
| bindu.success | Success flag | true |
Observability Best Practices
1. Consistent Naming
Use clear, consistent span names.
2. Rich Attributes
Add meaningful context to spans.
3. Span Events
Use events for timeline markers.
4. Error Handling
Always record errors with context.
5. Sampling Strategy
Configure sampling for high-volume production.
Performance Tuning
Batch Processor
Optimize for production workloads.
Development vs Production
Development:
Troubleshooting
No traces appearing
Check:
- Jaeger is running: docker ps | grep jaeger
- Endpoint is correct: echo $OTEL_EXPORTER_OTLP_ENDPOINT
- Agent logs show observability initialization
- Test endpoint: curl http://localhost:4318/v1/traces
Traces delayed
Cause: BatchSpanProcessor batches spans before sending (default: 5s)
Solution: Set OTEL_USE_BATCH_PROCESSOR="false" for development
Wrong service name
Solution: Set OTEL_SERVICE_NAME or configure the service name in the agent config
High memory usage
Cause: Queue size too large or export delays
Solution: Tune batch processor parameters:
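The BatchSpanProcessor can be tuned through the standard OTel-spec environment variables; lowering the queue and batch sizes reduces memory use (defaults noted in comments):

```shell
# Standard BatchSpanProcessor tuning knobs (OpenTelemetry spec)
export OTEL_BSP_MAX_QUEUE_SIZE="1024"        # default 2048; spans buffered in memory
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE="256"  # default 512
export OTEL_BSP_SCHEDULE_DELAY="2000"        # default 5000 (ms between exports)
export OTEL_BSP_EXPORT_TIMEOUT="10000"       # default 30000 (ms)
```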
Next Steps
Tracing Guide
Learn about distributed tracing in detail
Jaeger Setup
Set up Jaeger for trace visualization
Metrics
Understand metrics and performance monitoring
Phoenix Dashboard
Use the monitoring dashboard
Resources
- OpenTelemetry Docs - Official OpenTelemetry documentation
- Jaeger Documentation - Jaeger tracing platform
- GitHub Examples - Code examples
- Discord Community - Get help and share insights