Jaeger Setup

Jaeger is an open-source distributed tracing platform that helps you monitor and troubleshoot your Bindu agents.

Quick Start

1. Start Jaeger

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
Ports:
  • 16686 - Jaeger UI (web interface)
  • 4317 - OTLP gRPC receiver
  • 4318 - OTLP HTTP receiver
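
Once the container is up, the three published ports can be probed from Python. This is a minimal sketch using only the standard library; the helper names `port_open` and `check_jaeger_ports` are illustrative and not part of Bindu or Jaeger.

```python
import socket

# Ports published by the docker run command above.
JAEGER_PORTS = {16686: "Jaeger UI", 4317: "OTLP gRPC", 4318: "OTLP HTTP"}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False


def check_jaeger_ports(host: str = "localhost") -> dict:
    """Map each Jaeger port to whether it currently accepts TCP connections."""
    return {port: port_open(host, port) for port in JAEGER_PORTS}
```

Calling `check_jaeger_ports()` returns a dictionary keyed by port number. An open port only shows that something is listening, not that it is Jaeger.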
2. Configure Bindu

Add to your agent config:
{
  "name": "my-agent",
  "telemetry": true,
  "oltp": {
    "endpoint": "http://localhost:4318/v1/traces",
    "service_name": "bindu-agent"
  }
}
Or use environment variables:
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318/v1/traces"
export OTEL_SERVICE_NAME="bindu-agent"
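
The two configuration routes above can be combined. The sketch below shows one plausible way to merge them, assuming environment variables take precedence over the config file; that precedence, the fallback defaults, and the function name `resolve_telemetry_settings` are assumptions for illustration, not documented Bindu behaviour.

```python
import os

# Fallbacks taken from the example values in this guide (assumed, not Bindu defaults).
DEFAULT_ENDPOINT = "http://localhost:4318/v1/traces"
DEFAULT_SERVICE_NAME = "bindu-agent"


def resolve_telemetry_settings(config: dict, env=None) -> dict:
    """Merge config-file values with environment overrides.

    Assumed precedence: environment variable, then config file, then default.
    """
    env = os.environ if env is None else env
    oltp = config.get("oltp", {})  # key spelled as in the config example above
    return {
        "enabled": bool(config.get("telemetry", False)),
        "endpoint": (env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
                     or oltp.get("endpoint")
                     or DEFAULT_ENDPOINT),
        "service_name": (env.get("OTEL_SERVICE_NAME")
                         or oltp.get("service_name")
                         or DEFAULT_SERVICE_NAME),
    }
```

Passing an explicit `env` dictionary keeps the function easy to test without touching the real process environment.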
3. Run Your Agent

python your_agent.py
4. View Traces

Open http://localhost:16686 in your browser

Jaeger UI Overview

Service View

The main dashboard shows:
  • Service Name - Your agent service (bindu-agent)
  • Operations - All traced operations:
    • task_manager.send_message
    • task_manager.get_task
    • task_manager.cancel_task
    • run task
    • agent.execute

Trace View

Complete request flow with timing:
task_manager.send_message (250ms)
└─ run task (220ms)
   └─ agent.execute (200ms)
Visual Features:
  • Color-coded by service
  • Shows parallel vs sequential execution
  • Hover for span details
  • Click to expand/collapse
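
When reading a nested trace like the one above, it helps to separate each span's own time from the time spent in its children. The sketch below does that arithmetic for the example trace; the `Span` class is a hypothetical stand-in, not a Jaeger or Bindu type, and it assumes child spans run sequentially.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: int
    children: list = field(default_factory=list)


def self_times(span: Span) -> dict:
    """Return {span name: duration not covered by its direct children}.

    Assumes children do not overlap in time; parallel children would
    make the subtraction overcount.
    """
    own = span.duration_ms - sum(child.duration_ms for child in span.children)
    result = {span.name: own}
    for child in span.children:
        result.update(self_times(child))
    return result


# The example trace from this section.
trace = Span("task_manager.send_message", 250, [
    Span("run task", 220, [
        Span("agent.execute", 200),
    ]),
])

print(self_times(trace))
# {'task_manager.send_message': 30, 'run task': 20, 'agent.execute': 200}
```

Here 200 of the 250 ms are spent inside `agent.execute`, so the agent logic, not the task manager, is the place to look first.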

Span Details

Each span shows:
Attributes (Tags):
  • bindu.operation - Operation name
  • bindu.request_id - Request identifier
  • bindu.task_id - Task UUID
  • bindu.context_id - Context UUID
  • bindu.agent.name - Agent name
  • bindu.agent.did - Agent DID
  • bindu.agent.execution_time - Execution duration
  • bindu.component - Component type
Events (Logs):
  • task.state_changed - State transitions
    • from_state - Previous state
    • to_state - New state
    • error - Error message (if failed)

Search & Filter

By Service

Service: bindu-agent

By Operation

Operation: task_manager.send_message

By Tags

bindu.task_id = "550e8400-e29b-41d4-a716-446655440000"
bindu.agent.name = "my-agent"
error = true

By Duration

Min Duration: 100ms
Max Duration: 5s

By Time Range

  • Last 1 hour
  • Last 24 hours
  • Custom range
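
The same filters can be expressed as a URL for scripted searches. The sketch below builds a query against the HTTP endpoint that the Jaeger UI itself calls (`/api/traces`). That endpoint is internal to Jaeger and not a stable public API, so treat the parameter names here as assumptions to verify against your Jaeger version; `build_trace_search_url` is an illustrative helper.

```python
import json
from urllib.parse import urlencode


def build_trace_search_url(base_url, service, end_us, lookback_s=3600,
                           operation=None, tags=None,
                           min_duration=None, max_duration=None, limit=20):
    """Build a trace search URL mirroring the UI's filter fields.

    end_us is the end of the time range in Unix microseconds; the range
    starts lookback_s seconds earlier.
    """
    params = {
        "service": service,
        "start": end_us - lookback_s * 1_000_000,
        "end": end_us,
        "limit": limit,
    }
    if operation:
        params["operation"] = operation
    if tags:
        # Tags travel as one JSON object; pass values as strings, e.g. "true".
        params["tags"] = json.dumps(tags)
    if min_duration:
        params["minDuration"] = min_duration  # e.g. "100ms"
    if max_duration:
        params["maxDuration"] = max_duration  # e.g. "5s"
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"
```

For example, `build_trace_search_url("http://localhost:16686", "bindu-agent", end_us=int(time.time() * 1_000_000), tags={"error": "true"}, min_duration="100ms")` corresponds to the "By Tags" and "By Duration" filters over the last hour.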

Jaeger UI Features

1. Trace Timeline

  • Visual span hierarchy
  • Color-coded by service
  • Parallel vs sequential execution
  • Interactive hover details

2. Trace Comparison

  • Compare multiple traces side-by-side
  • Identify performance regressions
  • Spot anomalies

3. Service Dependencies

  • Visualize service interactions
  • Identify bottlenecks
  • Understand system architecture

4. Statistics

  • Latency percentiles (p50, p95, p99)
  • Error rates
  • Request volume
  • Operation distribution
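
To make the percentile figures concrete, here is a small sketch of how p50, p95, and p99 can be derived from a list of span durations. It uses the nearest-rank definition; Jaeger or a metrics backend may use a different interpolation, so small differences from the UI are expected.

```python
import math


def percentile(durations_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(durations_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Return the three percentiles listed above."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}
```

With few samples the high percentiles collapse onto the slowest request, which is why p99 is only meaningful at reasonable request volumes.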

Production Deployment

Docker Compose

version: '3'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
      - "4317:4317"
      - "4318:4318"
    depends_on:
      - elasticsearch

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:latest
        env:
        - name: COLLECTOR_OTLP_ENABLED
          value: "true"
        ports:
        - containerPort: 16686
        - containerPort: 4317
        - containerPort: 4318
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger
spec:
  selector:
    app: jaeger
  ports:
  - name: ui
    port: 16686
  - name: otlp-grpc
    port: 4317
  - name: otlp-http
    port: 4318

Performance Tuning

Batch Processor Configuration

# High-volume production
export OTEL_BSP_MAX_QUEUE_SIZE="4096"
export OTEL_BSP_SCHEDULE_DELAY="10000"
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE="1024"
export OTEL_BSP_EXPORT_TIMEOUT="60000"
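
A quick sanity check on these values can catch misconfiguration before the agent starts. The sketch below reads the four variables, falls back to the defaults given in the OpenTelemetry SDK environment variable specification, and enforces the specification's rule that the export batch size must not exceed the queue size. The function name `load_bsp_settings` is illustrative.

```python
import os

# Defaults from the OpenTelemetry SDK environment variable specification.
BSP_DEFAULTS = {
    "OTEL_BSP_MAX_QUEUE_SIZE": 2048,         # spans
    "OTEL_BSP_SCHEDULE_DELAY": 5000,         # milliseconds
    "OTEL_BSP_MAX_EXPORT_BATCH_SIZE": 512,   # spans
    "OTEL_BSP_EXPORT_TIMEOUT": 30000,        # milliseconds
}


def load_bsp_settings(env=None) -> dict:
    """Read batch span processor settings and validate them."""
    env = os.environ if env is None else env
    settings = {name: int(env.get(name, default))
                for name, default in BSP_DEFAULTS.items()}
    for name, value in settings.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    if settings["OTEL_BSP_MAX_EXPORT_BATCH_SIZE"] > settings["OTEL_BSP_MAX_QUEUE_SIZE"]:
        raise ValueError("OTEL_BSP_MAX_EXPORT_BATCH_SIZE must be <= OTEL_BSP_MAX_QUEUE_SIZE")
    return settings
```

The high-volume values above pass this check: a batch of 1024 fits in a queue of 4096.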

Sampling

# Sample 10% of traces
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
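
The idea behind `parentbased_traceidratio` can be sketched in a few lines: a root span is kept when its trace ID falls below a threshold derived from the ratio, and a child span simply follows its parent's decision. This is a simplified model of the technique, not the SDK's code; the exact comparison is SDK-specific.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # compare only the low 64 bits of the trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parent_based_sampled(trace_id: int, ratio: float, parent_sampled=None) -> bool:
    """Follow the parent's decision if there is a parent, else apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict, so traces are kept or dropped whole rather than in fragments.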

Troubleshooting

Check:
  1. Jaeger is running:
docker ps | grep jaeger
curl http://localhost:16686
  2. OTLP endpoint is reachable:
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
  3. Agent logs show observability initialization:
[INFO] Initializing observability...
[INFO] Configured OTLP exporter endpoint=http://localhost:4318/v1/traces
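
The endpoint check from step 2 can also be run from Python, which is handy inside the agent's own environment where `curl` may not be installed. This is a minimal standard-library sketch; `post_empty_trace` is an illustrative name.

```python
import json
import urllib.error
import urllib.request


def post_empty_trace(endpoint: str, timeout: float = 3.0) -> int:
    """POST an empty OTLP/JSON payload and return the HTTP status code.

    Connection failures (refused, DNS, timeout) raise urllib.error.URLError.
    """
    body = json.dumps({"resourceSpans": []}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
```

A 2xx status from `post_empty_trace("http://localhost:4318/v1/traces")` means the receiver is reachable; an exception means the problem is at the network level rather than in the agent.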
Cause: Too many spans or large payloads
Solution:
  • Enable batch processing
  • Increase batch delay
  • Implement sampling
export OTEL_USE_BATCH_PROCESSOR="true"
export OTEL_BSP_SCHEDULE_DELAY="10000"
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
Cause: Span context not propagated
Solution: Verify span propagation in scheduler:
task_operation["_current_span"] = get_current_span()
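
Why the span has to travel with the task can be shown without any tracing library. In the sketch below a `contextvars.ContextVar` stands in for the tracer's "current span" slot, and a plain `queue.Queue` stands in for the scheduler; both are assumptions for illustration and none of this is Bindu's actual scheduler code.

```python
import contextvars
import queue
import threading

# Stand-in for the tracer's notion of "the current span" (hypothetical).
_current_span = contextvars.ContextVar("current_span", default=None)


def enqueue_task(task_queue, payload):
    """Capture the caller's current span and carry it inside the task."""
    task_operation = {"payload": payload}
    task_operation["_current_span"] = _current_span.get()
    task_queue.put(task_operation)


def worker(task_queue, results):
    """Runs in another thread, which normally starts with a fresh context."""
    task_operation = task_queue.get()
    results["ambient"] = _current_span.get()              # what the worker sees on its own
    results["carried"] = task_operation["_current_span"]  # what the task brought along


def demo():
    task_queue, results = queue.Queue(), {}
    _current_span.set("task_manager.send_message")
    enqueue_task(task_queue, {"text": "hello"})
    thread = threading.Thread(target=worker, args=(task_queue, results))
    thread.start()
    thread.join()
    return results
```

In `demo()`, the worker recovers the parent span from `task_operation["_current_span"]`, whereas its own ambient context typically has none. If the worker started its span without the carried value, the span would appear in Jaeger as a separate, unlinked trace.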

Alternative Backends

Bindu works with any OpenTelemetry-compatible backend:

Grafana Tempo

export OTEL_EXPORTER_OTLP_ENDPOINT="http://tempo:4318/v1/traces"

Zipkin

export OTEL_EXPORTER_ZIPKIN_ENDPOINT="http://zipkin:9411/api/v2/spans"

SigNoz

export OTEL_EXPORTER_OTLP_ENDPOINT="http://signoz:4318/v1/traces"

Honeycomb

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io"
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY"
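
`OTEL_EXPORTER_OTLP_HEADERS` holds comma-separated `key=value` pairs. The sketch below shows how such a string maps to HTTP headers; it is a simplified parser for illustration (`parse_otlp_headers` is a hypothetical name), and real SDKs may differ in details such as whitespace and percent-decoding.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict:
    """Parse 'key1=value1,key2=value2' into a header dictionary."""
    headers = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, sep, value = pair.partition("=")  # split on the first '=' only
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key] = unquote(value.strip())
    return headers
```

For the Honeycomb example, `parse_otlp_headers("x-honeycomb-team=YOUR_API_KEY")` yields a single `x-honeycomb-team` header whose value is the API key.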
