
Overview

The Problem: Hand-crafted prompts are hard to optimize. How do we make agents learn from real user feedback and improve over time?

The Solution: Use Bindu's feedback API to collect real interactions, create golden datasets, and use DSPy to automatically optimize prompts.

The Workflow

1. Agent runs with initial prompts
2. Users provide feedback via Bindu's feedback API
3. Collect good examples → Create golden dataset
4. Export dataset for DSPy
5. DSPy optimizes prompts automatically
6. Deploy improved agent

Why This Matters

Manual prompt engineering is slow. You tweak prompts, test, repeat. With DSPy + Bindu feedback:
  • Learn from real usage - Not synthetic examples
  • Automatic optimization - DSPy tunes prompts for you
  • Continuous improvement - Agents get better over time
  • Data-driven - Optimize based on actual performance

How It Works

Step 1: Collect Feedback

from bindu import bindufy

agent = bindufy(my_agent, config, handler)

# Bindu automatically tracks all interactions
# Users provide feedback via API:
# POST /feedback
# {
#   "task_id": "task-123",
#   "rating": 5,
#   "comment": "Perfect answer!"
# }
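Since the feedback endpoint is still planned, the exact payload may change. A minimal sketch of building and validating a POST /feedback body in plain Python (the `build_feedback_payload` helper is hypothetical, not part of Bindu):

```python
import json

def build_feedback_payload(task_id, rating, comment=""):
    """Validate and serialize a feedback body matching the POST /feedback shape above.

    Hypothetical helper; the planned API may use a different schema.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return json.dumps({"task_id": task_id, "rating": rating, "comment": comment})

body = build_feedback_payload("task-123", 5, "Perfect answer!")
print(body)
```

The helper only shapes the JSON body; sending it is an ordinary HTTP POST with any client.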

Step 2: Create Golden Dataset

from bindu.dspy import create_golden_dataset

# Export high-quality examples
golden_dataset = create_golden_dataset(
    agent_id="my-agent",
    min_rating=4,  # Only 4+ star feedback
    limit=100
)

# Exports to DSPy format
golden_dataset.export("training_data.json")
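Conceptually, building a golden dataset is just filtering feedback by rating and exporting input/output pairs. A stand-alone sketch of what `create_golden_dataset` might do internally (the record fields and JSON layout are assumptions, since the integration is still planned):

```python
import json

def filter_golden_examples(feedback_records, min_rating=4, limit=100):
    """Keep only high-rated interactions, capped at `limit`.

    Each record is assumed to look like:
    {"input": str, "output": str, "rating": int}
    """
    golden = [r for r in feedback_records if r["rating"] >= min_rating]
    return golden[:limit]

records = [
    {"input": "What is DSPy?", "output": "A framework for optimizing LM prompts.", "rating": 5},
    {"input": "Summarize this.", "output": "Too vague to act on.", "rating": 2},
]

golden = filter_golden_examples(records, min_rating=4)
with open("training_data.json", "w") as f:
    json.dump(golden, f, indent=2)
```

The exported input/output pairs map naturally onto DSPy training examples when you load the file on the optimization side.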

Internal Prompt Versioning

Bindu versions your prompts, so you can deploy, route, and compare multiple versions:

# Deploy multiple prompt versions
agent = bindufy(my_agent, config, handler)

# Version 1 (original)
agent.deploy_prompt_version("v1", "original_prompts.json")

# Version 2 (DSPy optimized)
agent.deploy_prompt_version("v2", "optimized_agent.json")

# Route traffic to different versions
agent.set_routing({
    "v1": 0.2,  # 20% traffic to original
    "v2": 0.8   # 80% traffic to optimized
})

# Or A/B test
agent.ab_test(
    control="v1",
    treatment="v2",
    metric="user_satisfaction"
)
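The weighted routing above boils down to a weighted random choice per request. A self-contained sketch of the idea (not the actual Bindu implementation):

```python
import random

def pick_version(routing, rng=None):
    """Pick a prompt version according to traffic weights, e.g. {"v1": 0.2, "v2": 0.8}."""
    rng = rng or random.Random()
    versions = list(routing)
    weights = list(routing.values())
    return rng.choices(versions, weights=weights, k=1)[0]

routing = {"v1": 0.2, "v2": 0.8}
counts = {"v1": 0, "v2": 0}
rng = random.Random(42)  # seeded for reproducibility
for _ in range(10_000):
    counts[pick_version(routing, rng)] += 1
# Over many requests, roughly 20% of traffic lands on v1 and 80% on v2.
```

Because routing is per-request, shifting traffic (or rolling back to v1) only requires changing the weights, not redeploying the agent.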
Benefits:
  • Safe rollouts - Test optimized prompts with small traffic
  • A/B testing - Compare performance scientifically
  • Easy rollback - Revert if optimization doesn’t work
  • Version history - Track prompt improvements over time

Use Cases

Customer Support - Learn from 5-star responses
Research Agents - Optimize based on accurate answers
Code Generation - Improve from working code examples
Data Analysis - Learn from validated insights

Status

📋 Planned - Feedback API integration with DSPy export

What’s Next

  • Learn - Read about DSPy
  • Discuss - Share your optimization needs on Discord
  • Request features