Web Scraping Agent
AI agent that crawls web pages and extracts structured data.

Code

Create web-scraping-agent.py with the code below:
"""Web Scraping AI Agent (Local & Cloud SDK)

Crawls web pages, extracts structured data, cleans and formats outputs,
and prepares datasets for analysis or integration.

Features:
- ScrapeGraph AI for intelligent structured extraction
- Mem0 for persistent memory (dedup, extraction profiles)
- OpenRouter (openai/gpt-oss-120b) for synthesis and formatting
- Local run mode + Bindu Cloud SDK deployment

Usage:
    python web-scraping-agent.py

Environment:
    Requires SCRAPEGRAPH_API_KEY, MEM0_API_KEY, OPENROUTER_API_KEY in .env file
"""

import os
from dotenv import load_dotenv

load_dotenv()

from bindu.penguin.bindufy import bindufy
from agno.agent import Agent
from agno.models.openrouter import OpenRouter
from agno.tools.scrapegraph import ScrapeGraphTools
from agno.tools.mem0 import Mem0Tools

# Initialize the web scraping agent
agent = Agent(
    instructions=(
        "You are a web scraping assistant. Given a URL and an optional extraction prompt, "
        "use ScrapeGraph to extract structured data from the page. Clean and format the output "
        "into JSON. Use memory to avoid re-scraping URLs you have already processed and to "
        "remember extraction preferences ."
    ),
    model=OpenRouter(
        id="openai/gpt-oss-120b",
        api_key=os.getenv("OPENROUTER_API_KEY"),
    ),
    tools=[
        ScrapeGraphTools(api_key=os.getenv("SCRAPEGRAPH_API_KEY")),
        Mem0Tools(api_key=os.getenv("MEM0_API_KEY")),
    ],
)

# Agent configuration for Bindu
config = {
    "author": "bindu.builder@getbindu.com",
    "name": "web_scraping_agent",
    "description": (
        "AI-enabled web scraping agent that collects, structures, and processes "
        "data from websites for analysis and automation."
    ),
    "deployment": {
        "url": "http://localhost:3773",
        "expose": True,
        "cors_origins": ["http://localhost:5173"],
    },
    "skills": ["skills/web-scraping-skill"],
}

def handler(messages: list[dict[str, str]]):
    """
    Process incoming messages and return agent response.

    Args:
        messages: List of message dictionaries containing conversation history

    Returns:
        Extracted and structured data from the requested web page
    """
    if messages:
        latest = (
            messages[-1].get("content", "")
            if isinstance(messages[-1], dict)
            else str(messages[-1])
        )
        result = agent.run(input=latest)
        if hasattr(result, "content"):
            return result.content
        elif hasattr(result, "response"):
            return result.response
        return str(result)
    return "Please provide a URL and an extraction prompt."

if __name__ == "__main__":
    # Bindu-fy the agent — converts it to a discoverable, interoperable Bindu agent
    bindufy(config, handler)
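
Before deploying, you can exercise the handler directly. A minimal smoke test, assuming valid keys in .env; run it in place of the bindufy() call, since the handler lives in the same file:

# Temporary smoke test: call the handler the way Bindu would.
reply = handler(
    [{"role": "user", "content": "Scrape https://example.com and return the page title as JSON"}]
)
print(reply)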

Skill Configuration

Create skills/web-scraping-skill/skill.yaml:
# Web Scraping Skill
# AI-enabled web scraping that crawls pages and extracts structured data

id: web-scraping-skill
name: web-scraping-skill
version: 1.0.0
author: bindu.builder@getbindu.com

description: |
  AI-enabled web scraping skill that crawls web pages, extracts structured data,
  cleans and formats outputs, and prepares datasets for analysis or integration.

tags:
  - web-scraping
  - data-processing
  - extraction
  - crawler
  - scrape
  - structured-data

input_modes:
  - application/json

output_modes:
  - application/json

examples:
  - "Extract product listings from this e-commerce site: https://example.com/products"
  - "Scrape blog titles and publish dates from https://example.com/blog"
  - "Get all article headlines from this news page"

capabilities_detail:
  web_scraping:
    supported: true
    description: "Crawl and extract structured data from any public web page"
  data_processing:
    supported: true
    description: "Clean, normalize, and format extracted content into structured JSON"
  memory:
    supported: true
    description: "Remember previously scraped URLs and extraction profiles via Mem0"
  deduplication:
    supported: true
    description: "Avoid re-scraping already processed URLs"

assessment:
  keywords:
    - scrape
    - crawl
    - extract
    - web
    - website
    - product listings
    - blog titles
    - data collection
    - html
    - structured data

  specializations:
    - domain: e_commerce_extraction
      confidence_boost: 0.3
    - domain: content_aggregation
      confidence_boost: 0.2

  anti_patterns:
    - "pdf extraction"
    - "database query"
    - "audio transcription"
    - "image generation"

How It Works

Web Scraping
  • ScrapeGraphTools: AI-powered structured data extraction
  • Intelligent web page crawling and parsing
  • JSON output formatting and cleaning
  • Custom extraction prompt support
Memory Management
  • Mem0Tools: Persistent memory for deduplication
  • Extraction profile storage and retrieval
  • Avoids re-scraping previously processed URLs (sketched below)
  • Remembers user extraction preferences
Data Processing
  • OpenRouter with GPT-OSS-120b for synthesis
  • Advanced data structuring and formatting
  • Content cleaning and normalization
  • JSON output preparation
Agent Capabilities
  • Web scraping assistant with AI extraction
  • Structured data output in JSON format
  • Memory-based optimization and caching
  • Multi-format data preparation
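
The deduplication flow under Memory Management can be pictured as a memory lookup before each crawl. A minimal sketch, assuming the Mem0 platform client (mem0.MemoryClient) with its search/add methods and a hypothetical user_id; inside the agent, Mem0Tools makes these calls on the model's behalf:

import os
from mem0 import MemoryClient

mem = MemoryClient(api_key=os.getenv("MEM0_API_KEY"))
url = "https://example.com/products"

# Look for a memory of this URL before crawling it again.
hits = mem.search(f"scraped {url}", user_id="web_scraping_agent")
if hits:
    print("URL already processed; reuse the stored extraction.")
else:
    # ... run the ScrapeGraph extraction here, then record the URL.
    mem.add(
        [{"role": "assistant", "content": f"Scraped {url}."}],
        user_id="web_scraping_agent",
    )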

Dependencies

uv init
uv add bindu agno python-dotenv scrapegraphai mem0ai

Environment Setup

Create .env file:
SCRAPEGRAPH_API_KEY=your_scrapegraph_api_key
MEM0_API_KEY=your_mem0_api_key
OPENROUTER_API_KEY=your_openrouter_api_key_here
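
To confirm the keys are visible to the agent process, a quick sanity check using the same python-dotenv loading as the agent file:

import os
from dotenv import load_dotenv

load_dotenv()
for key in ("SCRAPEGRAPH_API_KEY", "MEM0_API_KEY", "OPENROUTER_API_KEY"):
    if not os.getenv(key):
        raise SystemExit(f"{key} is missing from .env")
print("All keys loaded.")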

Run

uv run web-scraping-agent.py
Examples:
  • “Extract product information from https://example-shop.com including prices, names, and descriptions”
  • “Scrape news headlines from multiple news websites”
  • “Get pricing data from e-commerce product pages”

Example API Calls

Send a scraping request with message/send:

{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "kind": "message",
      "messageId": "9f11c870-5616-49ad-b187-d93cbb100001",
      "contextId": "9f11c870-5616-49ad-b187-d93cbb100002",
      "taskId": "9f11c870-5616-49ad-b187-d93cbb100003",
      "parts": [
        {
          "kind": "text",
          "text": "Extract product information from https://example-shop.com including prices, names, and descriptions"
        }
      ]
    },
     "skillId": "web-scraping-skill",
    "configuration": {
      "acceptedOutputModes": ["application/json"]
    }
  },
  "id": "9f11c870-5616-49ad-b187-d93cbb100003"
}

Poll for the task result with tasks/get (note the shared taskId):
{
  "jsonrpc": "2.0",
  "method": "tasks/get",
  "params": {
    "taskId": "9f11c870-5616-49ad-b187-d93cbb100003"
  },
  "id": "9f11c870-5616-49ad-b187-d93cbb100004"
}
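
For reference, here are the same two calls from Python. This is a minimal client sketch, assuming the agent accepts JSON-RPC 2.0 over plain HTTP POST at its deployment URL; the exact endpoint path may differ in your Bindu version:

import requests

BASE_URL = "http://localhost:3773"  # deployment.url from the agent config above

# message/send: submit the scraping request as a task.
send_payload = {
    "jsonrpc": "2.0",
    "method": "message/send",
    "params": {
        "message": {
            "role": "user",
            "kind": "message",
            "messageId": "9f11c870-5616-49ad-b187-d93cbb100001",
            "contextId": "9f11c870-5616-49ad-b187-d93cbb100002",
            "taskId": "9f11c870-5616-49ad-b187-d93cbb100003",
            "parts": [
                {
                    "kind": "text",
                    "text": "Extract product information from https://example-shop.com",
                }
            ],
        },
        "skillId": "web-scraping-skill",
        "configuration": {"acceptedOutputModes": ["application/json"]},
    },
    "id": "9f11c870-5616-49ad-b187-d93cbb100003",
}
print(requests.post(BASE_URL, json=send_payload, timeout=120).json())

# tasks/get: poll the task created by the message/send call.
get_payload = {
    "jsonrpc": "2.0",
    "method": "tasks/get",
    "params": {"taskId": "9f11c870-5616-49ad-b187-d93cbb100003"},
    "id": "9f11c870-5616-49ad-b187-d93cbb100004",
}
print(requests.post(BASE_URL, json=get_payload, timeout=30).json())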

Frontend Setup

# Clone the Bindu repository
git clone https://github.com/GetBindu/Bindu

# Navigate to frontend directory
cd Bindu/frontend

# Install dependencies
npm install

# Start frontend development server
npm run dev
Open http://localhost:5173 and chat with the web scraping agent.