MCP Server
A Model Context Protocol (MCP) server that lets IDEs and agentic systems orchestrate crawling and scraping with WDS. It exposes ready‑to‑use tools and prompts for link discovery, data extraction, and workflow automation — all backed by the WDS API Server.
What You Can Do
- Run crawl/scrape workflows with tools (start jobs, follow links, extract fields, check status).
- Use one‑shot prompts to guide an agent through discovery and extraction tasks.
- Stream large extractions via cursors for efficient, incremental processing.
WDS API Server
The WDS API Server provides the core functionality and also hosts the WDS MCP Server.
WDS MCP Server
When the WDS API Server is up and running, the WDS MCP Server can be connected to IDEs:
Visual Studio Code
See the official Visual Studio Code instructions on connecting MCP servers. Use the following values to connect the WDS MCP Server:
| Parameter | Value | Description |
| --- | --- | --- |
| Name | `wds` | A name that identifies the WDS MCP Server among other connected MCP servers |
| Type | `http` | The WDS MCP Server is connected over HTTP |
| URL | `http://[host:port]/mcp` | The WDS MCP Server URL. If the WDS API Server is deployed locally in Docker, the URL is typically `http://localhost:2807/mcp` |
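You can also sanity-check the same connection values outside the IDE with a small script. The sketch below is a minimal example assuming the official MCP Python SDK (`pip install mcp`) and a local Docker deployment on port 2807; adjust the URL to match your setup.

```python
# Minimal connectivity check against the WDS MCP Server over streamable HTTP.
# Assumes the official MCP Python SDK ("pip install mcp") and a local deployment
# reachable at http://localhost:2807/mcp.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

WDS_MCP_URL = "http://localhost:2807/mcp"  # adjust host/port to your deployment


async def main() -> None:
    async with streamablehttp_client(WDS_MCP_URL) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Connected. Available tools:", [tool.name for tool in tools.tools])


if __name__ == "__main__":
    asyncio.run(main())
```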
Quick Start
- Deploy the WDS API Server (see Server Deployments).
- Connect the MCP server in your IDE (see table above).
- Try a prompt, for example in VS Code: `/mcp.wds.scrape-data` with optional `urls` and `mainTask` args.
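The same prompt can be fetched programmatically. The sketch below again assumes the MCP Python SDK; the prompt name `scrape-data` is inferred from the slash command (verify it with `list_prompts()`), and the `urls` and `mainTask` values are purely illustrative.

```python
# Fetch the scrape-data prompt programmatically instead of via the IDE slash command.
# The prompt name "scrape-data" is inferred from "/mcp.wds.scrape-data"; check
# session.list_prompts() if your server registers it under a different name.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.get_prompt(
                "scrape-data",
                arguments={
                    "urls": "https://example.com",                    # optional, illustrative
                    "mainTask": "Extract product names and prices",   # optional, illustrative
                },
            )
            for message in result.messages:
                print(message.role, message.content)


asyncio.run(main())
```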
Tools at a Glance
- StartJob: create/update a job from a Job Config, return initial Download Tasks.
- Crawl: discover follow‑up pages using a selector, return Download Tasks.
- Scrape: extract text/attribute values with a selector.
- GetDownloadTaskStatus: inspect status, errors, and request/response details.
- CrawlMdr: execute hierarchical crawl/scrape plans with cursor‑based results.
- CrawlMdrConfig* helpers: build/update multi‑level plans (subs, crawl params, scrape params).
- GetCrawlMdrData: fetch the next batch of scraped JSON documents via cursor.
See full details in MCP Tools.
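The exact argument names and shapes for each tool are defined by the server, so it helps to inspect a tool's input schema before calling it. A minimal sketch, again assuming the MCP Python SDK; the argument `downloadTaskId` is a placeholder, not the confirmed WDS contract.

```python
# Inspect the input schema of a WDS tool before calling it, then invoke it.
# The tool name "GetDownloadTaskStatus" comes from the list above; the argument
# name "downloadTaskId" is a placeholder -- use whatever the schema actually requires.
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            tool = next(t for t in tools.tools if t.name == "GetDownloadTaskStatus")
            print(json.dumps(tool.inputSchema, indent=2))  # authoritative argument names

            # Call the tool once the required arguments are known.
            result = await session.call_tool(
                "GetDownloadTaskStatus",
                arguments={"downloadTaskId": "example-task-id"},  # placeholder argument
            )
            print(result.content)


asyncio.run(main())
```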
Prompts at a Glance
- ScrapeData: discover pages, define fields, and configure site‑wide scraping to JSON.
- Resume: crawl and summarize an entire site into a structured overview.
See full details in MCP Prompts.
Typical Flows
- Simple: StartJob → Crawl → Scrape.
- Hierarchical (MDR): CrawlMdrConfigCreate/Upsert* → StartJob → CrawlMdr → GetCrawlMdrData (repeat until cursor is empty).
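A rough sketch of the hierarchical flow, assuming the MCP Python SDK. The tool names match the lists above, but the argument names (`name`, `jobConfig`, `cursor`) and the assumed JSON shape of the results are placeholders; consult each tool's input schema and the MCP Tools reference for the real contract.

```python
# Hedged sketch of the hierarchical (MDR) flow:
#   CrawlMdrConfigCreate -> StartJob -> CrawlMdr -> GetCrawlMdrData (until cursor empty).
# Tool names come from the sections above; all argument names, response fields, and
# the cursor format are placeholders, not the actual WDS contract.
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


def first_text_as_json(result) -> dict:
    """Parse the first text content block of a tool result as JSON (assumed shape)."""
    return json.loads(result.content[0].text)


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Build a multi-level crawl/scrape plan (placeholder arguments).
            await session.call_tool("CrawlMdrConfigCreate", arguments={"name": "demo-plan"})

            # 2. Start the job and obtain the initial Download Tasks (placeholder arguments).
            await session.call_tool("StartJob", arguments={"jobConfig": "demo-plan"})

            # 3. Execute the plan; assume the result carries a cursor for paging.
            crawl = first_text_as_json(
                await session.call_tool("CrawlMdr", arguments={"jobConfig": "demo-plan"})
            )
            cursor = crawl.get("cursor")

            # 4. Page through scraped JSON documents until the cursor is exhausted.
            while cursor:
                page = first_text_as_json(
                    await session.call_tool("GetCrawlMdrData", arguments={"cursor": cursor})
                )
                for document in page.get("documents", []):
                    print(document)
                cursor = page.get("cursor")


asyncio.run(main())
```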
Endpoint and Base Path
- Default MCP endpoint: `/mcp` under the WDS base URL (e.g., `http://localhost:2807/mcp`).
- Helm deployments can add a base‑path prefix via `global.ingress.basePath`.