MCP Server
A Model Context Protocol (MCP) server that lets IDEs and agentic systems orchestrate crawling and scraping with WDS. It exposes ready‑to‑use tools and prompts for link discovery, data extraction, and workflow automation — all backed by the WDS API Server.
What You Can Do
- Run crawl/scrape workflows with tools (start jobs, follow links, extract fields, check status).
- Use one‑shot prompts to guide an agent through discovery and extraction tasks.
- Stream large extractions via cursors for efficient, incremental processing.
WDS API Server
The WDS API Server provides the core functionality and also hosts the WDS MCP Server.
WDS MCP Server
When the WDS API Server is up and running, the WDS MCP Server can be connected to IDEs:
Visual Studio Code
See the official Visual Studio Code instructions on connecting MCP servers. Use the following values to connect the WDS MCP Server:
| Parameter | Value | Description |
| --- | --- | --- |
| Name | `wds` | A name that identifies the WDS MCP Server among other connected MCP servers |
| Type | `http` | The WDS MCP Server is connected over HTTP |
| URL | `http://[host:port]/mcp` | The WDS MCP Server URL. If the WDS API Server is deployed locally in Docker, the URL is typically `http://localhost:2807/mcp` |
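You can also sanity-check the same connection values outside the IDE with a small script. The sketch below is a minimal example assuming the official MCP Python SDK (`pip install mcp`) and a local Docker deployment on port 2807; adjust the URL to match your setup.

```python
# Minimal connectivity check against the WDS MCP Server over streamable HTTP.
# Assumes the official MCP Python SDK ("pip install mcp") and a local deployment
# reachable at http://localhost:2807/mcp.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

WDS_MCP_URL = "http://localhost:2807/mcp"  # adjust host/port to your deployment


async def main() -> None:
    async with streamablehttp_client(WDS_MCP_URL) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Connected. Available tools:", [tool.name for tool in tools.tools])


if __name__ == "__main__":
    asyncio.run(main())
```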
Quick Start
- Deploy the WDS API Server (see Server Deployments).
- Connect the MCP server in your IDE (see table above).
- Try a prompt, for example in VS Code: `/mcp.wds.scrape-data` with optional `urls` and `mainTask` args.
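The same prompt can be fetched programmatically. The sketch below again assumes the MCP Python SDK; the prompt name `scrape-data` is inferred from the slash command (verify it with `list_prompts()`), and the `urls` and `mainTask` values are purely illustrative.

```python
# Fetch the scrape-data prompt programmatically instead of via the IDE slash command.
# The prompt name "scrape-data" is inferred from "/mcp.wds.scrape-data"; check
# session.list_prompts() if your server registers it under a different name.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.get_prompt(
                "scrape-data",
                arguments={
                    "urls": "https://example.com",                    # optional, illustrative
                    "mainTask": "Extract product names and prices",   # optional, illustrative
                },
            )
            for message in result.messages:
                print(message.role, message.content)


asyncio.run(main())
```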
Tools at a Glance
- StartJob: create/update a job from a Job Config, return initial Download Tasks.
- Crawl: discover follow‑up pages using a selector, return Download Tasks.
- Scrape: extract text/attribute values with a selector.
- GetDownloadTaskStatus: inspect status, errors, and request/response details.
- CrawlMdr: execute hierarchical crawl/scrape plans with cursor‑based results.
- CrawlMdrConfig* helpers: build/update multi‑level plans (subs, crawl params, scrape params).
- GetCrawlMdrData: fetch the next batch of scraped JSON documents via cursor.
See full details in MCP Tools.
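The exact argument names and shapes for each tool are defined by the server, so it helps to inspect a tool's input schema before calling it. A minimal sketch, again assuming the MCP Python SDK; the argument `downloadTaskId` is a placeholder, not the confirmed WDS contract.

```python
# Inspect the input schema of a WDS tool before calling it, then invoke it.
# The tool name "GetDownloadTaskStatus" comes from the list above; the argument
# name "downloadTaskId" is a placeholder -- use whatever the schema actually requires.
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            tool = next(t for t in tools.tools if t.name == "GetDownloadTaskStatus")
            print(json.dumps(tool.inputSchema, indent=2))  # authoritative argument names

            # Call the tool once the required arguments are known.
            result = await session.call_tool(
                "GetDownloadTaskStatus",
                arguments={"downloadTaskId": "example-task-id"},  # placeholder argument
            )
            print(result.content)


asyncio.run(main())
```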
Prompts at a Glance
- ScrapeData: discover pages, define fields, and configure site‑wide scraping to JSON.
- Resume: crawl and summarize an entire site into a structured overview.
See full details in MCP Prompts.
Typical Flows
- Simple: StartJob → Crawl → Scrape.
- Hierarchical (MDR): CrawlMdrConfigCreate/Upsert* → StartJob → CrawlMdr → GetCrawlMdrData (repeat until cursor is empty).
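A rough sketch of the hierarchical flow, assuming the MCP Python SDK. The tool names match the lists above, but the argument names (`name`, `jobConfig`, `cursor`) and the assumed JSON shape of the results are placeholders; consult each tool's input schema and the MCP Tools reference for the real contract.

```python
# Hedged sketch of the hierarchical (MDR) flow:
#   CrawlMdrConfigCreate -> StartJob -> CrawlMdr -> GetCrawlMdrData (until cursor empty).
# Tool names come from the sections above; all argument names, response fields, and
# the cursor format are placeholders, not the actual WDS contract.
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


def first_text_as_json(result) -> dict:
    """Parse the first text content block of a tool result as JSON (assumed shape)."""
    return json.loads(result.content[0].text)


async def main() -> None:
    async with streamablehttp_client("http://localhost:2807/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Build a multi-level crawl/scrape plan (placeholder arguments).
            await session.call_tool("CrawlMdrConfigCreate", arguments={"name": "demo-plan"})

            # 2. Start the job and obtain the initial Download Tasks (placeholder arguments).
            await session.call_tool("StartJob", arguments={"jobConfig": "demo-plan"})

            # 3. Execute the plan; assume the result carries a cursor for paging.
            crawl = first_text_as_json(
                await session.call_tool("CrawlMdr", arguments={"jobConfig": "demo-plan"})
            )
            cursor = crawl.get("cursor")

            # 4. Page through scraped JSON documents until the cursor is exhausted.
            while cursor:
                page = first_text_as_json(
                    await session.call_tool("GetCrawlMdrData", arguments={"cursor": cursor})
                )
                for document in page.get("documents", []):
                    print(document)
                cursor = page.get("cursor")


asyncio.run(main())
```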
Endpoint and Base Path
- Default MCP endpoint: `/mcp` under the WDS base URL (e.g., `http://localhost:2807/mcp`).
- Helm deployments can add a base‑path prefix via `global.ingress.basePath`.