CrawlMdr Tool

Performs recursive crawling and scraping based on a hierarchical configuration: follows links, extracts fields per level, and returns a cursor to stream large result sets.

Arguments

| Name | Type | Description |
| --- | --- | --- |
| tasks | Array of DownloadTask | Required. Initial download tasks (from StartJob) |
| crawlMdrConfig | CrawlMdrConfig | Required. Crawl Multi-Dimensional Recursive (MDR) configuration |

DownloadTask

Represents a single page download request produced by a crawl or scrape job.

Fields:

| Name | Type | Description |
| --- | --- | --- |
| Id | String | Required. Task Id |
| Url | String | Required. Page URL |
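
As a minimal sketch of how the tasks argument could be assembled on the client side, the snippet below models DownloadTask as a small Python dataclass and serializes a list of tasks. The key names come from the table above; the JSON-like dictionary shape, the helper name to_tasks_argument, and the sample Id and URL are assumptions for illustration (in practice the tasks come from StartJob).

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DownloadTask:
    Id: str   # Required. Task Id
    Url: str  # Required. Page URL


def to_tasks_argument(tasks):
    """Serialize DownloadTask objects into a list of plain dictionaries.

    Hypothetical helper: the exact wire format of the tool is assumed here.
    """
    return [asdict(task) for task in tasks]


# Illustrative values only; real tasks are produced by StartJob.
tasks_argument = to_tasks_argument(
    [DownloadTask(Id="task-1", Url="https://example.com/")]
)
```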

CrawlMdrConfig

Hierarchical crawl/scrape plan that defines fields to extract, link selectors, and child levels.

| Name | Type | Description | MCP Tools |
| --- | --- | --- | --- |
| Name | String | Required. Name of the level (e.g., ‘/’, ‘products’, etc.) | Set via CrawlMdrConfigCreate, CrawlMdrConfigUpsertSub tools |
| ScrapeParams | Array of ScrapeParams | List of data fields to extract | Set via CrawlMdrConfigUpsertScrapeParams tool |
| CrawlParams | Array of CrawlParams | List of link selectors for crawling on the current level | Set via CrawlMdrConfigUpsertCrawlParams tool |
| SubCrawlMdrConfigs | Array of SubCrawlMdrConfigs | List of sub-levels (child pages/sections), with transition crawl parameters | Set via CrawlMdrConfigUpsertSub tool |

Remarks

The selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part must be a valid selector of that type. Supported types: CSS and XPATH.
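
The selector format described above can be checked client-side before a configuration is submitted. The following is a rough sketch, not part of the tool: parse_selector is a hypothetical helper, and treating the type prefix as case-insensitive and trimming surrounding whitespace are assumptions.

```python
SUPPORTED_SELECTOR_TYPES = ("CSS", "XPATH")


def parse_selector(raw: str) -> tuple[str, str]:
    """Split a 'TYPE: expression' string into (TYPE, expression).

    Only the first colon separates the type from the expression, so
    CSS pseudo-classes such as ':not(...)' stay intact.
    """
    prefix, separator, expression = raw.partition(":")
    selector_type = prefix.strip().upper()
    if not separator or selector_type not in SUPPORTED_SELECTOR_TYPES:
        raise ValueError(f"Selector must start with 'CSS:' or 'XPATH:': {raw!r}")
    expression = expression.strip()
    if not expression:
        raise ValueError(f"Selector expression is empty: {raw!r}")
    return selector_type, expression
```

For example, parse_selector("CSS: a.product-link") yields the type CSS and the expression a.product-link, while a string without a type prefix raises ValueError.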

ScrapeParams

| Name | Type | Description |
| --- | --- | --- |
| FieldName | String | Required. Name of the data field to extract |
| Selector | String | Required. Selector for getting interesting data on a web page |
| Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: val |

CrawlParams

| Name | Type | Description |
| --- | --- | --- |
| Selector | String | Required. Selector for getting interesting links on a web page |
| Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: href |
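
ScrapeParams and CrawlParams share the Attribute field but default to different values (val and href respectively). The sketch below illustrates that difference; effective_attribute and the "scrape"/"crawl" labels are hypothetical names, and the dictionary shape of a parameter is an assumption.

```python
# Defaults as documented: ScrapeParams fall back to inner text ("val"),
# CrawlParams fall back to the link target ("href").
DEFAULT_ATTRIBUTE = {"scrape": "val", "crawl": "href"}


def effective_attribute(param: dict, kind: str) -> str:
    """Return the attribute that would apply when Attribute is omitted or empty."""
    return param.get("Attribute") or DEFAULT_ATTRIBUTE[kind]
```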

SubCrawlMdrConfigs

A child CrawlMdrConfig that includes transition crawl parameters to reach the sublevel.

| Name | Type | Description |
| --- | --- | --- |
| SubCrawlParams | CrawlParams | Required. Transition crawl parameters to move to a sublevel |
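
To make the hierarchy concrete, here is an illustrative two-level configuration written as a Python dictionary, followed by a small helper that walks the levels. The key names follow the tables above; the site structure, the selectors, and the helper level_paths are assumptions. According to the MCP Tools column, a real configuration is built through the CrawlMdrConfig* tools rather than written by hand, so treat this only as a picture of the resulting shape.

```python
# Root level "/" follows pagination links and descends into a "products" sub-level.
config = {
    "Name": "/",
    "CrawlParams": [{"Selector": "CSS: a.next-page"}],
    "SubCrawlMdrConfigs": [
        {
            "Name": "products",
            # Transition parameters: which links lead from "/" to product pages.
            "SubCrawlParams": {"Selector": "CSS: a.product-link", "Attribute": "href"},
            "ScrapeParams": [
                {"FieldName": "title", "Selector": "CSS: h1"},
                {"FieldName": "image", "Selector": "CSS: img.main", "Attribute": "src"},
            ],
        }
    ],
}


def level_paths(level: dict, prefix: str = ""):
    """Yield a slash-joined path for every level, depth-first."""
    path = f"{prefix}{level['Name']}"
    yield path
    child_prefix = path if path.endswith("/") else path + "/"
    for sub in level.get("SubCrawlMdrConfigs", []):
        yield from level_paths(sub, child_prefix)
```

Calling list(level_paths(config)) on this sample lists the root level followed by its products sub-level.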

Return Type

Returns a CrawlMdrResult

CrawlMdrResult

| Name | Type | Description |
| --- | --- | --- |
| FailedDownloadTasks | Array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs |
| FailedDownloadTaskCount | Int | Required. Number of failed download tasks |
| SuccessfulDownloadTaskCount | Int | Required. Number of successful download tasks |
| DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |

FailedDownloadTask

| Name | Type | Description |
| --- | --- | --- |
| ParentDownloadTaskUrl | String | Required. Parent page URL |
| FailedDownloadTasks | Array of DownloadTask | Required. Failed download tasks |
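
Because failed tasks are grouped by parent page, a caller that wants a flat retry list has to unwrap two levels. The sketch below does that and derives a failure rate from the two counters. It assumes the result arrives as a dictionary with the field names from the tables above; summarize_failures and the sample data are hypothetical.

```python
def summarize_failures(result: dict):
    """Return ([(parent_url, failed_url), ...], failure_rate) for a result dict."""
    pairs = []
    for group in result.get("FailedDownloadTasks", []):
        parent_url = group["ParentDownloadTaskUrl"]
        for task in group["FailedDownloadTasks"]:
            pairs.append((parent_url, task["Url"]))
    failed = result["FailedDownloadTaskCount"]
    total = failed + result["SuccessfulDownloadTaskCount"]
    failure_rate = failed / total if total else 0.0
    return pairs, failure_rate


# Illustrative data only.
sample_result = {
    "FailedDownloadTasks": [
        {
            "ParentDownloadTaskUrl": "https://example.com/",
            "FailedDownloadTasks": [
                {"Id": "t-2", "Url": "https://example.com/p/1"},
                {"Id": "t-3", "Url": "https://example.com/p/2"},
            ],
        }
    ],
    "FailedDownloadTaskCount": 2,
    "SuccessfulDownloadTaskCount": 8,
    "DataCursor": None,
}
failed_pairs, failure_rate = summarize_failures(sample_result)
```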

CrawlMdrDataCursor

| Name | Type | Description |
| --- | --- | --- |
| JobId | String | Required. Job Id |
| NextCursor | String | Optional. Cursor for fetching the next batch of scraped data (null if done) |
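
The cursor fields suggest a simple paging loop: keep requesting batches until NextCursor comes back as null. This page does not name the tool that fetches a batch, so the sketch below takes it as a caller-supplied function. The fetch_batch signature and the "Data"/"NextCursor" keys of its return value are assumptions, as is the reading that a null NextCursor on the initial cursor means there is nothing left to fetch.

```python
from typing import Callable, Iterator, Optional


def iter_batches(
    data_cursor: Optional[dict],
    fetch_batch: Callable[[str, str], dict],
) -> Iterator[list]:
    """Yield batches of scraped data until the cursor is exhausted."""
    if data_cursor is None:  # DataCursor is null when no data was scraped
        return
    job_id = data_cursor["JobId"]
    next_cursor = data_cursor.get("NextCursor")
    while next_cursor is not None:
        page = fetch_batch(job_id, next_cursor)  # hypothetical fetch call
        yield page["Data"]
        next_cursor = page.get("NextCursor")  # null when done


# Stand-in for the real fetch tool, used only to exercise the loop.
_fake_pages = {
    "c1": {"Data": [{"title": "A"}], "NextCursor": "c2"},
    "c2": {"Data": [{"title": "B"}], "NextCursor": None},
}


def fake_fetch(job_id: str, cursor: str) -> dict:
    return _fake_pages[cursor]


batches = list(iter_batches({"JobId": "job-1", "NextCursor": "c1"}, fake_fetch))
```

Using a generator keeps only one batch in memory at a time, which matches the stated purpose of the cursor: streaming large result sets.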
