Performs recursive crawling and scraping based on a hierarchical configuration: follows links, extracts fields per level, and returns a cursor to stream large result sets.
Arguments
Name | Type | Description |
tasks | Array of DownloadTask | Required. Initial download tasks (from StartJob) |
crawlMdrConfig | CrawlMdrConfig | Required. Crawl Multi-Dimensional Recursive (MDR) configuration |
DownloadTask
Represents a single page download request produced by a crawl or scrape job.
Fields:
Name | Type | Description |
Id | String | Required. Task Id |
Url | String | Required. Page URL |
CrawlMdrConfig
Hierarchical crawl/scrape plan that defines fields to extract, link selectors, and child levels.
Name | Type | Description | MCP Tools |
Name | String | Required. Name of the level (e.g., ‘/’, ‘products’, etc.) | Set via CrawlMdrConfigCreate, CrawlMdrConfigUpsertSub tools |
ScrapeParams | Array of ScrapeParams | List of data fields to extract | Set via CrawlMdrConfigUpsertScrapeParams tool |
CrawlParams | Array of CrawlParams | List of link selectors for crawling on the current level | Set via CrawlMdrConfigUpsertCrawlParams tool |
SubCrawlMdrConfigs | Array of SubCrawlMdrConfigs | List of sub-levels (child pages/sections), with transition crawl parameters | Set via CrawlMdrConfigUpsertSub tool |
The Selector argument has the following format: CSS|XPATH: selector. The first part defines the selector type; the second part must be a valid selector of that type.
Supported types: CSS, XPATH
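To illustrate the selector format described above, here is a minimal client-side sketch that splits a selector string into its type and expression before sending it. The `parse_selector` helper and `ParsedSelector` type are hypothetical names, not part of the API, and the case-insensitive handling of the type prefix is an assumption.

```python
from typing import NamedTuple


class ParsedSelector(NamedTuple):
    kind: str        # "CSS" or "XPATH"
    expression: str  # selector text written in that selector language


def parse_selector(raw: str) -> ParsedSelector:
    """Split 'TYPE: selector' on the first colon and validate the type.

    Only the first colon is treated as the separator, so CSS pseudo-classes
    and XPath axes (which contain colons) stay inside the expression.
    """
    kind, sep, expression = raw.partition(":")
    kind = kind.strip().upper()  # assumption: type prefix is case-insensitive
    expression = expression.strip()
    if not sep or kind not in ("CSS", "XPATH") or not expression:
        raise ValueError(f"expected 'CSS: ...' or 'XPATH: ...', got {raw!r}")
    return ParsedSelector(kind, expression)


print(parse_selector("CSS: a.product-link"))
print(parse_selector("XPATH: //div[@id='price']/following-sibling::span"))
```

Validating selectors locally like this can catch a missing type prefix before a crawl job is started.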
ScrapeParams
Name | Type | Description |
FieldName | String | Required. Name of the data field to extract |
Selector | String | Required. Selector that locates the data to extract on a web page |
Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: val |
CrawlParams
Name | Type | Description |
Selector | String | Required. Selector that locates the links to follow on a web page |
Attribute | String | Optional. Attribute name to get data from. Use val to get inner text. Default value: href |
SubCrawlMdrConfigs
A child CrawlMdrConfig that includes transition crawl parameters to reach the sublevel.
Name | Type | Description |
SubCrawlParams | CrawlParams | Required. Transition crawl parameters to move to a sublevel |
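The types above nest into a tree: a root level, its own scrape and crawl parameters, and child levels that each carry the transition parameters used to reach them. The sketch below assembles such a tree as plain Python dictionaries using the field names from the tables. The JSON shape, the helper functions, and the example selectors are illustrative assumptions; in practice the configuration is built through the MCP tools listed in the CrawlMdrConfig table.

```python
import json


def scrape_param(field_name: str, selector: str, attribute: str = "val") -> dict:
    # "val" (inner text) is the documented default for ScrapeParams.Attribute
    return {"FieldName": field_name, "Selector": selector, "Attribute": attribute}


def crawl_param(selector: str, attribute: str = "href") -> dict:
    # "href" is the documented default for CrawlParams.Attribute
    return {"Selector": selector, "Attribute": attribute}


# Hypothetical two-level plan: a listing page ("/") that links to product pages.
config = {
    "Name": "/",
    "ScrapeParams": [],
    # Links followed on the same level, e.g. pagination of the listing
    "CrawlParams": [crawl_param("CSS: a.next-page")],
    "SubCrawlMdrConfigs": [
        {
            "Name": "products",
            # Transition from the listing page down to each product page
            "SubCrawlParams": crawl_param("CSS: a.product-link"),
            "ScrapeParams": [
                scrape_param("title", "CSS: h1"),
                scrape_param("image", "XPATH: //img[@id='main']", "src"),
            ],
            "CrawlParams": [],
            "SubCrawlMdrConfigs": [],
        }
    ],
}

print(json.dumps(config, indent=2))
```

Keeping the transition selector on the child rather than the parent mirrors the SubCrawlMdrConfigs table: each sublevel records how it is reached.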
Return Type
Returns a CrawlMdrResult
CrawlMdrResult
Name | Type | Description |
FailedDownloadTasks | Array of FailedDownloadTask | Required. List of failed tasks grouped by their parent page URLs |
FailedDownloadTaskCount | Int | Required. Number of failed download tasks |
SuccessfulDownloadTaskCount | Int | Required. Number of successful download tasks |
DataCursor | CrawlMdrDataCursor | Optional. Cursor for fetching batches of scraped data (null if no data) |
FailedDownloadTask
Name | Type | Description |
ParentDownloadTaskUrl | String | Required. Parent page URL |
FailedDownloadTasks | Array of DownloadTask | Required. Failed download tasks |
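Because failed tasks arrive grouped by parent page, a caller that wants to retry them first has to flatten the groups. The following is a minimal sketch under the assumption that the result is available as a dictionary keyed by the field names above; `collect_retry_tasks` and the sample data are hypothetical.

```python
from typing import Dict, List


def collect_retry_tasks(result: Dict) -> List[Dict]:
    """Flatten FailedDownloadTasks groups into one de-duplicated task list."""
    retry: List[Dict] = []
    seen_ids = set()
    for group in result.get("FailedDownloadTasks") or []:
        for task in group.get("FailedDownloadTasks") or []:
            if task["Id"] not in seen_ids:  # skip tasks listed under several parents
                seen_ids.add(task["Id"])
                retry.append({"Id": task["Id"], "Url": task["Url"]})
    return retry


# Made-up result for illustration only
sample_result = {
    "FailedDownloadTasks": [
        {
            "ParentDownloadTaskUrl": "https://example.com/",
            "FailedDownloadTasks": [
                {"Id": "t-2", "Url": "https://example.com/p/1"},
                {"Id": "t-3", "Url": "https://example.com/p/2"},
            ],
        },
        {
            "ParentDownloadTaskUrl": "https://example.com/?page=2",
            "FailedDownloadTasks": [{"Id": "t-3", "Url": "https://example.com/p/2"}],
        },
    ],
    "FailedDownloadTaskCount": 2,
    "SuccessfulDownloadTaskCount": 5,
    "DataCursor": None,
}

print(collect_retry_tasks(sample_result))
```

The flattened list has the same Id/Url shape as DownloadTask, so it could be passed back as the tasks argument, assuming the service accepts previously failed tasks.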
CrawlMdrDataCursor
Name | Type | Description |
JobId | String | Required. Job Id |
NextCursor | String | Optional. Cursor for fetching the next batch of scraped data (null if done) |
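The cursor fields suggest a simple paging loop: keep requesting batches until NextCursor is null. The sketch below shows that loop only. This section does not name the call that fetches a batch, so `fetch_batch` is a caller-supplied, hypothetical function, and its return shape (a list of rows plus the next cursor) is an assumption.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical signature: (job_id, cursor) -> (rows, next_cursor or None)
FetchBatch = Callable[[str, str], Tuple[List[Dict], Optional[str]]]


def iter_scraped_rows(data_cursor: Optional[Dict], fetch_batch: FetchBatch) -> Iterator[Dict]:
    """Yield scraped rows batch by batch until NextCursor is null."""
    if data_cursor is None:  # DataCursor is null when the job produced no data
        return
    job_id = data_cursor["JobId"]
    cursor = data_cursor.get("NextCursor")
    while cursor is not None:
        rows, cursor = fetch_batch(job_id, cursor)
        yield from rows


# In-memory stand-in for the real batch call, for illustration only
_PAGES = {
    "c1": ([{"title": "A"}, {"title": "B"}], "c2"),
    "c2": ([{"title": "C"}], None),
}


def fake_fetch_batch(job_id: str, cursor: str) -> Tuple[List[Dict], Optional[str]]:
    return _PAGES[cursor]


rows = list(iter_scraped_rows({"JobId": "job-1", "NextCursor": "c1"}, fake_fetch_batch))
print(rows)
```

Writing the loop as a generator keeps only one batch in memory at a time, which suits the large result sets the cursor is meant for.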