Retrieval
Built‑in search (full‑text and vector) that turns everything you crawl into answers.
Retrieval makes your crawled content instantly searchable with natural‑language queries. As WDS discovers pages, it adds them to a full‑text index (Lucene), creates embeddings, and stores them in a vector index. You can then retrieve the most relevant snippets, across a single job or your entire tenant, and plug them straight into RAG workflows.
By default, WDS is configured to use the Gemma embedding model to generate high‑quality vector representations (embeddings) for indexed content.
Crawled pages are enrolled in the indexes automatically only when the crawling job is configured to enable this feature.
Which indexes to use (Full-Text, Vector, or both) is configured at the service level. See the Solidstack and Retriever service configurations for reference.
Query
Job‑scoped Search:
GET /api/retrieval/v1/{jobName}/query
Path Parameters
| Name | Type | Description |
|---|---|---|
| jobName | string | Required. Unique job name that identifies the job in the system. The domain name is often used (e.g., example.com). |
Tenant‑wide Search:
GET /api/retrieval/v1/query
Job‑scoped and Tenant‑wide Search Query Parameters
| Name | Type | Description |
|---|---|---|
| q | string | Required. The natural‑language query to match against indexed content. |
| limit | int | Optional. Maximum number of results to return. Default: 5. |
| threshold | string | Optional. Minimum relevance score (cosine similarity), given as a preset name or a numeric value. Default: same-domain. |
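A minimal sketch of how the two endpoints and their shared query parameters could be combined into a request URL. The host `wds.example.com` and the helper name `build_query_url` are assumptions made for illustration; only the paths and parameter names come from the tables above.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: placeholder host. Substitute the address of your own WDS deployment.
BASE_URL = "https://wds.example.com"


def build_query_url(q: str,
                    job_name: Optional[str] = None,
                    limit: Optional[int] = None,
                    threshold: Optional[str] = None) -> str:
    """Build a job-scoped URL when job_name is given, otherwise a tenant-wide one."""
    if job_name is not None:
        # Percent-encode the job name so it is safe as a single path segment.
        path = f"/api/retrieval/v1/{quote(job_name, safe='')}/query"
    else:
        path = "/api/retrieval/v1/query"

    params = {"q": q}  # q is the only required parameter
    if limit is not None:
        params["limit"] = str(limit)
    if threshold is not None:
        params["threshold"] = threshold
    return f"{BASE_URL}{path}?{urlencode(params)}"


# Job-scoped search, limited to three results.
print(build_query_url("return policy", job_name="example.com", limit=3))
# Tenant-wide search with the service defaults.
print(build_query_url("return policy"))
```

Omitted optional parameters are left out of the URL, so the service applies its documented defaults (limit 5, threshold same-domain).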
Similarity Thresholds
Choose a preset for quick, predictable relevance, or provide a numeric value. Presets map to cosine similarity scores.
| Name | When to use |
|---|---|
| exact-match | The query and result describe essentially the same thing (exact term or strong synonym). |
| same-category | Not identical, but clearly the same family/category and very relevant. |
| same-domain | Topically aligned within the same thematic domain; balanced recall vs. precision. |
| generic-similarity | Broad lexical similarity; maximize recall when you’ll filter results later. |
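Because `threshold` accepts either a preset name or a numeric value, a client may want to validate the value before sending it. The following is a rough sketch under stated assumptions: the helper name `normalize_threshold` is hypothetical, and both the lower-casing of preset names and the way numbers are formatted are assumptions, since the accepted numeric range and format are not specified here.

```python
import math

# Preset names as listed in the Similarity Thresholds table.
PRESETS = ("exact-match", "same-category", "same-domain", "generic-similarity")


def normalize_threshold(value) -> str:
    """Return a string for the `threshold` query parameter, or raise ValueError."""
    if isinstance(value, str):
        candidate = value.strip().lower()
        if candidate in PRESETS:
            return candidate
    try:
        number = float(value)
    except (TypeError, ValueError):
        raise ValueError(f"not a preset or a number: {value!r}") from None
    if not math.isfinite(number):
        # Reject nan and infinity, which float() would otherwise accept.
        raise ValueError(f"threshold must be a finite number: {value!r}")
    return str(number)


print(normalize_threshold("same-domain"))
print(normalize_threshold(0.75))
```

Rejecting unknown strings on the client side catches typos such as a misspelled preset before a request is made.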
Responses
200 (OK)
Returns an array of RetrievalItem objects.
RetrievalItem
| Field | Type | Description |
|---|---|---|
| Span | string | Required. Text span: the matched text with its surrounding semantic context. |
| Score | float | Required. Relevance score. |
| DownloadTasks | array of DownloadTaskInfo | Required. Download tasks whose captured content contains this text span. |
DownloadTaskInfo
| Field | Type | Description |
|---|---|---|
| DownloadTaskId | string | Required. The download task ID where this content was captured. |
| Url | string | Required. Source page URL. |
| CaptureDateUtc | date | Required. Capture timestamp in UTC. |
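The response shape above could be mapped onto typed objects roughly as follows. This sketch assumes the body is JSON, that keys are serialized with the same casing as the field names in the tables, and that `CaptureDateUtc` is an ISO 8601 string; none of these is confirmed here. The sample payload is invented for illustration and is not real service output.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DownloadTaskInfo:
    download_task_id: str
    url: str
    capture_date_utc: datetime


@dataclass
class RetrievalItem:
    span: str
    score: float
    download_tasks: List[DownloadTaskInfo]


def parse_items(body: str) -> List[RetrievalItem]:
    """Parse a 200 (OK) response body into RetrievalItem objects."""
    items = []
    for raw in json.loads(body):
        tasks = [
            DownloadTaskInfo(
                download_task_id=t["DownloadTaskId"],
                url=t["Url"],
                # Assumption: ISO 8601 timestamp; a trailing "Z" is mapped to +00:00.
                capture_date_utc=datetime.fromisoformat(
                    t["CaptureDateUtc"].replace("Z", "+00:00")),
            )
            for t in raw["DownloadTasks"]
        ]
        items.append(RetrievalItem(raw["Span"], float(raw["Score"]), tasks))
    return items


# Invented sample payload, for illustration only.
SAMPLE_BODY = """[
  {"Span": "Items can be returned within 30 days.",
   "Score": 0.82,
   "DownloadTasks": [
     {"DownloadTaskId": "task-1",
      "Url": "https://example.com/returns",
      "CaptureDateUtc": "2024-05-01T12:30:00Z"}
   ]}
]"""

for item in parse_items(SAMPLE_BODY):
    print(item.score, item.span, [t.url for t in item.download_tasks])
```

Since all fields are marked Required, the sketch indexes keys directly and lets a missing key raise `KeyError` rather than silently substituting a default.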
403 (Forbidden)
Access restricted. Refer to the response text for more information.
404 (Not Found)
The specified job was not found.
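One possible way to surface these status codes in client code is to translate them into exceptions. The exception classes and the `check_status` helper below are hypothetical names introduced for illustration; only the status codes and their meanings come from this section.

```python
class RetrievalError(Exception):
    """Base class for errors raised by this illustrative client."""


class AccessRestricted(RetrievalError):
    """Raised for 403 (Forbidden)."""


class JobNotFound(RetrievalError):
    """Raised for 404 (Not Found)."""


def check_status(status: int, text: str = "") -> None:
    """Return silently on 200; raise a matching exception otherwise."""
    if status == 200:
        return
    if status == 403:
        # The response text carries the reason for the restriction.
        raise AccessRestricted(text or "Access restricted")
    if status == 404:
        raise JobNotFound(text or "The specified job was not found")
    # Any status not documented in this section is reported generically.
    raise RetrievalError(f"Unexpected status {status}: {text}")


try:
    check_status(404)
except JobNotFound as exc:
    print("job lookup failed:", exc)
```

Keeping a common base class lets callers catch `RetrievalError` once while still distinguishing a missing job from an access problem when they need to.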