The runner object is available globally in every crawler script. It bridges the browser’s DOM and Crawlstack’s backend infrastructure.
Core Methods
publishItems(items)
Sends extracted data to the database and triggers configured webhooks.
- Parameters: items: RawItem[] (see Types Reference)
- Returns: Promise<BackendCallResult<TItem[]>>
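Because deduplication is keyed on id, the id should be stable across crawls. Below is a minimal sketch, assuming the RawItem shape from the Types Reference. The `Product` type, `toRawItems`, `publishProducts` and the `ItemPublisher` interface are hypothetical names for illustration. BackendCallResult is not documented on this page, so the result is typed as `unknown`.

```typescript
// RawItem as described in the Types Reference.
interface RawItem {
  id: string;                      // unique key used for deduplication
  data: Record<string, unknown>;   // extraction payload (JSON)
  changefreq?: string;             // optional re-crawl hint, e.g. "daily"
}

// Hypothetical record produced by your own extraction code.
interface Product {
  sku: string;
  title: string;
  price: number;
}

// Minimal slice of the runner API used here (hypothetical typing).
interface ItemPublisher {
  publishItems(items: RawItem[]): Promise<unknown>;
}

// Key each item by SKU so the same product gets the same id on every crawl.
// If a SKU appears twice on one page, the last occurrence wins.
function toRawItems(products: Product[]): RawItem[] {
  const bySku = new Map<string, RawItem>();
  for (const p of products) {
    bySku.set(p.sku, {
      id: `product:${p.sku}`,
      data: { title: p.title, price: p.price },
      changefreq: "daily",
    });
  }
  return Array.from(bySku.values());
}

async function publishProducts(runner: ItemPublisher, products: Product[]): Promise<unknown> {
  return runner.publishItems(toRawItems(products));
}
```

In a crawler script you would call `await publishProducts(runner, products)` once per page.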
addTasks(links)
Adds new URLs to the crawl queue.
- Parameters: links: TaskDef[] (see Types Reference)
- Returns: Promise<BackendCallResult<TTask[]>>
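Links scraped from a page are often relative, repeated, or non-HTTP. Here is a sketch of turning them into TaskDef entries before queuing, assuming the TaskDef fields from the Types Reference. `toTaskDefs` and the `category` key in ctx are hypothetical. Relative hrefs are resolved with the standard `URL` constructor, fragments are stripped, and duplicates are dropped.

```typescript
// TaskDef as described in the Types Reference.
interface TaskDef {
  href: string;                       // destination URL (required)
  strategy?: "NAVIGATE" | "REOPEN";   // NAVIGATE is the default
  ctx?: unknown;                      // custom data for the child page's script
  changefreq?: string;                // re-visit frequency hint
}

// Resolve hrefs against the page URL, keep only http(s) links,
// drop #fragments and duplicates, and attach a shared ctx.
function toTaskDefs(hrefs: string[], pageUrl: string, ctx: unknown): TaskDef[] {
  const seen = new Set<string>();
  const defs: TaskDef[] = [];
  for (const raw of hrefs) {
    let url: URL;
    try {
      url = new URL(raw, pageUrl);
    } catch {
      continue; // not a parseable URL
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = "";
    const href = url.toString();
    if (seen.has(href)) continue;
    seen.add(href);
    defs.push({ href, strategy: "NAVIGATE", ctx });
  }
  return defs;
}
```

In a crawler script the result would go straight to `runner.addTasks(toTaskDefs(hrefs, location.href, { category: "mugs" }))`.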
getCurrentTask()
Returns the metadata and custom context for the currently executing task.
- Returns: Promise<TTask>
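Since ctx travels inside the TaskDef, a child page can recover what its parent passed via `taskDef.ctx`. A minimal sketch, assuming the TTask fields listed in the Types Reference. The `category` key and the `MAX_DEPTH` limit are hypothetical choices of the script author, and whether depth starts at 0 or 1 is not documented here.

```typescript
interface TaskDef {
  href: string;
  strategy?: string;
  ctx?: unknown;
  changefreq?: string;
}

// TTask as described in the Types Reference.
interface TTask {
  url: string;
  depth: number;
  status: string;
  taskDef: TaskDef;
}

const MAX_DEPTH = 3; // hypothetical crawl-depth limit

// Read a string "category" from the parent's ctx, if one was passed.
function categoryFromTask(task: TTask): string | undefined {
  const ctx = task.taskDef.ctx;
  if (typeof ctx === "object" && ctx !== null && "category" in ctx) {
    const value = (ctx as { category: unknown }).category;
    return typeof value === "string" ? value : undefined;
  }
  return undefined;
}

// Only enqueue further links while below the depth limit.
function shouldFollowLinks(task: TTask): boolean {
  return task.depth < MAX_DEPTH;
}
```

In a script: `const task = await runner.getCurrentTask();` followed by `categoryFromTask(task)`.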
Human-like Interactions
These methods use the Chrome DevTools Protocol to simulate interactions that are indistinguishable from real user actions.
humanClick(element, button?)
Simulates a trusted mouse click at the element’s exact coordinates.
humanScrollInView(element)
Smoothly scrolls the page until the element is visible.
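Because humanClick targets the element’s on-screen coordinates, it is sensible to scroll the element into view first. A sketch, assuming both methods return promises (their return types are not documented here). The `HumanInput` interface and `scrollThenClick` helper are hypothetical. The element type is generic so the helper can be exercised without a real DOM, and `button` is typed `unknown` because its accepted values are not documented.

```typescript
// Minimal slice of the runner API (hypothetical typing).
interface HumanInput<E> {
  humanScrollInView(element: E): Promise<unknown>;
  humanClick(element: E, button?: unknown): Promise<unknown>;
}

// Bring the element into the viewport, then click it. The click is
// coordinate-based, so an element still off-screen may not receive it.
async function scrollThenClick<E>(runner: HumanInput<E>, element: E): Promise<void> {
  await runner.humanScrollInView(element);
  await runner.humanClick(element);
}
```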
Browser Automation Helpers
sleep(min, max?)
Pauses execution for min milliseconds. If max is provided, it instead sleeps for a random duration between min and max milliseconds.
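The min/max behavior amounts to drawing a uniform random delay. The sketch below reproduces that behavior locally for illustration. It is not Crawlstack’s implementation, and `pickDelay`/`localSleep` are hypothetical names. Whether runner.sleep treats max as inclusive is not stated, so the sketch simply uses whole milliseconds in the inclusive range [min, max]. In a crawler script you would just call `await runner.sleep(800, 2000)`.

```typescript
// Return min when max is omitted; otherwise a whole number of
// milliseconds drawn uniformly from [min, max]. `random` is injectable
// so the bounds can be checked deterministically.
function pickDelay(min: number, max?: number, random: () => number = Math.random): number {
  if (max === undefined || max <= min) return min;
  return min + Math.floor(random() * (max - min + 1));
}

// Resolve after the chosen delay, reporting how long it waited.
function localSleep(min: number, max?: number): Promise<number> {
  const ms = pickDelay(min, max);
  return new Promise((resolve) => setTimeout(() => resolve(ms), ms));
}
```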
waitFor(selectorOrFn, options?)
Waits for an element to appear or a function to return true.
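The function form of waitFor is essentially a poll-until-true loop. Here is a standalone sketch of that pattern for illustration. `pollUntil` is a hypothetical name, and its `timeoutMs` and `intervalMs` parameters are assumptions of this sketch, not waitFor’s documented options.

```typescript
// Re-check `predicate` every intervalMs until it returns true, or
// reject once timeoutMs has elapsed. A throwing predicate rejects immediately.
function pollUntil(
  predicate: () => boolean,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      let done = false;
      try {
        done = predicate();
      } catch (err) {
        reject(err);
        return;
      }
      if (done) {
        resolve();
      } else if (Date.now() >= deadline) {
        reject(new Error(`condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}
```

With the built-in, the equivalent call is `await runner.waitFor(() => document.querySelectorAll(".result").length > 0)`.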
getByTextDeep(text, options?)
Finds an element by text content, even if it’s nested inside Shadow DOM.
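Ordinary queries such as `querySelector` stop at shadow roots, so a deep text search must descend into each element’s `shadowRoot` as well as its regular children. A sketch of that traversal, not Crawlstack’s implementation. It returns the deepest node whose trimmed text exactly equals the target. Nodes are typed structurally through a hypothetical `DeepNode` interface, so a real `Element` fits but so does a plain test object. Closed shadow roots report `shadowRoot` as null and are not reachable this way.

```typescript
// Structural view of a DOM node; a real Element satisfies this shape.
interface DeepNode {
  textContent: string | null;
  children: ArrayLike<DeepNode>;
  shadowRoot?: { children: ArrayLike<DeepNode> } | null;
}

// Depth-first search through open shadow roots and light-DOM children.
// Descendants are checked before the node itself, so the deepest match wins.
function findByTextDeep(root: DeepNode, text: string): DeepNode | null {
  const groups: ArrayLike<DeepNode>[] = [];
  if (root.shadowRoot) groups.push(root.shadowRoot.children);
  groups.push(root.children);
  for (const group of groups) {
    for (let i = 0; i < group.length; i++) {
      const hit = findByTextDeep(group[i], text);
      if (hit) return hit;
    }
  }
  return root.textContent?.trim() === text ? root : null;
}
```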
Network Interception
fetch(url, options?)
A universal, unblockable replacement for the standard fetch() API. It proxies all requests through the background Service Worker, automatically bypassing CORS, CSP, and cross-origin restrictions.
- Internal Files: Handles https://opfs-local.internal/ URLs by streaming directly from browser storage.
- External Network: Handles standard https:// URLs, masking the request origin to avoid detection.
- Stealth: Uses a secure MessageChannel bridge with a random burner-key handshake.
- Memory Efficient: Large files are streamed natively using ReadableStream.
- Returns: Promise<Response>
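Because runner.fetch resolves to a standard `Response`, the usual body readers such as `json()` and `text()` apply. Here is a sketch of a JSON helper that fails loudly on non-2xx statuses instead of parsing an error page. `fetchJson` and the `Fetcher` interface are hypothetical.

```typescript
// Minimal slice of the runner API: same call shape as the standard fetch().
interface Fetcher {
  fetch(url: string, options?: RequestInit): Promise<Response>;
}

// Fetch through the runner and parse JSON, throwing on non-2xx responses.
async function fetchJson<T>(runner: Fetcher, url: string, options?: RequestInit): Promise<T> {
  const res = await runner.fetch(url, options);
  if (!res.ok) {
    const method = options?.method ?? "GET";
    throw new Error(`${method} ${url} failed with HTTP ${res.status}`);
  }
  return (await res.json()) as T;
}
```

Inside a crawler script the call would be `await fetchJson(runner, apiUrl)`.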
getDownloads()
Returns an array of files that have been intercepted during the current task.
- Returns: Promise<{ files: { local_url: string, public_url: string, size: number, mimeType: string, status: 'pending'|'done'|'failed', error?: string }[] }>
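After `const { files } = await runner.getDownloads();`, the entries can be split by status. This sketch assumes the return shape above. `summarizeDownloads` and the `DownloadFile`/`DownloadSummary` types are hypothetical names.

```typescript
// One entry of the `files` array returned by getDownloads().
interface DownloadFile {
  local_url: string;
  public_url: string;
  size: number;
  mimeType: string;
  status: "pending" | "done" | "failed";
  error?: string;
}

interface DownloadSummary {
  done: DownloadFile[];
  pendingCount: number;
  failures: string[];   // "<local_url>: <error>" for each failed file
  totalBytes: number;   // size of completed files only
}

function summarizeDownloads(files: DownloadFile[]): DownloadSummary {
  const summary: DownloadSummary = { done: [], pendingCount: 0, failures: [], totalBytes: 0 };
  for (const f of files) {
    if (f.status === "done") {
      summary.done.push(f);
      summary.totalBytes += f.size;
    } else if (f.status === "pending") {
      summary.pendingCount++;
    } else {
      summary.failures.push(`${f.local_url}: ${f.error ?? "unknown error"}`);
    }
  }
  return summary;
}
```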
clearDownloads()
Clears the list of tracked downloads from the current task’s queue.
Real-time Events
enableWebsockets()
Enables interception of WebSocket messages.
disableWebsockets()
Disables interception of WebSocket messages.
getWebsocketMessages()
Returns a list of WebSocket messages intercepted during the current task.
- Returns: Promise<{ events: WebSocketEvent[] }>
clearWebsocketMessages()
Clears the list of tracked WebSocket messages.
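A typical flow is to call enableWebsockets(), let the page run, pass `(await runner.getWebsocketMessages()).events` to a filter, then call clearWebsocketMessages() so the next read starts fresh. The sketch below covers the filtering step on the documented WebSocketEvent shape. It keeps incoming text frames (opcode 1) and parses the ones that contain JSON. `incomingJson` and its optional `urlFilter` parameter are hypothetical.

```typescript
// WebSocketEvent as described in the Types Reference.
interface WebSocketEvent {
  url: string;
  type: "send" | "receive";
  message: string;
  opcode: number;     // 1 = text, 2 = binary
  timestamp: number;
}

// Keep server-to-client text frames (optionally only from matching URLs)
// and parse those that are valid JSON; other frames, such as plain
// "ping" heartbeats, are skipped.
function incomingJson(events: WebSocketEvent[], urlFilter?: string): unknown[] {
  const out: unknown[] = [];
  for (const e of events) {
    if (e.type !== "receive" || e.opcode !== 1) continue;
    if (urlFilter !== undefined && !e.url.includes(urlFilter)) continue;
    try {
      out.push(JSON.parse(e.message));
    } catch {
      // not JSON; ignore this frame
    }
  }
  return out;
}
```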
enableSse()
Enables interception of Server-Sent Events (SSE).
disableSse()
Disables interception of Server-Sent Events (SSE).
getSseMessages()
Returns a list of SSE messages intercepted during the current task.
- Returns: Promise<{ events: SseEvent[] }>
clearSseMessages()
Clears the list of tracked SSE messages.
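An SSE stream often carries several event types on one connection, so grouping captured messages by eventName is a common first step. Here is a sketch over the documented SseEvent shape. `groupByEventName` is hypothetical. Filing an empty name under "message" follows the SSE convention that unnamed events are dispatched as `message`; whether Crawlstack records such events with an empty name is an assumption.

```typescript
// SseEvent as described in the Types Reference.
interface SseEvent {
  url: string;
  eventName: string;
  eventId: string;
  message: string;
  timestamp: number;
}

// Group message payloads by event name, oldest first. An empty name
// is filed under "message", the default SSE event type.
function groupByEventName(events: SseEvent[]): Map<string, string[]> {
  const sorted = events.slice().sort((a, b) => a.timestamp - b.timestamp);
  const groups = new Map<string, string[]>();
  for (const e of sorted) {
    const key = e.eventName || "message";
    const bucket = groups.get(key);
    if (bucket) bucket.push(e.message);
    else groups.set(key, [e.message]);
  }
  return groups;
}
```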
Types Reference
RawItem
The format for data you want to save.
| Field | Type | Description |
|---|---|---|
| id | string | Required. Unique key for deduplication. |
| data | object | Required. The extraction payload (JSON). |
| changefreq | string | Optional. Hint for re-crawling (always, daily, never, etc.). |
TaskDef
The definition for a new URL to crawl.
| Field | Type | Description |
|---|---|---|
| href | string | Required. The destination URL. |
| strategy | string | NAVIGATE (default) or REOPEN (fresh browser context). |
| ctx | any | Custom data to pass to the script running on the child page. |
| changefreq | string | Frequency for re-visiting this link. |
TTask
The task object returned by getCurrentTask().
| Field | Type | Description |
|---|---|---|
| url | string | The URL being crawled. |
| depth | number | The current crawl depth. |
| status | string | queued, processing, success, failed, etc. |
| taskDef | TaskDef | The original definition used to create this task. |
NetworkRequest
Captured network metadata.
| Field | Type | Description |
|---|---|---|
| url | string | Request URL. |
| method | string | HTTP method (GET, POST, etc.). |
| status | number | HTTP response status code. |
| body() | function | Async function that returns the response body as a string. |
WebSocketEvent
Captured WebSocket frame metadata.
| Field | Type | Description |
|---|---|---|
| url | string | The URL of the WebSocket connection. |
| type | string | send or receive. |
| message | string | The payload data of the frame. |
| opcode | number | The WebSocket opcode (1 = text, 2 = binary). |
| timestamp | number | Browser timestamp of the event. |
SseEvent
Captured Server-Sent Event metadata.
| Field | Type | Description |
|---|---|---|
| url | string | The URL of the event stream. |
| eventName | string | The name of the event. |
| eventId | string | The ID of the event. |
| message | string | The payload data. |
| timestamp | number | Browser timestamp of the event. |