The runner object is available globally in every crawler script. It provides the bridge between the browser’s DOM and Crawlstack’s backend infrastructure.

Core Methods

publishItems(items)

Sends extracted data to the database and triggers configured webhooks.
await runner.publishItems([
  {
    id: "unique-key-123",
    data: { name: "Product", price: "$10" },
    changefreq: "daily"
  }
]);
  • Parameters: items: RawItem[] (See Types Reference)
  • Returns: Promise<BackendCallResult<TItem[]>>
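
As a minimal sketch of preparing a payload for publishItems, the helper below maps scraped records onto the RawItem shape and drops records whose id repeats within the batch. The toRawItems name, the sku field, and the "product-" id prefix are hypothetical; only id, data, and changefreq come from the Types Reference.

```javascript
// Hypothetical helper: convert scraped records into RawItem objects.
// Assumes each record carries a stable `sku` field to derive the id from.
function toRawItems(records, changefreq = "daily") {
  const seen = new Set();
  const items = [];
  for (const record of records) {
    const id = `product-${record.sku}`;
    if (seen.has(id)) continue; // skip duplicates within this batch
    seen.add(id);
    items.push({
      id,
      data: { name: record.name, price: record.price },
      changefreq,
    });
  }
  return items;
}
```

Inside a crawler script the result could be passed straight through, for example await runner.publishItems(toRawItems(records)).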

addTasks(links)

Adds new URLs to the crawl queue.
await runner.addTasks([
  { 
    href: "https://example.com/page/2", 
    strategy: "NAVIGATE",
    ctx: { category: "electronics" } 
  }
]);
  • Parameters: links: TaskDef[] (See Types Reference)
  • Returns: Promise<BackendCallResult<TTask[]>>
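
Links scraped from a page are often relative or repeated. The sketch below shows one way to normalize them into TaskDef objects before queueing; buildTaskDefs is a hypothetical helper, and dropping fragments and non-HTTP(S) links is an assumption about what a crawl usually wants, not documented Crawlstack behavior.

```javascript
// Hypothetical helper: turn raw href values into TaskDef objects.
// Relative links are resolved against baseUrl; duplicates and
// non-HTTP(S) links are dropped.
function buildTaskDefs(hrefs, baseUrl, ctx = {}) {
  const seen = new Set();
  const tasks = [];
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, baseUrl);
    } catch {
      continue; // ignore hrefs that cannot be parsed
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // fragments point at the same page
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    tasks.push({ href: url.href, strategy: "NAVIGATE", ctx });
  }
  return tasks;
}
```

In a crawler script, the hrefs could come from document.querySelectorAll("a[href]") and the result could be handed to runner.addTasks.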

getCurrentTask()

Returns the metadata and custom context for the currently executing task.
const task = await runner.getCurrentTask();
console.log(`Current Depth: ${task.depth}`);
console.log(`Custom Context:`, task.taskDef.ctx);
  • Returns: Promise<TTask>
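
One use of the task metadata is to cap how deep a crawl goes. The following is a sketch only: shouldFollowLinks is a hypothetical helper, and storing a maxDepth value in ctx is an assumed convention rather than a Crawlstack feature.

```javascript
// Hypothetical guard: decide whether to enqueue more links from this page.
// `maxDepth` is an assumed convention stored in the task's custom ctx.
function shouldFollowLinks(task, defaultMaxDepth = 3) {
  const ctx = task.taskDef?.ctx ?? {};
  const maxDepth = Number.isInteger(ctx.maxDepth) ? ctx.maxDepth : defaultMaxDepth;
  return task.depth < maxDepth;
}
```

A script could call shouldFollowLinks(await runner.getCurrentTask()) before calling runner.addTasks.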

Human-like Interactions

These methods use the Chrome DevTools Protocol to dispatch trusted input events, which the page receives in the same way as real user actions.

humanClick(element, button?)

Simulates a trusted mouse click at the element’s exact coordinates.
const btn = document.querySelector("#submit");
await runner.humanClick(btn, "left");

humanScrollInView(element)

Smoothly scrolls the page until the element is visible.
const footerLink = document.querySelector("a.footer");
await runner.humanScrollInView(footerLink);
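
The two methods above can be combined so that an element is scrolled into view before it is clicked. This is a sketch under assumptions: scrollAndClick is a hypothetical helper, the 200-600 ms pause is an arbitrary choice, and runner is passed in as a parameter only so the helper can be exercised outside a crawler script.

```javascript
// Hypothetical helper: scroll to an element, pause briefly, then click it.
// Returns false instead of throwing when the element is missing.
async function scrollAndClick(runner, element, button = "left") {
  if (!element) return false;
  await runner.humanScrollInView(element);
  await runner.sleep(200, 600); // short randomized pause before the click
  await runner.humanClick(element, button);
  return true;
}
```

Checking for a missing element first avoids passing null from document.querySelector into the runner methods.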

Browser Automation Helpers

sleep(min, max?)

Pauses execution for min milliseconds. If max is provided, it instead sleeps for a random duration between min and max milliseconds.
await runner.sleep(1000, 3000); // Sleep for 1-3 seconds

waitFor(selectorOrFn, options?)

Waits for an element to appear or a function to return true.
await runner.waitFor(".success-message", { timeoutMs: 15000 });
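
The function form can be pictured as a polling loop. The sketch below is a stand-alone illustration of that idea, not Crawlstack's implementation: pollUntil and intervalMs are hypothetical names, timeoutMs mirrors the option shown above, and whether the real waitFor rejects or returns on timeout is not stated in this reference.

```javascript
// Illustrative polling loop: resolves true once check() is truthy,
// or false if timeoutMs elapses first.
async function pollUntil(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a page context, the check could be something like () => document.querySelectorAll(".item").length >= 20, where the selector and count are placeholders.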

getByTextDeep(text, options?)

Finds an element by text content, even if it’s nested inside Shadow DOM.
const button = runner.getByTextDeep("Load More", { exact: true });
button?.click();

Network Interception

fetch(url, options?)

A universal, unblockable replacement for the standard fetch() API. It proxies all requests through the background Service Worker, automatically bypassing CORS, CSP, and cross-origin restrictions.
  • Internal Files: Handles https://opfs-local.internal/ URLs by streaming directly from browser storage.
  • External Network: Handles standard https:// URLs, masking the request origin to avoid detection.
  • Stealth: Uses a secure MessageChannel bridge with a random burner key handshake.
  • Memory Efficient: Large files are streamed natively using ReadableStream.
const res = await runner.fetch("https://api.example.com/data");
const json = await res.json();
  • Returns: Promise<Response>
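
Because the method resolves to a Response, the usual status checks apply. The wrapper below is a hedged sketch: fetchJson is a hypothetical helper, runner is passed in as a parameter so it can be tried outside a crawler script, and it assumes the returned Response exposes the standard ok and status fields.

```javascript
// Hypothetical wrapper: fetch JSON through a runner-like object and
// fail loudly on HTTP error statuses instead of parsing an error page.
async function fetchJson(runner, url, options = {}) {
  const res = await runner.fetch(url, options);
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }
  return res.json();
}
```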

getDownloads()

Returns an object whose files array lists the files intercepted during the current task.
const downloads = await runner.getDownloads();
for (const file of downloads.files) {
    console.log(`Local: ${file.local_url}`);   // Internal address for script access
    console.log(`Public: ${file.public_url}`); // Publicly reachable via Relay
    console.log(`Type: ${file.mimeType}`);
    console.log(`Size: ${file.size} bytes`);
}
  • Returns: Promise<{ files: { local_url: string, public_url: string, size: number, mimeType: string, status: 'pending'|'done'|'failed', error?: string }[] }>
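
Since each file carries a status, a script may want to separate finished downloads from pending or failed ones before using them. A minimal sketch, assuming the return shape listed above; summarizeDownloads is a hypothetical helper.

```javascript
// Hypothetical helper: split a getDownloads() result by status and
// total the size of the finished files.
function summarizeDownloads(result) {
  const summary = { done: [], pending: [], failed: [], totalBytes: 0 };
  for (const file of result.files) {
    const bucket = summary[file.status];
    if (Array.isArray(bucket)) bucket.push(file); // ignore unknown statuses
    if (file.status === "done") summary.totalBytes += file.size;
  }
  return summary;
}
```

For example, summarizeDownloads(await runner.getDownloads()).failed would list the files whose error field is worth logging.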

clearDownloads()

Clears the list of tracked downloads from the current task’s queue.
await runner.clearDownloads();

Real-time Events

enableWebsockets()

Enables interception of WebSocket messages.

disableWebsockets()

Disables interception of WebSocket messages.

getWebsocketMessages()

Returns an object whose events array lists the WebSocket messages intercepted during the current task.
const res = await runner.getWebsocketMessages();
for (const event of res.events) {
    console.log(`Socket [${event.type}]: ${event.message}`);
}
  • Returns: Promise<{ events: WebSocketEvent[] }>
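
Many WebSocket feeds carry JSON in text frames. The sketch below filters the captured events down to received text frames and parses them, using the type and opcode fields described under WebSocketEvent in the Types Reference. parseReceivedJson is a hypothetical helper, and the assumption that payloads are JSON will not hold for every site.

```javascript
// Hypothetical helper: keep received text frames and parse those
// whose payload is valid JSON.
function parseReceivedJson(events) {
  const parsed = [];
  for (const event of events) {
    // opcode 1 marks a text frame; skip sent frames and binary frames
    if (event.type !== "receive" || event.opcode !== 1) continue;
    try {
      parsed.push({
        url: event.url,
        timestamp: event.timestamp,
        payload: JSON.parse(event.message),
      });
    } catch {
      // not JSON; skip this frame
    }
  }
  return parsed;
}
```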

clearWebsocketMessages()

Clears the list of tracked WebSocket messages.

enableSse()

Enables interception of Server-Sent Events (SSE).

disableSse()

Disables interception of Server-Sent Events (SSE).

getSseMessages()

Returns an object whose events array lists the SSE messages intercepted during the current task.
const res = await runner.getSseMessages();
for (const event of res.events) {
    console.log(`SSE [${event.eventName}]: ${event.message}`);
}
  • Returns: Promise<{ events: SseEvent[] }>
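
When a stream mixes several event types, grouping the captured messages by eventName makes them easier to process. A minimal sketch: groupSseByEvent is a hypothetical helper, and falling back to "message" for events without a name is an assumption that follows the browser's default SSE event type, not documented Crawlstack behavior.

```javascript
// Hypothetical helper: group captured SSE payloads by event name.
function groupSseByEvent(events) {
  const groups = new Map();
  for (const event of events) {
    // Assumption: unnamed events are treated as the default "message" type
    const name = event.eventName || "message";
    if (!groups.has(name)) groups.set(name, []);
    groups.get(name).push(event.message);
  }
  return groups;
}
```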

clearSseMessages()

Clears the list of tracked SSE messages.

Types Reference

RawItem

The format for data you want to save.
| Field | Type | Description |
| --- | --- | --- |
| id | string | Required. Unique key for deduplication. |
| data | object | Required. The extraction payload (JSON). |
| changefreq | string | Optional. Hints for re-crawling (always, daily, never, etc). |

TaskDef

The definition for a new URL to crawl.
| Field | Type | Description |
| --- | --- | --- |
| href | string | Required. The destination URL. |
| strategy | string | NAVIGATE (default) or REOPEN (fresh browser context). |
| ctx | any | Custom data to pass to the script running on the child page. |
| changefreq | string | Frequency for re-visiting this link. |

TTask

The task object returned by getCurrentTask().
| Field | Type | Description |
| --- | --- | --- |
| url | string | The URL being crawled. |
| depth | number | The current crawl depth. |
| status | string | queued, processing, success, failed, etc. |
| taskDef | TaskDef | The original definition used to create this task. |

NetworkRequest

Captured network metadata.
| Field | Type | Description |
| --- | --- | --- |
| url | string | Request URL. |
| method | string | HTTP method (GET, POST, etc). |
| status | number | HTTP response status code. |
| body() | function | Async function that returns the response body as a string. |

WebSocketEvent

Captured WebSocket frame metadata.
| Field | Type | Description |
| --- | --- | --- |
| url | string | The URL of the WebSocket connection. |
| type | string | send or receive. |
| message | string | The payload data of the frame. |
| opcode | number | The WebSocket opcode (1=text, 2=binary). |
| timestamp | number | Browser timestamp of the event. |

SseEvent

Captured Server-Sent Event metadata.
| Field | Type | Description |
| --- | --- | --- |
| url | string | The URL of the event stream. |
| eventName | string | The name of the event. |
| eventId | string | The ID of the event. |
| message | string | The payload data. |
| timestamp | number | Browser timestamp of the event. |