Crawlstack provides a built-in mechanism to capture file downloads during your crawl and save them seamlessly to your storage.
How it works
When scraping modern websites, files are typically downloaded in one of two ways:
- Network Downloads: The file is hosted on a server (e.g., clicking a link to https://example.com/report.pdf).
- Client-Side Downloads: The file is generated instantly in the browser’s memory using JavaScript (e.g., exporting a CSV from a data table via a blob: URI).
For client-side files, Crawlstack hooks into the URL.createObjectURL API to extract the data directly from memory, preventing the browser’s default “Save As” popup entirely.
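To illustrate the client-side technique, here is a minimal sketch of wrapping URL.createObjectURL so that every object passed to it is recorded before the blob: URL is returned. It is not Crawlstack's implementation; the names interceptObjectURLs and onCapture are hypothetical, and the target object is passed in as a parameter so the wrapper can be exercised without a browser.

```typescript
// Hypothetical sketch: NOT Crawlstack's actual implementation.
// Any object exposing createObjectURL (in a browser, the global URL) can be wrapped.
interface ObjectURLFactory<T> {
  createObjectURL(obj: T): string;
}

// Replaces target.createObjectURL with a wrapper that reports each object and
// the URL generated for it, and returns a function that restores the original.
function interceptObjectURLs<T>(
  target: ObjectURLFactory<T>,
  onCapture: (url: string, obj: T) => void,
): () => void {
  const original = target.createObjectURL;
  target.createObjectURL = (obj: T): string => {
    // Call the original with its own receiver so native implementations keep working.
    const url = original.call(target, obj);
    onCapture(url, obj);
    return url;
  };
  return () => {
    target.createObjectURL = original;
  };
}
```

In a browser, the global URL object would be passed as the target. Capturing the Blob is only the first step: reading its bytes (for example with blob.arrayBuffer()) and persisting them are separate, and suppressing the “Save As” dialog is not shown here.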
Enabling Downloads
To capture files during a run, call the runner.enableDownloads() method in your script before triggering the download action.
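The one requirement stated here is ordering: enable capture first, then trigger the download. A minimal sketch under assumptions: the DownloadRunner interface and the captureDownload helper are hypothetical, and because this page does not say whether enableDownloads() is asynchronous, the sketch awaits it either way.

```typescript
// Hypothetical minimal shape of the runner; only enableDownloads() is documented here.
// Assumed: it may return nothing or a Promise.
interface DownloadRunner {
  enableDownloads(): void | Promise<void>;
}

// Enables capture, then runs the action that starts the download
// (for example, clicking an "Export CSV" button on the page).
async function captureDownload(
  runner: DownloadRunner,
  triggerDownload: () => void | Promise<void>,
): Promise<void> {
  await runner.enableDownloads(); // must happen before the download starts
  await triggerDownload();
}
```

In a page script, triggerDownload would typically click the site's export control; the selector for that control depends on the site being crawled.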
URL Strategy
Crawlstack provides two URLs for every intercepted file:
- local_url: With the default OPFS storage, points to https://opfs-local.internal/. This is the fastest way to access the file content within your extraction script using runner.fetch(), because it bypasses the network entirely.
- public_url: Points to your configured Relay Server. Use this when you need to share the file link with an external system (e.g., via a webhook).
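A sketch of the two URLs in use. It assumes a hypothetical CapturedFile record carrying local_url and public_url, and assumes runner.fetch() resolves to a fetch-style response with a text() method; how a script obtains the record is not covered on this page, and the webhook payload shape is purely illustrative.

```typescript
// Hypothetical record for one intercepted file; field names follow the docs.
interface CapturedFile {
  local_url: string;
  public_url: string;
}

// Assumed: runner.fetch() behaves like fetch() and resolves to a response with text().
interface FetchingRunner {
  fetch(url: string): Promise<{ text(): Promise<string> }>;
}

// Inside the extraction script, read the content through local_url (the fast path).
async function readCapturedText(runner: FetchingRunner, file: CapturedFile): Promise<string> {
  const res = await runner.fetch(file.local_url);
  return res.text();
}

// For an external system, hand over the shareable public_url instead.
// The { fileUrl } payload shape is illustrative only.
function webhookPayload(file: CapturedFile): string {
  return JSON.stringify({ fileUrl: file.public_url });
}
```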
Storage Location
Depending on your configuration, captured files will be saved in one of two places. The file paths will always follow this format: [crawlerId]/[runId]/[filename].
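Because the path format is fixed, a stored path (for example, an object key found when listing a bucket) can be mapped back to its crawler and run. A minimal sketch; the StoragePath type and both helper names are hypothetical, and treating everything after the second slash as the filename is a design choice of this sketch.

```typescript
// Components of a captured file's storage path: [crawlerId]/[runId]/[filename].
interface StoragePath {
  crawlerId: string;
  runId: string;
  filename: string;
}

function buildStoragePath(p: StoragePath): string {
  return `${p.crawlerId}/${p.runId}/${p.filename}`;
}

// Splits on the first two slashes only, so a filename containing "/" is kept whole.
// Returns null when any of the three parts is missing or empty.
function parseStoragePath(key: string): StoragePath | null {
  const first = key.indexOf("/");
  const second = first < 0 ? -1 : key.indexOf("/", first + 1);
  if (first <= 0 || second <= first + 1 || second === key.length - 1) return null;
  return {
    crawlerId: key.slice(0, first),
    runId: key.slice(first + 1, second),
    filename: key.slice(second + 1),
  };
}
```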
OPFS (Local Storage)
By default, files are saved locally inside the browser using the Origin Private File System (OPFS). They are served to your scripts via a high-performance, stealthy streaming bridge.
S3 / R2 / MinIO (Cloud Storage)
If you have configured S3 credentials in the Settings dashboard, Crawlstack will automatically stream the captured files directly to your cloud bucket. In this case, both local_url and public_url will point to the direct S3 link.
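Since local_url differs between the two storage modes, a script that needs to behave differently per backend could tell them apart by its prefix. A minimal sketch, assuming that in OPFS mode local_url always starts with the https://opfs-local.internal/ origin described above; the helper name storageBackend is hypothetical.

```typescript
// Origin used for OPFS-backed local URLs, per the URL Strategy section.
const OPFS_LOCAL_PREFIX = "https://opfs-local.internal/";

type StorageBackend = "opfs" | "cloud";

// With S3 configured, local_url is the direct S3 link rather than the OPFS bridge,
// so the prefix distinguishes the two cases.
function storageBackend(localUrl: string): StorageBackend {
  return localUrl.startsWith(OPFS_LOCAL_PREFIX) ? "opfs" : "cloud";
}
```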
Logging
Whenever a file is successfully intercepted and saved, you will see a log entry in your Run’s backend logs:
[Downloader] Client file saved to https://my-bucket.s3.amazonaws.com/...
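If run logs are exported or monitored elsewhere, saved-file entries can be picked out with a pattern. A minimal sketch assuming lines look exactly like the example above; the regular expression and the savedFileUrl helper are hypothetical, and other download types may be logged with wording not shown in this section.

```typescript
// Matches the log shape shown above: "[Downloader] Client file saved to <url>".
const SAVED_LINE = /^\[Downloader\] Client file saved to (\S+)\s*$/;

// Returns the saved file's URL, or null for unrelated log lines.
function savedFileUrl(line: string): string | null {
  const m = SAVED_LINE.exec(line);
  return m?.[1] ?? null;
}
```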