Crawlstack is designed to be Agent-Native. By connecting Gemini CLI to your Relay Server via the Model Context Protocol (MCP), you can delegate the entire lifecycle of developing, testing, and debugging crawlers to an AI agent.
1. Set Up the Connection
First, ensure your Relay Server is running and accessible. By default, it exposes an MCP endpoint at `http://localhost:3002/mcp`.
Using the CLI (Recommended)
Run the following command in your terminal to automatically configure the server:
Manual Configuration
Alternatively, you can manually add the server to your `~/.gemini/settings.json` file:
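As a rough sketch, an entry of this kind might look like the following. It assumes Gemini CLI's `mcpServers` map with the `httpUrl` key for streamable-HTTP servers; the server name `crawlstack` is an arbitrary label chosen for illustration, and a relay that serves SSE instead would likely need the `url` key.

```json
{
  "mcpServers": {
    "crawlstack": {
      "httpUrl": "http://localhost:3002/mcp"
    }
  }
}
```

If the file already contains an `mcpServers` object, add the new entry alongside the existing ones rather than replacing the object.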
Verify the connection by running `/mcp list` inside Gemini CLI.
2. Available Agent Tools
Once connected, the agent has access to a specialized toolkit for browser automation:
- `list_nodes`: Find connected browser instances.
- `extension_get_cluster_state`: Get a global view of all node states (running tasks, metrics) in a specific tenant cluster.
- `extension_list_crawlers`: See existing crawlers on a specific node.
- `extension_upsert_crawler`: Create or edit extraction scripts.
- `extension_trigger_run`: Execute a crawler.
- `extension_get_run_logs`: Read execution logs to fix bugs.
- `extension_preview_script`: Run a script instantly. Supports `keep_alive: true` to leave the tab open for inspection.
- `extension_take_tab_screenshot`: Capture any tab by ID (useful during previews).
- `extension_close_tab`: Manually clean up kept-alive tabs.
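Gemini CLI issues these calls for you, but it can help to see what one looks like on the wire. The sketch below builds an MCP `tools/call` request in JSON-RPC 2.0 form. The helper name `buildToolCall` and the `node_id` argument are assumptions for illustration; the real argument names come from each tool's input schema.

```javascript
// Build an MCP `tools/call` request (JSON-RPC 2.0) for one of the tools above.
// The argument names passed in `args` are hypothetical; check the tool's
// input schema (as returned by `tools/list`) for the real ones.
function buildToolCall(id, toolName, args = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Example: ask for the crawlers on a node. `node_id` is an assumed parameter name.
const request = buildToolCall(1, "extension_list_crawlers", { node_id: "node-1" });
console.log(JSON.stringify(request, null, 2));
```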
3. Interactive Debugging Pattern
Agents can iterate faster by keeping tabs alive between requests. This allows for a “Playground” workflow:
- Start Preview: Call `extension_preview_script` with `keep_alive: true` and some initial code.
- Inspect: If it fails, call `extension_take_tab_screenshot` with the returned `tabId`.
- Refine: Call `extension_preview_script` again on the same URL (or a different one), without waiting for a full reload if needed.
- Cleanup: Call `extension_close_tab` when finished.
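The loop above can be sketched as a small driver function. This is an illustration under stated assumptions, not Crawlstack's own client: `callTool(name, args)` is a hypothetical helper that sends one MCP tool call and resolves with its result, and the `url`, `code`, and `tabId` argument names and the `ok` result field are guesses. Only `keep_alive` and the returned `tabId` come from the description above.

```javascript
// Sketch of the preview -> inspect -> refine -> cleanup loop.
// `callTool(name, args)` is a hypothetical helper that performs an MCP tool call.
// Argument names other than `keep_alive`, and the `ok` result field, are assumptions.
async function playground(callTool, url, attempts) {
  const openTabs = new Set(); // every kept-alive tab we have seen
  try {
    for (const code of attempts) {
      // Start (or repeat) the preview, keeping the tab open for inspection.
      const result = await callTool("extension_preview_script", {
        url,
        code,
        keep_alive: true,
      });
      if (result.tabId != null) openTabs.add(result.tabId);
      if (result.ok) return result;

      // Inspect: capture the tab so the failure can be diagnosed.
      if (result.tabId != null) {
        await callTool("extension_take_tab_screenshot", { tabId: result.tabId });
      }
    }
    return null; // every attempt failed
  } finally {
    // Cleanup: close kept-alive tabs whether or not a preview succeeded.
    for (const tabId of openTabs) {
      await callTool("extension_close_tab", { tabId });
    }
  }
}
```

Putting the cleanup in `finally` means kept-alive tabs are closed even when a preview call throws, which keeps long agent sessions from accumulating stray tabs.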
4. Advanced Data Extraction
Agents can handle binary files and complex streams without leaving the browser environment:
- XLSX/PDF Parsing: Use `await import()` or script injection to load libraries like SheetJS directly in the tab.
- Local File Access: Use standard `await fetch(url)` to read intercepted downloads from the extension’s protected storage (mapped to `https://opfs-local.internal/`).
- Dynamic Waiting: Always prefer `await runner.waitFor()` over hard sleeps to ensure robustness across different network speeds.
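The first two points can be combined in a short sketch that reads an intercepted XLSX download and returns the rows of its first sheet. It is meant to run inside a crawler script in the tab. The helper name `readFirstSheet`, the pinned SheetJS CDN URL, and the example file name are assumptions; the loader and fetch function are passed as parameters so they can be swapped out.

```javascript
// Sketch: read an intercepted XLSX download and turn its first sheet into rows.
// The SheetJS module URL below is an assumed, pinned CDN build; adjust as needed.
const SHEETJS_URL = "https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs";

async function readFirstSheet(
  fileUrl,
  loadXlsx = () => import(SHEETJS_URL), // dynamic import, evaluated in the tab
  fetchImpl = fetch
) {
  const XLSX = await loadXlsx();
  const response = await fetchImpl(fileUrl);
  if (!response.ok) {
    throw new Error(`Download not readable: ${response.status}`);
  }
  // SheetJS can parse an ArrayBuffer directly.
  const workbook = XLSX.read(await response.arrayBuffer());
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_json(firstSheet); // one object per data row
}

// Usage inside a script (the file name is hypothetical):
// const rows = await readFirstSheet("https://opfs-local.internal/report.xlsx");
```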