
Scrape Do

Scrape.do is a web scraping API offering rotating residential, data-center, and mobile proxies with headless browser support and session management to bypass anti-bot protections (e.g., Cloudflare, Akamai) and extract data at scale in formats like JSON and HTML.

Verdict

Scrape Do gives your team a web scraping API that handles JavaScript rendering, proxy rotation, and anti-bot evasion without managing infrastructure. @mention it in a Space to fetch page content, monitor WebSocket traffic, set custom headers or cookies, and block unwanted resources during scrapes. It's most useful for teams that need to extract data from dynamic sites or competitor pages where simple HTTP requests fail. You'll need an API key from Scrape Do's dashboard, and each request counts against your account's concurrent limit and quota.

Common use cases

  • Extract pricing data from competitor sites
  • Monitor product availability on e-commerce pages
  • Scrape job listings for market research
  • Capture dynamic content loaded by JavaScript
  • Block ads and trackers during data collection

Integration

Vendor: Scrape Do
Category: other
Auth: API_KEY
Tools: 19
Composio slug: scrape_do

Tools

  • Block specific URLs during scraping

    This tool allows users to block specific URLs during the scraping process. It's particularly useful for blocking unwanted resources such as analytics scripts, advertisements, or any other URLs that might interfere with the scraping process or …

  • Get Account Information

    Retrieves account information and usage statistics from Scrape.do. This action makes a GET request to the Scrape.do info endpoint to fetch: subscription status; concurrent request limits and usage; monthly request limits and remaining …

  • Get rendered page content

    This tool scrapes web pages with JavaScript rendering enabled. It's particularly useful for dynamic websites where content is loaded through JavaScript. The tool waits for the JavaScript to execute and returns the …

  • Monitor WebSocket requests using scrape.do

    This tool shows the WebSocket requests made by a webpage. It requires the render=true and returnJSON=true parameters along with showWebsocketRequests=true to enable logging of WebSocket requests.

  • Scrape webpage using scrape.do

    A tool to scrape web pages using Scrape.do's API service. It makes a basic GET request to fetch the content of a target webpage while handling anti-bot protections and proxy rotation automatically.

  • Set Cookies for Scraping

    This tool sets specific cookies for scraping requests to a target website. It is useful for maintaining session state or authentication through cookies.

  • Set custom headers for scrape.do request

    A tool to send custom headers with Scrape.do requests. This allows simulating specific browser behaviors or adding authentication headers by controlling all headers sent to the target website.

  • Set Custom Wait Time

    This tool sets a custom wait time, in milliseconds, after page load when using the render option in Scrape.do. It is particularly useful for dynamic content, ensuring that content is fully loaded before scraping, especially on …

  • Set Device Type for Scraping

    This tool sets the device type (desktop, mobile, or tablet) for scraping requests. It emulates different devices, which helps in testing responsive designs or fetching device-specific content.

  • Set Disable Redirection

    Controls the automatic redirection behavior of Scrape.do requests. When enabled (disableRedirection=true), it stops redirects from being followed automatically during scraping. This allows: inspection of the redirect chain; …

  • Set Pure Cookies Mode

    This tool returns the original Set-Cookie headers from target websites instead of the processed scrape.do-cookies format. When enabled, this parameter returns the original Set-Cookie headers from the target website rather than using …

  • Set Regional Geolocation for Scraping

    This tool sets broader geographic targeting by specifying a region code instead of a specific country code. This is useful when you want to scrape content from an entire region rather than a single country. Note that …

  • Set Retry Timeout

    This tool sets the maximum wait time (in milliseconds) before retrying a failed request in Scrape.do. It requires a 'retry timeout' parameter (integer) that specifies the maximum time to wait before retrying, with a default …

  • Set Scrape.do Super Mode

    The Set Super Mode tool enables enhanced scraping through residential and mobile proxies, bypassing blocks and restrictions associated with datacenter IPs. When the 'super' parameter is set to true, it activates a mode that …

  • Set Screenshot Capture for Scraping

    This tool enables the screenshot feature of the Scrape.do API, capturing a visual representation of the scraped webpage. When enabled, the API returns a screenshot of the rendered page along with the regular …

  • Set Session ID for Sticky Sessions

    This tool implements Scrape.do's session ID feature to keep a sticky session on the same proxy IP across multiple requests. It does this by adding a sessionId parameter to the query parameters of any scraping request, …

  • Set Wait For Selector

    This action sets a CSS selector to wait for before considering the page load complete. It is particularly useful when scraping JavaScript-heavy pages, to ensure that certain dynamically loaded elements are present.

  • Set Wait Until Condition

    This tool sets the waitUntil parameter for the Scrape.do API, defining when rendering should consider the page loaded during JavaScript execution. It is particularly useful for dynamic websites, letting you specify conditions such as …

  • Use Scrape.do Proxy Mode

    This tool implements Scrape.do's proxy mode, which routes requests through their proxy server. It provides an alternative way to access web scraping capabilities, handling complex JavaScript-rendered pages, …
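Most of the tools above set one option on an otherwise ordinary scrape request. The sketch below shows how that might look if you called the API directly instead of through the MCP. It assumes an API-mode endpoint at https://api.scrape.do/ that takes `token` and `url` query parameters, and assumes camelCase spellings (render, customWait, sessionId) for the options. Check every name against Scrape.do's own documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed API-mode endpoint; confirm against Scrape.do's documentation.
API_BASE = "https://api.scrape.do/"


def build_scrape_url(token: str, target_url: str, **options) -> str:
    """Return a Scrape.do request URL with each tool option as a query parameter.

    Option names (render, customWait, sessionId, ...) are assumptions based
    on the tool descriptions, not a verified parameter list.
    """
    params = {"token": token, "url": target_url}
    for name, value in options.items():
        # Query strings are text, so Python booleans become "true"/"false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    # urlencode percent-encodes the target URL so it travels as one value.
    return API_BASE + "?" + urlencode(params)


# Example: rendered scrape, 3-second post-load wait, sticky session.
print(build_scrape_url("YOUR_TOKEN", "https://example.com/products",
                       render=True, customWait=3000, sessionId=1234))
```

Percent-encoding the target URL matters because a target that carries its own query string (`?page=2`) would otherwise be split into separate parameters.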

Setup

Setup guide

  1. Sign up at scrape.do and copy your API key from the account dashboard.
  2. In Switchy, open your workspace settings and navigate to the MCP Integrations tab.
  3. Click 'Add Integration', search for Scrape Do, and select it.
  4. Paste your API key into the authentication field and click 'Connect'.
  5. Switchy will verify the key by fetching your account info and usage stats.
  6. Once connected, open any Space and type '@Scrape Do' followed by your request to invoke a tool.
  7. Test the connection by asking '@Scrape Do fetch the rendered content of example.com'. You should see HTML or JSON output within seconds.
  8. Check your account limits with '@Scrape Do show my account info' to confirm quota and concurrent-request availability before running large scraping jobs.
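Step 8 is worth automating if you also run scripted jobs. The Get Account Information tool reports subscription status, concurrent-request limits and usage, and monthly limits and remaining requests. The sketch below compares those figures with a planned job. The dictionary keys are hypothetical placeholders, because the exact field names of the account-info response aren't documented here; map them to the real fields before use.

```python
from typing import Dict, List

# Hypothetical key names: map these to the fields your account-info
# response actually uses.
REMAINING_KEY = "remaining_monthly_requests"
LIMIT_KEY = "concurrent_limit"
IN_USE_KEY = "concurrent_in_use"


def preflight(info: Dict[str, int], planned_requests: int,
              planned_concurrency: int) -> List[str]:
    """Return problems that would stop a job partway through (empty list = go).

    Checks whose keys are missing from `info` are skipped rather than failed.
    """
    problems = []
    remaining = info.get(REMAINING_KEY)
    if remaining is not None and planned_requests > remaining:
        problems.append(
            f"need {planned_requests} requests, only {remaining} left this month")
    limit = info.get(LIMIT_KEY)
    if limit is not None:
        free = limit - info.get(IN_USE_KEY, 0)
        if planned_concurrency > free:
            problems.append(
                f"want {planned_concurrency} concurrent, only {free} free")
    return problems
```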

What teammates see: by default, memories from Scrape Do are scoped to the Space (PROJECT visibility). You can mark any memory PRIVATE or share it ORG-wide.

Works well with

Top models

Compatibility data appears once enough Spaces have used this MCP together with a given model.

How Switchy teams use it

Not enough Spaces yet to publish anonymised usage stats (we require ≥ 50 Spaces per week).

Starter prompts

Fetch Rendered Page

@Scrape Do get the rendered content of https://example.com and return the full HTML after waiting 3 seconds for dynamic content to load.

Check Account Limits

@Scrape Do show my account information including subscription tier, concurrent request limits, and current usage stats.

Scrape with Custom Headers

@Scrape Do scrape https://api.example.com/data with custom headers User-Agent set to a mobile browser and Authorization set to Bearer token123.

Block Unwanted Resources

@Scrape Do scrape https://news.example.com and block all requests to google-analytics.com, doubleclick.net, and facebook.com during the process.

Monitor WebSocket Traffic

@Scrape Do scrape https://chat.example.com with rendering enabled and log all WebSocket requests made by the page during load.

Example outputs

Illustrative: representative of the model's voice and quality, not literal recordings.

Prompt

@Scrape Do fetch the main content from techcrunch.com/startups and wait for any lazy-loaded articles to appear

Output

I've scraped TechCrunch's startups section with JavaScript rendering enabled. The page returned 12 article headlines with summaries, author bylines, and publication timestamps. The custom wait time ensured that dynamically loaded content below the fold was captured. The response includes the full HTML structure plus extracted text content totaling approximately 8,500 characters. Note that this is an illustrative example—actual content varies by page state and may include additional elements like sidebar widgets or comment sections.

Notes

This example demonstrates Scrape Do's JavaScript rendering capability, essential for modern SPAs and dynamic sites. Requires the 'render=true' parameter and may consume more API credits than static scraping. Be aware of rate limits on your plan tier and ensure your use case complies with the target site's robots.txt and terms of service.

Prompt

@Scrape Do block all Google Analytics and Facebook Pixel scripts while scraping example.com/products, then return just the product listings

Output

I've configured the scrape request to block URLs matching '*google-analytics.com*' and '*facebook.net/*/fbevents.js*' patterns. The scrape completed successfully, returning 24 product cards with titles, prices, and image URLs. By blocking tracking scripts, the page loaded 40% faster and the response payload was reduced by approximately 180KB. The filtered HTML contains only the core product catalog data without third-party analytics overhead. This is an illustrative example—actual blocking patterns and performance gains depend on the target site's architecture.

Notes

This showcases Scrape Do's URL blocking feature, useful for reducing noise and speeding up scrapes. Particularly valuable when you need clean data without tracking pixels or ads. Requires knowing which URL patterns to block—inspect the target site's network activity first. Blocking essential scripts may break dynamic content loading.
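One way to act on the advice to inspect network activity first is to test a block list locally against URLs copied from your browser's network tab. The sketch below assumes shell-style `*` wildcards (Python's `fnmatch`), matching the pattern style in the example output. Scrape Do's own matching rules may differ, so treat this as a preview, not a guarantee of what the API will block.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def matching_pattern(url: str, patterns: List[str]) -> Optional[str]:
    """Return the first shell-style pattern that matches url, or None.

    Both sides are lower-cased because host names are case-insensitive.
    """
    for pattern in patterns:
        if fnmatchcase(url.lower(), pattern.lower()):
            return pattern
    return None


def preview_blocking(captured_urls: List[str],
                     patterns: List[str]) -> Tuple[List[str], List[str]]:
    """Split a captured network log into (would_block, would_allow)."""
    blocked, allowed = [], []
    for url in captured_urls:
        (blocked if matching_pattern(url, patterns) else allowed).append(url)
    return blocked, allowed


patterns = ["*google-analytics.com*", "*facebook.net/*/fbevents.js*"]
log = [
    "https://www.google-analytics.com/analytics.js",
    "https://connect.facebook.net/en_US/fbevents.js",
    "https://example.com/static/products.js",
]
blocked, allowed = preview_blocking(log, patterns)
print("would block:", blocked)
print("would allow:", allowed)
```

If a script the page needs shows up under "would block", narrow the pattern before running the real scrape.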

Prompt

@Scrape Do monitor WebSocket connections on a live sports score site and summarize what real-time data streams are being used

Output

I've captured WebSocket activity from the target sports site with monitoring enabled. The page established 3 WebSocket connections: one to 'wss://scores.example.com/live' pushing score updates every 2 seconds, one to 'wss://chat.example.com' for fan commentary, and one to 'wss://ads.example.com' for ad refresh signals. The JSON response includes message payloads showing score objects with team IDs, current points, and game clock timestamps. This illustrative example demonstrates the kind of real-time data architecture you'd encounter—actual WebSocket schemas vary by site.

Notes

This example highlights Scrape Do's WebSocket monitoring capability, which not every scraping tool offers. It helps you understand how sites push live updates (sports scores, stock tickers, chat). It requires the render=true, returnJSON=true, and showWebsocketRequests=true parameters, which increase API cost. The captured WebSocket data can reveal API endpoints and data structures useful for more efficient scraping strategies.
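Because WebSocket logging only works when three flags are set together, a small guard can catch a request that is missing one of them. This is a minimal sketch; the camelCase spellings are assumptions carried over from the tool descriptions.

```python
from typing import Dict, List

# Assumed spellings; the tool description lists these three as required together.
WEBSOCKET_PARAMS = ("render", "returnJSON", "showWebsocketRequests")


def websocket_monitoring_params(**extra: str) -> Dict[str, str]:
    """Return query parameters with WebSocket logging switched on.

    The three required flags are applied last, so they override any
    conflicting values passed in `extra` (e.g. render="false").
    """
    params = dict(extra)
    params.update({name: "true" for name in WEBSOCKET_PARAMS})
    return params


def missing_websocket_params(params: Dict[str, object]) -> List[str]:
    """List required flags that are absent or not 'true' (empty list means OK)."""
    return [name for name in WEBSOCKET_PARAMS
            if str(params.get(name, "")).lower() != "true"]
```

Overriding rather than rejecting a conflicting `render="false"` is a design choice: WebSocket logging can't work without rendering, so the override reflects what the caller asked for.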

Use-case deep-dives

Competitor pricing scrape for e-commerce

When Scrape Do beats manual price checks for small shops

A 3-person Shopify store tracks 40 competitor SKUs weekly to adjust pricing. Scrape Do's rendered-page tool handles the JavaScript-heavy product pages that break basic scrapers, and the custom wait time ensures lazy-loaded prices actually load before capture. The API key setup takes 10 minutes; the cookie-setting tool handles login walls on wholesale sites. This works until you hit 200+ SKUs or need sub-hourly updates—then the 19-tool overhead and per-request cost make a dedicated scraping service cheaper. If your price check is weekly and under 100 pages, Scrape Do inside Switchy keeps the data in your workspace without exporting CSVs. Set it up once, run it from any agent, and your pricing analyst sees fresh numbers in the same thread as your margin discussion.

Lead gen from directory sites

Scrape Do for one-time directory pulls, not ongoing CRM sync

A 5-person agency needs 500 local business contacts from a regional chamber directory that loads via JavaScript. Scrape Do's render mode and custom header tools bypass the bot detection, and the URL-blocking feature skips the ad trackers that slow the crawl. You get a clean JSON export in 20 minutes. The catch: this is a one-time or monthly pull, not a daily CRM feed. Scrape Do has no native webhook or database write, so you're pasting results into your CRM manually or writing a Zapier bridge. If you're pulling the same directory every week and feeding Salesforce, you need a different tool. If you're doing quarterly research pulls and discussing the leads in Switchy before outreach, Scrape Do is the right weight—no infrastructure, no ongoing cost between pulls.

Monitoring changelog pages for SaaS tools

When Scrape Do tracks vendor updates without RSS feeds

A 6-person product team tracks 12 API vendors' changelog pages for breaking changes. Half the vendors don't publish RSS, and their changelogs are JavaScript-rendered React apps. Scrape Do's rendered scrape plus WebSocket monitoring catches the updates, and the cookie tool handles the gated changelogs behind login. You run it Monday mornings; results land in a Switchy thread where the eng lead triages which changes need tickets. This breaks down if you need sub-daily checks or if the vendor rate-limits aggressively—Scrape Do's proxy rotation helps but isn't magic. For weekly monitoring of a dozen pages, it's faster than building a scraper or paying for a changelog aggregator. The 19-tool count is overkill for simple HTML scrapes, so only use this when JavaScript rendering is actually required.
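To decide which changelogs actually changed since last Monday, one simple approach is to fingerprint each page's text and compare it with the previous run. The sketch assumes you've already extracted the changelog text from each rendered scrape and can persist a small dictionary between runs; neither step is part of Scrape Do itself. Pages with live timestamps or rotating banners will show false positives unless you strip those parts first.

```python
import hashlib
import re
from typing import Dict, List


def fingerprint(text: str) -> str:
    """SHA-256 of the text with whitespace collapsed, so reflowed markup isn't a 'change'."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_vendors(current: Dict[str, str], previous: Dict[str, str]) -> List[str]:
    """Vendors whose changelog differs from last run's fingerprint, or that are new."""
    return sorted(name for name, text in current.items()
                  if previous.get(name) != fingerprint(text))


def snapshot(current: Dict[str, str]) -> Dict[str, str]:
    """Fingerprints to store now and pass as `previous` on the next run."""
    return {name: fingerprint(text) for name, text in current.items()}
```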

Frequently asked

What does the Scrape Do MCP let me do in Switchy?

It lets your team scrape web pages, including JavaScript-heavy sites, without writing proxy rotation or anti-bot logic. You can block specific URLs (like analytics scripts), set custom headers and cookies, monitor WebSocket traffic, and control wait times for dynamic content. All 19 tools route through Scrape Do's API, so you're far less likely to hit the IP bans and CAPTCHA walls that break DIY scrapers.

Do I need a paid Scrape Do account to use this MCP?

Yes. The MCP authenticates with an API key tied to your Scrape Do subscription. Free trials exist, but concurrent-request limits and usage quotas depend on your plan tier. The Get Account Information tool surfaces your current limits inside Switchy, so you know when you're about to hit the cap before a scrape fails mid-job.

Can it scrape sites that require login or session cookies?

Yes, via the Set Cookies for Scraping tool. You pass cookie key-value pairs with your request, and Scrape Do includes them when fetching the target page. This works for session auth, but OAuth flows that redirect to a login page won't work — you'd need to grab the session cookie from a real browser first, then feed it to the MCP.
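If you copy a session cookie from a logged-in browser, you need it as a single string before passing it along. The sketch below assumes the cookie tool accepts standard Cookie-header syntax (`name=value; name2=value2`). The cookie names are made up for illustration, and the percent-encoding step only applies if you build a query string by hand.

```python
from typing import Dict
from urllib.parse import quote


def cookie_string(cookies: Dict[str, str]) -> str:
    """Join cookies in Cookie-header form: 'name=value; name2=value2'."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def encoded_cookie_param(cookies: Dict[str, str]) -> str:
    """Percent-encode the cookie string for a hand-built query string.

    Skip this step if a helper such as urllib.parse.urlencode already
    encodes parameter values, or the cookies will be double-encoded.
    """
    return quote(cookie_string(cookies), safe="")


# Example: hypothetical cookies copied from a browser's dev tools.
session = {"session_id": "abc123", "theme": "dark"}
print(cookie_string(session))
print(encoded_cookie_param(session))
```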

Why use this instead of calling Scrape Do's API directly?

The MCP wraps Scrape Do's endpoints in natural-language tools, so your team can say "scrape this product page and wait 3 seconds for reviews to load" instead of writing curl commands with query-string parameters. It also surfaces account limits and WebSocket logs in the same interface where you're already prompting the AI, cutting context-switching.

Who on my team should connect the Scrape Do MCP?

Whoever owns the Scrape Do API key and understands your scraping budget. Because all requests count against your plan's concurrent-request limit, you want one person monitoring usage via the Get Account Information tool. If multiple teammates trigger scrapes simultaneously, you'll hit the cap faster and queue requests, slowing everyone down.
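Teams that also script scrapes outside Switchy with the same key can stay under the plan's concurrent-request limit on the client side. This is a minimal asyncio sketch: `scrape_one` is a hypothetical coroutine you supply that performs a single request, and `max_concurrent` should come from your account info, minus whatever headroom teammates need.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def scrape_all(urls: Iterable[str],
                     scrape_one: Callable[[str], Awaitable[Any]],
                     max_concurrent: int) -> List[Any]:
    """Run scrape_one(url) for every URL with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(url: str) -> Any:
        # Each call waits here until one of the max_concurrent slots frees up.
        async with semaphore:
            return await scrape_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(limited(u) for u in urls))
```

A semaphore keeps the throttling in one place, so the scraping function itself doesn't need to know about plan limits.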

Data last verified 607 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.