Webscraper io
Webscraper.io is a web scraping tool that makes web data extraction accessible through a cloud-based API.
Common use cases
- Monitor competitor pricing daily
- Extract job postings from career pages
- Track product availability across retailers
- Scrape event listings for market research
- Pull review data from multiple sources
Integration
- Vendor: Webscraper io
- Category: developer-tools
- Auth: API_KEY
- Tools: 10
- Composio slug: webscraper_io
Tools
- Create Sitemap
Tool to create a new sitemap configuration for web scraping. Use when you need to define a new scraping structure with start URLs and selector rules for data extraction from a website.
- Delete Sitemap (destructive)
Tool to permanently delete a sitemap configuration from the Web Scraper Cloud account. Use when you need to remove a sitemap that is no longer needed.
- Disable Sitemap Scheduler
Tool to disable automatic scheduling for a sitemap. Use when you need to stop automated scraping jobs from running on a schedule.
- Enable Sitemap Scheduler
Tool to enable and configure automatic scheduling for sitemap scraping jobs. Use when you need to automate scraping jobs to run at specific times using cron expressions with customizable request intervals, page load delays, driver types, an
- Get Account Info
Tool to retrieve account information including email and page credits. Use when you need to check account details or available credits.
- Get Scraping Jobs
Tool to retrieve all scraping jobs for the account with optional filtering and pagination. Use when you need to list scraping jobs, check job status, or filter jobs by sitemap or tag.
- Get Sitemap
Tool to retrieve a specific sitemap configuration by ID. Use when you need to inspect or reference an existing sitemap's configuration.
- Get Sitemaps
Tool to retrieve all sitemaps for the authenticated account with pagination support. Use when you need to list available sitemaps or filter them by tag. Supports optional pagination via page parameter and filtering by tag name.
- Get Sitemap Scheduler
Tool to retrieve scheduler configuration for a sitemap. Use when you need to check scheduling settings including cron configuration and proxy settings.
- Update Sitemap
Tool to update an existing sitemap configuration including structure, URLs, and selectors. Use when you need to modify sitemap settings.
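Get Sitemaps and Get Scraping Jobs are described above as paginated. A minimal sketch of walking every page might look like the following. Here `fetch_page` is a hypothetical callable standing in for whatever performs the tool call, and the `data` and `last_page` response keys are assumptions for illustration, not a documented contract.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every record across pages, starting at page 1.

    Assumes each response looks like {"data": [...], "last_page": N}.
    These key names are illustrative and may differ in practice.
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response.get("data", [])
        # If "last_page" is missing, treat the current page as the last one.
        if page >= response.get("last_page", page):
            break
        page += 1


# Stand-in for a real call: two pages of fake sitemap records.
def fake_fetch(page: int) -> dict:
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return {"data": pages[page], "last_page": 2}


all_sitemaps = list(iter_all(fake_fetch))
```

Injecting the fetch function keeps the loop testable without a live account.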
Setup
Setup guide
1. Open your Switchy workspace and navigate to Settings > Integrations.
2. Find Webscraper.io in the MCP directory and click Connect.
3. Log in to your Webscraper.io account, go to API settings, and copy your API key.
4. Paste the key into Switchy's auth prompt and click Authorize.
5. Open any Space and type '@Webscraper.io get account info' to confirm the connection. You should see your email and remaining page credits.
6. To test scraping, create a sitemap with '@Webscraper.io create sitemap' (provide a start URL and selector rules), then launch a job and retrieve results. The MCP will prompt you for required parameters at each step.
What teammates see: by default, memories from Webscraper io are scoped to the Space (PROJECT visibility) - you can mark any memory PRIVATE or share it ORG-wide.
Works well with
Top models
Compatibility data appears once enough Spaces have used this MCP together with a given model.
How Switchy teams use it
Starter prompts
Check Credit Balance
@Webscraper.io get my account info and tell me how many page credits I have left.
List Active Sitemaps
@Webscraper.io show me all my sitemaps and summarize what each one scrapes.
Create Pricing Sitemap
@Webscraper.io create a sitemap called 'Competitor Pricing' that starts at https://example.com/products and extracts product names and prices from the listing.
Schedule Daily Scrape
@Webscraper.io enable the scheduler for sitemap ID 12345 to run every day at 6 AM UTC.
Review Recent Jobs
@Webscraper.io get all scraping jobs from the last 7 days and flag any that failed or are still running.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
@webscraper-io list all my sitemaps and show me which ones have scheduled jobs enabled
Found 4 sitemaps in your account. Two have active schedules: 'competitor-pricing' (runs daily at 6am UTC) and 'job-listings' (runs every 6 hours). The other two — 'product-catalog' and 'news-archive' — are manual-only. Your account shows 12,450 page credits remaining. This is an illustrative example of what the MCP would return after querying your Web Scraper Cloud account.
This example demonstrates read-only account inspection using the Get Sitemaps and Get Account Info tools. Requires a valid API key with read permissions. Useful for auditing scraping infrastructure before making changes or checking credit balance before launching new jobs.
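Checking the credit balance before launching new jobs comes down to simple arithmetic. The sketch below assumes one page credit is consumed per scraped page. The run size and schedule are made-up planning numbers; only the 12,450 balance comes from the illustrative response above.

```python
def credits_after_plan(
    page_credits: int, pages_per_run: int, runs_per_day: int, days: int
) -> int:
    """Credits left after the planned runs, assuming one credit per page."""
    return page_credits - pages_per_run * runs_per_day * days


# Hypothetical plan: 150 pages per run, twice a day, for 30 days.
remaining = credits_after_plan(12_450, pages_per_run=150, runs_per_day=2, days=30)
can_afford = remaining >= 0
```

A negative result means the plan would exhaust the balance before the period ends.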
@webscraper-io create a new sitemap called 'tech-blogs' that scrapes article titles and publish dates from example.com/blog starting at the homepage
Created sitemap 'tech-blogs' (ID: sm_7k2p9x). Configuration includes start URL at example.com/blog with selectors for article titles (CSS: .post-title) and publish dates (CSS: .post-date). The sitemap is saved but not yet scheduled — you'll need to manually trigger a scraping job or enable scheduling separately. This illustrative response shows what happens after the Create Sitemap tool executes successfully.
This write operation uses the Create Sitemap tool to define scraping structure. The MCP creates the configuration but does NOT automatically run a scrape or enable scheduling — those are separate actions. Be aware that creating sitemaps with overly broad selectors can waste page credits on irrelevant data.
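For orientation, the sitemap described in this example could be assembled roughly as follows. This is a sketch, not the tool's confirmed input schema: the field names (`_id`, `startUrl`, `selectors`, `parentSelectors`) follow the JSON layout commonly seen in Web Scraper sitemap exports and are assumed here, and the `https://` scheme on the start URL is added for illustration.

```python
import json


def text_selector(selector_id: str, css: str, multiple: bool = True) -> dict:
    """Build one text selector attached to the page root (assumed layout)."""
    return {
        "id": selector_id,
        "type": "SelectorText",
        "parentSelectors": ["_root"],
        "selector": css,
        "multiple": multiple,
    }


def build_sitemap(name: str, start_url: str, selectors: list[dict]) -> dict:
    """Assemble a sitemap dict and reject duplicate selector ids."""
    ids = [s["id"] for s in selectors]
    if len(ids) != len(set(ids)):
        raise ValueError("selector ids must be unique")
    return {"_id": name, "startUrl": [start_url], "selectors": selectors}


sitemap = build_sitemap(
    "tech-blogs",
    "https://example.com/blog",
    [text_selector("title", ".post-title"), text_selector("date", ".post-date")],
)
sitemap_json = json.dumps(sitemap, indent=2)
```

Keeping selectors this narrow (one class per field) is what avoids spending page credits on irrelevant data.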
@webscraper-io check the status of all scraping jobs from the last 48 hours and summarize any failures with error messages
Reviewed 23 scraping jobs from the past two days. Twenty completed successfully. Three failed: Job #8821 on 'competitor-pricing' hit a 403 Forbidden error (likely IP block), Job #8834 on 'job-listings' timed out after 15 minutes, and Job #8847 on 'news-archive' encountered malformed HTML that broke the selector. All failures occurred on sitemaps without rate-limiting configured. This illustrative summary combines data from Get Scraping Jobs with AI analysis to surface actionable issues.
This synthesis example pairs the Get Scraping Jobs tool with the AI's reasoning to diagnose patterns across multiple jobs. The MCP provides raw job data (status, timestamps, errors); the AI interprets it. Useful for operational monitoring, but note that Web Scraper Cloud's error messages vary in specificity depending on failure type.
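The filtering half of that workflow can be sketched in a few lines. The record fields used here (`id`, `status`, `time_created` as a Unix timestamp) and the status strings are assumptions about what the raw job data might contain, and the sample jobs are fabricated placeholders for illustration only.

```python
from datetime import datetime, timedelta, timezone


def recent_problem_jobs(jobs: list[dict], now: datetime, hours: int = 48) -> list[dict]:
    """Return jobs created within the window that failed or are still running."""
    cutoff = now - timedelta(hours=hours)
    flagged = []
    for job in jobs:
        created = datetime.fromtimestamp(job["time_created"], tz=timezone.utc)
        if created >= cutoff and job["status"] in {"failed", "running"}:
            flagged.append(job)
    return flagged


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)


def hours_ago(h: int) -> int:
    """Unix timestamp for h hours before the reference time."""
    return int((now - timedelta(hours=h)).timestamp())


jobs = [
    {"id": 8821, "status": "failed", "time_created": hours_ago(5)},
    {"id": 8822, "status": "finished", "time_created": hours_ago(6)},
    {"id": 8700, "status": "failed", "time_created": hours_ago(72)},
]
flagged = recent_problem_jobs(jobs, now)
```

Only the first job is flagged: the second finished normally and the third falls outside the 48-hour window.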
Use-case deep-dives
When scheduled scraping beats manual price checks
A 6-person e-commerce team needs to track competitor pricing across 15 product categories twice daily. The scheduler tools (Enable Sitemap Scheduler and Disable Sitemap Scheduler) let you set cron expressions to run scrapes automatically, feeding fresh data into your pricing model without anyone triggering jobs by hand. You define the selector rules once with Create Sitemap, then the jobs run unattended. This works if your competitors' sites are stable and you're scraping under 10k pages per day; beyond that, you'll hit rate limits or need custom proxy rotation. If your pricing decisions happen in near real time (hourly adjustments), this MCP is too slow and you need a streaming solution. But for twice-daily batch updates that feed a dashboard or alert system, the scheduler removes the toil, and API key auth keeps it simple enough for a single ops person to manage.
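As a rough illustration of the cron expressions involved, the sketch below splits a standard five-field expression (minute, hour, day of month, month, day of week) into named parts. The 06:00 and 18:00 run times are an assumed choice for "twice daily", and the dictionary key names are illustrative rather than the scheduler tool's actual parameter names.

```python
def cron_to_fields(expression: str) -> dict:
    """Split a five-field cron expression into named parts."""
    parts = expression.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(parts)}")
    names = ("minute", "hour", "day", "month", "weekday")
    return dict(zip(names, parts))


TWICE_DAILY = "0 6,18 * * *"   # 06:00 and 18:00 every day
DAILY_6AM = "0 6 * * *"        # 06:00 every day
EVERY_6_HOURS = "0 */6 * * *"  # 00:00, 06:00, 12:00, 18:00

schedule = cron_to_fields(TWICE_DAILY)
```

Which timezone those hours refer to depends on how the scheduler is configured, so confirm it before relying on exact run times.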
When scraping directories scales your outbound pipeline
A 3-person sales team at a B2B SaaS startup scrapes industry directories and conference attendee lists to build lead lists. The Create Sitemap tool defines the extraction rules (company name, contact email, job title), and Get Scraping Jobs tracks progress across multiple directories. This MCP is the right call if you're hitting 5-10 directories per quarter and each has a consistent HTML structure—you define the sitemap once, run the job, export the data to your CRM. It falls apart if the directories require login (Webscraper.io doesn't handle auth flows well) or if the HTML changes weekly (you'll spend more time fixing selectors than scraping). For teams running outbound at scale, this replaces 8-10 hours of manual copy-paste per month and keeps your pipeline full without hiring a data analyst.
When batch scraping feeds your content calendar
A 4-person editorial team at a tech publication scrapes 20 industry blogs weekly to identify trending topics and source article ideas. The Get Sitemaps and Get Sitemap tools let you manage multiple scraping configs (one per blog), and the pagination support handles large result sets without timeouts. This works if you're aggregating text content (headlines, summaries, publish dates) from static sites—Webscraper.io's selectors handle most blog CMSs cleanly. It's not the right tool if you need full-text extraction from paywalled sites or if the blogs are JavaScript-heavy (the scraper doesn't execute JS by default). For a weekly research sprint where you're pulling 500-1000 articles into a spreadsheet to review in a Monday meeting, this MCP delivers the raw material in 10 minutes instead of 3 hours of manual browsing.
Frequently asked
What does the Webscraper.io MCP let me do in Switchy?
It connects your Webscraper.io Cloud account so you can create sitemaps, launch scraping jobs, and retrieve scraped data directly from Switchy's chat interface. You can check your page credit balance, schedule recurring scrapes with cron expressions, and manage existing sitemaps without switching to the Webscraper dashboard. Useful when you're building workflows that need fresh web data on demand.
Do I need a paid Webscraper.io account to use this MCP?
Yes. The MCP requires a Webscraper.io Cloud API key, which is only available on paid plans. Free Webscraper.io users get the browser extension but no API access. You'll paste your API key into Switchy's connection settings once, and the MCP handles authentication for all subsequent requests. No OAuth dance—just the key from your Webscraper account settings.
Can this MCP scrape a site directly, or do I still need to configure sitemaps?
You still configure sitemaps—the MCP doesn't bypass Webscraper's architecture. You use the Create Sitemap tool to define start URLs and CSS selectors, then trigger scraping jobs against that sitemap. The MCP automates the API calls but doesn't replace the sitemap-based scraping model. If you need zero-config scraping, consider a different tool; Webscraper.io trades setup time for precise, repeatable extraction.
How does this compare to running Webscraper.io's browser extension?
The browser extension runs scrapes on your local machine; the MCP talks to Webscraper Cloud, which runs jobs on their servers and bills you in page credits. The MCP is better for scheduled or team workflows because jobs run remotely and results live in your Cloud account. The extension is faster for one-off scrapes where you're iterating on selectors. You can use both—they share the same sitemap definitions.
Who on my team should connect the Webscraper.io MCP?
Whoever owns the Webscraper.io Cloud subscription and has the API key. That person's page credit balance applies to all scraping jobs triggered through Switchy. If multiple people need scraping access, share the Switchy workspace rather than creating separate Webscraper accounts—jobs and credits stay under one billing entity. Only one connection per workspace is needed; everyone in the workspace can use it.