
Firecrawl

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Verdict

Firecrawl turns any public website into clean, AI-ready text or structured data. @mention it in a Space to scrape single pages, crawl entire sites, or extract specific fields from search results, all without wrestling with HTML. Teams use it to pull competitor pricing tables, monitor changelog pages, or feed documentation into an agent's context. The MCP handles rate limits and retries, but you'll need an API key from Firecrawl's dashboard. Jobs that crawl hundreds of pages can take minutes; check status with a follow-up prompt or let the MCP poll automatically.

Common use cases

  • Pull competitor pricing into weekly reports
  • Monitor product changelog for customer updates
  • Scrape job boards for candidate sourcing
  • Extract contact details from directory sites
  • Index help docs for internal search

Integration

Vendor: Firecrawl
Category: other
Auth: API_KEY
Tools: 7
Composio slug: firecrawl

Tools

  • Cancel a crawl job

    Cancels an active or queued web crawl job using its id; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

  • Extract structured data

    Extracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a json `schema` (one must be provided).

  • Get the status of a crawl job

    Retrieves the current status, progress, and details of a web crawl job, using the job id obtained when the crawl was initiated.

  • Map multiple URLs

    Maps a website by discovering urls from a starting base url, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.

  • Scrape URL

    Scrapes a publicly accessible url, optionally performing pre-scrape browser actions or extracting structured json using an llm, to retrieve content in specified formats.

  • Search

    Performs a web search for a query, scrapes content from the top search results using firecrawl, and returns details in specified formats.

  • Start a web crawl

    Initiates a firecrawl web crawl from a given url, applying various filtering and content extraction rules, and polls until the job is complete; ensure the url is accessible and any regex patterns for paths are valid.
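
Several of the tools above poll a job until it finishes. A minimal sketch of that pattern follows, assuming the job reports a `status` field and that the terminal values are named `completed`, `failed`, and `cancelled` (the field name and values are assumptions for illustration, not confirmed by this page). `fetch_status` is a hypothetical callable standing in for whatever retrieves the job status.

```python
import time
from typing import Any, Callable, Dict

# Assumed names for terminal job states; verify against the real API responses.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the job reaches a terminal state.

    Raises TimeoutError if the job is still running when timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. A bounded timeout matters because large crawls can outlast the calling context.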

Setup

Setup guide

  1. Open your Switchy workspace and navigate to Settings > Integrations > MCP Servers.
  2. Click 'Add MCP Server' and select Firecrawl from the directory.
  3. Visit firecrawl.dev, sign up or log in, then copy your API key from the dashboard under Account > API Keys.
  4. Paste the key into Switchy's configuration modal and click 'Connect'.
  5. Open any Space and type '@Firecrawl scrape https://example.com' to test the connection. You should see formatted content appear in seconds.
  6. For ongoing crawls, start a job with '@Firecrawl start a web crawl from https://docs.example.com', then check progress later with '@Firecrawl get the status of crawl job [id]'.
  7. To extract structured data like pricing or contact info, use '@Firecrawl extract structured data from https://example.com/pricing' and describe the fields you need in plain English.

What teammates see: by default, memories from Firecrawl are scoped to the Space (PROJECT visibility); you can mark any memory PRIVATE or share it ORG-wide.

Works well with

Top models

Compatibility data appears once enough Spaces have used this MCP together with a given model.

How Switchy teams use it

Not enough Spaces yet to publish anonymised usage stats (we require ≥ 50 Spaces per week).

Starter prompts

Scrape Competitor Homepage

@Firecrawl scrape https://competitor.com and return the main content in markdown format, excluding navigation and footer elements.

Crawl Documentation Site

@Firecrawl start a web crawl from https://docs.example.com, limit to 100 pages, and include only URLs under the /docs path.

Extract Pricing Table

@Firecrawl extract structured data from https://example.com/pricing using this schema: plan name, monthly price, annual price, feature list.

Search and Scrape Results

@Firecrawl search for 'best project management tools 2024' and scrape the top 5 results, returning title, URL, and a one-paragraph summary of each.

Check Crawl Job Status

@Firecrawl get the status of crawl job [paste-job-id-here] and summarize how many pages have been processed and how many remain.

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

@Firecrawl search for 'best practices for API rate limiting' and return the top 3 results with markdown content

Output

Firecrawl performed a web search and scraped the top 3 results. The response includes: (1) A Cloudflare blog post discussing token bucket algorithms and sliding window approaches, (2) A Stripe engineering article on adaptive rate limiting with real-world implementation details, and (3) An AWS architecture guide covering throttling strategies for serverless APIs. Each result contains the page title, URL, and full markdown-formatted content extracted from the page body.

Notes

This example demonstrates Firecrawl's search-then-scrape workflow, which combines discovery with content extraction. Be aware that search results depend on external search engine availability, and scraping all top results can consume significant API credits quickly. Each scraped page counts toward your plan's usage limits.

Prompt

@Firecrawl start a crawl from https://docs.example.com with a limit of 50 pages, excluding any URLs matching '/archive/' or '/deprecated/'

Output

Firecrawl initiated crawl job #crawl_abc123 and began discovering pages from the starting URL. The job completed after 8 minutes, successfully crawling 47 pages (3 were excluded by the regex filters). The response includes metadata for each discovered page: URL, title, content in markdown format, and any structured data extracted. Two pages returned 404 errors and were skipped. The full result set is available for download or can be processed page-by-page.

Notes

Crawling is Firecrawl's most powerful feature for bulk content extraction, but it's asynchronous and can take minutes to hours depending on site size. The tool polls automatically until completion, which may time out in some contexts. Crawl jobs consume credits per page scraped, so set appropriate limits to avoid unexpected usage.
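
To see how exclusion patterns and a page limit interact, here is a small local sketch. It is an illustration only and does not reproduce Firecrawl's own matching logic; the function name `select_urls` and the choice to match patterns against the URL path are assumptions. Compiling the patterns up front also surfaces an invalid regex before any job is started.

```python
import re
from typing import Iterable, List
from urllib.parse import urlparse


def select_urls(urls: Iterable[str], exclude_patterns: List[str], limit: int) -> List[str]:
    """Return up to `limit` URLs whose path matches none of the exclude patterns."""
    # re.compile raises re.error for an invalid pattern, so bad input fails early.
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    selected: List[str] = []
    for url in urls:
        if len(selected) >= limit:
            break
        path = urlparse(url).path
        if any(rx.search(path) for rx in compiled):
            continue  # excluded by one of the patterns
        selected.append(url)
    return selected
```

Excluded URLs do not count toward the limit in this sketch, which is one reasonable reading of "limit of 50 pages"; confirm how the real service counts before relying on it for budgeting.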

Prompt

@Firecrawl extract structured data from https://news.ycombinator.com using this schema: for each post, get the title, points, author, and comment count

Output

Firecrawl extracted structured data from the Hacker News homepage and returned a JSON array of 30 posts. Each entry contains: {"title": "Show HN: Open-source tool for...", "points": 247, "author": "username", "commentCount": 83}. The extraction used Firecrawl's LLM-powered parsing to interpret the page structure and map it to the requested schema, handling variations in post layout automatically. The full dataset is ready for analysis or storage.

Notes

Structured extraction is ideal when you need consistent data shapes from semi-structured HTML. It works best on pages with repeating patterns (lists, tables, feeds). Note that LLM-based extraction adds latency and cost compared to simple scraping, and accuracy depends on page complexity—always validate critical extractions before relying on them in production workflows.
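
One way to validate extractions before relying on them is to check each record against the shape you asked for. The sketch below assumes a JSON-Schema-style description of the four fields from the example above; the schema layout and the helper `validation_errors` are illustrative, not part of Firecrawl.

```python
from typing import Any, Dict, List

# Illustrative schema for one post, mirroring the fields in the example output.
POST_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "points": {"type": "integer"},
        "author": {"type": "string"},
        "commentCount": {"type": "integer"},
    },
    "required": ["title", "points", "author", "commentCount"],
}

_PY_TYPES = {"string": str, "integer": int}


def validation_errors(record: Dict[str, Any], schema: Dict[str, Any] = POST_SCHEMA) -> List[str]:
    """Return a list of problems found in one extracted record (empty if none)."""
    errors: List[str] = []
    for field in schema["required"]:
        if field not in record:
            errors.append(f"missing field: {field}")
    for field, spec in schema["properties"].items():
        if field not in record:
            continue
        value = record[field]
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: expected {spec['type']}")
    return errors
```

A check like this catches the common LLM-extraction failures (a number returned as a string, a field silently dropped) without pulling in a full schema-validation library.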

Use-case deep-dives

Competitor pricing research sprint

When Firecrawl beats manual scraping for market intel

A 5-person growth team needs to track competitor pricing pages across 12 SaaS vendors every quarter. Firecrawl's structured extraction tool lets them define a schema once (product name, tier names, monthly price, annual discount) and run it against all 12 URLs in a single job. The job completes in under two minutes and returns clean JSON they pipe directly into a Google Sheet. This works because the target pages are public, change infrequently, and follow predictable HTML patterns. If competitors gate pricing behind login walls or rely on heavy client-side rendering that does not finish loading, Firecrawl's scraper may miss data and the team needs a different approach. For quarterly cadence and public pages, this MCP turns a half-day manual task into a 10-minute automated run.
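
As a rough sketch of the last step, the extracted JSON can be flattened into CSV rows that a spreadsheet imports directly. The input shape here (a mapping from vendor URL to a `product_name` plus a list of `tiers`) is a hypothetical layout chosen for illustration; the real shape depends on the schema the team defines.

```python
import csv
import io
from typing import Any, Dict

FIELDS = ["vendor_url", "product_name", "tier_name", "monthly_price", "annual_discount"]


def pricing_to_csv(results: Dict[str, Dict[str, Any]]) -> str:
    """Flatten per-vendor pricing results into CSV text, one row per tier."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for vendor_url, page in results.items():
        for tier in page.get("tiers", []):
            writer.writerow(
                {
                    "vendor_url": vendor_url,
                    "product_name": page.get("product_name", ""),
                    "tier_name": tier.get("tier_name", ""),
                    "monthly_price": tier.get("monthly_price", ""),
                    "annual_discount": tier.get("annual_discount", ""),
                }
            )
    return buffer.getvalue()
```

Missing fields become empty cells rather than errors, so a partially extracted page still shows up in the sheet where someone can spot the gap.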

Customer support knowledge base refresh

Crawling your own docs to feed a support bot

A 3-person support team maintains 200+ help articles across Notion and a legacy WordPress site. They want to index all content into a vector database so their AI assistant can answer customer questions with current links. Firecrawl's crawl job starts from the root URL, discovers all subpages via sitemap, and extracts markdown or plain text in one pass. The team runs this weekly via a cron job and uploads the output to Pinecone. This setup works because the docs are public, the sitemap is accurate, and 200 pages fit comfortably under Firecrawl's rate limits. If the knowledge base grows past 2,000 pages or requires authentication, the crawl becomes slow and the team should evaluate a dedicated docs-indexing service. For small-to-mid-size public doc sites, Firecrawl is the fastest path to a refreshed support context.

Lead gen from industry directory scraping

When Firecrawl's search tool replaces manual prospecting

A 2-person sales team at a niche B2B startup needs to build a list of 50 target accounts from a public industry directory. They use Firecrawl's search tool with a query like 'manufacturing companies ISO certified California' and scrape the top 20 results. The structured extraction pulls company name, location, and contact page URL into a CSV they import to HubSpot. This works because the directory pages are public, the search query is specific enough to surface relevant results, and 20 pages stay within a single API call. If the directory requires login, uses aggressive bot detection, or the team needs 500+ leads, Firecrawl will hit rate limits or return incomplete data. For one-time or monthly prospecting runs under 50 targets, this MCP turns a 4-hour manual task into a 15-minute script.

Frequently asked

What does the Firecrawl MCP do in Switchy?

It lets your AI agents scrape web pages, crawl entire sites, and extract structured data without writing code. Agents can search the web via Firecrawl's API, map a domain to discover URLs, or pull specific fields from pages using natural language prompts. All seven tools run server-side, so you're not managing headless browsers or parsing HTML yourself.

Do I need a Firecrawl account to use this MCP?

Yes. You'll need a Firecrawl API key, which means signing up at firecrawl.dev and choosing a plan. The MCP authenticates every request with that key. Free-tier limits apply to crawl depth and monthly scrapes, so check Firecrawl's pricing if your team plans heavy usage.

Can it scrape sites that require login or JavaScript rendering?

Firecrawl handles JavaScript rendering by default, so React or Vue apps work fine. Login-protected content is trickier—Firecrawl supports pre-scrape browser actions, but you'll need to configure cookies or auth headers in the scrape request. If the site blocks bots aggressively, results may be incomplete.

How is this different from just using Firecrawl's API directly?

The MCP wraps Firecrawl's API so your agents can call it conversationally—no curl commands or SDK boilerplate. Crawl jobs poll automatically until complete, and structured extraction happens inline. If you're already scripting Firecrawl in Python, the MCP won't add much. If you want agents to scrape on-demand during a chat, it's faster.
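
For comparison, this is roughly the boilerplate a direct call involves. The sketch only builds the request and does not send it. The base URL, the `/scrape` path, the bearer-token header, and the `formats` field are assumptions based on Firecrawl's public REST API and should be checked against its current API reference.

```python
import json
import urllib.request

# Assumed base URL and version; verify in Firecrawl's API reference.
API_BASE = "https://api.firecrawl.dev/v1"


def build_scrape_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a page as markdown."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a call to `urllib.request.urlopen(request)` followed by JSON decoding and error handling, which is the part the MCP takes off your hands in a chat.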

Who on the team should connect the Firecrawl MCP?

Whoever holds the Firecrawl API key and understands your scraping budget. Crawls can burn through credits quickly if agents start mapping large sites. One person should own the connection, monitor usage in Firecrawl's dashboard, and set guardrails in Switchy so agents don't launch expensive jobs without approval.
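
A guardrail can be as simple as a pre-flight check on the requested page count. The sketch below is a generic illustration, not a Switchy or Firecrawl feature; the class name, the three outcomes, and the default of one credit per page are all assumptions to adjust to your plan.

```python
from dataclasses import dataclass


@dataclass
class CrawlBudget:
    credits_remaining: int
    max_pages_per_job: int
    credits_per_page: int = 1  # assumption; check your plan's actual pricing

    def check(self, requested_pages: int) -> str:
        """Classify a crawl request as 'approved', 'needs_approval', or 'rejected'."""
        if requested_pages <= 0:
            raise ValueError("requested_pages must be positive")
        if requested_pages > self.max_pages_per_job:
            return "needs_approval"  # large job: route to the connection owner
        if requested_pages * self.credits_per_page > self.credits_remaining:
            return "rejected"  # not enough credits left for this job
        return "approved"
```

The size check runs first so that oversized jobs always reach a human, even when the remaining credits would technically cover them.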

Data last verified 607 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.