Firecrawl
Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.
Common use cases
- Pull competitor pricing into weekly reports
- Monitor product changelog for customer updates
- Scrape job boards for candidate sourcing
- Extract contact details from directory sites
- Index help docs for internal search
Integration
- Vendor: Firecrawl
- Category: other
- Auth: API_KEY
- Tools: 7
- Composio slug: firecrawl
Tools
- Cancel a crawl job
Cancels an active or queued web crawl job using its ID; attempting to cancel completed, failed, or previously canceled jobs will not change their state.
- Extract structured data
Extracts structured data from web pages by initiating an extraction job and polling for completion; requires either a natural language `prompt` or a JSON `schema` (at least one must be provided).
- Get the status of a crawl job
Retrieves the current status, progress, and details of a web crawl job, using the job ID obtained when the crawl was initiated.
- Map multiple URLs
Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.
- Scrape URL
Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in specified formats.
- Search
Performs a web search for a query, scrapes content from the top search results using Firecrawl, and returns details in specified formats.
- Start a web crawl
Initiates a Firecrawl web crawl from a given URL, applying various filtering and content extraction rules, and polls until the job is complete; ensure the URL is accessible and any regex patterns for paths are valid.
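Two of the constraints in the tool list can be checked on the client before a job is submitted: an extraction needs a `prompt` or a `schema`, and crawl path patterns must be valid regular expressions. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the payload keys (`urls`, `prompt`, `schema`) are illustrative rather than confirmed tool inputs.

```python
import re
from typing import Any, Dict, List, Optional


def build_extract_request(
    urls: List[str],
    prompt: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an extraction payload. Key names are assumptions, not the confirmed API."""
    if not prompt and not schema:
        raise ValueError("provide a natural language prompt or a JSON schema")
    request: Dict[str, Any] = {"urls": urls}
    if prompt:
        request["prompt"] = prompt
    if schema:
        request["schema"] = schema
    return request


def invalid_path_patterns(patterns: List[str]) -> List[str]:
    """Return the patterns that fail to compile as regular expressions."""
    bad = []
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error:
            bad.append(pattern)
    return bad
```

Running `invalid_path_patterns` over the include and exclude patterns before starting a crawl surfaces a malformed pattern locally instead of after the job has been submitted.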
Setup
Setup guide
1. Open your Switchy workspace and navigate to Settings > Integrations > MCP Servers.
2. Click 'Add MCP Server' and select Firecrawl from the directory.
3. Visit firecrawl.dev, sign up or log in, then copy your API key from the dashboard under Account > API Keys.
4. Paste the key into Switchy's configuration modal and click 'Connect'.
5. Open any Space and type '@Firecrawl scrape https://example.com' to test the connection. You should see formatted content appear in seconds.
6. For ongoing crawls, start a job with '@Firecrawl start a web crawl from https://docs.example.com', then check progress later with '@Firecrawl get the status of crawl job [id]'.
7. To extract structured data like pricing or contact info, use '@Firecrawl extract structured data from https://example.com/pricing' and describe the fields you need in plain English.
What teammates see: by default, memories from Firecrawl are scoped to the Space (PROJECT visibility) - you can mark any memory PRIVATE or share it ORG-wide.
Works well with
Top models
Compatibility data appears once enough Spaces have used this MCP together with a given model.
How Switchy teams use it
Starter prompts
Scrape Competitor Homepage
@Firecrawl scrape https://competitor.com and return the main content in markdown format, excluding navigation and footer elements.
Crawl Documentation Site
@Firecrawl start a web crawl from https://docs.example.com, limit to 100 pages, and include only URLs under the /docs path.
Extract Pricing Table
@Firecrawl extract structured data from https://example.com/pricing using this schema: plan name, monthly price, annual price, feature list.
Search and Scrape Results
@Firecrawl search for 'best project management tools 2024' and scrape the top 5 results, returning title, URL, and a one-paragraph summary of each.
Check Crawl Job Status
@Firecrawl get the status of crawl job [paste-job-id-here] and summarize how many pages have been processed and how many remain.
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
@firecrawl search for 'best practices for API rate limiting' and return the top 3 results with markdown content
Firecrawl performed a web search and scraped the top 3 results. The response includes: (1) A Cloudflare blog post discussing token bucket algorithms and sliding window approaches, (2) A Stripe engineering article on adaptive rate limiting with real-world implementation details, and (3) An AWS architecture guide covering throttling strategies for serverless APIs. Each result contains the page title, URL, and full markdown-formatted content extracted from the page body.
This example demonstrates Firecrawl's search-then-scrape workflow, which combines discovery with content extraction. Be aware that search results depend on external search engine availability, and scraping all top results can consume significant API credits quickly. Each scraped page counts toward your plan's usage limits.
@firecrawl start a crawl from https://docs.example.com with a limit of 50 pages, excluding any URLs matching '/archive/' or '/deprecated/'
Firecrawl initiated crawl job #crawl_abc123 and began discovering pages from the starting URL. The job completed after 8 minutes, successfully crawling 47 pages (3 were excluded by the regex filters). The response includes metadata for each discovered page: URL, title, content in markdown format, and any structured data extracted. Two pages returned 404 errors and were skipped. The full result set is available for download or can be processed page-by-page.
Crawling is Firecrawl's most powerful feature for bulk content extraction, but it's asynchronous and can take minutes to hours depending on site size. The tool polls automatically until completion, which may time out in some contexts. Crawl jobs consume credits per page scraped, so set appropriate limits to avoid unexpected usage.
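The polling behaviour and its timeout risk can be sketched as follows. This is a minimal Python illustration, not the MCP's actual implementation: `get_status` is a hypothetical callable standing in for the status tool, and the terminal status names are assumptions.

```python
import time
from typing import Any, Callable, Dict

# Assumed names for states in which a job will not change further.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_crawl(
    get_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status until the job reaches a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl job {job_id} still running after {timeout_s}s")
        sleep(interval_s)
```

When the timeout fires, the job itself may still be running on the server, so the job ID remains useful for a later status check or a cancel request.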
@firecrawl extract structured data from https://news.ycombinator.com using this schema: for each post, get the title, points, author, and comment count
Firecrawl extracted structured data from the Hacker News homepage and returned a JSON array of 30 posts. Each entry contains: {"title": "Show HN: Open-source tool for...", "points": 247, "author": "username", "commentCount": 83}. The extraction used Firecrawl's LLM-powered parsing to interpret the page structure and map it to the requested schema, handling variations in post layout automatically. The full dataset is ready for analysis or storage.
Structured extraction is ideal when you need consistent data shapes from semi-structured HTML. It works best on pages with repeating patterns (lists, tables, feeds). Note that LLM-based extraction adds latency and cost compared to simple scraping, and accuracy depends on page complexity—always validate critical extractions before relying on them in production workflows.
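One way to validate an extraction before relying on it is to check each record against the expected fields and types. The sketch below uses the field names from the example entry above; the function name and the decision to set aside (rather than repair) bad records are illustrative choices.

```python
from typing import Any, Dict, List, Tuple

# Field names and types taken from the example Hacker News entry.
EXPECTED_FIELDS = {"title": str, "points": int, "author": str, "commentCount": int}


def split_valid(
    records: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate records that match the expected shape from those that do not."""
    valid, rejected = [], []
    for record in records:
        ok = all(
            field in record
            and isinstance(record[field], expected)
            and not isinstance(record[field], bool)  # bool is a subclass of int
            for field, expected in EXPECTED_FIELDS.items()
        )
        (valid if ok else rejected).append(record)
    return valid, rejected
```

A high share of rejected records is a signal that the page layout changed or that the schema description needs tightening.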
Use-case deep-dives
When Firecrawl beats manual scraping for market intel
A 5-person growth team needs to track competitor pricing pages across 12 SaaS vendors every quarter. Firecrawl's structured extraction tool lets them define a schema once (product name, tier names, monthly price, annual discount) and run it against all 12 URLs in a single job. The job completes in under two minutes and returns clean JSON they pipe directly into a Google Sheet. This works because the target pages are public, change infrequently, and follow predictable HTML patterns. If competitors gate pricing behind login walls or use heavy client-side rendering, Firecrawl's scraper will miss data and the team needs a different approach. For quarterly cadence and public pages, this MCP turns a half-day manual task into a 10-minute automated run.
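A schema like the one described could be written as a JSON Schema object and the result flattened into spreadsheet rows. This is a sketch only: the property names (`product_name`, `tiers`, `tier_name`, `monthly_price`, `annual_discount`) and the helper `to_sheet_rows` are hypothetical, chosen to mirror the fields named in the scenario.

```python
from typing import Any, Dict, List

# Hypothetical JSON Schema mirroring the fields named in the scenario.
PRICING_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "tier_name": {"type": "string"},
                    "monthly_price": {"type": "number"},
                    "annual_discount": {"type": "number"},
                },
                "required": ["tier_name", "monthly_price"],
            },
        },
    },
    "required": ["product_name", "tiers"],
}


def to_sheet_rows(vendor_url: str, extracted: Dict[str, Any]) -> List[List[Any]]:
    """Flatten one vendor's extracted pricing into one row per tier."""
    return [
        [
            vendor_url,
            extracted["product_name"],
            tier["tier_name"],
            tier["monthly_price"],
            tier.get("annual_discount"),  # optional in the schema above
        ]
        for tier in extracted["tiers"]
    ]
```

One row per tier keeps quarter-over-quarter comparisons simple, since each vendor and tier pair occupies a single line in the sheet.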
Crawling your own docs to feed a support bot
A 3-person support team maintains 200+ help articles across Notion and a legacy WordPress site. They want to index all content into a vector database so their AI assistant can answer customer questions with current links. Firecrawl's crawl job starts from the root URL, discovers all subpages via sitemap, and extracts markdown or plain text in one pass. The team runs this weekly via a cron job and uploads the output to Pinecone. This setup works because the docs are public, the sitemap is accurate, and 200 pages fit comfortably under Firecrawl's rate limits. If the knowledge base grows past 2,000 pages or requires authentication, the crawl becomes slow and the team should evaluate a dedicated docs-indexing service. For small-to-mid-size public doc sites, Firecrawl is the fastest path to a refreshed support context.
When Firecrawl's search tool replaces manual prospecting
A 2-person sales team at a niche B2B startup needs to build a list of 50 target accounts from a public industry directory. They use Firecrawl's search tool with a query like 'manufacturing companies ISO certified California' and scrape the top 20 results. The structured extraction pulls company name, location, and contact page URL into a CSV they import to HubSpot. This works because the directory pages are public, the search query is specific enough to surface relevant results, and 20 pages stay within a single API call. If the directory requires login, uses aggressive bot detection, or the team needs 500+ leads, Firecrawl will hit rate limits or return incomplete data. For one-time or monthly prospecting runs under 50 targets, this MCP turns a 4-hour manual task into a 15-minute script.
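The last step of that workflow, turning extracted records into a CSV for import, might look like the sketch below. The column names and the dedupe-by-company-name rule are assumptions for illustration; a real CRM import would need its own column mapping.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names mirroring the fields named in the scenario.
FIELDNAMES = ["company_name", "location", "contact_page_url"]


def leads_to_csv(leads: List[Dict[str, str]]) -> str:
    """Serialize lead records to CSV text, skipping blank or repeated company names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    seen = set()
    for lead in leads:
        key = lead.get("company_name", "").strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        writer.writerow(lead)
    return buffer.getvalue()
```

Deduplicating before import matters here because several search results can point at the same company.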
Frequently asked
What does the Firecrawl MCP do in Switchy?
It lets your AI agents scrape web pages, crawl entire sites, and extract structured data without writing code. Agents can search the web via Firecrawl's API, map a domain to discover URLs, or pull specific fields from pages using natural language prompts. All seven tools run server-side, so you're not managing headless browsers or parsing HTML yourself.
Do I need a Firecrawl account to use this MCP?
Yes. You'll need a Firecrawl API key, which means signing up at firecrawl.dev and choosing a plan. The MCP authenticates every request with that key. Free-tier limits apply to crawl depth and monthly scrapes, so check Firecrawl's pricing if your team plans heavy usage.
Can it scrape sites that require login or JavaScript rendering?
Firecrawl handles JavaScript rendering by default, so React or Vue apps work fine. Login-protected content is trickier—Firecrawl supports pre-scrape browser actions, but you'll need to configure cookies or auth headers in the scrape request. If the site blocks bots aggressively, results may be incomplete.
How is this different from just using Firecrawl's API directly?
The MCP wraps Firecrawl's API so your agents can call it conversationally—no curl commands or SDK boilerplate. Crawl jobs poll automatically until complete, and structured extraction happens inline. If you're already scripting Firecrawl in Python, the MCP won't add much. If you want agents to scrape on-demand during a chat, it's faster.
Who on the team should connect the Firecrawl MCP?
Whoever holds the Firecrawl API key and understands your scraping budget. Crawls can burn through credits quickly if agents start mapping large sites. One person should own the connection, monitor usage in Firecrawl's dashboard, and set guardrails in Switchy so agents don't launch expensive jobs without approval.
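A guardrail of this kind reduces to a small budget check run before a crawl is launched. The sketch below is purely illustrative and does not represent a Switchy or Firecrawl feature: `CrawlBudget`, `review_crawl`, and the default of one credit per page are all assumptions to be replaced with the plan's real numbers.

```python
from dataclasses import dataclass


@dataclass
class CrawlBudget:
    remaining_credits: int
    approval_threshold: int  # jobs estimated above this need a human sign-off
    credits_per_page: int = 1  # assumption; check the plan's actual pricing


def review_crawl(page_limit: int, budget: CrawlBudget) -> str:
    """Classify a proposed crawl as 'allow', 'needs_approval', or 'deny'."""
    estimated = page_limit * budget.credits_per_page
    if estimated > budget.remaining_credits:
        return "deny"
    if estimated > budget.approval_threshold:
        return "needs_approval"
    return "allow"
```

Because the estimate is driven by the page limit, a check like this only works when every crawl request carries an explicit limit.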