Replicate

Replicate allows users to run AI models via a cloud API without managing infrastructure.

Verdict

Replicate gives your team on-demand access to thousands of open-source AI models — image generators, speech-to-text engines, video upscalers, LLMs — without managing infrastructure. @mention Replicate inside a Space to run predictions on any public model, upload files for processing, or inspect model schemas before invoking them. Designers can generate concept art mid-conversation; support teams can transcribe call recordings; product managers can prototype features that need specialized models. You'll need an API key from Replicate's dashboard, and predictions consume credits from your Replicate account. Models run asynchronously, so prompts should specify 'wait for completion' if you need immediate results.

Common use cases

  • Generate product mockups from text descriptions
  • Transcribe customer call recordings for analysis
  • Upscale low-res images for presentations
  • Prototype features using specialized AI models
  • Convert speech to text during interviews

Integration

Vendor
Replicate
Category
other
Auth
API_KEY
Tools
8
Composio slug
replicate

Tools

  • Create File

    Tool to create a file by uploading content. Use when you need to upload and store a file for later reference.

  • Create Prediction

    Tool to create a prediction for a given deployment. Use when you need to run model inference with specified inputs. Use 'wait for' to wait until the prediction completes.

  • Get File Details

    Tool to get details of a file by its ID. Use when you need to inspect uploaded file information before further operations.

  • Get Model Details

    Tool to get details of a specific model by owner and name. Use when you need model metadata (schema, URLs) before running predictions.

  • Get Model README

    Tool to get the README content for a model in Markdown format. Use after retrieving model details when you want to view its documentation.

  • List Files

    Tool to list all files created by the user or organization. Use after authenticating to fetch the file list.

  • List Model Collections

    Tool to list all collections of models. Use when you need to retrieve available model collections.

  • List Model Examples

    Tool to list example predictions for a specific model. Use when you want to retrieve author-provided illustrative examples after identifying the model.

Setup

Setup guide

  1. Open your Switchy workspace settings and navigate to the MCP integrations panel.
  2. Click 'Add Integration' and select Replicate from the list.
  3. Log into your Replicate account at replicate.com, navigate to Account Settings, then API Tokens, and generate a new token.
  4. Paste the token into Switchy's connection dialog and click 'Connect'.
  5. Return to any Space and type '@Replicate list model collections' to verify the connection works.
  6. To run a prediction, first use '@Replicate get model details' with an owner and model name to retrieve the input schema, then '@Replicate create prediction' with your deployment and inputs.
  7. For file-based workflows, use '@Replicate create file' to upload content, then reference the returned file ID in subsequent predictions.
  8. Always include 'wait for completion' in your prompt if you need the result immediately, since predictions run asynchronously by default.

What teammates see: by default, memories from Replicate are scoped to the Space (PROJECT visibility) - you can mark any memory PRIVATE or share it ORG-wide.

Works well with

Top models

Compatibility data appears once enough Spaces have used this MCP together with a given model.

How Switchy teams use it

Not enough Spaces yet to publish anonymised usage stats (we require ≥ 50 Spaces per week).

Starter prompts

Generate Concept Art

@Replicate get model details for stability-ai/sdxl, then create a prediction with the prompt 'a minimalist logo for a fintech startup, vector art, clean lines' and wait for completion

Transcribe Audio File

@Replicate create file from the attached audio, then get model details for openai/whisper, create a prediction using the file ID, and wait for the transcription

Upscale Product Photo

@Replicate create file from the attached image, get model details for nightmareai/real-esrgan, then create a prediction to upscale it 4x and wait for the result

Explore Model Options

@Replicate list model collections, then show me examples from the image generation collection so I can pick the right model for our campaign

Inspect Model Schema

@Replicate get model details for meta/llama-2-70b-chat, then show me the readme so I understand the input format and parameters

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

@Replicate what models are available in the image-generation collection?

Output

Based on Replicate's image-generation collection, here are several available models: Stable Diffusion XL (stability-ai/sdxl) for high-quality text-to-image generation, FLUX.1 (black-forest-labs/flux-schnell) for fast image synthesis, and Playground v2.5 (playgroundai/playground-v2.5-1024px-aesthetic) optimized for aesthetic outputs. Each model has different speed/quality trade-offs. SDXL typically takes 3-5 seconds per image, while FLUX can generate in under 2 seconds. All support standard parameters like prompt, negative_prompt, and guidance_scale.

Notes

This example shows collection browsing, useful when you're exploring what's available before committing to a specific model. The MCP returns metadata only—you'll need a separate create-prediction call to actually run inference. Collection contents change as Replicate adds or deprecates models.

Prompt

@Replicate run stability-ai/sdxl with prompt 'a serene mountain lake at sunset' and wait for the result

Output

Prediction created (ID: abc123xyz). Status: starting... processing... succeeded. The model generated a 1024x1024 image in 4.2 seconds. Output URL: https://replicate.delivery/pbxt/abc123.png (expires in 24 hours). The image shows a tranquil alpine lake with orange-pink sunset reflections on calm water, surrounded by pine-covered slopes. Inference cost: $0.0032. Note: Output URLs are temporary—download or reference the file within 24 hours if you need persistent storage.

Notes

This demonstrates the core workflow: creating a prediction and polling until completion. The 'wait for' instruction tells the MCP to block until the model finishes (typically 2-15 seconds for image models). Be aware that predictions consume credits, and output URLs expire. For production use, pipe results to your own storage immediately.
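The create-then-poll pattern described here can be sketched in a few lines. This is only an illustration, not the MCP's actual implementation: `wait_for_prediction` and the injected `fetch_status` callable are hypothetical names, and the status strings assume Replicate's prediction lifecycle (starting, processing, succeeded, failed, canceled).

```python
import time
from typing import Callable

# Terminal states, assuming Replicate's prediction lifecycle.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until a terminal state is seen or the timeout passes.

    fetch_status is a hypothetical callable that returns the current status
    string; in practice it would wrap a "get prediction" request.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"prediction still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage with a canned sequence standing in for real API responses.
statuses = iter(["starting", "processing", "succeeded"])
final_status = wait_for_prediction(lambda: next(statuses), sleep=lambda _: None)
print(final_status)
```

Injecting `sleep` and `fetch_status` keeps the loop testable without network access; a timeout guards against predictions that never reach a terminal state.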

Prompt

@Replicate upload this product photo as a file, then run a background removal model on it and show me the result

Output

File uploaded successfully (ID: file_789def, 2.4 MB). Now running cjwbw/rembg on your uploaded image... Prediction completed in 1.8 seconds. The background removal model isolated the product cleanly, outputting a PNG with transparency. Original dimensions preserved (1920x1080). The model detected the main subject (a coffee mug) and removed the cluttered desk background. Output: https://replicate.delivery/pbxt/def456.png. This workflow is useful for e-commerce product prep—you can chain multiple models (upscaling, relighting) by referencing the same file ID.

Notes

This example chains file upload with model inference, showing how to reference uploaded assets across predictions. File IDs persist in your Replicate account until deleted, unlike temporary prediction outputs. Useful for workflows where you process the same input multiple ways or need to reference source material in prompts.

Use-case deep-dives

Design team image generation workflow

When Replicate fits a 3-person design team's asset pipeline

A small design team generating marketing visuals in Slack needs fast iteration on AI-generated images without managing GPU infrastructure. Replicate's MCP is the right call here: the Create Prediction tool runs Stable Diffusion or Flux models on-demand, and Create File uploads reference images for style transfer or inpainting. The team authenticates once with an API key, then any designer can prompt the model from Switchy without touching Replicate's UI. The threshold: if you're running more than 200 predictions per day, you'll want to cache model metadata with Get Model Details to avoid rate limits on the schema endpoint. For teams generating 10-30 assets per week, this setup cuts the feedback loop from minutes to seconds.
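If you do reach the volume where caching model metadata matters, the idea can be sketched as a small time-to-live cache. Everything here is an assumption for illustration: `ModelDetailsCache` and the `fetch` callable are hypothetical, and the one-hour TTL is an arbitrary choice, not a documented Replicate or Switchy setting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ModelDetailsCache:
    """Cache model metadata per (owner, name) for a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str, str], Any],
        ttl_s: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical: wraps a "get model details" call
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, owner: str, name: str) -> Any:
        key = (owner, name)
        now = self._clock()
        cached = self._entries.get(key)
        # Serve from cache while the entry is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]
        details = self._fetch(owner, name)
        self._entries[key] = (now, details)
        return details


# Usage with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(owner: str, name: str) -> dict:
    calls.append((owner, name))
    return {"owner": owner, "name": name}


cache = ModelDetailsCache(fake_fetch)
cache.get("stability-ai", "sdxl")
cache.get("stability-ai", "sdxl")
print(len(calls))  # prints 1: the second lookup is served from the cache
```

Model schemas change rarely, so a TTL measured in hours is usually a reasonable starting point; shorten it if you track models that publish new versions often.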

Customer support chatbot prototyping

Why Replicate works for one-off LLM experiments, not production bots

A 2-person support team wants to test whether a fine-tuned Llama model can answer tier-1 questions before committing to a hosted solution. Replicate's MCP lets them run predictions against community models using Get Model README to vet candidates and List Model Examples to sanity-check outputs. This is a good fit for the prototyping phase: you can test 5 different models in an afternoon without provisioning anything. The break point: once you're running more than 50 queries per day or need sub-200ms latency, Replicate's cold-start overhead (3-8 seconds for most models) makes it too slow for live chat. Use this MCP to validate the approach, then migrate to a dedicated endpoint if the experiment works.

Video content moderation pipeline

When Replicate's video models handle batch moderation at small scale

A 4-person content team reviews 20-40 user-submitted videos per day for a niche community platform. They need to flag NSFW content and detect brand logos before manual review. Replicate's MCP connects to video classification models that run asynchronously: Create Prediction with 'wait for' polls until the job finishes, then the team reviews flagged clips in Switchy. This works because the volume is low enough to keep per-prediction pricing (typically $0.01-0.10 per video) manageable: 20-40 videos per day is roughly 600-1,200 per month, which stays under $50/month as long as the average cost is about $0.04 per video or less. The constraint: if your queue grows past 100 videos per day, you'll hit Replicate's concurrency limits and need to batch predictions yourself. For teams moderating under 1,000 videos monthly, this MCP keeps the workflow in one place without building a custom pipeline.
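A quick way to sanity-check the monthly budget for a pipeline like this is to multiply it out. The sketch below uses the volume and price ranges quoted above; the flat 30-day month and the helper names are assumptions made for illustration, and real per-video cost depends on the model and clip length.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_cost_cents(videos_per_day: int, cents_per_video: int) -> int:
    """Monthly spend in cents for a steady daily volume."""
    return videos_per_day * DAYS_PER_MONTH * cents_per_video


def max_cents_per_video(budget_cents: int, videos_per_day: int) -> float:
    """Highest average per-video price that still fits the monthly budget."""
    return budget_cents / (videos_per_day * DAYS_PER_MONTH)


low = monthly_cost_cents(20, 1)    # 20 videos/day at $0.01 each
high = monthly_cost_cents(40, 10)  # 40 videos/day at $0.10 each
print(f"${low / 100:.2f} to ${high / 100:.2f} per month")

# Break-even price for a $50/month budget at the busiest volume.
ceiling = max_cents_per_video(5000, 40)
print(f"about {ceiling:.1f} cents per video")
```

The spread runs from $6.00 to $120.00 per month, so it is worth measuring your actual per-video cost on a handful of representative clips before committing to a budget.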

Frequently asked

What does the Replicate MCP let me do in Switchy?

It runs AI model predictions directly from your Switchy workspace. You can upload files, trigger inference on Replicate's hosted models, wait for results, and inspect model metadata without leaving the conversation. Think of it as a programmatic interface to Replicate's model library, controlled by natural language.

Do I need a paid Replicate account to use this MCP?

You need a Replicate API key, which requires a Replicate account. Free-tier keys work, but you'll pay Replicate's per-prediction pricing based on the models you run. The MCP itself doesn't add extra cost — it just authenticates your requests using the key you provide in Switchy's settings.

Can the Replicate MCP fine-tune models or only run predictions?

It only runs predictions on existing models. Fine-tuning, model deployment, and training workflows aren't exposed through these tools. If you need to train a custom model, do that in Replicate's dashboard first, then use the MCP to run predictions against your deployed version.

Why use this instead of calling Replicate's API directly?

You skip writing integration code. The MCP handles auth, request formatting, and polling for async predictions. Your team can trigger model runs in plain English, share results in the same thread, and chain predictions with other MCPs. It's faster for ad-hoc tasks; use the API for production pipelines.
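For comparison, here is a rough sketch of the request formatting you would otherwise write yourself. It assumes Replicate's HTTP API exposes a deployment predictions endpoint under https://api.replicate.com/v1, accepts a Bearer token, and honors a `Prefer: wait` header for blocking calls; verify these against Replicate's API reference before relying on them. `build_prediction_request` and the `acme/my-sdxl` deployment are hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"  # assumed base URL


def build_prediction_request(
    token: str,
    owner: str,
    deployment: str,
    inputs: dict,
    wait: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a create-prediction request for a deployment."""
    url = f"{API_BASE}/deployments/{owner}/{deployment}/predictions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Assumed header asking the API to hold the connection until done.
        headers["Prefer"] = "wait"
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_prediction_request(
    token="r8_example_token",  # placeholder, not a real key
    owner="acme",              # hypothetical deployment owner
    deployment="my-sdxl",      # hypothetical deployment name
    inputs={"prompt": "a serene mountain lake at sunset"},
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would
# perform a real call and consume credits.
```

A production pipeline would add error handling, retries, and result storage on top of this, which is the work the MCP absorbs for ad-hoc use.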

Who on the team should connect the Replicate MCP?

Whoever owns the Replicate API key. That person's usage quota and billing apply to all predictions the workspace triggers. If multiple people need access, consider a shared Replicate organization account and rotate the key in Switchy when ownership changes.

Data last verified 607 hours ago. Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.