Ollama
Run large language models locally or in the cloud with Ollama
Verdict
Common use cases
- Draft emails without cloud API calls
- Summarize meeting notes on airgapped networks
- Prototype prompts before production deployment
- Compare responses across local model versions
- Generate code snippets in regulated environments
Integration
- Vendor
- Ollama
- Category
- other
- Auth
- API_KEY
- Tools
- 8
- Composio slug
ollama
Tools
- Chat with Ollama model
Tool to send a chat message with conversation history to Ollama. Use when you need to have a multi-turn conversation with an LLM model.
- Generate Text with Ollama
Tool to generate text responses from Ollama models with optional raw mode. Use raw=true to bypass prompt templating when you need full control over the prompt for debugging or custom processing. Note that raw mode will not return a context.
- Get Ollama Version
Tool to get the version of Ollama running locally. Use to check which version of Ollama is currently installed.
- List Models
Tool to list all available Ollama models and their details. Use when you need to fetch installed models with metadata including name, size, last modified timestamp, digest, and format information.
- List Models (OpenAI Compatible)
Tool to list available models using OpenAI-compatible API format. Use when you need to retrieve locally available Ollama models with metadata following OpenAI's model list format.
- OpenAI-Compatible Chat Completion
Tool to create OpenAI-compatible chat completions using Ollama models. Use when you need conversational AI responses with OpenAI API format compatibility.
- OpenAI-Compatible Text Completion
Tool to create OpenAI-compatible text completions using Ollama models. Use when you need text generation with OpenAI API format compatibility beyond chat-based interactions.
- Show Model Information
Tool to show comprehensive information about an Ollama model. Use when you need to retrieve model details, parameters, template, license, or system prompt.
Setup
Setup guide
- 11. Install Ollama on a local machine or server and pull at least one model (e.g., `ollama pull llama2`). 2. Start the Ollama service so it listens on a network-accessible address if your team works remotely, or keep it localhost if everyone's on the same machine. 3. In Switchy, navigate to Settings > Integrations > Add Integration and select Ollama. 4. Enter the base URL where Ollama is running (default `http://localhost:11434` for local setups, or your server's IP and port). 5. Paste an API key if you've configured authentication on your Ollama instance; leave blank if running without auth. 6. Click Connect and wait for Switchy to verify it can reach the Ollama API. 7. Open any Space, type `@Ollama list available models` and confirm you see your installed models in the response. 8. If the connection fails, check firewall rules, confirm Ollama is running (`ollama serve`), and verify the URL matches your setup.
What teammates see: by default, memories from Ollama are scoped to the Space (PROJECT visibility) - you can mark any memory PRIVATE or share it ORG-wide.
Works well with
Top models
Compatibility data appears once enough Spaces have used this MCP together with a given model.
How Switchy teams use it
Starter prompts
List Installed Models
@Ollama list all available models with their sizes and last modified datesOpen in a Space →
Draft a Project Brief
@Ollama generate a one-page project brief for a customer onboarding redesign, including goals, timeline, and success metricsOpen in a Space →
Explain Code Snippet
@Ollama explain this Python function step-by-step: def merge_sort(arr): if len(arr) <= 1: return arr mid = len(arr) // 2; left = merge_sort(arr[:mid]); right = merge_sort(arr[mid:]); return merge(left, right)Open in a Space →
Check Ollama Version
@Ollama what version of Ollama is currently running on this instanceOpen in a Space →
Multi-Turn Conversation
@Ollama start a conversation about database indexing strategies, then ask follow-up questions about B-tree vs hash indexesOpen in a Space →
Example outputs
Illustrative - representative of the model's voice and quality, not literal recordings.
@ollama list all the models I have installed locally and show me which ones are best for code generation
Based on your local Ollama installation, you have 5 models available: **Code-optimized models:** - codellama:7b (3.8GB, modified 2 days ago) — Best for code completion and explanation - deepseek-coder:6.7b (3.8GB, modified 1 week ago) — Strong at multi-language code generation **General models:** - llama2:7b (3.8GB, modified 3 days ago) - mistral:7b (4.1GB, modified 1 week ago) - phi:2.7b (1.6GB, modified 2 weeks ago) For code generation tasks, I'd recommend codellama:7b or deepseek-coder:6.7b. Both are specifically trained on code and will give you better results than the general-purpose models.
This example demonstrates the MCP's ability to query your local Ollama instance and combine model metadata with reasoning. The MCP requires Ollama to be running locally (typically on localhost:11434). The actual model list depends entirely on what you've pulled via `ollama pull`, so responses will vary by user.
@ollama use the mistral model to generate a technical explanation of how JWT tokens work, keeping it under 100 words
I've generated this explanation using your local mistral:7b model: "JWT (JSON Web Token) is a compact, URL-safe authentication method consisting of three Base64-encoded parts: header (algorithm and token type), payload (claims like user ID and expiration), and signature (cryptographic verification). The server creates a JWT by signing the header and payload with a secret key. Clients include this token in subsequent requests. The server verifies the signature without database lookups, enabling stateless authentication. JWTs expire automatically and can't be revoked server-side without additional infrastructure." Generated in 2.3 seconds using 94 tokens.
This showcases direct text generation against a specific local model. The MCP sends your prompt to Ollama and returns the model's output. Response quality and speed depend on your hardware and which model you select. Larger models (13B+) produce better results but require more RAM and take longer to generate.
@ollama start a conversation with llama2 where I'm debugging a Python error, then ask it to explain why my list comprehension is failing
I've initiated a multi-turn chat session with your local llama2:7b model. Here's the conversation: **You:** I'm getting a TypeError when running this list comprehension: `squares = [x**2 for x in range(10) if x > 5]` **llama2:** That list comprehension syntax looks correct and shouldn't produce a TypeError. Can you share the exact error message? The code should create a list [36, 49, 64, 81] without issues. The TypeError likely comes from surrounding code — perhaps you're trying to use `squares` in an incompatible operation afterward? The conversation context is maintained, so you can continue debugging by sending follow-up messages through the same chat session.
This demonstrates the chat tool's ability to maintain conversation history across multiple turns, useful for iterative problem-solving. The MCP manages the message history locally, but context windows vary by model (typically 2048-4096 tokens for 7B models). Long conversations may truncate early messages, and the session resets if Ollama restarts.
Use-case deep-dives
When Ollama wins for rapid prototyping without API costs
A 3-person startup building a chatbot prototype needs to test 6 different prompt strategies across 4 model sizes before committing to a cloud provider. Ollama is the right call here: the team runs models locally, iterates on prompts with the raw-mode generate tool, and compares outputs from llama3, mistral, and gemma without burning through API credits. The chat-with-history tool lets them simulate multi-turn conversations, and the show-model-information tool surfaces which template each model uses. This works until the prototype needs production scale or the team grows past 5 people—at that point, the local-only constraint becomes a collaboration bottleneck. If you're pre-revenue and testing model behavior daily, Ollama keeps your burn rate low while you find product-market fit.
When local LLMs beat cloud latency for field teams
A 12-person field service company has technicians drafting customer emails from job sites with spotty internet. The support lead sets up Ollama on each laptop with a 7B parameter model and connects it through Switchy. Technicians use the OpenAI-compatible chat completion tool to draft responses to common issues—warranty claims, troubleshooting steps, follow-up scheduling—without waiting for cloud API calls or risking timeouts. The list-models tool lets the support lead verify which version each laptop is running during quarterly updates. This setup breaks down if the team needs real-time knowledge from a central CRM or if response quality demands a 70B+ model that won't fit on laptop hardware. If your team works offline more than 30% of the time and can live with smaller models, Ollama removes the connectivity dependency.
When on-premise inference is the only legal option
A 6-person legal ops team at a healthcare company needs to summarize patient intake forms but can't send PHI to third-party APIs under their BAA. They run Ollama on an on-premise server with a HIPAA-compliant model, then route requests through Switchy's workspace. The generate-text tool processes intake PDFs into structured summaries, and the get-version tool helps IT audit which Ollama build is deployed during quarterly compliance reviews. The team stays under the 8-tool limit because they only need generation and model introspection—no embeddings, no fine-tuning. This works as long as the server can handle the team's request volume and the model quality meets their accuracy bar. If you're in a regulated vertical where data residency isn't negotiable, Ollama is often the only MCP that keeps you compliant without custom infrastructure.
Frequently asked
What does the Ollama MCP do in Switchy?
It connects Switchy to your local Ollama instance so AI agents can generate text, hold multi-turn conversations, and query model metadata without leaving the workspace. You run models on your own hardware; Switchy just routes the requests. Useful when you want privacy or need to use models not available through cloud providers.
Do I need an Ollama account or just the API key?
You need Ollama installed and running locally on your machine or a server your team can reach. The API key authenticates Switchy to that instance. No Ollama cloud account exists — it's self-hosted software. Point the MCP at your instance's URL and paste the key you configured in Ollama's settings.
Can the Ollama MCP fine-tune or train models?
No. It lists installed models, generates text, and retrieves model info, but it doesn't create, delete, or retrain models. You handle model management through Ollama's CLI or web UI. The MCP is read-and-inference only. If you need to pull a new model, do that outside Switchy first.
Why use this instead of calling Ollama's API directly?
Switchy agents can invoke Ollama tools mid-workflow without you writing integration code. The MCP also exposes OpenAI-compatible endpoints, so agents built for GPT can swap in local models with zero prompt changes. You get observability and version control on inference calls that raw API usage doesn't provide.
Who on the team should connect the Ollama MCP?
Whoever runs the Ollama instance and controls its API key. If Ollama lives on a shared dev server, that person adds the connection. Each Switchy workspace can point to one Ollama instance; multiple team members inherit access once it's connected. Usage doesn't count against Switchy seat limits.