Every AI model, in one place.
Pricing, benchmarks, provider latency, and how teams actually use each one.
About LLM models
- New · Z.ai: GLM 5.2
GLM-5.2 is Z.ai’s flagship model for the era of long-horizon tasks. With a truly usable 1M-token context window, it can handle project-level engineering context, execute long-running tasks more reliably, follow...
Language · 262k ctx · $1.40/M
- New · OpenRouter: Fusion
Fusion turns your prompt into a small multi-model deliberation. A panel of expert models (see below) analyzes your prompt in parallel with web search and web fetch enabled, then a...
Language · 1000k ctx · Free tier
- New · MoonshotAI: Kimi K2.7 Code
MoonshotAI: Kimi K2.7 Code is a coding-focused model in Moonshot AI's Kimi K2 family, built to complete end-to-end programming tasks reliably over long contexts. It uses a native multimodal mixture-of-experts...
Language · 262k ctx · $0.74/M
- New · Anthropic: Claude Fable Latest
This model always redirects to the latest model in the Claude Fable family.
Language · 1000k ctx · $10.00/M
- New · Anthropic: Claude Fable 5
Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...
Language · 1000k ctx · $10.00/M
- New · Nex AGI: Nex-N2-Pro (free)
Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...
Language · 262k ctx · $0.00/M
- New · NVIDIA: Nemotron 3.5 Content Safety (free)
NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...
Language · 128k ctx · $0.00/M
- New · NVIDIA: Nemotron 3 Ultra
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
Language · 262k ctx · $0.50/M
- New · NVIDIA: Nemotron 3 Ultra (free)
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
Language · 1000k ctx · $0.00/M
- New · Qwen: Qwen3.7 Plus
Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...
Language · 1000k ctx · $0.32/M
- New · MiniMax: MiniMax M3
MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...
Language · 524k ctx · $0.30/M
- New · StepFun: Step 3.7 Flash
Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...
Language · 256k ctx · $0.20/M
- New · Anthropic: Claude Opus 4.8 (Fast)
Fast-mode variant of [Opus 4.8](/anthropic/claude-opus-4.8) - identical capabilities with higher output speed at 2x pricing relative to regular Opus 4.8. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode
Language · 1000k ctx · $10.00/M
- New · Anthropic: Claude Opus 4.8
Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...
Language · 1000k ctx · $5.00/M
- New · Qwen: Qwen3.7 Max
Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...
Language · 1000k ctx · $1.25/M
- New · xAI: Grok Build 0.1
Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...
Language · 256k ctx · $1.00/M
- New · Google: Gemini 3.5 Flash
Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...
Language · 1049k ctx · $1.50/M
- New · Anthropic: Claude Opus 4.7 (Fast)
Fast-mode variant of [Opus 4.7](/anthropic/claude-opus-4.7) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode
Language · 1000k ctx · $30.00/M
- New · Perceptron: Perceptron Mk1
Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...
Language · 33k ctx · $0.15/M
- New · inclusionAI: Ring-2.6-1T
Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency. It is optimized for coding agents, tool...
Language · 262k ctx · $0.07/M
- New · Google: Gemini 3.1 Flash Lite
Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...
Language · 1049k ctx · $0.25/M
- New · OpenAI: GPT Chat Latest
GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves to the latest Instant chat model used in ChatGPT. As OpenAI rolls out new Instant model updates...
Language · 400k ctx · $5.00/M
- New · xAI: Grok 4.3
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...
Language · 1000k ctx · $1.25/M
- New · IBM: Granite 4.1 8B
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...
Language · 131k ctx · $0.05/M
- New · Mistral: Mistral Medium 3.5
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...
Language · 262k ctx · $1.50/M
- New · Owl Alpha
Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....
Language · 1049k ctx · $0.00/M
- New · NVIDIA: Nemotron 3 Nano Omni (free)
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
Language · 256k ctx · $0.00/M
- New · Poolside: Laguna XS.2 (free)
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...
Language · 262k ctx · $0.00/M
- New · Poolside: Laguna M.1 (free)
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...
Language · 262k ctx · $0.00/M
- New · Anthropic Claude Haiku Latest
This model always redirects to the latest model in the Anthropic Claude Haiku family.
Language · 200k ctx · $1.00/M
- New · OpenAI GPT Mini Latest
This model always redirects to the latest model in the OpenAI GPT Mini family.
Language · 400k ctx · $0.75/M
- New · Google Gemini Pro Latest
This model always redirects to the latest model in the Google Gemini Pro family.
Language · 1049k ctx · $2.00/M
- New · MoonshotAI Kimi Latest
This model always redirects to the latest model in the MoonshotAI Kimi family.
Language · 262k ctx · $0.68/M
- New · Google Gemini Flash Latest
This model always redirects to the latest model in the Google Gemini Flash family.
Language · 1049k ctx · $1.50/M
- New · Anthropic Claude Sonnet Latest
This model always redirects to the latest model in the Anthropic Claude Sonnet family.
Language · 1000k ctx · $3.00/M
- New · OpenAI GPT Latest
This model always redirects to the latest model in the OpenAI GPT family.
Language · 1050k ctx · $5.00/M
- New · Qwen: Qwen3.5 Plus 2026-04-20
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...
Language · 1000k ctx · $0.30/M
- New · Qwen: Qwen3.6 Flash
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...
Language · 1000k ctx · $0.19/M
- New · Qwen: Qwen3.6 35B A3B
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...
Language · 262k ctx · $0.15/M
- New · Qwen: Qwen3.6 Max Preview
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...
Language · 262k ctx · $1.04/M
- New · Qwen: Qwen3.6 27B
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...
Language · 262k ctx · $0.29/M
- New · OpenAI: GPT-5.5 Pro
GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for...
Language · 1050k ctx · $30.00/M
- New · OpenAI: GPT-5.5
GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...
Language · 1050k ctx · $5.00/M
- New · DeepSeek: DeepSeek V4 Pro
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...
Language · 1049k ctx · $0.43/M
- New · DeepSeek: DeepSeek V4 Flash
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...
Language · 1000k ctx · $0.09/M
- New · inclusionAI: Ling-2.6-1T
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...
Language · 262k ctx · $0.07/M
- New · Tencent: Hy3 preview
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...
Language · 262k ctx · $0.07/M
- New · Xiaomi: MiMo-V2.5-Pro
MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....
Language · 1049k ctx · $0.43/M