# Meta: Llama 3.2 11B Vision Instruct

Provider: meta-llama  
Category: llm  
Model ID: `meta-llama/llama-3.2-11b-vision-instruct`

Llama 3.2 11B Vision is an 11-billion-parameter multimodal model designed for tasks that combine visual and textual data. It excels at tasks such as image captioning and...

## Specs

- Context length: 131072 tokens
- Max output: 16384 tokens
- Modalities: text, image
- Released: 2024-09-25
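
As a rough illustration of how the two token limits above interact, the sketch below computes the largest output request that fits for a given prompt size. It assumes the context length is a single window shared by prompt and output tokens, which this card does not state; the function name `max_allowed_output` is hypothetical, and how image inputs count toward the prompt is not covered.

```python
# Limits copied from the Specs list above.
CONTEXT_LENGTH = 131_072  # tokens
MAX_OUTPUT = 16_384       # tokens


def max_allowed_output(prompt_tokens: int) -> int:
    """Return the largest output-token request that fits.

    Assumption (not stated in this card): prompt and output tokens
    share the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # Never exceed the per-response cap, never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (1_000, 120_000, 131_072):
        print(prompt, max_allowed_output(prompt))
```

Under that assumption, prompts up to 114688 tokens leave room for the full 16384-token output; longer prompts reduce the available output budget.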

## Pricing

- Input: $0.24 per million tokens
- Output: $0.24 per million tokens
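
To make the per-million rates concrete, here is a minimal cost-estimation sketch using the prices listed above. The function name `estimate_cost_usd` is hypothetical, and the calculation covers token counts only; this card does not say how image inputs are converted to tokens or billed.

```python
from decimal import Decimal

# Rates copied from the Pricing list above (USD per million tokens).
INPUT_USD_PER_MILLION = Decimal("0.24")
OUTPUT_USD_PER_MILLION = Decimal("0.24")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost_usd(10_000, 2_000))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many requests.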

## Providers

- **meta-llama** — ctx 131072, input $0.24/M, output $0.24/M

---
Last verified: 2026-04-23T23:46:29.618Z  
Canonical URL: https://switchy.build/models/llama-3-2-11b-vision-instruct