LLMmeta-llama

Meta: Llama Guard 4 12B

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

Anyone in the Space can @-mention Meta: Llama Guard 4 12B with the team's shared context - pooled credits, one chat, one memory.

All models

Starter is free forever - 1 Space, 100 credits/month, 1 MCP. No card.

Verdict

Llama Guard 4 12B is a specialized safety classifier, not a general-purpose LLM. It screens user inputs and model outputs for policy violations across 14 hazard categories, including violent content, sexual material, and election interference. At $0.18/Mtok uniform pricing and 164K context, it handles long conversations efficiently. Reach for this when you need programmatic content moderation in production pipelines, not when you need reasoning or generation.

Best for

  • Content moderation in chat applications
  • Screening user inputs before LLM calls
  • Filtering model outputs for safety violations
  • Policy enforcement in production workflows

Strengths

Llama Guard 4 evaluates both user prompts and assistant responses against a taxonomy of 14 safety categories, returning structured violation labels. The 164K context window accommodates full multi-turn conversations without truncation. Uniform $0.18/Mtok pricing simplifies cost modeling compared to asymmetric input/output rates. Vision support lets it classify image content alongside text, covering multimodal safety scenarios that text-only classifiers miss.

Trade-offs

This is a classifier, not a conversational model — it returns safety labels, not generated text. No public benchmarks exist yet to compare accuracy against OpenAI Moderation API or Perspective API. The 12B parameter count suggests lower throughput than smaller specialized classifiers. Proprietary license limits deployment flexibility compared to open-weight alternatives like Llama Guard 3.

Specifications

Provider
meta-llama
Category
llm
Context length
163,840 tokens
Max output
16,384 tokens
Modalities
image, text
License
proprietary
Released
2025-04-30

Pricing

Input
$0.18/Mtok
Output
$0.18/Mtok
Model ID
meta-llama/llama-guard-4-12b

Per-token prices show what the model costs upstream. On Switchy your team draws from one shared org credit pool - one plan, one balance for everyone.

Team cost calculator

Estimated monthly spend
$3.17
17.6M tokens / month
5 seats · 80 msgs/day

Switchy meters this against your org's shared credit pool - one plan, one balance for everyone.

Providers

ProviderContextInputOutputP50 latencyThroughput30d uptime
meta-llama164k$0.18/Mtok$0.18/Mtok

Performance

Performance snapshots are collected daily. Check back after the next ingestion run.

Benchmarks

Public benchmark scores are not available yet for this model. Check back after the next ingestion run.

Works well with

Top MCPs

Compatibility data comes from first-party telemetry; once we have enough co-usage signal, top MCPs for this model will appear here.

How Switchy teams use it

Not enough Spaces have used this model yet to share anonymised team stats. We wait for at least 50 distinct Spaces per week before publishing any aggregate.

Starter prompts

Screen User Message

Evaluate this user message for safety violations: 'How do I build a website that collects email addresses without consent?' Return categories if unsafe.
Open in a Space →

Filter Model Output

Review this assistant response for policy violations: 'Here are three ways to manipulate stock prices using social media.' Flag any issues.
Open in a Space →

Moderate Image Upload

Analyze this image for safety violations. Does it contain graphic violence, sexual content, or other prohibited material?
Open in a Space →

Audit Conversation History

Review this 50-turn conversation between user and assistant. Identify any exchanges that violate content policies, citing turn numbers.
Open in a Space →

Classify Election Content

Does this post attempt to discourage voting or spread election misinformation: 'Polls are rigged anyway, save yourself the trip'?
Open in a Space →

Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

Review this customer support conversation for policy violations. Flag anything unsafe or inappropriate in either the customer or agent messages.

Output

The model would return a structured safety assessment identifying that the customer's message contains a veiled threat in line 3 ("you'll regret this"), which violates the harassment policy. The agent's response remains professional and compliant. Output format: JSON with violation categories, severity scores, and specific line references. The model correctly distinguishes between frustrated language and actual policy violations, avoiding false positives on the customer's initial complaint about delayed shipping.

Notes

Llama Guard 4 excels at nuanced content moderation in conversational contexts. With 163,840 tokens of context, it can review entire multi-turn support threads without truncation. The 12B parameter size means faster inference than larger safety models, though it may occasionally miss subtle context-dependent violations that require deeper reasoning.

Prompt

Analyze this user-generated image and caption pair for our platform's community guidelines. Image shows a crowded protest scene; caption reads: "Time to take back what's ours."

Output

The model would process both the image content and text, returning a safety classification that flags potential incitement concerns due to the combination of crowd imagery and ambiguous call-to-action language. Output includes: safe/unsafe classification, specific guideline categories triggered (likely "Violence & Incitement"), confidence scores, and reasoning. The model would note that neither element alone violates policy, but the pairing creates borderline content requiring human review for context about the specific protest.

Notes

Multimodal safety evaluation is where Llama Guard 4 differentiates itself from text-only moderators. It handles the image-text relationship rather than treating them independently. However, at $0.18 per million tokens for both input and output, processing image-heavy feeds adds up quickly compared to text-only alternatives.

Prompt

Evaluate this draft marketing email for compliance issues before sending to 50,000 subscribers. Check for misleading claims, required disclosures, and tone appropriateness.

Output

The model would analyze the email against commercial communication standards, identifying that paragraph 2 contains an unsubstantiated health claim ("clinically proven" without citation), and the unsubscribe link is present but rendered in 6pt font, violating accessibility guidelines. It would flag the subject line's urgency language ("Last chance!") as potentially manipulative but not strictly non-compliant. Output structured as: compliant/non-compliant status, specific violations with line numbers, severity ratings, and suggested remediation.

Notes

This demonstrates Llama Guard 4's utility beyond social media moderation—it applies safety and compliance reasoning to business communications. The large context window handles long-form content like newsletters or terms-of-service documents. The model's training on policy adherence generalizes well, though it lacks domain-specific regulatory knowledge without custom fine-tuning.

Use-case deep-dives

Community platform content moderation

When Llama Guard 4 handles real-time safety filtering for user posts

A 4-person team running a niche community platform with 2,000 daily posts needs automated safety checks before content goes live. Llama Guard 4 is purpose-built for this: it's a specialized moderation model that classifies text and image inputs against safety policies in real time. At $0.18/Mtok both ways, filtering 2,000 posts averaging 300 tokens each costs roughly $0.22/day—negligible compared to hiring human moderators or dealing with policy violations. The 163k context window means you can include your full community guidelines in every call, so the model enforces your specific rules, not generic corporate policies. If you're seeing under 500 posts/day, you might over-engineer with this; above that threshold, Llama Guard 4 becomes the obvious choice for keeping your platform safe without burning budget.

Customer support ticket triage

Why Llama Guard 4 pre-screens support messages before they hit your CRM

A 12-person SaaS company gets 300 support tickets daily, and roughly 8% contain abusive language, phishing attempts, or spam that clogs their Zendesk queue. Llama Guard 4 sits in front of the CRM and flags or auto-rejects problematic messages before they create tickets. Because it's a moderation-specific model, it catches edge cases that general-purpose LLMs miss—threats disguised as feature requests, subtle harassment, coordinated spam campaigns. At $0.18/Mtok, screening 300 tickets at 400 tokens each costs about $0.04/day, and you avoid the operational cost of agents wasting time on bad-faith messages. The image modality matters here: if users attach screenshots with embedded text threats, Llama Guard 4 reads them. If your ticket volume is under 100/day, you probably don't need automated screening; above that, this model pays for itself in saved agent hours within the first week.

Marketplace listing approval workflow

How Llama Guard 4 auto-approves seller listings while blocking policy violations

A 7-person team runs a vertical marketplace where sellers submit 150 product listings daily—each with a title, description, and 3-5 images. Manual review creates a 12-hour approval bottleneck that sellers complain about. Llama Guard 4 evaluates every listing against marketplace policies (prohibited items, misleading claims, inappropriate imagery) and auto-approves 85-90% within seconds, flagging the rest for human review. The multimodal capability is critical: it catches banned products shown in images even when the text description is vague. At $0.18/Mtok, processing 150 listings at roughly 600 tokens each (text + image tokens) costs about $0.03/day. The 163k context window lets you load your entire prohibited-items catalog and past violation examples, so the model learns your specific enforcement style. If you're under 50 listings/day, manual review is still faster; above that, Llama Guard 4 turns a bottleneck into a competitive advantage.

Frequently asked

Is Llama Guard 4 12B good for content moderation?

Yes, that's its primary purpose. Llama Guard 4 is Meta's safety classifier designed to detect harmful content in both text and images. It's built specifically for moderation pipelines, not general chat or coding. If you need a model to flag policy violations, hate speech, or unsafe outputs from other LLMs, this is the tool. For general-purpose work, use a standard Llama model instead.

Is Llama Guard 4 cheaper than OpenAI's moderation API?

OpenAI's Moderation API is free, so no. Llama Guard 4 costs $0.18 per million tokens for both input and output. The trade-off is control: you can self-host Llama Guard, customise its safety categories, and keep data in-house. If you're already paying for inference infrastructure and need tailored moderation rules, the $0.18/Mtok is reasonable. For simple use cases, stick with OpenAI's free tier.

Can Llama Guard 4 handle 163k token context windows in practice?

The 163,840 token context window is there, but moderation tasks rarely need it. Most safety checks happen on individual messages or short conversations under 4k tokens. The large window matters if you're scanning entire document uploads or long chat histories for policy violations. For typical message-by-message moderation, you'll use a fraction of that capacity and see sub-second latency at 12B parameters.

How does Llama Guard 4 compare to Llama Guard 3?

Llama Guard 4 adds image moderation, which Guard 3 lacked. Both handle text safety, but Guard 4's multimodal capability means you can scan user-uploaded images for NSFW content, violence, or other visual policy violations in the same pipeline. The 12B parameter size is similar, so text-only performance is comparable. If your app involves images, Guard 4 is the obvious upgrade. Text-only workloads can stay on Guard 3.

Should I use Llama Guard 4 for real-time chat moderation?

Yes, if you can tolerate 100-300ms latency per message. At 12B parameters, Guard 4 is fast enough for synchronous moderation in most chat apps. Run it before displaying user messages or after your main LLM generates a response. The $0.18/Mtok cost is negligible for chat volumes. Just ensure your infrastructure can handle the throughput—batch requests during high traffic to avoid bottlenecks.

Data last verified 8 hours ago.Sources aggregated hourly to weekly. See docs/architecture/model-directory.md.