
Datadog

Observability, metrics, traces, logs.

Verdict

Datadog via MCP exposes your metrics, logs, monitors, and traces to the model. It is the integration that turns "go check the dashboard" into "ask the AI what the p99 looks like."

What we notice: the model is genuinely useful for ad-hoc queries such as "what's CPU on the api service over the last hour", "show me the slow queries on the orders DB", and "which monitors fired this week and why". For dashboard summarisation it's a clean win; for alerting and incident routing the existing Datadog flow already works and the MCP doesn't add much. Logs are token-heavy, so be specific in queries.

Best for: ad-hoc metric queries during debugging; weekly reliability reviews with the model summarising monitor activity; correlating a deploy event with a metric change; bridging "I don't have the Datadog dashboard memorised" gaps for newer engineers.

Avoid for: high-volume log search (the API costs and token counts add up fast); incident response where you need real-time graphs (the dashboard wins); orgs with strict Datadog access controls, unless you have verified that the MCP's scope matches your role boundary.

Practical frame: API access is included with paid Datadog tiers. The MCP itself is free; usage shows up in your normal Datadog metering. Most useful as the second-opinion layer on top of dashboards you already trust.

Common use cases

Create incident monitors during war rooms
Schedule downtime before deployments
Build SLOs from reliability discussions
Generate dashboards for sprint demos
Log deployment events from chat

Integration

Vendor: Datadog
Category: developer-tools
Auth: API_KEY
Tools: 42
Composio slug: datadog

Tools

Create Dashboard
Create a dashboard in Datadog. Dashboards provide customizable visualizations for monitoring your infrastructure, applications, and business metrics in a unified view.
Create downtime
Creates a new downtime in Datadog to suppress alerts during maintenance windows or planned outages. Useful for preventing false alarms during deployments or maintenance.
Create event
Creates a new event in Datadog. Events are useful for tracking deployments, outages, configuration changes, and other important occurrences.
Create monitor
Creates a new Datadog monitor to track metrics, logs, or other data sources with configurable alerting thresholds and notifications.
Create SLO
Create a service level objective (SLO) in Datadog. SLOs help you define and track reliability targets for your services, enabling data-driven decisions about service quality and reliability investments.
Create Synthetic API Test
Create a synthetic API test in Datadog. Creates a new synthetic API test that continuously monitors API endpoints from multiple locations worldwide. Useful for proactive monitoring of API uptime, performance, and functionality.
Create Webhook
Create a webhook in Datadog. Webhooks enable you to receive notifications from Datadog monitors and alerts to external services and applications.
Delete Dashboard
destructive
Delete a dashboard in Datadog. Permanently removes a dashboard from your organization. This action cannot be undone. Use with caution.
Delete monitor
destructive
Deletes a Datadog monitor permanently. Use with caution as this action cannot be undone.
Get Dashboard
Get a specific dashboard from Datadog. Retrieves detailed information about a dashboard including its widgets, layout, template variables, and metadata.
Get host tags
Retrieves all tags associated with a specific host in Datadog. Useful for understanding host metadata and organizing infrastructure.
Get monitor
Retrieves detailed information about a specific Datadog monitor, including its current state, configuration, and any active downtimes.
Get Service Dependencies
Get service dependency mapping from Datadog APM. This action retrieves the dependency graph for a specific service, showing both upstream services (that call this service) and downstream services (that this service calls). It's essential fo
Get Synthetics Locations
Tool to retrieve all available public and private locations for synthetic tests in Datadog. Use when you need a list of location identifiers for creating or managing synthetic tests.
Get Trace by ID
Get detailed information about a specific trace by its ID. This action retrieves comprehensive details about a distributed trace, including all spans, timing information, errors, and metadata. It's essential for: - deep diving into specific
Get usage summary
Retrieves usage summary information from Datadog including API calls, hosts, containers, and other billable usage metrics. Useful for cost monitoring and usage analysis.
List All Tags
List all tags from Datadog. Tags help organize and filter your infrastructure and applications. This action shows all tags in use across your organization.
List API Keys
List API keys in Datadog. Retrieves all API keys in the organization for security auditing, access management, and key rotation planning. Helps maintain security posture by tracking key usage and ownership.
List APM Services
List APM services from Datadog. Application performance monitoring (APM) provides deep visibility into your applications, helping you track performance, errors, and dependencies.
List AWS Integration
List AWS integrations in Datadog. Retrieves all configured AWS account integrations, showing which AWS accounts are monitored by Datadog and their configuration settings. Useful for cloud infrastructure management and ensuring comprehensive
List dashboards
Lists all Datadog dashboards with basic information. Useful for dashboard management and getting an overview of available dashboards.
List events
Lists events from Datadog within a specified time range. Events track important occurrences like deployments, outages, and configuration changes.
List hosts
Lists all hosts in your Datadog infrastructure with detailed information including metrics, tags, and status. Useful for infrastructure monitoring and management.
List Incidents
List incidents from Datadog. Incident management helps you track, manage, and resolve incidents efficiently with comprehensive timeline and impact tracking.
List Log Indexes
Tool to retrieve a list of all log indexes configured in Datadog. Use when you need to get the names and configurations of log indexes.
List monitors
Get all monitor details. This endpoint allows you to retrieve information about all monitors configured in your organization. You can filter by group states, name, tags, and use pagination to manage large result sets.
List Roles
List roles from Datadog organization. Roles define sets of permissions that control what users can do within your Datadog organization.
List service checks
Lists service checks from Datadog. Service checks are status checks that track the health of your services and infrastructure components.
List SLOs
List service level objectives (SLOs) from Datadog. Service level objectives help you track the reliability and performance of your services by setting measurable targets for key metrics.
List Synthetics Tests
List synthetics tests from Datadog. Synthetics monitoring allows you to proactively monitor your applications and APIs by simulating user interactions and API calls from various locations.
List Users
List users from Datadog organization. User management allows you to see team members, their roles, and access levels within your Datadog organization.
List Webhooks
List webhooks from Datadog. Webhooks allow you to send notifications to external services when monitors trigger, enabling integration with your workflows.
Mute Monitor
Mute a monitor in Datadog. Temporarily silences alerts from a monitor, which is useful during maintenance windows, deployments, or when investigating known issues to prevent alert fatigue.
Query metrics
Queries Datadog metrics and returns time series data. Useful for retrieving historical metric data, creating custom dashboards, or building reports.
Search logs
Searches Datadog logs with advanced filtering capabilities. Important notes: - sort parameter is not supported by the Datadog logs API and will cause errors - time parameters must be in milliseconds (13-digit Unix timestamps) - limit parame
Search Spans Analytics
Search and analyze span data with aggregations in Datadog. This action uses the Datadog spans analytics API to perform advanced queries and aggregations on trace span data. It's essential for: - analyzing error rates and latency patterns -
Search Traces
Search for traces in Datadog APM. This action allows you to search for distributed traces across your services. It's essential for: - finding specific request flows during incident investigation - analyzing performance bottlenecks across se
Submit metrics
Submits custom metrics to Datadog. Useful for sending application-specific metrics, business KPIs, or custom performance indicators.
Unmute Monitor
Unmute a monitor in Datadog. Re-enables alerts from a previously muted monitor, returning it to normal monitoring and alerting behavior. Use this after maintenance windows or issue resolution to resume monitoring.
Update Dashboard
Update a dashboard in Datadog. Updates an existing dashboard with new configuration, widgets, or layout while preserving its identity and creation metadata.
Update host tags
Updates tags for a specific host in Datadog. This replaces all existing tags from the specified source with the new tags provided.
Update monitor
Updates an existing Datadog monitor with new configuration, thresholds, or notification settings. Only specified fields will be updated.
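
The Search logs entry above notes that time parameters must be 13-digit Unix timestamps in milliseconds. A minimal sketch of producing such values in Python follows; the helper names `to_epoch_ms` and `last_minutes_window_ms` are hypothetical, and how the values are passed to the tool depends on its parameter list, which is truncated in this listing.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix time in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp() * 1000)


def last_minutes_window_ms(minutes: int, now: datetime) -> tuple[int, int]:
    """Return (start_ms, end_ms) covering the `minutes` before `now`."""
    return to_epoch_ms(now - timedelta(minutes=minutes)), to_epoch_ms(now)


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start_ms, end_ms = last_minutes_window_ms(15, now)
print(start_ms, end_ms)
```

A narrow window like this also helps with the token cost of log results, since fewer lines come back.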

Setup

Setup guide

1. In Switchy, open your workspace settings and navigate to the Integrations tab.
2. Find Datadog in the MCP catalog and click Connect.
3. You'll be prompted to enter two credentials: a Datadog API key and an application key (found under Organization Settings > API Keys and Organization Settings > Application Keys in your Datadog account).
4. Give the application key write scopes for the resources you want managed; the MCP needs to create monitors, dashboards, and other resources on your behalf.
5. Click Save and wait for the green confirmation badge.
6. Open any Space in Switchy and type '@Datadog create a monitor for CPU usage above 80%' to test the connection.
7. If the MCP responds with monitor details, you're ready to use all 42 tools; if it errors, double-check that your application key has the correct scopes in Datadog.
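
If the connection fails, it can help to check the API key outside Switchy first. The sketch below builds a request to Datadog's key-validation endpoint (`GET /api/v1/validate` with a `DD-API-KEY` header). The function names are hypothetical, the default site assumes the US1 region, and the check covers only the API key, not the application key's scopes.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) a GET request to the key-validation endpoint."""
    return urllib.request.Request(
        url=f"https://api.{site}/api/v1/validate",
        headers={"DD-API-KEY": api_key, "Accept": "application/json"},
        method="GET",
    )


def key_is_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Send the request; a rejected key comes back as an HTTP error."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key, site), timeout=10) as resp:
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError:
        return False
```

Pass your own site (for example `datadoghq.eu`) if your organization is not on the default region.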

What teammates see: by default, memories from Datadog are scoped to the Space (PROJECT visibility) - you can mark any memory PRIVATE or share it ORG-wide.

Works well with

Top models

Compatibility data appears once enough Spaces have used this MCP together with a given model.

How Switchy teams use it

Not enough Spaces yet to publish anonymised usage stats (we require ≥ 50 Spaces per week).

Starter prompts

Monitor API Latency

@Datadog create a monitor that alerts when the p95 latency of our checkout API exceeds 500ms over the last 5 minutes
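
For reviewing what a prompt like this should produce, here is a rough sketch of a metric-alert monitor definition in the shape Datadog's monitor API accepts (`name`, `type`, `query`, `message`, `options.thresholds`). The helper name and the metric `checkout.api.latency.p95` are hypothetical; substitute whatever latency metric your service actually reports. Note that averaging a p95 gauge over the window is an approximation, not a true percentile over the window.

```python
import json


def build_metric_monitor(name: str, metric: str, scope: str, threshold: float,
                         window: str = "last_5m", message: str = "") -> dict:
    """Assemble a metric-alert monitor definition as a plain dict."""
    # Query form: <time aggregation>(<window>):<space aggregation>:<metric>{<scope>} > <threshold>
    query = f"avg({window}):avg:{metric}{{{scope}}} > {threshold}"
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        "message": message,
        "options": {"thresholds": {"critical": threshold}},
    }


monitor = build_metric_monitor(
    name="Checkout API p95 latency",
    metric="checkout.api.latency.p95",  # hypothetical metric name, in milliseconds
    scope="env:prod",
    threshold=500,
    message="p95 latency above 500 ms for 5 minutes",
)
print(json.dumps(monitor, indent=2))
```

Comparing the monitor the MCP returns against a definition like this is a quick way to confirm the threshold and window were interpreted as intended.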


Schedule Deploy Downtime

@Datadog create a downtime for all monitors tagged 'production' starting tonight at 11pm UTC for 2 hours with the message 'v2.4 deployment'
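
Phrases like "tonight at 11pm UTC for 2 hours" have to become concrete timestamps. The sketch below shows one way to compute them as Unix seconds and place them in a payload shaped like Datadog's v1 downtime API (`scope`, `monitor_tags`, `start`, `end`, `message`). The helper `downtime_window` is hypothetical, and the MCP tool's own parameter names may differ, so treat the field names as an assumption.

```python
from datetime import datetime, time, timedelta, timezone


def downtime_window(now: datetime, start_at: time, hours: float) -> tuple[int, int]:
    """Return (start, end) in Unix seconds for the next occurrence of `start_at` UTC."""
    now = now.astimezone(timezone.utc)
    start = datetime.combine(now.date(), start_at, tzinfo=timezone.utc)
    if start <= now:  # that time has already passed today, so use tomorrow
        start += timedelta(days=1)
    end = start + timedelta(hours=hours)
    return int(start.timestamp()), int(end.timestamp())


# Fixed "now" so the example is reproducible: 18:00 UTC on 1 Jan 2024.
start, end = downtime_window(datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc), time(23, 0), 2)
payload = {
    "scope": ["*"],
    "monitor_tags": ["production"],
    "start": start,
    "end": end,
    "message": "v2.4 deployment",
}
print(payload)
```

Checking the returned downtime's start and end against values computed this way catches timezone misreadings before the deploy window opens.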


Log Deployment Event

@Datadog create an event titled 'Backend v2.4 deployed' tagged with 'deployment' and 'backend' and set the priority to normal


Build Service Dashboard

@Datadog create a dashboard called 'Payment Service Health' with widgets for request rate, error rate, and p99 latency from the 'payment-api' service


Define Uptime SLO

@Datadog create an SLO for 99.9% uptime on the 'checkout-service' monitor over a 30-day rolling window


Example outputs

Illustrative - representative of the model's voice and quality, not literal recordings.

Prompt

@datadog show me all critical monitors that fired in the last 24 hours and summarize the most common failure patterns

Output

Found 7 critical monitors that triggered in the past 24 hours. The most common pattern is elevated error rates in the payment-service (4 incidents), followed by database connection timeouts in user-db (2 incidents), and one cache eviction spike in redis-primary. Peak incident time was 2:15 AM UTC when three monitors fired simultaneously. All payment-service alerts resolved within 15 minutes; the database timeouts required manual intervention.

Notes

This example demonstrates Datadog's monitor querying combined with AI synthesis to surface operational patterns. Requires read access to monitors and events. The AI can correlate timing and service names, but won't have access to underlying metric data unless the application key's scopes include timeseries read access.

Prompt

@datadog create a downtime for the staging-api monitors from 2 PM to 4 PM today because we're doing a database migration

Output

Created downtime 'staging-api database migration' scheduled for today 14:00-16:00 UTC. This downtime will suppress alerts for 3 monitors: staging-api-health, staging-db-connections, and staging-response-time. Notifications are muted for the ops-staging Slack channel during this window. The downtime will automatically expire at 16:00 UTC, and monitors will resume normal alerting.

Notes

This write action prevents alert fatigue during planned maintenance. The MCP will create the downtime immediately—there's no preview or confirmation step. Make sure your timezone is correctly specified (defaults to UTC). If monitor names are ambiguous, the MCP will ask for clarification before proceeding. Requires monitors_write scope.

Prompt

@datadog create an SLO tracking 99.9% availability for the checkout-service API based on the last 30 days of uptime data

Output

Created SLO 'checkout-service-availability' with a 99.9% target over a 30-day rolling window. The SLO monitors the checkout-service.api.uptime metric and calculates availability as (total requests - 5xx errors) / total requests. Current status: 99.93% (about 13 minutes of error budget remaining this period). You'll receive warnings when error budget drops below 25%. The SLO is now visible in your Datadog SLO dashboard.

Notes

This example shows how to establish reliability targets that tie monitoring to business outcomes. The MCP will infer reasonable defaults for the SLO configuration (metric query, time window, alert thresholds), but you should verify these match your service's actual behavior. Requires slos_write scope. The SLO persists until it is deleted, and the tool list above has no delete-SLO tool, so remove test SLOs in the Datadog UI.
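
When verifying an SLO summary, the error-budget arithmetic is worth checking by hand. A minimal sketch, assuming a time-based reading of availability (a request-based SLO counts failed requests rather than minutes, so the figures are only approximate there); both function names are hypothetical.

```python
def error_budget_minutes(target_pct: float, window_days: int) -> float:
    """Total allowed downtime, in minutes, for a time-based SLO."""
    return window_days * 24 * 60 * (1 - target_pct / 100)


def remaining_budget_minutes(target_pct: float, current_pct: float, window_days: int) -> float:
    """Budget left in minutes; a negative value means the target is already missed."""
    window_minutes = window_days * 24 * 60
    consumed = window_minutes * (1 - current_pct / 100)
    return error_budget_minutes(target_pct, window_days) - consumed


print(round(error_budget_minutes(99.9, 30), 2))             # total budget for 99.9% over 30 days
print(round(remaining_budget_minutes(99.9, 99.93, 30), 2))  # budget left at 99.93% availability
```

A 99.9% target over 30 days allows about 43 minutes of downtime in total. Any current reading below the target, such as 99.87%, yields a negative remainder, so a summary that reports budget remaining alongside a below-target status deserves a second look.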

Use-case deep-dives

Post-deploy alert tuning for SRE teams

When Datadog MCP cuts down on-call noise after a release

A 6-person SRE team ships twice a week and drowns in false positives every Friday. The Datadog MCP is the right call here: one engineer writes a Switchy prompt that creates a downtime window 10 minutes before deploy, logs the release event with the commit SHA, and re-enables monitors after smoke tests pass. The connection uses one API key and application key pair entered at setup (not OAuth2), so there is a single credential to scope and rotate rather than keys scattered across scripts. The 42 tools cover the full lifecycle (downtime, event, monitor CRUD), so you're not duct-taping three integrations. If your deploy cadence is monthly or you have fewer than 3 active monitors, the setup overhead outweighs the win. But at twice-a-week velocity with 15+ monitors, this MCP pays for itself in the first sprint by turning deploy day from a pager storm into a one-click ritual.

SLO reporting for customer success handoffs

How this MCP bridges engineering SLOs and CS check-ins

A 12-person B2B SaaS company runs quarterly business reviews with enterprise customers, and CS needs to show uptime data without bugging engineering. The Datadog MCP fits: a CS lead runs a Switchy prompt that pulls the last 90 days of SLO compliance for the customer's tenant, formats it into a one-pager, and attaches the dashboard link. The Create SLO and Create Dashboard tools let engineering define the metrics once; CS self-serves the reports. Because the connection runs on the API key and application key entered at setup (not OAuth2), CS doesn't need to handle keys themselves. The trade-off: if your SLOs change weekly or you're still defining what to measure, the MCP adds friction, and you'll spend more time updating prompts than running them. But once SLOs stabilize and you're doing monthly or quarterly reviews, this turns a 30-minute Slack thread into a 2-minute prompt run.

Incident postmortem automation for platform teams

When this MCP closes the loop on incident write-ups

A 9-person platform team runs 4-6 incidents a month and struggles to document them consistently. The Datadog MCP is a strong fit: after an incident resolves, the on-call engineer runs a Switchy prompt that creates an event with the timeline, updates the relevant monitors with new thresholds learned from the incident, and sets up a webhook that posts the postmortem link to Slack. The Create event, Update monitor, and Create Webhook tools cover the full loop without leaving the chat. The connection uses an API key and application key rather than OAuth2, so rotate those keys if the person who supplied them leaves. The boundary: if your incidents are rare (under 2/month) or you don't have a postmortem template yet, the MCP is overkill. But at 4+ incidents monthly with a defined process, this turns a 45-minute manual checklist into a 5-minute prompt that actually gets run every time.

Frequently asked

What does the Datadog MCP do in Switchy?

It lets your AI agents create and manage Datadog monitoring resources—dashboards, monitors, SLOs, synthetic tests, downtime schedules, and events—directly from chat. Instead of switching to the Datadog UI to set up a new monitor or suppress alerts during a deploy, you describe what you need and the agent builds it. Useful for teams who want monitoring setup to happen in the same conversation as the work itself.

Do I need admin access to connect Datadog?

You need an API key and an application key whose permissions cover creating and deleting the resources you want the MCP to manage: dashboards, monitors, SLOs, synthetic tests. Keys from a standard user role work if your org grants that role those permissions. Admin access isn't strictly required, but if the role behind the application key can't create monitors, the MCP can't either. Check your Datadog role settings before connecting.

Can the Datadog MCP query existing metrics or logs?

Yes. Alongside the tools that create, update, and delete monitoring resources, the 42 tools include read operations: Query metrics returns time series data, Search logs and Search Traces query logs and APM traces, and List Incidents and List events return incident and event history. Log results are token-heavy, so keep queries narrow.

Why use this instead of the Datadog API or Terraform?

The MCP is faster for one-off tasks during a conversation—your agent can create a monitor or schedule downtime without you leaving chat. For repeatable infrastructure-as-code setups, Terraform is still better. For custom integrations outside Switchy, use the API. The MCP shines when monitoring setup is part of an ad-hoc workflow, not a scripted pipeline.

Who on the team should connect the Datadog integration?

Whoever owns your Datadog monitoring setup, usually a DevOps lead or SRE. The API key and application key that person supplies determine what the agent can create or delete. If multiple people need to manage monitors through Switchy, each should connect their own Datadog account. The integration doesn't share credentials across team members.

Compare with

Sentry
