What is MCP security?

MCP security is the practice of protecting Model Context Protocol connections between AI agents and the tools, files, and APIs they use. Because MCP servers give an autonomous model direct access to sensitive systems, MCP security focuses on authenticating and authorizing every connection, validating tool inputs and definitions, scoping credentials tightly, and logging all tool calls.

Is MCP secure by default?

No. MCP is a protocol, not a security product, and many servers ship with weak defaults. A 2026 audit found roughly 40% of MCP servers require no authentication, 43% carry command-injection flaws, and 79% handle credentials in plaintext. MCP also inherits the prompt-injection problem, so servers and clients need deliberate hardening before production use.

What is tool poisoning in MCP?

Tool poisoning is when an attacker crafts a tool’s name, description, or schema so that it contains hidden instructions. Because the model reads tool descriptions as instructions rather than documentation, it can be induced to exfiltrate data or take unauthorized actions while appearing to perform something harmless. Pinning and reviewing tool definitions is the primary defence.

What is an MCP rug pull attack?

A rug pull is when a tool that was benign at the time you approved it silently changes its description or behaviour in a later load to become malicious. Invariant Labs demonstrated a proof of concept where a harmless server swapped its interface to leak WhatsApp chat history. Detecting post-approval changes to tool definitions defeats this class of attack.

How do you secure an MCP server?

Enable OAuth 2.1 with mandatory PKCE, enforce HTTPS, validate the token audience, and never forward a client token to upstream APIs. Scope tokens to least privilege, keep access-token lifetimes short and rotate refresh tokens on use, allow-list and validate every tool input, block SSRF egress to private ranges, require human approval for irreversible actions, and log every call.

What is the confused deputy problem in MCP?

The confused deputy problem is when a trusted component is tricked into acting for an unauthorized party. In MCP the host application is the deputy: it grants its LLM authority to call tools based on both user input and untrusted data, and the LLM cannot tell them apart. The fix is to constrain the deputy’s privileges and require confirmation for sensitive actions.

MCP Security: Risks and How to Secure MCP Servers

MCP security is the discipline of protecting the Model Context Protocol connections that let AI agents reach your tools, files, and APIs. It matters because an MCP server is not a passive API — it hands an autonomous, reasoning model direct programmatic access to sensitive systems, and that model cannot reliably tell trusted instructions apart from malicious data. Get this wrong and a single poisoned tool description or unauthenticated local server turns an AI assistant into an insider threat.

The pattern echoes what we covered in our guide to prompt injection attacks against LLM applications: the model is not the vulnerability, the capabilities you wire into it are. MCP simply raises the stakes because it standardises those capabilities and makes them trivial to add. This piece walks through what MCP is, why its attack surface is larger than most teams realise, the specific risks worth naming, and a concrete way to secure both servers and clients.

What MCP Is, Briefly

The Model Context Protocol is an open standard Anthropic introduced in November 2024 to give LLMs and agents a uniform way to connect to external tools and data sources. Instead of writing a bespoke integration for every system, developers expose an MCP server that advertises a set of tools, and any MCP-aware client — a chat app, an IDE, an agent framework — can discover and call them. Adoption accelerated sharply through 2025 and into 2026, and MCP is now widely described as the connective tissue of agentic AI.

That convenience is the whole point, and it is also the problem. Every MCP server represents an external connection, a permission surface, and a potential credential exposure point. Because servers aggregate access to multiple back-end services, they concentrate risk: a single breached server deployed without authentication can hand an attacker access to every database, file system, and cloud service the agent touches.

The MCP Attack Surface

Traditional application security assumes code paths are reviewed and deterministic. MCP breaks that assumption in two ways. First, the agent decides which tools to call based on natural-language reasoning over content it does not control, so the control flow is emergent rather than written. Second, tool definitions — names, descriptions, and schemas — are themselves untrusted input that the model reads and acts on. A tool description is not documentation to the model; it is an instruction.

The empirical picture is not reassuring. A 2026 audit cited by Practical DevSecOps found that 40% of MCP servers still require no authentication, 43% carry command-injection vulnerabilities, and 79% handle credentials in plaintext. Security researcher Simon Willison and others documented in 2025 that MCP inherits the full weight of the prompt-injection problem, because the model still cannot separate data from instructions when both arrive as text.

A 2026 review of the MCP ecosystem reported that roughly 40% of MCP servers require no authentication at all, 43% contain command-injection flaws, and 79% store or transmit credentials in plaintext. Source: Practical DevSecOps, MCP Security Best Practices (2026).

Top MCP Security Risks

These are the risk classes we see named most consistently across current MCP security research, including Checkmarx, Elastic Security Labs, Invariant Labs, and the OWASP MCP Security Cheat Sheet:

Tool poisoning — a malicious or crafted tool description hides instructions the model faithfully executes, such as exfiltrating a file while appearing to do something benign
Prompt injection via tool outputs — data returned by a tool (a web page, a document, a database row) carries hidden instructions that hijack the agent, an indirect injection with a new delivery path
Over-broad OAuth scopes and token theft — a server that only needs read access holds a token that can write everywhere, so one poisoned call becomes a breach instead of a nuisance
Unauthenticated or local MCP servers — servers bound to localhost with no auth are reachable by any local process or, when misconfigured, from the network
Confused deputy — the host application grants its LLM authority to call tools, and the LLM cannot tell a legitimate request from malicious data, so it acts on the attacker’s behalf
Cross-server data exfiltration and tool shadowing — with multiple servers connected, one malicious server can override or shadow the tools of a trusted server and weaponise it
Supply-chain risk in third-party servers — servers pull schemas, config, and runtime logic from external sources, so a malicious dependency update compromises the toolchain
Rug-pull tool redefinition — a tool that was benign at approval time silently mutates its description or behaviour in a later load, as Invariant Labs demonstrated with a server that swapped its interface to leak WhatsApp chat history

Notice how many of these are variations on a single theme: the model treats untrusted text as trusted instructions. That is the same root cause we mapped in our OWASP Top 10 for LLM applications overview, and it is why point fixes rarely hold. MCP adds two amplifiers — dynamic tool definitions and multi-server composition — that make the blast radius bigger.

How to Secure MCP Servers and Clients

The robust posture is the one we argue for throughout our work on securing GenAI with defence in depth: treat the model as untrusted code and constrain what it can do, not what it can be told. Applied to MCP, that becomes a concrete sequence.

Turn on OAuth 2.1 authorization with mandatory PKCE, and never accept Bearer tokens over plain HTTP in production
Validate the token audience so the server only accepts tokens minted specifically for it, and never pass a client token straight through to an upstream API
Scope tokens to least privilege — a server that reads one repository must not hold a token that can write to every repository in the org
Set short access-token lifetimes (5 to 30 minutes by sensitivity) and rotate refresh tokens on use, because a rotated token used twice is a theft signal
Allow-list and validate every tool input, and block SSRF egress to private IP ranges and metadata endpoints
Pin and review tool definitions — detect when a tool’s description or schema changes after approval to defeat rug-pull and shadowing attacks
Isolate servers by trust level and avoid mixing untrusted third-party servers with servers that hold sensitive credentials in the same agent context
Require human confirmation for irreversible or high-impact actions such as sending money, deleting data, or emailing externally
Log every tool call to an immutable, centralised audit trail so you can attribute and investigate agent behaviour after the fact

None of these controls tries to make the model immune to being tricked. Each one limits the damage when it is. If the worst thing a poisoned tool call can achieve is a rejected, logged, out-of-scope request, you have a defensible architecture. If it can quietly exfiltrate your customer records, you have an architecture problem that no amount of prompt tuning will fix.

Enterprise Governance of MCP

Individual server hardening does not scale on its own. The failure mode in most organisations is sprawl: third-party servers developers wired in without review, internal servers product teams shipped, and servers embedded in IDE configurations that security never saw. The governing principle is simple — if an MCP server can reach a production system, it is in scope. Current enterprise guidance converges on four capabilities that work together: a centralised catalog of approved servers, identity-based access controls, structured audit logging, and real-time policy enforcement.

The architectural decision that makes this enforceable is the MCP gateway. Rather than every client holding a direct, ungoverned line to every server, a gateway sits in front of all servers as a single checkpoint that authenticates the calling agent, authorizes the request, routes it to the right upstream, logs it, and applies policies like rate limiting and data masking. It also reframes the problem correctly: every agent, server, and tool is a non-human identity that must authenticate, be authorized, and remain traceable to a verified actor.

Enterprise guidance in 2026 (TrueFoundry, Strata, Obot) converges on treating every agent, server, and tool as a non-human identity, fronted by an MCP gateway that centralises authentication, authorization, logging, and policy enforcement rather than leaving each connection ungoverned.

If you are building agents rather than just wiring up existing ones, fold MCP into your threat model from the start. Our walkthrough of STRIDE threat modeling for LLM applications and our field notes on agentic design patterns in production both apply directly: enumerate every tool the agent can reach, name the worst-case action per tool, and put the control on the action. MCP security is not a new discipline so much as the discipline of remembering that convenient access and dangerous access are the same access.