MCP security is the discipline of protecting the Model Context Protocol connections that let AI agents reach your tools, files, and APIs. It matters because an MCP server is not a passive API — it hands an autonomous, reasoning model direct programmatic access to sensitive systems, and that model cannot reliably tell trusted instructions apart from malicious data. Get this wrong and a single poisoned tool description or unauthenticated local server turns an AI assistant into an insider threat.
The pattern echoes what we covered in our guide to prompt injection attacks against LLM applications: the model is not the vulnerability, the capabilities you wire into it are. MCP simply raises the stakes because it standardises those capabilities and makes them trivial to add. This piece walks through what MCP is, why its attack surface is larger than most teams realise, the specific risks worth naming, and a concrete way to secure both servers and clients.
What MCP Is, Briefly
The Model Context Protocol is an open standard Anthropic introduced in November 2024 to give LLMs and agents a uniform way to connect to external tools and data sources. Instead of writing a bespoke integration for every system, developers expose an MCP server that advertises a set of tools, and any MCP-aware client — a chat app, an IDE, an agent framework — can discover and call them. Adoption accelerated sharply through 2025 and into 2026, and MCP is now widely described as the connective tissue of agentic AI.
That convenience is the whole point, and it is also the problem. Every MCP server represents an external connection, a permission surface, and a potential credential exposure point. Because servers aggregate access to multiple back-end services, they concentrate risk: a single breached server deployed without authentication can hand an attacker access to every database, file system, and cloud service the agent touches.
The MCP Attack Surface
Traditional application security assumes code paths are reviewed and deterministic. MCP breaks that assumption in two ways. First, the agent decides which tools to call based on natural-language reasoning over content it does not control, so the control flow is emergent rather than written. Second, tool definitions — names, descriptions, and schemas — are themselves untrusted input that the model reads and acts on. A tool description is not documentation to the model; it is an instruction.
The empirical picture is not reassuring. A 2026 audit cited by Practical DevSecOps found that 40% of MCP servers still require no authentication, 43% carry command-injection vulnerabilities, and 79% handle credentials in plaintext. Security researcher Simon Willison and others documented in 2025 that MCP inherits the full weight of the prompt-injection problem, because the model still cannot separate data from instructions when both arrive as text.
A 2026 review of the MCP ecosystem reported that roughly 40% of MCP servers require no authentication at all, 43% contain command-injection flaws, and 79% store or transmit credentials in plaintext. Source: Practical DevSecOps, MCP Security Best Practices (2026).
Top MCP Security Risks
These are the risk classes we see named most consistently across current MCP security research, including Checkmarx, Elastic Security Labs, Invariant Labs, and the OWASP MCP Security Cheat Sheet:
- Tool poisoning — a malicious or crafted tool description hides instructions the model faithfully executes, such as exfiltrating a file while appearing to do something benign
- Prompt injection via tool outputs — data returned by a tool (a web page, a document, a database row) carries hidden instructions that hijack the agent, an indirect injection with a new delivery path
- Over-broad OAuth scopes and token theft — a server that only needs read access holds a token that can write everywhere, so one poisoned call becomes a breach instead of a nuisance
- Unauthenticated or local MCP servers — servers bound to localhost with no auth are reachable by any local process or, when misconfigured, from the network
- Confused deputy — the host application grants its LLM authority to call tools, and the LLM cannot tell a legitimate request from malicious data, so it acts on the attacker’s behalf
- Cross-server data exfiltration and tool shadowing — with multiple servers connected, one malicious server can override or shadow the tools of a trusted server and weaponise it
- Supply-chain risk in third-party servers — servers pull schemas, config, and runtime logic from external sources, so a malicious dependency update compromises the toolchain
- Rug-pull tool redefinition — a tool that was benign at approval time silently mutates its description or behaviour in a later load, as Invariant Labs demonstrated with a server that swapped its interface to leak WhatsApp chat history
Notice how many of these are variations on a single theme: the model treats untrusted text as trusted instructions. That is the same root cause we mapped in our OWASP Top 10 for LLM applications overview, and it is why point fixes rarely hold. MCP adds two amplifiers — dynamic tool definitions and multi-server composition — that make the blast radius bigger.
How to Secure MCP Servers and Clients
The robust posture is the one we argue for throughout our work on securing GenAI with defence in depth: treat the model as untrusted code and constrain what it can do, not what it can be told. Applied to MCP, that becomes a concrete sequence.
- Turn on OAuth 2.1 authorization with mandatory PKCE, and never accept Bearer tokens over plain HTTP in production
- Validate the token audience so the server only accepts tokens minted specifically for it, and never pass a client token straight through to an upstream API
- Scope tokens to least privilege — a server that reads one repository must not hold a token that can write to every repository in the org
- Set short access-token lifetimes (5 to 30 minutes by sensitivity) and rotate refresh tokens on use, because a rotated token used twice is a theft signal
- Allow-list and validate every tool input, and block SSRF egress to private IP ranges and metadata endpoints
- Pin and review tool definitions — detect when a tool’s description or schema changes after approval to defeat rug-pull and shadowing attacks
- Isolate servers by trust level and avoid mixing untrusted third-party servers with servers that hold sensitive credentials in the same agent context
- Require human confirmation for irreversible or high-impact actions such as sending money, deleting data, or emailing externally
- Log every tool call to an immutable, centralised audit trail so you can attribute and investigate agent behaviour after the fact
None of these controls tries to make the model immune to being tricked. Each one limits the damage when it is. If the worst thing a poisoned tool call can achieve is a rejected, logged, out-of-scope request, you have a defensible architecture. If it can quietly exfiltrate your customer records, you have an architecture problem that no amount of prompt tuning will fix.
Enterprise Governance of MCP
Individual server hardening does not scale on its own. The failure mode in most organisations is sprawl: third-party servers developers wired in without review, internal servers product teams shipped, and servers embedded in IDE configurations that security never saw. The governing principle is simple — if an MCP server can reach a production system, it is in scope. Current enterprise guidance converges on four capabilities that work together: a centralised catalog of approved servers, identity-based access controls, structured audit logging, and real-time policy enforcement.
The architectural decision that makes this enforceable is the MCP gateway. Rather than every client holding a direct, ungoverned line to every server, a gateway sits in front of all servers as a single checkpoint that authenticates the calling agent, authorizes the request, routes it to the right upstream, logs it, and applies policies like rate limiting and data masking. It also reframes the problem correctly: every agent, server, and tool is a non-human identity that must authenticate, be authorized, and remain traceable to a verified actor.
Enterprise guidance in 2026 (TrueFoundry, Strata, Obot) converges on treating every agent, server, and tool as a non-human identity, fronted by an MCP gateway that centralises authentication, authorization, logging, and policy enforcement rather than leaving each connection ungoverned.
If you are building agents rather than just wiring up existing ones, fold MCP into your threat model from the start. Our walkthrough of STRIDE threat modeling for LLM applications and our field notes on agentic design patterns in production both apply directly: enumerate every tool the agent can reach, name the worst-case action per tool, and put the control on the action. MCP security is not a new discipline so much as the discipline of remembering that convenient access and dangerous access are the same access.