Everyone's Installing MCP Servers from GitHub. Nobody's Checking What They Do.

Teams are integrating MCP servers into their AI infrastructure with barely a second thought about what's actually being pulled in. A link from GitHub. Maybe a quick code review. But nobody's asking the hard question: what happens when that tool runs?

That's the new supply chain attack.

Tool poisoning isn't theoretical anymore. MCP servers (Model Context Protocol servers) are the integration layer between your AI agents and the systems they actually touch. They have permission to read files, call APIs, modify data, sometimes even execute code. Install a poisoned one, and someone else controls what your agents do.

The attack surface nobody's mapping

The pattern looks like this: a team installs some-org/useful-tool from GitHub. It does three things. Or at least, that's what they think. In reality, it's calling back to a command-and-control (C2) server, logging every prompt it sees, or silently modifying outputs before they reach the user. You can't see this in a code review if you're only skimming the surface.

The risks come in several flavours. Tool poisoning is the obvious one: the server does what it claims, then also does something else you didn't notice, often steered by instructions hidden in tool descriptions that the model reads and the user never sees. Rug pulls happen when the maintainer updates the server to add malicious behaviour weeks or months after you've already integrated it. Prompt injection through MCP tools can break out of your guardrails entirely. Memory poisoning corrupts the context that other tools rely on. And cross-server interference means compromising one tool can cascade through your entire toolkit.

Most teams have no way to detect this until it's too late. You're running non-deterministic systems at scale. How would you even notice?

What the OWASP cheatsheet says you should actually be doing

The OWASP practical guide for securely using third-party MCP servers lays out the controls that matter, organised across vulnerability awareness, client hardening, authorisation, tooling, and governance.

Know what's happening. You need visibility into the attack surface itself. What are the common poisoning vectors? What does a rug pull look like? If you can't answer those questions about your own infrastructure, you're flying blind.

Harden your client and verify servers before you trust them. This is where most teams fail. Client-side controls mean rate-limiting tool calls, sandboxing the servers, and requiring human-in-the-loop oversight for sensitive operations. On the server side, pin specific versions, maintain a manifest of approved servers and tools, and use checksums to verify nothing's changed between installations. Don't pull the "latest" version on every startup. That's how you get surprised by updates you didn't review.
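To make the client-side half concrete, here is a minimal sketch of a gate that sits in front of every tool call. It assumes a sliding-window rate limit and a callback that asks a human before sensitive tools run. The class name ToolCallGate, the tool names in SENSITIVE_TOOLS, and the approve callback are all hypothetical; nothing here comes from the MCP SDK or the OWASP guide.

```python
import time
from collections import deque
from typing import Callable

# Hypothetical policy: which tools count as "sensitive" is your decision.
SENSITIVE_TOOLS = {"delete_file", "send_email"}


class ToolCallGate:
    """Sliding-window rate limit plus human approval for sensitive tools."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        approve: Callable[[str, dict], bool],
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.approve = approve  # e.g. a prompt in a review UI (assumed)
        self.clock = clock      # injectable so the gate can be tested
        self._calls = deque()   # timestamps of calls that were allowed

    def check(self, tool: str, args: dict) -> tuple[bool, str]:
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False, "rate limit exceeded"
        if tool in SENSITIVE_TOOLS and not self.approve(tool, args):
            return False, "human approval denied"
        self._calls.append(now)
        return True, "ok"
```

The agent runtime would call check() before forwarding a request to the server and refuse to forward anything that comes back False. Denied calls don't count against the limit in this sketch; whether they should is a policy choice.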

Implement proper authorisation, not broad permission grants. OAuth 2.1 and OIDC matter here because they let you hand each tool a scoped token tied to an identity instead of a shared, all-powerful credential. Every tool should operate under least privilege: read-only access if it only needs to read, no API credentials at all if they're not required for that specific function. This is the hard work. It requires thinking through what each tool actually needs, not just giving it a database connection string and hoping for the best.
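As an illustration of least privilege at the tool level, the sketch below keeps an explicit grant per tool and denies anything not listed. The tool names, the scope strings such as "files:read", and the GRANTS table are hypothetical; in a real deployment the scopes would more likely come from the token your OAuth provider issues than from a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    scopes: frozenset  # scopes this tool may use, e.g. {"files:read"}


# Hypothetical grants. A tool that needs no credentials gets an empty set.
GRANTS = {
    "search_docs": ToolGrant(frozenset({"files:read"})),
    "format_text": ToolGrant(frozenset()),
}


def is_allowed(tool: str, required_scope: str) -> bool:
    """Deny by default: unknown tools and unlisted scopes are refused."""
    grant = GRANTS.get(tool)
    if grant is None:
        return False
    return required_scope in grant.scopes
```

The point of the design is the default. A tool that nobody thought about gets nothing, rather than inheriting whatever the agent process happens to have.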

And build governance. Most teams don't have this. A trusted registry of internal MCP servers. Version pinning in the manifest. Checksums recorded somewhere. Someone actually reviewing what new servers get added. It sounds bureaucratic because it is. And it's necessary.
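One way to keep that governance honest is to lint the manifest itself. The sketch below assumes a manifest that is simply a list of entries with name, version, sha256 and reviewed_by fields. That layout is an assumption made for illustration, not a format defined by MCP or OWASP.

```python
import re

SHA256_HEX = re.compile(r"[0-9a-f]{64}")
# Values that mean "whatever is newest" rather than a pinned release.
FLOATING_VERSIONS = {"", "latest", "*"}


def manifest_problems(manifest: list[dict]) -> list[str]:
    """Return a human-readable list of governance gaps in the manifest."""
    problems = []
    for entry in manifest:
        name = entry.get("name", "<unnamed>")
        version = str(entry.get("version", "")).strip()
        if version in FLOATING_VERSIONS:
            problems.append(f"{name}: version is not pinned")
        if not SHA256_HEX.fullmatch(str(entry.get("sha256", ""))):
            problems.append(f"{name}: missing or malformed sha256")
        if not entry.get("reviewed_by"):
            problems.append(f"{name}: no reviewer recorded")
    return problems
```

Run in CI, a non-empty result would fail the build, so an unpinned or unreviewed server can't reach production by accident.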

The version pinning reality

Here's what worries me most. Teams keep getting hit by legitimate server updates that introduce breaking changes because they're pulling "latest" without pinning. Not even malicious updates. Just incompatible ones. Now imagine an attacker doing that deliberately, hiding malicious code in what looks like a routine bug fix.

You need checksums. You need version locks. You need to know exactly what you approved and exactly what's running. Not "approximately" running. Exactly.

The approach that works: generate SHA-256 checksums for every MCP server binary and store them alongside the version manifest. On startup, compute the checksum of the installed binary and compare it against the approved one. A mismatch triggers an alert and the tool gets sandboxed until a human reviews it. It takes half a day to set up. Probably the cheapest security investment you'll make this year.
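The startup check described above can be sketched in a few lines of standard-library Python. The assumptions: each server is a single file under one install directory, and the approved checksums live in a dictionary keyed by file name. The function names sha256_of and startup_check are made up for this example, and what "sandboxed" means is left to the caller.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries don't load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def startup_check(approved: dict[str, str], install_dir: Path) -> list[str]:
    """Return the servers that must be held for human review.

    A server is flagged when its file is missing or when its SHA-256
    differs from the approved value.
    """
    flagged = []
    for name, expected in approved.items():
        artifact = install_dir / name
        if not artifact.is_file():
            flagged.append(name)
        elif sha256_of(artifact) != expected.strip().lower():
            flagged.append(name)
    return flagged
```

Anything startup_check returns should raise an alert and stay disabled until someone signs off. Servers shipped as packages or container images would need the digest of the archive or image instead, which this sketch doesn't cover.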

The honest problem

Right now, most organisations deploying agents have no governance framework for tool integration. They're moving fast, plugging in whatever MCP server solves the immediate problem, pushing to production. The security team finds out later, if at all.

Building a proper MCP security model means treating tool integration like you'd treat any other dependency supply chain issue. Because it is one. The tools to do this exist. OWASP published a practical guide. Checksums are free. Version pinning takes five minutes per server. What's missing is the discipline to actually do it before someone demonstrates what a poisoned tool can really do.

Related

We're Building MCP Servers Like They're Traditional APIs: why old API security patterns don't apply
Now It's Agent Skills: the same supply chain problem, new attack surface
Admin and Devs Are Great Targets: why developer tooling is under attack
