Learning from Deliberately Broken Agents

Most security guidance still expects developers to learn from description alone. In my experience, that only goes so far. People usually remember the bug they just reproduced more clearly than the paragraph they skimmed while trying to ship a feature.

That is why I like OWASP's Insecure Agent Samples. They do something much more useful than another polished list of principles. They give you deliberately insecure agent implementations and say, effectively, "go break this and see what happens." For most developers, that is a much better teacher.

If you want developers to internalise agent security, give them a broken agent to run, not another abstract warning they will forget tomorrow.

Why the broken version teaches better

The project's objective is explicit: show how insecure code and framework-specific misconfigurations can lead to vulnerabilities aligned with the OWASP Top 10 for GenAI and LLM Applications 2025. It does that across frameworks people actually use: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Swarm, and Amazon Bedrock Agents.

That matters because agent security failures are not just generic "AI issues". They are often framework-shaped.

One framework makes it easy to over-trust tool outputs. Another encourages broad tool definitions. Another has defaults that are convenient for a demo and terrible for production. A developer can read a secure-looking example for one stack, copy the pattern into another, and quietly create a problem they do not yet know how to name.

This is where deliberately broken samples pull ahead of normal guidance.

You can see how the failure actually appears in your stack. Not in theory. In code. In runtime behaviour. In a workflow you can inspect.

And once you have watched an agent accept loose tool parameters, carry unsafe context forward, or misuse a capability because the boundaries were never explicit, the lesson sticks in a way generic advice does not.
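
To make "loose tool parameters" concrete, here is a minimal sketch in plain Python rather than in any particular framework's API. The function names (read_file_loose, read_file_bounded), the directory layout, and the fake secret are all hypothetical, invented for illustration; none of this is taken from the OWASP samples. The first tool reads whatever path the model hands it. The second resolves the path against an explicit base directory and refuses anything that escapes it.

```python
import tempfile
from pathlib import Path


def read_file_loose(path: str) -> str:
    """Deliberately insecure: the model may name any path on disk."""
    return Path(path).read_text()


def read_file_bounded(base_dir: Path, relative_path: str) -> str:
    """Same capability, but the boundary is explicit and enforced in code."""
    base = base_dir.resolve()
    target = (base / relative_path).resolve()
    # After resolving ".." segments and symlinks, the target must still
    # live under the base directory. Absolute paths fail this check too.
    if not target.is_relative_to(base):
        raise PermissionError(f"{relative_path!r} is outside the allowed directory")
    return target.read_text()


def demo() -> dict:
    """Build a throwaway directory tree and try a traversal against both tools."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        docs = root / "docs"
        docs.mkdir()
        (docs / "faq.txt").write_text("public faq")
        (root / "secret.txt").write_text("api-key-123")  # hypothetical secret

        traversal = "../secret.txt"  # what a manipulated model might request
        results = {
            "loose": read_file_loose(str(docs / traversal)),
            "bounded_ok": read_file_bounded(docs, "faq.txt"),
        }
        try:
            read_file_bounded(docs, traversal)
            results["bounded_blocked"] = False
        except PermissionError:
            results["bounded_blocked"] = True
        return results


if __name__ == "__main__":
    print(demo())
```

Run it and the loose tool hands back the secret, while the bounded one raises before touching the file. The sketch assumes Python 3.9 or later, because it relies on Path.is_relative_to.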

When the team conversation finally gets useful

A lot of developers still hear "agent security" as a compliance flavour of "please be careful". That is too abstract to shape habits. But if you hand them a sample that demonstrates how an insecure configuration lets the agent cross a trust boundary, misuse a tool, or pass dangerous context downstream, the discussion gets concrete fast. Now you can ask the useful questions: which tools are too broad, which parameters are model-controlled when they should not be, which outputs are trusted too early, and which internal integrations are being treated as safe just because they are internal. Those are engineering questions, which is a much better place to be.
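
One of those questions, which parameters are model-controlled when they should not be, is easy to show in a few lines. What follows is a hedged sketch, not a pattern lifted from any of the listed frameworks: the Session class, the refund functions, and the REFUND_LIMIT cap are hypothetical names made up for this example. In the insecure version every argument, including whose account is affected, comes from model output. In the scoped version the identity comes from the application's authenticated session, and the model may supply only a bounded amount.

```python
from dataclasses import dataclass

REFUND_LIMIT = 50.0  # hypothetical per-call cap, chosen for illustration


@dataclass(frozen=True)
class Session:
    """Set by the application after authentication, never by the model."""
    customer_id: str


def refund_insecure(args: dict) -> str:
    # Every field is model-controlled, including the target account.
    return f"refunded {args['amount']} to {args['customer_id']}"


def refund_scoped(session: Session, args: dict) -> str:
    # The model may supply exactly one field; anything else is rejected.
    if set(args) != {"amount"}:
        raise ValueError(f"unexpected or missing arguments: {sorted(args)}")
    amount = float(args["amount"])
    if not (0 < amount <= REFUND_LIMIT):
        raise ValueError(f"amount {amount} is outside the allowed range")
    # The account comes from the trusted session, not from model output.
    return f"refunded {amount:.2f} to {session.customer_id}"


if __name__ == "__main__":
    session = Session(customer_id="alice")
    injected = {"customer_id": "victim-42", "amount": 9999}

    print(refund_insecure(injected))
    try:
        refund_scoped(session, injected)
    except ValueError as exc:
        print("blocked:", exc)
    print(refund_scoped(session, {"amount": 20}))
```

The design choice worth discussing with a team is not the specific cap. It is that the trust decision moves out of the prompt and into code the model cannot rewrite.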

I also like what this approach says about how the field is maturing. Agent security is finally producing things developers can run, test, and learn from directly. That is healthier than a world where everyone keeps publishing yet another top-ten-style list and hoping people translate it correctly into code.

If you are building with one of the supported frameworks, the most useful thing you can do is not just read the project page. Pull the samples. Find the one closest to your stack. Reproduce the failure. Then compare it against your own code and tool definitions.

You will probably recognise something.

That is not a criticism of your team. It is just what fast-moving ecosystems look like. Bad patterns spread quickly. Secure patterns usually arrive later, and with less fanfare.
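
For the comparison against your own tool definitions, a rough first pass can be automated. The sketch below assumes your tools are described as JSON-Schema-style dictionaries with "name" and "parameters" keys, which is a common shape but not universal, so adjust it to whatever your framework actually emits. The audit_tool function, the two example tools, and the heuristics themselves (flag undeclared parameters, flag strings with no enum, pattern, or maxLength) are my own illustrative assumptions, not rules from OWASP.

```python
CONSTRAINT_KEYS = ("enum", "pattern", "maxLength")


def audit_tool(tool: dict) -> list[str]:
    """Return human-readable findings for one tool definition."""
    findings = []
    name = tool["name"]
    params = tool.get("parameters", {})
    # JSON Schema allows extra properties unless this is explicitly False.
    if params.get("additionalProperties", True) is not False:
        findings.append(f"{name}: accepts undeclared parameters")
    for param, spec in params.get("properties", {}).items():
        is_string = spec.get("type") == "string"
        constrained = any(key in spec for key in CONSTRAINT_KEYS)
        if is_string and not constrained:
            findings.append(f"{name}.{param}: unconstrained string parameter")
    return findings


def audit_tools(tools: list[dict]) -> list[str]:
    """Audit every tool and collect the findings in order."""
    return [finding for tool in tools for finding in audit_tool(tool)]


# Hypothetical tool definitions, invented for this example.
EXAMPLE_TOOLS = [
    {
        "name": "run_query",
        "parameters": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
        },
    },
    {
        "name": "get_order_status",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "pattern": "^[A-Z]{2}[0-9]{6}$"},
            },
            "additionalProperties": False,
        },
    },
]

if __name__ == "__main__":
    for finding in audit_tools(EXAMPLE_TOOLS):
        print(finding)
```

A script like this will not find real vulnerabilities on its own. It only points you at the tools worth reading closely next to the broken sample.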

So yes, read the guidance, but do not stop there. Run the broken thing. Security becomes much more concrete when you can watch the failure happen in front of you.

Related

Autonomy Is the Threat Model, for why the LLM Top 10 wasn't enough for agents
Securing Your LLM Applications with the OWASP Top 10, for the LLM-focused risks
