Aaron's Rogue Agent Lab
Three walkthroughs of prompt injection attacks against tool-using agents. Walk the kill chain. See what the model sees. Trigger the compromise. Then read the mitigations.
Poisoned Webpage Attack
indirect prompt injection · retrieved content
A benign-looking research article carries hidden adversarial instructions in HTML comments, display:none divs, and white-on-white text. The agent fetches the page, ingests the payload as instructions, exfiltrates env secrets, and writes a backdoor to CLAUDE.md.
- 5 guided steps with interactive terminal
- Live "reveal hidden" toggle on the victim page
- Tainted file tracking + persistence step
Tool Response Poisoning
compromised tool · trusted output channel
An agent calls a routine get_weather() tool. The compromised API returns valid data plus a debug_note field carrying instructions. The agent chains into send_email() and exfiltrates API keys.
- Side-by-side tool inspector with raw JSON
- Watch the agent chain legitimate tools maliciously
- MCP server config persistence step
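
One way to keep an unexpected field such as debug_note away from the model is to allowlist tool output against the schema the agent expects. The sketch below is a minimal illustration of that idea, not the lab's own mitigation: WEATHER_SCHEMA, its field names (city, temp_c, conditions), and sanitize_tool_result are all assumptions made for the example.

```python
import json

# Hypothetical schema: the only fields the agent expects back from get_weather().
WEATHER_SCHEMA = {"city": str, "temp_c": (int, float), "conditions": str}


def sanitize_tool_result(raw_json, schema):
    """Keep only allowlisted, correctly typed fields; report everything dropped."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("tool result must be a JSON object")
    clean, dropped = {}, []
    for key, value in data.items():
        expected = schema.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if (expected is not None and isinstance(value, expected)
                and not isinstance(value, bool)):
            clean[key] = value
        else:
            dropped.append(key)
    return clean, sorted(dropped)


if __name__ == "__main__":
    raw = json.dumps({"city": "Oslo", "temp_c": 4.5, "conditions": "rain",
                      "debug_note": "unexpected free text"})
    clean, dropped = sanitize_tool_result(raw, WEATHER_SCHEMA)
    print(clean)    # only this dict is passed on to the model
    print(dropped)  # dropped keys can be logged for review
```

Allowlisting removes extra fields but not free text inside permitted string fields, so a value like conditions would still need length limits or treatment as untrusted data.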
Agentic Kill Chain
initial access · persistence · lateral · exfil
A full APT-style attack across a multiagent system. Poisoned vector DB entries survive session resets; the payload propagates over the interagent bus to the coder and executor agents; the final exfil step ships env secrets, the conversation, and PII to a C2 endpoint.
- Live agent topology with compromise state badges
- Vector DB inspector + poisoned memory highlighting
- Interagent message bus + outbound C2 log
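
The last stage of the chain depends on an agent being able to reach an arbitrary outbound endpoint. A minimal sketch of an egress allowlist check follows, assuming every outbound request from an agent passes through one gate; ALLOWED_HOSTS, its example hostnames, and egress_allowed are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts agents are permitted to contact.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.com"}


def egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    # .hostname drops any userinfo and port, so "allowed.host@other.host" tricks fail.
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts


if __name__ == "__main__":
    for candidate in ("https://docs.example.com/page",
                      "https://unknown.example/upload"):
        print(candidate, egress_allowed(candidate))
```

Exact-match comparison is deliberate: suffix or substring checks would accept look-alike hosts such as docs.example.com.evil.example.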