Threat landscape for AI agents: prompt injection, data protection, misuse, and systemic risks.
14 resources
UK ICO tech-futures analysis of how agentic AI interacts with UK GDPR, covering lawful basis for agent-initiated processing, data minimisation across tool calls, transparency duties, and accountability when agents act on behalf of data subjects.
Independent expert panel report chaired by Yoshua Bengio for UK DSIT, synthesising evidence on general-purpose AI capabilities, risks, and mitigations. 2026 edition expands coverage of agentic systems, loss-of-control scenarios, and emerging misuse patterns.
Research showing that Claude's Skills feature, which auto-loads Markdown instructions from the filesystem, enables trivial prompt injection via a single malicious file. Demonstrates data exfiltration and privilege escalation across common agent deployments.
OWASP Gen AI Security Project's top-ten list of agentic application risks for 2026, covering memory poisoning, tool misuse, privilege compromise, intent breaking, goal manipulation, and identity spoofing. Includes example attacks and suggested controls per risk.
OWASP reference mapping the top agentic AI threats to concrete technical and procedural mitigations, organised by attack surface (planning, memory, tools, outputs). Aimed at defenders building secure agent stacks rather than researchers cataloguing attacks.
Oliver Patel's practitioner essay flagging how agents break assumptions in enterprise AI governance: autonomous tool calls, emergent multi-agent behaviour, and diffuse accountability. Suggests extensions to risk registers, oversight roles, and policy controls.
Wang et al. survey proposing a taxonomy of prompt injection threats specific to LLM agents, distinguishing direct, indirect, and tool-mediated vectors. Analyses defences (sandboxing, detection, constrained decoding) against reported attack success rates.
IEEE Spectrum article covering research showing agents violate assigned constraints under everyday pressures like deadlines or user insistence. Summarises findings from multiple benchmark studies and discusses implications for deployment in regulated settings.
Shapira et al. document emergent failure modes in multi-agent LLM deployments, including cascading hallucinations, role drift, and collusion. Propose experimental setups to reproduce chaotic behaviour and measure its dependence on agent count and coupling.
Dutch Data Protection Authority warning on security and privacy risks of agent platforms like OpenClaw, flagging unscoped data access, weak logging, and inability to honour data subject rights when agents act across multiple systems.
Cloud Security Alliance's MAESTRO threat-modelling methodology for multi-agent and agentic systems, extending STRIDE-style analysis across seven architectural layers (foundation models, data operations, agent frameworks, deployment infrastructure, evaluation and observability, security and compliance, agent ecosystem) with example threats and controls.
UC Berkeley CLTC report setting out a risk-management approach for increasingly autonomous AI agents, covering risk identification across the lifecycle, oversight mechanisms, and organisational roles. Aimed at enterprise and public-sector deployers.
Mitchell et al. (Hugging Face) argue against developing fully autonomous AI agents, mapping a spectrum from human-in-the-loop assistants to unsupervised actors. Enumerates safety, ethical, and accountability risks that grow sharply at each autonomy level.
MITRE ATLAS knowledge base of adversary tactics, techniques, and case studies targeting machine-learning systems, including agent-specific scenarios like prompt injection, tool abuse, and model-in-the-loop manipulation. Structured in ATT&CK-compatible format for defenders.