Risks and challenges
Threat landscape for AI agents: prompt injection, data protection, misuse, and systemic risks.
14 resources
ICO tech futures: Agentic AI
UK ICO tech-futures analysis of how agentic AI interacts with UK GDPR, covering lawful basis for agent-initiated processing, data minimisation across tool calls, transparency duties, and accountability when agents act on behalf of data subjects.
International AI Safety Report 2026
Report from an independent expert panel chaired by Yoshua Bengio for UK DSIT, synthesising evidence on general-purpose AI capabilities, risks, and mitigations. The 2026 edition expands coverage of agentic systems, loss-of-control scenarios, and emerging misuse patterns.
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
Research showing that Claude's Skills feature, which auto-loads Markdown instructions from the filesystem, enables trivial prompt injection via a single malicious file. Demonstrates data exfiltration and privilege escalation across common agent deployments.
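The core hazard the paper describes is instructions auto-loaded from the filesystem being trusted implicitly. One way to narrow it is to treat skill files as untrusted until reviewed. A minimal sketch of an allowlist-by-content-hash gate; the scheme, file names, and function names are illustrative, not taken from the paper:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Content hash recorded when a human reviews and approves a skill file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical review step: only skills whose exact content was approved load.
REVIEWED_SKILL = "# Summarise\nSummarise the input in three bullet points."
APPROVED = {"summarise.md": fingerprint(REVIEWED_SKILL)}

def load_skill(name: str, content: str, allowlist: dict[str, str]) -> str:
    """Refuse to inject instructions that were never reviewed, so a single
    tampered Markdown file cannot silently steer the agent."""
    if allowlist.get(name) != fingerprint(content):
        raise PermissionError(f"unreviewed or modified skill: {name}")
    return content
```

Pinning exact content (rather than file paths) means any post-review edit to a skill file, malicious or not, fails closed.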
OWASP Top 10 for Agentic Applications for 2026
OWASP Gen AI Security Project's top-ten list of agentic application risks for 2026, covering memory poisoning, tool misuse, privilege compromise, intent breaking, goal manipulation, and identity spoofing. Includes example attacks and suggested controls per risk.
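Several of the listed risks, notably tool misuse and privilege compromise, reduce to the same control: an agent should only be able to invoke tools it was explicitly granted. A deny-by-default sketch of that check; the agent names, tool names, and data structure are hypothetical, not drawn from the OWASP document:

```python
# Hypothetical per-agent tool grants; a real deployment would load these
# from policy configuration rather than hard-code them.
TOOL_GRANTS = {
    "researcher": {"web_search", "read_file"},
    "writer": {"read_file"},
}

def authorize_tool_call(agent: str, tool: str) -> None:
    """Deny-by-default gate run before any tool call is dispatched.
    Raises PermissionError unless the grant exists."""
    if tool not in TOOL_GRANTS.get(agent, set()):
        raise PermissionError(f"agent {agent!r} is not granted tool {tool!r}")
```

Running the gate at the dispatch layer, rather than trusting the model to self-restrict, keeps the control effective even when the agent's goals have been manipulated.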
OWASP GenAI Security Project: Top 10 Risks and Mitigations for Agentic AI Security
OWASP reference mapping the top agentic AI threats to concrete technical and procedural mitigations, organised by attack surface (planning, memory, tools, outputs). Aimed at defenders building secure agent stacks rather than researchers cataloguing attacks.
Initial reflections on agentic AI governance
Oliver Patel's practitioner essay flagging how agents break assumptions in enterprise AI governance: autonomous tool calls, emergent multi-agent behaviour, and diffuse accountability. Suggests extensions to risk registers, oversight roles, and policy controls.
The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis
Wang et al. survey proposing a taxonomy of prompt injection threats specific to LLM agents, distinguishing direct, indirect, and tool-mediated vectors. Analyses defences (sandboxing, detection, constrained decoding) against reported attack success rates.
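The indirect and tool-mediated vectors in that taxonomy share a property simple defences exploit: the payload arrives in content the agent should treat as data, not instructions. A naive keyword heuristic along those lines; the patterns are illustrative only, and the defences the survey analyses (sandboxing, detection models, constrained decoding) are considerably stronger:

```python
import re

# Illustrative patterns only; real injection payloads vary far more widely.
SUSPICIOUS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (the )?system prompt",
]

def looks_like_injection(untrusted_text: str) -> bool:
    """Flag retrieved documents or tool outputs containing instruction-like
    phrasing before they are placed into the model context."""
    return any(re.search(p, untrusted_text, re.IGNORECASE) for p in SUSPICIOUS)
```

A heuristic like this is cheap to run on every tool result, but as the survey's attack-success figures suggest, it should gate escalation to stronger controls rather than serve as the sole defence.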
AI Agents Break Rules Under Everyday Pressure
IEEE Spectrum article covering research showing agents violate assigned constraints under everyday pressures like deadlines or user insistence. Summarises findings from multiple benchmark studies and discusses implications for deployment in regulated settings.
Agents of Chaos
Shapira et al. document emergent failure modes in multi-agent LLM deployments, including cascading hallucinations, role drift, and collusion. Propose experimental setups to reproduce chaotic behaviour and measure its dependence on agent count and coupling.
AP warns of major security risks with AI agents like OpenClaw
Dutch Data Protection Authority warning on security and privacy risks of agent platforms like OpenClaw, flagging unscoped data access, weak logging, and inability to honour data subject rights when agents act across multiple systems.
Agentic AI Threat Modeling Framework: MAESTRO
Cloud Security Alliance's MAESTRO threat-modelling methodology for multi-agent and agentic systems, extending STRIDE-style analysis across seven architectural layers (foundation models, data operations, agent frameworks, deployment and infrastructure, evaluation and observability, security and compliance, agent ecosystem) with example threats and controls.
Managing Risks of Agentic AI
UC Berkeley CLTC report setting out a risk-management approach for increasingly autonomous AI agents, covering risk identification across the lifecycle, oversight mechanisms, and organisational roles. Aimed at enterprise and public-sector deployers.
Fully Autonomous AI Agents Should Not be Developed
Mitchell et al. (Hugging Face) argue against developing fully autonomous AI agents, mapping a spectrum from human-in-the-loop assistants to unsupervised actors. They enumerate safety, ethical, and accountability risks that grow sharply at each step up the autonomy spectrum.
MITRE ATLAS: adversary tactics and techniques against AI
MITRE ATLAS knowledge base of adversary tactics, techniques, and case studies targeting machine-learning systems, including agent-specific scenarios like prompt injection, tool abuse, and model-in-the-loop manipulation. Structured in ATT&CK-compatible format for defenders.