Microsoft's groundbreaking whitepaper transforms theoretical AI safety concerns into a practical field guide for understanding how AI agents actually fail in the wild. Born from extensive internal red teaming exercises, this taxonomy doesn't just list potential problems—it categorizes real failure modes observed when AI agents interact with systems, make decisions, and operate with varying degrees of autonomy. The research provides a structured framework for identifying, categorizing, and ultimately preventing the kinds of failures that occur when AI moves beyond simple question-answering into complex, multi-step task execution.
Unlike academic risk assessments that theorize about potential AI failures, this taxonomy emerged from Microsoft's hands-on red teaming activities—essentially organized attempts to make AI agents fail in controlled environments. This approach reveals failure modes that only surface when AI agents are actively trying to accomplish goals, interact with APIs, navigate security boundaries, and make sequential decisions. The result is a classification system grounded in observed behaviors rather than hypothetical scenarios, making it invaluable for teams building or deploying agentic AI systems.
The taxonomy organizes AI agent failures into distinct categories that reflect how these systems actually break down in practice, spanning security-oriented failures such as boundary violations as well as safety-oriented failures such as goal misalignment.
Start by mapping your AI agent's intended capabilities against the taxonomy's failure categories to identify which failure modes are most relevant to your specific use case. Focus initial testing efforts on the failure types most likely to cause significant impact in your environment—boundary violations might be critical for enterprise deployments, while goal misalignment could be paramount for customer-facing agents.
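As a rough illustration of that mapping step, the sketch below shows one way to relate an agent's capabilities to taxonomy-style failure categories and rank where to focus testing. The capability labels, category names, and impact weights are hypothetical placeholders, not taken from the whitepaper.

```python
# Hypothetical sketch: map an agent's capabilities to taxonomy-style failure
# categories and rank them for test planning. All names are illustrative.

# Which failure categories tend to apply to which agent capabilities.
CAPABILITY_RISKS = {
    "calls_external_apis": ["boundary_violation", "tool_misuse"],
    "executes_multi_step_plans": ["goal_misalignment", "error_propagation"],
    "handles_user_data": ["data_exposure", "boundary_violation"],
}

# Rough impact weighting for a given deployment (enterprise, customer-facing, ...).
IMPACT_WEIGHT = {
    "boundary_violation": 3,
    "data_exposure": 3,
    "goal_misalignment": 2,
    "tool_misuse": 2,
    "error_propagation": 1,
}

def prioritize(agent_capabilities):
    """Return failure categories to test first, highest weighted impact first."""
    scores = {}
    for capability in agent_capabilities:
        for category in CAPABILITY_RISKS.get(capability, []):
            scores[category] = scores.get(category, 0) + IMPACT_WEIGHT.get(category, 1)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    print(prioritize(["calls_external_apis", "handles_user_data"]))
    # e.g. ['boundary_violation', 'data_exposure', 'tool_misuse']
```

The point of the sketch is only that prioritization can be made explicit and repeatable; the actual categories and weights should come from the taxonomy itself and from your own impact assessment.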
Use the taxonomy as a checklist during design reviews, ensuring each category is explicitly considered and addressed through technical controls, monitoring, or operational procedures. The framework works best when integrated into existing development workflows rather than treated as a separate safety exercise.
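In practice, the checklist can be as lightweight as requiring an explicit entry per failure category during design review, as in the hypothetical sketch below (category names, controls, and owners are placeholders, not the whitepaper's terms).

```python
# Hypothetical design-review checklist: every failure category must record how
# it is addressed (technical control, monitoring, or procedure) or be flagged
# as not yet addressed. All entries are illustrative placeholders.

REVIEW_CHECKLIST = {
    "boundary_violation": {"control": "allow-list of callable APIs", "owner": "platform"},
    "goal_misalignment": {"control": "human approval for irreversible actions", "owner": "product"},
    "error_propagation": {"control": None, "owner": None},  # not yet addressed
}

def unaddressed(checklist):
    """List categories with no documented control, for the review record."""
    return [name for name, entry in checklist.items() if not entry.get("control")]

print(unaddressed(REVIEW_CHECKLIST))  # ['error_propagation']
```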
This taxonomy reflects failure modes observed in Microsoft's specific testing environments and may not capture every possible failure scenario across all AI agent architectures. The research focuses on current AI agent capabilities and may need updates as agentic AI systems become more sophisticated. Additionally, while the taxonomy excels at categorizing technical failures, it provides less guidance on organizational or process failures that can compound technical risks.
Published
2025
Jurisdiction
Global
Category
Risk taxonomies
Access
Public access