Research · Active
Prioritizing Real-Time Failure Detection in AI Agents
Madhulika Srikumar et al., Partnership on AI
Partnership on AI paper arguing that offline evaluation is insufficient and detailing methods for real-time failure detection in deployed agents: anomaly scoring on trajectories, tool-call validation, and escalation triggers. Includes recommendations for deployers and auditors.
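To make the three named techniques concrete, here is a minimal Python sketch of a runtime monitor. It is not the paper's method. The tool schemas, the step-count anomaly score, the z-score threshold of 3, and the rule that a single invalid tool call triggers escalation are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical allow-list: tool name -> required argument names and types.
# Illustrative assumption only; not taken from the paper.
TOOL_SCHEMAS = {
    "search": {"query": str},
    "send_email": {"to": str, "body": str},
}


def validate_tool_call(name, args):
    """Tool-call validation: return a list of problems (empty if the call is valid)."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    problems = []
    for key, typ in schema.items():
        if key not in args:
            problems.append(f"missing argument: {key}")
        elif not isinstance(args[key], typ):
            problems.append(f"bad type for {key}")
    for key in args:
        if key not in schema:
            problems.append(f"unexpected argument: {key}")
    return problems


@dataclass
class TrajectoryMonitor:
    baseline_lengths: list  # step counts observed in known-good runs (assumed available)
    z_threshold: float = 3.0  # hypothetical anomaly threshold
    max_invalid: int = 1  # hypothetical: escalate after this many invalid calls
    steps: int = 0
    invalid_calls: int = 0

    def anomaly_score(self):
        """Crude trajectory anomaly score: z-score of the current step count vs. the baseline."""
        mu = mean(self.baseline_lengths)
        sigma = pstdev(self.baseline_lengths) or 1.0  # avoid dividing by zero
        return (self.steps - mu) / sigma

    def observe(self, name, args):
        """Record one proposed tool call; return 'escalate' or 'continue'."""
        self.steps += 1
        if validate_tool_call(name, args):
            self.invalid_calls += 1
        # Escalation trigger: too many invalid calls, or an unusually long trajectory.
        if self.invalid_calls >= self.max_invalid or self.anomaly_score() > self.z_threshold:
            return "escalate"
        return "continue"


monitor = TrajectoryMonitor(baseline_lengths=[4, 5, 6, 5, 5])
print(monitor.observe("search", {"query": "quarterly report"}))  # continue
print(monitor.observe("delete_all", {}))  # escalate: unknown tool
```

Scoring on trajectory length alone is deliberately crude; a deployed monitor would likely score richer features of the trajectory, and the escalation target (a human reviewer, a halt, a rollback) is left open here.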
Tags: agentic AI, evaluation
At a glance
Published: 2025
Jurisdiction: International
Category: Evaluation and benchmarks
Access: Public access