Madhulika Srikumar et al., Partnership on AI
A Partnership on AI paper arguing that offline evaluation is insufficient and detailing methods for real-time failure detection in deployed agents: anomaly scoring on trajectories, tool-call validation, and escalation triggers. It includes recommendations for deployers and auditors.
Published: 2025
Jurisdiction: International
Category: Evaluation and benchmarks
Access: Public access