Madhulika Srikumar et al., Partnership on AI
research, active

Prioritizing Real-Time Failure Detection in AI Agents


A Partnership on AI paper arguing that offline evaluation is insufficient and detailing methods for real-time failure detection in deployed agents: anomaly scoring on trajectories, tool-call validation, and escalation triggers. It includes recommendations for deployers and auditors.

Tags

agentic AI, evaluation

At a glance

Published: 2025

Jurisdiction: International

Category: Evaluation and benchmarks

Access: Public access
