arXiv
researchactive

AgentRx: Diagnosing AI Agent Failures from Execution Trajectories

View original resource

Introduces the AGENTRY benchmark for attributing the first unrecoverable failure in AI agents across API workflows, incident management, and real-world web/file tasks, supporting trajectory-based diagnosis of why agents fail.

Tags

agentic AIevaluationfailure detectiontrajectories

At a glance

Published

2026

Jurisdiction

Global

Category

Evaluation and benchmarks

Access

Public access

Build your AI governance program

VerifyWise helps you implement AI governance frameworks, track compliance, and manage risk across your AI systems.

AgentRx: Diagnosing AI Agent Failures from Execution Trajectories | VerifyWise AI Governance Library