arXiv
research · active

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework


Presents a taxonomy of seven failure modes unique to production agentic systems, shows where standard metrics miss each one, and proposes a production evaluation framework for catching drift and silent failures in deployed agents.

Tags: agentic AI, evaluation, failure modes, production

At a glance

Published: 2026
Jurisdiction: Global
Category: Evaluation and benchmarks
Access: Public access

