research, active
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
Princeton’s proposal for a holistic agent leaderboard, arguing that bare-model, vendor-scaffolded, and full-system results diverge by 30–50 points and that fair agent evaluation needs standardized infrastructure across scaffolds and environments.
Tags
agentic AI, evaluation, benchmarks, leaderboard
At a glance
Published: 2025
Jurisdiction: Global
Category: Evaluation and benchmarks
Access: Public access