research | active
GAIA: A Benchmark for General AI Assistants
A benchmark of real-world questions that are conceptually simple for humans but require agents to chain together reasoning, web browsing, tool use and multimodality to answer. GAIA is widely used to compare assistant-style agents, and the gap between human and model performance makes it a practical governance reference point for what agents can and cannot yet do reliably.
Tags
agentic AI, evaluation, benchmark
At a glance
Published: 2023
Jurisdiction: International
Category: Evaluation and benchmarks
Access: Public access