Mialon et al. (Meta AI, Hugging Face and collaborators)
research · active

GAIA: A Benchmark for General AI Assistants

A benchmark of 466 real-world questions that are conceptually simple for humans but require an agent to chain together reasoning, web browsing, tool use and multimodality to answer. GAIA is widely used to compare assistant-style agents. The large gap between human and model performance (92% for human respondents versus 15% for GPT-4 equipped with plugins in the original paper) makes it a practical governance reference point for what agents can and cannot yet do reliably.

Tags

agentic AI, evaluation, benchmark

At a glance

Published: 2023
Jurisdiction: International
Category: Evaluation and benchmarks
Access: Public access
