
GAIA: A Benchmark for General AI Assistants

Mialon et al.


Mialon et al. present GAIA, a benchmark of 466 questions requiring multi-step reasoning, web browsing, multimodal understanding, and tool use. The questions are designed so that humans solve 92% of them while GPT-4 equipped with plugins solves only 15%, highlighting the gap between current models and general assistant capability.
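GAIA answers are short strings or numbers scored by (quasi-)exact match against a single ground truth. A minimal sketch of such a scorer, assuming simple normalization rules (illustrative only, not the official GAIA evaluation code):

```python
import re

def normalize(ans: str) -> str:
    # Lowercase, trim, drop English articles and punctuation so that
    # formatting differences are not counted as errors.
    ans = ans.strip().lower()
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)
    ans = re.sub(r"[^\w\s.]", "", ans)
    return " ".join(ans.split())

def score(predictions: list[str], gold: list[str]) -> float:
    # Fraction of questions whose normalized answer matches exactly.
    correct = sum(normalize(p) == normalize(g)
                  for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `score(["The Eiffel Tower", "41"], ["eiffel tower", "42"])` returns 0.5, since only the first answer matches after normalization.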

Tags

agentic AI, evaluation

At a glance

Published

2023

Jurisdiction

International

Category

Evaluation and benchmarks

Access

Public access

