Sierra Research

tau-bench: A benchmark for tool-agent-user interaction



Sierra Research's open benchmark for tool-using agents. It simulates airline and retail customer-service tasks in which an LLM plays the user and the agent acts through rule-based APIs while following domain policies. It measures task success, adherence to policy, and consistency across repeated trials of the same task.
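The tau-bench paper reports consistency with a metric called pass^k: the chance that all k independent trials of a task succeed, averaged over tasks and estimated from n >= k trials per task as C(c, k) / C(n, k), where c is the number of successful trials. The sketch below is a minimal illustration assuming you already have per-task success counts; the function name pass_hat_k is hypothetical and not part of the tau-bench codebase.

```python
from math import comb
from typing import Sequence


def pass_hat_k(successes: Sequence[int], n_trials: int, k: int) -> float:
    """Hypothetical helper: estimate pass^k averaged over tasks.

    successes[i] is the number of successful trials (c) out of n_trials (n)
    for task i. Each task contributes C(c, k) / C(n, k), the probability that
    k trials drawn without replacement from its n recorded trials all succeed.
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must satisfy 1 <= k <= n_trials")
    if not successes:
        raise ValueError("need at least one task")
    total = 0.0
    for c in successes:
        if not 0 <= c <= n_trials:
            raise ValueError("each success count must lie in [0, n_trials]")
        # comb(c, k) is 0 when c < k, so such tasks add nothing at this k
        total += comb(c, k) / comb(n_trials, k)
    return total / len(successes)


# Three tasks, four trials each, with 4, 2 and 0 successes.
counts = [4, 2, 0]
for k in (1, 2, 4):
    print(k, pass_hat_k(counts, n_trials=4, k=k))
```

With k = 1 the estimate equals the ordinary average success rate (0.5 here), and it can only fall as k grows (1/3 at k = 4), which is how the metric separates agents that succeed reliably from agents that succeed only occasionally.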

Tags

agentic AI, evaluation

At a glance

Published

2024

Jurisdiction

International

Category

Evaluation and benchmarks

Access

Public access

