dataset
active
WebArena: A Realistic Web Environment for Building Autonomous Agents
Zhou et al.
Zhou et al. introduce a self-hosted web environment covering e-commerce, forums, software development, and CMS apps, with 812 natural-language tasks. The benchmark evaluates end-to-end browsing agents on realistic multi-step workflows with verifiable outcomes.
Tags
agentic AI, evaluation
At a glance
Published
2023
Jurisdiction
International
Category
Evaluation and benchmarks
Access
Public access