How enterprises are building, deploying, and measuring AI agents in production.
12 resources
Anthropic survey of enterprises building on Claude, reporting adoption patterns, top use cases (coding, research, customer operations), deployment models, and common blockers around evaluation, cost, and access controls for tool-using agents in production.
Anthropic report on how engineering teams use coding agents in 2026, with data on task types, languages, session length, and review practices. Highlights the shift from autocomplete to delegated, multi-step code changes.
Pan et al. propose a measurement framework for production agents covering task success, trajectory quality, cost, latency, and regression detection. They argue that offline benchmarks miss drift and tool-call errors, and outline continuous evaluation over live traffic.
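The metrics the framework tracks can be sketched as a small aggregation over logged trajectories. This is an illustrative assumption about the schema (fields like `tool_errors` and the 5% regression tolerance are hypothetical, not taken from the paper):

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trajectory:
    # One logged agent run; field names are illustrative, not the paper's schema.
    success: bool
    tool_errors: int    # malformed or failed tool calls along the trajectory
    latency_s: float
    cost_usd: float

def summarize(trajs):
    """Aggregate the live-traffic metrics the framework tracks."""
    return {
        "task_success": mean(t.success for t in trajs),
        "tool_error_rate": mean(t.tool_errors > 0 for t in trajs),
        "mean_latency_s": mean(t.latency_s for t in trajs),
        "mean_cost_usd": mean(t.cost_usd for t in trajs),
    }

def regressed(baseline, live, tolerance=0.05):
    """Flag a regression when live success drops more than `tolerance`
    below the offline-benchmark baseline."""
    return baseline["task_success"] - live["task_success"] > tolerance
```

Running `summarize` on a sliding window of production traces, then comparing against the offline baseline with `regressed`, is the continuous-evaluation loop in miniature.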
LangChain's survey of 1,000+ practitioners on agent engineering practice, covering framework choice, observability stacks, evaluation approaches, and the biggest blockers in moving from prototype to production. Tracks year-over-year shifts in tool use and deployment maturity.
McKinsey QuantumBlack annual AI survey, with a 2025 focus on agents: adoption by function, value captured, organisational redesign, and the small share of firms reporting bottom-line impact. Includes benchmarks for governance maturity.
UK AI Security Institute report on frontier model capability and deployment trends, including tool use, long-horizon tasks, and agentic scaffolding. Summarises AISI evaluation results and flags capability thresholds that change the threat picture.
Domino Data Lab overview of operational risks enterprises hit when deploying agents at scale: data leakage through tools, identity sprawl, non-deterministic cost, weak audit trails, and accountability gaps. Proposes a governance checklist for platform teams.
Microsoft Azure engineering post laying out five observability practices for production agents: full-trace logging, evaluation-driven CI, quality metrics, safety monitoring, and operational telemetry. Shows concrete implementations using Azure AI Foundry tracing.
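The first of those practices, full-trace logging, can be illustrated with a generic span recorder. This is not the Azure AI Foundry tracing API, just a minimal framework-agnostic sketch of the idea (an in-memory list stands in for a real trace exporter):

```python
import time
import uuid
from contextlib import contextmanager

TRACE_LOG = []  # stand-in for an exporter; production systems ship spans to a backend

@contextmanager
def span(name, **attrs):
    """Record one step of an agent run (model call, tool call, guardrail check)
    with timing and attributes, so full trajectories can be replayed later."""
    record = {"id": uuid.uuid4().hex, "name": name, "attrs": attrs,
              "start": time.time()}
    try:
        yield record
    finally:
        record["duration_s"] = time.time() - record["start"]
        TRACE_LOG.append(record)

# Usage: wrap each step of the agent loop so nothing is invisible to monitoring.
with span("tool_call", tool="search", query="q3 revenue"):
    pass  # the agent would invoke the tool here
```

The same spans then feed the other practices: evaluation-driven CI replays them, and safety monitoring and operational telemetry aggregate them.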
Permit.io practitioner guide on inserting human approvals into agent workflows, covering interruption patterns, approval queues, and permission models. Compares implementations in LangGraph, CrewAI, and AutoGen, with a demo app gating sensitive tool calls.
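The interruption pattern the guide describes can be sketched framework-agnostically: sensitive tool calls pause the run and enter an approval queue instead of executing. The names here (`SENSITIVE_TOOLS`, `ApprovalQueue`) are illustrative, not Permit.io's or any framework's API:

```python
from queue import Queue

# Hypothetical policy: which tool calls require a human in the loop.
SENSITIVE_TOOLS = {"send_email", "delete_record", "issue_refund"}

class ApprovalQueue:
    """Holds interrupted tool calls until a reviewer approves or rejects them."""
    def __init__(self):
        self.pending = Queue()

    def submit(self, tool, args):
        self.pending.put((tool, args))

    def review(self, approve):
        tool, args = self.pending.get()
        return (tool, args) if approve else None

def call_tool(tool, args, queue, tools):
    # Interruption pattern: sensitive calls pause the run pending approval;
    # everything else executes immediately.
    if tool in SENSITIVE_TOOLS:
        queue.submit(tool, args)
        return {"status": "pending_approval"}
    return {"status": "done", "result": tools[tool](**args)}
```

A reviewer drains the queue out of band; approved calls are re-dispatched, rejected ones are surfaced back to the agent as errors.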
Databricks engineering post introducing Agent Bricks as a governed platform for building, evaluating, and monitoring enterprise agents. Covers Unity Catalog-backed permissions, AI Gateway policies, and synthetic evaluation pipelines tied to production telemetry.
CIO Dive report summarising Salesforce and Databricks launches of agentic governance tooling, including Agentforce permission sets, audit trails, and Databricks Mosaic AI Gateway. Frames vendor moves as responses to enterprise concerns about sprawl and accountability.
OpenAI's Python SDK for orchestrating agents with tools, handoffs between specialised sub-agents, guardrails for input and output, and tracing. Successor to the experimental Swarm SDK, used to build both single-agent and manager-pattern systems.
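The handoff and manager patterns the SDK supports can be illustrated in plain Python. This is a structural sketch only, not the SDK's actual classes or signatures: a triage agent delegates to the first specialised sub-agent that claims the task.

```python
class Agent:
    """Illustrative stand-in for an agent with optional handoffs."""
    def __init__(self, name, handles, handoffs=()):
        self.name = name
        self.handles = handles          # predicate: can this agent take the task?
        self.handoffs = list(handoffs)  # specialised sub-agents to delegate to

    def run(self, task):
        # Manager pattern: try to hand off before handling the task itself.
        for sub in self.handoffs:
            if sub.handles(task):
                return sub.run(task)
        return f"{self.name}: handled {task!r}"

# Hypothetical routing: billing takes refunds, support takes everything else.
billing = Agent("billing", lambda t: "refund" in t)
support = Agent("support", lambda t: True)
triage = Agent("triage", lambda t: True, handoffs=[billing, support])
```

In the real SDK the routing decision is made by the model rather than a predicate, and guardrails run on the inputs and outputs of each agent in the chain.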