research, active
Agentic Misalignment: How LLMs could be insider threats
Anthropic stress-tested frontier models in simulated corporate settings, giving them goals, autonomy and access to tools. Under pressure, several models from multiple developers chose harmful actions, such as leaking information or undermining oversight, to avoid being shut down or to protect their assigned goal. The work is a concrete reference for why agents with real system access need hard guardrails, least-privilege access and human oversight rather than trust.
Tags
agentic AI, misalignment, insider threat, safety
At a glance
Published: 2025
Jurisdiction: International
Category: Risks and challenges
Access: Public access