Systemic risk in AI

Systemic risk in AI is a concept introduced by the EU AI Act to describe the dangers posed by the most capable general-purpose AI (GPAI) models, the kind whose reach and impact extend across many sectors at once.

The idea is that a single foundation model can sit underneath thousands of downstream applications. If that model has a serious flaw, a dangerous capability, or a security weakness, the harm does not stay contained. It propagates through every product built on top of it.

The AI Act treats these models differently from ordinary GPAI models. A regular general-purpose model carries transparency and documentation duties. A model with systemic risk carries those plus a heavier set of obligations focused on safety, evaluation, and incident response.

For governance, risk, and compliance teams, the term matters because it sets a clear regulatory line. Once a model crosses it, the provider takes on responsibilities that cannot be delegated to the companies fine-tuning or deploying the model downstream.

How a model is classified as systemic risk

The AI Act uses a presumption based on training compute. A general-purpose AI model is presumed to carry systemic risk when the cumulative amount of computation used for its training is greater than 10^25 floating point operations (FLOPs). The figure is a total count of operations over the training process, not a per-second rate.

This threshold is a proxy, not a perfect measure. Compute correlates with capability, so a very large training run is treated as a signal that the resulting model is powerful enough to create broad effects. The figure can be adjusted over time as the technology and the understanding of capability evolve.
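
As a rough illustration of how a team might sanity-check a planned training run against the threshold, the sketch below uses a widely cited rule of thumb: training a dense transformer costs about 6 floating point operations per parameter per training token. That approximation is not part of the AI Act, and the function names and model sizes are hypothetical. An actual notification would rest on measured compute rather than an estimate like this.

```python
# Presumption threshold stated in the AI Act: 10^25 floating point operations.
SYSTEMIC_RISK_THRESHOLD_FLOP = 10**25


def estimate_training_flop(parameters: int, training_tokens: int) -> int:
    """Estimate training compute with the 6 * N * D rule of thumb.

    Assumption: a dense transformer, roughly 6 operations per parameter
    per token (forward plus backward pass). Not a method defined by the Act.
    """
    return 6 * parameters * training_tokens


def exceeds_threshold(training_flop: int) -> bool:
    """The presumption applies only when compute is greater than the threshold."""
    return training_flop > SYSTEMIC_RISK_THRESHOLD_FLOP


# Hypothetical model sizes, chosen only to land on either side of the line.
smaller = estimate_training_flop(70 * 10**9, 15 * 10**12)   # 6.3e24
larger = estimate_training_flop(400 * 10**9, 15 * 10**12)   # 3.6e25

print(exceeds_threshold(smaller))  # False
print(exceeds_threshold(larger))   # True
```

Integers are used throughout so that values near the threshold compare exactly, without floating point rounding.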

Compute is not the only route. The European Commission can also designate a model as carrying systemic risk, on its own initiative or after a qualified alert from the scientific panel, based on other criteria: the number of parameters, the quality and size of the dataset, the number of registered users, the model's reach across the internal market, or specific high-impact capabilities. A provider can also be informed that its model is likely to meet the threshold and given a chance to respond.

Providers that train a model meeting the compute threshold must notify the Commission without delay, and in any event within two weeks of meeting the threshold or learning that it will be met. They can argue that despite the compute used, their model does not present systemic risk, but the burden is on them to make that case.
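
The classification routes described above can be sketched as a small decision function. This is a simplification for internal tracking, not a statement of how the Commission decides: the names `Status`, `classify`, and `latest_notification_date` are hypothetical, and the 14-day window assumes the notification deadline in Article 52(1) is read as two calendar weeks.

```python
from datetime import date, timedelta
from enum import Enum

THRESHOLD_FLOP = 10**25
NOTIFICATION_WINDOW = timedelta(days=14)  # assumed reading of "two weeks"


class Status(Enum):
    NOT_CLASSIFIED = "not classified"
    PRESUMED = "presumed systemic risk"
    DESIGNATED = "designated by the Commission"
    REBUTTAL_ACCEPTED = "presumption rebutted"


def classify(training_flop: int, designated: bool = False,
             rebuttal_accepted: bool = False) -> Status:
    """Map the three inputs discussed in the text to a tracking status."""
    # Designation on other criteria applies regardless of compute.
    if designated:
        return Status.DESIGNATED
    # Above the threshold the presumption holds unless the provider's
    # arguments have been accepted.
    if training_flop > THRESHOLD_FLOP:
        if rebuttal_accepted:
            return Status.REBUTTAL_ACCEPTED
        return Status.PRESUMED
    return Status.NOT_CLASSIFIED


def latest_notification_date(threshold_known_on: date) -> date:
    """Last day to notify, counted from when the threshold is met or foreseen."""
    return threshold_known_on + NOTIFICATION_WINDOW


print(classify(3 * 10**25).value)                  # presumed systemic risk
print(latest_notification_date(date(2025, 9, 1)))  # 2025-09-15
```

Note that a model below the threshold still returns `DESIGNATED` when the flag is set, which mirrors the point that low compute alone does not rule out classification.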

What extra obligations follow

Providers of GPAI models with systemic risk take on duties that go well beyond standard documentation. The core ones are:

Model evaluation. Run state-of-the-art evaluations using standardized protocols, including testing aimed at identifying and mitigating systemic risks.
Adversarial testing. Conduct and document adversarial testing (red teaming) to find and address dangerous capabilities and failure modes before and after release.
Risk assessment and mitigation. Assess and mitigate possible systemic risks at the EU level, including their sources, across the model lifecycle.
Incident reporting. Track, document, and report serious incidents and possible corrective measures to the AI Office and, where relevant, national authorities without undue delay.
Cybersecurity. Ensure an adequate level of cybersecurity protection for the model and its physical infrastructure, since a compromised frontier model is itself a systemic threat.
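
One way a compliance team might track the five duties above is a simple checklist that maps each obligation to the evidence collected for it. The sketch below is a hypothetical internal structure, not a format prescribed by the Act or the AI Office, and the evidence descriptions are placeholders.

```python
# The five obligations listed in the text, in the same order.
OBLIGATIONS = (
    "model evaluation",
    "adversarial testing",
    "risk assessment and mitigation",
    "incident reporting",
    "cybersecurity",
)


def missing_evidence(evidence: dict[str, list[str]]) -> list[str]:
    """Return the obligations that have no evidence recorded against them."""
    return [name for name in OBLIGATIONS if not evidence.get(name)]


# Placeholder entries; a real register would point at actual documents.
evidence = {
    "model evaluation": ["evaluation report, release candidate"],
    "adversarial testing": ["red team findings log"],
    "cybersecurity": [],
}

print(missing_evidence(evidence))
# ['risk assessment and mitigation', 'incident reporting', 'cybersecurity']
```

An empty list counts as missing, so an obligation that has been registered but not yet evidenced still shows up in the gap report.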

Providers can demonstrate compliance by adhering to codes of practice developed with the AI Office until harmonized standards are available. Following an approved code is one way to show good faith and reduce regulatory uncertainty.

Why the concept matters

Systemic risk shifts a meaningful share of responsibility upstream, to the small number of organizations that train frontier models. This reflects a practical reality: the companies deploying a model rarely have the access, the resources, or the visibility to evaluate its deepest behaviors.

It also gives the rest of the ecosystem something to rely on. A hospital building a clinical tool on top of a large model cannot itself run frontier-scale evaluations. Knowing that the base model provider is legally required to evaluate, red team, and report incidents gives downstream actors a foundation they can build on.

For anyone tracking AI regulation, the threshold is a useful marker. It is the line where a model stops being an ordinary product and starts being treated as infrastructure with the potential to affect the whole market.

How teams prepare

Providers likely to approach the threshold should track training compute carefully and build evaluation and red teaming into the development process rather than bolting them on at the end. Compute accounting needs to be defensible, because the notification duty depends on it.
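
A minimal sketch of defensible compute accounting might look like the ledger below, which records each training stage with a pointer to supporting evidence and reports the cumulative total. The class names, the internal warning level of 8 x 10^24, and the stage figures are all hypothetical, and which stages count toward the cumulative figure is an assumption that a provider would need to confirm against the Act and any guidance.

```python
from dataclasses import dataclass, field

THRESHOLD_FLOP = 10**25
WARNING_FLOP = 8 * 10**24  # hypothetical internal early-warning level


@dataclass(frozen=True)
class ComputeEntry:
    stage: str      # e.g. "pretraining", "fine-tuning"
    flop: int       # measured or estimated operations for this stage
    evidence: str   # pointer to logs or a calculation backing the figure


@dataclass
class ComputeLedger:
    entries: list[ComputeEntry] = field(default_factory=list)

    def record(self, stage: str, flop: int, evidence: str) -> None:
        if flop < 0:
            raise ValueError("flop must be non-negative")
        self.entries.append(ComputeEntry(stage, flop, evidence))

    def cumulative_flop(self) -> int:
        return sum(entry.flop for entry in self.entries)

    def status(self) -> str:
        total = self.cumulative_flop()
        if total > THRESHOLD_FLOP:
            return "over threshold"
        if total >= WARNING_FLOP:
            return "approaching threshold"
        return "below warning level"


ledger = ComputeLedger()
ledger.record("pretraining", 7 * 10**24, "cluster job logs")
ledger.record("continued pretraining", 2 * 10**24, "cluster job logs")
print(ledger.status())  # approaching threshold
ledger.record("fine-tuning", 2 * 10**24, "estimate from step count")
print(ledger.status())  # over threshold
```

Keeping the evidence pointer next to each figure is the part that makes the total defensible: anyone reviewing the notification decision can trace every contribution back to its source.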

Downstream deployers should ask their model providers directly whether a model is classified as carrying systemic risk, and request whatever evaluation results and incident information the provider is able to share. That information feeds into the deployer's own risk assessment.

Governance teams should keep a watch on the codes of practice and any updates to the compute threshold, since both can change what counts as systemic risk and what compliance looks like.

FAQ

What is the exact compute threshold for systemic risk?

The EU AI Act presumes a general-purpose AI model carries systemic risk when the cumulative compute used for its training is greater than 10^25 FLOPs (floating point operations). This is a presumption, so a provider can try to rebut it, and the Commission can also designate models below the threshold based on other factors such as parameter count, dataset size, user numbers, or specific capabilities.

Does the threshold apply to the deployer or the model provider?

The obligations fall on the provider of the general-purpose AI model, the organization that trains it and places it on the market. Companies that deploy or fine-tune the model have their own duties, but the systemic-risk obligations such as frontier evaluation and incident reporting sit with the upstream provider.

Is the 10^25 FLOPs number permanent?

No. The threshold can be amended by the Commission through delegated acts as the technology advances and as the relationship between compute and capability becomes clearer. Teams should treat it as a current figure rather than a fixed constant.

What counts as a serious incident that must be reported?

A serious incident generally involves an incident or malfunction that leads, directly or indirectly, to harm such as death, serious damage to health, serious and irreversible disruption of critical infrastructure, breaches of fundamental rights, or serious harm to property or the environment. Providers of systemic-risk models must document and report these to the AI Office without undue delay.

How does systemic risk differ from high-risk AI systems?

High-risk refers to specific AI systems used in sensitive contexts such as hiring, credit, or medical devices, classified under the Act's risk tiers. Systemic risk is a separate category that applies only to the most capable general-purpose models. A model can carry systemic risk without being deployed in any single high-risk use case, because the concern is its broad downstream reach.

Can a provider avoid the obligations by arguing its model is not risky?

A provider whose model crosses the compute threshold can submit arguments that the model does not present systemic risk despite its size. The presumption can be rebutted, but the provider carries the burden of proof and the Commission assesses the case. Simply being below the threshold does not guarantee exemption either, since designation can happen on other grounds.

Summary

Systemic risk in AI is the EU AI Act's way of singling out the most capable general-purpose models for stronger oversight. The presumption kicks in above 10^25 FLOPs of training compute, and once a model qualifies, its provider must evaluate it rigorously, run adversarial testing, mitigate risks at the EU level, report serious incidents, and protect it with strong cybersecurity. The concept pushes accountability upstream to the handful of organizations capable of training frontier models, and it gives the wider ecosystem a dependable foundation to build on.
