Jun 4, 2026

Updated Jun 12, 2026

OSFI Guideline E-23: how AI and ML models fit the new model risk rules

OSFI Guideline E-23 takes effect 1 May 2027 and applies to every model with non-negligible risk at every federally regulated financial institution in Canada, including AI and ML. Here is what changed, what counts as a model and how to ready your AI inventory in 18 months.

If you work on models at a Canadian bank, insurer, trust company or foreign branch, OSFI Guideline E-23 has just changed your job. The revised guideline was published on 11 September 2025, takes effect on 1 May 2027 for every federally regulated financial institution, and pulls every model with non-negligible risk into a single enterprise governance regime. That includes the AI and ML systems most FRFIs have quietly deployed over the past three years.

What OSFI changed

The 2017 E-23 was a deposit-taking institution regime. It covered capital models at Canadian banks and not much else. The revised text says something larger. The Letter to industry that ships with the final guideline puts it plainly: "the expansion of the guideline's scope of application to all models at all FRFIs remains in place as a key revision."

Two changes do most of the work.

The population of institutions changes. The 2027 version applies to every federally regulated financial institution: banks, foreign bank branches, life insurers, fraternals, property and casualty insurers, trust and loan companies and foreign insurance branches. Federally regulated pension plans are excluded in the final text, on the grounds that OSFI's mandate to supervise them is different and there is alternative pension industry guidance.

The population of models also changes. The old regime focused on models that required regulatory approval, which mostly meant capital models. The 2027 regime treats every model with non-negligible inherent risk as in scope, whether it produces a capital number, an underwriting decision, an AML alert or a customer chatbot response.

An actuarial reserving model at a P&C insurer, an XGBoost credit decisioning model at a Schedule II bank, a vendor LLM that triages a claims queue at a life insurer and a spreadsheet pricing tool at a trust company are now governed by the same set of expectations.

OSFI's backgrounder explains why: "models are increasingly central to how financial institutions operate, from established actuarial tools to advanced AI. As institutions adopt more complex and innovative modeling techniques, OSFI is updating its guidance to ensure model risks are managed effectively."

How OSFI defines a model

The definition is broader than most institutions assumed during the consultation. From section A.4 of the final guideline:

"An application of theoretical, empirical, judgmental assumptions or statistical techniques, including AI/ML methods, which processes input data to generate results. A model has three distinct components: (1) data input component that may also include relevant assumptions, (2) processing component that identifies relationships between inputs, and (3) result component that presents outputs in a format that is useful and meaningful to business lines and control functions."

Stakeholders asked OSFI to narrow this. OSFI declined. The Letter explains: "We have left the definition intentionally broad in the final guideline. Institutions are expected to identify models and manage model risk commensurate with the model risk rating."

Proportionality is the safety valve. Only models with non-negligible inherent risk go on the inventory. The institution makes the risk-tiering call and defends it.

The boundary you have to draw runs between three buckets:

Tools you would never have called a model before but that technically meet OSFI's definition: a pricing spreadsheet, an Excel macro that allocates risk capital, a rule-based fraud screener.

Traditional quantitative models you already validate: capital, credit, market risk, actuarial, AML.

AI and ML systems your business stood up in the last few years: an XGBoost underwriting model, a recommendation engine that nudges customers toward products, a GenAI assistant that drafts claim correspondence, an embeddings-based AML alert prioritiser.

E-23 expects all three buckets to pass through the same model identification step. The risk rating then decides what level of governance each one earns.

How AI and ML sit inside E-23

Stakeholders asked OSFI for a separate AI track. OSFI declined. The Letter says: "the outcomes and principles provided in the guideline do not vary based on the algorithmic approach." AI and ML obligations are woven through the same nine principles.

The places to watch:

Where in the guideline	What it does to AI and ML
Model definition (A.4)	Names AI and ML methods inside the definition
Footnote 1	Imports the OECD AI definition as the anchor
Principle 1.1	Requires staffing for "novel technologies, like AI" and multi-disciplinary MRM teams
Principle 2.2	Adds "level of autonomy" as a qualitative risk rating factor
Principle 3.1	Requires policies "flexible" enough for black-box and autonomous AI/ML
Principle 3.2	Calls out bias risk in AI/ML training data
Principle 3.3	Requires explainability that scales with autonomy, regulation and customer impact
Principle 3.4	Requires reviewers to evaluate explainability for AI/ML
Principle 3.6	Adds monitoring obligations for autonomous decision making, autonomous re-parametrization and drift

Two terminology notes worth flagging.

OSFI does not use the terms "LLM", "foundation model", "GenAI" or "agentic AI" in the body of the guideline. Footnote 1 routes around this by importing the OECD definition: "An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments." If your system fits the OECD definition, the guideline treats it as a model.

OSFI says autonomous decision making and autonomous re-parametrization, not agentic AI. If your AI governance documentation uses "agentic" everywhere, map it to OSFI's terms before validation.

On GenAI scope: stakeholders pushed OSFI to carve out low-risk uses like drafting marketing emails or summarising documents with ChatGPT. OSFI declined to carve them out, but added: "institutions are empowered to make risk-intelligent decisions, based on the application of the model within the organizational context, when establishing model risk ratings." A junior analyst pasting a meeting transcript into ChatGPT is technically using a model. You do not have to put every such use case on the inventory. You do have to defend why each one is low-risk and how you triaged it.

The 3 outcomes and 9 principles

Most E-23 explainers describe a flat list of five or seven principles. The final guideline is structured as three outcomes with nine numbered principles underneath.

Outcome 1: Model risk is well understood and managed across the enterprise

Principle 1.1 (organisational enablement). Senior management owns the MRM operating model, the staffing and the board reporting. Multi-disciplinary teams. AI and ML expertise in particular.

Principle 1.2 (MRM framework). Your framework has to fit into broader risk and governance, reflect your model risk appetite, cover model identification, inventory, ratings, lifecycle requirements and reporting, and explicitly cover models or data sourced from foreign offices or third-party vendors. The cross-reference to Guideline B-10 on Third-Party Risk Management is here.

Principle 1.3 (use of models). A model earns its place only if it meaningfully contributes to a decision. Models that are no longer fit for purpose should be modified, replaced or decommissioned.

Outcome 2: Model risk is managed using a risk-based approach

Principle 2.1 (model identification). Periodically identify every model in the enterprise, including vendor and third-party models. Triage. Provisional rating. Central inventory.

Principle 2.2 (model risk rating). Inherent risk based, with quantitative and qualitative factors. Quantitative: portfolio size, growth, operational, security or financial impact. Qualitative: business use, complexity, level of autonomy, data input reliability, customer impact, regulatory risk. OSFI does not prescribe rating labels. You design your own tiering.

Principle 2.3 (risk management intensity). The rating drives the frequency, intensity and scope of model review, documentation, approval authority, monitoring frequency and re-rating cadence.
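
As a rough illustration of how Principles 2.2 and 2.3 connect, the sketch below scores a model on the qualitative factors OSFI names and maps the resulting tier to a review cadence. The 1-to-3 scale, the thresholds, the tier labels and the cadences are all hypothetical; OSFI prescribes none of them, and a real methodology would also bring in the quantitative factors.

```python
from dataclasses import dataclass

# Qualitative factors named in Principle 2.2. The 1 (low) to 3 (high)
# scoring scale is an assumption made for this sketch.
QUALITATIVE_FACTORS = (
    "business_use",
    "complexity",
    "level_of_autonomy",
    "data_input_reliability",
    "customer_impact",
    "regulatory_risk",
)

# Hypothetical mapping from tier to review cadence in months (Principle 2.3).
REVIEW_CADENCE_MONTHS = {"high": 12, "medium": 24, "low": 36}


@dataclass
class RatingResult:
    tier: str
    drivers: list  # factors scored 3, kept so a reviewer can reproduce the rating
    review_cadence_months: int


def rate_model(scores: dict) -> RatingResult:
    """Assign a hypothetical inherent-risk tier from 1-3 factor scores."""
    missing = [f for f in QUALITATIVE_FACTORS if f not in scores]
    if missing:
        raise ValueError(f"missing factor scores: {missing}")
    drivers = [f for f in QUALITATIVE_FACTORS if scores[f] == 3]
    total = sum(scores[f] for f in QUALITATIVE_FACTORS)
    # Illustrative rule: two or more top-scored factors, or a high total, is "high".
    if len(drivers) >= 2 or total >= 15:
        tier = "high"
    elif total >= 10:
        tier = "medium"
    else:
        tier = "low"
    return RatingResult(tier, drivers, REVIEW_CADENCE_MONTHS[tier])


credit_model = rate_model({
    "business_use": 3, "complexity": 2, "level_of_autonomy": 2,
    "data_input_reliability": 2, "customer_impact": 3, "regulatory_risk": 2,
})
print(credit_model.tier, credit_model.drivers, credit_model.review_cadence_months)
```

Recording the drivers alongside the tier is the design choice that matters: it is what lets a reviewer reproduce the rating rather than take it on trust.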

Outcome 3: Model governance covers the entire model lifecycle

Principle 3.1 (policies, procedures and controls). Apply commensurate with risk. Define who owns what. Engage data science, business, compliance, ethics, legal, IT and risk early. Maintain independence. Flexible enough for black-box and autonomous AI/ML models.

Principle 3.2 (model data). Accurate, fit-for-use, relevant, representative, compliant, traceable and timely. Enterprise-grade data governance. Explicit bias caution for AI and ML.

Principle 3.3 (model development). Documentation standards. Sound methodology. AI and ML training and parameter optimisation. Explainability requirements that scale with autonomy, regulation and customer impact.

Principle 3.4 (model review). Independent from development. Internal or objective third-party reviewers. Triggered by new models, modifications, performance breaches, significant data changes or scheduled periodic reviews. AI and ML reviewers evaluate explainability.

Principle 3.5 (model deployment). Quality and change control processes. Pre-deployment cyber, infrastructure and operational risk assessment, with cross-references to Guideline B-13 (Technology and Cyber Risk) and Guideline E-21 (Operational Risk Management). Explainability review at deployment.

Principle 3.6 (model monitoring and decommission). Quantitative metrics plus qualitative checks. Operational drift tracking. Thresholds for material modifications. Contingency plans for failure. The explicit AI and ML language on autonomous decision making, autonomous re-parametrization and model drift.

The phrase to internalise is risk-based and proportional. The Letter repeats it: the framework "applies on a risk-basis, proportional to the institution's size, strategy, risk profile, nature, scope, and complexity of operations, and interconnectedness."

The accounting firm MNP put this well in its May 2026 commentary: "E-23 exposes that proportionality isn't a statement; it's an operating model." If your tiering criteria are inconsistent, your validation depth is one-size-fits-all, or your governance evolved around a small set of high-profile models, the proportionality principle is the one that will bite first.

Two worked AI examples through the E-23 lifecycle

The published advisor commentary keeps saying "AI and ML are in scope" and stops there. Here is what the lifecycle looks like for two AI models a Canadian FRFI would have in production. The examples are illustrative, not drawn from a specific institution. The questions below are the ones a compliance director needs to be able to answer, not the ones a data scientist would.

Example 1: a machine learning credit decisioning model at a mid-tier bank

A Schedule II bank uses a gradient-boosted machine learning model (the kind of XGBoost-style ensemble that has become the industry default for credit risk) to score unsecured consumer credit applications. The model takes around 35 input features about the applicant and the application, produces a probability that the customer will default, and feeds an automated approve / refer to a human / decline decision for smaller-dollar personal credit applications.

The model exists today. E-23 forces the bank to formalise the following.

Rationale. The model owner documents what the model is for, who it covers and what controls limit its damage if it goes wrong. For a model like this, the alternative controls language in Principle 3.3 is where to commit to design rules that constrain behaviour (the model is forbidden from concluding higher income means higher default risk, for example), price caps and a human review queue for borderline scores.

Risk rating. The model affects pricing for retail customers. Customer impact is high. The model triggers a decision but a human can override. Some of the input data is noisy. The qualitative factors push this above the floor. Most Canadian banks would tier this as high or material.

Development. The development documentation has to cover where the data came from, how it was prepared, how the model was built, how it performed in testing, how it performs across protected groups (age, gender, ethnicity proxies), and why the final version was chosen.

Review. Principle 3.4 requires an independent review by someone who did not build the model. The reviewer checks whether the model type is the right tool for the job, whether the bank can explain a decline to a customer and whether the fairness tests hold up. The reviewer also confirms the risk rating.

Deployment. Change control around the model file. The data it sees in production has to match the data it was trained on. Principle 3.5 wants a pre-deployment cyber and operational risk check (cross-referenced to Guidelines B-13 and E-21).

Monitoring. Quarterly is the floor, not the ceiling. The bank tracks whether the input features are drifting away from what the model was trained on, whether the model's stated default rate matches actual defaults, whether outcomes are consistent across protected groups, how often staff override the model, and whether decline reasons are explainable.
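
One common way to quantify input drift is the population stability index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The guideline does not name PSI or any other metric; the sample values, bin edges and 0.25 alert threshold below are illustrative assumptions, not OSFI requirements.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges, floor=1e-6):
    """Share of values falling in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    # Floor each share so an empty bin cannot cause log(0) or division by zero.
    return [max(c / total, floor) for c in counts]


def psi(expected, actual, edges):
    """Population stability index between a training and a production sample."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical applicant incomes (in $000s) at training time and in production.
training = [32, 41, 45, 52, 58, 63, 70, 77, 85, 96]
production = [30, 33, 36, 39, 42, 44, 47, 55, 61, 88]
edges = [40, 60, 80]  # illustrative bin boundaries

ALERT_THRESHOLD = 0.25  # a common rule of thumb, not an OSFI figure
score = psi(training, production, edges)
print(f"PSI = {score:.3f}, alert = {score > ALERT_THRESHOLD}")
```

In practice the same calculation would run per feature on each monitoring cycle, with the threshold and the escalation path set by the bank's own monitoring policy.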

Example 2: a vendor GenAI assistant that triages claims at a life insurer

A life insurer licenses a claims management platform from a vendor. The vendor has added a GenAI feature that drafts a triage summary and a routing decision for incoming claims. Under the hood, the feature calls a foundation model hosted by a separate cloud provider.

This is the fourth-party case the Letter addresses directly. The insurer carries responsibility for the foundation model behind the feature, even though the contract with the cloud provider is the vendor's, not the insurer's.

Rationale. The model owner documents the use case (first-line triage), the controls that limit it (the feature drafts, a human approves before payout, no claim is paid without a licensed adjuster signing off) and the explainability strategy. Foundation models cannot explain themselves the way a credit model can. The rationale commits to evidence about the model's behaviour instead.

Risk rating. The output influences a claims decision. Customer impact is high. Today a human is in the loop. If the vendor later adds auto-approval for low-value claims, the rating changes. Data quality varies wildly across claim forms. High or material.

Development. The insurer did not develop the model. Its development documentation is whatever the vendor will share, plus its own configuration (the prompts it sends, the documents it makes available to the model, the guardrails it sets, any fine-tuning). Principle 3.3's documentation requirements still apply. A vendor that will not share enough to meet them puts the insurer in a position the reviewer cannot defend.

Review. Independent review covers the prompts, the documents the model can access, the guardrails, the test set, red team results, hallucination rate, fairness across customer segments and the third-party model lineage. The reviewer follows the chain to the foundation model. OSFI's verbatim language: "a third-party model review should include inputs into the model, including feeder models." If the reviewer cannot see the feeder model, that is what they write up.

Deployment. Cyber and operational risk review at the platform layer (B-13). Data classification check (no protected information leaves the vendor's data boundary unless contractually safeguarded).

Monitoring. The insurer samples model outputs each month to check for hallucinations. It watches whether the documents the model retrieves are drifting. It samples customer complaints. It tracks adjuster overrides. It watches the vendor's change log because if the foundation model behind the feature gets silently upgraded, the insurer needs to know.

The lifecycle obligations are identical between the two examples. What differs is where the controls live. The XGBoost model is built and run inside the bank, so the controls are internal. The GenAI assistant is run by a vendor over a foundation model, so the controls live partly in the vendor contract and partly in joint governance between the insurer and the vendor.

Explainability for AI and ML

Principle 3.3 names explainability. Principle 3.4 makes reviewers evaluate it. The guideline does not prescribe techniques. The bar your reviewer applies will come from somewhere, and it is better if that somewhere is your own policy than an examiner's reading.

A short field guide to what the techniques are and when each one is the right control.

SHAP. Tells you which input features drove a specific prediction, and by how much. Good for credit, fraud and propensity models where you have to explain individual decisions to customers or supervisors. Expensive to compute at scale but the standard answer for high-stakes tabular models.

LIME. A faster, simpler alternative to SHAP that explains an individual prediction by fitting a small interpretable model around it. Less stable than SHAP across runs, so use it for case-level explanations rather than for evidence to a reviewer.

Monotonic constraints. Not an explanation technique. A design rule applied while the model is being built that forbids it from behaving in directions that do not make business sense (income up should mean default risk down, never the reverse). Easier to defend to a reviewer than any post-hoc explanation because it is built into the model itself.

Surrogate models. A simpler, interpretable model trained to mimic what the production model does, so you can explain the production model's overall behaviour. Useful for showing supervisors how the model thinks in general, less useful for individual cases.

Counterfactual explanations. "What would the customer need to change for this decision to flip?" Strong for the adverse action notices banks and insurers have to send when they decline an application.
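
To make two of these techniques concrete, the sketch below checks that a scoring function never increases with income (the monotonic rule described above) and then searches for the smallest income at which a declined application would flip. The scoring function is a toy linear stand-in with made-up coefficients and a made-up decline threshold; in practice both checks would call the production model.

```python
DECLINE_THRESHOLD = 0.30  # hypothetical cut-off: decline at or above this probability


def default_probability(income_k: float, debt_ratio: float) -> float:
    """Toy stand-in for a production credit model (coefficients are invented)."""
    raw = 0.60 - 0.004 * income_k + 0.50 * debt_ratio
    return min(max(raw, 0.0), 1.0)


def is_non_increasing_in_income(debt_ratio: float, incomes) -> bool:
    """Reviewer-style check: default risk must never rise as income rises."""
    scores = [default_probability(i, debt_ratio) for i in incomes]
    return all(later <= earlier for earlier, later in zip(scores, scores[1:]))


def income_counterfactual(income_k, debt_ratio, step=1.0, max_income=500.0):
    """Smallest income, searched in fixed steps, at which the decision flips.

    Returns None if no income up to max_income changes the outcome.
    """
    candidate = float(income_k)
    while candidate <= max_income:
        if default_probability(candidate, debt_ratio) < DECLINE_THRESHOLD:
            return candidate
        candidate += step
    return None


applicant = {"income_k": 60, "debt_ratio": 0.45}
declined = default_probability(**applicant) >= DECLINE_THRESHOLD
needed = income_counterfactual(**applicant)
print(declined, needed)
print(is_non_increasing_in_income(0.45, range(20, 200, 10)))
```

Varying one feature at a time keeps the explanation easy to state to a customer; a fuller search would consider several features together and restrict itself to changes the customer can actually make.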

For LLMs and foundation models the field has not converged. There is no SHAP-equivalent for a 70B-parameter model. What reviewers ask for instead is evidence about the model's behaviour rather than its internals: the model card or system card the provider publishes, performance on the institution's own test set, red team results, refusal rate, hallucination rate on a held-out test set, behavioural consistency on adversarial inputs.

A defensible institutional policy commits to the right technique for each model class. SHAP or LIME for high-risk tabular models that decide customer outcomes. Monotonic constraints and similar design rules where the use case allows them. Behavioural evidence (model cards, test sets, red team outputs, refusal and hallucination rates) for LLM and foundation model use cases. Counterfactual explanations as the default format for adverse action notices.

The policy should also name what current techniques cannot do. A reviewer who expects SHAP to explain a foundation model will get an unhelpful answer. A policy that names the right tool for each model class is one less surprise during validation.

What "independent" means under Principle 3.4

OSFI says model review must be "independent from model development" and that institutions can use "internal reviewers or objective third parties." It does not define independence further. Every institution has to defend its own interpretation to a supervisor.

The defensible interpretations in current practice:

Second line of defence. The model risk function reports independently of model owners and developers. Most large Canadian banks already run this. Cleanest answer and easiest to evidence.

Separate team in the same business unit. A team that did not build the model but sits in the same business. Workable for smaller institutions but requires careful evidence of reporting lines, no shared incentives and review work product that survives external scrutiny.

Internal audit. Audit can do model review work if it has the technical capability and the rotation. Independence is unambiguous; technical depth is often the constraint. Some institutions use audit as a periodic backstop on top of a second-line review.

Objective third parties. External firms (Big 4, model risk specialists). Useful for novel methodologies or to plug capacity gaps. The Letter explicitly allows this. The risk is concentration on a single firm and rotation challenges.

Model developer as reviewer of their own model. Not defensible. Even framed as self-validation it does not meet the "independent from model development" test. Avoid it.

For AI and ML specifically, reviewer capability is the constraint that bites first. A reviewer who has validated logistic regression for the last decade and has never built an XGBoost model, let alone evaluated a GenAI assistant, is not equipped to review either. Principle 1.1's staffing requirement and Principle 3.4's independence requirement intersect here. The team that reviews AI models has to be technically up to the model under review, not just structurally independent of the developer.

The model inventory and what to put in it

Appendix 1 of the guideline lists the minimum information you have to maintain per model. For every identified model: model ID; model name and description of key features and use; model risk rating; model owner; model developer; model origin (internally developed or vendor).

For models with non-negligible risk (anything that goes on the inventory rather than being triaged off), you also need: model version; date of deployment into production; model reviewer; model approver; model dependencies; data sources and description; approved uses; model limitations (including exceptions); date of most recent model review; monitoring status; next review date.

The Principle 2.1 language on the inventory itself is worth knowing by heart. The inventory should be "comprehensive… maintained at the enterprise level… accurate, evergreen, and subject to robust controls." The word "evergreen" is what a supervisor will press on. A spreadsheet refreshed quarterly will not pass. You need a live system of record with controls, version history and an audit trail.

For AI and ML models, extend the inventory with fields OSFI does not prescribe but that practitioners are converging on: training data lineage, base model and provider, model card or system card link, fine-tuning record, post-deployment evaluation set, red team or jailbreak test outcomes. The Guideline B-10 cross-reference on third-party risk does most of the work here, but only if your model inventory and your vendor inventory share keys.

What an AI model inventory row looks like

The template below shows the fields a single inventory row needs for the GenAI claims triage assistant from the second worked example. Placeholders are illustrative.

Field	What it captures
Model ID	Unique identifier assigned by the inventory system
Name	Short name the model is known by internally
Description	What the model does, what it is used for, what it is not used for
Risk rating	The institution's inherent risk tier (e.g. high), with the drivers listed
Owner	The accountable business leader (e.g. Director, Claims Operations)
Developer	Internal team, or the vendor name
Origin	Internal vs vendor; if vendor, the feeder model behind the GenAI feature
Version	The vendor product version plus the base model version it runs over
Deployment date	When the model first went into production
Reviewer	The independent review function (e.g. second-line MRM)
Approver	The body that approved the model (e.g. Model Risk Committee)
Dependencies	Other platforms, APIs and retrieval indices the model relies on
Data sources	Where the model gets its inputs
Approved uses	What the model is allowed to do, and what it is not
Limitations	Known constraints; for GenAI: that the foundation model is third-party and that vendor updates require change control
Last review	Date of the most recent independent review
Monitoring status	What is being monitored, at what frequency
Next review	When the next independent review is scheduled, based on the risk rating

A mid-sized FRFI will have several hundred rows. A large bank will have several thousand.
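
As a minimal sketch, the row above can be held as a typed record. The field names follow the Appendix 1 list as summarised earlier; `base_model` and `vendor_id` are hypothetical extensions, and the identifiers and values are placeholders. The second helper illustrates the shared-key point: a vendor-origin model whose `vendor_id` does not resolve in the third-party inventory is a gap worth surfacing.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class ModelRecord:
    # Minimum fields for every identified model.
    model_id: str
    name: str
    description: str
    risk_rating: str
    owner: str
    developer: str
    origin: str  # "internal" or "vendor"
    # Additional fields for models with non-negligible risk.
    version: Optional[str] = None
    deployment_date: Optional[date] = None
    reviewer: Optional[str] = None
    approver: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)
    approved_uses: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    last_review: Optional[date] = None
    monitoring_status: Optional[str] = None
    next_review: Optional[date] = None
    # Hypothetical extensions: base model and a key shared with the vendor inventory.
    base_model: Optional[str] = None
    vendor_id: Optional[str] = None


def missing_fields(record: ModelRecord) -> List[str]:
    """Names of fields that are still empty on a record."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]


def orphaned_vendor_models(records, vendor_ids) -> List[str]:
    """IDs of vendor-origin models whose vendor_id is not in the vendor inventory."""
    return [r.model_id for r in records
            if r.origin == "vendor" and r.vendor_id not in vendor_ids]


triage = ModelRecord(
    model_id="MDL-0417",
    name="Claims triage assistant",
    description="Drafts a triage summary and routing decision for incoming claims",
    risk_rating="high",
    owner="Director, Claims Operations",
    developer="Vendor (placeholder name)",
    origin="vendor",
    reviewer="Second-line MRM",
    approver="Model Risk Committee",
    vendor_id="V-0042",
)

print(missing_fields(triage))
print(orphaned_vendor_models([triage], vendor_ids={"V-0007", "V-0013"}))
```

A completeness check like `missing_fields` is one simple control that supports the "evergreen" expectation, although it is no substitute for version history and an audit trail.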

Third-party and vendor models, including GenAI from a fourth party

Vendor models are explicitly in scope. Principle 1.2 names them. Principle 2.1 says identification must include "vendor and third-party models." The Letter clarifies that institutions should "comply with third-party risk management principles established under guideline B-10" and that "third-party models receive validation and monitoring commensurate to the model risk."

OSFI did not give institutions a grace period. The Letter: "We have not incorporated a grace period for third-party model risk management in the final guideline. However, institutions may still establish an exceptions policy under which models can be used for limited and specific applications prior to the completion of validation."

The fourth-party question is the more interesting one. Picture a vendor claims platform that embeds GenAI from a separate cloud provider. Is the foundation model in your scope?

OSFI's answer, verbatim:

"We did not make any specific revisions to the final guideline to account for fourth party models. Institutions have a responsibility to ensure that third-party model use is within the institution's risk appetite limit. A third-party model review should include inputs into the model, including feeder models."

The fourth-party model is not separately regulated, but you are responsible for the chain. Your third-party model review for the claims platform has to look at the foundation model behind the GenAI feature. Lack of visibility into the feeder model is the finding the reviewer writes up.
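
Following the chain can be sketched as a walk over a dependency map. The model names, the map itself and the set of models with review evidence on file are hypothetical; the idea is only that every feeder reachable from the in-scope model either has evidence behind it or becomes a finding.

```python
from collections import deque

# Hypothetical dependency map: each model lists the feeder models it consumes.
FEEDERS = {
    "claims-triage-assistant": ["vendor-genai-feature"],
    "vendor-genai-feature": ["cloud-foundation-model"],
    "cloud-foundation-model": [],
}

# Models for which the institution holds review evidence (illustrative).
EVIDENCE_ON_FILE = {"claims-triage-assistant", "vendor-genai-feature"}


def feeder_chain(model_id, feeders):
    """Breadth-first walk over every direct and indirect feeder of a model."""
    seen = {model_id}
    queue = deque([model_id])
    order = []
    while queue:
        current = queue.popleft()
        for feeder in feeders.get(current, []):
            if feeder not in seen:  # guards against cycles and repeats
                seen.add(feeder)
                order.append(feeder)
                queue.append(feeder)
    return order


def visibility_findings(model_id, feeders, evidence):
    """Feeder models in the chain for which no review evidence is held."""
    return [m for m in feeder_chain(model_id, feeders) if m not in evidence]


print(feeder_chain("claims-triage-assistant", FEEDERS))
print(visibility_findings("claims-triage-assistant", FEEDERS, EVIDENCE_ON_FILE))
```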

This is where the AI governance work and the third-party risk management work have to merge into a single operating model. If B-10 and E-23 are run by two separate teams that do not talk, the first vendor that adds a "+AI" feature to its product roadmap is going to expose the gap. The Toronto GRC firm ISA Cybersecurity flagged the pattern in a recent practitioner note: vendors silently bolt AI features onto existing platforms, your existing vendor inventory does not capture them, and internal audit is usually the first to ask why an unapproved model is making decisions in production.

Foreign bank and foreign insurance branches

E-23 binds foreign bank branches and foreign insurance branches "to the extent it is consistent with applicable requirements and legal obligations related to their business in Canada as set out in Guideline E-4 on Foreign Entities Operating in Canada on a Branch Basis."

That cross-reference is doing two things. It pulls branches into scope for models that drive their Canadian business. And it lets branches lean on home-jurisdiction model risk frameworks for the Canadian piece, provided those frameworks meet OSFI's expectations or there is a documented mapping.

For a US branch already running under SR 11-7, the gap to E-23 is small but real. SR 11-7 is silent on AI and ML specifically; E-23 is not. SR 11-7 does not impose the Appendix 1 inventory schema. Nor does it contain the multi-disciplinary MRM staffing expectation that Principle 1.1 calls out.

For a European branch running an EU AI Act high-risk programme plus an internal model risk framework, the gap is different. EU AI Act compliance gives you the AI-specific evidence (technical documentation, risk management system, post-market monitoring). E-23 gives you the operating model wrapper.

In both cases, what OSFI examiners will want is a mapping document that shows which home-jurisdiction control answers which E-23 principle, and where the gaps are filled by branch-level work.
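
A mapping document of this kind reduces to a small lookup: for each principle, which home-jurisdiction control answers it, which branch-level control fills in, and what is left open. The sketch below uses the principle numbers as listed in this article; every control ID and the coverage pattern are invented for illustration.

```python
# Principle numbers as listed in this article.
E23_PRINCIPLES = ["1.1", "1.2", "1.3", "2.1", "2.2", "2.3",
                  "3.1", "3.2", "3.3", "3.4", "3.5", "3.6"]

# Hypothetical head-office controls, keyed by the E-23 principle they answer.
HOME_CONTROLS = {
    "1.2": ["HO-MRM-01"], "1.3": ["HO-MRM-02"], "2.2": ["HO-MRM-05"],
    "2.3": ["HO-MRM-06"], "3.1": ["HO-MRM-07"], "3.3": ["HO-MRM-09"],
    "3.4": ["HO-VAL-01"], "3.5": ["HO-CHG-03"],
}

# Hypothetical branch-level controls that fill gaps left by the home framework.
BRANCH_CONTROLS = {"1.1": ["CA-BR-01"], "2.1": ["CA-BR-02"]}


def build_mapping(principles, home, branch):
    """One row per principle: (principle, source, control IDs)."""
    rows = []
    for p in principles:
        if home.get(p):
            rows.append((p, "home", home[p]))
        elif branch.get(p):
            rows.append((p, "branch", branch[p]))
        else:
            rows.append((p, "gap", []))
    return rows


mapping = build_mapping(E23_PRINCIPLES, HOME_CONTROLS, BRANCH_CONTROLS)
open_gaps = [p for p, source, _ in mapping if source == "gap"]
print(open_gaps)
```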

Cross-walk: E-23 alongside SR 11-7, EU AI Act, NIST AI RMF and ISO 42001

E-23 does not stand alone. For an FRFI with multi-jurisdiction exposure or an existing AI risk programme, the question is how E-23 lines up against what you already have.

Topic	OSFI E-23 (Canada, 2027)	Fed SR 11-7 (US, 2011)	EU AI Act (EU, 2024-2028)	NIST AI RMF (US voluntary)	ISO/IEC 42001 (international)
Scope	All FRFI models with non-negligible risk, AI and ML included	Bank capital and decision models	High-risk AI systems (Annex III)	All AI systems (voluntary)	AI management system (org-wide)
AI and ML treatment	Woven through all 9 principles	Not addressed directly	The primary subject	The primary subject	The primary subject
Model inventory	Required (Appendix 1)	Implicit (model database)	Required (registration)	Recommended	Required (Annex A controls)
Independent review	Required (Principle 3.4)	Required	Conformity assessment	Recommended (Govern function)	Required (Clause 9)
Bias and fairness	Required (Principle 3.2)	Implicit	Required (Article 10)	Required (Map, Measure)	Required (Annex A)
Explainability	Required (Principles 3.3, 3.4)	Implicit	Required (Article 13)	Required (Manage)	Required (Annex A)
Third-party models	Required (B-10 cross-ref)	Implicit	Required (Article 25)	Recommended	Required (Annex A)
Penalties	Supervisory action	Supervisory action	Up to €35M or 7% turnover	None (voluntary)	Certification withdrawal
Effective date	1 May 2027	In force	2 December 2027 (high-risk Annex III, post-omnibus); GPAI in force since August 2025	In force	In force

The overlaps are substantial. A mature SR 11-7 programme covers a good 70% of E-23 once you add the AI and ML treatment and the explicit inventory schema. A certified ISO 42001 programme covers most of the operating-model wrapper. The EU AI Act gives you AI-specific evidence that maps to several E-23 principles directly.

The cross-walk is the document supervisors want to see if you are running multiple regimes. It also helps internally: it shows which control work is reused and which is genuinely new.

How to think about the timeline

The hard date is 1 May 2027. The guideline was published on 11 September 2025 with an 18-month transition. OSFI extended this from the consulted 12 months at industry's request.

If you are reading this in mid-2026, you have roughly 11 months of usable runway, less the year-end and quarter-end blackouts your institution observes. Backloading this is not realistic.

A workable phasing for a mid-sized FRFI:

Now through Q3 2026: identification and inventory. Run the enterprise-wide model identification exercise. Triage. Stand up the model inventory in a system, not a spreadsheet. Capture the Appendix 1 fields. Add AI and ML fields. Make sure your AI use cases (anything matching the OECD definition) are in scope of the identification. Vendor and third-party models too.

Q4 2026: risk rating and proportionality. Design and approve your risk rating methodology. Apply it to the inventory. Make sure the methodology reflects level of autonomy, data input reliability, customer impact and regulatory risk. Document the rating logic so a reviewer can reproduce it.

Q1 2027: policy, validation and monitoring uplift. Refresh your MRM policies, procedures and controls. Lock in independence between development and review. Run validation cycles on the high-risk and material models that have not been validated under the new framework. Stand up AI and ML monitoring (drift, performance, explainability).

Q2 2027: dress rehearsal. Before 1 May 2027, walk a supervisor through the inventory, the tiering, a sample of validation reports, a third-party model with a feeder model and a high-autonomy AI use case. The findings from that rehearsal are what you fix in the last few weeks.

The traps showing up most often

Two patterns come up most often in advisor commentary.

Proportionality on paper, not in operation. MNP's framing: proportionality is an operating model, not a sentence in a policy. If every model gets the same validation playbook, you are over-investing in low-risk models and under-scrutinising the ones that move the needle.

Shadow AI. Employees use AI tools their employer never approved. Customer service agents paste customer data into a public chatbot. The team building a recommendation engine adds an LLM call for intent classification without involving model risk. If your inventory is built only on what was officially submitted, the shadow models will not appear. Active discovery (network telemetry, vendor disclosures, internal surveys) is the only way to find them.
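Active discovery only helps if the findings are reconciled against the inventory. A minimal sketch of that reconciliation follows, assuming each discovery source can be reduced to a list of tool identifiers. The function name find_shadow_ai, the identifiers and the matching rule (case-insensitive exact match) are all hypothetical; real matching will need vendor aliases and domain lookups.

```python
def find_shadow_ai(inventory_ids, discovered):
    """Return discovered AI uses with no inventory entry, keyed by normalised tool id.

    discovered is an iterable of (tool_id, source) pairs, where source names the
    discovery channel that surfaced the tool.
    """
    known = {i.strip().lower() for i in inventory_ids}
    gaps = {}
    for tool_id, source in discovered:
        key = tool_id.strip().lower()
        if key not in known:
            gaps.setdefault(key, set()).add(source)
    # Sorted lists make the output stable for reports and review.
    return {tool: sorted(sources) for tool, sources in gaps.items()}


# Illustrative data only.
inventory = ["credit-pd-v3", "aml-alerts"]
discovered = [
    ("Public-Chatbot", "network telemetry"),
    ("aml-alerts", "vendor disclosure"),
    ("public-chatbot", "internal survey"),
]

for tool, sources in find_shadow_ai(inventory, discovered).items():
    print(tool, "seen via", ", ".join(sources))
```

Each gap then goes through the same identification and triage as any other candidate model.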

What to do this quarter

Four things to start this quarter.

Stand up an inventory that can hold AI models. Move off spreadsheets if you are still there. Build for the Appendix 1 fields plus the AI and ML extensions (training data lineage, base model, system card, fine-tuning record, evaluation set, red team results). Keep it evergreen, with change controls and an audit history on every entry.

Run the model identification exercise enterprise-wide. Including vendor models. Including AI features vendors added to existing platforms. Including shadow uses by your own teams. The exercise should produce a single triaged list that says, for each candidate, whether it is in or out and why.

Refresh your risk rating methodology. Make sure it includes the qualitative factors OSFI named: business use, complexity, level of autonomy, data input reliability, customer impact, regulatory risk. Apply it to the inventory.

Stand up a working group with the right people. OSFI was explicit that an MRM team has to be multi-disciplinary. If your current model risk function is purely quantitative, this is the quarter to add legal, compliance, ethics, data science and IT seats at the table.
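For the risk rating refresh above, it helps to have the rating logic in a form a reviewer can re-run. The sketch below is one hedged way to do that. The six factor names come from the list above; the 1 to 3 scoring scale, the equal default weights, the tier cut-offs and the function name rate_model are assumptions for illustration, not anything OSFI prescribes.

```python
FACTORS = (
    "business_use",
    "complexity",
    "autonomy",
    "data_input_reliability",
    "customer_impact",
    "regulatory_risk",
)

# Hypothetical cut-offs on the weighted score as a share of the maximum.
TIER_CUTOFFS = (("high", 0.75), ("medium", 0.50))


def rate_model(scores, weights=None):
    """Turn per-factor scores (1 = low, 3 = high) into a tier plus the inputs used."""
    weights = weights or {f: 1.0 for f in FACTORS}
    for f in FACTORS:
        if scores.get(f) not in (1, 2, 3):
            raise ValueError(f"{f} must be scored 1, 2 or 3")
    total = sum(weights[f] * scores[f] for f in FACTORS)
    ratio = total / sum(weights[f] * 3 for f in FACTORS)
    tier = next((name for name, cut in TIER_CUTOFFS if ratio >= cut), "low")
    # Returning the scores alongside the tier lets a reviewer reproduce the result.
    return {"tier": tier, "ratio": round(ratio, 3), "scores": {f: scores[f] for f in FACTORS}}


# Illustrative example: a customer-facing chatbot scored high on most factors.
chatbot = {f: 3 for f in FACTORS}
chatbot["complexity"] = 2
print(rate_model(chatbot)["tier"])
```

Storing the returned record with the inventory entry keeps the rating and its inputs together, which is what makes the rating reproducible later.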

Where VerifyWise fits

VerifyWise is an AI governance platform that was built around the same operating model E-23 now codifies. It ships with native coverage for EU AI Act, ISO 42001, ISO 27001 and NIST AI RMF, so the cross-walk you saw above is built in rather than something you have to assemble from spreadsheets.

Mapped to the E-23 obligations in this post:

Model inventory (Appendix 1 + AI extensions). A central registry that captures provider, model, version, approver, status, capabilities, biases, limitations, hosting provider, security-assessment payload and reference links. Built for the Appendix 1 fields and the AI and ML extensions practitioners are converging on.
Model risk rating (Principle 2.2). Risk records per model with category, level, owner, impact, likelihood, key metrics, thresholds and mitigation plans. The rating travels with the model through the lifecycle.
Independent review and approval (Principle 3.4). Multi-step approval workflows with configurable approver requirements per step, approval-request lifecycle tracking and an approval timeline that survives external scrutiny.
Third-party and vendor models (B-10 cross-reference). A vendor domain with data sensitivity, business criticality, regulatory exposure and a vendor risk score, plus vendor-specific risk records linked to the model that depends on them.
Monitoring and decommission (Principle 3.6). Post-market monitoring with configurable frequency per use case, periodic questionnaires, automated reminders, escalation and PDF reports for the audit trail OSFI will want to see.
Evidence trail. A central evidence hub for the documentation that backs every entry on the inventory, linked to models and controls, with file access logging, expiry tracking and change history.

If your FRFI is working through the transition window with a spreadsheet, this is the gap the platform closes.

Book a demo to see how each of the features above maps to a specific E-23 principle.
Read our AI Governance Salary Report if you are sizing the team you need to run this.
Read about EU AI Act high-risk obligations if your FRFI has European exposure.

Coming soon: a downloadable OSFI E-23 readiness checklist mapped to all nine principles, with a model inventory template and a risk rating worksheet. Subscribe to be notified when it goes live.

Sources: OSFI's Guideline E-23 – Model Risk Management (2027), the accompanying Letter, the Backgrounder, and practitioner commentary from MNP, ISA Cybersecurity and the Big 4 Canadian advisory practices.

About the VerifyWise team

VerifyWise builds source-available AI governance software used by organizations to manage risk, compliance, and oversight across their AI portfolios. Our editorial team draws on hands-on experience implementing governance workflows for regulated industries and fast-scaling AI teams.