The AI Model Started Failing in November. The Organization Found Out in February.

On July 10, 2025, Massachusetts Attorney General Andrea Joy Campbell announced a $2.5 million settlement with Earnest Operations LLC over AI underwriting practices the state alleged had run for years without adequate fair lending testing and oversight controls. The settlement required Earnest to build a written governance system of fair lending testing, internal controls, and risk assessments. The recurring pattern across regulated AI deployments is an organization whose governance structure can detect, document, route, and escalate a failing model, yet has no formal process that converts that detection into a decision within a defined window. Detection without a pre-assigned owner, a defined trigger threshold, pre-delegated authority to pause, and a bounded decision window produces information rather than governance.

Fraud systems score payments as they move. An ACH transfer gets its score as the bank assembles the batch file, before the money leaves. Whether that transfer was fraud, the system finds out later: when the customer calls, when the dispute arrives, when the case closes. The score is made in seconds. The truth takes weeks.

Every month, the risk analytics team closed that gap with a single number. They matched the previous thirty days of cleared payments against confirmed fraud cases whose labels had matured through disputes and case review. The result was the sensitivity rate. At deployment fourteen months earlier, the team had set the threshold to catch 97 of every 100 confirmed fraud cases before release. The 3 that cleared were documented. Accepted.

On the first Tuesday of February, the reconciliation ran.

The sensitivity rate was 71.

Twenty-nine of every 100 confirmed fraud cases in January had cleared undetected. Not flagged. Not held. Released.

The data scientist sent the email at 8:47. Subject line: "URGENT -- January sensitivity 71, down from 88 in December. This needs a decision today."

At 8:49, the product owner's phone lit up on the conference table.

She turned it face down.

The system had been failing since November 4th. That was the day the model had been retrained under an emergency change procedure after a production issue forced a rushed rebuild. The retrain touched the shared account-takeover feature layer used across retail ACH transfers. A required validation step had not been completed. An exception had been approved: the model could go live with compensating controls in place. The exception required enhanced monitoring thresholds, a manual-review overlay for high-risk transfers, and a fourteen-day automatic rollback trigger.

The compensating controls expired on November 18th.

Nobody renewed them.

For seventy-seven of the ninety-one days between redeployment and the first Tuesday of February, the model ran without the controls that had justified the exception.
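The day counts can be checked with a few lines of date arithmetic. This is a minimal sketch, not the bank's tooling. The years are an assumption, since the account gives only months and days; 2025 and 2026 are used because February 3, 2026 falls on a Tuesday. The function name controls_active is hypothetical.

```python
from datetime import date, timedelta

# Assumed years: the account gives only month and day.
deployed = date(2025, 11, 4)          # emergency retrain goes live
control_term = timedelta(days=14)     # fourteen-day term of the exception
detected = date(2026, 2, 3)           # first Tuesday of February under this assumption

expires = deployed + control_term
days_live = (detected - deployed).days
days_uncovered = (detected - expires).days


def controls_active(today: date) -> bool:
    """True while the compensating controls are still in force."""
    return deployed <= today < expires


print(expires, days_live, days_uncovered)
```

A check like controls_active, run daily against every open exception, is one way an expiry could raise a flag instead of passing silently.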

The December reconciliation had shown 94. The team read it as a reassuring number. What it concealed was a measurement problem. A fraud case takes 45 to 60 days to mature through disputes into a confirmed label. The mule-account pattern that had become the dominant fraud vector was so new in December that most of those cases had not resolved yet. The denominator of the sensitivity calculation, confirmed fraud, was running behind the actual fraud. The metric was technically correct. It was measuring the wrong population.
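A small worked example shows how a lagging denominator can flatter the metric. Every figure below is hypothetical and chosen only to illustrate the mechanism; none of it is the bank's data. The FraudCase type and the make helper are likewise illustrative names.

```python
from dataclasses import dataclass


@dataclass
class FraudCase:
    caught: bool          # model held the payment before release
    label_matured: bool   # dispute has resolved into a confirmed label


def sensitivity(cases):
    """Percentage of the given fraud cases that the model caught."""
    if not cases:
        raise ValueError("no cases to measure")
    return 100.0 * sum(c.caught for c in cases) / len(cases)


def make(n, caught, matured):
    return [FraudCase(caught, matured) for _ in range(n)]


# Hypothetical month with 100 true fraud cases.
cases = (
    make(47, True, True) + make(3, False, True)        # old pattern, labels matured
    + make(5, True, True) + make(5, False, True)       # new pattern, matured early
    + make(19, True, False) + make(21, False, False)   # new pattern, still in dispute
)

visible = [c for c in cases if c.label_matured]
reported = sensitivity(visible)   # what the reconciliation can see today
eventual = sensitivity(cases)     # what it sees once every label matures

print(f"reported: {reported:.1f}  eventual: {eventual:.1f}")
# reported: 86.7  eventual: 71.0
```

The reported figure is computed correctly on the cases it can see. The gap comes entirely from the cases that have not yet become confirmed labels.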

The January reconciliation showed 88. Down six from December. The team applied the same reasoning: December cases still resolving, labels still maturing. The explanation fit the data it could see. It was the same reasoning that had made 94 feel acceptable a month earlier.

The explanation was not wrong. It was incomplete.

By February, both conditions arrived simultaneously for the first time. The mule-account pattern had hit full volume across January. Enough cases had now matured into confirmed labels to fill the denominator. The metric could finally see what had been accumulating since November 4th.

It saw 71.

Maria woke at 6:44 on a Thursday morning in late January and reached for her phone before her feet touched the floor.

Four unauthorized external transfers had been sent over ACH from her checking account overnight to a newly added external recipient. The first posted at 3:14. The second at 3:29. The third at 3:43. The fourth at 3:55.

Her checking account. The one her paycheck went into every other Friday. Rent was in four days.

Each transfer had been scored before release into the overnight ACH batch. The first looked ordinary. Amount within range, established account history. The payment workflow knew the recipient was new, but the only mechanism that would have forced a hold was the manual-review overlay, and that compensating control had expired. The model's feature store had not yet picked up the new-recipient event from three hours earlier, so the score stayed low. The second transfer should not have looked ordinary. A second large transfer to the same new external recipient within fifteen minutes should have moved the velocity score. By the fourth, the only way the score stayed low was if the model could not see its own last forty minutes.

It could not. The shared feature layer had not refreshed between scoring events. Each transfer had been scored as if the previous one had not happened.
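The effect of a feature layer that does not refresh can be sketched with a simple velocity count. This is an illustration under stated assumptions, not the bank's scoring logic: the velocity function, the one-hour window, the recipient identifier, and the calendar date are all hypothetical. Only the four posting times come from the account above.

```python
from datetime import datetime, timedelta


def velocity(history, recipient, now, window=timedelta(hours=1)):
    """Count prior transfers to `recipient` inside the look-back window."""
    return sum(1 for t, r in history if r == recipient and now - window <= t < now)


day = datetime(2026, 1, 29)   # placeholder date; only the clock times matter
times = [day.replace(hour=3, minute=m) for m in (14, 29, 43, 55)]
recipient = "new-external-recipient"   # hypothetical identifier

stale_snapshot = []   # feature layer that never refreshes between scoring events
live_history = []     # what a refreshed layer would have held

stale_counts, fresh_counts = [], []
for t in times:
    stale_counts.append(velocity(stale_snapshot, recipient, t))
    fresh_counts.append(velocity(live_history, recipient, t))
    live_history.append((t, recipient))   # only the live view records the transfer

print(stale_counts)   # [0, 0, 0, 0]
print(fresh_counts)   # [0, 1, 2, 3]
```

With a stale snapshot, the fourth transfer looks exactly like the first. With a live history, the count climbs on every transfer, which is the signal a velocity feature exists to provide.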

She called the bank at 6:51. Thirty-three minutes on hold. When someone answered she verified her identity, her account, her recent activity. The representative reviewed the four transfers and was quiet for four seconds. He told her a case had been opened. Because this was a consumer EFT dispute, provisional credit would be issued within ten business days pending investigation.

Ten business days.

Rent was in four.

She asked if there was anything he could do about the timing.

He said he understood and that he was sorry.

At 12:09 the product owner read the URGENT email, opened the status update template, added a line under Model Performance – Monitor (informational), and moved to the next message. The monitoring system had automatically opened a governance ticket and routed it to her. The ticket gave her accountability to report the breach. It did not give her authority to stop the model. That authority sat with the model risk committee.

The model risk committee met on Thursdays.

Patricia filed her dispute on January 22nd. Two unauthorized external transfers sent over ACH from her savings account to a recipient she had never added. The first posted at 11:43 in the evening. The second thirteen minutes later. One transfer had been provisionally credited. The second remained under review.

She had a retirement planning appointment the following Monday. Her advisor looked at her account balances without speaking for longer than she expected. Then he told her carefully that depending on the outcome of the second dispute and the timeline of the provisional credit, she might need to think about adjusting her April retirement date.

She had been planning April for eleven years.

She drove home in the rain and sat in her car outside her house before she went inside.

The analytics team completed the root cause analysis the following Wednesday. The analysis identified a combination of contributing factors. The November training dataset had contained almost no examples of the mule-account pattern that had become the dominant fraud vector in December. The model had not been wrong on the data it had seen. It had been blind to the fraud it was now facing. And because the compensating controls had expired on November 18th, there had been nothing watching for that blindness for seventy-seven of the ninety-one days between redeployment and the first Tuesday of February.

The model risk committee reviewed the root cause analysis the following Thursday. They agreed to open emergency change tickets for a new-beneficiary hold on retail ACH transfers above a defined threshold and step-up verification for newly added external recipients. Neither control was live yet.

The system was still running as it had been running since November 4th.

The February reconciliation would run on the first Tuesday of March.

The platform knew Maria as a case number, a fraud label, and four transaction IDs. It did not know what thirty-three minutes on hold had done to a person whose rent was in four days. It knew Patricia as two disputed transfers, one provisionally credited and one under review. It did not know about the eleven years or the drive home in the rain or the April that was no longer certain.

It scored the transactions. It had been doing so since November 4th, inside a governance structure that had documented every point of the decline and had left the system behavior unchanged.

This is not a story about a fraud system that failed. Every model drifts. Every training dataset has gaps. Every emergency deployment carries risk. The story is about what happened after the URGENT email arrived at 8:47 and before the model risk committee met on Thursday.

Nine editions of this newsletter have documented what happens when AI systems run without the accountability layer beneath them. A revenue miss forced the comparison at Upstart. A court letter arrived at Workday four years after the machine had already decided. In each case the governance structure was present. Meetings. Dashboards. Charters. Tickets. Named committees with defined ownership. What it did not have was a formal process that converted a signal into a decision before something outside the organization converted it instead.

The URGENT email arrived. The governance ticket was opened and routed. The product owner had accountability to report it. She had no authority to stop it. The authority sat with a committee that met weekly. Between 8:47 on a Tuesday and the following Thursday, the system processed every transaction it was going to process. By then Maria had already called, on a Thursday morning in late January. Patricia had already filed, on January 22nd.

Most governance structures are optimized to document and escalate problems. Far fewer are designed to force a decision while the problem is still unfolding.

Every month the system ran after November 4th without the controls that justified its deployment was a month of compounding Governance Debt. Not the debt of missing documentation. The debt of a detection signal that entered a governance process designed to receive it, route it, escalate it, and discuss it, and that produced no change in system behavior. The URGENT email was the signal. The model risk committee meeting was the governance. The gap between those two things was the debt.

On July 10, 2025, the Massachusetts Attorney General settled with Earnest Operations for $2.5 million over AI underwriting governance failures. Attorney General Andrea Joy Campbell stated publicly that Earnest's models had operated for years without the testing and oversight controls the state required them to build.

The state did not find a failing model. It found an organization that had no formal process for what to do when the model itself showed signs of failing.

The state of Massachusetts wrote that process for them. That writing was the enforcement action.

The Disposition Protocol is the document that closes the gap between detection and decision. Six requirements, written before deployment, attached to every AI system in production.

One person owns the signal. A name and a role, in the authorization record before the system goes live. When the monitoring fires, this person receives it and is formally required to act. Not a team. Not a shared inbox. One person. This is the Accuracy Owner.

A number starts the obligation. A specific threshold (an accuracy rate, a sensitivity rate, a reversal rate) at which the Accuracy Owner stops managing a concern and begins a formal review. Written before deployment. If it does not exist, the monitoring produces information. It does not produce governance.

One person can stop the system within predefined guardrails. Named before deployment. Different from the Accuracy Owner. The person with formal, pre-delegated authority to pause production within defined scope while the review runs. If finding that person currently requires a committee meeting, the problem is not the alert threshold.

Re-authorization requires proof. Before the system resumes, specific things must be demonstrated and documented. What those things are is decided before deployment, not assembled under pressure while the business case to resume builds louder by the hour. The same governance gap exists when controls overcorrect: when legitimate transactions are blocked at scale and no named person owns the decision to adjust.

The notification clock starts at trigger. When a threshold is crossed, specific people are notified within a specific window. The legal team. The board risk committee. The regulatory contact. A required notification with a documented timestamp. Not a retrospective slide in the next quarterly review.

A decision window is defined at trigger. Once a threshold is crossed, a formal decision must be reached and documented within a defined window: hold, resume, or escalate. Not a committee meeting scheduled for next Thursday. A named deadline, logged at the time of trigger.
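The six requirements above can be written down as a record that exists before deployment. What follows is a minimal sketch under stated assumptions, not the published framework: the field names, the on_reading function, and every value in the example (names, the 90.0 floor, the time windows, the date) are hypothetical. It also assumes a metric where higher is better, such as sensitivity, so the trigger fires when a reading falls below the threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DispositionProtocol:
    accuracy_owner: str               # 1. one named person owns the signal
    trigger_threshold: float          # 2. the number that starts the obligation
    pause_authority: str              # 3. named person who can pause the system
    reauthorization_evidence: tuple   # 4. what must be shown before resuming
    notify: tuple                     # 5. who is notified at trigger
    notify_within: timedelta          # 5. how quickly they are notified
    decision_window: timedelta        # 6. time allowed to reach a decision

    def __post_init__(self):
        if self.accuracy_owner == self.pause_authority:
            raise ValueError("pause authority must differ from the Accuracy Owner")


def on_reading(protocol, metric, observed_at):
    """Return the obligations a reading creates, or None if nothing triggers."""
    if metric >= protocol.trigger_threshold:
        return None
    return {
        "owner": protocol.accuracy_owner,
        "may_pause": protocol.pause_authority,
        "notify": protocol.notify,
        "notify_by": observed_at + protocol.notify_within,
        "decide_by": observed_at + protocol.decision_window,
        "allowed_decisions": ("hold", "resume", "escalate"),
    }


# All values below are hypothetical.
protocol = DispositionProtocol(
    accuracy_owner="Accuracy Owner (named individual)",
    trigger_threshold=90.0,
    pause_authority="Pause Authority (named individual)",
    reauthorization_evidence=("validation complete", "controls renewed"),
    notify=("legal", "board risk committee", "regulatory contact"),
    notify_within=timedelta(hours=4),
    decision_window=timedelta(hours=24),
)

obligations = on_reading(protocol, 71.0, datetime(2026, 2, 3, 8, 47))
print(obligations["decide_by"])
```

The point of the record is that the deadline is computed at the moment of trigger from values fixed in advance, so it does not depend on when the next committee meeting happens to fall.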

Agent 365, generally available since May 1, 2026, gives your organization a control plane to observe every AI agent running across your Microsoft environment. Microsoft Purview maps what those agents are accessing in real time. In a system like the one described here, those tools would have surfaced the signal earlier and with greater precision. The detection layer is no longer the gap. The Disposition Protocol is what converts that detection into a decision someone is formally required to make.

Without it, the URGENT email becomes a status update. The status update becomes a Thursday agenda item. The agenda item becomes an emergency change ticket. The emergency change ticket sits open while the system runs.

Maria's rent was in four days. Patricia's April took eleven years to plan.

The February reconciliation would run on a Tuesday.

The full Disposition Protocol framework, which defines all six requirements including the Accuracy Owner role, is here. The Organizational Agent Controls, which define the pre-deployment governance decisions the Disposition Protocol extends, are here.

For every AI system your organization has running right now, who is the named person formally required to act when the monitoring fires, and what does your governance structure require them to decide before the February reconciliation runs?