Microsoft's Trust-and-Verify DLP Model for Copilot Has No Equivalent Check for Agent Actions, May 2026

Microsoft's DLP model verifies labeling accuracy. It has no equivalent check for whether an agent's action on correctly labeled data was authorized.

Microsoft Digital's Copilot governance guide, published May 7, 2026, and updated June 8, 2026, describes a trust-and-verify model for employee data handling: employees apply sensitivity labels, and Purview DLP automatically checks that work through auto-labeling, quarantining, and escalation to content owners, legal, and security teams. The guide states that this model catches the roughly one percent of cases where labeling goes wrong. The verification it describes covers whether data is correctly labeled and accessible, not the actions an AI agent takes using that data.

GOVERNANCE IMPLICATION

DLP verification answers whether data is in the right place with the right label. It does not answer whether an autonomous agent's specific write, transaction, or recommendation using that data was authorized by a named accountable owner. An organization can pass every DLP check the guide describes while an agent takes an unauthorized action on perfectly labeled, perfectly permitted data, because the verification layer was built to catch mislabeling, not to evaluate business authorization.
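The distinction can be made concrete with a minimal sketch. It is not drawn from Microsoft's guide or from any Purview API: the `AgentAction` record, its field names, and both check functions are hypothetical, chosen only to show that a data-centric check and an action-centric check are independent questions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentAction:
    """One agent action. Every field name here is a hypothetical illustration."""
    action_id: str
    kind: str                     # e.g. "write", "transaction", "recommendation"
    data_label: str               # sensitivity label the data actually carries
    expected_label: str           # label the data should carry
    agent_has_access: bool        # whether the agent was permitted to read the data
    authorized_by: Optional[str]  # named accountable owner, or None if nobody approved


def passes_dlp_style_check(action: AgentAction) -> bool:
    """Data-centric question: is the label right and the access permitted?"""
    return action.data_label == action.expected_label and action.agent_has_access


def passes_authorization_check(action: AgentAction) -> bool:
    """Action-centric question: did a named owner approve this specific action?"""
    return bool(action.authorized_by and action.authorized_by.strip())


# Perfectly labeled, perfectly permitted data, but no owner approved the action.
action = AgentAction(
    action_id="a-001",
    kind="recommendation",
    data_label="Confidential",
    expected_label="Confidential",
    agent_has_access=True,
    authorized_by=None,
)

print(passes_dlp_style_check(action))      # data-centric result
print(passes_authorization_check(action))  # action-centric result
```

In this sketch the first check passes and the second fails for the same action, which is the situation the paragraph above describes: nothing in the data-centric check looks at `authorized_by` at all.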

SCENARIO

A logistics firm's Purview DLP environment is fully configured per Microsoft's guide. A quarterly review confirms that it caught a credentials leak within minutes. The same review asks whether a procurement agent's contract-amendment recommendation, generated three weeks earlier using correctly labeled vendor data, was authorized before it was sent to a vendor. DLP has no record bearing on that question, because the recommendation involved no mislabeled or improperly accessed data.
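To answer the review's second question, the firm would need a record that DLP does not produce. The sketch below assumes, purely for illustration, that such a record exists: a log of sent agent actions and a separate approval register keyed by action ID. The record shapes, identifiers, dates, and the `unauthorized_actions` helper are all hypothetical and are not part of Purview or the guide.

```python
from datetime import datetime

# Hypothetical log of actions an agent sent outside the firm.
sent_actions = [
    {"action_id": "rec-17", "agent": "procurement-agent",
     "sent_at": datetime(2026, 1, 12, 9, 30)},
    {"action_id": "rec-18", "agent": "procurement-agent",
     "sent_at": datetime(2026, 1, 13, 14, 0)},
    {"action_id": "rec-19", "agent": "procurement-agent",
     "sent_at": datetime(2026, 1, 14, 11, 15)},
]

# Hypothetical approval register: who approved which action, and when.
approvals = {
    "rec-18": {"owner": "procurement-lead", "approved_at": datetime(2026, 1, 13, 10, 0)},
    "rec-19": {"owner": "procurement-lead", "approved_at": datetime(2026, 1, 15, 8, 0)},
}


def unauthorized_actions(actions, approval_register):
    """Return IDs of actions with no approval recorded before they were sent."""
    flagged = []
    for action in actions:
        approval = approval_register.get(action["action_id"])
        # Flag if nobody approved, or if approval came only after the send.
        if approval is None or approval["approved_at"] > action["sent_at"]:
            flagged.append(action["action_id"])
    return flagged


print(unauthorized_actions(sent_actions, approvals))
```

Here `rec-17` is flagged because no approval exists, and `rec-19` because its approval is dated after the send. Without an approval register of some kind, the review has nothing to run this query against, whatever the DLP results show.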

THE GOVERNANCE QUESTION

Microsoft verifies employee labeling decisions through DLP. What verifies that an agent's specific action using correctly labeled data was a business decision someone authorized?

CONTROL GAP

Purview DLP verification, as described, evaluates label accuracy and data exposure. No equivalent automated verification layer exists in the guide for whether an agent's action on properly labeled data was authorized.

REGULATORY RELEVANCE

NIST AI RMF

SEC Cyber

PRIMARY SOURCE

How we're tackling Microsoft 365 Copilot governance internally at Microsoft

Alex Fleck

May 7, 2026


Read the next intelligence note.


JUNE 9, 2026

Agent Security

Anthropic Launches Claude Fable 5 with Runtime Fallback Safeguards and Mandatory 30-Day Data Retention, June 2026

Anthropic launched Claude Fable 5 and Claude Mythos 5 on June 9, 2026. Fable 5 is the first Mythos-class model released for general use. It includes safety classifiers that intercept queries in cybersecurity, biology and chemistry, and distillation categories, routing those queries to Claude Opus 4.8 instead. Anthropic reports the fallback occurs in fewer than 5% of sessions. The launch introduces a mandatory 30-day data retention requirement for all Fable 5 and Mythos 5 traffic on first- and third-party surfaces. Anthropic states the retained data will not be used for model training and will be deleted after 30 days in most cases.


JUNE 4, 2026

Agent Security

Microsoft AI Red Team v2.0 Taxonomy Finds Human-in-the-Loop Bypass the Most Exploited Agent Failure Mode, June 2026

On June 4, 2026, the Microsoft AI Red Team published v2.0 of its Taxonomy of Failure Modes in Agentic AI Systems on the Microsoft Security Blog, grounded in twelve months of red team engagements against deployed agentic systems. The update adds seven new failure mode categories including agentic supply chain compromise, goal hijacking, inter-agent trust escalation, computer-use agent visual attacks, session context contamination, MCP and plugin abuse, and capability disclosure. The most consistently exploited failure mode observed was human-in-the-loop bypass, achieved through consent fatigue, probabilistic invocation manipulation, and incremental escalation, with several engagements demonstrating zero-click end-to-end attack chains.


MAY 18, 2026

Agent Security

NIST Publishes Summary Analysis of RFI Responses on AI Agent Security (TRAI 800-5), May 2026

On May 18, 2026, NIST published 'Summary Analysis of Responses to the Request for Information Regarding Security Considerations for AI Agents' (NIST Trustworthy and Responsible AI, report 800-5, authored by Riggs, Hamin, Perry, Edelman, and Cihon). The report summarizes stakeholder responses to the CAISI request for information (docket NIST-2025-0035). Commenters broadly agreed that AI agents present novel security threats that act as a barrier to adoption, and that while core cybersecurity principles still apply, they require adaptation for agents. Respondents identified roles for government including implementation guidance, information-sharing, and standards.

