Microsoft Launches ASSERT: A New Tool to Test and Tame AI Behavior

Microsoft has announced ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), a new open-source tool designed to automatically verify whether AI systems are behaving according to specific business requirements and safety guidelines.
The standout feature? No coding is required to create tests. You simply feed the tool a plain-text description of your rules.
How Does It Work?
The testing process is fully automated and broken down into three main steps:
- Requirement Analysis: You write instructions in plain English (e.g., “do not email external addresses” or “share financial data only with C-level executives”). ASSERT translates this text into explicit lists of dos and don’ts.
- Scenario Generation: The system automatically creates tricky test cases to see if the AI can be provoked into making a mistake.
- Audit and Logging: The tool runs the tests and logs every single step of the AI’s decision-making process-including intermediate actions and external tool calls-making it easy to pinpoint exactly where things went wrong.
Why it matters: Standard benchmarks evaluate an AI’s general knowledge but completely miss unique corporate policies. As Sarah Bird, Chief Product Officer of Responsible AI at Microsoft, pointed out, without a clear understanding of an AI’s specific behavior, building a secure and compliant commercial product is virtually impossible.
The ASSERT framework can be integrated during the initial development phase, post-deployment, or used for continuous monitoring of live AI agents. The project is already open-source and available to the community.