February 2026
Formal Verification and Property-Based Testing with AI Agents
References:
- AI and the Future of Formal Verification (Martin Kleppmann, December 2025)
- Property-Based Testing (Anthropic Red, 2026)
- Property-Based Testing in Practice (Kiro)
The Core Thesis
Martin Kleppmann's argument is simple but profound:
LLMs are bad at formal verification, but they're excellent at writing specifications that humans can verify.
The economics of formal verification have always been brutal:
- seL4: ~23 lines of proof per line of code
- CompCert: ~10+ years of effort for a verified C compiler
- Most teams can't afford this
But LLMs change the equation, not by writing proofs but by lowering the cost of everything around them. They can:
- Generate specifications - Natural language or semi-formal descriptions of intended behavior
- Suggest invariants - Properties that should always hold
- Write property tests - Executable checks that verify behavior
- Translate between levels - Informal to formal, code to spec, spec to test
Property-Based Testing: The Practical Middle Ground
Formal verification proves correctness. Property-based testing (PBT) finds incorrectness.
How PBT Works
```javascript
// Example: fast-check property test
fc.assert(
  fc.property(
    fc.string(),            // any string as input
    (s) => {
      const encoded = base64Encode(s);
      const decoded = base64Decode(encoded);
      return decoded === s; // round-trip invariant
    }
  )
);
```
The framework generates hundreds or thousands of random inputs automatically. You define properties (invariants), not specific test cases.
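Under the hood, the loop is simple. A minimal hand-rolled sketch (no framework; `randomString` and `checkProperty` are toy stand-ins for fast-check's generators and runner, and the base64 round-trip uses Node's built-in `Buffer`):

```javascript
// Minimal property-testing loop: generate random inputs, check an invariant.
// randomString is a toy stand-in for a real generator like fc.string().
function randomString(maxLen = 20) {
  const len = Math.floor(Math.random() * maxLen);
  let s = "";
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(32 + Math.floor(Math.random() * 95)); // printable ASCII
  }
  return s;
}

function checkProperty(property, runs = 1000) {
  for (let i = 0; i < runs; i++) {
    const input = randomString();
    if (!property(input)) {
      return { ok: false, counterexample: input }; // a real framework would also shrink this
    }
  }
  return { ok: true };
}

// Round-trip property, using Node's built-in base64 support.
const roundTrip = (s) =>
  Buffer.from(Buffer.from(s, "utf8").toString("base64"), "base64").toString("utf8") === s;

console.log(checkProperty(roundTrip).ok); // true
```

Real frameworks add the two things this sketch lacks: rich, composable generators and shrinking (reducing a failing input to a minimal counterexample).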
Common Property Shapes
| Property Type | Description | Example |
|---|---|---|
| Round-trip | Encode/decode returns original | JSON.parse(JSON.stringify(x)) |
| Idempotence | f(f(x)) == f(x) | Math.abs(Math.abs(x)) |
| Commutativity | f(a, b) == f(b, a) | a + b |
| Associativity | f(f(a, b), c) == f(a, f(b, c)) | (a + b) + c |
| Identity | f(x, identity) == x | x + 0 |
| Invariance | Property P holds before and after | List length after sort equals before |
| No exceptions | Never crashes on any input | Parser returns errors instead of throwing |
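The algebraic shapes in the table translate directly into executable predicates. A sketch (a PBT framework would supply the concrete `x`, `a`, `b`, `c` randomly):

```javascript
// Property shapes from the table, written as plain predicates.
const idempotent    = (f, x)       => f(f(x)) === f(x);
const commutative   = (f, a, b)    => f(a, b) === f(b, a);
const associative   = (f, a, b, c) => f(f(a, b), c) === f(a, f(b, c));
const identityElem  = (f, x, e)    => f(x, e) === x;

const add = (a, b) => a + b;

console.log(idempotent(Math.abs, -7));   // true
console.log(commutative(add, 2, 3));     // true
console.log(associative(add, 1, 2, 3));  // true
console.log(identityElem(add, 42, 0));   // true

// Invariance: sorting preserves length.
const xs = [3, 1, 2];
console.log([...xs].sort().length === xs.length); // true
```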
Agent Patterns for Verification
Pattern 1: Specification Generation
Agent Task: Given this code, generate a specification
Input: Source code
Output: Natural-language specification of behavior
Use when: Code exists but documentation is missing
Pattern 2: Invariant Discovery
Agent Task: Identify invariants that should hold for this system
Input: Code + specification
Output: List of properties that should always be true
Use when: You need to understand what to test
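To make Pattern 2 concrete, here is the kind of output one might expect for a sort function: each invariant as an executable predicate. Together they pin down "sorted" far more tightly than a handful of example tests (a sketch; the predicate names are illustrative, not from any particular tool):

```javascript
// Invariants an agent might propose for a numeric sort.
const sorted = (xs) => [...xs].sort((a, b) => a - b);

// 1. Length is preserved.
const sameLength = (xs) => sorted(xs).length === xs.length;

// 2. Output is in non-decreasing order.
const isOrdered = (xs) => sorted(xs).every((v, i, r) => i === 0 || r[i - 1] <= v);

// 3. Output is a permutation of the input (same elements, same multiplicities).
const isPermutation = (xs) => {
  const count = (arr) => arr.reduce((m, v) => m.set(v, (m.get(v) || 0) + 1), new Map());
  const a = count(xs), b = count(sorted(xs));
  return a.size === b.size && [...a].every(([k, n]) => b.get(k) === n);
};

console.log(sameLength([3, 1, 2]) && isOrdered([3, 1, 2]) && isPermutation([3, 1, 2])); // true
```

Note that no single invariant is sufficient: an implementation returning `[]` passes ordering, and the identity function passes length and permutation. The conjunction is what constrains the implementation.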
Pattern 3: Property Test Generation
Agent Task: Generate property-based tests from this specification
Input: Specification
Output: Executable property tests (fast-check, Hypothesis, etc.)
Use when: You have a spec but no tests
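A miniature of Pattern 3: from the spec "encoding a string for a URL and then decoding it must return the original", an agent could emit a test like the following. The loop stands in for a framework's generator; real output would target fast-check or Hypothesis:

```javascript
// Generated from the spec: decodeURIComponent(encodeURIComponent(s)) === s.
function randomUnicodeString(maxLen = 30) {
  const len = Math.floor(Math.random() * maxLen);
  let s = "";
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(Math.floor(Math.random() * 0xd7ff)); // avoid lone surrogates
  }
  return s;
}

let failures = 0;
for (let i = 0; i < 500; i++) {
  const s = randomUnicodeString();
  if (decodeURIComponent(encodeURIComponent(s)) !== s) failures++;
}
console.log(failures); // 0
```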
The "Vericoding" Workflow
Traditional development: Write code, write tests, hope it works.
Vericoding: Write spec, generate properties, generate tests, write code, verify.
```
Intention (What should this do?)
    |
    v
Spec (Formal or semi-formal description)
    |
    v
Properties (Invariants that should hold)
    |
    v
Tests (PBT or example-based)
    |
    v
Code (Implementation)
    |
    v
Verify (Run tests, check properties)
```
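The whole pipeline fits in a few lines for a small function. A sketch for a hypothetical `clamp(x, lo, hi)` (the spec, properties, and implementation are all illustrative):

```javascript
// Spec: result lies within [lo, hi]; if x is already in range, it is unchanged.

// 1. Properties derived from the spec.
const inRange   = (x, lo, hi) => { const r = clamp(x, lo, hi); return r >= lo && r <= hi; };
const untouched = (x, lo, hi) => (x < lo || x > hi) || clamp(x, lo, hi) === x;

// 2. Implementation written against those properties.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// 3. Verify: random inputs instead of hand-picked examples.
let ok = true;
for (let i = 0; i < 1000; i++) {
  const lo = Math.random() * 100 - 50;
  const hi = lo + Math.random() * 100;
  const x  = Math.random() * 300 - 150;
  ok = ok && inRange(x, lo, hi) && untouched(x, lo, hi);
}
console.log(ok); // true
```

The point of the ordering: the properties exist before the implementation, so the tests constrain the code rather than merely describing it after the fact.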
Tools and Frameworks
Property-Based Testing
| Language | Framework | Link |
|---|---|---|
| JavaScript/TypeScript | fast-check | github.com/dubzzz/fast-check |
| Python | Hypothesis | hypothesis.works |
| Rust | proptest | github.com/AltSysrq/proptest |
| Go | gopter | github.com/leanovate/gopter |
| Java | jqwik | jqwik.net |
Practical Recommendations
- Start with PBT, not formal verification - The ROI is much higher
- Use agents for spec generation, not proof - They're better at writing than proving
- Human review of specs is mandatory - Agents can produce plausible-but-wrong specifications
- Properties over examples - One well-chosen property is worth 100 example tests
- Round-trip tests are your friend - They catch an enormous class of bugs
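As an illustration of the last point, here is a round-trip property catching a classic bug: a naive CSV row encoder (hypothetical code, not any real library) that silently corrupts fields containing the delimiter:

```javascript
// Naive CSV row encoding: no quoting, no escaping.
const encodeRow = (fields) => fields.join(",");
const decodeRow = (line)   => line.split(",");

const roundTrips = (fields) =>
  JSON.stringify(decodeRow(encodeRow(fields))) === JSON.stringify(fields);

console.log(roundTrips(["a", "b"]));   // true  — looks fine on simple examples
console.log(roundTrips(["a,b", "c"])); // false — the embedded comma corrupts the row
```

Example-based tests tend to use inputs like `["a", "b"]` and pass; a generator producing arbitrary strings finds the embedded-delimiter counterexample almost immediately.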
The Ceiling
Current AI agents cannot:
- Reliably prove mathematical theorems
- Verify concurrent systems without human guidance
- Replace formal methods for safety-critical systems
But they can:
- Generate specifications that humans can verify
- Write property tests that catch real bugs
- Explore state spaces faster than manual testing
- Act as semantic oracles for non-deterministic output
The sweet spot is human + agent + property testing: not the agent alone, not formal verification alone, but the combination.