February 2026
Formal Verification and Property-Based Testing with AI Agents
References:
- AI and the Future of Formal Verification (Martin Kleppmann, December 2025)
- Property-Based Testing (Anthropic Red, 2026)
- Property-Based Testing in Practice (Kiro)
The Core Thesis
Martin Kleppmann's argument is simple but profound:
LLMs are bad at formal verification, but they're excellent at writing specifications that humans can verify.
The economics of formal verification have always been brutal:
- seL4: ~23 lines of proof per line of code
- CompCert: ~10+ years of effort for a verified C compiler
- Most teams can't afford this
But LLMs change the equation, not by writing proofs but by lowering the cost of everything around them. They can:
- Generate specifications - Natural language or semi-formal descriptions of intended behavior
- Suggest invariants - Properties that should always hold
- Write property tests - Executable checks that verify behavior
- Translate between levels - Informal to formal, code to spec, spec to test
Property-Based Testing: The Practical Middle Ground
Formal verification proves correctness. Property-based testing (PBT) finds incorrectness.
How PBT Works
```javascript
// Example: fast-check property test
fc.assert(
  fc.property(
    fc.string(),            // any string as input
    (s) => {
      const encoded = base64Encode(s);
      const decoded = base64Decode(encoded);
      return decoded === s; // round-trip invariant
    }
  )
);
```
The framework generates hundreds or thousands of random inputs automatically. You define properties (invariants), not specific test cases.
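Under the hood, the loop is simple. A minimal hand-rolled sketch (no framework; `randomString` and `checkProperty` are toy stand-ins for fast-check's generators and runner, and the base64 round-trip uses Node's built-in `Buffer`):

```javascript
// Minimal property-testing loop: generate random inputs, check an invariant.
// randomString is a toy stand-in for a real generator like fc.string().
function randomString(maxLen = 20) {
  const len = Math.floor(Math.random() * maxLen);
  let s = "";
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(32 + Math.floor(Math.random() * 95)); // printable ASCII
  }
  return s;
}

function checkProperty(property, runs = 1000) {
  for (let i = 0; i < runs; i++) {
    const input = randomString();
    if (!property(input)) {
      return { ok: false, counterexample: input }; // a real framework would also shrink this
    }
  }
  return { ok: true };
}

// Round-trip property, using Node's built-in base64 support.
const roundTrip = (s) =>
  Buffer.from(Buffer.from(s, "utf8").toString("base64"), "base64").toString("utf8") === s;

console.log(checkProperty(roundTrip).ok); // true
```

Real frameworks add the two things this sketch lacks: rich, composable generators and shrinking (reducing a failing input to a minimal counterexample).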
Common Property Shapes
| Property Type | Description | Example |
|---|---|---|
| Round-trip | Encode/decode returns original | JSON.parse(JSON.stringify(x)) |
| Idempotence | f(f(x)) == f(x) | Math.abs(Math.abs(x)) |
| Commutativity | f(a, b) == f(b, a) | a + b |
| Associativity | f(f(a, b), c) == f(a, f(b, c)) | (a + b) + c |
| Identity | f(x, identity) == x | x + 0 |
| Invariance | Property P holds before and after | List length after sort equals before |
| No exceptions | Never crashes on any input | Parser returns errors instead of throwing |
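The algebraic shapes in the table translate directly into executable predicates. A sketch (a PBT framework would supply the concrete `x`, `a`, `b`, `c` randomly):

```javascript
// Property shapes from the table, written as plain predicates.
const idempotent    = (f, x)       => f(f(x)) === f(x);
const commutative   = (f, a, b)    => f(a, b) === f(b, a);
const associative   = (f, a, b, c) => f(f(a, b), c) === f(a, f(b, c));
const identityElem  = (f, x, e)    => f(x, e) === x;

const add = (a, b) => a + b;

console.log(idempotent(Math.abs, -7));   // true
console.log(commutative(add, 2, 3));     // true
console.log(associative(add, 1, 2, 3));  // true
console.log(identityElem(add, 42, 0));   // true

// Invariance: sorting preserves length.
const xs = [3, 1, 2];
console.log([...xs].sort().length === xs.length); // true
```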
Agent Patterns for Verification
Pattern 1: Specification Generation
Agent Task: Given this code, generate a specification
Input: Source code
Output: Natural-language specification of behavior
Use when: Code exists but documentation is missing
Pattern 2: Invariant Discovery
Agent Task: Identify invariants that should hold for this system
Input: Code + specification
Output: List of properties that should always be true
Use when: You need to understand what to test
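To make Pattern 2 concrete, here is the kind of output one might expect for a sort function: each invariant as an executable predicate. Together they pin down "sorted" far more tightly than a handful of example tests (a sketch; the predicate names are illustrative, not from any particular tool):

```javascript
// Invariants an agent might propose for a numeric sort.
const sorted = (xs) => [...xs].sort((a, b) => a - b);

// 1. Length is preserved.
const sameLength = (xs) => sorted(xs).length === xs.length;

// 2. Output is in non-decreasing order.
const isOrdered = (xs) => sorted(xs).every((v, i, r) => i === 0 || r[i - 1] <= v);

// 3. Output is a permutation of the input (same elements, same multiplicities).
const isPermutation = (xs) => {
  const count = (arr) => arr.reduce((m, v) => m.set(v, (m.get(v) || 0) + 1), new Map());
  const a = count(xs), b = count(sorted(xs));
  return a.size === b.size && [...a].every(([k, n]) => b.get(k) === n);
};

console.log(sameLength([3, 1, 2]) && isOrdered([3, 1, 2]) && isPermutation([3, 1, 2])); // true
```

Note that no single invariant is sufficient: an implementation returning `[]` passes ordering, and the identity function passes length and permutation. The conjunction is what constrains the implementation.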
Pattern 3: Property Test Generation
Agent Task: Generate property-based tests from this specification
Input: Specification
Output: Executable property tests (fast-check, Hypothesis, etc.)
Use when: You have a spec but no tests
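A miniature of Pattern 3: from the spec "encoding a string for a URL and then decoding it must return the original", an agent could emit a test like the following. The loop stands in for a framework's generator; real output would target fast-check or Hypothesis:

```javascript
// Generated from the spec: decodeURIComponent(encodeURIComponent(s)) === s.
function randomUnicodeString(maxLen = 30) {
  const len = Math.floor(Math.random() * maxLen);
  let s = "";
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(Math.floor(Math.random() * 0xd7ff)); // avoid lone surrogates
  }
  return s;
}

let failures = 0;
for (let i = 0; i < 500; i++) {
  const s = randomUnicodeString();
  if (decodeURIComponent(encodeURIComponent(s)) !== s) failures++;
}
console.log(failures); // 0
```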
The "Vericoding" Workflow
Traditional development: Write code, write tests, hope it works.
Vericoding: Write spec, generate properties, generate tests, write code, verify.
```
Intention (What should this do?)
    |
    v
Spec (Formal or semi-formal description)
    |
    v
Properties (Invariants that should hold)
    |
    v
Tests (PBT or example-based)
    |
    v
Code (Implementation)
    |
    v
Verify (Run tests, check properties)
```
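The whole pipeline fits in a few lines for a small function. A sketch for a hypothetical `clamp(x, lo, hi)` (the spec, properties, and implementation are all illustrative):

```javascript
// Spec: result lies within [lo, hi]; if x is already in range, it is unchanged.

// 1. Properties derived from the spec.
const inRange   = (x, lo, hi) => { const r = clamp(x, lo, hi); return r >= lo && r <= hi; };
const untouched = (x, lo, hi) => (x < lo || x > hi) || clamp(x, lo, hi) === x;

// 2. Implementation written against those properties.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// 3. Verify: random inputs instead of hand-picked examples.
let ok = true;
for (let i = 0; i < 1000; i++) {
  const lo = Math.random() * 100 - 50;
  const hi = lo + Math.random() * 100;
  const x  = Math.random() * 300 - 150;
  ok = ok && inRange(x, lo, hi) && untouched(x, lo, hi);
}
console.log(ok); // true
```

The point of the ordering: the properties exist before the implementation, so the tests constrain the code rather than merely describing it after the fact.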
Tools and Frameworks
Property-Based Testing
| Language | Framework | Link |
|---|---|---|
| JavaScript/TypeScript | fast-check | github.com/dubzzz/fast-check |
| Python | Hypothesis | hypothesis.works |
| Rust | proptest | github.com/AltSysrq/proptest |
| Go | gopter | github.com/leanovate/gopter |
| Java | jqwik | jqwik.net |
Practical Recommendations
- Start with PBT, not formal verification - The ROI is much higher
- Use agents for spec generation, not proof - They're better at writing than proving
- Human review of specs is mandatory - Agents can produce plausible-but-wrong specifications
- Properties over examples - One well-chosen property is worth 100 example tests
- Round-trip tests are your friend - They catch an enormous class of bugs
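As an illustration of the last point, here is a round-trip property catching a classic bug: a naive CSV row encoder (hypothetical code, not any real library) that silently corrupts fields containing the delimiter:

```javascript
// Naive CSV row encoding: no quoting, no escaping.
const encodeRow = (fields) => fields.join(",");
const decodeRow = (line)   => line.split(",");

const roundTrips = (fields) =>
  JSON.stringify(decodeRow(encodeRow(fields))) === JSON.stringify(fields);

console.log(roundTrips(["a", "b"]));   // true  — looks fine on simple examples
console.log(roundTrips(["a,b", "c"])); // false — the embedded comma corrupts the row
```

Example-based tests tend to use inputs like `["a", "b"]` and pass; a generator producing arbitrary strings finds the embedded-delimiter counterexample almost immediately.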
The Ceiling
Current AI agents cannot:
- Reliably prove mathematical theorems
- Verify concurrent systems without human guidance
- Replace formal methods for safety-critical systems
But they can:
- Generate specifications that humans can verify
- Write property tests that catch real bugs
- Explore state spaces faster than manual testing
- Act as semantic oracles for non-deterministic output
The sweet spot is human + agent + property testing: not the agent alone, not formal verification alone, but the combination.