February 2026

Formal Verification and Property-Based Testing with AI Agents

The Core Thesis

Martin Kleppmann's argument is simple but profound:

LLMs are bad at formal verification, but they're excellent at writing specifications that humans can verify.

The economics of formal verification have always been brutal: proofs demand scarce expertise, typically take far longer to write than the code they cover, and must be redone whenever that code changes.

LLMs change this equation, though not by writing the proofs themselves. They can:

  1. Generate specifications - Natural language or semi-formal descriptions of intended behavior
  2. Suggest invariants - Properties that should always hold
  3. Write property tests - Executable checks that verify behavior
  4. Translate between levels - Informal to formal, code to spec, spec to test
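
For example, combining points 2 and 3: given a hypothetical slugify utility, an agent might propose invariants and emit them directly as executable properties. A sketch (the function and the chosen invariants are illustrative, not from any library):

// Sketch: agent-suggested invariants for a hypothetical slugify(s)
import fc from 'fast-check';

const slugify = (s) =>
  s.toLowerCase().trim().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');

// Invariant 1: slugify is idempotent
fc.assert(fc.property(fc.string(), (s) => slugify(slugify(s)) === slugify(s)));

// Invariant 2: the output contains only lowercase letters, digits, and hyphens
fc.assert(fc.property(fc.string(), (s) => /^[a-z0-9-]*$/.test(slugify(s))));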

Property-Based Testing: The Practical Middle Ground

Formal verification proves correctness. Property-based testing (PBT) finds incorrectness.

How PBT Works

// Example: fast-check property test
import fc from 'fast-check';

// Functions under test (minimal Node.js stand-ins; assume well-formed strings).
const base64Encode = (s) => Buffer.from(s, 'utf8').toString('base64');
const base64Decode = (b) => Buffer.from(b, 'base64').toString('utf8');

fc.assert(
  fc.property(
    fc.string(),                    // any string as input
    (s) => {
      const encoded = base64Encode(s);
      const decoded = base64Decode(encoded);
      return decoded === s;         // round-trip invariant
    }
  )
);

The framework generates hundreds or thousands of random inputs automatically. You define properties (invariants), not specific test cases.
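
The run count is tunable: fast-check defaults to 100 runs per property, and on failure it shrinks the failing input to a minimal counterexample. A minimal sketch:

import fc from 'fast-check';

// Concatenation-length property, checked against 1,000 generated pairs.
fc.assert(
  fc.property(fc.string(), fc.string(),
    (a, b) => (a + b).length === a.length + b.length),
  { numRuns: 1000 }
);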

Common Property Shapes

Property Type    Description                          Example
Round-trip       Encode/decode returns the original   JSON.parse(JSON.stringify(x))
Idempotence      f(f(x)) == f(x)                      Math.abs(Math.abs(x))
Commutativity    f(a, b) == f(b, a)                   a + b
Associativity    f(f(a, b), c) == f(a, f(b, c))       (a + b) + c
Identity         f(x, identity) == x                  x + 0
Invariance       A property holds before and after    List length after sort equals before
No exceptions    Never crashes on valid input         Parser handles any input
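
Two of these shapes expressed as fast-check properties, using the built-in trim and sort as stand-in functions under test:

import fc from 'fast-check';

// Idempotence: trimming an already-trimmed string changes nothing.
fc.assert(fc.property(fc.string(), (s) => s.trim().trim() === s.trim()));

// Invariance: sorting preserves length.
fc.assert(
  fc.property(fc.array(fc.integer()),
    (xs) => [...xs].sort((a, b) => a - b).length === xs.length)
);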

Agent Patterns for Verification

Pattern 1: Specification Generation

Agent Task: Given this code, generate a specification

Input: Source code
Output: Natural-language specification of behavior

Use when: Code exists but documentation is missing

Pattern 2: Invariant Discovery

Agent Task: Identify invariants that should hold for this system

Input: Code + specification
Output: List of properties that should always be true

Use when: You need to understand what to test

Pattern 3: Property Test Generation

Agent Task: Generate property-based tests from this specification

Input: Specification
Output: Executable property tests (fast-check, Hypothesis, etc.)

Use when: You have a spec but no tests
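
For example, from a one-line spec such as "clamp(x, lo, hi) restricts x to the range [lo, hi]", an agent could emit a test like this (clamp is a hypothetical function, included inline so the sketch is self-contained):

import fc from 'fast-check';

// Hypothetical implementation under test.
const clamp = (x, lo, hi) => Math.min(Math.max(x, lo), hi);

// From the spec: the result lies within [lo, hi], and values already
// in range pass through unchanged.
fc.assert(
  fc.property(fc.integer(), fc.integer(), fc.integer(), (x, a, b) => {
    const [lo, hi] = a <= b ? [a, b] : [b, a];   // build a valid range
    const y = clamp(x, lo, hi);
    return lo <= y && y <= hi && (x < lo || x > hi || y === x);
  })
);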

The "Vericoding" Workflow

Traditional development: Write code, write tests, hope it works.

Vericoding: Write spec, generate properties, generate tests, write code, verify.

Intention (What should this do?)
    |
    v
Spec (Formal or semi-formal description)
    |
    v
Properties (Invariants that should hold)
    |
    v
Tests (PBT or example-based)
    |
    v
Code (Implementation)
    |
    v
Verify (Run tests, check properties)
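
A miniature pass through this pipeline, assuming a made-up dedupe utility as the target:

import fc from 'fast-check';

// Intention: remove duplicate elements from an array.
// Spec: dedupe(xs) returns the distinct elements of xs, keeping first occurrences.

// Properties derived from the spec.
const noDuplicates  = (ys) => new Set(ys).size === ys.length;
const subsetOfInput = (xs, ys) => ys.every((y) => xs.includes(y));
const coversInput   = (xs, ys) => xs.every((x) => ys.includes(x));

// Code: implementation written against the spec.
const dedupe = (xs) => [...new Set(xs)];

// Verify: run the properties.
fc.assert(
  fc.property(fc.array(fc.integer()), (xs) => {
    const ys = dedupe(xs);
    return noDuplicates(ys) && subsetOfInput(xs, ys) && coversInput(xs, ys);
  })
);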

Tools and Frameworks

Property-Based Testing

Language                Framework    Link
JavaScript/TypeScript   fast-check   github.com/dubzzz/fast-check
Python                  Hypothesis   hypothesis.works
Rust                    proptest     github.com/AltSysrq/proptest
Go                      gopter       github.com/leanovate/gopter
Java                    jqwik        jqwik.net

Practical Recommendations

  1. Start with PBT, not formal verification - The ROI is much higher
  2. Use agents for spec generation, not proof - They're better at writing than proving
  3. Human review of specs is mandatory - Agents can produce plausible-but-wrong specifications
  4. Properties over examples - One well-chosen property is worth 100 example tests
  5. Round-trip tests are your friend - They catch an enormous class of bugs
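
As a sketch of recommendation 5, a round-trip property over structured records (the record shape here is arbitrary):

import fc from 'fast-check';

// Round-trip: serialize, parse, compare field by field.
fc.assert(
  fc.property(fc.record({ id: fc.integer(), name: fc.string() }), (r) => {
    const r2 = JSON.parse(JSON.stringify(r));
    return r2.id === r.id && r2.name === r.name;
  })
);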

The Ceiling

Current AI agents cannot reliably carry out formal verification or produce trustworthy machine-checked proofs.

But they can generate specifications, suggest invariants, write property tests, and translate between informal and formal levels.

The sweet spot is human + agent + property testing. Not agent alone. Not formal verification alone. The combination.