Context Is a Budget

April 2026

Context Is a Budget

Agent operators talk about context like it is free oxygen. It is not. It is closer to cash, battery, and attention packed into one constraint.

Every extra token costs something: money, latency, retrieval noise, weaker focus, and more opportunities for stale information to masquerade as truth.

Position

Context should be spent where it changes the next decision. Everything else is clutter wearing a lanyard.

What Context Actually Buys

Useful context does one of four jobs:

exposes the current objective
states the active constraints
shows the artifact under change
defines how success will be judged

That is the whole game. If a block of text does not help with one of those four jobs, it is probably there because nobody wanted to decide what mattered.

The Common Failure

Most agent setups die from one of two bad instincts.

Instinct one: dump in everything "just in case."

Instinct two: keep adding summaries of summaries until the system cannot tell current truth from yesterday's interpretation of it.

The result looks thorough and feels intelligent. It is neither.

Wide context often makes agents more obedient to the wrong thing. They follow whatever is easiest to quote, not whatever is most relevant to the live decision.

Why More Context Does Not Mean More Truth

Truth in an agent workflow is not the total volume of available text. It is the subset that still governs the present task.

A stale note, an old roadmap, an outdated README section, and a one-line user correction do not have equal authority. But once they enter the same prompt soup, they start competing as peers.

This is the real problem with context inflation. It flattens authority.

The model sees everything. The operator forgets that not everything should count equally.

Authority Matters More Than Recall

Useful context is hierarchical.

highest authority - current spec, explicit user correction, failing test, live interface contract
middle authority - recent decision log, current implementation, current task brief
low authority - brainstorming notes, old summaries, generic project lore

Most teams have this hierarchy implicitly. Few encode it explicitly. Then they wonder why the agent keeps honoring obsolete notes like a dutiful little historian.

The Four Good Uses of Context

1. State the Objective

What is happening now? Not what the project is "about." Not the company mission. What is the current move?

task: add retry backoff to outbound webhook delivery
done_when:
  - retries use exponential backoff
  - max attempt count is configurable
  - existing webhook tests pass
  - failure metrics remain emitted

2. State the Constraints

Good constraints save tokens because they eliminate branches of thought before the model wanders into them.

constraints:
  - do not change database schema
  - preserve public API response shape
  - prefer existing queue abstractions
  - no new dependencies

3. Show the Work Surface

The most valuable context is usually the artifact under change: the file, interface, failing test, log line, schema, or diff. Not a sermon about the repository.

4. Define Judgment

Agents behave better when the evaluation surface is present. See Reward Engineering for Coding Agents. If success is visible, less narration is needed.

Bad Context Smells

long codebase overviews before a narrow task
duplicate summaries at multiple levels of abstraction
persona fluff that outranks operational facts
old decisions with no validity window
notes that explain what the code already says clearly
entire browser snapshots shoved into every turn

If context starts looking like archival preservation, the workflow has already gone off the rails.

Memory Is Not a Junk Drawer

Persistent memory gets sold as the answer to drift. Sometimes it is. Often it just preserves mistakes at scale.

Memory only helps when it stores things that should survive task boundaries:

stable project facts
decisions that remain binding
open questions that still block progress
operator preferences that genuinely affect output

Everything else should decay.

That sounds harsh. It is correct. A memory system without forgetting is just an optimism-powered landfill.

Compression Beats Accumulation

Strong operators compress state after each meaningful step.

Weak operators accumulate every intermediate thought because deleting anything feels risky.

Compression is not summarization theater. It means writing the smallest artifact that preserves the next valid move.

current_state:
  objective: ship webhook backoff without API changes
  decisions:
    - use existing retry queue
    - default max attempts stays 5
  unresolved:
    - metric name for terminal failure event
  next_step:
    - patch dispatcher and update tests

That is enough. The rest can be reconstructed from code and git history if needed.

Prompt Caching Does Not Solve Epistemology

Cheap cached context is still context. Lower price does not turn fluff into signal.

Caching is useful when the same high-value substrate must stay present across turns. It is not a moral license to keep every paragraph ever written in the loop forever.

Heartbeats and Checkpoints

Long-running agent work benefits from tiny recurring checks. But the heartbeat should ask whether reality changed, not restate the entire universe.

A good heartbeat pattern is boring:

if nothing_changed:
  return HEARTBEAT_OK

if state_changed:
  return {
    changed_fact,
    impact_on_task,
    required_next_action
  }

The point is to surface deltas. If every heartbeat rehydrates the full workflow state, the system is paying rent on its own indecision.

The Better Rule

Pass full context only at real boundaries:

new task class
new artifact surface
spec revision
handoff between roles

Inside a narrow loop, context should shrink, not grow.

Practical Consequences

Prefer references to artifacts over pasting entire artifacts.
Store state in versioned files, not only in conversation history.
Expire summaries when the underlying truth changes.
Make authority explicit: spec beats note, test beats plan, operator correction beats both.
Measure context by decision value, not token count alone.

Context is a spending decision. Spend it on the facts that change the next move. Cut the rest before it starts pretending to be knowledge.

StayFresh