StayFresh

Static archive of workflow research and patterns

February 2026

AGENTS.md Effectiveness: What the Research Says

Reference: Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? (Gloaguen et al., ETH Zurich / LogicStar.ai, February 2026)

The Surprising Finding

Context files like AGENTS.md and CLAUDE.md are widely recommended. Over 60,000 repositories include them. But rigorous evaluation reveals:

Context Type         Success Rate Change    Cost Change
None                 Baseline               Baseline
LLM-generated        -3%                    +20%
Developer-written    +4%                    +19%

LLM-generated context files make agents worse and more expensive.

Why Context Files Underperform

1. Redundant Documentation

When researchers removed all existing documentation (READMEs, docs folders), LLM-generated context files suddenly became useful (+2.7% improvement). This suggests:

Context files are mostly redundant with what's already in the repository.

2. No Effective Overview

One recommended use of context files is providing a codebase overview. But agents with context files don't find relevant files faster—they often take more steps because they:

  1. Issue multiple commands to find the context file
  2. Read it multiple times despite it being in context
  3. Explore more broadly without better targeting

3. Unnecessary Requirements Make Tasks Harder

Context files add instructions. Agents follow them. But additional requirements—even well-intentioned ones—increase cognitive load and reasoning tokens (14-22% more reasoning with context files).

More instructions do not equal better outcomes.

What Context Files Do Well

Agents Follow Instructions

If a tool is mentioned in the context file, agents use it.

This isn't an instruction-following problem. Agents comply—they're just not being helped by what they're told.

More Exploration, More Testing

Context files increase the number of exploration steps agents take and the number of test runs they execute.

This is the "thoroughness" that drives up costs without improving outcomes.

Practical Recommendations

When to Skip AGENTS.md

If the repository already has solid READMEs and documentation, an LLM-generated context file mostly restates what agents can find on their own, and it adds cost to every request.

When AGENTS.md Helps

Developer-written files that capture project knowledge not documented anywhere else were the one case where the study saw a modest gain (+4% success), though still at higher cost.

What to Include (If You Write One)

Based on the research, context files should contain only minimal requirements:

# Build & Test
- Run tests: `pytest tests/`
- Lint: `ruff check .`

# Conventions
- Use `uv` for dependency management
- Follow existing module patterns

Not: codebase overviews, architecture tours, or style rules that duplicate documentation already in the repository.

The "Surprising Behavior" Pattern

When agents encounter something unexpected, that's signal—not noise.

When agents fail, fix the code, not the prompt. Surprising behavior reveals architectural friction.

Instead of adding more instructions to AGENTS.md, consider:

  1. Is the codebase structure confusing? Rename, reorganize, add comments
  2. Are conventions unclear? Add type hints, improve names, add docstrings (see the sketch after this list)
  3. Is the task underspecified? Improve the issue description, not the context file
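
To make point 2 concrete, here is a minimal sketch; the `load_events` helper, its parameters, and the UTC convention are hypothetical examples, not code from the study. The idea is to encode the convention in the code itself rather than in an AGENTS.md note like "all timestamps are UTC".

from datetime import datetime, timezone

def load_events(since: datetime) -> list[dict]:
    """Return events recorded after `since`.

    `since` must be timezone-aware and in UTC. Naive datetimes are
    rejected so callers cannot silently mix local time into queries.
    """
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware (UTC)")
    since_utc = since.astimezone(timezone.utc)
    # Query the event store with since_utc here; returning an empty
    # list keeps the sketch self-contained.
    return []

The type hint, the docstring, and the explicit ValueError put the convention at the point of use, where both agents and humans will see it, instead of in a context file that costs tokens on every request.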

Agent Psychology: The Step-3 Trick

Counterintuitive but effective: if an agent struggles with step 2, tell it to do step 3. The agent often completes step 2 in the process.

This works because the agent treats step 2 as a prerequisite rather than as the goal: asked for step 3, it discovers what is missing and completes step 2 on the way instead of stalling on it directly.
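
A hedged illustration; the migration task and both prompts are hypothetical, not examples from the paper.

# Direct request that can stall: the agent keeps reworking the same
# partial migration without finishing it.
prompt_step_2 = "Write the Alembic migration that adds the `accounts` table."

# Asking for the dependent step instead: the migration becomes a
# prerequisite the agent handles on its way to making the tests pass.
prompt_step_3 = (
    "Run the test suite against a database that includes the new `accounts` "
    "table, and fix whatever blocks you from getting it green."
)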

Token Economics

Context files consume tokens in every request, so a 600-word AGENTS.md is paid for on every turn of every agent run. A rough back-of-the-envelope estimate follows.
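
A minimal sketch of the arithmetic, assuming roughly 1.3 tokens per English word and a few dozen requests per agent run; both numbers are illustrative assumptions, not measurements from the paper.

words_in_agents_md = 600
tokens_per_word = 1.3       # rough average for English prose; assumption
requests_per_task = 40      # typical multi-turn agent run; assumption

tokens_per_request = int(words_in_agents_md * tokens_per_word)   # ~780
tokens_per_task = tokens_per_request * requests_per_task         # ~31,200
print(f"~{tokens_per_request} tokens per request, ~{tokens_per_task:,} per task")

Because the file is resent with every request, its overhead scales with the number of agent turns, not with how often its contents are actually useful.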

Question: Is that token budget better spent on task-specific context (the actual code being modified) or on generic repository context?

The research suggests: task-specific context wins.
