Revised April 2026
StayFresh Thesis
This site is not a feed. It is a position.
The position is that AI-assisted development improves when evaluation is explicit, architecture is legible, and empirical notes outrank polished claims.
Position
Prompting shapes language. Evaluation shapes incentives. Architecture determines whether the result survives contact with reality.
What This Site Believes
1. Prompting Is a Constrained Pipeline, Not Magic Language
Prompts don't float free. They move through tokenization, token budgets, and control structures. Understanding what prompts actually are (the mechanism, not the incantation) changes everything about how to use them.
See What is Prompting: Operational Constraints.
2. Reward Surfaces Matter More Than Prompt Tricks
Prompting can steer tone, ordering, and local behavior. It does not reliably determine what gets optimized under pressure.
The control surface is the rubric. See Reward Engineering for Coding Agents.
3. Narrow Metrics Produce Fake Progress
Single numbers are easy to optimize and easy to game.
Useful evaluation is multi-axis and explicit about what counts as failure. See Reward Hacking in Coding Agents.
4. Machine-Readable Criteria Beat Vibe-Based Review
Agents behave more predictably when success criteria are literal and inspectable.
That is why rubric DSLs and preference specs matter. See Reward Rubric DSL and Preference TOML.
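A minimal sketch of what "literal and inspectable" can mean in practice. The axis names, weights, and scoring function below are illustrative assumptions, not the actual Reward Rubric DSL or Preference TOML schema:

```python
# Illustrative only: the axes, weights, and field names are invented here,
# not the real Reward Rubric DSL or Preference TOML format.
import tomllib

RUBRIC_TOML = """
[rubric.tests_pass]
weight = 0.5
description = "All existing tests pass without modification."

[rubric.no_dead_code]
weight = 0.2
description = "No unused functions or commented-out blocks introduced."

[rubric.diff_scope]
weight = 0.3
description = "Diff touches only files named in the task."
"""

def score(checks: dict[str, bool]) -> float:
    """Weighted score over literal pass/fail checks, one axis at a time."""
    rubric = tomllib.loads(RUBRIC_TOML)["rubric"]
    return sum(axis["weight"] for name, axis in rubric.items() if checks.get(name, False))

# Each criterion is inspectable on its own, so a reviewer (or an agent)
# can see which axis failed instead of arguing about an overall vibe.
print(score({"tests_pass": True, "no_dead_code": False, "diff_scope": True}))  # 0.8
```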
5. Context Inflation Is Usually a Smell
When agents need long repository manifestos to function, the codebase is often the real problem.
Costly context is not the same thing as useful context. See AGENTS.md Effectiveness.
6. Loops Beat One-Shot Heroics
Attempt, critique, revision, and rescoring usually beat longer and louder prompts.
The boring loop wins because it narrows the next move. See Prompt Patterns.
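A sketch of the boring loop. The generate, critique, and score callables stand in for whatever model calls or checks a real pipeline would use; they are assumptions, not a specific API:

```python
def refine(task, generate, critique, score, max_rounds=3, target=0.9):
    """Attempt, critique, revise, rescore. Each round narrows the next move
    to a named failure instead of re-prompting the whole task louder."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        if score(draft) >= target:
            break                         # good enough: stop spending tokens
        feedback = critique(draft)        # a specific, inspectable complaint
        draft = generate(task, feedback=feedback)
    return draft

# Toy usage: the "critique" keeps asking for a reproduction step until the
# draft is long enough to clear the score threshold.
result = refine(
    "summarize the failure",
    generate=lambda task, feedback=None: task + (" -- " + feedback if feedback else ""),
    critique=lambda draft: "add the reproduction step",
    score=lambda draft: min(len(draft) / 40, 1.0),
)
print(result)
```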
7. Strong Positions Require Evidence
This archive is opinionated on purpose.
Every durable claim should reduce to a failure mode, an example, or a reproducible workflow pattern.
8. Field Notes Need Compression
Raw observation is not enough. Findings need to compress into something reusable.
The Starfish method exists for that reason: Start, Stop, Continue, Investigate, Amplify.
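One way to keep that compression honest is to make the five buckets a literal record rather than prose. The structure below is a sketch and the field contents are placeholders, not real findings:

```python
# A Starfish note as a structured record: a raw observation has to land in
# one of five buckets instead of staying a wall of text.
from dataclasses import dataclass, field

@dataclass
class StarfishNote:
    start: list[str] = field(default_factory=list)        # begin doing
    stop: list[str] = field(default_factory=list)          # drop entirely
    continue_: list[str] = field(default_factory=list)     # keep as-is ("continue" is a keyword)
    investigate: list[str] = field(default_factory=list)   # unclear, needs a follow-up test
    amplify: list[str] = field(default_factory=list)       # working, push harder

note = StarfishNote(
    stop=["hand-editing agent output before rescoring"],
    investigate=["whether rubric weight changes shift the failure modes"],
)
```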
9. Personal Brand Comes From Doctrine, Not Volume
More posts create more output. A named position creates memorability.
The archive exists to support a doctrine, not to cosplay momentum.
10. Tools Must Lower Comprehension Debt
Tooling should reduce cognitive load, not multiply tabs, polling, and context switching.
Auditory cues are a practical operator channel: they surface completion state quickly and improve recall under split attention. See psay Agent Notifications.
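The pattern, sketched below, is simply "wrap the long task, speak the outcome." The psay invocation and its single-message argument are assumptions for illustration, not its documented interface:

```python
import subprocess

def run_with_cue(cmd: list[str]) -> int:
    """Run a long task, then emit an audible cue keyed to the outcome."""
    result = subprocess.run(cmd)
    message = "task finished" if result.returncode == 0 else "task failed"
    # Fire-and-forget notification; the exact psay CLI here is assumed.
    subprocess.run(["psay", message], check=False)
    return result.returncode

# Example: hear the test run finish without watching the terminal.
run_with_cue(["pytest", "-q"])
```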
What This Site Rejects
- AI writing that sounds like sales enablement sludge.
- Benchmarks treated as truth instead of instruments.
- One-score evaluation systems presented as rigor.
- Architecture problems disguised as prompt problems.
- Personal branding with aesthetics and no hard position.
Named Frameworks
- Starfish Method: Start, Stop, Continue, Investigate, Amplify. Used to compress empirical findings without flattening them.
- Reward Rubric DSL: A machine-readable rubric for scoring agent output.
- Preference TOML: A config pattern that uses RLHF-shaped semantics for critique and selection.
Best Starting Reads
- What is Prompting: Operational Constraints
- Project AI Philosophy
- Reward Engineering for Coding Agents
- Reward Hacking in Coding Agents
- Reward Rubric DSL
- Preference TOML
Contact
For disagreement, replication attempts, or examples that break these claims: nathanib@pm.me.
If a workflow here breaks in the wild, that is more interesting than agreement. Send the failure.