Revised April 2026
StayFresh Thesis
This site is not a feed. It is a position.
The position is that AI-assisted development improves when evaluation is explicit, architecture is legible, and empirical notes outrank polished claims.
Position
Prompting shapes language. Evaluation shapes incentives. Architecture determines whether the result survives contact with reality.
What This Site Believes
1. Prompting Is a Constrained Pipeline, Not Magic Language
Prompts don't float free. They move through tokenization, token budgets, and control structures. Understanding what prompts actually are (the mechanism, not the incantation) changes everything about how to use them.
See What is Prompting: Operational Constraints.
2. Reward Surfaces Matter More Than Prompt Tricks
Prompting can steer tone, ordering, and local behavior. It does not reliably determine what gets optimized under pressure.
The control surface is the rubric. See Reward Engineering for Coding Agents.
3. Narrow Metrics Produce Fake Progress
Single numbers are easy to optimize and easy to game.
Useful evaluation is multi-axis and explicit about what counts as failure. See Reward Hacking in Coding Agents.
4. Machine-Readable Criteria Beat Vibe-Based Review
Agents behave more predictably when success criteria are literal and inspectable.
That is why rubric DSLs and preference specs matter. See Reward Rubric DSL and Preference TOML.
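A minimal sketch of what "literal and inspectable" can mean in practice. The axis names, weights, and scoring function below are illustrative assumptions, not the actual Reward Rubric DSL or Preference TOML schema:

```python
# Illustrative only: the axes, weights, and field names are invented here,
# not the real Reward Rubric DSL or Preference TOML format.
import tomllib

RUBRIC_TOML = """
[rubric.tests_pass]
weight = 0.5
description = "All existing tests pass without modification."

[rubric.no_dead_code]
weight = 0.2
description = "No unused functions or commented-out blocks introduced."

[rubric.diff_scope]
weight = 0.3
description = "Diff touches only files named in the task."
"""

def score(checks: dict[str, bool]) -> float:
    """Weighted score over literal pass/fail checks, one axis at a time."""
    rubric = tomllib.loads(RUBRIC_TOML)["rubric"]
    return sum(axis["weight"] for name, axis in rubric.items() if checks.get(name, False))

# Each criterion is inspectable on its own, so a reviewer (or an agent)
# can see which axis failed instead of arguing about an overall vibe.
print(score({"tests_pass": True, "no_dead_code": False, "diff_scope": True}))  # 0.8
```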
5. Context Inflation Is Usually a Smell
When agents need long repository manifestos to function, the codebase is often the real problem.
Costly context is not the same thing as useful context. See AGENTS.md Effectiveness.
6. Loops Beat One-Shot Heroics
Attempt, critique, revision, and rescoring usually beat longer and louder prompts.
The boring loop wins because it narrows the next move. See Prompt Patterns.
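A sketch of the boring loop. The generate, critique, and score callables stand in for whatever model calls or checks a real pipeline would use; they are assumptions, not a specific API:

```python
def refine(task, generate, critique, score, max_rounds=3, target=0.9):
    """Attempt, critique, revise, rescore. Each round narrows the next move
    to a named failure instead of re-prompting the whole task louder."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        if score(draft) >= target:
            break                         # good enough: stop spending tokens
        feedback = critique(draft)        # a specific, inspectable complaint
        draft = generate(task, feedback=feedback)
    return draft

# Toy usage: the "critique" keeps asking for a reproduction step until the
# draft is long enough to clear the score threshold.
result = refine(
    "summarize the failure",
    generate=lambda task, feedback=None: task + (" -- " + feedback if feedback else ""),
    critique=lambda draft: "add the reproduction step",
    score=lambda draft: min(len(draft) / 40, 1.0),
)
print(result)
```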
7. Strong Positions Require Evidence
This archive is opinionated on purpose.
Every durable claim should reduce to a failure mode, an example, or a reproducible workflow pattern.
8. Field Notes Need Compression
Raw observation is not enough. Findings need to compress into something reusable.
The Starfish method exists for that reason: Start, Stop, Continue, Investigate, Amplify.
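One way to keep that compression honest is to make the five buckets a literal record rather than prose. The structure below is a sketch and the field contents are placeholders, not real findings:

```python
# A Starfish note as a structured record: a raw observation has to land in
# one of five buckets instead of staying a wall of text.
from dataclasses import dataclass, field

@dataclass
class StarfishNote:
    start: list[str] = field(default_factory=list)        # begin doing
    stop: list[str] = field(default_factory=list)          # drop entirely
    continue_: list[str] = field(default_factory=list)     # keep as-is ("continue" is a keyword)
    investigate: list[str] = field(default_factory=list)   # unclear, needs a follow-up test
    amplify: list[str] = field(default_factory=list)       # working, push harder

note = StarfishNote(
    stop=["hand-editing agent output before rescoring"],
    investigate=["whether rubric weight changes shift the failure modes"],
)
```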
9. Personal Brand Comes From Doctrine, Not Volume
More posts create more output. A named position creates memorability.
The archive exists to support a doctrine, not to cosplay momentum.
10. Tools Must Lower Comprehension Debt
Tooling should reduce cognitive load, not multiply tabs, polling, and context switching.
Auditory cues are a practical operator channel: they surface completion state quickly and improve recall under split attention. See psay Agent Notifications.
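The pattern, sketched below, is simply "wrap the long task, speak the outcome." The psay invocation and its single-message argument are assumptions for illustration, not its documented interface:

```python
import subprocess

def run_with_cue(cmd: list[str]) -> int:
    """Run a long task, then emit an audible cue keyed to the outcome."""
    result = subprocess.run(cmd)
    message = "task finished" if result.returncode == 0 else "task failed"
    # Fire-and-forget notification; the exact psay CLI here is assumed.
    subprocess.run(["psay", message], check=False)
    return result.returncode

# Example: hear the test run finish without watching the terminal.
run_with_cue(["pytest", "-q"])
```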
What This Site Rejects
- AI writing that sounds like sales enablement sludge.
- Benchmarks treated as truth instead of instruments.
- One-score evaluation systems presented as rigor.
- Architecture problems disguised as prompt problems.
- Personal branding with aesthetics and no hard position.
Named Frameworks
- Starfish Method: Start, Stop, Continue, Investigate, Amplify. Used to compress empirical findings without flattening them.
- Reward Rubric DSL: A machine-readable rubric for scoring agent output.
- Preference TOML: A config pattern that uses RLHF-shaped semantics for critique and selection.
Best Starting Reads
- What is Prompting: Operational Constraints
- Project AI Philosophy
- Reward Engineering for Coding Agents
- Reward Hacking in Coding Agents
- Reward Rubric DSL
- Preference TOML
Contact
For disagreement, replication attempts, or examples that break these claims: nathanib@pm.me.
If a workflow here breaks in the wild, that is more interesting than agreement. Send the failure.