StayFresh

Static archive of workflow research and patterns

April 2026

Modular Workflow Stack

The right tool for the right job at the right time. In sequence, with deliberate hand-offs and human checkpoints between layers.

A modular workflow is a decision surface with execution attached.

The Skills Stack pattern (role-based orchestration via gstack, context stability via GSD, execution via Superpowers) makes this concrete. Each layer has a job. Each layer hands off cleanly. None of them run simultaneously from a single monolithic prompt.

The Three Layers

Every durable workflow decomposes into three concerns:

| Layer | Job | Failure Mode When Missing |
|---|---|---|
| Orchestration | Decide what to do next, route to the right agent, gate on human input | Agents start work before the goal is clear; you debug outputs instead of inputs |
| Context | Keep specifications stable across a long chain of steps | Context drift: later steps contradict earlier decisions; agents re-argue closed questions |
| Execution | Do one narrow task well with minimal context | Bloated prompts, multi-objective confusion, unpredictable output |

The layers must be physically separate. An orchestration prompt that also does execution collapses both failure modes into one.

Layer 1: Orchestration

The orchestrator is the conductor, not a performer. Its only job is to route: given the current state and user input, which agent runs next, with what context, and who reviews the result?

Orchestrator Responsibilities

Role-Based Routing

gstack formalizes what most effective teams do informally: different decisions belong to different roles. The same principle applies to agents.

| Decision Type | Route To | Why |
|---|---|---|
| Product scope, priority | Product/CEO agent | Avoids over-engineering at the execution layer |
| Architecture, interfaces | Engineering Manager agent | Separates design from implementation concerns |
| UX, component structure | Designer agent | Keeps visual decisions out of backend prompts |
| Implementation | Execution agent | Single-objective, narrow context, fast |
| Correctness, edge cases | QA agent | Fresh context; no sunk-cost bias from implementation |
| Security, injection, auth | Security agent | Adversarial lens requires explicit framing |
| Merge, deploy, release | Release agent | Separate concern; human gate before irreversible push |

The orchestrator does not implement any of these roles. It knows which role applies and routes accordingly.
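The routing table above can be expressed as a pure lookup, which keeps the orchestrator honest: it returns a role, never a result. A minimal sketch in Python; the `Agent` enum and `route` function are illustrative names, not gstack's API:

```python
from enum import Enum

class Agent(Enum):
    PRODUCT = "product"
    ENG_MANAGER = "eng-manager"
    DESIGNER = "designer"
    EXECUTION = "execution"
    QA = "qa"
    SECURITY = "security"
    RELEASE = "release"

# Decision-type -> role mapping, mirroring the table above.
ROUTES = {
    "scope": Agent.PRODUCT,
    "priority": Agent.PRODUCT,
    "architecture": Agent.ENG_MANAGER,
    "interfaces": Agent.ENG_MANAGER,
    "ux": Agent.DESIGNER,
    "implementation": Agent.EXECUTION,
    "correctness": Agent.QA,
    "security": Agent.SECURITY,
    "release": Agent.RELEASE,
}

def route(decision_type: str) -> Agent:
    """Return the role responsible for a decision; never do the work here."""
    try:
        return ROUTES[decision_type]
    except KeyError:
        raise ValueError(f"No route for decision type: {decision_type!r}")
```

An unknown decision type raises rather than defaulting to an agent, matching the principle that routing failures should surface, not silently execute.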

Layer 2: Context Stability

Long chains of agents fail from context drift. The specification decided in step 2 is forgotten by step 8. The GSD pattern addresses this by treating the spec as a first-class artifact, not a conversation thread.

What Context Stability Requires

# Context handoff pattern
# At each agent boundary, pass the spec explicitly:

SPEC: See SPEC.md at commit abc123
TASK: Implement the authentication module as defined in section 3.2
CONSTRAINTS: Do not modify the user model schema
OUTPUT: PR ready for QA agent review

# The spec is not in the prompt. It is referenced by the prompt.

This separates context (stable, versioned) from instructions (per-task, ephemeral). Token cost drops. Drift disappears. Disputes resolve against the written spec, not conversation history.
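A sketch of the handoff as a small prompt builder, assuming the field names above; the helper is illustrative, not part of GSD:

```python
def handoff(spec_path: str, spec_commit: str, task: str,
            constraints: str, output: str) -> str:
    """Build a per-task prompt that references the spec instead of inlining it.

    The spec stays a versioned artifact; only the pointer (path + commit)
    travels with the task, so every agent resolves disputes against the
    same written document.
    """
    return "\n".join([
        f"SPEC: See {spec_path} at commit {spec_commit}",
        f"TASK: {task}",
        f"CONSTRAINTS: {constraints}",
        f"OUTPUT: {output}",
    ])
```

Each call produces a four-line, per-task prompt whose token cost is independent of the spec's length.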

Layer 3: Execution

Execution agents are narrow by design. They receive a single objective, minimal context, and a verifiable exit condition. Width is the orchestrator's job. Depth is the execution agent's job.

# Good execution prompt
ROLE: QA agent
CONTEXT: See SPEC.md §3.2, auth module PR #47
TASK: Find edge cases not covered by the current test suite
OUTPUT: Numbered list of uncovered cases with reproduction steps
HALT: When list is complete or you have checked all spec assertions

# Bad execution prompt (orchestration collapsed into execution)
You are a full-stack engineer. Review the spec, implement auth,
write tests, check security, prepare the PR, and make sure
it matches the design. Be thorough.

Human-in-the-Loop

Human gates are not an apology for agent unreliability. They are the architecture. The workflow is designed around them.

Where to Insert Gates

| Step | Gate Type | Question to the Human |
|---|---|---|
| Before spec is finalized | Approval | Does this spec match your intent? |
| After architecture decision | Approval | Does this design fit constraints we haven't told the agent? |
| After QA report | Triage | Which findings are blockers vs. accepted risk? |
| Before any push/deploy | Hard gate | Explicit approval; no default proceed |
| When agent signals uncertainty | Escalation | Agent surfaces ambiguity; human resolves it |

Gates must be explicit in the workflow definition. An implicit assumption that the human will "just notice" when to intervene is an absence of architecture, not a gate.
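The hard-gate row implies a strict predicate: only an explicit approval proceeds, and there is no default path forward. A minimal sketch (the names `GateDecision` and `hard_gate` are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateDecision:
    step: str
    question: str
    approved: bool

def hard_gate(step: str, question: str, response: str) -> GateDecision:
    """Resolve a hard gate from an explicit human response.

    Only a literal 'yes' approves; silence, 'ok', or an empty reply all
    block. There is deliberately no default-proceed branch and no timeout
    fallthrough.
    """
    approved = response.strip().lower() == "yes"
    return GateDecision(step=step, question=question, approved=approved)
```

Recording the decision as a value, rather than branching inline, leaves an audit trail of what the human approved at which step.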

Designing for Interruption

A workflow that cannot be interrupted mid-chain is fragile. Every long chain should support interruption at each of its gates, not only at the end.

Parallelism

Independent tasks should not run sequentially. The constraint is dependency, not caution.

When to Parallelize

# Sequential (correct: B depends on A's output)
A: Finalize spec
B: Implement auth module per spec

# Parallel (correct: no dependency between B and C)
A: Finalize spec
B: Implement auth module per spec     ← launch together
C: Write E2E test scaffold per spec   ← launch together
D: Security review of spec            ← launch together
E: Merge B+C+D results, resolve conflicts

Parallelism Boundaries

| Safe to Parallelize | Must Be Sequential |
|---|---|
| Independent feature branches | Spec finalization → implementation |
| QA + security review of same PR | Implementation → QA |
| Multiple execution agents on different modules | Architecture → any implementation |
| Competing design proposals | Human gate → next phase |
| Background context refresh | Merge + conflict resolution |

Parallelism multiplies throughput only when the merge step is cheap. If parallel outputs require substantial reconciliation, the cost is hidden, not eliminated. Design merge steps explicitly; they are not free.
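Assuming agent dispatch is just a callable, the fan-out above (finalize the spec, then launch B, C, and D together) can be sketched with a thread pool; `run_agent` is a stand-in for real dispatch:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str) -> str:
    """Stand-in for dispatching a task to an execution agent."""
    return f"result of {task}"

def fan_out(spec_ready: bool, tasks: list[str]) -> dict[str, str]:
    """Launch independent tasks together once the shared dependency
    (the finalized spec) exists, then collect results for an explicit
    merge step."""
    if not spec_ready:
        # The constraint is dependency: nothing launches before the spec.
        raise RuntimeError("spec must be finalized before fan-out")
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {task: pool.submit(run_agent, task) for task in tasks}
        return {task: f.result() for task, f in futures.items()}
```

The merge step receives all outputs at once, which is what makes its cost visible rather than hidden.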

Loops

Loops are the mechanism for refinement. They require halting conditions, not just goals.

Loop Anatomy

LOOP:
  INPUT:  Current state + failure signal
  TASK:   Fix one thing
  VERIFY: Run oracle (tests, lint, typecheck)
  HALT:   Oracle passes OR loop count exceeds N
  ON HALT EXCEEDED: Escalate to human, do not auto-proceed
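The anatomy above translates directly into code. A sketch, assuming the oracle returns a machine-verifiable pass/fail plus a failure signal; on budget exhaustion it escalates rather than auto-proceeding:

```python
def fix_loop(oracle, fix_one, state, max_iters: int = 3):
    """Bounded refinement loop.

    `oracle` is a machine-verifiable check (tests, lint, typecheck)
    returning (passed, failure_signal); `fix_one` fixes exactly one
    thing per iteration. Exceeding max_iters escalates to a human
    instead of proceeding.
    """
    for _ in range(max_iters):
        passed, signal = oracle(state)
        if passed:
            return ("halt", state)
        state = fix_one(state, signal)
    passed, _ = oracle(state)  # last fix may have succeeded
    if passed:
        return ("halt", state)
    return ("escalate", state)  # hand to a human; do not auto-proceed
```

A trivial oracle (counter reaches zero) is enough to exercise both exits; real oracles are test suites and linters.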

Loop Types

| Loop Type | Trigger | Halting Condition |
|---|---|---|
| Fix-CI loop | Test/lint failure | All checks pass |
| Review loop | QA or human feedback | All blockers addressed |
| Refinement loop | Output quality below rubric threshold | Score exceeds threshold or max iterations |
| Exploration loop | Unknown solution space | N candidates generated; human selects |
| Context-refresh loop | Spec version mismatch | Agent confirms spec alignment |

A loop without a halting condition is a runaway. A loop that halts when the agent declares itself "done" never halts on time, because "done" is self-reported. Halting conditions must be machine-verifiable.

Composing Many Steps

A 30-step workflow is not a 30-prompt workflow. Most prompts are small. The complexity is in the graph, not the nodes.

Step Graph Properties

Workflow Definition Pattern

# Minimal workflow definition
workflow: auth-feature
spec: specs/auth-v2.md

steps:
  - id: scope
    agent: product
    input: user_request
    gate: human_approval

  - id: design
    agent: eng-manager
    input: scope.output
    gate: human_approval

  - id: implement
    agent: execution
    parallel:
      - id: impl-backend
        input: design.output
      - id: impl-tests
        input: design.output
      - id: security-review
        input: design.output

  - id: merge
    agent: eng-manager
    input: [impl-backend.output, impl-tests.output, security-review.output]

  - id: qa
    agent: qa
    input: merge.output
    loop:
      on: qa_findings
      until: no_blockers
      max: 3

  - id: release
    agent: release
    input: qa.output
    gate: human_approval  # hard gate; no default proceed

This is not a prompt. It is a schema. The prompts are inside the agent definitions, kept separate from the workflow graph. When a step fails, you debug the step definition, not the entire chain.
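Because the definition is a schema, it can be checked before any agent runs. A sketch that verifies every step's inputs reference an already-declared step, catching implicit sequencing early; the step shape mirrors the definition above, and this validator is illustrative (sub-step inputs are omitted for brevity):

```python
def validate_workflow(steps: list[dict]) -> list[str]:
    """Return a list of dependency errors in a workflow definition.

    Each step may declare 'input' as a string or list of references
    like 'scope.output'; a reference is valid only if its source step
    (or the initial 'user_request') was declared earlier.
    """
    errors, seen = [], {"user_request"}
    for step in steps:
        inputs = step.get("input", [])
        if isinstance(inputs, str):
            inputs = [inputs]
        for ref in inputs:
            source = ref.split(".")[0]  # 'scope.output' -> 'scope'
            if source not in seen:
                errors.append(f"{step['id']}: undeclared dependency {ref!r}")
        seen.add(step["id"])
        for sub in step.get("parallel", []):
            seen.add(sub["id"])
    return errors
```

Run against the definition above, an empty error list means every edge in the step graph is explicit; a reordered or orphaned step fails loudly instead of breaking silently.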

Token Economics at Scale

Long workflows amplify token decisions made early. A 600-token context file loaded at every step of a 30-step workflow is 18,000 tokens spent on generic context. Task-specific context passed only to the relevant step costs a fraction of that.
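The arithmetic is worth making explicit. A sketch comparing broadcast context against targeted context; the step names and token counts in the usage example are hypothetical:

```python
def broadcast_cost(context_tokens: int, steps: int) -> int:
    """Tokens spent when the same generic context loads at every step."""
    return context_tokens * steps

def targeted_cost(per_step_context: dict[str, int]) -> int:
    """Tokens spent when each step receives only its own context."""
    return sum(per_step_context.values())
```

`broadcast_cost(600, 30)` reproduces the 18,000-token figure above, while a targeted budget like `{"implement": 400, "qa": 250, "release": 100}` totals 750 tokens for the same work.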

Rule of thumb: reference the spec instead of inlining it, and pass each step only the context that step needs.

Anti-Patterns

The Monolith Prompt

A single prompt that asks an agent to plan, implement, review, and ship. All three layers collapsed into one. When it fails (and it will), there is nowhere to debug.

Implicit Sequencing

Running steps in order without documenting why. When a step needs to move or be parallelized, the dependency is unknown. The sequence breaks silently.

Unbounded Loops

Loops without maximum iterations or without escalation on failure. The agent retries indefinitely. The human discovers it hours later.

Framework Stacking Without Layer Separation

Running gstack, GSD, and Skills from a single prompt collapses all three layers into one context. The frameworks are complements because they operate at different layers; running them in parallel from one context eliminates the benefit of any of them.

Parallelism Without Merge Design

Launching parallel agents without planning how their outputs reconcile. Merge conflicts in parallel agent output are harder to resolve than sequential conflicts because neither agent knows about the other's decisions.

Missing Human Gates

Automating past a decision point that requires human judgment. The workflow moves fast and lands in the wrong place. The issue is irreversibility, not speed.

The Compounding Effect

Garry Tan's reported output (10,000 lines of code and 100 pull requests per week over 50 days) is the compounding product of clean layer separation applied consistently, not any single tool or prompt.

Each layer runs at its own level of abstraction.

The workflow does not scale because it runs faster. It scales because it fails locally. Failures in execution do not corrupt orchestration. Failures in orchestration do not corrupt the spec. Each layer's failure mode is contained to that layer.
