April 2026
What is Prompting: Operational Constraints
Prompting is not abstract instruction-writing. Prompting is a constrained pipeline where tokenization mechanics, token budgets, and control structures form the actual boundaries within which agent behavior operates.
Agents do not see "prompts." They see token sequences. Understanding prompting means understanding the mechanics that govern those sequences.
The Pipeline
Prompts follow a deterministic 7-stage process:
- Human input: raw text (words, images, other media)
- Tokenization: conversion to numerical token IDs (model-specific)
- Token IDs: numerical representation ready for processing
- LLM processing: core computation phase
- Output token IDs: numerical response
- Detokenization: conversion of output token IDs back into text
- Human-readable output: final agent response
This pipeline is the mechanism. Everything that happens to the prompt happens within this process.
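The stages can be sketched with a toy word-level tokenizer. This is purely illustrative: real models use subword schemes such as BPE, so actual token boundaries and IDs differ, but the round trip (text → IDs → model → IDs → text) is the same.

```python
# Toy illustration of the prompt pipeline (hypothetical word-level
# tokenizer; production models use subword tokenizers like BPE).
def build_vocab(corpus: str) -> dict:
    """Assign a numeric ID to every distinct word."""
    return {w: i for i, w in enumerate(sorted(set(corpus.split())))}

def tokenize(text: str, vocab: dict) -> list:
    """Human input -> token IDs (the conversion stage)."""
    return [vocab[w] for w in text.split()]

def detokenize(ids: list, vocab: dict) -> str:
    """Output token IDs -> human-readable text (the final stage)."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

vocab = build_vocab("translate hello to french")
ids = tokenize("translate hello", vocab)
print(ids)                     # the numeric sequence the model sees
print(detokenize(ids, vocab))  # round-trips back to readable text
```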
Three Hard Constraints
1. Tokenization (The Conversion Constraint)
Tokenization is the compilation step. Just as code must be compiled to machine instructions, prompts must be tokenized to numerical sequences.
This has three practical implications:
- Token count varies by provider. The same text tokenizes differently in GPT-3, GPT-4, and Gemini due to different tokenization algorithms.
- Token counting is budgetable. You can measure token usage with libraries (tiktoken for OpenAI) or API feedback (Gemini).
- Token efficiency matters. More concise prompts leave room for agent responses without hitting max token limits.
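For quick budgeting, a rough estimate needs no provider library: a common rule of thumb is about four characters per token for English text. Exact counts require the provider's own tokenizer (tiktoken for OpenAI, the count-tokens API for Gemini), and they differ across models.

```python
def estimate_tokens(text: str) -> int:
    """Rough budget estimate: ~4 characters per token for English.
    Use the provider's tokenizer for exact counts -- the same text
    tokenizes differently in GPT-3, GPT-4, and Gemini."""
    return max(1, len(text) // 4)

print(estimate_tokens("Translate 'Hello' to French"))  # → 6 (rough)
```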
For agent specification: tokenization means agents do not see structural hints in formatting. They see token sequences. Decomposing specifications into structured sequences, not prose, respects how agents actually process input.
2. Max Tokens (The Working Memory Constraint)
Every LLM has a context window limit: the maximum tokens it can process in a single interaction. This is a fixed architectural constraint, not a soft preference.
Examples:
- GPT-3: 4,096 tokens
- GPT-4: 8,192 or 32,768 tokens (version-dependent)
- Gemini 3: 1,048,576 tokens
How it constrains agents: if the context window is 8,192 tokens and your prompt uses 1,000, only 7,192 tokens remain for reasoning and output. Agents cannot reason deeper than the token budget allows. Token limits force agents to compress reasoning or fail requests that exceed capacity.
From a cost perspective: token limits directly translate to billing. Most LLM services charge (input_tokens + output_tokens) × price_per_token. Larger context windows mean higher costs if fully used.
The real insight: the context window is the agent's working memory. A 4K token limit is fundamentally more constrained than a 1M token limit. This affects what agents can hold in mind, what context they can reference, what reasoning chains they can execute.
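The budget arithmetic above can be captured in a small helper. This is a sketch, assuming a single fixed context window shared by prompt and output:

```python
def remaining_budget(context_window: int, prompt_tokens: int,
                     reserved_output: int = 0) -> int:
    """Tokens left for reasoning/output once the prompt is loaded."""
    remaining = context_window - prompt_tokens - reserved_output
    if remaining <= 0:
        raise ValueError("prompt exceeds the context window")
    return remaining

print(remaining_budget(8_192, 1_000))  # → 7192
```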
3. Control Tokens (The Structure Constraint)
Control tokens are special tokens that organize prompt regions and guide LLM processing phases.
Examples:
- <|startoftext|>: begin sequence
- <|endoftext|>: end sequence
- <|user|>: mark user message
- <|assistant|>: mark assistant message
These tokens are handled internally by modern APIs (you don't write them explicitly in OpenAI), but understanding them shows how prompts are actually structured. Control tokens segment reasoning phases. They tell the LLM where one phase ends and another begins. This explains why conversation state is preserved in message-based prompts but not in basic text prompts.
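A hypothetical chat template makes the segmentation visible. Real providers apply their own model-specific templates server-side; this sketch only shows how role markers carve a flat token stream into regions:

```python
def render_chat(messages: list) -> str:
    """Hypothetical chat template: role markers segment the prompt
    into regions. Real APIs insert provider-specific control tokens
    for you, so you never write these by hand."""
    parts = ["<|startoftext|>"]
    for m in messages:
        parts.append(f"<|{m['role']}|>{m['content']}")
    parts.append("<|endoftext|>")
    return "".join(parts)

print(render_chat([{"role": "user", "content": "Translate 'Hello' to French"}]))
# → <|startoftext|><|user|>Translate 'Hello' to French<|endoftext|>
```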
Prompt Types as Architectural Choices
The choice of how to structure prompts directly affects agent capability.
Basic Text Prompts
"Translate 'Hello' to French"
- Single-turn only
- No conversation state
- No access to prior messages
- Best for: one-off queries
Messages Prompts
[
  { "role": "user", "content": "Translate 'Hello' to French" },
  { "role": "assistant", "content": "Bonjour" },
  { "role": "user", "content": "And 'goodbye'?" }
]
- Multi-turn with state
- Prior messages available for context
- More token-expensive (full history included)
- Best for: conversation, agents with memory
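Because the full history is resent on every turn, token cost grows as the conversation lengthens. A sketch using the rough four-characters-per-token estimate:

```python
def history_tokens(messages: list) -> int:
    """Approximate token cost of a message history (~4 chars/token,
    a rough rule of thumb; use the provider tokenizer for exact counts)."""
    return sum(max(1, len(m["content"]) // 4) for m in messages)

turns = [
    {"role": "user", "content": "Translate 'Hello' to French"},
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "And 'goodbye'?"},
]
# Each request resends all prior messages, so cost rises per turn.
for i in range(1, len(turns) + 1):
    print(i, history_tokens(turns[:i]))
```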
System Prompts
system: "You are a French translator. Be concise."
user: "Translate 'Hello'"
- Sets operational boundaries
- Persists across conversation
- Defines agent persona and constraints
- Best for: defining agent behavior globally
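One way the system prompt persists is simply by being prepended to every request. The `build_request` helper here is illustrative, not a provider API:

```python
# Hypothetical request builder: the system message rides along with
# every call, which is why it persists across the whole conversation.
SYSTEM = {"role": "system", "content": "You are a French translator. Be concise."}

def build_request(history: list, user_text: str) -> list:
    """Prepend the system prompt, then history, then the new turn."""
    return [SYSTEM] + history + [{"role": "user", "content": user_text}]

print(build_request([], "Translate 'Hello'"))
```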
The choice between basic and message-based prompts determines whether an agent can maintain reasoning continuity across multiple requests. It is a fundamental architectural constraint, not a minor implementation detail.
Prompt Management (Version Control for Behavior)
As prompts evolve, they need versioning:
translation_openai_v1.0.0 # Initial version
translation_openai_v1.1.0 # Enhancement (minor version)
translation_openai_v2.0.0 # Major refactor (major version)
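A versioning scheme like this can be tracked with a simple in-memory registry keyed by task, provider, and version. All names and prompt strings below are hypothetical:

```python
# Hypothetical prompt registry: task, provider, and semantic version
# together identify one exact prompt string.
PROMPTS = {
    ("translation", "openai", "1.0.0"): "Translate {text} to French.",
    ("translation", "openai", "2.0.0"): "You are a translator. Render {text} in French, concisely.",
}

def get_prompt(task: str, provider: str, version: str) -> str:
    """Look up the exact prompt string for one version."""
    return PROMPTS[(task, provider, version)]

print(get_prompt("translation", "openai", "1.0.0"))
```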
Why versioning matters:
- A/B testing: run different prompt versions against same LLM to measure effectiveness
- Rollback: revert to previous version if new version underperforms
- Provider-specific optimization: same task may need different prompts for GPT-4 vs Gemini
- Performance tracking: measure how changes affect output quality and token efficiency
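For A/B testing, a deterministic split (hash the user ID, pick a version) keeps each user on one prompt variant, so measurements stay comparable across runs. The function name and version strings are illustrative:

```python
import hashlib

def assign_variant(user_id: str, versions=("1.0.0", "1.1.0")) -> str:
    """Deterministic A/B split: the same user always gets the same
    prompt version, making before/after comparisons meaningful."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return versions[digest % len(versions)]

print(assign_variant("alice"))
```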
The Design Implication
Agents cannot be understood independently of their prompting mechanism. The tokenization pipeline, token limits, and control structures are not implementation details. They are the operational constraints that determine what agents can do.
When designing agent specifications:
- Respect tokenization: structure specs as token sequences, not prose
- Budget tokens: reserve output space within context window limits
- Use system prompts: define agent boundaries globally, not per-request
- Choose prompt type carefully: messages-based for agents with memory, basic for stateless operations
- Version prompts: track which prompt versions produce which behaviors
Agents do not follow instructions abstractly. They operate within tokenized, token-budgeted, control-structured pipelines. Understanding prompting means understanding these constraints as first-class design elements.
Related Principles
- Context Is a Budget: token limits as cognitive constraint
- Specs as Shared Reality: how specification structure shapes agent behavior
- Protocol Before Personality: structure precedes persona