April 2026
What is Prompting: Operational Constraints
Prompting is not abstract instruction-writing. Prompting is a constrained pipeline where tokenization mechanics, token budgets, and control structures form the actual boundaries within which agent behavior operates.
Agents do not see "prompts." They see token sequences. Understanding prompting means understanding the mechanics that govern those sequences.
The Pipeline
Prompts follow a deterministic 7-stage process:
- Human input: raw text (words, images, other media)
- Tokenization: conversion to numerical token IDs (model-specific)
- Token IDs: numerical representation ready for processing
- LLM processing: core computation phase
- Output token IDs: numerical response
- Detokenization: conversion of output token IDs back into text
- Human-readable output: final agent response
This pipeline is the mechanism. Everything that happens to the prompt happens within this process.
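The stages can be sketched with a toy word-level tokenizer. This is purely illustrative: real models use subword schemes such as BPE, so actual token boundaries and IDs differ, but the round trip (text → IDs → model → IDs → text) is the same.

```python
# Toy illustration of the prompt pipeline (hypothetical word-level
# tokenizer; production models use subword tokenizers like BPE).
def build_vocab(corpus: str) -> dict:
    """Assign a numeric ID to every distinct word."""
    return {w: i for i, w in enumerate(sorted(set(corpus.split())))}

def tokenize(text: str, vocab: dict) -> list:
    """Human input -> token IDs (the conversion stage)."""
    return [vocab[w] for w in text.split()]

def detokenize(ids: list, vocab: dict) -> str:
    """Output token IDs -> human-readable text (the final stage)."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)

vocab = build_vocab("translate hello to french")
ids = tokenize("translate hello", vocab)
print(ids)                     # the numeric sequence the model sees
print(detokenize(ids, vocab))  # round-trips back to readable text
```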
Three Hard Constraints
1. Tokenization (The Conversion Constraint)
Tokenization is the compilation step. Just as code must be compiled to machine instructions, prompts must be tokenized to numerical sequences.
This has three practical implications:
- Token count varies by provider. The same text tokenizes differently in GPT-3, GPT-4, and Gemini due to different tokenization algorithms.
- Token counting is budgetable. You can measure token usage with libraries (tiktoken for OpenAI) or API feedback (Gemini).
- Token efficiency matters. More concise prompts leave room for agent responses without hitting max token limits.
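For quick budgeting, a rough estimate needs no provider library: a common rule of thumb is about four characters per token for English text. Exact counts require the provider's own tokenizer (tiktoken for OpenAI, the count-tokens API for Gemini), and they differ across models.

```python
def estimate_tokens(text: str) -> int:
    """Rough budget estimate: ~4 characters per token for English.
    Use the provider's tokenizer for exact counts -- the same text
    tokenizes differently in GPT-3, GPT-4, and Gemini."""
    return max(1, len(text) // 4)

print(estimate_tokens("Translate 'Hello' to French"))  # → 6 (rough)
```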
For agent specification: tokenization means agents do not see structural hints in formatting. They see token sequences. Decomposing specifications into structured sequences, not prose, respects how agents actually process input.
2. Max Tokens (The Working Memory Constraint)
Every LLM has a context window limit: the maximum tokens it can process in a single interaction. This is a fixed architectural constraint, not a soft preference.
Examples:
- GPT-3: 4,096 tokens
- GPT-4: 8,192 or 32,768 tokens (version-dependent)
- Gemini 3: 1,048,576 tokens
How it constrains agents: if the context window is 8,192 tokens and your prompt uses 1,000, only 7,192 tokens remain for reasoning and output. Agents cannot reason deeper than the token budget allows. Token limits force agents to compress reasoning or fail requests that exceed capacity.
From a cost perspective: token limits directly translate to billing. Most LLM services charge (input_tokens + output_tokens) × price_per_token. Larger context windows mean higher costs if fully used.
The real insight: the context window is the agent's working memory. A 4K token limit is fundamentally more constrained than a 1M token limit. This affects what agents can hold in mind, what context they can reference, what reasoning chains they can execute.
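The budget arithmetic above can be captured in a small helper. This is a sketch, assuming a single fixed context window shared by prompt and output:

```python
def remaining_budget(context_window: int, prompt_tokens: int,
                     reserved_output: int = 0) -> int:
    """Tokens left for reasoning/output once the prompt is loaded."""
    remaining = context_window - prompt_tokens - reserved_output
    if remaining <= 0:
        raise ValueError("prompt exceeds the context window")
    return remaining

print(remaining_budget(8_192, 1_000))  # → 7192
```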
3. Control Tokens (The Structure Constraint)
Control tokens are special tokens that organize prompt regions and guide LLM processing phases.
Examples:
- <|startoftext|>: begin sequence
- <|endoftext|>: end sequence
- <|user|>: mark user message
- <|assistant|>: mark assistant message
These tokens are handled internally by modern APIs (you don't write them explicitly in OpenAI), but understanding them shows how prompts are actually structured. Control tokens segment reasoning phases. They tell the LLM where one phase ends and another begins. This explains why conversation state is preserved in message-based prompts but not in basic text prompts.
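A hypothetical chat template makes the segmentation visible. Real providers apply their own model-specific templates server-side; this sketch only shows how role markers carve a flat token stream into regions:

```python
def render_chat(messages: list) -> str:
    """Hypothetical chat template: role markers segment the prompt
    into regions. Real APIs insert provider-specific control tokens
    for you, so you never write these by hand."""
    parts = ["<|startoftext|>"]
    for m in messages:
        parts.append(f"<|{m['role']}|>{m['content']}")
    parts.append("<|endoftext|>")
    return "".join(parts)

print(render_chat([{"role": "user", "content": "Translate 'Hello' to French"}]))
# → <|startoftext|><|user|>Translate 'Hello' to French<|endoftext|>
```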
Prompt Types as Architectural Choices
The choice of how to structure prompts directly affects agent capability.
Basic Text Prompts
"Translate 'Hello' to French"
- Single-turn only
- No conversation state
- No access to prior messages
- Best for: one-off queries
Messages Prompts
[
  { "role": "user", "content": "Translate 'Hello' to French" },
  { "role": "assistant", "content": "Bonjour" },
  { "role": "user", "content": "And 'goodbye'?" }
]
- Multi-turn with state
- Prior messages available for context
- More token-expensive (full history included)
- Best for: conversation, agents with memory
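Because the full history is resent on every turn, token cost grows as the conversation lengthens. A sketch using the rough four-characters-per-token estimate:

```python
def history_tokens(messages: list) -> int:
    """Approximate token cost of a message history (~4 chars/token,
    a rough rule of thumb; use the provider tokenizer for exact counts)."""
    return sum(max(1, len(m["content"]) // 4) for m in messages)

turns = [
    {"role": "user", "content": "Translate 'Hello' to French"},
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "And 'goodbye'?"},
]
# Each request resends all prior messages, so cost rises per turn.
for i in range(1, len(turns) + 1):
    print(i, history_tokens(turns[:i]))
```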
System Prompts
system: "You are a French translator. Be concise."
user: "Translate 'Hello'"
- Sets operational boundaries
- Persists across conversation
- Defines agent persona and constraints
- Best for: defining agent behavior globally
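One way the system prompt persists is simply by being prepended to every request. The `build_request` helper here is illustrative, not a provider API:

```python
# Hypothetical request builder: the system message rides along with
# every call, which is why it persists across the whole conversation.
SYSTEM = {"role": "system", "content": "You are a French translator. Be concise."}

def build_request(history: list, user_text: str) -> list:
    """Prepend the system prompt, then history, then the new turn."""
    return [SYSTEM] + history + [{"role": "user", "content": user_text}]

print(build_request([], "Translate 'Hello'"))
```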
The choice between basic and message-based prompts determines whether an agent can maintain reasoning continuity across multiple requests. It is a fundamental architectural constraint, not a minor implementation detail.
Prompt Management (Version Control for Behavior)
As prompts evolve, they need versioning:
translation_openai_v1.0.0 # Initial version
translation_openai_v1.1.0 # Enhancement (minor version)
translation_openai_v2.0.0 # Major refactor (major version)
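A versioning scheme like this can be tracked with a simple in-memory registry keyed by task, provider, and version. All names and prompt strings below are hypothetical:

```python
# Hypothetical prompt registry: task, provider, and semantic version
# together identify one exact prompt string.
PROMPTS = {
    ("translation", "openai", "1.0.0"): "Translate {text} to French.",
    ("translation", "openai", "2.0.0"): "You are a translator. Render {text} in French, concisely.",
}

def get_prompt(task: str, provider: str, version: str) -> str:
    """Look up the exact prompt string for one version."""
    return PROMPTS[(task, provider, version)]

print(get_prompt("translation", "openai", "1.0.0"))
```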
Why versioning matters:
- A/B testing: run different prompt versions against same LLM to measure effectiveness
- Rollback: revert to previous version if new version underperforms
- Provider-specific optimization: same task may need different prompts for GPT-4 vs Gemini
- Performance tracking: measure how changes affect output quality and token efficiency
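For A/B testing, a deterministic split (hash the user ID, pick a version) keeps each user on one prompt variant, so measurements stay comparable across runs. The function name and version strings are illustrative:

```python
import hashlib

def assign_variant(user_id: str, versions=("1.0.0", "1.1.0")) -> str:
    """Deterministic A/B split: the same user always gets the same
    prompt version, making before/after comparisons meaningful."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return versions[digest % len(versions)]

print(assign_variant("alice"))
```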
The Design Implication
Agents cannot be understood independently of their prompting mechanism. The tokenization pipeline, token limits, and control structures are not implementation details. They are the operational constraints that determine what agents can do.
When designing agent specifications:
- Respect tokenization: structure specs as token sequences, not prose
- Budget tokens: reserve output space within context window limits
- Use system prompts: define agent boundaries globally, not per-request
- Choose prompt type carefully: messages-based for agents with memory, basic for stateless operations
- Version prompts: track which prompt versions produce which behaviors
Agents do not follow instructions abstractly. They operate within tokenized, token-budgeted, control-structured pipelines. Understanding prompting means understanding these constraints as first-class design elements.
Related Principles
- Context Is a Budget: token limits as cognitive constraint
- Specs as Shared Reality: how specification structure shapes agent behavior
- Protocol Before Personality: structure precedes persona