The 21-Line Rule, Twenty Years Later

In short: A coding rule from a 1990s university computer science class — where the hard constraint was the size of a CRT screen — turned out to be one of the most practical pieces of advice I’ve ever applied. Twenty years later, I’ve encoded it into every AI agent in the pipeline: functions must not exceed 30 lines. But the rule alone creates a new problem. That’s where AI comes in.

Where the rule came from

Sometime in the late 1990s, in a university computer science lecture, a professor introduced what he called the 21-line function rule.

The rule was simple: every function must fit in 21 lines. If it doesn’t, it can probably be broken into smaller functions.

The reason was equally simple: CRT terminals of the era displayed around 20-something lines at a time. A 21-line function meant you could read the entire function body without scrolling. The whole thing was visible on one screen. You could reason about it completely without holding two separate views in your head.

It was a practical constraint dressed up as a design principle. And it worked.

Why small functions matter

The CRT justification sounds dated. But the underlying principle isn’t. Small functions force several good habits at once:

Single responsibility. A function that fits in 21 lines can only do a limited number of things. If you find yourself needing 80 lines, it’s almost always because the function is doing two or three things that should each be their own function.

Clear input and output. A short function tends to have a clear contract: it takes a few well-defined inputs and returns a well-defined output. Long functions often accumulate side effects, conditional mutations, and implicit dependencies that make the contract fuzzy.

Easy to test. A 10-line function that transforms one thing into another thing has a small number of meaningful test cases. A 150-line function that reads config, makes network calls, formats output, and handles errors has a combinatorial explosion of states you’d need to test to be confident.

Easy to debug. When something goes wrong, a short call stack of well-named functions is far easier to trace than a single long function with fifteen branches.

The professor was right. The CRT was just the delivery mechanism for a much more durable insight.

The downside nobody mentioned

The 21-line rule has a real cost, and it wasn’t covered in the lecture.

When you enforce small functions, you end up with a lot of them. A module that might have had 5 or 6 long functions now has 30 or 40 short ones. Each function is easy to understand in isolation. The module as a whole becomes harder to navigate.

The individual trees are clear. The forest is dense.

This is the classic trade-off in software design between locality (everything in one place, easy to follow) and modularity (small units, easy to test). The 21-line rule optimises hard for modularity. You pay in navigability.

For a single engineer working alone, this is manageable. You know the code. You know what calls what. You can hold the hierarchy in your head.

For a team — and especially for an AI agent working with unfamiliar code — this cost is real. An agent asked to extend a module with 40 functions needs to understand which ones are relevant, which call which, and which to leave alone. Without a map, it guesses. And guessing is where bugs come from.

The AI era reframes the trade-off

The 21-line rule was designed for humans reading code on a CRT. The trade-off it implied — small functions vs navigable hierarchy — assumed the hierarchy problem had to be solved by the human.

That assumption no longer holds.

An AI can read a codebase faster than any human. It can build and maintain a function hierarchy map, identify which functions call which, flag violations automatically, and surface the relevant context for any given task. The part of the cost that made small functions hard to scale — maintaining awareness of the hierarchy — is exactly the kind of mechanical bookkeeping that AI handles well.

The benefits of small functions remain human-scale: testability, debuggability, clear contracts. The cost of managing many small functions becomes machine-scale: a solvable automation problem.

The trade-off that made the 21-line rule impractical at scale dissolves in the AI era. What was a useful constraint for a student project becomes a serious quality foundation for a production system.

What we built

We updated the limit to 30 lines — modern screens are taller, and the principle matters more than the exact number — and encoded it across three layers.

Layer 1: Refactor existing agents

The first step was applying the rule to all 24 existing agent Python files in the pipeline. Several of them had functions running to 80, 100, even 150 lines. Each was split into named helpers following the _parse_xyz / _build_xyz / _validate_xyz naming convention.

The refactor enforced a constraint that was already implied by good design, but never formally codified. Every function in the agent layer now has a clear single job.

Layer 2: Inject the rule into agent role prompts

Agents write code. If the rule isn’t in their system prompt, they’ll write long functions by default — they’ve been trained on a corpus of code that has no such constraint.

A <coding_standards> block was added to the role files for all seven writing agents:

<coding_standards>
## Coding Standards

FUNCTION SIZE LIMIT: 30 lines maximum per function.

- If a function needs more than 30 lines, it is doing too much.
  Break it into named helpers with clear single responsibilities.
  Name helpers descriptively: _parse_xyz, _build_xyz, _validate_xyz.
- When you read existing Python code that violates this rule, add an
  inline comment `# VIOLATION: function_name exceeds 30 lines` at the
  offending function. Silently skip this annotation for non-Python files.

FUNCTION MAP (Python files only):
- At the end of every Python module you write or significantly modify,
  append a `# --- fn_map ---` comment block listing every function
  and the functions it calls.
  Format:
    # parent_function -> [child1, child2]
    # leaf_function -> []
  This block is used by automated tooling to verify function hierarchy.
</coding_standards>

The fn_map comment block is the AI-era answer to the hierarchy problem. When an agent writes a module, it leaves a machine-readable map of its own function structure. The next agent — or human — who reads that file has an immediate picture of what calls what.

Layer 3: Runtime validation

The third layer enforces the rule at the point where agents write code to disk.

validate_function_sizes() was added to tools/fn_map.py. It takes a list of Python files, runs the same AST-based analysis as the existing fn_map tooling, and returns violation strings:

service.py::process_request (47 lines)
utils.py::build_response (33 lines)

_after_write() was added to BaseAgent. Whenever a subclass writes Python files to disk, it calls _after_write(written_files). If violations are returned, the calling agent can inject them directly into the next prompt turn:

violations = self._after_write(written_files)
if violations:
    feedback = "The following functions exceed 30 lines:\n" + "\n".join(violations)
    revised = self.call(feedback + "\nPlease refactor them into smaller helpers.")

The loop closes: the agent writes code, the validator checks it, violations go back into the conversation as grounded, specific feedback. The agent revises. The rule is enforced in the generation loop, not just in code review.

The function hierarchy map

The fn_map deserves its own mention. It was already in the codebase — a tool that reads Python files and builds an interactive HTML visualisation of which functions call which. The 30-line project extended it with a programmatic API for violation detection.

But the fn_map concept goes deeper than tooling. The agents are now instructed to leave a # --- fn_map --- comment block at the end of every Python module they write:

# --- fn_map ---
# run -> [_load_config, _build_prompt, _call_llm, _parse_response]
# _load_config -> []
# _build_prompt -> [_format_context, _format_examples]
# _call_llm -> []
# _parse_response -> [_extract_json, _validate_schema]
# _format_context -> []
# _format_examples -> []
# _extract_json -> []
# _validate_schema -> []

This is the solution to the forest problem. A module with 40 functions becomes navigable the moment it has an explicit map of which functions are entry points, which are helpers, and which are leaves. A new agent reading this file can immediately identify what to call and what to leave alone. A human code reviewer can scan the hierarchy before reading any implementation.

The fn_map turns the cost of many small functions into a feature. You get testability AND navigability. You just have to be explicit about the structure.

The numbers

Layer What changed PR-A 24 agent files refactored, zero functions over 30 lines PR-B 7 role prompts updated with <coding_standards> block PR-C validate_function_sizes() + _after_write() hook, 18 new tests

The rule took three pull requests to fully encode. The refactor was the most mechanical. The prompt injection was the most consequential — it changes the default output of every agent going forward. The runtime validator closes the loop.

Twenty years later

The professor who taught the 21-line rule probably never imagined it would one day be encoded into an AI agent’s system prompt. The rule was designed for humans reading code on a screen that could show 21 lines at a time.

The screen constraint is gone. The principle it enforced is stronger than ever.

Small functions are easy to test, easy to debug, and easy to hand to an AI agent with a clear description of what they do. They compose cleanly. They fail in isolation rather than cascading. They document intent through their names rather than through comments inside a long body.

The downside — too many functions, hard to navigate — is a problem AI handles better than humans do. Give an AI agent a function map and it knows where it is in a module instantly. Give it a module with five 200-line functions and it has to reason about every branch to understand what’s safe to change.

The 21-line rule was ahead of its time. Not because it was designed for AI, but because it was designed for clarity — and clarity is what makes code legible to any reader, human or machine.

Thirty lines. That’s the rule now. It fits on a modern screen. It fits in an agent’s attention window. It fits in a test case. That’s three good reasons.

Code: github.com/wanleung/ai-dev-team