The Whiteboard Test

Tue, 23 Jun 2026 09:00:00 -0400

The title I wanted was “A Whiteboard Is All You Need.”

That is not true.

The whiteboard is not all you need. You also need the manual: what the board means, who can change what, and when to hand off, verify, or stop.

But the whiteboard is where I now start.

I started thinking this way because “agent context” kept feeling too slippery. Everyone says the agent needs context. Sure. But what context? The goal? The constraints? The evidence so far? The decisions already made? The shape in which people, agents, and code are supposed to communicate?

At some point I stopped asking, “What should I tell the agent?”

I started asking a different question:

If this were a human team, and the whiteboard were their only shared interface, what would need to be on it? What instructions would tell each person how to use it?

That question has been annoyingly useful.

Figure: The whiteboard is shared state. Each collaborator uses manuals that define roles, permissions, handoffs, review, and stop rules.

The Test

Imagine a room.

There are collaborators in it. Some are people. Some are agents. Some are code.

You are not allowed to keep talking to them while they work.

You get one whiteboard. You also get to hand each collaborator a short manual. The manual says which parts of the board they should watch, what each field means, when to verify, and when they should leave things alone.

Now design the system.

That is the whiteboard test.

If I cannot say what belongs on the board, I do not understand the task yet. If I cannot write the manual, I do not understand how the work should be done.

Board and Manual

The board and the manual are different things.

The board is state. It holds what matters right now: goals, constraints, decisions, evidence, partial results, risks, status, open questions, and past events that should affect the next action.

The manual is semantics and protocol. It defines what the board’s fields mean, what each actor may read or write, and when to verify, hand off, or stop.

It also makes responsibility explicit. A section can have an owner, peer collaborators, reviewers, handoff rules, or a human approval gate.

In many prototypes, we blur these together. We stuff state, policy, examples, memory, tool instructions, and vibes into one prompt and hope the model sorts it out.

Sometimes the task is small and the prompt is the whole system. But when the work gets longer, shared, or safety-sensitive, I want the split. The board tells me what the system knows. The manual tells me how each actor should behave around that knowledge.
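One way to make the split concrete, as a minimal sketch rather than a prescribed design: the board is a plain state object, and each actor's manual is a separate object that says what the fields mean and which ones that actor may touch. Every name here (Board, Manual, write, the field names) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Board:
    """Shared state: what the system knows right now."""
    goal: str
    constraints: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)
    status: str = "not_started"


@dataclass
class Manual:
    """Semantics and protocol for one actor (hypothetical shape)."""
    actor: str
    field_meanings: Dict[str, str]
    can_read: Set[str]
    can_write: Set[str]
    stop_rules: List[str] = field(default_factory=list)


def write(board: Board, manual: Manual, name: str, value: Any) -> None:
    """Apply a write only if this actor's manual allows it."""
    if not hasattr(board, name):
        raise AttributeError(f"board has no field {name!r}")
    if name not in manual.can_write:
        raise PermissionError(f"{manual.actor} may not write {name!r}")
    setattr(board, name, value)
```

The board never learns who is allowed to do what. That knowledge stays in the manual, so the rules can change without touching the state.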

Figure: The board holds shared state. The manual defines field meaning, roles, permissions, handoffs, review, and stop rules.

How I Use It

I usually do this as a loop.

First, I name the work. What is the task? What counts as done? What would count as failure? How will I know whether it worked?

Then I sketch the board. I list the decisions, facts, evidence, constraints, status, open questions, failure signals, and live coordination state the work depends on.

Live coordination state is not the same as ownership. Fields such as waiting_on or needs_review_by can live on the board, because they change as the work moves. The rule for who normally owns a section belongs in the manual.
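A small sketch of that distinction, under assumed names: SECTION_OWNERS stands in for a standing manual rule, while waiting_on and needs_review_by are live fields on the board. The section and actor names are hypothetical.

```python
from typing import Dict, Optional

# Manual side: the standing rule for who normally owns each section.
SECTION_OWNERS: Dict[str, str] = {
    "plan": "planner_agent",
    "draft": "writer_agent",
}

# Board side: live coordination state that changes as the work proceeds.
board: Dict[str, Dict[str, Optional[str]]] = {
    "plan": {"waiting_on": None, "needs_review_by": None},
    "draft": {"waiting_on": None, "needs_review_by": "human_reviewer"},
}


def next_actor(board: Dict[str, Dict[str, Optional[str]]], section: str) -> str:
    """Live state wins when present; otherwise fall back to the manual's owner."""
    live = board[section]
    return live["waiting_on"] or live["needs_review_by"] or SECTION_OWNERS[section]
```

Here next_actor(board, "draft") picks the reviewer named on the board, and next_actor(board, "plan") falls back to the owner named in the manual.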

Next I design the manual. I start with the smallest team that could plausibly work: often one agent, some code, and a human checkpoint. The manual defines field meanings, role responsibilities, read/write permissions, handoffs, review rules, and stop conditions.

Then I test the setup against the success and failure criteria from the first step. When performance breaks down, I try to diagnose where the representation failed. Was the board missing state? Was the manual missing a rule, a definition, or a handoff? Was the work split across the wrong actors? Or is the task beyond the current model, tool, or code setup?

The fix depends on the diagnosis. After a few passes, the architecture gets less theoretical. You can see what belongs on the board, what belongs in the manual, and where the work needs another actor, tool, or checkpoint.

Not because the metaphor solves the problem.

Because it tells me what problem I am actually solving.

Figure: A four-step loop for the whiteboard test: name the work, sketch the board, write the manuals, then run and revise.

A Tiny Moderation Sketch

I recently used this metaphor while designing an agentic moderation system.

The setup was roughly this: a parent, teacher, or user sets wellbeing preferences for a human-chatbot interaction. A moderation agent watches the interaction and decides when to intervene, signal, rewrite, or escalate.

The first board might only have preferences, evidence, current state, and pending questions. The first manual might say: append evidence with citations, change state only with cited evidence, and ask the person who set the preference before rewriting it.
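Those three first-draft rules are concrete enough to sketch in code. This is an illustration under stated assumptions, not the system described here: the class and function names, the "observing" default state, and the idea of a citation as a free-form string are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Evidence:
    claim: str
    citation: str  # assumed to point at a transcript message; format is hypothetical


@dataclass
class ModerationBoard:
    preferences: Dict[str, str]
    evidence: List[Evidence] = field(default_factory=list)
    current_state: str = "observing"
    pending_questions: List[str] = field(default_factory=list)


def append_evidence(board: ModerationBoard, claim: str, citation: str) -> int:
    """Rule 1: evidence is append-only and must carry a citation."""
    if not citation.strip():
        raise ValueError("evidence needs a citation")
    board.evidence.append(Evidence(claim, citation))
    return len(board.evidence) - 1  # index that later writes can cite


def change_state(board: ModerationBoard, new_state: str, evidence_ids: List[int]) -> None:
    """Rule 2: state changes only when they cite evidence already on the board."""
    if not evidence_ids:
        raise ValueError("state change needs cited evidence")
    for i in evidence_ids:
        if not 0 <= i < len(board.evidence):
            raise ValueError(f"unknown evidence id {i}")
    board.current_state = new_state


def propose_preference_rewrite(board: ModerationBoard, key: str, new_value: str) -> None:
    """Rule 3: never rewrite a preference directly; queue a question for its author."""
    old = board.preferences[key]
    board.pending_questions.append(f"Change {key!r} from {old!r} to {new_value!r}?")
```

Note that propose_preference_rewrite leaves preferences untouched. The only way a preference changes is through whoever answers the pending question.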

That is no longer just prompt design. It is deciding what shape the work should take inside the system.

The person setting preferences does not need to see the whole board. In fact, showing the whole board would be cruel. They need a small set of meaningful controls. The operator may need the richer operational surface underneath.

So the board has asymmetric visibility. Simple surface for the person. Richer state for the operator. A shared semantics layer between them so the same field does not quietly mean three different things.
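A hedged sketch of asymmetric visibility: one shared field specification holds each field's meaning and audience, and every view is derived from it, so no audience gets its own private definition. The field names, audience labels, and FIELD_SPEC layout are assumptions made for the example.

```python
from typing import Any, Dict

# Shared semantics layer: one meaning per field, plus who may see it.
FIELD_SPEC: Dict[str, Dict[str, Any]] = {
    "strictness": {
        "meaning": "how readily the moderator intervenes",
        "visible_to": {"person", "operator"},
    },
    "current_state": {
        "meaning": "what the moderator is doing right now",
        "visible_to": {"operator"},
    },
    "evidence": {
        "meaning": "cited observations from the interaction",
        "visible_to": {"operator"},
    },
}


def view(board: Dict[str, Any], audience: str) -> Dict[str, Any]:
    """Return only the fields this audience is meant to see."""
    return {
        name: value
        for name, value in board.items()
        if audience in FIELD_SPEC[name]["visible_to"]
    }


board = {"strictness": "gentle", "current_state": "observing", "evidence": []}
person_view = view(board, "person")      # only the small control surface
operator_view = view(board, "operator")  # the richer operational surface
```

Because both views read from the same FIELD_SPEC, "strictness" cannot quietly drift into meaning different things on different screens.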

This Is Not A New Trick

There is older language for this.

Long before today’s agent frameworks, AI had blackboard architectures: specialist modules coordinated through a shared workspace, usually with control logic deciding what ran next.

Social science has boundary objects: artifacts that different groups can use differently while still coordinating around the same thing. Modern agent frameworks talk about state, workflows, tool interfaces, handoffs, memory, and context engineering.

Good. I do not want this to be new. I want it to be useful.

The whiteboard test is my way of carrying that lineage into the current agent mess. Before I put context into prompts, memory, tools, or workflow state, I want to know what shared surface people, agents, and code are supposed to use to communicate.

That gives me a handle before I reach for a framework. It lets me ask small questions that are not small:

What is the board?
Who can see which part?
Who is allowed to write?
What needs a source?
What must never be summarized away?
How does the board evolve?

If those answers are fuzzy, the system will be fuzzy too. The agent may still sound confident, which is worse.

Infinite Boards Are Not Free

The board can be huge. That does not mean it should be.

An infinite whiteboard gives you a new problem: context management. Now the manual has to say what is relevant, what is stale, and what must stay exact.

This is where the metaphor stays honest. More context is not the same as better context. A messy board is just a messy prompt with better branding.

Some information should be visible all the time. Some should live in a drawer. Some should expire. Some should require a human before it changes.
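Those four categories can be written down as a retention policy per item. The sketch below is one possible encoding with hypothetical names (Retention, Item, visible_now, update); a real system would likely need more nuance about what "stale" means.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Retention(Enum):
    PINNED = "pinned"      # visible all the time
    DRAWER = "drawer"      # kept, but shown only when asked for
    EXPIRING = "expiring"  # dropped from view after its time-to-live


@dataclass
class Item:
    key: str
    value: str
    retention: Retention
    created_at: float = 0.0
    ttl_seconds: Optional[float] = None
    needs_human_to_change: bool = False


def visible_now(items: List[Item], now: float,
                requested: FrozenSet[str] = frozenset()) -> List[Item]:
    """Select what goes in front of an actor at time `now`."""
    out = []
    for item in items:
        if item.retention is Retention.PINNED:
            out.append(item)
        elif item.retention is Retention.DRAWER and item.key in requested:
            out.append(item)
        elif (item.retention is Retention.EXPIRING
              and item.ttl_seconds is not None
              and now - item.created_at < item.ttl_seconds):
            out.append(item)
    return out


def update(item: Item, new_value: str, approved_by_human: bool = False) -> None:
    """Some items require a human before they change."""
    if item.needs_human_to_change and not approved_by_human:
        raise PermissionError(f"{item.key!r} needs human approval to change")
    item.value = new_value
```

The useful part is not the code but the forced decision: every item on the board has to be assigned a category before it can be stored at all.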

The Point

Before you prompt the agent, design the board.

Before you add another agent, decide what board state it needs and what manual it follows.

Before you add memory, decide what should survive.

Before you expose controls to a user, decide what they should never have to see.

The whiteboard is not all you need.

But it is a good place to start.
