Building a Workshop, Part 2: How to Design for a Non-Human Collaborator

Ok so having set out the problem and the overall ideas in the previous blog, let's talk about actually building this. There's no silver bullet here; rather, it's a series of interconnected ideas all working together. This also isn't MECE (mutually exclusive, collectively exhaustive), so take it as a set of interesting thoughts rather than a full end-to-end recipe.

1. Give Agents Control

Humans are terrible at maintaining file systems (or at least I am). The fundamental problem with personal knowledge management systems is that all of them assume you'll maintain the system. And you won't. I won't. Nobody does. You set it up with good intentions, maintain it for a few weeks, and then entropy wins. So let's stop trying. Hand the whole thing over to the agent. You should only think, decide, and instruct. The agent should do everything else: filing, retrieval, cross-referencing, tracking what's stale, and maintaining what depends on what.

Helpfully, when the agent maintains the system the easy path and the organized path become the same path. You don't have to file things because the agent routes them — Workshop has a unified inbox pipeline where items get classified by confidence. The agent routes high-confidence items automatically and holds ambiguous ones for review. The agent even learns routing patterns over time and proposes new rules. Organization stops being overhead you pay and starts being something that just happens. This is the only way organizational infrastructure actually survives contact with real humans.
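
To make the routing rule concrete, here is a minimal sketch of confidence-based triage. Workshop itself runs on English-language protocols rather than code, so treat this as an illustration of the rule, not the implementation. The `InboxItem` fields, the `triage` function, and the 0.85 threshold are all my assumptions; the actual cut-off isn't something I've specified above.

```python
from dataclasses import dataclass

# Hypothetical threshold: where the line sits between "route" and "hold" is an assumption.
AUTO_ROUTE_THRESHOLD = 0.85


@dataclass
class InboxItem:
    title: str
    destination: str   # where the classifier thinks the item belongs
    confidence: float  # classifier confidence between 0.0 and 1.0


def triage(items, threshold=AUTO_ROUTE_THRESHOLD):
    """Split inbox items into auto-routed ones and ones held for review."""
    routed, held = [], []
    for item in items:
        (routed if item.confidence >= threshold else held).append(item)
    return routed, held


inbox = [
    InboxItem("Landlord call notes", "plan/", 0.95),
    InboxItem("Random article on oat milk", "research/", 0.55),
]
routed, held = triage(inbox)
print([i.title for i in routed])  # ['Landlord call notes']
print([i.title for i in held])    # ['Random article on oat milk']
```

The point of the split is that only the `held` list ever costs you attention.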

The result is that the system compounds instead of decaying. Every session the agent runs maintenance, updates indices, refreshes staleness flags, leaves notes for the next agent about what happened and what's incomplete. The project gets better over time rather than slowly rotting. This is the opposite of every knowledge management system I've ever used.

2. How to Design for Agents

Ok so you've decided to hand over control. How do you actually make that work? This is where it gets interesting, because it requires empathy for the agent and I mean that literally - thinking about what the agent actually sees when it opens your workspace.

Picture Santa's workshop. A new elf shows up for a shift. Brilliant, hard-working, eager to help — but with zero memory of what any previous elf did. The floor is covered in unlabeled parts, half-finished toys, notes scrawled on napkins, and three conflicting gift lists. No map. No diary from the last shift. Just stuff everywhere. That's what we do to AI agents every single session. And then we're surprised when they pick up the wrong piece and start building the wrong toy.

We need to start thinking of agents as users of these systems. What does an agent need? What are its constraints? What makes it effective rather than confused?

Orientation

Without orientation, every session starts with a discussion between you and the agent about what's going on. And that discussion is expensive, error-prone, and scales terribly: the more projects you run, the worse it gets. Humans are bad at orienting agents because we know what we know and we get bored of typing it out. We miss things, we describe the same thing in different ways, we make offhand remarks that seem important but aren't, and we forget updates. The agent needs to self-orient from the source of truth: the files themselves.

What works: a single entry point (config.md) that tells the agent who you are, what projects exist, what happened last session, what's been decided, what's urgent. The agent reads one file and has a complete map. From there it knows where to go for anything else.

In Workshop, session start is a 10-step self-orientation protocol. The agent loads the config, then user memory (identity, preferences, patterns, contacts, timeline, expertise, staleness tracking), then session history, stream registry, global decisions, cross-project flags, runs maintenance catch-up for anything the previous session missed, and clears its scratchpad. There's no formal briefing — the agent waits for you to speak first, then surfaces what's relevant: “Last session you were finalizing the Q2 roadmap. Your inbox has 2 items. Market research hasn't been touched in 10 days.”

This brings me to a further point: as well as leading to higher-quality discussions, it is simply a better end-user experience when the agent orients itself rather than relying on the user to do it. It makes everything magic again.

Bounded Attention

Your knowledge base is far too large for the agent to see all at once, but the agent needs to make decisions that are consistent with the whole thing. This is the fundamental tension — you need breadth of awareness with depth on the specific problem. Without designed loading, the agent either drowns in context (slow, expensive, loses focus) or operates on a random slice and misses critical dependencies. Tiered loading isn't a nice optimization — it's the only way to get coherent reasoning across a body of work that exceeds the context window.

In Workshop, context loading is tiered. The agent loads ~2-3K tokens of decisions-key every session, plus lightweight memory files and the stream registry. Full research and analysis load on demand. Files over 50KB get distilled versions at ~30% of the original size. This is lossy compression designed for an AI reader, not a human summary: decisions, numbers, conclusions, and cross-references are preserved, while reasoning chains are stripped. The agent loads the distilled version by default and pulls the full file when it needs depth. Each stream template also defines budgets per session type, with lightweight defaults: an orientation session loads a compressed context set, while a deep work session pulls full files on demand. Basically UX for an AI's attention window.
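
The distilled-by-default rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: `KnowledgeFile`, `needs_distilled_copy`, `path_to_load`, and the `.distilled.md` naming are hypothetical, and I'm reading "50KB" as 50 * 1024 bytes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "50KB" is interpreted as 50 * 1024 bytes.
DISTILL_THRESHOLD_BYTES = 50 * 1024


@dataclass
class KnowledgeFile:
    path: str
    size_bytes: int
    distilled_path: Optional[str] = None  # set once a distilled copy exists


def needs_distilled_copy(f):
    """True when a file is over the threshold and has no distilled version yet."""
    return f.size_bytes > DISTILL_THRESHOLD_BYTES and f.distilled_path is None


def path_to_load(f, need_depth=False):
    """Default to the distilled version; pull the full file only for depth."""
    if f.distilled_path and not need_depth:
        return f.distilled_path
    return f.path


big = KnowledgeFile(
    "research/market-research.md",
    120_000,
    "research/market-research.distilled.md",  # hypothetical naming convention
)
print(path_to_load(big))                   # research/market-research.distilled.md
print(path_to_load(big, need_depth=True))  # research/market-research.md
```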

The expertise tracker is another form of this — the agent knows what topics you understand deeply and what needs explanation. It doesn't waste your time or its context on things you already know.

Agent-native design is about what fills the context window and in what order. Compare what the agent loads in its first seconds.

Without Workshop: the agent loads raw files but has no map, no history, no decisions, and no context for what matters. It will spend the first 10 minutes asking you questions. Context window after the first 30 seconds: 54% used, by market-research.md (15%), competitor-analysis.md (12%), financial-model-notes.md (18%), and meeting-notes-march.md (9%). 46% available for actual work.

With Workshop: working by minute 1. The agent briefs you. Without it: still asking questions at minute 5. You brief the agent.

Without structure, raw files eat your context window before the agent has a map. With tiered loading, a tiny fraction of context gives full orientation.

What loads in the first 30 seconds determines whether the agent can work — or needs to ask you ten questions first.

Numbers below are illustrative.

WITHOUT WORKSHOP: 54% of context used, 46% free for work

market-research.md: 15%
competitor-analysis.md: 12%
financial-model-notes.md: 18%
meeting-notes-march.md: 9%

No map. No history. No decisions. Agent spends the first 10+ minutes asking you questions.

WITH WORKSHOP: 9% of context used, 91% free for work

config.md (orientation): 2%
session-log.md (continuity): 1.5%
decisions-key.md (judgment history): 3%
daily-briefing.md (current state): 1%
identity.md (who you are): 0.8%
preferences.md (how you work): 0.7%

Fully oriented. Knows the project, past decisions, current state, your preferences. Working by minute 1.

How it stays small: research and analysis files load on demand, not by default. Anything over 50KB gets a distilled version — decisions and key numbers preserved, reasoning chains stripped. The agent pulls depth when it needs it.

Structured Uncertainty

Your knowledge base is an evolving organism of decisions, assumptions, data and discussion. Everything is constantly going out of date. And here's the critical thing: because the agent sees the system for the first time each session, and can't load all the files at once, it's actually more in danger of acting on stale information than a human would be. A human at least has vague memories of what changed last week. The agent has nothing unless you build the structures. On top of that, the thinking-scaling argument from Blog 1 means we're asking the agent to hold far more state (more decisions, more assumptions, more interdependencies) than we'd ever ask of a human. All of that means it needs serious structures to avoid hallucinating, acting on old information, or projecting new assumptions onto outdated analysis.

Workshop handles this at multiple levels. Every file has YAML frontmatter with last_updated, confidence, sources, and status fields. The staleness tracker assigns formal levels: Fresh (within 14 days), Aging (15-30 days), Stale (31-60 days), and Critical (more than 60 days). The agent blocks on Critical information and won't rely on it until it has been checked. Analysis files are automatically flagged stale when their cited research sources have been updated since the analysis was written. The assumptions register tracks every claim the plan depends on, with a confidence rating, what-changes-if-wrong, and how-to-validate.
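
The staleness levels and the "cited source is newer" rule are simple enough to write down. The sketch below is illustrative only: the function names are hypothetical, and treating exactly 60 days as Stale rather than Critical is my assumption about the boundary.

```python
from datetime import date


def staleness_level(last_updated, today):
    """Map a file's age in days onto the four staleness levels."""
    age_days = (today - last_updated).days
    if age_days <= 14:
        return "Fresh"
    if age_days <= 30:
        return "Aging"
    if age_days <= 60:  # assumption: day 60 itself still counts as Stale
        return "Stale"
    return "Critical"


def analysis_is_stale(analysis_updated, source_updates):
    """Flag an analysis if any cited source changed after it was written."""
    return any(src > analysis_updated for src in source_updates)


today = date(2025, 3, 10)
print(staleness_level(date(2025, 3, 1), today))  # Fresh
print(staleness_level(date(2025, 1, 1), today))  # Critical
print(analysis_is_stale(date(2025, 2, 1), [date(2025, 1, 20), date(2025, 2, 15)]))  # True
```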

The result is structured doubt. The system models its own uncertainty explicitly rather than pretending everything it knows is current. Trust comes from the agent flagging what's uncertain, not pretending everything's solid.

Self-Maintenance

Every agent session is ephemeral, but the work is continuous. If the system doesn't have protocols for the outgoing agent to prepare for the incoming one, you get entropy: skipped maintenance, lost context, broken cross-references. Without those protocols, Workshop would degrade over time just like every other knowledge system. Self-maintenance is what makes the compounding promise from Blog 1 actually work rather than being theoretical. One elf preparing the workshop for the next elf.

Maintenance is tiered. Tier 1 runs every session: updating timestamps on modified files, syncing decisions, and rewriting the project summary if direction changed. Tier 2 runs when triggered: adding new research to the catalog, checking analysis staleness against cited sources, and verifying that plan changes reference a decision number. Tier 3 is periodic: full staleness sweeps, token budget reviews, and file map audits. Maintenance runs after each unit of work, not just at session end, because sessions can terminate without warning. As a backstop, the session-start protocol includes a catch-up step to detect and repair anything the previous agent missed.

The scratchpad is another self-maintenance mechanism — a running action log within each session that prevents circular patterns. Before reading a file or trying an approach, the agent checks the scratchpad to see if it already tried that and what happened. Cleared at session start, never persisted.
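
As a rough picture of the scratchpad, here is a minimal in-memory sketch. In Workshop the scratchpad is something the agent maintains itself; the `Scratchpad` class and its method names are hypothetical and only illustrate the check-before-acting pattern.

```python
class Scratchpad:
    """In-memory action log for one session; never written to disk."""

    def __init__(self):
        self._log = {}  # action description -> what happened

    def already_tried(self, action):
        """Return the recorded outcome, or None if the action is new."""
        return self._log.get(action)

    def record(self, action, outcome):
        self._log[action] = outcome

    def clear(self):
        """Called at session start so nothing carries over."""
        self._log.clear()


pad = Scratchpad()
pad.record("read research/market-research.md", "no pricing data in this file")

# Before repeating an action, check what happened last time.
print(pad.already_tried("read research/market-research.md"))  # no pricing data in this file
print(pad.already_tried("read plan/roadmap.md"))              # None
```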

Cross-project flags are another piece of this — when the agent is working in one stream and notices something relevant to another, it doesn't interrupt the current work. It appends a flag (typed as dependency, decision, conflict, or fyi) that gets processed at the next session start. The system also has a priority resolution hierarchy for conflicts — stream-level decisions beat global decisions beat user preferences beat learned patterns. When two files disagree, there are explicit rules for which source is canonical and the agent rebuilds the derived version.
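
The flag types and the priority hierarchy above can be sketched like this. It's a minimal illustration, not Workshop's actual format: the `Claim` class, the `source_type` labels, and the dictionary layout of a flag are all hypothetical names I've picked for the example.

```python
from dataclasses import dataclass

# Highest priority first, in the order described above.
PRIORITY = ["stream_decision", "global_decision", "user_preference", "learned_pattern"]
FLAG_TYPES = {"dependency", "decision", "conflict", "fyi"}


@dataclass
class Claim:
    source_type: str  # one of PRIORITY
    value: str


def resolve(claims):
    """Return the claim that comes from the highest-priority source."""
    return min(claims, key=lambda c: PRIORITY.index(c.source_type))


def make_flag(flag_type, from_stream, to_stream, note):
    """Build a cross-project flag to be processed at the next session start."""
    if flag_type not in FLAG_TYPES:
        raise ValueError(f"unknown flag type: {flag_type}")
    return {"type": flag_type, "from": from_stream, "to": to_stream, "note": note}


winner = resolve([
    Claim("user_preference", "weekly reports"),
    Claim("stream_decision", "monthly reports"),
])
print(winner.value)  # monthly reports

flag = make_flag("dependency", "market-research", "financial-model", "rent estimate changed")
print(flag["type"])  # dependency
```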

The result: every session leaves the workspace a little cleaner than it found it.

Ask the Agent

Something that should now be completely obvious — the agent is often better at identifying what it needs than you are, because you're not the one experiencing the friction (and also it's smarter than you). You don't know what it's like to walk into a project folder with no map. Ask it to critique its own environment. I wouldn't have thought of file maps or distilled versions of files on my own — the agent suggested both because it was the one struggling without them.

3. Knowledge Bases Should Look Like Codebases

So you've got a system the agent can operate. How should the actual information inside it be structured? My answer: look at what already works for agents. Codebases work.

The Structural Gap

Codebases have decades of accumulated discipline — clear file hierarchies, dependency management, type systems, READMEs, conventions. An agent can walk into a codebase, read the structure, understand the conventions, and start working. Knowledge bases have none of this. They're file dumps where core data is buried inside heterogeneous, multi-purpose files with no dependency tracking, no version history, and no README explaining what's here and how it connects.

Workshop revolves around templates (e.g., research vs. strategic-planning vs. greenfield coding). The codebase analogy: frontend/web repos look different from AI research repos, because each reflects how a senior engineer from that discipline would set up a new codebase. In the same way, each template gives the project a loose prior structure (e.g., research is three layers: sources → notes → synthesis), so the agent has a clear expectation of how to work within it. Having the template form the project skeleton lets the agent apply best-in-class structure and management from the start, rather than building ad hoc as the project grows. Each template defines its own layer structure, file naming conventions, YAML frontmatter schema, session-type budgets, maintenance rules, and distilled file specifications. When you tell the agent you're starting something new, it picks the right template and scaffolds the project. If a research project evolves into strategic planning, the agent can migrate it, preserving all content.

Start with chaos. Structure it for agents. Then notice: you just built a codebase.

The old world (flat folder, no structure):

portland-foot-traffic-notes.md
oat-milk-supplier-report.md
competitor-menu-pricing (2).md
thoughts-on-second-location.md
landlord-call-mar15.md
revenue-model-v3.md
austin-vs-portland.md
ingredient-sourcing.md
pop-up-stall-idea.md

Knowledge project, built for agents:

research/ (frozen evidence)
analysis/ (interpretation)
plan/ (commitments)
CLAUDE.md (agent orientation)

Codebase, standard:

data/ (raw inputs)
lib/ (logic)
main.py (entry point)
CLAUDE.md

Same file. Same purpose. Both domains. Structure your knowledge project for agents and you end up with something that looks like a codebase — because codebases were already structured for navigability. They just got there first.

The Git Problem

Codebases have version control: branching, history, blame, diffs, rollbacks. Decades of tooling for “what changed, when, why, and what does it affect?” Knowledge work has nothing equivalent. You make a decision and it lives in your head, or maybe in a doc somewhere, until you forget why you made it. This is already a problem today (everyone has examples of having to backtrace “why did we do that?”), but with agents it gets much bigger, because people work more solo and much faster.

Workshop's decision infrastructure is basically git for strategic thinking. It uses a three-tier pattern at both global and per-stream levels: KEY (quick-reference table, ~2K tokens, loaded every session), FULL (active decisions with full rationale, loaded on demand), and ARCHIVE (superseded decisions with “why superseded”, grep-searched when needed). Every decision gets a number, a rationale, a date, and trigger conditions for revisiting (“reconsider if Competitor X launches new product”). Decisions absorb other decisions as thinking crystallizes — four separate market-entry decisions collapse into one as strategy solidifies, with tombstones left for the absorbed entries. A quarterly compression protocol keeps things from bloating. The history is always preserved — you can trace why something was superseded. Plan files must reference a decision number to justify changes. The agent enforces this.
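
Here is one way the decision record, absorption with tombstones, and the KEY table could look as data. This is a minimal sketch under my own assumptions: the `Decision` fields, the `absorb` and `key_table` helpers, and the `D<number>` label format are hypothetical, and the example decisions are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Decision:
    number: int
    summary: str
    rationale: str
    decided_on: date
    revisit_if: List[str] = field(default_factory=list)  # trigger conditions
    absorbed_into: Optional[int] = None  # tombstone pointer once absorbed


def absorb(decisions, absorbed_numbers, merged):
    """Fold several decisions into `merged`, leaving tombstones behind."""
    for d in decisions:
        if d.number in absorbed_numbers:
            d.absorbed_into = merged.number
    decisions.append(merged)


def key_table(decisions):
    """KEY tier: one short line per active (non-absorbed) decision."""
    return [f"D{d.number}: {d.summary}" for d in decisions if d.absorbed_into is None]


log = [
    Decision(1, "Enter Portland first", "Higher foot traffic", date(2025, 1, 10)),
    Decision(2, "Delay Austin", "Capital constraints", date(2025, 1, 20)),
]
merged = Decision(
    3, "Portland-only market entry", "Merges D1 and D2", date(2025, 2, 1),
    revisit_if=["Competitor X launches new product"],
)
absorb(log, {1, 2}, merged)
print(key_table(log))        # ['D3: Portland-only market entry']
print(log[0].absorbed_into)  # 3
```

The absorbed entries stay in the list, so the history of why D1 and D2 went away remains traceable.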

Then instead of trying to backtrace your decisions, you can just ask the agent — it accesses the decision tree, and reports back. Simple.

Information Primitives & the Pyramid

Building the system forced a question I hadn't expected: what actually is a file? The answer: a file is a narrative vessel, not a unit of information. The real units, the primitives, are facts, decisions, assumptions, questions, contacts, milestones, and patterns. These are born inside files but live across files through citations, dependency graphs, and registers. A broader implication is that there are two sources of messiness in our knowledge bases: the lack of structure among the files in a knowledge base AND the fact that critical data, assumptions, and decisions sit randomly within files that don't self-connect. Mess.

The goal is to mirror codebases. We have a clear, extracted data layer. We have analysis layers that play the role of ordinary code files. We have top-level plans that play the role of main files. More concretely, files stop being the source of information and become narrative instruments that explain how things are connected. Files are storytelling.

For the data, a single number from an interview enters as a data point in a research file (frozen on deposit, never edited), gets extracted into the assumptions register with a confidence rating, influences a decision about market timing, and surfaces in the investor narrative. One primitive, spanning 8+ files, tracked through the whole system. YAML frontmatter on every file — with sources, decisions, confidence, topics, and open_questions fields — is what makes this tracking machine-readable rather than implicit.
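
To show what "machine-readable" means here, below is a minimal sketch that reads frontmatter from a file. The field names come from the text above, but every value in the sample is invented, and `parse_frontmatter` is a hypothetical helper that handles only flat `key: value` and `key: [a, b]` lines. A real project would use a proper YAML parser.

```python
def parse_frontmatter(text):
    """Parse a small subset of YAML frontmatter: `key: value` and `key: [a, b]`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        if ":" not in line:
            continue
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            meta[key.strip()] = [v.strip() for v in raw[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = raw
    return meta


# Sample file: all values are made up for illustration.
doc = """---
last_updated: 2025-03-15
confidence: medium
sources: [research/landlord-call-mar15.md]
decisions: [D12]
topics: [second-location, lease]
open_questions: [Is the lease rate negotiable?]
---
Body text goes here.
"""

meta = parse_frontmatter(doc)
print(meta["confidence"])  # medium
print(meta["topics"])      # ['second-location', 'lease']
```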

The Digestion Metaphor

Most knowledge systems are refrigerators — they keep information cold and static. Workshop is more like a digestive system. Raw research enters, gets broken down through analysis, key nutrients get extracted (decisions, assumptions, validated numbers), energy flows through the whole organism (plan updates, option re-evaluations). The four layers (research → analysis → options → plan) are the digestive tract. The dependency graph is the circulatory system. The staleness tracker is the immune system.

This is why the layers organize by purpose — evidence, interpretation, debate, commitment — not by topic. Each layer has a different rate of change and different rules. Research is frozen on deposit (evidence, never edited). Analysis interprets and must cite sources in frontmatter — if cited research is newer than the analysis's last_updated, it's automatically flagged stale. Options maintain active debates with scoring matrices and explicit trigger conditions for resolution. Plans commit, and changes require a decision number. New interpretation always goes in new files that cite the originals. The strategic-planning template codifies all of this; other templates (research, product-build, operations) have their own layer structures tuned to different kinds of work.

4. Build It With Markdown, Not Code

So how do you actually build all of this? The answer honestly surprised me: markdown files and English. That's it. No code.

Markdown as Backend

MD files are stable prompts. A mix of instructions, system designs, and contracts that form the backend defining your relationship with the agent. The config files are the operating system. The project files are the library. The corrections logs are warning signs. Session logs are commit history. The whole thing runs on text files that both you and the agent can read and edit.

Workshop has three levels of .agent/ — global (workspace-wide config, session log, stream registry, decisions), per-scope (templates for project types vs code types), and per-stream (each project's own config, index, and decisions). This layering means global rules cascade down but each stream can override with its own conventions. Everything is git-trackable. The system even has auto-heal — if expected files are missing at session start, the agent degrades gracefully, rebuilding what it can from canonical sources (e.g., rebuilding decisions-key from decisions-full, or scanning directories to rebuild the stream registry) and alerting you to what was lost. Agent-maintained, user-visible, recoverable.
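
The auto-heal idea can be pictured as the kind of throwaway helper an agent might write for itself. This is a sketch under assumptions, not how Workshop does it: `auto_heal` and `rebuild_decisions_key` are hypothetical names, and I'm assuming decisions-full.md has one `## ` heading per decision, which may not match the real file layout.

```python
import tempfile
from pathlib import Path


def auto_heal(root, expected, rebuilders):
    """Rebuild missing files where a canonical source exists; report the rest."""
    rebuilt, lost = [], []
    for name in expected:
        target = Path(root) / name
        if target.exists():
            continue
        rebuild = rebuilders.get(name)
        content = rebuild(Path(root)) if rebuild else None
        if content is None:
            lost.append(name)  # nothing to rebuild from: alert the user
        else:
            target.write_text(content, encoding="utf-8")
            rebuilt.append(name)
    return rebuilt, lost


def rebuild_decisions_key(root):
    """Derive the quick-reference table from headings in decisions-full.md."""
    full = root / "decisions-full.md"
    if not full.exists():
        return None
    lines = full.read_text(encoding="utf-8").splitlines()
    headings = [line[3:].strip() for line in lines if line.startswith("## ")]
    return "\n".join(headings) + "\n"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "decisions-full.md").write_text(
        "## D1: Open a second location\nRationale...\n", encoding="utf-8"
    )
    rebuilt, lost = auto_heal(
        root,
        expected=["decisions-full.md", "decisions-key.md", "session-log.md"],
        rebuilders={"decisions-key.md": rebuild_decisions_key},
    )
    key_text = (root / "decisions-key.md").read_text(encoding="utf-8")

print(rebuilt)  # ['decisions-key.md']
print(lost)     # ['session-log.md']
```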

Why Code Is Wrong

The work Workshop manages — thinking, analysis, strategic debate — isn't deterministic. There's no fixed pipeline. You can't hardcode a flowchart for “figure out whether to enter the Phoenix market.” Code enforces structure, and knowledge work is about judgment.

English-language protocols give the agent intent and let its intelligence figure out execution. The session start protocol doesn't say “load file X then file Y.” It says “orient yourself to the current state, load what's relevant, flag what's stale.” Different every session because the context is different. And when the system needs to evolve, you edit a markdown file. No refactoring, no broken dependencies. The agent reads the new instructions next session and behaves differently.

Code should be built and run by agents as needed, case-by-case, to solve specific tasks — not to orchestrate or manage the system itself.

From Prompts to Constitutions

There's a progression. A prompt is a one-shot instruction — useful but ephemeral. An agreement is scoped to a category of work: “here's how we handle research files.” A constitution is the full framework — what the agent decides without asking, what needs your sign-off, how conflicts get resolved, how institutional memory persists across sessions, how the system maintains and improves itself. You need the constitution.

Anthropic is the classic example here: it runs Claude under a constitution, a framework of values and authorities for situations nobody anticipated. The config file in my .agent folder is a tiny analogous version. The agent doesn't ask me how to start a session or where to file a decision. I set some expectations and some ways of working, and it operates within constitutional bounds and figures out the rest. Obviously Claude's constitution is broader and more thoughtful, but the key thing here is handing off control and accepting a level of ambiguity: the agent needs the freedom to make good decisions under your broad guidance, without step-by-step prescription.

Three levels of instructing an agent. Each expands the agent's autonomy and reduces your friction.

Prompt: directions to one destination. One interaction. 95% human · 5% agent.

Agreement: a job description. Persistent across sessions. 55% human · 45% agent.

Constitution: founding an organization. Evolving, because the agent helps improve it. 20% human · 80% agent.

A prompt gets you an output. A constitution gets you an operating partner. The difference between them is the difference between giving directions and founding an organization. One scales linearly with your attention. The other scales with the agent's capability.

5. Conclusion

These are principles, not a spec. Workshop is a first prototype — there's a ton of mistakes, some I know about, some I don't. The principles matter more than my specific implementation. If you want to see how I applied them, read the Workshop implementation blog. If you want to build something better — I'd love to hear about it!