Building a Workshop, Part 3: The System in Practice

OK, so having gotten all philosophical in the previous post, let's talk about my implementation of Workshop.

For context, this is the analogy to work with:

As a last (bad) metaphor here, let's say agents are elves in Santa's workshop, and you need a new elf to use the workshop to make a certain present.
Every time you start a session, a new elf walks into Santa's workshop (here, the folder you've given it in cowork, or whatever files you've given the chatbot). It is brilliant, fast, and tireless (and increasingly so), but it has zero memory of yesterday's shift. It has never seen this workshop before. What does it find?
In almost every case right now, the answer is a pile of unlabeled files on the floor. That means you, the user, have to spend the first 20 minutes explaining where everything is and then constantly redirect the elf while it works. If instead the answer is a well-designed workshop (clear instructions on the wall, an organized library, warning signs from previous shifts, an intake desk for new materials), the elf sees the room, reads the instructions, and gets to work.
Basically, AI research, engineering, and product focus has gone almost entirely to smarter elves and more machines, but a smarter elf doesn't fix the fact that your workshop is a disaster. We could instead build a great workshop: one that empowers the elf to build as autonomously and effectively as possible.

Ok. Let's talk implementation. Here's the layout:

Workshop runs on a set of interconnected Markdown file systems. This is what each one does:

.agent (including CLAUDE.md and config.md) is the operating system. It tells the elf how to operate in this Workshop.
.memory tells the elf about the user — identity, preferences, patterns, contacts, timeline, expertise.
_system manages operational infrastructure — subdirectories are inbox/, coordination/, integrations/, and storage/.
projects and code are the workbenches for individual workstreams.

In this Workshop, the agent is actively onboarded to the workspace's norms, file system, and expectations by reading config.md first (which leads it to the other key files). It has everything it and you need right from the beginning, and it also knows how to manage the system autonomously.

The elf walks in and reads the room

Each session, a new agent enters this workshop cold. Everything below is what it finds.

door

intake desk

inbox/ — triage

todo — tasks

integrations/ — external

user chat

CLAUDE.md

config.md (reads first)

instruction board (.agent/)

session-log.md — shift history

scratchpad.md — working memory

stream-registry.md — project index

decisions/ — decision register

staleness.md — freshness tracker

reference/ — procedure manual

personnel file (.memory/)

identity.md — who you are

preferences.md — how you work

patterns.md — observations

contacts.md — your people

timeline.md — deadlines

expertise.md — knowledge depth

workbenches (projects/ & code/)

12 templates — e.g. strategic-planning:

research/ (evidence)

analysis/ (interpretation)

options/ (debates)

plan/ (committed)

+ product-build, research, operations, code-greenfield, ...

.agent/ — per-workbench config, index, log, decisions

The user chats with the elf at the front counter. The elf reads the instruction board and personnel file for context → works at the workbenches → leaves the workshop better than it found it. Async files arrive at the intake desk.
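The layout above can be sketched as a small scaffold script. This is a rough sketch rather than the actual Workshop installer: the folder and file names come from this post, but where each file sits inside its folder, and the `scaffold` helper itself, are assumptions made for illustration.

```python
from pathlib import Path

# Names are taken from the post; the exact nesting is an assumption.
LAYOUT = {
    ".agent": ["CLAUDE.md", "config.md", "session-log.md", "scratchpad.md",
               "stream-registry.md", "staleness.md", "decisions/", "reference/"],
    ".memory": ["identity.md", "preferences.md", "patterns.md",
                "contacts.md", "timeline.md", "expertise.md"],
    "_system": ["inbox/", "coordination/", "integrations/", "storage/"],
    "projects": [],
    "code": [],
}


def scaffold(root: Path) -> list[Path]:
    """Create any missing folders and empty files; return what was created."""
    created = []
    for folder, entries in LAYOUT.items():
        base = root / folder
        if not base.exists():
            base.mkdir(parents=True)
            created.append(base)
        for entry in entries:
            target = base / entry
            if target.exists():
                continue  # never overwrite something an earlier shift left behind
            if entry.endswith("/"):
                target.mkdir()
            else:
                target.touch()
            created.append(target)
    return created
```

Running `scaffold(Path("my-workshop"))` twice is safe: the second call finds everything in place and returns an empty list.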

Zero memory. Zero context. Ten steps later, the elf knows everything.

Read config.md — Master protocol — what to do, in what order

reads

config.md (full protocol, 3K tokens)

learns

Session structure, how to read the room, when to ask for help

file snippet

Session start protocol: [10-step sequence for reading files in order, context gathering, orientation]

Load user profile — Identity, preferences, patterns, contacts, timeline

reads

identity.md, preferences.md, patterns.md, contacts.md

learns

Who the user is, how they work, their contacts and commitments

file snippet

identity.md: Role=Head of Operations. Org=vegan bakery. Priorities=[menu expansion, cost per cupcake, waste reduction]

Load expertise & staleness — What topics the user knows deeply; what's aged out

reads

expertise.md, staleness.md

learns

Don't explain what the user already knows; flag anything critically stale

file snippet

expertise.md: deep familiarity with unit economics, vegan supply chains. staleness.md: ingredient costs 15d (Aging), rent survey 0d (Fresh)

Read session log — What happened last time, unresolved items

reads

session-log.md (last 5 entries in full)

learns

Continuity from last session, what was in progress, what to resume

file snippet

Last session (8d ago): Portland vs. Austin decision open. Ingredient costs research flagged. Loan approval countdown active.

Scan projects — Stream registry — active projects, status, last activity

reads

stream-registry.md

learns

What projects exist, which are active, which dormant, last touch dates

file snippet

portland-location: strategic-planning, active, last touched 8d ago, status=decision-pending

Check timeline — Active milestones, countdowns, deadlines

reads

timeline.md

learns

What's time-sensitive, what's approaching, what's overdue

file snippet

Loan approval: 47d to target close. Menu expansion review: 3d overdue. Portland site visit: not yet scheduled.

Review decisions — Quick-reference index of active decisions

reads

decisions/decisions-key.md (~2K tokens)

learns

What decisions are open, which are decided, which are aging

file snippet

#R5: Portland vs. Austin. Status: open. Age: 12d. Triggers: [foot traffic <300/day, rent >$3.5k/month]

Check flags — Signals from other projects that need attention

reads

cross-project-flags.md

learns

Dependencies, blockers, or data from other workstreams

file snippet

FLAG [dependency]: austin-location depends on loan approval timing (from portland project)

Maintenance catch-up — Repair any gaps from previous session

reads

File integrity checks, missing references

learns

What needs repair, consistency status

file snippet

Checks: Decision #R5 still referenced by cost-per-cupcake.md? Yes. Staleness propagated to dependents? Yes.

Clear scratchpad — Fresh working memory for this session

reads

None (clearing old content)

learns

Now ready to work with a blank slate

file snippet

--- SESSION START: 2024-03-18 ---
[Elf ready. Zero memory. Full context loaded from files.]

ready

Awaiting your first message — context surfaces naturally as you speak

User: “Let's continue with Portland.” Agent: “The Portland decision is open. Ingredient costs are 15 days stale. Loan approval: 47 days to target close. Two inbox items pending. Where do you want to start?”

The elf was ready. You spoke first. Context surfaced naturally.
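The ten-step start-up above boils down to "read these files in this order, note what is missing, then wipe the scratchpad". Here is a minimal sketch of that loop. The read order follows the steps, but the paths (in particular where `cross-project-flags.md` lives) and the `start_session` helper are assumptions, and the maintenance step is reduced to reporting missing files.

```python
from pathlib import Path

# Read order follows the ten steps; the exact paths are assumptions.
READ_ORDER = [
    ".agent/config.md",                              # 1. master protocol
    ".memory/identity.md", ".memory/preferences.md",
    ".memory/patterns.md", ".memory/contacts.md",    # 2. user profile
    ".memory/expertise.md", ".agent/staleness.md",   # 3. expertise and staleness
    ".agent/session-log.md",                         # 4. last sessions
    ".agent/stream-registry.md",                     # 5. project index
    ".memory/timeline.md",                           # 6. deadlines
    ".agent/decisions/decisions-key.md",             # 7. decision index
    "_system/coordination/cross-project-flags.md",   # 8. flags (location assumed)
]


def start_session(root: Path, today: str) -> tuple[dict[str, str], list[str]]:
    """Load context files in order; return (context, missing files)."""
    context, missing = {}, []
    for rel in READ_ORDER:
        path = root / rel
        if path.is_file():
            context[rel] = path.read_text(encoding="utf-8")
        else:
            missing.append(rel)  # 9. maintenance: surface gaps to repair
    # 10. clear the scratchpad so the session starts with a blank slate
    scratchpad = root / ".agent" / "scratchpad.md"
    scratchpad.parent.mkdir(parents=True, exist_ok=True)
    scratchpad.write_text(f"--- SESSION START: {today} ---\n", encoding="utf-8")
    return context, missing
```

The point of keeping the order in one list is that the protocol itself stays a piece of data the agent can read and edit, not logic buried in a prompt.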

Even better, the elf can leave the workshop in a better state than it found it. End-of-shift protocols update the session log, flag incomplete work, and refresh the briefing board: one agent maintaining the workshop for the next one.

The result is that starting a discussion looks very different in practice: I don't need to give context, and the agent can operate autonomously.

OK, let's talk through some examples of where this is effective. The most basic example is integrating files or information into the system. Everyone has trouble finding information in a large knowledge base, and even more trouble knowing what's correct and up to date. Manual management is hard even with fewer than 10 files, and impossible at the hundreds or thousands of files per project that most people work with.

Here's how it should look instead. The Workshop has an intake desk. Users drop files in; the agent classifies them, routes them to the right workbench, registers dependencies, and flags what's now stale.

Information enters. The workshop decides where it belongs.

file arrives — Dropped in chat, emailed, or deposited in incoming/

detail

New file enters the system. Could be research, evidence, a thought, or context from outside.

example

2024-03-15-portland-foot-traffic-survey.md arrives in chat.

classified — What type? Which project? Confidence score assigned.

detail

The intake process analyzes: is this research, analysis, evidence, or metadata? Which project does it belong to?

example

Type: research. Project: portland-location. Confidence: 92%.

routed — High confidence → auto-route. Low → held for review.

detail

Decision point: the agent uses judgment — high-confidence items route automatically to the project's research/ folder; ambiguous items halt for human review.

example

High confidence. Auto-routed to projects/portland-location/research/.

registered — Dependencies mapped. Added to project index.

detail

System identifies: what other files depend on this? Is it research (frozen) or living analysis (traceable)?

example

Identified dependencies: cost-per-cupcake.md references this. Marked as frozen evidence.

stale items flagged — Downstream files that relied on older data get flagged.

detail

If new survey contradicts old assumptions, everything downstream becomes stale — revenue models, decisions, analyses.

example

New foot traffic data contradicts A7 ('Portland foot traffic stable'). Flags: assumptions register, cost-per-cupcake, revenue model, decision #R5.

Research is frozen on deposit — evidence, never edited. New interpretation goes in new files that cite the originals. Routing rules learn from every auto-route and human override.

In Workshop, information enters via the intake desk, gets triaged and routed to the right workbench, and gets registered in the dependency map. From there the agent manages it: updating cross-references, flagging staleness, loading the right context into conversations, filing outputs from your discussions, running end-of-shift maintenance, and syncing to git. All of this is managed autonomously by the agent.
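The routing decision at the intake desk can be sketched as a confidence gate. Treat this as an illustration only: the 0.85 cut-off, the `_system/inbox/review/` holding folder, the mapping of file types to folders, and the `Classification` and `route` names are all assumptions. The post only says that high-confidence items route automatically and ambiguous ones wait for a human. Learning from overrides is not modeled here.

```python
from dataclasses import dataclass

AUTO_ROUTE_THRESHOLD = 0.85              # assumption: the post gives no cut-off
HOLD_FOLDER = "_system/inbox/review/"    # hypothetical holding area for human review

# Assumed mapping from file type to a folder in the project template.
KIND_TO_FOLDER = {"research": "research", "evidence": "research", "analysis": "analysis"}


@dataclass
class Classification:
    filename: str
    kind: str          # research, analysis, evidence, or metadata
    project: str
    confidence: float  # 0.0 to 1.0


def route(item: Classification) -> str:
    """Return the folder a classified file should be filed under."""
    folder = KIND_TO_FOLDER.get(item.kind)
    if folder is None or item.confidence < AUTO_ROUTE_THRESHOLD:
        return HOLD_FOLDER  # ambiguous: wait for the user
    return f"projects/{item.project}/{folder}/"


survey = Classification(
    "2024-03-15-portland-foot-traffic-survey.md", "research", "portland-location", 0.92
)
print(route(survey))  # projects/portland-location/research/
```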

Let's take something else. You're talking to an agent about a problem and you make a decision. What then? You just write it down, hope you remember the what and the why, and then try to cascade it across your work? Not ideal. Here's a better way:

A decision is made. The workshop never forgets why. The stages below trace decision #R5: Portland vs. Austin.

Created — Decision surfaces during discussion or analysis.

content

We're opening a second vegan bakery location. Portland has higher foot traffic but less established vegan scene. Austin has lower rent but seasonal slowdown.

example from file

During project kickoff, cost-per-cupcake analysis shows both cities viable but with different risk profiles.

Numbered & indexed — Assigned an ID (e.g. #R5). Added to quick-reference index.

content

Decision #R5: Portland vs. Austin location selection

example from file

decisions/decisions-key.md entry: R5 | open | Portland vs Austin | strategic choice, impact on revenue model and growth risk

Rationale recorded — Why this call was made, what evidence supports it, what was rejected.

content

Portland rationale: foot traffic trending up (upside) but rent steep, revenue margin 28%, 8-month break-even. Austin rationale: lower rent (savings), revenue margin 24%, 12-month break-even but less traffic growth. Call: Portland chosen — foot traffic growth justifies higher rent given current market.

example from file

decisions/decisions-full.md entry: [full analysis, evidence cited, competing options considered, confidence level, dated signature]

Triggers set — 'What would change this?' — conditions that reopen the decision.

content

Triggers: [1] If Portland foot traffic drops below 300/day (market softening), [2] If Austin rent drops below $2.8k/month (better deal emerges), [3] If loan approval delayed beyond 60 days (changes financing), [4] If ingredient costs rise >15% (model breaks)

example from file

triggers.yaml: [{id: portland-traffic, threshold: '<300/day', source: survey-data}, ...]

Absorbs related — As thinking crystallizes, related decisions merge. Trail preserved.

content

Decision #R12 (wholesale café supply strategy) converges with #R5. Ingredient sourcing costs were being considered separately; now merged into revenue model of the Portland choice. Trail: 'R12 absorbed into R5 [date]: ingredient sourcing impact now part of per-cupcake margin calculation'

example from file

R5 references: → cost-per-cupcake.md → revenue-model.md. R12 historically separate, now cross-referenced with decision rationale.

Warning posted — New data hits a trigger → flag appears → decision reopens with full context.

content

Survey data arrives: 'Portland weekend foot traffic dropping.' System detects: contradicts assumption (foot traffic stable), triggers decision reopening. Warning flag: 'Decision #R5 (Portland vs Austin) — trigger fired: Portland foot traffic data trending.' Full context loaded: original rationale, evidence, competing options.

example from file

staleness.md flags R5. Briefing alerts: 'Portland decision trigger: new foot traffic data. Recommend review.' Full decision context auto-loaded for re-evaluation.

Decision #R5 is live. It has triggers. When a trigger fires, the decision reopens with full original context automatically loaded. No guessing. No re-deriving. The rationale is preserved forever.

This is version control for strategic thinking. Every decision gets a number, a rationale, and a “what would change this” trigger. Decisions absorb other decisions as thinking crystallizes. You never lose “why did we decide that?” The elf can always trace a plan back to the decision that justified it.
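To make the "what would change this" idea concrete, here is a minimal sketch of a decision record with triggers, using the four R5 triggers listed above. The `Decision` and `Trigger` classes, the metric names, and the "reopened" status value are assumptions for illustration. In Workshop itself these records live in Markdown and YAML files, not Python objects.

```python
import operator
from dataclasses import dataclass, field

OPS = {"<": operator.lt, ">": operator.gt}


@dataclass
class Trigger:
    metric: str        # hypothetical metric name
    op: str            # "<" or ">"
    threshold: float

    def fired(self, observations: dict[str, float]) -> bool:
        value = observations.get(self.metric)
        return value is not None and OPS[self.op](value, self.threshold)


@dataclass
class Decision:
    id: str
    title: str
    status: str
    rationale: str
    triggers: list[Trigger] = field(default_factory=list)

    def review(self, observations: dict[str, float]) -> list[Trigger]:
        """Reopen the decision if any trigger fires; return the fired triggers."""
        fired = [t for t in self.triggers if t.fired(observations)]
        if fired:
            self.status = "reopened"
        return fired


r5 = Decision(
    id="R5",
    title="Portland vs. Austin",
    status="decided",
    rationale="Foot traffic growth justifies higher rent given current market.",
    triggers=[
        Trigger("portland_foot_traffic_per_day", "<", 300),
        Trigger("austin_rent_per_month", "<", 2800),
        Trigger("loan_approval_delay_days", ">", 60),
        Trigger("ingredient_cost_increase_pct", ">", 15),
    ],
)

fired = r5.review({"portland_foot_traffic_per_day": 280})
print(r5.status, [t.metric for t in fired])
```

Because the rationale travels with the record, whoever reopens the decision sees the original reasoning next to the trigger that fired.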

Now what about managing key assumptions and data? Often one new number from an interview touches many things: unit economics, the scaling playbook, and the investor narrative. When it changes, none of those files know. Normally people try to update the key assumptions by hand in the financial model or the slides. But you're likely to miss things, and you have to rely on your memory of what happened and why to make sense of contradicting information later on. We can do better than that.

One number changes. The workshop traces every consequence.

research/

Survey data

Customer survey shows: 'Portland weekend foot traffic dropping 20%'

new data

Foot traffic trend: weekend traffic dropping 20% (NEW DATA; previous assumption invalid)

replaces

Foot traffic trend: stable year-round (assumption)

analysis/

Assumptions register

A7: 'Portland foot traffic stable' — contradicted by new data

before

A7: 'Portland foot traffic stable year-round'

flagged stale

A7: STALE — new survey contradicts. Weekend foot traffic dropping.

updated

A7 revised: 'Portland foot traffic dropping ~20% in summer weekends.' Evidence: customer survey (2024-03-15).

depends on

depends on: research/2024-03-15-portland-foot-traffic-survey.md

analysis/

Cost-per-cupcake model

Assumes stable foot traffic for daily volume projections

before

Revenue per cupcake = price - ingredients - (labor / daily_volume × foot_traffic_A7)

flagged stale

Revenue per cupcake: STALE — uses A7 (foot traffic stable), now contradicted

updated

Revenue per cupcake recalculated with revised A7. Daily volume down 8% from original model.

depends on

depends on: assumptions-register.md (A7)

analysis/

Revenue model

Monthly profit calculation uses cost-per-cupcake

before

Monthly profit = SUMIF(per_cupcake_revenue_stable)

flagged stale

Monthly profit: STALE — cascades from cost_per_cupcake staleness

updated

Monthly profit recalculated: $2,100 (was $2,400). Break-even pushed to month 9 (was 8).

depends on

depends on: cost-per-cupcake.md

decisions/

Portland vs. Austin

Decision #R5 — recommends Portland

before

Decision #R5: Portland recommended (monthly profit $2,400 > Austin $1,800)

flagged stale

Decision #R5: TRIGGER FIRED — stale downstream data. Recommend review.

updated

Decision #R5 reopened: Portland monthly profit now $2,100 vs Austin $1,800. Gap narrowed. Recommend re-evaluation.

depends on

depends on: revenue-model.md

staleness propagation

One new data point enters at the top. The system traces every file that depends on the old assumption and flags them all as stale.

Without this, you'd be making a Portland vs. Austin decision based on data you don't know is wrong.

Workshop tracks the things that flow through files, not just the files themselves. A finding gets extracted, confidence-rated, cross-referenced, and tracked across every file it touches. When something upstream changes, everything downstream gets flagged. Staleness tracking ensures the elf never relies on old information without knowing it's old — structured doubt. Warning signs appear automatically.
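Staleness propagation is, at heart, a walk over the dependency map. The sketch below uses the "depends on" chain from the Portland example and a breadth-first search to find everything downstream of a changed file. The paths, the dictionary format, and the `stale_after_change` helper are assumptions. The post does not say how Workshop stores its dependency map.

```python
from collections import deque

# "depends on" edges from the Portland example; paths are illustrative.
DEPENDS_ON = {
    "analysis/assumptions-register.md": ["research/2024-03-15-portland-foot-traffic-survey.md"],
    "analysis/cost-per-cupcake.md": ["analysis/assumptions-register.md"],
    "analysis/revenue-model.md": ["analysis/cost-per-cupcake.md"],
    "decisions/R5": ["analysis/revenue-model.md"],
}


def stale_after_change(changed: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return every file downstream of `changed`, nearest dependents first."""
    # Invert the map: for each upstream file, which files rely on it?
    dependents: dict[str, list[str]] = {}
    for name, upstreams in depends_on.items():
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(name)

    stale, seen, queue = [], {changed}, deque([changed])
    while queue:
        current = queue.popleft()
        for name in dependents.get(current, []):
            if name not in seen:  # guard against cycles and repeat visits
                seen.add(name)
                stale.append(name)
                queue.append(name)
    return stale


flagged = stale_after_change(
    "research/2024-03-15-portland-foot-traffic-survey.md", DEPENDS_ON
)
print(flagged)
```

The frozen research file itself is never marked stale. Only the living analysis and decisions that cite it are.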

By the way — this doesn't even get into managing context. Serious analysis requires loading massive context, cross-referencing across dozens of files, and producing outputs that cite their sources and slot into the broader knowledge base. You can't just hand the elf the entire library and say “read everything.”

With the right posters (file maps, dependencies, decisions), you tell the elf what to do, not how to do it. It decides which books to pull from the library and takes only what it needs. Further, we can easily implement token-budget-aware context loading. Every substantial file has a ~30% distilled version: lossy compression designed for an AI reader, not a human summary. The elf loads the distilled version by default and pulls the full file when it needs depth.
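One way to picture token-budget-aware loading is as a simple planner: distilled by default, full only where depth is requested and the budget allows. This is a sketch under assumptions. The `Doc` and `plan_context` names, the token counts, and the fallback rule (drop back to the distilled version when the full file does not fit) are illustrative, not Workshop's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    full_tokens: int
    distilled_tokens: int  # roughly 30% of the full size


def plan_context(docs: list[Doc], budget: int, need_depth: frozenset = frozenset()):
    """Decide per file whether to load it full, distilled, or not at all."""
    plan, remaining = {}, budget
    for doc in docs:
        wants_full = doc.name in need_depth
        cost = doc.full_tokens if wants_full else doc.distilled_tokens
        if wants_full and cost > remaining:
            # Full version does not fit: fall back to the distilled one.
            wants_full, cost = False, doc.distilled_tokens
        if cost > remaining:
            plan[doc.name] = "skipped"
            continue
        plan[doc.name] = "full" if wants_full else "distilled"
        remaining -= cost
    return plan, remaining


docs = [
    Doc("market-research.md", 10_000, 3_000),        # token counts are made up
    Doc("financial-model-notes.md", 12_000, 3_600),
]
plan, left = plan_context(docs, budget=20_000,
                          need_depth=frozenset({"financial-model-notes.md"}))
print(plan, left)
```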

Without structure, raw files eat your context window before the agent has a map. With tiered loading, a tiny fraction of context gives full orientation.

What loads in the first 30 seconds determines whether the agent can work — or needs to ask you ten questions first.

Numbers below are illustrative.

WITHOUT WORKSHOP

54% of context used

46% free for work

market-research.md: 15%

competitor-analysis.md: 12%

financial-model-notes.md: 18%

meeting-notes-march.md: 9%

No map. No history. No decisions. Agent spends the first 10+ minutes asking you questions.

WITH WORKSHOP

9% of context used

91% free for work

config.md (orientation): 2%

session-log.md (continuity): 1.5%

decisions-key.md (judgment history): 3%

daily-briefing.md (current state): 1%

identity.md (who you are): 0.8%

preferences.md (how you work): 0.7%

Fully oriented. Knows the project, past decisions, current state, your preferences. Working by minute 1.

How it stays small: research and analysis files load on demand, not by default. Anything over 50KB gets a distilled version — decisions and key numbers preserved, reasoning chains stripped. The agent pulls depth when it needs it.

Conclusion

Workshop works, but it's far from perfect; it's very much ‘design fiction’. If you have better ideas for how to build this system, get in touch.

Workshop v1.0 is available on GitHub — go download it if you're interested in running this yourself.