
Building a Workshop, Part 3: The System in Practice

Ok, so having gotten all philosophical in the previous post, let's talk about my implementation of Workshop.

For context, this is the analogy to work with:

As a last (bad) metaphor here, let's say agents are elves in Santa's workshop, and you need a new elf to use the workshop to make a certain present.

Every time you start a session, a new elf walks into Santa's workshop (here being the folder you've given it in cowork, or including whatever files you've given the chatbot). Brilliant, fast, tireless (and increasingly so) — but with zero memory of yesterday's shift. It has never seen this workshop before. What does it find?

In almost 100% of cases right now, the answer is a pile of unlabeled files on the floor. This means you the user have to spend the first 20 minutes explaining where everything is & then constantly redirect the elf during work. If instead, the answer is a well-designed workshop — clear instructions on the wall, an organized library, warning signs from previous shifts, an intake desk for new materials — the elf sees the room, reads the instructions and gets to work.

Basically — AI research, engineering and product focus has almost entirely gone to smarter elves and more machines to do things, but a smarter elf doesn't fix the fact that your Workshop is a disaster. But we could build a great workshop — a Workshop that empowers the elf to build as autonomously and effectively as possible.

Ok. Let's talk implementation. Here's the layout:

Workshop operates as a set of interconnected folders of Markdown (MD) files. This is what they do:

  • .agent (including CLAUDE.md and config.md) is the operating system. It tells the elf how to operate in this Workshop.
  • .memory tells the elf about the user — identity, preferences, patterns, contacts, timeline, expertise.
  • _system manages operational infrastructure — subdirectories are inbox/, coordination/, integrations/, and storage/.
  • projects and code are the workbenches for individual workstreams.

In this Workshop, the agent is actively onboarded to the workspace's norms, file system and expectations by reading config.md first (which leads it to the other key files). It has everything it and you need right from the beginning, and it also knows how to manage the system autonomously.
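To make the layout concrete, here is a minimal sketch that scaffolds the top-level folders listed above. The folder and file names come from this post; the `scaffold` function, the `LAYOUT` table and the placeholder file contents are my own assumptions for illustration, not the actual Workshop installer.

```python
from pathlib import Path

# Top-level layout as described in the post. The placeholder file contents
# and the function name are assumptions made for this sketch.
LAYOUT = {
    ".agent": ["CLAUDE.md", "config.md"],
    ".memory": ["identity.md", "preferences.md", "patterns.md",
                "contacts.md", "timeline.md", "expertise.md"],
    "_system/inbox": [],
    "_system/coordination": [],
    "_system/integrations": [],
    "_system/storage": [],
    "projects": [],
    "code": [],
}


def scaffold(root: Path) -> list[Path]:
    """Create the folders and empty placeholder files; return the files created."""
    created = []
    for folder, files in LAYOUT.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            path = directory / name
            if not path.exists():  # never overwrite an existing file
                path.write_text(f"# {name}\n", encoding="utf-8")
                created.append(path)
    return created
```

Running `scaffold(Path("my-workshop"))` twice is safe: the second call creates nothing, because existing files are left alone.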

The elf walks in and reads the room

Each session, a new agent enters this workshop cold. Everything below is what it finds.

  • door: user chat
  • intake desk: inbox/ (triage), todo (tasks), integrations/ (external)
  • instruction board (.agent/): CLAUDE.md; config.md (reads first); session-log.md (shift history); scratchpad.md (working memory); stream-registry.md (project index); decisions/ (decision register); staleness.md (freshness tracker); reference/ (procedure manual)
  • personnel file (.memory/): identity.md (who you are); preferences.md (how you work); patterns.md (observations); contacts.md (your people); timeline.md (deadlines); expertise.md (knowledge depth)
  • workbenches (projects/ & code/): 12 templates, e.g. strategic-planning with research/ (evidence), analysis/ (interpretation), options/ (debates), plan/ (committed); plus product-build, research, operations, code-greenfield, ...; each workbench has its own .agent/ (per-workbench config, index, log, decisions)

The user chats with the elf at the front counter. The elf reads the instruction board and personnel file for context → works at the workbenches → leaves the workshop better than it found it. Async files arrive at the intake desk.

Zero memory. Zero context. Ten steps later, the elf knows everything.

1. Read config.md. Master protocol: what to do, in what order.
   reads: config.md (full protocol, 3K tokens)
   learns: Session structure, how to read the room, when to ask for help
   file snippet: Session start protocol: [10-step sequence for reading files in order, context gathering, orientation]

2. Load user profile. Identity, preferences, patterns, contacts, timeline.
   reads: identity.md, preferences.md, patterns.md, contacts.md
   learns: Who the user is, how they work, their contacts and commitments
   file snippet: identity.md: Role=Head of Operations. Org=vegan bakery. Priorities=[menu expansion, cost per cupcake, waste reduction]

3. Load expertise & staleness. What topics the user knows deeply; what's aged out.
   reads: expertise.md, staleness.md
   learns: Don't explain what the user already knows; flag anything critically stale
   file snippet: expertise.md: deep familiarity with unit economics, vegan supply chains. staleness.md: ingredient costs 15d (Aging), rent survey 0d (Fresh)

4. Read session log. What happened last time, unresolved items.
   reads: session-log.md (last 5 entries in full)
   learns: Continuity from last session, what was in progress, what to resume
   file snippet: Last session (8d ago): Portland vs. Austin decision open. Ingredient costs research flagged. Loan approval countdown active.

5. Scan projects. Stream registry: active projects, status, last activity.
   reads: stream-registry.md
   learns: What projects exist, which are active, which dormant, last touch dates
   file snippet: portland-location: strategic-planning, active, last touched 8d ago, status=decision-pending

6. Check timeline. Active milestones, countdowns, deadlines.
   reads: timeline.md
   learns: What's time-sensitive, what's approaching, what's overdue
   file snippet: Loan approval: 47d to target close. Menu expansion review: 3d overdue. Portland site visit: not yet scheduled.

7. Review decisions. Quick-reference index of active decisions.
   reads: decisions/decisions-key.md (~2K tokens)
   learns: What decisions are open, which are decided, which are aging
   file snippet: #R5: Portland vs. Austin. Status: open. Age: 12d. Triggers: [foot traffic <300/day, rent >$3.5k/month]

8. Check flags. Signals from other projects that need attention.
   reads: cross-project-flags.md
   learns: Dependencies, blockers, or data from other workstreams
   file snippet: FLAG [dependency]: austin-location depends on loan approval timing (from portland project)

9. Maintenance catch-up. Repair any gaps from previous session.
   reads: File integrity checks, missing references
   learns: What needs repair, consistency status
   file snippet: Checks: Decision #R5 still referenced by cost-per-cupcake.md? Yes. Staleness propagated to dependents? Yes.

10. Clear scratchpad. Fresh working memory for this session.
   reads: None (clearing old content)
   learns: Now ready to work with a blank slate
   file snippet: --- SESSION START: 2024-03-18 --- [Elf ready. Zero memory. Full context loaded from files.]

Ready: awaiting your first message. Context surfaces naturally as you speak.
User: “Let's continue with Portland.”
Agent: “The Portland decision is open. Ingredient costs are 15 days stale. Loan approval: 47 days to target close. Two inbox items pending. Where do you want to start?”

The elf was ready. You spoke first. Context surfaced naturally.
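The read order in the session-start protocol can be sketched as a small table plus a loader. This is a hedged illustration rather than the real config.md logic: the folder placement follows the layout described in this post where it is stated, the location of cross-project-flags.md is an assumption, and `load_context` and `clear_scratchpad` are hypothetical helper names. Step 9 (maintenance catch-up) is left out because the post does not spell out its checks.

```python
from pathlib import Path

# (step label, relative paths) in protocol order. The location of
# cross-project-flags.md is an assumption; the post does not state it.
SESSION_START = [
    ("Read config", [".agent/config.md"]),
    ("Load user profile", [".memory/identity.md", ".memory/preferences.md",
                           ".memory/patterns.md", ".memory/contacts.md"]),
    ("Load expertise & staleness", [".memory/expertise.md", ".agent/staleness.md"]),
    ("Read session log", [".agent/session-log.md"]),
    ("Scan projects", [".agent/stream-registry.md"]),
    ("Check timeline", [".memory/timeline.md"]),
    ("Review decisions", [".agent/decisions/decisions-key.md"]),
    ("Check flags", ["_system/coordination/cross-project-flags.md"]),
]


def load_context(root: Path) -> tuple[dict[str, str], list[str]]:
    """Read each file in protocol order; return (contents by path, missing paths)."""
    loaded, missing = {}, []
    for _label, paths in SESSION_START:
        for rel in paths:
            path = root / rel
            if path.is_file():
                loaded[rel] = path.read_text(encoding="utf-8")
            else:
                missing.append(rel)  # candidates for the maintenance catch-up step
    return loaded, missing


def clear_scratchpad(root: Path, date: str) -> None:
    """Step 10: start the session with a blank scratchpad."""
    scratchpad = root / ".agent" / "scratchpad.md"
    scratchpad.parent.mkdir(parents=True, exist_ok=True)
    scratchpad.write_text(f"--- SESSION START: {date} ---\n", encoding="utf-8")
```

Returning the missing paths instead of raising keeps the loader usable in a half-built workshop; the agent can report the gaps rather than stop.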

Even better, it can leave the workshop in a better state than it found it. End-of-shift protocols update the session log, flag incomplete work, and refresh the briefing board. One agent maintains the workshop for the next one.
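As a rough sketch of the end-of-shift step, the function below appends one entry to the session log and marks unfinished work. The entry format (a dated heading followed by `UNFINISHED` lines) and the name `close_shift` are assumptions for illustration; the post does not specify how session-log.md is formatted.

```python
from datetime import date
from pathlib import Path


def close_shift(log_path: Path, summary: str, unfinished: list[str], today: date) -> str:
    """Append a dated entry to the session log and return the text written."""
    lines = [f"## {today.isoformat()}", summary]
    # Assumed convention: one line per item the next agent should resume.
    lines += [f"- UNFINISHED: {item}" for item in unfinished]
    entry = "\n".join(lines) + "\n\n"
    with log_path.open("a", encoding="utf-8") as handle:  # append, never rewrite history
        handle.write(entry)
    return entry
```

Appending rather than rewriting means the next agent can still read the last several entries in full, as the session-start protocol expects.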

The result is that starting discussions looks very different in practice. I don't need to give context, and it can operate autonomously.

Ok, let's talk through some examples of where this is effective. The most basic example: integrating files or information into the system. Everyone has trouble finding information in a large knowledge base, and even more trouble knowing what's correct and up to date. Manual management is hard even at fewer than 10 files, and impossible at the hundreds or thousands of files per project that most people work with.

Here's how it should look instead. The Workshop has an intake desk. Users drop files in; the agent classifies them, routes them to the right workbench, registers dependencies, and flags what's now stale.

Information enters. The workshop decides where it belongs.

1. File arrives. Dropped in chat, emailed, or deposited in incoming/.
   detail: New file enters the system. Could be research, evidence, a thought, or context from outside.
   example: 2024-03-15-portland-foot-traffic-survey.md arrives in chat.

2. Classified. What type? Which project? Confidence score assigned.
   detail: The intake process analyzes: is this research, analysis, evidence, or metadata? Which project does it belong to?
   example: Type: research. Project: portland-location. Confidence: 92%.

3. Routed. High confidence → auto-route. Low → held for review.
   detail: Decision point: the agent uses judgment. High-confidence items route automatically to the project's research/ folder; ambiguous items halt for human review.
   example: High confidence. Auto-routed to projects/portland-location/research/.

4. Registered. Dependencies mapped. Added to project index.
   detail: System identifies: what other files depend on this? Is it research (frozen) or living analysis (traceable)?
   example: Identified dependencies: cost-per-cupcake.md references this. Marked as frozen evidence.

5. Stale items flagged. Downstream files that relied on older data get flagged.
   detail: If the new survey contradicts old assumptions, everything downstream becomes stale: revenue models, decisions, analyses.
   example: New foot traffic data contradicts A7 ('Portland foot traffic stable'). Flags: assumptions register, cost-per-cupcake, revenue model, decision #R5.

Research is frozen on deposit: evidence, never edited. New interpretation goes in new files that cite the originals. Routing rules learn from every auto-route and human override.
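The routing decision at the intake desk might look something like the sketch below. The 0.80 cutoff, the `needs-review` folder and the rule that the destination subfolder matches the classified type are all assumptions; the post only says that high-confidence items auto-route to the project's research/ folder and low-confidence items are held for review.

```python
from dataclasses import dataclass

AUTO_ROUTE_THRESHOLD = 0.80  # assumed cutoff; the post only says "high" vs "low"


@dataclass
class Classification:
    filename: str
    kind: str          # e.g. "research", "analysis", "evidence", "metadata"
    project: str
    confidence: float  # 0.0 to 1.0


def route(item: Classification, threshold: float = AUTO_ROUTE_THRESHOLD) -> str:
    """Return the destination path, or a review queue path for ambiguous items."""
    if item.confidence >= threshold:
        # Assumed mapping: the subfolder is named after the classified type.
        return f"projects/{item.project}/{item.kind}/{item.filename}"
    # Hypothetical holding area for human review.
    return f"_system/inbox/needs-review/{item.filename}"


survey = Classification(
    filename="2024-03-15-portland-foot-traffic-survey.md",
    kind="research",
    project="portland-location",
    confidence=0.92,
)
print(route(survey))
```

With the example from the walkthrough (confidence 92%), the survey lands in projects/portland-location/research/. Lowering the confidence below the cutoff sends it to the review queue instead.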

In Workshop, information enters via the intake desk, gets triaged and routed to the right workbench, and gets registered in the dependency map. From there the agent manages it: updating cross-references, flagging staleness, loading the right context into conversations, filing outputs from your discussions, running end-of-shift maintenance, and syncing to git. All of it is managed autonomously by the agent.

Let's take something else. You're talking to an agent about a problem and make a decision. What then — you just write it down and hope you remember the what and why, and then successfully waterfall it across your work? Not ideal. Here's a better way:

A decision is made. The workshop never forgets why. Trace decision #R5: Portland vs. Austin.

1. Created. Decision surfaces during discussion or analysis.
   content: We're opening a second vegan bakery location. Portland has higher foot traffic but less established vegan scene. Austin has lower rent but seasonal slowdown.
   example from file: During project kickoff, cost-per-cupcake analysis shows both cities viable but with different risk profiles.

2. Numbered & indexed. Assigned an ID (e.g. #R5). Added to quick-reference index.
   content: Decision #R5: Portland vs. Austin location selection
   example from file: decisions/decisions-key.md entry: R5 | open | Portland vs Austin | strategic choice, impact on revenue model and growth risk

3. Rationale recorded. Why this call was made, what evidence supports it, what was rejected.
   content: Portland rationale: foot traffic trending up (upside) but rent steep, revenue margin 28%, 8-month break-even. Austin rationale: lower rent (savings), revenue margin 24%, 12-month break-even but less traffic growth. Call: Portland chosen. Foot traffic growth justifies higher rent given current market.
   example from file: decisions/decisions-full.md entry: [full analysis, evidence cited, competing options considered, confidence level, dated signature]

4. Triggers set. 'What would change this?': conditions that reopen the decision.
   content: Triggers: [1] If Portland foot traffic drops below 300/day (market softening), [2] If Austin rent drops below $2.8k/month (better deal emerges), [3] If loan approval delayed beyond 60 days (changes financing), [4] If ingredient costs rise >15% (model breaks)
   example from file: triggers.yaml: [{id: portland-traffic, threshold: '<300/day', source: survey-data}, ...]

5. Absorbs related. As thinking crystallizes, related decisions merge. Trail preserved.
   content: Decision #R12 (wholesale café supply strategy) converges with #R5. Ingredient sourcing costs were being considered separately; now merged into revenue model of the Portland choice. Trail: 'R12 absorbed into R5 [date]: ingredient sourcing impact now part of per-cupcake margin calculation'
   example from file: R5 references: → cost-per-cupcake.md → revenue-model.md. R12 historically separate, now cross-referenced with decision rationale.

6. Warning posted. New data hits a trigger → flag appears → decision reopens with full context.
   content: Survey data arrives: 'Portland weekend foot traffic dropping.' System detects: contradicts assumption (foot traffic stable), triggers decision reopening. Warning flag: 'Decision #R5 (Portland vs Austin): trigger fired: Portland foot traffic data trending.' Full context loaded: original rationale, evidence, competing options.
   example from file: staleness.md flags R5. Briefing alerts: 'Portland decision trigger: new foot traffic data. Recommend review.' Full decision context auto-loaded for re-evaluation.

Decision #R5 is live. It has triggers. When a trigger fires, the decision reopens with full original context automatically loaded. No guessing. No re-deriving. The rationale is preserved forever.

This is version control for strategic thinking. Every decision gets a number, a rationale, a “what would change this” trigger. Decisions absorb other decisions as thinking crystallizes. You never lose “why did we decide that?” The elf can always trace a plan back to the decision that justified it.
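One way to picture a decision record with "what would change this" triggers is the sketch below. The four trigger conditions are taken from the #R5 walkthrough; the class names, the metric names and the `check` method are hypothetical, and the real system stores this in Markdown and YAML files rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trigger:
    id: str
    metric: str                     # name of the observed value to test (assumed names)
    fires: Callable[[float], bool]  # condition that reopens the decision


@dataclass
class Decision:
    id: str
    title: str
    status: str = "decided"
    rationale: str = ""
    triggers: list[Trigger] = field(default_factory=list)
    absorbed: list[str] = field(default_factory=list)  # trail of merged decisions

    def check(self, observations: dict[str, float]) -> list[str]:
        """Reopen the decision if any trigger fires; return the fired trigger ids."""
        fired = [t.id for t in self.triggers
                 if t.metric in observations and t.fires(observations[t.metric])]
        if fired:
            self.status = "reopened"
        return fired


r5 = Decision(
    id="R5",
    title="Portland vs. Austin",
    rationale="Foot traffic growth justifies higher rent given current market.",
    triggers=[
        Trigger("portland-traffic", "portland_foot_traffic_per_day", lambda v: v < 300),
        Trigger("austin-rent", "austin_rent_per_month", lambda v: v < 2800),
        Trigger("loan-delay", "loan_approval_delay_days", lambda v: v > 60),
        Trigger("ingredient-costs", "ingredient_cost_rise_pct", lambda v: v > 15),
    ],
    absorbed=["R12"],
)
```

Calling `r5.check({"portland_foot_traffic_per_day": 280})` reports the `portland-traffic` trigger and flips the status to `reopened`, while the rationale and the absorbed-decision trail stay attached to the record.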

Now what about managing key assumptions & data? Often one new number from an interview touches many things: unit economics, the scaling playbook, and the investor narrative. When it changes, none of those files know. Normally you try to update the key assumptions in your financial model or your slides yourself. But you're likely to miss things, and you have to rely on your memory of what happened and why to make sense of contradicting information later on. We can do better than that.

One number changes. The workshop traces every consequence.

research/: Survey data
Customer survey shows: 'Portland weekend foot traffic dropping 20%'
  • new data: Foot traffic trend: weekend traffic dropping 20% (NEW DATA; previous assumption invalid)
  • replaces: Foot traffic trend: stable year-round (assumption)

analysis/: Assumptions register
A7: 'Portland foot traffic stable', contradicted by new data
  • before: A7: 'Portland foot traffic stable year-round'
  • flagged stale: A7: STALE. New survey contradicts. Weekend foot traffic dropping.
  • updated: A7 revised: 'Portland foot traffic dropping ~20% on summer weekends.' Evidence: customer survey (2024-03-15).
  • depends on: research/2024-03-15-portland-foot-traffic-survey.md

analysis/: Cost-per-cupcake model
Assumes stable foot traffic for daily volume projections
  • before: Revenue per cupcake = price - ingredients - (labor / daily_volume × foot_traffic_A7)
  • flagged stale: Revenue per cupcake: STALE. Uses A7 (foot traffic stable), now contradicted.
  • updated: Revenue per cupcake recalculated with revised A7. Daily volume down 8% from original model.
  • depends on: assumptions-register.md (A7)

analysis/: Revenue model
Monthly profit calculation uses cost-per-cupcake
  • before: Monthly profit = SUMIF(per_cupcake_revenue_stable)
  • flagged stale: Monthly profit: STALE. Cascades from cost_per_cupcake staleness.
  • updated: Monthly profit recalculated: $2,100 (was $2,400). Break-even pushed to month 9 (was 8).
  • depends on: cost-per-cupcake.md

decisions/: Portland vs. Austin
Decision #R5 recommends Portland
  • before: Decision #R5: Portland recommended (monthly profit $2,400 > Austin $1,800)
  • flagged stale: Decision #R5: TRIGGER FIRED. Stale downstream data. Recommend review.
  • updated: Decision #R5 reopened: Portland monthly profit now $2,100 vs Austin $1,800. Gap narrowed. Recommend re-evaluation.
  • depends on: revenue-model.md

Staleness propagation: one new data point enters at the top. The system traces every file that depends on the old assumption and flags them all as stale.
Without this, you'd be making a Portland vs. Austin decision based on data you don't know is wrong.

Workshop tracks the things that flow through files, not just the files themselves. A finding gets extracted, confidence-rated, cross-referenced, and tracked across every file it touches. When something upstream changes, everything downstream gets flagged. Staleness tracking ensures the elf never relies on old information without knowing it's old — structured doubt. Warning signs appear automatically.
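The cascade can be modelled as a walk over a dependency graph. Below is a minimal sketch that uses the "depends on" relationships from the Portland example; the dictionary representation, the `decisions/R5` node name and the `propagate_staleness` function are assumptions for illustration, since the post does not say how the dependency map is stored.

```python
from collections import deque

# Edges point from a file to the files that depend on it, following the
# "depends on" relationships in the Portland example.
DEPENDENTS = {
    "research/2024-03-15-portland-foot-traffic-survey.md": ["analysis/assumptions-register.md"],
    "analysis/assumptions-register.md": ["analysis/cost-per-cupcake.md"],
    "analysis/cost-per-cupcake.md": ["analysis/revenue-model.md"],
    "analysis/revenue-model.md": ["decisions/R5"],
}


def propagate_staleness(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every downstream item to flag, nearest first."""
    stale, seen = [], {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # guards against cycles and double flags
                seen.add(child)
                stale.append(child)
                queue.append(child)
    return stale


for item in propagate_staleness("research/2024-03-15-portland-foot-traffic-survey.md", DEPENDENTS):
    print(f"STALE: {item}")
```

The changed file itself is not flagged, which matches the rule that research is frozen evidence; only the interpretation built on top of it is marked for review.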

By the way — this doesn't even get into managing context. Serious analysis requires loading massive context, cross-referencing across dozens of files, and producing outputs that cite their sources and slot into the broader knowledge base. You can't just hand the elf the entire library and say “read everything.”

With the right posters (file maps, dependencies, decisions), you tell the elf what to do, not how to do it. It decides which books to pull from the library and takes only what it needs. Further, we can easily implement token-budget-aware context loading. Every substantial file has a ~30% distilled version: lossy compression designed for an AI reader, not a human summary. The elf loads the distilled version by default and pulls full files when it needs depth.

Without structure, raw files eat your context window before the agent has a map. With tiered loading, a tiny fraction of context gives full orientation.
What loads in the first 30 seconds determines whether the agent can work — or needs to ask you ten questions first.
Numbers below are illustrative.
WITHOUT WORKSHOP: 54% of context used, 46% free for work

  • market-research.md: 15%
  • competitor-analysis.md: 12%
  • financial-model-notes.md: 18%
  • meeting-notes-march.md: 9%

No map. No history. No decisions. Agent spends the first 10+ minutes asking you questions.

WITH WORKSHOP: 9% of context used, 91% free for work

  • config.md (orientation): 2%
  • session-log.md (continuity): 1.5%
  • decisions-key.md (judgment history): 3%
  • daily-briefing.md (current state): 1%
  • identity.md (who you are): 0.8%
  • preferences.md (how you work): 0.7%

Fully oriented. Knows the project, past decisions, current state, your preferences. Working by minute 1.
How it stays small: research and analysis files load on demand, not by default. Anything over 50KB gets a distilled version — decisions and key numbers preserved, reasoning chains stripped. The agent pulls depth when it needs it.
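Token-budget-aware loading could be sketched as a simple greedy planner: walk the files in priority order, prefer the distilled version, and load the full file only when depth is requested or no distilled version exists. Everything here is an assumption for illustration (the `Doc` record, the `plan_context` function, the token counts); the post describes the behaviour but not the mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    name: str
    full_tokens: int
    distilled_tokens: Optional[int] = None  # None when no distilled version exists


def plan_context(docs: list[Doc], budget: int,
                 want_full: frozenset[str] = frozenset()) -> dict[str, str]:
    """Choose 'full', 'distilled', or 'skipped' per file, in priority order."""
    plan, remaining = {}, budget
    for doc in docs:
        options = []
        if doc.name in want_full or doc.distilled_tokens is None:
            options.append(("full", doc.full_tokens))
        if doc.distilled_tokens is not None:
            options.append(("distilled", doc.distilled_tokens))
        for mode, cost in options:
            if cost <= remaining:  # take the first option that still fits
                plan[doc.name] = mode
                remaining -= cost
                break
        else:
            plan[doc.name] = "skipped"  # nothing fits; load on demand later
    return plan


# Illustrative numbers only.
docs = [
    Doc("config.md", 3000),
    Doc("market-research.md", 20000, 6000),
    Doc("financial-model-notes.md", 25000, 7500),
]
print(plan_context(docs, budget=12000))
```

With a 12,000-token budget, this loads config.md in full and the distilled market research, and skips the financial notes until the agent asks for them. Passing `want_full=frozenset({"market-research.md"})` with a larger budget pulls the full file instead.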

Conclusion

Workshop works, but it's far from perfect; it's very much ‘design fiction’. If you have better ideas for how to build this system, get in touch.

Workshop v1.0 is available on GitHub — go download it if you're interested in running this yourself.