Building a Workshop, Part 1: Smarter Elves Don't Fix a Broken Workshop

The agent ecosystem has solved autonomous action for coders. For knowledge workers, nobody's built the equivalent — the layer where an agent actually maintains your knowledge base as a living system. This is about why that gap matters more than it looks, and the prototype I built to start filling it.

1. ben complains

Since January, I've spent most of my working day in Claude Cowork. There are a few main things I do with it:

- Do collaborative research on product ideas and markets: generally a combination of discussing a problem, setting off deep research agents to think about it, and then coming back to iterate
- Run large-scale analysis work (e.g., segment these target customers, analyse them, and write custom emails)
- Prototype designs for websites and similar
- Build research-based products (blog coming soon)

I use Cowork when I'm thinking about a project, not a one-off problem, and obviously when the project isn't coding-focused (though Cowork can do ad-hoc coding). Cowork beats claude.ai because it has better skills (for things like PowerPoint, Excel etc.) and, more importantly, you can give it access to a local file base. The file base was by far the most important thing to me when I started working in it, because it seemed to address my central irritation when power-using claude.ai: that I kept having to upload the same files time and time again. I could just say, go look in this file, and read stuff. Much better.

After power-using Cowork for a few months, though, I was still frustrated. The reason is very simple and very selfish: just as constantly uploading files to claude.ai felt stupid, I was still doing a vast number of tasks that were annoying friction, like explaining the situation, picking files to read, getting files into the right structure etc.

Let's be clear. When I use Cowork there are really three things I want to do: think, plan, and instruct the agent on next steps. Literally everything else I have to do is either wasted time and effort (manually giving the agent context) or work that agents would obviously be better at (managing my file system, doing my own research and analysis etc.).

The problem compounded as my projects got larger. I spent ~a month deep researching a new industry (let's say Vegan Bakeries), primarily using Deep Research, and quickly got to 30+ research reports analyzing the industry in different ways and building on each other. The result was a gargantuan mess of information that, as new files got added, was often out of date, contained bad decisions, and would completely kill the context of any agent that tried to work on it, so I had to tell it to only look at certain files at a time.

This last bit was a small symptom of a larger problem. Despite Cowork having access to all my files, it still didn't know the project, or know me. It knew my name and job, but it didn't have the deep reservoir of understanding that a true coworker would have. This brought me to the belief that the industry's current concept of ‘memory’ is woefully unambitious. In the industry, memory is generally something you gather, summarize, and inject into the prompt in every situation. That means it's definitionally limited in scope to generalized info useful for every conversation, and it is forced to be small to not kill context.

But memory isn't just knowing your name, it's knowing everything — everything about your work projects, all your contacts, your style preferences etc., and to do that the agent needs to see everything (everything), store it cleanly and access it as needed, somehow without killing the context window. Practically, this means it needs to be able to access the correct information on demand during conversations in bite-sized chunks, but we'll get there in a second.

So a month in, I'm frustrated, and have loosely identified the problem set:

- Doing things manually is pissing me off
- As I power-use the app, the larger projects are becoming impossible to manage, with vast internal problems, and no way for Cowork to manage them in context
- It feels dumb and doesn't get stuff because its memory is unbelievably shallow

Basically — Cowork doesn't feel like magic and AI should always feel like magic.

2. why this matters

Ok, but is this actually that big a deal in the grand scheme of things, or is this Ben ranting and raving?

I'd say it's extremely important. The way I work — basically talking to agents all day — is going to become the ubiquitous form of knowledge work in the next few years. The industry has basically solved agents ‘doing things’ and so we will become managers at increasingly abstract levels. There's an open question then — how is this serious thinking-based knowledge work actually going to work in the age of agents? What should it feel like? What would the product look like?

As far as I can tell — there isn't a good answer yet. Partly this is because engineers build for coders. The messy thinking of knowledge work just isn't as prevalent in coding because code bases are hyper structured, rigorously maintained and documented as part of the work. But for knowledge work files aren't the product — decisions are. So the file systems — the knowledge bases — almost invariably look like your average person's Downloads folder — random file dumps. AI tools are increasingly sophisticated, but they've stalled on optimizing within set project systems (because that's where coding mostly needs help). Knowledge work needs something different — a more sophisticated environment to optimize for thinking and discussion where the agent itself operates the workspace.

So where is everyone else on this? The agent ecosystem is almost entirely focused on action. Gstack builds 18-specialist engineering teams. Gastown runs 20-30 parallel coding agents. Manus does sandboxed task execution. These are impressive, but they're all optimized for doing things — shipping code, running workflows, executing tasks. For code, the industry has marched steadily from chat (ChatGPT) to embedded (Copilot) to project-level (Cursor / Claude Code) to the beginnings of operating workspaces. For knowledge work, we're stuck. We have chat (claude.ai). We have embedded (M365 Copilot, Cursor-for-X). We have early project-level tools (Cowork). But the operating workspace column — where the agent actually maintains the system — is empty.

The model is already smart. But every time you use it without a persistent environment, costs stack up — some tax your effort, others degrade the output. Here's what you're paying.

[Interactive chart: the costs of working without a persistent environment, split into friction (your effort) and quality loss (output degradation).]

Smarter models raise the capability line. Better environments close the gap beneath it. Everyone is building smarter agents. The bigger unlock is removing the friction that wastes what they already have.

Where the AI lives determines what it can do. Depth increases from chat to operating workspace.

code
- chat: ChatGPT ("fix this function")
- embedded in file: Copilot (line completions)
- working in project: Cursor / Claude Code (file + project context)
- operating workspace: Workshop (barely)

knowledge
- chat: Claude.ai / ChatGPT (upload & explain)
- embedded in file: M365 Copilot / Cursor-for-X (AI-in-the-doc)
- working in project: Cowork (project-level, early)
- operating workspace: Workshop (maintains the system; this post)

Code filled in left to right. The workspace column is still empty.

Knowledge work has the embedded layer. But nobody jumped to workspace-level. Until now.

OpenClaw is the most interesting comparison because it's genuinely trying something new — a personal AI agent with persistent memory and multi-app access. It remembers your conversations, it can act across your tools, and it's the closest thing to the new paradigm people are excited about. But it's an interaction layer, not an organization layer. It doesn't manage your knowledge base as a living system — it doesn't track what's stale, propagate changes across files when an assumption shifts, maintain decision history with rationale and trigger conditions, or autonomously restructure your project as it evolves. It's a very good memory-enabled assistant. What I'm describing is something different — a system where the agent doesn't just remember, it maintains.

As a (bad) metaphor — let's say agents are elves in Santa's workshop, and you need a new elf to use the workshop to make a certain present.

Every time you start a session, a new elf walks into Santa's workshop (here being the folder you've given it in Cowork, or including whatever files you've given the chatbot). Brilliant, fast, tireless (and increasingly so) — but with zero memory of yesterday's shift. It has never seen this workshop before. What does it find?

In almost 100% of cases right now, the answer is a pile of unlabeled files on the floor. This means you, the user, have to spend the first 20 minutes explaining where everything is and then constantly redirect the elf during work. If instead the answer is a well-designed workshop (clear instructions on the wall, an organized library, warning signs from previous shifts, an intake desk for new materials), the elf sees the room, reads the instructions and gets to work.

Basically — AI research, engineering and product focus has almost entirely gone to smarter elves and more machines to do things, but a smarter elf doesn't fix the fact that your Workshop is a disaster. We could build a great Workshop — one that empowers the elf to build as autonomously and effectively as possible.

And the reason this matters so much is that it changes what a single person can do. Knowledge work has never been scalable. Not really. The core bottleneck has always been one person's ability to hold complexity in their head — how many decisions, assumptions, and interdependencies can you track before things fall through the cracks? Your biological RAM fills up fast.

When the agent handles all the process and detail overhead — decisions, assumptions, data tracking, cross-referencing, staleness — your working memory gets freed up entirely. What's left is pure thinking, deciding, and instructing, and because your head isn't saturated with tracking overhead you can actually get into flow on the hard problems. One person with a well-built Workshop can maintain analytical depth across a body of work that would previously have required a team, because the system holds the state and the person supplies the judgment.

Now scale that up. Right now there's friction not just within one person's work, but across people, teams, and departments. Critical information is buried three degrees of separation away in tools and files you don't even know exist. Now imagine deploying agents into this — elves running around a warehouse with no instructions on the wall, no filing system, no shared decisions log. This is how you get really bad results really quickly. Not because the agents are dumb, but because agents are only as good as the context and direction they receive.

This is actually a better framing for how we should think about agent management going forward. The industry is obsessed with evals and guardrails at the agent level: checking individual outputs, constraining individual actions. But agents are good at doing what they're told. The problem is the environment they're operating in. Given how powerful the models now are, we should be doing evals and management at the system level: reviewing the config files, the decision logs, the dependency graphs, the session histories. This is where errors will live and where iterations and improvements will be made, just as a post-mortem in an organization is generally more about the system than one person's mistake. A well-built Workshop doesn't need guardrails on every action. The guardrails live within the constitutional instruction files and will be iterated on in those files.

When agents manage and connect information across an organization — maintaining living knowledge infrastructure at team and company scale — you get real-time access to the full analytical depth of the entire organization. An agent that can surface how a supply chain change affects your forecast, because it maintains connections between procurement data, financial models, and strategic plans that no human team could hold simultaneously. Every person gets access to the full thinking power of the whole organization's information, not just what they personally know or can find.

That's the vision. Not AI that does your work — AI that maintains your institutional knowledge so well that the quality of every decision, at every level, step-changes upward. And right now, nobody's building it.

3. i made workshop

So I built it. Or at least — a prototype of it.

Workshop is my open-source implementation of the operating workspace paradigm, and you can find it here. The basic vision is that it's a set of skill files that give Cowork the following (and more):

Direction
- Overall config for how to work in the system
- Instructions for maintenance and update of files & system
File Orientation
- File maps + dependency graphs
- Distilled versions of files
- File staleness trackers
Memory
- Much more sophisticated view of the user
- Record of what happened in previous sessions & decisions made
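To make the File Orientation items a bit more concrete, here is a minimal sketch of what one file-map entry and its staleness check could look like. Workshop itself keeps this information in markdown files, so the Python below is only an illustration of the logic; the class, the field names, the file paths, the dates, and the 30-day freshness window are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileEntry:
    """One row of a hypothetical file map (all field names are illustrative)."""
    path: str
    summary: str          # distilled one-line version of the file
    last_verified: date   # when the contents were last checked
    max_age_days: int     # assumed freshness window before the file is stale
    depends_on: list[str] = field(default_factory=list)

    def days_stale(self, today: date) -> int:
        """Days past the freshness window; 0 means still fresh."""
        age = (today - self.last_verified).days
        return max(0, age - self.max_age_days)


# Made-up entries for the bakery project used as the running example.
file_map = [
    FileEntry("research/ingredient-costs.md", "Per-unit ingredient prices",
              date(2025, 3, 1), max_age_days=30),
    FileEntry("models/cost-per-cupcake.md", "Unit cost model",
              date(2025, 4, 10), max_age_days=30,
              depends_on=["research/ingredient-costs.md"]),
]

today = date(2025, 4, 15)
stale = {e.path: e.days_stale(today) for e in file_map if e.days_stale(today) > 0}
print(stale)  # {'research/ingredient-costs.md': 15}
```

The point of the sketch is that staleness is something the agent can compute from metadata rather than something you have to remember.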

What does this look like in practice? Things I used to spend time on, I now just tell Cowork to go do, and it completes them fully autonomously. It's fucking great. Some examples below:

Problem: Files accumulate but don't compound
- Solution: Autonomous file management
- What it does: Classifies, routes, registers dependencies, flags staleness
- Example: Survey results arrive in chat. Classified as research, routed to Portland project, dependencies mapped, stale assumptions flagged downstream.

Problem: Every session starts from zero
- Solution: Self-orienting agent
- What it does: 10-step startup. The agent reads the room and briefs you
- Example: New session. The agent has never seen this workspace. Ten steps later: 'Portland location decision open. Ingredient costs 15 days stale. Loan approval in 47 days.' You said nothing.

Problem: Information goes stale silently
- Solution: Structured doubt
- What it does: Staleness auto-propagates. Warning signs appear before you act
- Example: New survey shows weekend foot traffic actually dropping in Portland. Cost-per-cupcake model, revenue model, and Portland decision all flagged stale. You see the warning before you act on old numbers.

Problem: Decisions lose their rationale
- Solution: Version-controlled decisions
- What it does: Numbered, with rationale and 'what would change this' triggers
- Example: Three months later, Portland's weekend foot traffic drops below 300/day. Trigger fires. Decision #R5 reopens with original rationale, evidence, and competing options intact.

Problem: You are the integration layer
- Solution: Dependency tracking
- What it does: Upstream changes flag every downstream file automatically
- Example: Rent projection changes from $2,500 to $3,200. System traces three downstream files (cost-per-cupcake, revenue model, Portland decision) and flags all three.
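The rent-projection example is essentially a graph walk, and a small sketch may help show why it is cheap for an agent to do and tedious for a human. This is not how Workshop is implemented (it works from markdown file maps and English-language protocols); the file names, the shape of the `downstream` mapping, and the `flag_downstream` function are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical dependency edges: file -> files that read from it.
downstream = {
    "rent-projection.md": ["cost-per-cupcake.md"],
    "cost-per-cupcake.md": ["revenue-model.md"],
    "revenue-model.md": ["portland-decision.md"],
    "portland-decision.md": [],
}


def flag_downstream(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every file reachable from `changed`, in breadth-first order."""
    flagged: list[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, []):
            if dependent not in seen:  # guard against revisiting shared or cyclic nodes
                seen.add(dependent)
                flagged.append(dependent)
                queue.append(dependent)
    return flagged


flagged = flag_downstream("rent-projection.md", downstream)
print(flagged)  # ['cost-per-cupcake.md', 'revenue-model.md', 'portland-decision.md']
```

Once the dependencies are written down anywhere the agent can read them, flagging all three downstream files is mechanical.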

What zero friction looks like in practice: the before and after, and the steps that disappeared.

- Before: You onboard the agent (7 steps · 15-20 min). After: The agent onboards itself (1 step · < 1 min).
- Before: You are the integration layer (8 steps · 1-2 hours). After: The system handles integration (2 steps · 5 min).
- Before: Decisions lose their rationale (6 steps · recurring cost). After: Decisions are version-controlled (1 step · automatic).

In each case: you told the agent what to do, not how. The environment gave it everything it needed to figure out the how.

* Time estimates are illustrative, based on the author's experience. Actual times will vary.
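For the version-controlled decisions case, the idea of a decision that carries its own 'what would change this' trigger can be sketched as a small record. As before, this is a hedged illustration and not Workshop's actual format, which is markdown: the `Decision` class, its fields, the metric name, and the placeholder choice and rationale text are all hypothetical. Only the ID (R5) and the 300/day threshold come from the example earlier in this post.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """A hypothetical decision record; field names are illustrative."""
    id: str
    choice: str
    rationale: str
    trigger_description: str
    trigger: Callable[[dict[str, float]], bool]  # returns True when the decision should reopen
    status: str = "closed"

    def review(self, metrics: dict[str, float]) -> bool:
        """Reopen the decision if it is closed and its trigger condition is met."""
        if self.status == "closed" and self.trigger(metrics):
            self.status = "reopened"
            return True
        return False


r5 = Decision(
    id="R5",
    choice="Portland location (placeholder wording)",
    rationale="Placeholder: weekend foot traffic supports the revenue model",
    trigger_description="Weekend foot traffic drops below 300/day",
    trigger=lambda m: m["weekend_foot_traffic_per_day"] < 300,
)

first = r5.review({"weekend_foot_traffic_per_day": 340})   # above threshold, stays closed
second = r5.review({"weekend_foot_traffic_per_day": 280})  # below threshold, reopens
print(first, second, r5.status)  # False True reopened
```

Because the rationale travels with the record, reopening the decision brings the original reasoning back with it instead of forcing you to reconstruct it.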

I've written another blog that deep dives on how Workshop functions — go read that here. And I've written a separate blog on the design principles and theory behind building systems like this — read that here.

4. how to think about building this

Building Workshop taught me a lot about what this new paradigm requires. I won't go deep here — that's what the design principles blog is for — but the core beliefs that emerged are worth laying out briefly.

Build for agents, not yourself. This is the fundamental inversion. The entire tradition of personal knowledge management — Notion, Obsidian, PARA, Zettelkasten — assumes you'll maintain the system. That's insane in the new world. Systems built for humans decay because humans stop maintaining them. If instead every file, every protocol, every piece of metadata is designed for the agent to read, navigate, and maintain, the system maintains itself. This requires empathy for the agent, and I mean that literally — you need to think about what it needs to do great work.

Use language, not code. Workshop runs on markdown files and English-language protocols. No databases, no middleware. The work it manages — thinking, analysis, strategic debate — isn't deterministic, so the system can't be deterministic either. English-language protocols give the agent intent and let its intelligence figure out execution. Code would be a straitjacket. And when the system needs to evolve, you edit a markdown file. The agent reads the new instructions next session and behaves differently.

Write constitutions, not prompts. To actually hand over control, you need more than good prompts or even good agreements scoped to categories of work. You need a constitutional document — what the agent decides without asking, what needs your sign-off, how conflicts get resolved, how institutional memory persists, how the system maintains and improves itself. Anthropic runs Claude under a constitution. My config file in .agent is a tiny analogous version.

Knowledge bases should work like codebases. Codebases have decades of accumulated discipline — file hierarchies, dependency management, version control, READMEs, blame history. An agent can walk into a codebase, read the structure, and start working. Knowledge bases are almost always just file dumps. The gap is enormous, and closing it is what makes the Workshop actually function.

Each of these deserves a longer treatment, and I've written that here.

5. conclusion

To clarify things ahead of time — I think of Workshop as a prototype — maybe even ‘design fiction’, a concept introduced to me in Maggie Appleton's response to Gastown. It's a piece of software that no one really thinks is literally the future product spec, but which might gesture inefficiently in that direction. In creating Workshop (& writing this blog!) I thought a lot about this problem and I am a million percent sure someone else can do a better implementation of this paradigm. Hopefully though it's helpful and interesting as a step forward in the right direction.

Overall I really just want Cowork and other tools to do this natively, so that I can have my fully autonomous assistant that works like magic, embedded in the orchestration layer that makes it happen. As advice: building smarter agents is great, but while we're building God and deploying agents autonomously, we as an industry should also be spending time building environments worthy of how smart they already are (which would also give a better product experience).

What makes a good environment will change, particularly as the agents get smarter. You might not need distilled files when context windows hit 10 million tokens. You might not need file maps when agents can hold an entire knowledge base in memory. But the design discipline — empathy for non-human collaborators, constitutional governance, agent-maintained environments — only becomes more important as agents get more capable because we are in parallel giving them more control and autonomy anyway. A more capable agent in a poorly designed environment can do much more damage much faster.

More selfishly, better environments mean more autonomous agents, which means you can launch and manage more at once, because they work to your specifications better without needing the handholding. I can run several Cowork instances like engineers run several Code instances, which is much faster and more effective than otherwise.

Regardless, this has been a fun project. If you want to look into my implementation of Workshop, read here or visit the GitHub here. And if this sounds fun and you want to chat, feel free to get in touch! I'm always interested in meeting new people :)