<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>(parentheticals)</title>
<description>Thoughts and musings on software engineering, data science, and technology. Views expressed here are my own and do not represent those of my employers or organizations.
</description>
    <link>http://thirteen37.github.io/</link>
    <atom:link href="http://thirteen37.github.io/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Fri, 10 Apr 2026 15:25:40 +0800</pubDate>
    <lastBuildDate>Fri, 10 Apr 2026 15:25:40 +0800</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Making Claude My Second Brain&apos;s Second Brain</title>
        <description>&lt;p&gt;There’s something satisfying about asking an AI a question about your own work and getting back an answer grounded in &lt;em&gt;your&lt;/em&gt; context. Not the internet’s context. Yours.&lt;/p&gt;

&lt;p&gt;It’s 8:45 AM. I have a product sync at 9, a 1:1 with a new team lead at 10, and a candidate interview at 11. I haven’t prepared for any of them. I open Claude and run my morning brief. It pulls the agenda from my calendar, searches my Obsidian vault for notes on each attendee, checks recent Confluence pages and email threads, and hands me three meeting briefs. The product sync brief shows what was carried over from last week. The 1:1 brief has the team lead’s recent projects and open questions from our last conversation. The interview brief pulls the candidate’s resume highlights alongside our rubric and even suggests a few questions about their specific experience.&lt;/p&gt;

&lt;p&gt;None of these lives in one place. It’s scattered across a calendar, an inbox, a wiki, and a thousand-odd markdown files. Claude stitched it together because it knows where to look and what to look for.&lt;/p&gt;

&lt;p&gt;How? Honestly, the interesting part isn’t the AI. It’s the notes.&lt;/p&gt;

&lt;h2 id=&quot;the-rag-is-dead-misread&quot;&gt;The “RAG is dead” misread&lt;/h2&gt;

&lt;p&gt;In early April 2026, Andrej Karpathy published his &lt;a href=&quot;https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f&quot;&gt;LLM Wiki&lt;/a&gt; approach: treat your knowledge base as a persistent, compounding artifact maintained by an LLM. Raw sources go in, the model synthesizes them into structured markdown pages with summaries, and the wiki grows richer over time. The post went viral.&lt;/p&gt;

&lt;p&gt;The popular takeaway was: context windows are big enough now, just throw your docs in. RAG is dead.&lt;/p&gt;

&lt;p&gt;It isn’t.&lt;/p&gt;

&lt;p&gt;Searching a few hundred Markdown files in a personal vault is fundamentally different from running a production chatbot over millions of documents. At personal scale, sure, you can stuff things into a context window. At production scale, the latency and token cost kill you. The people declaring RAG dead are generalizing from a toy setup.&lt;/p&gt;

&lt;p&gt;But Karpathy’s instinct is right, and the interesting question is &lt;em&gt;why&lt;/em&gt; flat RAG feels inadequate. Traditional RAG treats your knowledge as chunks in a bag. Every document gets split, embedded, and thrown into a vector store. At query time, you retrieve the top-K nearest chunks and hope for the best. There’s no structure. A chunk about a person and a chunk about their project sit in the same flat index with no connection between them. It’s like a library where every book has been shredded, and the pages shuffled together.&lt;/p&gt;

&lt;p&gt;The answer isn’t to remove retrieval. It’s to build on top of it. Add entities. Add relationships. Give the retrieval layer something to grab onto beyond raw text similarity. At personal scale, &lt;a href=&quot;https://github.com/tobi/qmd&quot;&gt;QMD&lt;/a&gt; with keyword and vector search is plenty. At larger scale, you still need a proper vector database, but the entity layer works either way.&lt;/p&gt;

&lt;p&gt;I went deep on this in a &lt;a href=&quot;/2026/03/12/how-i-built-ai-memory/&quot;&gt;previous post&lt;/a&gt; on building persistent AI memory with SurrealDB, where entity graphs and vector search work together. What follows here is the simpler, more practical version: how I structure an Obsidian vault so that an AI agent can actually use it.&lt;/p&gt;

&lt;h2 id=&quot;from-para-to-knowledge-graph&quot;&gt;From PARA to knowledge graph&lt;/h2&gt;

&lt;p&gt;I started with &lt;a href=&quot;https://fortelabs.com/blog/para/&quot;&gt;PARA&lt;/a&gt; (Projects, Areas, Resources, Archive), the standard Obsidian organizational system. It lasted about three months.&lt;/p&gt;

&lt;p&gt;PARA is a filing taxonomy. It tells you where to &lt;em&gt;put&lt;/em&gt; things, not how to &lt;em&gt;find&lt;/em&gt; them. When I asked Claude to pull context for a meeting, it had to know that the project lived under Projects, the attendee’s notes were in Areas, the relevant RFC was in Resources, and last quarter’s decision was in Archive. Four different places, organized by lifecycle stage rather than by what the information actually &lt;em&gt;is&lt;/em&gt;. For a human clicking through folders, fine. For an agent trying to assemble context programmatically, a nightmare.&lt;/p&gt;

&lt;p&gt;I switched to organizing by entity type: People, Teams, Projects, Services. Each entity gets a canonical page. The structure is flat, with no nesting beyond the top-level type directories.&lt;/p&gt;

&lt;p&gt;Entities are nodes, not files in a taxonomy. A person’s page links to their team, their projects, and the services they own. A project page links back to its people, its services, and its dependencies. The relationships are explicit, not implied by which folder something landed in.&lt;/p&gt;

&lt;h3 id=&quot;what-entity-pages-look-like&quot;&gt;What entity pages look like&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;People&lt;/strong&gt; have the richest frontmatter:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Organization&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[[Acme&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Corp]]&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Title&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Senior Engineer&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Team&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[[Platform&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Team]]&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;aliases&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;jsmith&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;John&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Sections: Work (current role, projects, focus areas), Collaborators (linked people with context), Notes &amp;amp; Observations, Personal (background, former companies).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Projects&lt;/strong&gt; follow a single generic structure: Summary, Goals, Documentation, People, Timeline. Minimal frontmatter, usually just aliases. The same skeleton works whether it’s a product initiative or a technical migration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services&lt;/strong&gt; range from a short page (overview, limitations, relationship to similar services) to a detailed reference (capabilities, deployment options, compliance, pricing). I also keep pages for external products and services, stuff like Datadog or Terraform or whatever vendor we’re evaluating. I summarize the internet research and add notes on how we actually use it, what’s weird about our setup, and what broke. When Claude reasons about a tool, it gets &lt;em&gt;our&lt;/em&gt; context, not the marketing page. Minimal frontmatter across all of these.&lt;/p&gt;

&lt;p&gt;Common patterns matter more than specifics. Every entity page has YAML frontmatter with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aliases&lt;/code&gt; for flexible linking. Every page uses H2 sections as a consistent skeleton. Every page uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[[wiki-links]]&lt;/code&gt; for cross-references.&lt;/p&gt;

&lt;p&gt;These consistent shapes are what make agent search &lt;em&gt;reliable&lt;/em&gt;. When every person page has a “Work” section, and every project page has a “People” section, grep and semantic search hit predictably. The agent doesn’t need to understand your filing system. It just needs to know what to search for.&lt;/p&gt;
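
&lt;p&gt;A minimal sketch of what that buys, assuming a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vault/People&lt;/code&gt; directory of markdown pages (the path and section names here are illustrative): pulling every person’s “Work” section is deterministic code, no LLM in the loop.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pathlib import Path
import re

def read_section(page: Path, heading: str) -&gt; str | None:
    # Return the body of an H2 section, or None if the page lacks it.
    text = page.read_text(encoding='utf-8')
    match = re.search(rf'^## {re.escape(heading)}\n(.*?)(?=^## |\Z)',
                      text, re.MULTILINE | re.DOTALL)
    return match.group(1).strip() if match else None

# Because every person page has a 'Work' section, this hits predictably.
for page in Path('vault/People').glob('*.md'):
    work = read_section(page, 'Work')
    if work:
        print(page.stem, work[:80])
&lt;/code&gt;&lt;/pre&gt;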

&lt;h2 id=&quot;amortized-computation&quot;&gt;Amortized computation&lt;/h2&gt;

&lt;p&gt;Every bit of preprocessing you do on your vault pays off at query time, on every future query. You’re building a database index, except the database is your notes, and the queries come from an LLM.&lt;/p&gt;

&lt;p&gt;Entity pages are the obvious example: instead of the agent re-deriving “who is this person and what do they work on” from scattered notes each time, there’s a canonical page to land on. But the same principle applies to images without alt text (invisible to an LLM, so I enrich them with descriptions), static documents from Google Drive or PDFs (converted to markdown and saved in the vault, one format, one search index), and raw meeting transcripts (long and noisy, so a processed summary in the daily note is cheaper to retrieve and more useful when found).&lt;/p&gt;
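
&lt;p&gt;As a sketch of the kind of one-time pass I mean, here’s an audit for images with empty alt text, assuming standard markdown image syntax (the enrichment itself is one LLM vision call per hit, elided):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pathlib import Path
import re

# Markdown images with empty alt text: ![](attachment.png)
MISSING_ALT = re.compile(r'!\[\]\(([^)]+)\)')

def audit_alt_text(vault: Path) -&gt; list[tuple[Path, str]]:
    # (page, image) pairs that are currently invisible to an LLM.
    todo = []
    for page in vault.rglob('*.md'):
        for image in MISSING_ALT.findall(page.read_text(encoding='utf-8')):
            todo.append((page, image))
    return todo

# Each hit gets described once, up front, instead of every future
# query paying to work around an opaque attachment.
for page, image in audit_alt_text(Path('vault')):
    print(f'{page}: {image} needs alt text')
&lt;/code&gt;&lt;/pre&gt;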

&lt;p&gt;LLMs &lt;em&gt;can&lt;/em&gt; work without this structure. They just burn more tokens and take longer to get worse results.&lt;/p&gt;

&lt;h2 id=&quot;the-full-loop&quot;&gt;The full loop&lt;/h2&gt;

&lt;p&gt;With the vault structured, here’s what a typical day looks like.&lt;/p&gt;

&lt;h3 id=&quot;morning-brief&quot;&gt;Morning brief&lt;/h3&gt;

&lt;p&gt;A scheduled skill runs against my calendar, pulls today’s events, and for each meeting searches the vault, recent emails, and Confluence for relevant context. Out comes a per-meeting brief in my daily note.&lt;/p&gt;

&lt;p&gt;Different meeting types get different templates. A recurring status sync shows what was discussed last time and what’s still open. A 1:1 pulls up the person’s page, their recent activity, and any open threads between us. An interview pulls the candidate’s profile alongside our rubric.&lt;/p&gt;
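
&lt;p&gt;The dispatch behind this is simple. A sketch of the shape, with illustrative template names and a deliberately naive match on the calendar title:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative mapping from meeting type to what the brief pulls.
BRIEF_TEMPLATES = {
    'status_sync': ['last meeting summary', 'open action items'],
    'one_on_one':  ['person page', 'recent activity', 'open threads'],
    'interview':   ['candidate resume', 'interview rubric'],
}

def sections_for(event_title: str) -&gt; list[str]:
    title = event_title.lower()
    if 'interview' in title:
        return BRIEF_TEMPLATES['interview']
    if '1:1' in title or 'one-on-one' in title:
        return BRIEF_TEMPLATES['one_on_one']
    return BRIEF_TEMPLATES['status_sync']

print(sections_for('1:1 with new team lead'))
&lt;/code&gt;&lt;/pre&gt;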

&lt;p&gt;It’s not always right. Sometimes it surfaces stale context or misses something recent. But it’s a better starting point than walking in cold, and I can skim and correct in a minute.&lt;/p&gt;

&lt;h3 id=&quot;capture&quot;&gt;Capture&lt;/h3&gt;

&lt;p&gt;During meetings, I use &lt;a href=&quot;https://www.granola.so/&quot;&gt;Granola&lt;/a&gt; for transcription. After each meeting, the transcript gets summarized and inserted into the daily note.&lt;/p&gt;

&lt;p&gt;Then Claude does post-processing. The most valuable part is name resolution: the transcription says “John mentioned the migration timeline,” but &lt;em&gt;which&lt;/em&gt; John? Claude checks the entity graph. It knows the meeting was with the Platform Team, searches for people linked to that team, finds Jonathan Smith (Senior Engineer, Platform) and John Park (Data Analyst, Finance), and picks the right one based on context. Phonetic mismatches get the same treatment: the transcriber hears “Sean,” the team has a “Shawn,” and Claude resolves it.&lt;/p&gt;
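
&lt;p&gt;A sketch of the disambiguation, assuming people pages carry the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Team&lt;/code&gt; frontmatter shown earlier (the real resolution is Claude’s judgment, not this arithmetic):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def resolve_name(spoken, meeting_team, people):
    # people: parsed frontmatter dicts with name, team, aliases.
    def matches(p):
        names = [p['name'], p['name'].split()[0], *p.get('aliases', [])]
        return any(n.lower() == spoken.lower() for n in names)

    candidates = [p for p in people if matches(p)]
    # Prefer whoever is on the team the meeting was with.
    on_team = [p for p in candidates if p.get('team') == meeting_team]
    pool = on_team or candidates
    return pool[0] if len(pool) == 1 else None  # ambiguous: ask, do not guess

people = [
    {'name': 'Jonathan Smith', 'team': 'Platform Team', 'aliases': ['jsmith', 'John']},
    {'name': 'John Park', 'team': 'Finance', 'aliases': []},
]
print(resolve_name('John', 'Platform Team', people))  # Jonathan Smith
&lt;/code&gt;&lt;/pre&gt;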

&lt;h3 id=&quot;graph-update&quot;&gt;Graph update&lt;/h3&gt;

&lt;p&gt;After capture, Claude updates the entity graph. Mentioned people get their pages updated: a project status changed, someone took on a new responsibility, a decision was made. Projects and services get the same treatment.&lt;/p&gt;

&lt;p&gt;Unrecognized entities get flagged. If the transcript mentions a name that doesn’t match anyone in the vault, Claude asks: new person, or mistranscription? Usually, context settles it. “The new contractor on the data team” is probably someone new. “Michele from infrastructure” is probably Michael with a transcription error.&lt;/p&gt;

&lt;p&gt;The updates don’t need to be 100% correct. I review them, fix what’s wrong, and move on. Over time, the signal-to-noise ratio improves naturally as correct information gets reinforced and errors get corrected. It’s more wiki than database. Eventual consistency through volume and curation.&lt;/p&gt;

&lt;h3 id=&quot;enrichment&quot;&gt;Enrichment&lt;/h3&gt;

&lt;p&gt;The last stage runs asynchronously. Image attachments get enriched with alt text. External documents get converted to markdown. New entity pages get their skeleton filled in.&lt;/p&gt;

&lt;h2 id=&quot;under-the-hood&quot;&gt;Under the hood&lt;/h2&gt;

&lt;p&gt;Structure without search is a library with no catalog.&lt;/p&gt;

&lt;p&gt;I use &lt;a href=&quot;https://github.com/tobi/qmd&quot;&gt;QMD&lt;/a&gt;, a local search engine that indexes markdown files with both keyword and vector search. It connects to Claude via a skill, a reusable prompt that teaches Claude how to search and what the vault contains. I prefer skills over MCP servers for this: simpler to maintain, version-controlled as markdown, no running process.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
    V[&quot;Obsidian Vault\na thousand+ markdown files&quot;] --&amp;gt; Q[&quot;QMD Index\nkeyword + vector&quot;]
    Q --&amp;gt; S[&quot;Skill\nsearch instructions&quot;]
    S --&amp;gt; C[&quot;Claude\nquery + reasoning&quot;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One thing worth calling out: Obsidian’s link graph, the backlinks and outbound links that make it powerful for human navigation, doesn’t matter much for agents. Humans click links to traverse relationships. Agents don’t click anything. They grep. They search semantically. The entity structure matters because it creates consistent search targets. This plays to coding agents’ strengths. They’re already wired to search for things, not browse for them.&lt;/p&gt;

&lt;p&gt;For meeting capture, I reverse-engineered Granola’s local cache to get the transcripts into my pipeline rather than going through their API. Local caches are often simpler and safer than API access, which might trigger rate limits or security flags. Use existing integrations when they exist; when they don’t, look at what’s already on disk. Profile photos get pulled from Google Meet screenshots and processed through an automated pipeline (resize, crop, adjust) then attached to the person’s page. Small touch, but it makes the vault more navigable when &lt;em&gt;I’m&lt;/em&gt; the one browsing.&lt;/p&gt;

&lt;p&gt;I built all of this as skills first, single-purpose prompts that each handle one step. As individual skills matured, I promoted them to specialized sub-agents that run in parallel, each in its own context window. Cheaper, faster, and easier to debug than cramming everything into one session. If I were starting over, I’d do it the same way: skills first, agents later.&lt;/p&gt;

&lt;h2 id=&quot;what-this-changes&quot;&gt;What this changes&lt;/h2&gt;

&lt;p&gt;My notes used to be a write-only archive. I’d take notes in meetings, file them somewhere reasonable, and never look at them again, buried under months of accumulated markdown. I used to spend hours cleaning up notes, adding links, and maintaining structure. That overhead is gone now.&lt;/p&gt;

&lt;p&gt;The vault is a working memory. Every meeting makes it a little smarter. The morning brief surfaces context I’d forgotten I had. The post-processing catches connections I’d have missed.&lt;/p&gt;

&lt;p&gt;It’s not perfect. Name resolution still trips up on mispronounced foreign names, some entity pages drift out of date, and the whole thing requires enough structure that you can’t just dump files in and expect magic. But it’s a different relationship with notes. They’re not records of what happened. They’re context for what’s about to happen.&lt;/p&gt;

&lt;p&gt;The gap between “AI can read my files” and “AI understands my work” turns out to be mostly a data structure problem. Context windows and retrieval get you partway there. The rest is in how you organize the vault.&lt;/p&gt;
</description>
        <pubDate>Thu, 09 Apr 2026 12:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/04/09/obsidian-claude.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/04/09/obsidian-claude.html</guid>
        
        <category>ai</category>
        
        <category>obsidian</category>
        
        <category>knowledge-management</category>
        
        <category>llm</category>
        
        <category>claude</category>
        
        
        <category>engineering</category>
        
      </item>
    
      <item>
        <title>Facts, Episodes, and a Knowledge Graph: Building Persistent AI Memory with SurrealDB</title>
        <description>&lt;p&gt;Every time you open a new chat with an AI assistant, it has forgotten everything about you. Your name, your preferences, what you were working on last week, the decision you made yesterday — gone. You start from scratch every single time.&lt;/p&gt;

&lt;p&gt;This bothered me enough that I built my own. Krill is my local-first AI assistant, and its memory system is the part I’ve spent the most time thinking about. Here’s how it works.&lt;/p&gt;

&lt;h2 id=&quot;the-shape-of-the-problem&quot;&gt;The Shape of the Problem&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://arxiv.org/abs/2309.02427&quot;&gt;CoALA framework&lt;/a&gt; applies cognitive science’s memory taxonomy to language agents: semantic memory (facts and concepts), episodic memory (past experiences), and procedural memory (how to do things). That framing maps cleanly onto what I wanted to build: an assistant that knows facts about you, remembers what happened between you, and understands how things relate.&lt;/p&gt;

&lt;p&gt;In concrete terms, I wanted five things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Persistence&lt;/strong&gt;: facts survive session restarts&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Relevance&lt;/strong&gt;: surface the right information, not everything at once&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Freshness&lt;/strong&gt;: older or contradicted facts should matter less over time&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Relationships&lt;/strong&gt;: “you use Neovim” and “you edit code daily” are connected, not isolated&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Local-first&lt;/strong&gt;: no cloud sync, no third-party data store, user owns their data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These requirements pull against each other in annoying ways. Keeping everything conflicts with expiring things. Being comprehensive conflicts with being selective. The interesting design work is in reconciling them.&lt;/p&gt;

&lt;h2 id=&quot;extracting-facts-from-conversation&quot;&gt;Extracting Facts from Conversation&lt;/h2&gt;

&lt;p&gt;After every conversation turn, I fire off an asynchronous LLM call to extract structured facts from what was just said. It runs in the background; the user never waits for it.&lt;/p&gt;

&lt;p&gt;The extraction prompt asks for facts in a specific format. Each fact comes back with a content string (“user prefers Python over JavaScript”), a category (one of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preferences&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;personal&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;work&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;other&lt;/code&gt;), a confidence level (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;explicit&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implied&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inferred&lt;/code&gt;), and boolean flags for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;emotional_signal&lt;/code&gt; that I use later for episode synthesis.&lt;/p&gt;

&lt;p&gt;The confidence levels matter. An explicit statement like “I prefer Python” gets 0.9. Something implied by context gets 0.7. A weak inference gets 0.5. Facts below 0.5 are dropped immediately.&lt;/p&gt;
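
&lt;p&gt;For concreteness, a sketch of the record each extraction returns (field names follow the description above; the container is my framing, not necessarily Krill’s actual types):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass

CONFIDENCE = {'explicit': 0.9, 'implied': 0.7, 'inferred': 0.5}

@dataclass
class ExtractedFact:
    content: str                  # 'user prefers Python over JavaScript'
    category: str                 # identity | preferences | personal | work | other
    confidence: str               # explicit | implied | inferred
    decision_point: bool = False  # feeds episode synthesis later
    emotional_signal: bool = False

    @property
    def score(self) -&gt; float:
        return CONFIDENCE[self.confidence]

fact = ExtractedFact('user prefers Python over JavaScript', 'preferences', 'explicit')
assert fact.score == 0.9  # anything scoring below 0.5 is dropped
&lt;/code&gt;&lt;/pre&gt;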

&lt;h3 id=&quot;deduplication&quot;&gt;Deduplication&lt;/h3&gt;

&lt;p&gt;Before writing a new fact, I embed it with a 384-dimensional sentence embedding model and compare it against existing facts via cosine similarity. Three zones, three outcomes:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Similarity&lt;/th&gt;
      &lt;th&gt;Action&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&amp;gt; 0.85&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Reinforce&lt;/strong&gt; — increment count, update timestamp, add session to source list&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0.4 – 0.85&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Conflict check&lt;/strong&gt; — ask the LLM: does this contradict?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&amp;lt; 0.4&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Write&lt;/strong&gt; — clearly new information&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart TD
    A([New fact]) --&amp;gt; B[Embed into vector]
    B --&amp;gt; C[Cosine similarity search\nagainst existing facts]
    C --&amp;gt; D{Similarity score?}
    D --&amp;gt;|&quot;&amp;gt; 0.85&quot;| E[Reinforce\nincrement count · update timestamp]
    D --&amp;gt;|&quot;0.4 – 0.85&quot;| F[Conflict check\nLLM: does this contradict?]
    D --&amp;gt;|&quot;&amp;lt; 0.4&quot;| G([Write new fact])
    F --&amp;gt;|Yes| H[Supersede old fact]
    H --&amp;gt; G
    F --&amp;gt;|No| G
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The conflict check is where things get interesting. A hard threshold can’t handle nuanced contradictions. “Uses Mac” and “switched to Linux for work” aren’t duplicates, but they’re in tension. I pass both facts to the LLM and ask: does the new fact contradict the old one, and if so which should survive? The answer depends on context only the LLM can evaluate.&lt;/p&gt;
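
&lt;p&gt;The gate itself is small; a sketch with the LLM call stubbed out as a callable:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def dedup_action(similarity, contradicts):
    # contradicts: zero-argument callable that asks the LLM whether the
    # pair is in tension; only invoked in the ambiguous middle zone.
    if similarity &gt; 0.85:
        return 'reinforce'   # same fact restated: bump count, refresh timestamp
    if similarity &gt;= 0.4:
        return 'supersede' if contradicts() else 'write'  # supersede, then write
    return 'write'           # clearly new information

print(dedup_action(0.92, lambda: False))  # reinforce
print(dedup_action(0.60, lambda: True))   # supersede
&lt;/code&gt;&lt;/pre&gt;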

&lt;h2 id=&quot;not-all-facts-are-equal&quot;&gt;Not All Facts Are Equal&lt;/h2&gt;

&lt;p&gt;A fact mentioned once in passing shouldn’t carry the same weight as one repeated a dozen times over six months. &lt;a href=&quot;https://arxiv.org/abs/2305.10250&quot;&gt;MemoryBank&lt;/a&gt; applies Ebbinghaus’s forgetting curve to LLM memory, where memories decay over time and get reinforced on re-encounter. I took the same idea and made it explicit with a trust score:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trust = confidence × recency × boost
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;confidence&lt;/strong&gt;: the extraction-time score (0.5, 0.7, or 0.9)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;recency&lt;/strong&gt;: decays with a 90-day half-life — &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.5^(days_since_reinforcement / 90)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;boost&lt;/strong&gt;: log-scale reward for repeated confirmation — &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min(1.0 + 0.1 × log₂(1 + reinforcement_count), 1.5)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
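
&lt;p&gt;In code, with the constants exactly as above:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

def trust(confidence, days_since_reinforcement, reinforcement_count):
    recency = 0.5 ** (days_since_reinforcement / 90)  # 90-day half-life
    boost = min(1.0 + 0.1 * math.log2(1 + reinforcement_count), 1.5)
    return confidence * recency * boost

print(trust(0.7, 0, 0))    # fresh medium-confidence fact: 0.70
print(trust(0.7, 0, 5))    # just reinforced a fifth time: ~0.88
print(trust(0.9, 365, 0))  # untouched for a year: ~0.05
&lt;/code&gt;&lt;/pre&gt;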

&lt;p&gt;&lt;img src=&quot;/images/trust-decay.png&quot; alt=&quot;Trust score decay curves for three fact types over 365 days, showing high-confidence facts decaying slowly, medium-confidence facts holding steady with reinforcement events, and low-confidence facts dropping below the pruning threshold&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A brand-new medium-confidence fact starts at 0.7. That same fact, reinforced five times over two months, climbs toward 1.0. A fact not mentioned for a year drops toward irrelevance. Trust scores also gate conflict resolution: if an existing fact has a trust score more than 1.5× higher than the incoming one, the old fact survives. You don’t replace a well-established belief with a low-confidence passing mention.&lt;/p&gt;

&lt;h2 id=&quot;getting-the-right-facts-into-context&quot;&gt;Getting the Right Facts into Context&lt;/h2&gt;

&lt;p&gt;Writing facts is the easy part. The hard part is retrieval: given what the user is saying right now, which facts should the assistant know about?&lt;/p&gt;

&lt;p&gt;I inject memory in two layers. First, category summaries: for each category (identity, preferences, work, etc.), I maintain an LLM-synthesized narrative that gets rewritten whenever five or more new facts arrive. These go in first and give broad strokes. Then the top five facts retrieved by semantic similarity to the current user message, re-ranked by trust score, for specific detail.&lt;/p&gt;

&lt;p&gt;Semantic search alone has a well-known failure mode: it finds facts that are topically nearby but misses exact keyword matches. If the user mentions a specific name or acronym, a dense embedding may not surface the right fact. I combine vector search with full-text search and merge the two ranked lists using reciprocal rank fusion (RRF). Neither list needs to be perfect — RRF is robust to noise in either source, and facts that rank highly in both float to the top.&lt;/p&gt;
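
&lt;p&gt;RRF itself is a few lines. A sketch (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k=60&lt;/code&gt; is the conventional constant from the original RRF paper, not necessarily the value my code uses):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def rrf(ranked_lists, k=60):
    # Merge ranked result lists; items high in several lists float up.
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ['fact:python', 'fact:editor', 'fact:krill']
keyword_hits = ['fact:krill', 'fact:python']
print(rrf([vector_hits, keyword_hits]))  # python and krill rise to the top
&lt;/code&gt;&lt;/pre&gt;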

&lt;p&gt;When the memory store grows large, a lightweight LLM selector runs first to filter candidates, ranking by trust and picking only the ones actually relevant to the current query rather than blindly taking the top-N by embedding distance.&lt;/p&gt;

&lt;h2 id=&quot;two-harder-problems&quot;&gt;Two Harder Problems&lt;/h2&gt;

&lt;p&gt;Facts answer “what do I know about you.” But two questions remain: “what happened between us?” and “what is related to X?” Those need different structures.&lt;/p&gt;

&lt;h3 id=&quot;episodic-memory-what-happened&quot;&gt;Episodic Memory: What Happened&lt;/h3&gt;

&lt;p&gt;Imagine you spent a session debugging a tricky race condition and ultimately decided to rewrite a module. The facts extracted might be: “user is debugging a concurrency bug,” “user works on Krill,” “user prefers explicit concurrency over async abstractions.” Accurate, but they’ve lost the event. The diagnosis, the pivot, the three hours of back-and-forth that led to the decision.&lt;/p&gt;

&lt;p&gt;Semantic search on those facts won’t recover it. If you ask about that module next week, the search finds facts close to your query. It won’t surface “we debugged a race condition and decided to rewrite” because that sentence doesn’t exist in the store. It lived between the facts.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://arxiv.org/abs/2304.03442&quot;&gt;Generative Agents paper&lt;/a&gt; showed how to handle this: record experiences as a memory stream, then synthesize them into higher-level summaries. I apply the same idea to a personal assistant. When a session produces enough signal (at least three facts, or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; flag), I fire an LLM pass over the session’s conversation history and produce a short narrative. Something like: “Debugged a race condition in the gateway’s async loop. After ruling out the scheduler and lock ordering, found the issue was a missing await on a shared resource. Decided to rewrite the module with explicit locking rather than patching the existing code.”&lt;/p&gt;

&lt;p&gt;That narrative gets stored separately from facts and injected as “Relevant Experiences.” Unlike facts, episodes are surfaced by recency as well as semantic similarity, because the most recent session is almost always relevant even if the topic has shifted.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; flag does real work here on two levels. At extraction time it’s a boolean on individual facts, marking that something was a consequential choice. This lets a session trigger episode synthesis even if it produced only one or two facts — a session where you made one important architectural decision is worth narrating even if not much else was said. In the synthesized episode, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; becomes a string: the single most important choice or pivot from the session. This gives the assistant a specific handle on what was decided, not just that something happened.&lt;/p&gt;
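
&lt;p&gt;The trigger condition, sketched:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def should_synthesize_episode(session_facts):
    # Narrate a session if it produced enough signal: at least three
    # facts, or any fact flagged as a consequential decision.
    return (len(session_facts) &gt;= 3
            or any(f.get('decision_point') for f in session_facts))

print(should_synthesize_episode([{'decision_point': True}]))  # True
&lt;/code&gt;&lt;/pre&gt;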

&lt;h3 id=&quot;knowledge-graph-whats-related&quot;&gt;Knowledge Graph: What’s Related&lt;/h3&gt;

&lt;p&gt;The second problem is structural. Semantic search retrieves facts that are textually similar to the current query. It fails when relevance is relational rather than lexical.&lt;/p&gt;

&lt;p&gt;Suppose you ask about your deployment pipeline. Semantic search finds deployment-related facts. It won’t find Adam. But suppose the memory contains “Worked on Terraform with Adam.” The graph has encoded Adam as a person with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;worked_on&lt;/code&gt; relation to Terraform, and Terraform with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements&lt;/code&gt; relation to the deployment pipeline. Expand one hop from “deployment pipeline” and Terraform surfaces. Expand one more and Adam surfaces. Neither connection is a text match to your query; it’s pure graph structure.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    DP[&quot;**Deployment Pipeline**\n〈project〉&quot;]
    TF[&quot;Terraform\n〈tool〉&quot;]
    AD[&quot;Adam\n〈person〉&quot;]

    TF --&amp;gt;|implements| DP
    AD --&amp;gt;|worked_on| TF

    style DP fill:#4a90d9,color:#fff,stroke:#2c6fad
    style TF fill:#7cb8e8,color:#222,stroke:#4a90d9
    style AD fill:#b3d8f5,color:#222,stroke:#7cb8e8
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every five new facts, I trigger a graph extraction pass. The LLM reads the accumulated facts and produces entities typed as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;person&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tool&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concept&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;place&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;organization&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;project&lt;/code&gt;, plus directed typed relations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uses&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;works_at&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;collaborates_with&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;created&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skilled_in&lt;/code&gt;, and so on. The relations aren’t just labels — each edge carries the confidence of the source fact, the timestamp it was extracted, and which session it came from. That means when I traverse the graph to answer a query, I can weight edges by how reliable and recent they are, not just whether they exist.&lt;/p&gt;

&lt;p&gt;One thing that surprised me: entities need deduplication just like facts. “Neovim”, “nvim”, and “that editor I use” all refer to the same thing. I run the same cosine similarity check against existing entities (threshold 0.85) and collapse them to a single node. Without this, the graph fragments across surface forms and loses its connective value.&lt;/p&gt;

&lt;p&gt;Relations also drift. The LLM invents its own types when the seed taxonomy doesn’t quite fit: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;employs&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;works_with&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;relies_on&lt;/code&gt; alongside the canonical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uses&lt;/code&gt;. I run a periodic normalization pass that merges near-synonym types back to seeds via another LLM call, keeping the graph navigable.&lt;/p&gt;

&lt;h2 id=&quot;prior-art&quot;&gt;Prior Art&lt;/h2&gt;

&lt;p&gt;I’m not working in a vacuum. &lt;a href=&quot;https://arxiv.org/abs/2310.08560&quot;&gt;MemGPT&lt;/a&gt;, now productized as &lt;a href=&quot;https://www.letta.com&quot;&gt;Letta&lt;/a&gt;, pioneered one approach: agents explicitly manage their own memory via tool calls, deciding what to store and retrieve. It’s an interesting design, but it puts memory management in the conversation layer. I went the opposite direction: memory extraction is a background harness service, invisible to the agent, so the conversation stays focused on the task.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://mem0.ai&quot;&gt;Mem0&lt;/a&gt; is the closest open-source analogue to what I built: fact extraction, deduplication, and retrieval as a standalone memory layer, well-adopted and actively developed. The main differences are that I add trust scoring with reinforcement (near-duplicate facts strengthen existing atoms rather than being silently dropped), episodic synthesis, and a knowledge graph, and keep everything local.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/getzep/graphiti&quot;&gt;Zep’s Graphiti&lt;/a&gt; takes a different angle on the knowledge graph: facts have temporal validity windows (valid from time T, superseded at time T+N) rather than a continuous decay score. Different formalization of the same intuition. Worth reading their &lt;a href=&quot;https://arxiv.org/abs/2501.13956&quot;&gt;paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;why-surrealdb&quot;&gt;Why SurrealDB&lt;/h2&gt;

&lt;p&gt;I needed document storage, vector similarity search, full-text search, and graph edges in a single database that runs embedded with no server process. That last requirement eliminates most options.&lt;/p&gt;

&lt;p&gt;Typically this means three or four systems: a document store, Qdrant for vectors, Elasticsearch for full-text, Neo4j for the graph. Coordinating them locally is painful. &lt;a href=&quot;https://surrealdb.com&quot;&gt;SurrealDB&lt;/a&gt; handles all of them. It runs in embedded mode (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;surrealkv://&lt;/code&gt;) with the entire memory system as a single file on disk.&lt;/p&gt;

&lt;p&gt;Semantic search runs against a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MTREE&lt;/code&gt; vector index with cosine similarity on 384-dim embeddings. Full-text search runs against a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SEARCH&lt;/code&gt; index on the same table, and the two result sets are merged via RRF before re-ranking by trust.&lt;/p&gt;

&lt;p&gt;Knowledge graph edges are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TYPE RELATION&lt;/code&gt; tables, where each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RELATE&lt;/code&gt; statement creates a first-class edge record that can carry arbitrary metadata — confidence, timestamp, source session — not just a label. Traversal is also bidirectional: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&amp;gt;references-&amp;gt;&lt;/code&gt; follows edges forward, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;-references&lt;/code&gt; follows them in reverse. That means I can ask “what does Terraform implement?” and “what uses Terraform?” from the same edge table without duplicating data.&lt;/p&gt;
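
&lt;p&gt;To make that concrete, a sketch of the SurrealQL shapes involved, wrapped in Python strings (statement forms follow SurrealDB’s documented syntax; table, field, and analyzer names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Vector and full-text indexes on the same table.
DEFINE_INDEXES = '''
DEFINE INDEX atom_vec ON TABLE atom FIELDS embedding
    MTREE DIMENSION 384 DIST COSINE;
DEFINE INDEX atom_txt ON TABLE atom FIELDS content
    SEARCH ANALYZER simple BM25;  -- analyzer assumed defined elsewhere
'''

# Edge tables are declared as relations.
DEFINE_EDGE = 'DEFINE TABLE implements TYPE RELATION;'

# RELATE creates a first-class edge record that carries metadata.
ADD_EDGE = '''
RELATE tool:terraform-&gt;implements-&gt;project:deployment_pipeline
    SET confidence = 0.9, extracted_at = time::now(), session = $session;
'''

# Same edge tables, both directions, no duplicated data.
FORWARD = 'SELECT -&gt;implements-&gt;project FROM tool:terraform;'
REVERSE = 'SELECT &lt;-worked_on&lt;-person FROM tool:terraform;'
&lt;/code&gt;&lt;/pre&gt;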

&lt;p&gt;Facts, episodes, entity nodes, and category summaries all land in the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;atom&lt;/code&gt; table but carry completely different metadata shapes — a fact has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;confidence&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;category&lt;/code&gt;, an entity has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;entity_type&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aliases&lt;/code&gt;, an episode has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;narrative&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt;. SurrealDB’s schemaless documents mean each atom carries exactly the fields it needs, without padding every row with null columns or splitting atom types across separate tables.&lt;/p&gt;

&lt;p&gt;SurrealDB is younger and less battle-tested than Postgres, and SurrealQL has a learning curve. But for local-first where deployment simplicity matters more than horizontal scaling, it’s been the right call.&lt;/p&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;Add trust scoring early. I bolted it on later and it touched everything: ranking, conflict gating, expiry. The formula takes microseconds; there’s no good reason to wait.&lt;/p&gt;

&lt;p&gt;Use an LLM for conflict resolution. My first instinct was a rule: if similarity is above some threshold and the new fact contradicts the old one, overwrite. But “contradicts” isn’t binary. “Switched to Linux for work” and “uses Mac” are both true in different contexts. A rule gets this wrong; the LLM generally doesn’t.&lt;/p&gt;

&lt;p&gt;Episodic memory and facts serve different cognitive roles and you can’t derive one from the other. Facts are what the assistant knows about you; episodes are what happened between you. I assumed episodes would fall out of the fact store naturally. They don’t. They need their own extraction pipeline, their own storage, their own retrieval logic. Conflating them is a design mistake.&lt;/p&gt;

&lt;p&gt;Category summaries saved me when the fact store got large. Semantic retrieval alone misses context that isn’t textually close to the query. Summaries give a fallback that’s always relevant regardless of what was asked.&lt;/p&gt;

&lt;p&gt;Keep the database layer dumb. Every time I put logic there I regretted it. The intelligence is in the pipeline.&lt;/p&gt;
</description>
        <pubDate>Thu, 12 Mar 2026 12:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/03/12/how-i-built-ai-memory.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/03/12/how-i-built-ai-memory.html</guid>
        
        <category>ai</category>
        
        <category>memory</category>
        
        <category>architecture</category>
        
        <category>llm</category>
        
        <category>surrealdb</category>
        
        
        <category>engineering</category>
        
      </item>
    
      <item>
        <title>Agents and Spaces: A Minimal Architecture for Multi-Agent Coordination</title>
        <description>&lt;p&gt;I’ve been building a local-first personal AI assistant, something in
the vein of &lt;a href=&quot;https://www.openinterpreter.com/&quot;&gt;Open Interpreter&lt;/a&gt; and
&lt;a href=&quot;https://docs.openclaw.ai&quot;&gt;OpenClaw&lt;/a&gt;, and I hit a wall that I think
a lot of people are hitting right now.&lt;/p&gt;

&lt;p&gt;The single-agent loop works surprisingly well. User says something, LLM
reasons, tools execute, repeat. Wrap it in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;while true&lt;/code&gt; and you’ve got
the &lt;a href=&quot;https://ghuntley.com/ralph/&quot;&gt;Ralph Wiggum loop&lt;/a&gt;. You can get a lot
done this way. But a personal assistant that’s actually &lt;em&gt;useful&lt;/em&gt; needs
to do many things at
once. Research a topic while drafting a document. Process emails while
you’re chatting. Run a scheduled task overnight and have the results
ready in the morning. One agent, one context window, one turn at a
time. It doesn’t scale.&lt;/p&gt;

&lt;p&gt;So, multi-agent, obviously, but the more I looked at existing
frameworks, the less I liked what I saw.&lt;/p&gt;

&lt;h2 id=&quot;whats-broken&quot;&gt;What’s broken&lt;/h2&gt;

&lt;p&gt;Here are real problems I ran into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-channel continuity.&lt;/strong&gt; You mention a project in chat. An email
arrives about the same project. The chat agent and the email agent have
no shared context. In frameworks like
&lt;a href=&quot;https://www.crewai.com/&quot;&gt;CrewAI&lt;/a&gt; or
&lt;a href=&quot;https://microsoft.github.io/autogen/&quot;&gt;AutoGen&lt;/a&gt;, agents communicate
through prescribed channels, but there’s no unified content store where
both can discover they’re working on the same thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background work blocks the conversation.&lt;/strong&gt; Ask your assistant to
research something that takes 30 seconds. In a single-agent loop,
you’re staring at a spinner. Can’t ask a follow-up, can’t switch
topics. The context window is busy. Multi-agent helps, but most
frameworks want you to declare the topology upfront. “This is the
research agent, this is the chat agent, here’s how they talk.” What if
the next question &lt;em&gt;doesn’t&lt;/em&gt; need research?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deferred actions lose their context.&lt;/strong&gt; “Remind me tomorrow at 9am” is
easy to &lt;em&gt;schedule&lt;/em&gt; but hard to &lt;em&gt;execute well&lt;/em&gt;. The reminder needs to
carry the conversational context it was created in, like a closure
capturing its environment. But the cron trigger and the chat agent are
separate systems. The deferred action fires in a vacuum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust is all-or-nothing.&lt;/strong&gt; You want a research agent that can browse
the web but can’t see your private files. Most frameworks either give
agents full access to everything (YOLO mode), or make you build
separate permission systems per tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent roles are premature abstractions.&lt;/strong&gt; Frameworks ask you to define
types upfront: Researcher, Writer, Reviewer, Coder. But real tasks
don’t decompose neatly into roles. “Plan my trip to Tokyo” needs web
research, calendar access, budget calculations, and document drafting,
in an order that depends on what the research turns up. A fixed role
graph can’t adapt.&lt;/p&gt;

&lt;p&gt;Any personal assistant that handles more than one thing at a time hits
all of these.&lt;/p&gt;

&lt;h2 id=&quot;the-bitter-lesson-applied&quot;&gt;The bitter lesson, applied&lt;/h2&gt;

&lt;p&gt;Rich Sutton’s
&lt;a href=&quot;http://www.incompleteideas.net/IncIdeas/BitterLesson.html&quot;&gt;Bitter Lesson&lt;/a&gt;:
methods that leverage general computation consistently outperform
methods that encode human knowledge. The history of AI is littered with
hand-crafted heuristics that worked until general-purpose approaches
scaled past them.&lt;/p&gt;

&lt;p&gt;Multi-agent systems are repeating this mistake. Current frameworks
encode coordination strategies (“this agent is the planner, that one
is the executor, they communicate through this protocol”) as if we
know what optimal coordination looks like. We don’t. LLMs are improving
fast enough that any fixed topology will be obsolete within a year.&lt;/p&gt;

&lt;p&gt;So I went the other direction: minimal concepts, zero prescribed
workflows, coordination that emerges from how agents use the building
blocks.&lt;/p&gt;

&lt;h2 id=&quot;the-building-blocks&quot;&gt;The building blocks&lt;/h2&gt;

&lt;p&gt;The architecture has three concepts.&lt;/p&gt;

&lt;h3 id=&quot;agents&quot;&gt;Agents&lt;/h3&gt;

&lt;p&gt;An agent is an LLM with a context window, some tools, and access to
shared content.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Component&lt;/th&gt;
      &lt;th&gt;What it is&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Which LLM (Sonnet, Haiku, Opus, a local model)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Instructions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;How to behave, what to prioritize, what format to use&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;What the agent can do (web search, file access, code execution)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Session&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;The context window: working memory and turn history&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Connected spaces&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Shared content stores the agent can read from and write to&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;No classes, no role enums. What makes an agent a “research agent” vs. a
“writing agent” is the instructions it got and the spaces it can see.&lt;/p&gt;

&lt;p&gt;The session &lt;em&gt;is&lt;/em&gt; the agent’s identity. Creating a session = the agent is born. Wiping
it = the agent is dead. Same instructions, fresh session, different
agent. And spaces outlive agents. Whatever an agent wrote to a space is
still there after it’s gone.&lt;/p&gt;

&lt;p&gt;Agents can &lt;strong&gt;spawn&lt;/strong&gt; other agents, passing along a task, instructions,
and explicit access boundaries. The spawner controls what the new agent
can see and do. Everything else (how to decompose the work, whether to
spawn further, how to coordinate) is the new agent’s problem.&lt;/p&gt;

&lt;h3 id=&quot;spaces&quot;&gt;Spaces&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;space&lt;/strong&gt; is a shared, access-controlled content store. Agents read
and write &lt;strong&gt;atoms&lt;/strong&gt;, small units of content (a task, a message, a
research finding, a document section) that carry metadata and can
reference each other. Content is semantically searchable. Changes
trigger notifications to subscribers.&lt;/p&gt;

&lt;p&gt;There is no separate messaging system. A task assignment is an atom
update. A status report is a new atom. A question to a collaborator is
a message in a shared space. All communication is content in spaces.&lt;/p&gt;

&lt;h3 id=&quot;conventions&quot;&gt;Conventions&lt;/h3&gt;

&lt;p&gt;Everything above the raw mechanics is a &lt;strong&gt;convention&lt;/strong&gt;, a usage
pattern taught through instructions, not enforced by code.&lt;/p&gt;

&lt;p&gt;A “task list” isn’t a special type of space. It’s a regular space where
agents follow the task list convention: atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;owner&lt;/code&gt;
metadata, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt; references encode dependencies, agents scan for
unblocked pending tasks. A “chat” is a space where atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;role&lt;/code&gt;
metadata and a UI renders them. A “knowledge base” is a space where
atoms carry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;confidence&lt;/code&gt; scores and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supports&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contradicts&lt;/code&gt; references.&lt;/p&gt;

&lt;p&gt;Same spaces, same operations, different conventions.&lt;/p&gt;
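
&lt;p&gt;A sketch of the task list convention as plain data, with the scan
it enables (the dict shapes are illustrative, not the real atom API):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# A 'task list' is just atoms whose metadata follows a convention.
atoms = [
    {'id': 1, 'content': 'Design the export format',
     'metadata': {'status': 'done', 'owner': 'researcher'}, 'references': {}},
    {'id': 2, 'content': 'Implement the exporter',
     'metadata': {'status': 'pending', 'owner': None},
     'references': {'blocked_by': [1]}},
]

def unblocked_pending(atoms):
    # The convention: pending atoms whose blockers are all done.
    done = {a['id'] for a in atoms if a['metadata'].get('status') == 'done'}
    return [a for a in atoms
            if a['metadata'].get('status') == 'pending'
            and set(a['references'].get('blocked_by', [])) &lt;= done]

print(unblocked_pending(atoms))  # atom 2: its only blocker is done
&lt;/code&gt;&lt;/pre&gt;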

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph TB
    subgraph CON[&quot;Conventions (taught, not enforced)&quot;]
        TL[&quot;Task Lists&quot;] ~~~ CH[&quot;Channels&quot;] ~~~ CT[&quot;Chats&quot;] ~~~ DOC[&quot;Documents&quot;]
    end

    subgraph CORE[&quot;Core&quot;]
        S[&quot;Spaces&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;shared content + permissions +&amp;lt;br/&amp;gt;search + subscriptions&amp;lt;/small&amp;gt;&quot;]
        A[&quot;Agents&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;LLM + instructions + tools +&amp;lt;br/&amp;gt;session + connected spaces&amp;lt;/small&amp;gt;&quot;]
    end

    A --&amp;gt;|&quot;spawn&quot;| A
    A &amp;lt;--&amp;gt;|&quot;read / write&amp;lt;br/&amp;gt;atoms&quot;| S
    CON -.-|&quot;patterns built on&quot;| CORE

    style A fill:#4a6fa5,color:#fff
    style S fill:#e8d44d,color:#333
    style CON fill:#6b8cae,color:#fff
    style CORE fill:#f5f5f5,color:#333
    style TL fill:#93afc5,color:#333
    style CH fill:#93afc5,color:#333
    style CT fill:#93afc5,color:#333
    style DOC fill:#93afc5,color:#333
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An agent that can create other agents and share content stores with them
can express any coordination pattern (pipelines, fan-out, debates,
priority queues) without the architecture prescribing any of them.&lt;/p&gt;

&lt;h2 id=&quot;spawning-and-trust&quot;&gt;Spawning and trust&lt;/h2&gt;

&lt;p&gt;When an agent spawns another, it defines the trust boundary:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;spawn(
    task:         &quot;What to do&quot; (natural language),
    instructions: &quot;How to behave&quot; (reference to an instruction space),
    spaces:       { space_id: permission },
    tools:        [...],
    secrets:      [...],
    model:        &quot;which LLM&quot;
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The spawner says what the new agent can access and gives it a task,
with no role assignment, topology declaration, or workflow step number.&lt;/p&gt;

&lt;h3 id=&quot;trust-narrows-monotonically&quot;&gt;Trust narrows monotonically&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Privileges can only narrow down the spawn tree, never widen.&lt;/strong&gt;
Spaces, tools, secrets all narrow monotonically. No privilege
escalation without going back to a coordinator.&lt;/p&gt;
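
&lt;p&gt;Monotonic narrowing is cheap to check. A sketch, treating an access
grant as a space-to-permission map (names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;RANK = {'read': 0, 'append': 1, 'write': 2, 'grant': 3}

def narrowed(parent, child):
    # Valid spawn: every space the child gets, the parent holds at an
    # equal or higher level. Anything else is privilege escalation.
    return all(space in parent and RANK[level] &lt;= RANK[parent[space]]
               for space, level in child.items())

parent = {'research': 'write', 'docs': 'read'}
assert narrowed(parent, {'research': 'read'})   # narrower: allowed
assert not narrowed(parent, {'docs': 'write'})  # widening: rejected
&lt;/code&gt;&lt;/pre&gt;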

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph TD
    C[&quot;Coordinator&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;all tools, all spaces, all secrets&amp;lt;/small&amp;gt;&quot;]
    R[&quot;Research Agent&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space&amp;lt;/small&amp;gt;&quot;]
    W[&quot;Writer Agent&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;file tools, doc space&amp;lt;/small&amp;gt;&quot;]
    S1[&quot;Sub-researcher&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space (read only)&amp;lt;/small&amp;gt;&quot;]
    S2[&quot;Sub-researcher&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space (read only)&amp;lt;/small&amp;gt;&quot;]

    C --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows access)&quot;| R
    C --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows access)&quot;| W
    R --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows further)&quot;| S1
    R --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows further)&quot;| S2

    style C fill:#4a6fa5,color:#fff
    style R fill:#6b8cae,color:#fff
    style W fill:#6b8cae,color:#fff
    style S1 fill:#93afc5,color:#333
    style S2 fill:#93afc5,color:#333
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every spawn is a trust boundary. Give a research agent web access but
no PII access: a data diode where information flows in one direction.
The spawner defines the boundary; the spawned agent operates within it.&lt;/p&gt;

&lt;h3 id=&quot;but-sometimes-a-child-needs-more&quot;&gt;But sometimes a child needs more&lt;/h3&gt;

&lt;p&gt;Real tasks don’t always fit the initial access grant. A research agent
discovers it needs to read a private document. A code agent realizes it
needs database credentials.&lt;/p&gt;

&lt;p&gt;The mechanism: ask the coordinator. Every sub-agent has a DM space
shared with its coordinator. It writes a request. The coordinator
evaluates whether the agent’s task justifies the access and grants or
denies. Privileges widen only through an explicit grant from an agent
with sufficient permission, like a manager approving an access request.&lt;/p&gt;

&lt;p&gt;No special permission API needed, just agents communicating through a space.&lt;/p&gt;

&lt;h3 id=&quot;identity-is-not-a-type&quot;&gt;Identity is not a type&lt;/h3&gt;

&lt;p&gt;Agent identity is shaped by instructions, not declared. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;instructions&lt;/code&gt; field references a space containing behavioral guidance.
This could be a well-known default (“research-guidelines”), a fork
customized for this task, something created from scratch, or (if the
agent has write access) &lt;strong&gt;refined by the agent itself&lt;/strong&gt; over time.&lt;/p&gt;

&lt;p&gt;Pre-defined agent templates become seed content, not a locked-in
registry. Fork “research-guidelines,” tweak it for biotech, spawn a
specialized agent. All at runtime, no code changes.&lt;/p&gt;

&lt;h2 id=&quot;inside-a-space&quot;&gt;Inside a space&lt;/h2&gt;

&lt;h3 id=&quot;atoms&quot;&gt;Atoms&lt;/h3&gt;

&lt;p&gt;The unit of content is an &lt;strong&gt;atom&lt;/strong&gt;: a payload (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;content&lt;/code&gt;) with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt; (key-value pairs like status, owner, priority) and typed
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;references&lt;/code&gt; to other atoms (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supersedes&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supports&lt;/code&gt;).
Atoms are versioned and semantically searchable. The system stores and
traverses references but doesn’t interpret their semantics. That’s the
convention layer’s job.&lt;/p&gt;

&lt;p&gt;Operations are the basics: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;put&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;update&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;deprecate&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;history&lt;/code&gt;. Every mutation includes a &lt;strong&gt;comment&lt;/strong&gt;, a
semantic description like a git commit message. Comments are what gets
published to subscribers, not atom content. An agent sees “added
findings from 3 review sites” and decides whether to pull the full atom.&lt;/p&gt;
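
&lt;p&gt;A toy version of the publish rule, enough to show that subscribers
see comments and not content (the API shape is mine, not the real one):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Space:
    def __init__(self):
        self.atoms = {}
        self.subscribers = []  # callables, one per subscribed agent

    def put(self, atom_id, content, comment):
        self.atoms[atom_id] = {'content': content, 'version': 1}
        self._publish(comment)

    def update(self, atom_id, content, comment):
        atom = self.atoms[atom_id]
        atom['content'], atom['version'] = content, atom['version'] + 1
        self._publish(comment)

    def _publish(self, comment):
        # Subscribers get the semantic comment only; each decides
        # whether the full atom is worth pulling into context.
        for notify in self.subscribers:
            notify(comment)

space = Space()
space.subscribers.append(lambda c: print('notified:', c))
space.put('findings', '...', comment='added findings from 3 review sites')
&lt;/code&gt;&lt;/pre&gt;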

&lt;h3 id=&quot;permissions&quot;&gt;Permissions&lt;/h3&gt;

&lt;p&gt;Four levels, each implying the ones above it
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grant&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;read&lt;/strong&gt;: See content, search, subscribe&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;append&lt;/strong&gt;: Add new atoms (can’t modify existing)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;write&lt;/strong&gt;: Full mutation (update, replace, refine)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;grant&lt;/strong&gt;: Share access with other agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New spaces start private, and only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grant&lt;/code&gt; holders can share them.&lt;/p&gt;
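
&lt;p&gt;The implication chain is small enough to check in a few lines; a
minimal sketch, not any stated API:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Levels ordered weakest to strongest; holding a level implies
# every weaker one.
LEVELS = [&quot;read&quot;, &quot;append&quot;, &quot;write&quot;, &quot;grant&quot;]

def allows(held, needed):
    return LEVELS.index(held) &gt;= LEVELS.index(needed)

assert allows(&quot;write&quot;, &quot;append&quot;)      # write implies append
assert not allows(&quot;append&quot;, &quot;write&quot;)  # but not the reverse
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;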

&lt;h3 id=&quot;search&quot;&gt;Search&lt;/h3&gt;

&lt;p&gt;Semantic and structured, composable:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;search(
    query=&quot;frontend tasks&quot;,                        # semantic
    metadata={&quot;status&quot;: &quot;pending&quot;},                # exact match
    references={&quot;blocked_by&quot;: {&quot;status&quot;: &quot;done&quot;}}  # reference state
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One call: find atoms matching “frontend tasks” where status is pending
and all blockers are done. This is what makes conventions practical.
You can query a task list for available work without multiple round
trips.&lt;/p&gt;

&lt;h3 id=&quot;how-spaces-reach-agents&quot;&gt;How spaces reach agents&lt;/h3&gt;

&lt;p&gt;Spaces connect to an agent’s session in two ways. &lt;strong&gt;Injected&lt;/strong&gt; spaces
are always in context: instructions, personal memories, convention
descriptions. &lt;strong&gt;Queried&lt;/strong&gt; spaces are searched on demand: task lists,
knowledge bases, research findings. The distinction matters because
sessions are bounded but spaces are unbounded. You can’t inject
everything.&lt;/p&gt;

&lt;p&gt;How does an agent &lt;em&gt;know&lt;/em&gt; a convention? Each space carries a description
in its metadata: “This is a task list. Atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;owner&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;priority&lt;/code&gt;. Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt; references for dependencies.” When an
agent connects, that description gets injected. The agent learns how to
use the space from the space itself.&lt;/p&gt;
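
&lt;p&gt;In pseudocode, session assembly could look like this; the attribute
names are invented, the injected/queried split is the point:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical assembly: injected spaces land in context wholesale,
# queried spaces contribute only their convention descriptions.
def build_context(agent):
    context = []
    for s in agent.injected_spaces:    # instructions, memories, conventions
        context.extend(s.all_atoms())  # small enough to carry everywhere
    for s in agent.queried_spaces:     # task lists, knowledge bases
        context.append(s.description)  # searched on demand instead
    return context
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;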

&lt;h2 id=&quot;content-is-coordination&quot;&gt;Content is coordination&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;There is no separate messaging primitive.&lt;/strong&gt; Everything is content in
spaces.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    subgraph &quot; &quot;
        direction LR
        A1[&quot;Research&amp;lt;br/&amp;gt;Agent&quot;] --&amp;gt;|put| S1([&quot;Findings&amp;lt;br/&amp;gt;Space&quot;])
        S1 --&amp;gt;|notification| A2[&quot;Writer&amp;lt;br/&amp;gt;Agent&quot;]
        A2 --&amp;gt;|put| S2([&quot;Document&amp;lt;br/&amp;gt;Space&quot;])
        S2 --&amp;gt;|notification| A3[&quot;Review&amp;lt;br/&amp;gt;Agent&quot;]
        A3 --&amp;gt;|update| S2
    end

    style S1 fill:#e8d44d,color:#333
    style S2 fill:#e8d44d,color:#333
    style A1 fill:#6b8cae,color:#fff
    style A2 fill:#6b8cae,color:#fff
    style A3 fill:#6b8cae,color:#fff
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When a research agent writes findings to a shared space, that &lt;em&gt;is&lt;/em&gt; the
communication. Other agents subscribed to the space see the update and
react. No separate “hey, I’m done” message needed, because the distinction
between “communication” and “work product” collapses.&lt;/p&gt;

&lt;p&gt;Every space change triggers notifications to subscribers. When one
arrives, the system runs an LLM turn. The agent sees the comment and
decides how to react. This is reactive activation from
&lt;a href=&quot;https://en.wikipedia.org/wiki/Blackboard_(design_pattern)&quot;&gt;blackboard architectures&lt;/a&gt;
(Hearsay-II, 1980) without a control shell. The system delivers
events; agents decide what matters.&lt;/p&gt;
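
&lt;p&gt;The activation loop itself is small; a sketch, assuming a notification
carries the space, the atom id, and the comment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Reactive activation: the system delivers the comment; an LLM turn
# decides whether the full atom is worth pulling.
def on_notification(agent, note):
    decision = agent.llm_turn(
        f&quot;Space {note.space_id} changed: '{note.comment}'. React?&quot;
    )
    if decision.pull_content:
        atom = note.space.get(note.atom_id)  # fetch only when it matters
        agent.llm_turn(f&quot;Full content: {atom.content}&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;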

&lt;h2 id=&quot;coordinators-are-topological-not-assigned&quot;&gt;Coordinators are topological, not assigned&lt;/h2&gt;

&lt;p&gt;The system needs entry points, places where external input enters the
agent ecosystem. I call these &lt;strong&gt;coordinators&lt;/strong&gt;, but the name is
misleading if you think of it as a role.&lt;/p&gt;

&lt;p&gt;A coordinator is any agent connected to a space with an external
interface. A TUI writes to a chat space? The agent watching that space
is a coordinator. A webhook writes to a webhook space? Also a
coordinator. The topology determines it.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    UI[&quot;TUI / Web UI&quot;] --&amp;gt; CS([&quot;Chat Space&quot;])
    WH[&quot;Webhook&quot;] --&amp;gt; WS([&quot;Webhook Space&quot;])
    CR[&quot;Cron&quot;] --&amp;gt; CRS([&quot;Cron Space&quot;])

    CS --&amp;gt; UA[&quot;User Agent&quot;]
    WS --&amp;gt; WA[&quot;Webhook Agent&quot;]
    CRS --&amp;gt; CA[&quot;Cron Agent&quot;]

    UA &amp;lt;--&amp;gt;|read/write| BUS([&quot;Coordinator Bus&quot;])
    WA &amp;lt;--&amp;gt;|read/write| BUS
    CA &amp;lt;--&amp;gt;|read/write| BUS

    style CS fill:#e8d44d,color:#333
    style WS fill:#e8d44d,color:#333
    style CRS fill:#e8d44d,color:#333
    style BUS fill:#d4a44d,color:#333
    style UA fill:#4a6fa5,color:#fff
    style WA fill:#4a6fa5,color:#fff
    style CA fill:#4a6fa5,color:#fff
    style UI fill:#888,color:#fff
    style WH fill:#888,color:#fff
    style CR fill:#888,color:#fff
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Coordinators can grant access, share spaces across agent boundaries,
and manage subagent lifecycles. They share a &lt;strong&gt;coordinator bus&lt;/strong&gt; for cross-coordinator communication.&lt;/p&gt;

&lt;h3 id=&quot;remind-me-tomorrow-at-9am&quot;&gt;“Remind me tomorrow at 9am”&lt;/h3&gt;

&lt;p&gt;This sounds trivial but exposes every seam in a multi-agent system. It
crosses coordinator boundaries, requires scheduling, needs shared
context.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You type “remind me tomorrow at 9am to review the Q3 report”&lt;/li&gt;
  &lt;li&gt;Chat UI writes to the &lt;strong&gt;chat space&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;user-agent&lt;/strong&gt; recognizes a cross-coordinator request, writes to
the &lt;strong&gt;coordinator bus&lt;/strong&gt;: “set reminder: 9am tomorrow, context: Q3
report review”&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;cron-agent&lt;/strong&gt; (subscribed to the bus) picks it up, sets the
trigger, writes back an acknowledgment&lt;/li&gt;
  &lt;li&gt;User-agent confirms: “Done, I’ll remind you tomorrow at 9am”&lt;/li&gt;
  &lt;li&gt;Next morning: cron fires, cron-agent writes the reminder to the bus
with the original context&lt;/li&gt;
  &lt;li&gt;User-agent picks it up: “Time to review the Q3 report”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two coordinators talking through a space, using the same operations as
everything else.&lt;/p&gt;
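
&lt;p&gt;Traced as operations, the exchange is a handful of puts on the bus;
the metadata fields here are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# user-agent, step 3: the request is just an atom on the bus
bus.put(
    content=&quot;set reminder: 9am tomorrow, context: Q3 report review&quot;,
    metadata={&quot;kind&quot;: &quot;reminder-request&quot;, &quot;requester&quot;: &quot;user-agent&quot;},
    comment=&quot;reminder request from chat&quot;,
)

# cron-agent, step 4: acknowledge with a reference back to the request
bus.put(
    content=&quot;reminder armed for 09:00 tomorrow&quot;,
    references={&quot;supports&quot;: [&quot;&lt;request-atom-id&gt;&quot;]},
    comment=&quot;ack: reminder scheduled&quot;,
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;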

&lt;h2 id=&quot;conventions-1&quot;&gt;Conventions&lt;/h2&gt;

&lt;p&gt;I’ve mentioned eight conventions so far. Here they are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Convention&lt;/th&gt;
      &lt;th&gt;Analogous to&lt;/th&gt;
      &lt;th&gt;Key pattern&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Task List&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;GitHub Issues&lt;/td&gt;
      &lt;td&gt;Status/owner metadata + dependency refs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Channel&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Slack channel&lt;/td&gt;
      &lt;td&gt;Append-only message stream&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Chat&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Chat UI&lt;/td&gt;
      &lt;td&gt;Bidirectional, role-tagged turns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Document&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Google Docs&lt;/td&gt;
      &lt;td&gt;Progressive refinement with versioning&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Scratchpad&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Notepad&lt;/td&gt;
      &lt;td&gt;Private working memory&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Prompt Library&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Template registry&lt;/td&gt;
      &lt;td&gt;Reusable instruction + convention templates&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Knowledge Base&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Team wiki&lt;/td&gt;
      &lt;td&gt;Curated facts with evidence links&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Personal Memory&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;mem0&lt;/td&gt;
      &lt;td&gt;Auto-extracted facts, injected into context&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;New conventions emerge by writing new descriptions, not new code.&lt;/p&gt;
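
&lt;p&gt;That’s worth making literal. An invented ninth convention, defined
entirely by its description, assuming a hypothetical
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create&lt;/code&gt; operation for new spaces:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Invented example: a retro board is a description, not a schema.
spaces.create(
    name=&quot;retro-board&quot;,
    description=(
        &quot;This is a retrospective board. Atoms carry a 'column' metadata &quot;
        &quot;key: went-well, to-improve, or action. Use 'supports' references &quot;
        &quot;to link actions to the items that motivated them.&quot;
    ),
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;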

&lt;h2 id=&quot;code-as-the-interaction-model&quot;&gt;Code as the interaction model&lt;/h2&gt;

&lt;p&gt;Most agent frameworks give agents a fixed set of named tools like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_file&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;send_email&lt;/code&gt;, and the agent picks which to
call. This works for simple things. But what if you need to search,
filter results, then conditionally update three items based on what you
found? That’s three tool calls with branching logic in between. State
management across turns. A new tool for every new operation.&lt;/p&gt;
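
&lt;p&gt;Here’s where this section is headed: that scenario expressed as a
single code action, a sketch using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;space&lt;/code&gt; capability introduced
below:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# One code action replaces three tool calls and the branching
# logic that would otherwise live across LLM turns.
pending = space.search(metadata={&quot;status&quot;: &quot;pending&quot;})
for task in pending:
    if &quot;deprecated-api&quot; in task.content:  # filter in code, not in turns
        space.update(
            atom_id=task.id,
            metadata={&quot;status&quot;: &quot;blocked&quot;},
            comment=&quot;blocked: depends on the deprecated API&quot;,
        )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;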

&lt;p&gt;&lt;a href=&quot;https://github.com/qwibitai/nanoclaw&quot;&gt;NanoClaw&lt;/a&gt; takes one approach:
keep the codebase small enough that the agent can rewrite its own tools.
But there’s a more general version of the same insight: let agents write
code as their primary way of acting.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2402.01030&quot;&gt;CodeAct&lt;/a&gt; (Wang et al., 2024) showed
that agents expressing actions as executable Python achieve up to 20%
higher success rates than structured tool calls. Hugging Face’s
&lt;a href=&quot;https://github.com/huggingface/smolagents&quot;&gt;smolagents&lt;/a&gt; makes this
production-grade. &lt;a href=&quot;https://arxiv.org/abs/2512.24601&quot;&gt;RLMs&lt;/a&gt; (Zhang et
al., 2025) take it further: agents &lt;em&gt;think&lt;/em&gt; in code, managing their own
context programmatically and
&lt;a href=&quot;https://www.primeintellect.ai/blog/rlm&quot;&gt;scaling to 10M+ tokens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spaces are the external environment, and code is how agents interact
with them.&lt;/strong&gt; Give agents a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;space&lt;/code&gt; capability object in a sandboxed
executor. Convention descriptions include code recipes. Agents adapt
and execute them:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Fan-out research with dynamic decomposition
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;findings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;space&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;initial landscape analysis&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;segments&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extract_segments&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;findings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;segment&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;segments&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;coordinator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Deep research on &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;segment&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;instructions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;research-guidelines&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;spaces&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;findings&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;append&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;web_search&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;web_fetch&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;New conventions don’t require new tools, only new code recipes. A
framework with fixed tools is limited by what the designers anticipated.
A framework with code and an environment is limited by what the agents
can express. That ceiling rises with every model generation.&lt;/p&gt;

&lt;h2 id=&quot;where-this-goes&quot;&gt;Where this goes&lt;/h2&gt;

&lt;p&gt;The architecture is designed but not yet implemented, though a few scenarios show what it enables:&lt;/p&gt;

&lt;p&gt;A morning briefing, assembled overnight. Cron-agent fires at 6am,
spawns research agents for your calendar, news, overnight emails. Each
writes to a shared briefing space. Synthesis agent compiles. When you
open the chat at 8am, it’s waiting.&lt;/p&gt;

&lt;p&gt;Or adaptive task decomposition. You ask for a competitive analysis.
The coordinator spawns one research agent, reads the initial findings,
realizes the landscape is broader than expected, spawns three more.
The task graph grew dynamically based on what the first agent found.
Next time, the same request might need only one researcher. The architecture
doesn’t care.&lt;/p&gt;

&lt;p&gt;The substrate’s correctness is almost trivial. The interesting part
starts when you let agents loose on it.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;Sutton, R. (2019). &lt;a href=&quot;http://www.incompleteideas.net/IncIdeas/BitterLesson.html&quot;&gt;The Bitter Lesson&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Wang, X. et al. (2024). &lt;a href=&quot;https://arxiv.org/abs/2402.01030&quot;&gt;Executable Code Actions Elicit Better LLM Agents&lt;/a&gt;. &lt;em&gt;ICML 2024&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;Zhang, T. et al. (2025). &lt;a href=&quot;https://arxiv.org/abs/2512.24601&quot;&gt;Recursive Language Models&lt;/a&gt;. MIT.&lt;/li&gt;
  &lt;li&gt;Prime Intellect. (2026). &lt;a href=&quot;https://www.primeintellect.ai/blog/rlm&quot;&gt;The Paradigm of 2026&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Hugging Face. (2025). &lt;a href=&quot;https://huggingface.co/blog/smolagents&quot;&gt;Introducing smolagents&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Erman, L.D. et al. (1980). The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty. &lt;em&gt;ACM Computing Surveys&lt;/em&gt;, 12(2).&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 16 Feb 2026 16:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/02/16/agents-and-spaces.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/02/16/agents-and-spaces.html</guid>
        
        <category>ai</category>
        
        <category>agents</category>
        
        <category>architecture</category>
        
        <category>llm</category>
        
        
        <category>engineering</category>
        
      </item>
    
  </channel>
</rss>
