<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>(parentheticals)</title>
<description>Thoughts and musings on software engineering, data science, and technology. Views expressed here are my own and do not represent those of my employers or organizations.
</description>
    <link>http://thirteen37.github.io/</link>
    <atom:link href="http://thirteen37.github.io/feed.xml" rel="self" type="application/rss+xml" />
    <pubDate>Fri, 10 Apr 2026 15:25:40 +0800</pubDate>
    <lastBuildDate>Fri, 10 Apr 2026 15:25:40 +0800</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Making Claude My Second Brain&apos;s Second Brain</title>
        <description>&lt;p&gt;There’s something satisfying about asking an AI a question about your own work and getting back an answer grounded in &lt;em&gt;your&lt;/em&gt; context. Not the internet’s context. Yours.&lt;/p&gt;

&lt;p&gt;It’s 8:45 AM. I have a product sync at 9, a 1:1 with a new team lead at 10, and a candidate interview at 11. I haven’t prepared for any of them. I open Claude and run my morning brief. It pulls the agenda from my calendar, searches my Obsidian vault for notes on each attendee, checks recent Confluence pages and email threads, and hands me three meeting briefs. The product sync brief shows what was carried over from last week. The 1:1 brief has the team lead’s recent projects and open questions from our last conversation. The interview brief pulls the candidate’s resume highlights alongside our rubric and even suggests a few questions about their specific experience.&lt;/p&gt;

&lt;p&gt;None of these lives in one place. It’s scattered across a calendar, an inbox, a wiki, and a thousand-odd markdown files. Claude stitched it together because it knows where to look and what to look for.&lt;/p&gt;

&lt;p&gt;How? Honestly, the interesting part isn’t the AI. It’s the notes.&lt;/p&gt;

&lt;h2 id=&quot;the-rag-is-dead-misread&quot;&gt;The “RAG is dead” misread&lt;/h2&gt;

&lt;p&gt;In early April 2026, Andrej Karpathy published his &lt;a href=&quot;https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f&quot;&gt;LLM Wiki&lt;/a&gt; approach: treat your knowledge base as a persistent, compounding artifact maintained by an LLM. Raw sources go in, the model synthesizes them into structured markdown pages with summaries, and the wiki grows richer over time. The post went viral.&lt;/p&gt;

&lt;p&gt;The popular takeaway was: context windows are big enough now, just throw your docs in. RAG is dead.&lt;/p&gt;

&lt;p&gt;It isn’t.&lt;/p&gt;

&lt;p&gt;Searching a few hundred Markdown files in a personal vault is fundamentally different from running a production chatbot over millions of documents. At personal scale, sure, you can stuff things into a context window. At production scale, the latency and token cost kill you. The people declaring RAG dead are generalizing from a toy setup.&lt;/p&gt;

&lt;p&gt;But Karpathy’s instinct is right, and the interesting question is &lt;em&gt;why&lt;/em&gt; flat RAG feels inadequate. Traditional RAG treats your knowledge as chunks in a bag. Every document gets split, embedded, and thrown into a vector store. At query time, you retrieve the top-K nearest chunks and hope for the best. There’s no structure. A chunk about a person and a chunk about their project sit in the same flat index with no connection between them. It’s like a library where every book has been shredded, and the pages shuffled together.&lt;/p&gt;

&lt;p&gt;The answer isn’t to remove retrieval. It’s to build on top of it. Add entities. Add relationships. Give the retrieval layer something to grab onto beyond raw text similarity. At personal scale, &lt;a href=&quot;https://github.com/tobi/qmd&quot;&gt;QMD&lt;/a&gt; with keyword and vector search is plenty. At larger scale, you still need a proper vector database, but the entity layer works either way.&lt;/p&gt;

&lt;p&gt;I went deep on this in a &lt;a href=&quot;/2026/03/12/how-i-built-ai-memory/&quot;&gt;previous post&lt;/a&gt; on building persistent AI memory with SurrealDB, where entity graphs and vector search work together. What follows here is the simpler, more practical version: how I structure an Obsidian vault so that an AI agent can actually use it.&lt;/p&gt;

&lt;h2 id=&quot;from-para-to-knowledge-graph&quot;&gt;From PARA to knowledge graph&lt;/h2&gt;

&lt;p&gt;I started with &lt;a href=&quot;https://fortelabs.com/blog/para/&quot;&gt;PARA&lt;/a&gt; (Projects, Areas, Resources, Archive), the standard Obsidian organizational system. It lasted about three months.&lt;/p&gt;

&lt;p&gt;PARA is a filing taxonomy. It tells you where to &lt;em&gt;put&lt;/em&gt; things, not how to &lt;em&gt;find&lt;/em&gt; them. When I asked Claude to pull context for a meeting, it had to know that the project lived under Projects, the attendee’s notes were in Areas, the relevant RFC was in Resources, and last quarter’s decision was in Archive. Four different places, organized by lifecycle stage rather than by what the information actually &lt;em&gt;is&lt;/em&gt;. For a human clicking through folders, fine. For an agent trying to assemble context programmatically, a nightmare.&lt;/p&gt;

&lt;p&gt;I switched to organizing by entity type: People, Teams, Projects, Services. Each entity gets a canonical page. The structure is flat, with no nesting beyond the top-level type directories.&lt;/p&gt;

&lt;p&gt;Entities are nodes, not files in a taxonomy. A person’s page links to their team, their projects, and the services they own. A project page links back to its people, its services, and its dependencies. The relationships are explicit, not implied by which folder something landed in.&lt;/p&gt;

&lt;h3 id=&quot;what-entity-pages-look-like&quot;&gt;What entity pages look like&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;People&lt;/strong&gt; have the richest frontmatter:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Organization&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[[Acme&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Corp]]&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Title&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Senior Engineer&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;Team&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;[[Platform&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Team]]&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;aliases&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;jsmith&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;John&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Sections: Work (current role, projects, focus areas), Collaborators (linked people with context), Notes &amp;amp; Observations, Personal (background, former companies).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Projects&lt;/strong&gt; follow a single generic structure: Summary, Goals, Documentation, People, Timeline. Minimal frontmatter, usually just aliases. The same skeleton works whether it’s a product initiative or a technical migration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Services&lt;/strong&gt; range from a short page (overview, limitations, relationship to similar services) to a detailed reference (capabilities, deployment options, compliance, pricing). I also keep pages for external products and services, stuff like Datadog or Terraform or whatever vendor we’re evaluating. I summarize the internet research and add notes on how we actually use it, what’s weird about our setup, and what broke. When Claude reasons about a tool, it gets &lt;em&gt;our&lt;/em&gt; context, not the marketing page. Minimal frontmatter across all of these.&lt;/p&gt;

&lt;p&gt;Common patterns matter more than specifics. Every entity page has YAML frontmatter with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aliases&lt;/code&gt; for flexible linking. Every page uses H2 sections as a consistent skeleton. Every page uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[[wiki-links]]&lt;/code&gt; for cross-references.&lt;/p&gt;

&lt;p&gt;These consistent shapes are what make agent search &lt;em&gt;reliable&lt;/em&gt;. When every person page has a “Work” section, and every project page has a “People” section, grep and semantic search hit predictably. The agent doesn’t need to understand your filing system. It just needs to know what to search for.&lt;/p&gt;
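
&lt;p&gt;A minimal sketch of what that buys, assuming a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vault/People&lt;/code&gt; directory of markdown pages (the path and section names here are illustrative): pulling every person’s “Work” section is deterministic code, no LLM in the loop.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pathlib import Path
import re

def read_section(page: Path, heading: str) -&gt; str | None:
    # Return the body of an H2 section, or None if the page lacks it.
    text = page.read_text(encoding='utf-8')
    match = re.search(rf'^## {re.escape(heading)}\n(.*?)(?=^## |\Z)',
                      text, re.MULTILINE | re.DOTALL)
    return match.group(1).strip() if match else None

# Because every person page has a 'Work' section, this hits predictably.
for page in Path('vault/People').glob('*.md'):
    work = read_section(page, 'Work')
    if work:
        print(page.stem, work[:80])
&lt;/code&gt;&lt;/pre&gt;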

&lt;h2 id=&quot;amortized-computation&quot;&gt;Amortized computation&lt;/h2&gt;

&lt;p&gt;Every bit of preprocessing you do on your vault pays off at query time, on every future query. You’re building a database index, except the database is your notes, and the queries come from an LLM.&lt;/p&gt;

&lt;p&gt;Entity pages are the obvious example: instead of the agent re-deriving “who is this person and what do they work on” from scattered notes each time, there’s a canonical page to land on. But the same principle applies to images without alt text (invisible to an LLM, so I enrich them with descriptions), static documents from Google Drive or PDFs (converted to markdown and saved in the vault, one format, one search index), and raw meeting transcripts (long and noisy, so a processed summary in the daily note is cheaper to retrieve and more useful when found).&lt;/p&gt;
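
&lt;p&gt;As a sketch of the kind of one-time pass I mean, here’s an audit for images with empty alt text, assuming standard markdown image syntax (the enrichment itself is one LLM vision call per hit, elided):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pathlib import Path
import re

# Markdown images with empty alt text: ![](attachment.png)
MISSING_ALT = re.compile(r'!\[\]\(([^)]+)\)')

def audit_alt_text(vault: Path) -&gt; list[tuple[Path, str]]:
    # (page, image) pairs that are currently invisible to an LLM.
    todo = []
    for page in vault.rglob('*.md'):
        for image in MISSING_ALT.findall(page.read_text(encoding='utf-8')):
            todo.append((page, image))
    return todo

# Each hit gets described once, up front, instead of every future
# query paying to work around an opaque attachment.
for page, image in audit_alt_text(Path('vault')):
    print(f'{page}: {image} needs alt text')
&lt;/code&gt;&lt;/pre&gt;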

&lt;p&gt;LLMs &lt;em&gt;can&lt;/em&gt; work without this structure. They just burn more tokens and take longer to get worse results.&lt;/p&gt;

&lt;h2 id=&quot;the-full-loop&quot;&gt;The full loop&lt;/h2&gt;

&lt;p&gt;With the vault structured, here’s what a typical day looks like.&lt;/p&gt;

&lt;h3 id=&quot;morning-brief&quot;&gt;Morning brief&lt;/h3&gt;

&lt;p&gt;A scheduled skill runs against my calendar, pulls today’s events, and for each meeting searches the vault, recent emails, and Confluence for relevant context. Out comes a per-meeting brief in my daily note.&lt;/p&gt;

&lt;p&gt;Different meeting types get different templates. A recurring status sync shows what was discussed last time and what’s still open. A 1:1 pulls up the person’s page, their recent activity, and any open threads between us. An interview pulls the candidate’s profile alongside our rubric.&lt;/p&gt;
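
&lt;p&gt;The dispatch behind this is simple. A sketch of the shape, with illustrative template names and a deliberately naive match on the calendar title:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative mapping from meeting type to what the brief pulls.
BRIEF_TEMPLATES = {
    'status_sync': ['last meeting summary', 'open action items'],
    'one_on_one':  ['person page', 'recent activity', 'open threads'],
    'interview':   ['candidate resume', 'interview rubric'],
}

def sections_for(event_title: str) -&gt; list[str]:
    title = event_title.lower()
    if 'interview' in title:
        return BRIEF_TEMPLATES['interview']
    if '1:1' in title or 'one-on-one' in title:
        return BRIEF_TEMPLATES['one_on_one']
    return BRIEF_TEMPLATES['status_sync']

print(sections_for('1:1 with new team lead'))
&lt;/code&gt;&lt;/pre&gt;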

&lt;p&gt;It’s not always right. Sometimes it surfaces stale context or misses something recent. But it’s a better starting point than walking in cold, and I can skim and correct in a minute.&lt;/p&gt;

&lt;h3 id=&quot;capture&quot;&gt;Capture&lt;/h3&gt;

&lt;p&gt;During meetings, I use &lt;a href=&quot;https://www.granola.so/&quot;&gt;Granola&lt;/a&gt; for transcription. After each meeting, the transcript gets summarized and inserted into the daily note.&lt;/p&gt;

&lt;p&gt;Then Claude does post-processing. The most valuable part is name resolution: the transcription says “John mentioned the migration timeline,” but &lt;em&gt;which&lt;/em&gt; John? Claude checks the entity graph. It knows the meeting was with the Platform Team, searches for people linked to that team, finds Jonathan Smith (Senior Engineer, Platform) and John Park (Data Analyst, Finance), and picks the right one based on context. Phonetic mismatches get the same treatment: the transcriber hears “Sean,” the team has a “Shawn,” and Claude resolves it.&lt;/p&gt;
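
&lt;p&gt;A sketch of the disambiguation, assuming people pages carry the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Team&lt;/code&gt; frontmatter shown earlier (the real resolution is Claude’s judgment, not this arithmetic):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def resolve_name(spoken, meeting_team, people):
    # people: parsed frontmatter dicts with name, team, aliases.
    def matches(p):
        names = [p['name'], p['name'].split()[0], *p.get('aliases', [])]
        return any(n.lower() == spoken.lower() for n in names)

    candidates = [p for p in people if matches(p)]
    # Prefer whoever is on the team the meeting was with.
    on_team = [p for p in candidates if p.get('team') == meeting_team]
    pool = on_team or candidates
    return pool[0] if len(pool) == 1 else None  # ambiguous: ask, do not guess

people = [
    {'name': 'Jonathan Smith', 'team': 'Platform Team', 'aliases': ['jsmith', 'John']},
    {'name': 'John Park', 'team': 'Finance', 'aliases': []},
]
print(resolve_name('John', 'Platform Team', people))  # Jonathan Smith
&lt;/code&gt;&lt;/pre&gt;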

&lt;h3 id=&quot;graph-update&quot;&gt;Graph update&lt;/h3&gt;

&lt;p&gt;After capture, Claude updates the entity graph. Mentioned people get their pages updated: a project status changed, someone took on a new responsibility, a decision was made. Projects and services get the same treatment.&lt;/p&gt;

&lt;p&gt;Unrecognized entities get flagged. If the transcript mentions a name that doesn’t match anyone in the vault, Claude asks: new person, or mistranscription? Usually, context settles it. “The new contractor on the data team” is probably someone new. “Michele from infrastructure” is probably Michael with a transcription error.&lt;/p&gt;

&lt;p&gt;The updates don’t need to be 100% correct. I review them, fix what’s wrong, and move on. Over time, the signal-to-noise ratio improves naturally as correct information gets reinforced and errors get corrected. It’s more wiki than database. Eventual consistency through volume and curation.&lt;/p&gt;

&lt;h3 id=&quot;enrichment&quot;&gt;Enrichment&lt;/h3&gt;

&lt;p&gt;The last stage runs asynchronously. Image attachments get enriched with alt text. External documents get converted to markdown. New entity pages get their skeleton filled in.&lt;/p&gt;

&lt;h2 id=&quot;under-the-hood&quot;&gt;Under the hood&lt;/h2&gt;

&lt;p&gt;Structure without search is a library with no catalog.&lt;/p&gt;

&lt;p&gt;I use &lt;a href=&quot;https://github.com/tobi/qmd&quot;&gt;QMD&lt;/a&gt;, a local search engine that indexes markdown files with both keyword and vector search. It connects to Claude via a skill, a reusable prompt that teaches Claude how to search and what the vault contains. I prefer skills over MCP servers for this: simpler to maintain, version-controlled as markdown, no running process.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart LR
    V[&quot;Obsidian Vault\na thousand+ markdown files&quot;] --&amp;gt; Q[&quot;QMD Index\nkeyword + vector&quot;]
    Q --&amp;gt; S[&quot;Skill\nsearch instructions&quot;]
    S --&amp;gt; C[&quot;Claude\nquery + reasoning&quot;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One thing worth calling out: Obsidian’s link graph, the backlinks and outbound links that make it powerful for human navigation, doesn’t matter much for agents. Humans click links to traverse relationships. Agents don’t click anything. They grep. They search semantically. The entity structure matters because it creates consistent search targets. This plays to coding agents’ strengths. They’re already wired to search for things, not browse for them.&lt;/p&gt;

&lt;p&gt;For meeting capture, I reverse-engineered Granola’s local cache to get the transcripts into my pipeline rather than going through their API. Local caches are often simpler and safer than API access, which might trigger rate limits or security flags. Use existing integrations when they exist; when they don’t, look at what’s already on disk. Profile photos get pulled from Google Meet screenshots and processed through an automated pipeline (resize, crop, adjust) then attached to the person’s page. Small touch, but it makes the vault more navigable when &lt;em&gt;I’m&lt;/em&gt; the one browsing.&lt;/p&gt;

&lt;p&gt;I built all of this as skills first, single-purpose prompts that each handle one step. As individual skills matured, I promoted them to specialized sub-agents that run in parallel, each in its own context window. Cheaper, faster, and easier to debug than cramming everything into one session. If I were starting over, I’d do it the same way: skills first, agents later.&lt;/p&gt;

&lt;h2 id=&quot;what-this-changes&quot;&gt;What this changes&lt;/h2&gt;

&lt;p&gt;My notes used to be a write-only archive. I’d take notes in meetings, file them somewhere reasonable, and never look at them again, buried under months of accumulated markdown. I used to spend hours cleaning up notes, adding links, and maintaining structure. That overhead is gone now.&lt;/p&gt;

&lt;p&gt;The vault is a working memory. Every meeting makes it a little smarter. The morning brief surfaces context I’d forgotten I had. The post-processing catches connections I’d have missed.&lt;/p&gt;

&lt;p&gt;It’s not perfect. Name resolution still trips up on mispronounced foreign names, some entity pages drift out of date, and the whole thing requires enough structure that you can’t just dump files in and expect magic. But it’s a different relationship with notes. They’re not records of what happened. They’re context for what’s about to happen.&lt;/p&gt;

&lt;p&gt;The gap between “AI can read my files” and “AI understands my work” turns out to be mostly a data structure problem. Context windows and retrieval get you partway there. The rest is in how you organize the vault.&lt;/p&gt;
</description>
        <pubDate>Thu, 09 Apr 2026 12:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/04/09/obsidian-claude.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/04/09/obsidian-claude.html</guid>
        
        <category>ai</category>
        
        <category>obsidian</category>
        
        <category>knowledge-management</category>
        
        <category>llm</category>
        
        <category>claude</category>
        
        
        <category>engineering</category>
        
      </item>
    
      <item>
        <title>Facts, Episodes, and a Knowledge Graph: Building Persistent AI Memory with SurrealDB</title>
        <description>&lt;p&gt;Every time you open a new chat with an AI assistant, it has forgotten everything about you. Your name, your preferences, what you were working on last week, the decision you made yesterday — gone. You start from scratch every single time.&lt;/p&gt;

&lt;p&gt;This bothered me enough that I built my own. Krill is my local-first AI assistant, and its memory system is the part I’ve spent the most time thinking about. Here’s how it works.&lt;/p&gt;

&lt;h2 id=&quot;the-shape-of-the-problem&quot;&gt;The Shape of the Problem&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://arxiv.org/abs/2309.02427&quot;&gt;CoALA framework&lt;/a&gt; applies cognitive science’s memory taxonomy to language agents: semantic memory (facts and concepts), episodic memory (past experiences), and procedural memory (how to do things). That framing maps cleanly onto what I wanted to build: an assistant that knows facts about you, remembers what happened between you, and understands how things relate.&lt;/p&gt;

&lt;p&gt;In concrete terms, I wanted five things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Persistence&lt;/strong&gt;: facts survive session restarts&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Relevance&lt;/strong&gt;: surface the right information, not everything at once&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Freshness&lt;/strong&gt;: older or contradicted facts should matter less over time&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Relationships&lt;/strong&gt;: “you use Neovim” and “you edit code daily” are connected, not isolated&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Local-first&lt;/strong&gt;: no cloud sync, no third-party data store, user owns their data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These requirements pull against each other in annoying ways. Keeping everything conflicts with expiring things. Being comprehensive conflicts with being selective. The interesting design work is in reconciling them.&lt;/p&gt;

&lt;h2 id=&quot;extracting-facts-from-conversation&quot;&gt;Extracting Facts from Conversation&lt;/h2&gt;

&lt;p&gt;After every conversation turn, I fire off an asynchronous LLM call to extract structured facts from what was just said. It runs in the background; the user never waits for it.&lt;/p&gt;

&lt;p&gt;The extraction prompt asks for facts in a specific format. Each fact comes back with a content string (“user prefers Python over JavaScript”), a category (one of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;preferences&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;personal&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;work&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;other&lt;/code&gt;), a confidence level (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;explicit&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implied&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inferred&lt;/code&gt;), and boolean flags for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;emotional_signal&lt;/code&gt; that I use later for episode synthesis.&lt;/p&gt;

&lt;p&gt;The confidence levels matter. An explicit statement like “I prefer Python” gets 0.9. Something implied by context gets 0.7. A weak inference gets 0.5. Facts below 0.5 are dropped immediately.&lt;/p&gt;
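
&lt;p&gt;For concreteness, a sketch of the record each extraction returns (field names follow the description above; the container is my framing, not necessarily Krill’s actual types):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass

CONFIDENCE = {'explicit': 0.9, 'implied': 0.7, 'inferred': 0.5}

@dataclass
class ExtractedFact:
    content: str                  # 'user prefers Python over JavaScript'
    category: str                 # identity | preferences | personal | work | other
    confidence: str               # explicit | implied | inferred
    decision_point: bool = False  # feeds episode synthesis later
    emotional_signal: bool = False

    @property
    def score(self) -&gt; float:
        return CONFIDENCE[self.confidence]

fact = ExtractedFact('user prefers Python over JavaScript', 'preferences', 'explicit')
assert fact.score == 0.9  # anything scoring below 0.5 is dropped
&lt;/code&gt;&lt;/pre&gt;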

&lt;h3 id=&quot;deduplication&quot;&gt;Deduplication&lt;/h3&gt;

&lt;p&gt;Before writing a new fact, I embed it with a 384-dimensional sentence embedding model and compare it against existing facts via cosine similarity. Three zones, three outcomes:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Similarity&lt;/th&gt;
      &lt;th&gt;Action&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&amp;gt; 0.85&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Reinforce&lt;/strong&gt; — increment count, update timestamp, add session to source list&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;0.4 – 0.85&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Conflict check&lt;/strong&gt; — ask the LLM: does this contradict?&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&amp;lt; 0.4&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Write&lt;/strong&gt; — clearly new information&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;flowchart TD
    A([New fact]) --&amp;gt; B[Embed into vector]
    B --&amp;gt; C[Cosine similarity search\nagainst existing facts]
    C --&amp;gt; D{Similarity score?}
    D --&amp;gt;|&quot;&amp;gt; 0.85&quot;| E[Reinforce\nincrement count · update timestamp]
    D --&amp;gt;|&quot;0.4 – 0.85&quot;| F[Conflict check\nLLM: does this contradict?]
    D --&amp;gt;|&quot;&amp;lt; 0.4&quot;| G([Write new fact])
    F --&amp;gt;|Yes| H[Supersede old fact]
    H --&amp;gt; G
    F --&amp;gt;|No| G
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The conflict check is where things get interesting. A hard threshold can’t handle nuanced contradictions. “Uses Mac” and “switched to Linux for work” aren’t duplicates, but they’re in tension. I pass both facts to the LLM and ask: does the new fact contradict the old one, and if so which should survive? The answer depends on context only the LLM can evaluate.&lt;/p&gt;
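
&lt;p&gt;The gate itself is small; a sketch with the LLM call stubbed out as a callable:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def dedup_action(similarity, contradicts):
    # contradicts: zero-argument callable that asks the LLM whether the
    # pair is in tension; only invoked in the ambiguous middle zone.
    if similarity &gt; 0.85:
        return 'reinforce'   # same fact restated: bump count, refresh timestamp
    if similarity &gt;= 0.4:
        return 'supersede' if contradicts() else 'write'  # supersede, then write
    return 'write'           # clearly new information

print(dedup_action(0.92, lambda: False))  # reinforce
print(dedup_action(0.60, lambda: True))   # supersede
&lt;/code&gt;&lt;/pre&gt;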

&lt;h2 id=&quot;not-all-facts-are-equal&quot;&gt;Not All Facts Are Equal&lt;/h2&gt;

&lt;p&gt;A fact mentioned once in passing shouldn’t carry the same weight as one repeated a dozen times over six months. &lt;a href=&quot;https://arxiv.org/abs/2305.10250&quot;&gt;MemoryBank&lt;/a&gt; applies Ebbinghaus’s forgetting curve to LLM memory, where memories decay over time and get reinforced on re-encounter. I took the same idea and made it explicit with a trust score:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;trust = confidence × recency × boost
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;confidence&lt;/strong&gt;: the extraction-time score (0.5, 0.7, or 0.9)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;recency&lt;/strong&gt;: decays with a 90-day half-life — &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.5^(days_since_reinforcement / 90)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;boost&lt;/strong&gt;: log-scale reward for repeated confirmation — &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min(1.0 + 0.1 × log₂(1 + reinforcement_count), 1.5)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
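
&lt;p&gt;In code, with the constants exactly as above:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

def trust(confidence, days_since_reinforcement, reinforcement_count):
    recency = 0.5 ** (days_since_reinforcement / 90)  # 90-day half-life
    boost = min(1.0 + 0.1 * math.log2(1 + reinforcement_count), 1.5)
    return confidence * recency * boost

print(trust(0.7, 0, 0))    # fresh medium-confidence fact: 0.70
print(trust(0.7, 0, 5))    # just reinforced a fifth time: ~0.88
print(trust(0.9, 365, 0))  # untouched for a year: ~0.05
&lt;/code&gt;&lt;/pre&gt;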

&lt;p&gt;&lt;img src=&quot;/images/trust-decay.png&quot; alt=&quot;Trust score decay curves for three fact types over 365 days, showing high-confidence facts decaying slowly, medium-confidence facts holding steady with reinforcement events, and low-confidence facts dropping below the pruning threshold&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A brand-new medium-confidence fact starts at 0.7. That same fact, reinforced five times over two months, climbs toward 1.0. A fact not mentioned for a year drops toward irrelevance. Trust scores also gate conflict resolution: if an existing fact has a trust score more than 1.5× higher than the incoming one, the old fact survives. You don’t replace a well-established belief with a low-confidence passing mention.&lt;/p&gt;

&lt;h2 id=&quot;getting-the-right-facts-into-context&quot;&gt;Getting the Right Facts into Context&lt;/h2&gt;

&lt;p&gt;Writing facts is the easy part. The hard part is retrieval: given what the user is saying right now, which facts should the assistant know about?&lt;/p&gt;

&lt;p&gt;I inject memory in two layers. First, category summaries: for each category (identity, preferences, work, etc.), I maintain an LLM-synthesized narrative that gets rewritten whenever five or more new facts arrive. These go in first and give broad strokes. Then the top five facts retrieved by semantic similarity to the current user message, re-ranked by trust score, for specific detail.&lt;/p&gt;

&lt;p&gt;Semantic search alone has a well-known failure mode: it finds facts that are topically nearby but misses exact keyword matches. If the user mentions a specific name or acronym, a dense embedding may not surface the right fact. I combine vector search with full-text search and merge the two ranked lists using reciprocal rank fusion (RRF). Neither list needs to be perfect — RRF is robust to noise in either source, and facts that rank highly in both float to the top.&lt;/p&gt;
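
&lt;p&gt;RRF itself is a few lines. A sketch (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k=60&lt;/code&gt; is the conventional constant from the original RRF paper, not necessarily the value my code uses):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def rrf(ranked_lists, k=60):
    # Merge ranked result lists; items high in several lists float up.
    scores = {}
    for results in ranked_lists:
        for rank, item in enumerate(results):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ['fact:python', 'fact:editor', 'fact:krill']
keyword_hits = ['fact:krill', 'fact:python']
print(rrf([vector_hits, keyword_hits]))  # python and krill rise to the top
&lt;/code&gt;&lt;/pre&gt;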

&lt;p&gt;When the memory store grows large, a lightweight LLM selector runs first to filter candidates, ranking by trust and picking only the ones actually relevant to the current query rather than blindly taking the top-N by embedding distance.&lt;/p&gt;

&lt;h2 id=&quot;two-harder-problems&quot;&gt;Two Harder Problems&lt;/h2&gt;

&lt;p&gt;Facts answer “what do I know about you.” But two questions remain: “what happened between us?” and “what is related to X?” Those need different structures.&lt;/p&gt;

&lt;h3 id=&quot;episodic-memory-what-happened&quot;&gt;Episodic Memory: What Happened&lt;/h3&gt;

&lt;p&gt;Imagine you spent a session debugging a tricky race condition and ultimately decided to rewrite a module. The facts extracted might be: “user is debugging a concurrency bug,” “user works on Krill,” “user prefers explicit concurrency over async abstractions.” Accurate, but they’ve lost the event. The diagnosis, the pivot, the three hours of back-and-forth that led to the decision.&lt;/p&gt;

&lt;p&gt;Semantic search on those facts won’t recover it. If you ask about that module next week, the search finds facts close to your query. It won’t surface “we debugged a race condition and decided to rewrite” because that sentence doesn’t exist in the store. It lived between the facts.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://arxiv.org/abs/2304.03442&quot;&gt;Generative Agents paper&lt;/a&gt; showed how to handle this: record experiences as a memory stream, then synthesize them into higher-level summaries. I apply the same idea to a personal assistant. When a session produces enough signal (at least three facts, or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; flag), I fire an LLM pass over the session’s conversation history and produce a short narrative. Something like: “Debugged a race condition in the gateway’s async loop. After ruling out the scheduler and lock ordering, found the issue was a missing await on a shared resource. Decided to rewrite the module with explicit locking rather than patching the existing code.”&lt;/p&gt;

&lt;p&gt;That narrative gets stored separately from facts and injected as “Relevant Experiences.” Unlike facts, episodes are surfaced by recency as well as semantic similarity, because the most recent session is almost always relevant even if the topic has shifted.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; flag does real work here on two levels. At extraction time it’s a boolean on individual facts, marking that something was a consequential choice. This lets a session trigger episode synthesis even if it produced only one or two facts — a session where you made one important architectural decision is worth narrating even if not much else was said. In the synthesized episode, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt; becomes a string: the single most important choice or pivot from the session. This gives the assistant a specific handle on what was decided, not just that something happened.&lt;/p&gt;
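
&lt;p&gt;The trigger condition, sketched:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def should_synthesize_episode(session_facts):
    # Narrate a session if it produced enough signal: at least three
    # facts, or any fact flagged as a consequential decision.
    return (len(session_facts) &gt;= 3
            or any(f.get('decision_point') for f in session_facts))

print(should_synthesize_episode([{'decision_point': True}]))  # True
&lt;/code&gt;&lt;/pre&gt;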

&lt;h3 id=&quot;knowledge-graph-whats-related&quot;&gt;Knowledge Graph: What’s Related&lt;/h3&gt;

&lt;p&gt;The second problem is structural. Semantic search retrieves facts that are textually similar to the current query. It fails when relevance is relational rather than lexical.&lt;/p&gt;

&lt;p&gt;Suppose you ask about your deployment pipeline. Semantic search finds deployment-related facts. It won’t find Adam. But suppose the memory contains “Worked on Terraform with Adam.” The graph has encoded Adam as a person with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;worked_on&lt;/code&gt; relation to Terraform, and Terraform with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements&lt;/code&gt; relation to the deployment pipeline. Expand one hop from “deployment pipeline” and Terraform surfaces. Expand one more and Adam surfaces. Neither connection is a text match to your query; it’s pure graph structure.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    DP[&quot;**Deployment Pipeline**\n〈project〉&quot;]
    TF[&quot;Terraform\n〈tool〉&quot;]
    AD[&quot;Adam\n〈person〉&quot;]

    TF --&amp;gt;|implements| DP
    AD --&amp;gt;|worked_on| TF

    style DP fill:#4a90d9,color:#fff,stroke:#2c6fad
    style TF fill:#7cb8e8,color:#222,stroke:#4a90d9
    style AD fill:#b3d8f5,color:#222,stroke:#7cb8e8
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every five new facts, I trigger a graph extraction pass. The LLM reads the accumulated facts and produces entities typed as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;person&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tool&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concept&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;place&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;organization&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;project&lt;/code&gt;, plus directed typed relations: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uses&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;works_at&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;collaborates_with&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;created&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;skilled_in&lt;/code&gt;, and so on. The relations aren’t just labels — each edge carries the confidence of the source fact, the timestamp it was extracted, and which session it came from. That means when I traverse the graph to answer a query, I can weight edges by how reliable and recent they are, not just whether they exist.&lt;/p&gt;

&lt;p&gt;One thing that surprised me: entities need deduplication just like facts. “Neovim”, “nvim”, and “that editor I use” all refer to the same thing. I run the same cosine similarity check against existing entities (threshold 0.85) and collapse them to a single node. Without this, the graph fragments across surface forms and loses its connective value.&lt;/p&gt;

&lt;p&gt;Relations also drift. The LLM invents its own types when the seed taxonomy doesn’t quite fit: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;employs&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;works_with&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;relies_on&lt;/code&gt; alongside the canonical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uses&lt;/code&gt;. I run a periodic normalization pass that merges near-synonym types back to seeds via another LLM call, keeping the graph navigable.&lt;/p&gt;

&lt;h2 id=&quot;prior-art&quot;&gt;Prior Art&lt;/h2&gt;

&lt;p&gt;I’m not working in a vacuum. &lt;a href=&quot;https://arxiv.org/abs/2310.08560&quot;&gt;MemGPT&lt;/a&gt;, now productized as &lt;a href=&quot;https://www.letta.com&quot;&gt;Letta&lt;/a&gt;, pioneered one approach: agents explicitly manage their own memory via tool calls, deciding what to store and retrieve. It’s an interesting design, but it puts memory management in the conversation layer. I went the opposite direction: memory extraction is a background harness service, invisible to the agent, so the conversation stays focused on the task.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://mem0.ai&quot;&gt;Mem0&lt;/a&gt; is the closest open-source analogue to what I built: fact extraction, deduplication, and retrieval as a standalone memory layer, well-adopted and actively developed. The main differences are that I add trust scoring with reinforcement (near-duplicate facts strengthen existing atoms rather than being silently dropped), episodic synthesis, and a knowledge graph, and keep everything local.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/getzep/graphiti&quot;&gt;Zep’s Graphiti&lt;/a&gt; takes a different angle on the knowledge graph: facts have temporal validity windows (valid from time T, superseded at time T+N) rather than a continuous decay score. Different formalization of the same intuition. Worth reading their &lt;a href=&quot;https://arxiv.org/abs/2501.13956&quot;&gt;paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;why-surrealdb&quot;&gt;Why SurrealDB&lt;/h2&gt;

&lt;p&gt;I needed document storage, vector similarity search, full-text search, and graph edges in a single database that runs embedded with no server process. That last requirement eliminates most options.&lt;/p&gt;

&lt;p&gt;Typically this means three or four systems: a document store, Qdrant for vectors, Elasticsearch for full-text, Neo4j for the graph. Coordinating them locally is painful. &lt;a href=&quot;https://surrealdb.com&quot;&gt;SurrealDB&lt;/a&gt; handles all of them. It runs in embedded mode (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;surrealkv://&lt;/code&gt;) with the entire memory system as a single file on disk.&lt;/p&gt;

&lt;p&gt;Semantic search runs against a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MTREE&lt;/code&gt; vector index with cosine similarity on 384-dim embeddings. Full-text search runs against a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SEARCH&lt;/code&gt; index on the same table, and the two result sets are merged via RRF before re-ranking by trust.&lt;/p&gt;

&lt;p&gt;Knowledge graph edges are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TYPE RELATION&lt;/code&gt; tables, where each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RELATE&lt;/code&gt; statement creates a first-class edge record that can carry arbitrary metadata — confidence, timestamp, source session — not just a label. Traversal is also bidirectional: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-&amp;gt;references-&amp;gt;&lt;/code&gt; follows edges forward, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;-references&lt;/code&gt; follows them in reverse. That means I can ask “what does Terraform implement?” and “what uses Terraform?” from the same edge table without duplicating data.&lt;/p&gt;
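
&lt;p&gt;To make that concrete, a sketch of the SurrealQL shapes involved, wrapped in Python strings (statement forms follow SurrealDB’s documented syntax; table, field, and analyzer names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Vector and full-text indexes on the same table.
DEFINE_INDEXES = '''
DEFINE INDEX atom_vec ON TABLE atom FIELDS embedding
    MTREE DIMENSION 384 DIST COSINE;
DEFINE INDEX atom_txt ON TABLE atom FIELDS content
    SEARCH ANALYZER simple BM25;  -- analyzer assumed defined elsewhere
'''

# Edge tables are declared as relations.
DEFINE_EDGE = 'DEFINE TABLE implements TYPE RELATION;'

# RELATE creates a first-class edge record that carries metadata.
ADD_EDGE = '''
RELATE tool:terraform-&gt;implements-&gt;project:deployment_pipeline
    SET confidence = 0.9, extracted_at = time::now(), session = $session;
'''

# Same edge tables, both directions, no duplicated data.
FORWARD = 'SELECT -&gt;implements-&gt;project FROM tool:terraform;'
REVERSE = 'SELECT &lt;-worked_on&lt;-person FROM tool:terraform;'
&lt;/code&gt;&lt;/pre&gt;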

&lt;p&gt;Facts, episodes, entity nodes, and category summaries all land in the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;atom&lt;/code&gt; table but carry completely different metadata shapes — a fact has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;confidence&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;category&lt;/code&gt;, an entity has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;entity_type&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aliases&lt;/code&gt;, an episode has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;narrative&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decision_point&lt;/code&gt;. SurrealDB’s schemaless documents mean each atom carries exactly the fields it needs, without padding every row with null columns or splitting atom types across separate tables.&lt;/p&gt;

&lt;p&gt;SurrealDB is younger and less battle-tested than Postgres, and SurrealQL has a learning curve. But for local-first where deployment simplicity matters more than horizontal scaling, it’s been the right call.&lt;/p&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;Add trust scoring early. I bolted it on later and it touched everything: ranking, conflict gating, expiry. The formula takes microseconds; there’s no good reason to wait.&lt;/p&gt;

&lt;p&gt;Use an LLM for conflict resolution. My first instinct was a rule: if similarity is above some threshold and the new fact contradicts the old one, overwrite. But “contradicts” isn’t binary. “Switched to Linux for work” and “uses Mac” are both true in different contexts. A rule gets this wrong; the LLM generally doesn’t.&lt;/p&gt;

&lt;p&gt;Episodic memory and facts serve different cognitive roles and you can’t derive one from the other. Facts are what the assistant knows about you; episodes are what happened between you. I assumed episodes would fall out of the fact store naturally. They don’t. They need their own extraction pipeline, their own storage, their own retrieval logic. Conflating them is a design mistake.&lt;/p&gt;

&lt;p&gt;Category summaries saved me when the fact store got large. Semantic retrieval alone misses context that isn’t textually close to the query. Summaries give a fallback that’s always relevant regardless of what was asked.&lt;/p&gt;

&lt;p&gt;Keep the database layer dumb. Every time I put logic there I regretted it. The intelligence is in the pipeline.&lt;/p&gt;
</description>
        <pubDate>Thu, 12 Mar 2026 12:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/03/12/how-i-built-ai-memory.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/03/12/how-i-built-ai-memory.html</guid>
        
        <category>ai</category>
        
        <category>memory</category>
        
        <category>architecture</category>
        
        <category>llm</category>
        
        <category>surrealdb</category>
        
        
        <category>engineering</category>
        
      </item>
    
      <item>
        <title>Agents and Spaces: A Minimal Architecture for Multi-Agent Coordination</title>
        <description>&lt;p&gt;I’ve been building a local-first personal AI assistant, something in
the vein of &lt;a href=&quot;https://www.openinterpreter.com/&quot;&gt;Open Interpreter&lt;/a&gt; and
&lt;a href=&quot;https://docs.openclaw.ai&quot;&gt;OpenClaw&lt;/a&gt;, and I hit a wall that I think
a lot of people are hitting right now.&lt;/p&gt;

&lt;p&gt;The single-agent loop works surprisingly well. User says something, LLM
reasons, tools execute, repeat. Wrap it in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;while true&lt;/code&gt; and you’ve got
the &lt;a href=&quot;https://ghuntley.com/ralph/&quot;&gt;Ralph Wiggum loop&lt;/a&gt;. You can get a lot
done this way. But a personal assistant that’s actually &lt;em&gt;useful&lt;/em&gt; needs
to do many things at
once. Research a topic while drafting a document. Process emails while
you’re chatting. Run a scheduled task overnight and have the results
ready in the morning. One agent, one context window, one turn at a
time. It doesn’t scale.&lt;/p&gt;

&lt;p&gt;So, multi-agent, obviously, but the more I looked at existing
frameworks, the less I liked what I saw.&lt;/p&gt;

&lt;h2 id=&quot;whats-broken&quot;&gt;What’s broken&lt;/h2&gt;

&lt;p&gt;Here are real problems I ran into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-channel continuity.&lt;/strong&gt; You mention a project in chat. An email
arrives about the same project. The chat agent and the email agent have
no shared context. In frameworks like
&lt;a href=&quot;https://www.crewai.com/&quot;&gt;CrewAI&lt;/a&gt; or
&lt;a href=&quot;https://microsoft.github.io/autogen/&quot;&gt;AutoGen&lt;/a&gt;, agents communicate
through prescribed channels, but there’s no unified content store where
both can discover they’re working on the same thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background work blocks the conversation.&lt;/strong&gt; Ask your assistant to
research something that takes 30 seconds. In a single-agent loop,
you’re staring at a spinner. Can’t ask a follow-up, can’t switch
topics. The context window is busy. Multi-agent helps, but most
frameworks want you to declare the topology upfront. “This is the
research agent, this is the chat agent, here’s how they talk.” What if
the next question &lt;em&gt;doesn’t&lt;/em&gt; need research?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deferred actions lose their context.&lt;/strong&gt; “Remind me tomorrow at 9am” is
easy to &lt;em&gt;schedule&lt;/em&gt; but hard to &lt;em&gt;execute well&lt;/em&gt;. The reminder needs to
carry the conversational context it was created in, like a closure
capturing its environment. But the cron trigger and the chat agent are
separate systems. The deferred action fires in a vacuum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust is all-or-nothing.&lt;/strong&gt; You want a research agent that can browse
the web but can’t see your private files. Most frameworks either give
agents full access to everything (YOLO mode), or make you build
separate permission systems per tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent roles are premature abstractions.&lt;/strong&gt; Frameworks ask you to define
types upfront: Researcher, Writer, Reviewer, Coder. But real tasks
don’t decompose neatly into roles. “Plan my trip to Tokyo” needs web
research, calendar access, budget calculations, and document drafting,
in an order that depends on what the research turns up. A fixed role
graph can’t adapt.&lt;/p&gt;

&lt;p&gt;Any personal assistant that handles more than one thing at a time hits
all of these.&lt;/p&gt;

&lt;h2 id=&quot;the-bitter-lesson-applied&quot;&gt;The bitter lesson, applied&lt;/h2&gt;

&lt;p&gt;Rich Sutton’s
&lt;a href=&quot;http://www.incompleteideas.net/IncIdeas/BitterLesson.html&quot;&gt;Bitter Lesson&lt;/a&gt;:
methods that leverage general computation consistently outperform
methods that encode human knowledge. The history of AI is littered with
hand-crafted heuristics that worked until general-purpose approaches
scaled past them.&lt;/p&gt;

&lt;p&gt;Multi-agent systems are repeating this mistake. Current frameworks
encode coordination strategies (“this agent is the planner, that one
is the executor, they communicate through this protocol”) as if we
know what optimal coordination looks like. We don’t. LLMs are improving
fast enough that any fixed topology will be obsolete within a year.&lt;/p&gt;

&lt;p&gt;So I went the other direction: minimal concepts, zero prescribed
workflows, coordination that emerges from how agents use the building
blocks.&lt;/p&gt;

&lt;h2 id=&quot;the-building-blocks&quot;&gt;The building blocks&lt;/h2&gt;

&lt;p&gt;The architecture has three concepts.&lt;/p&gt;

&lt;h3 id=&quot;agents&quot;&gt;Agents&lt;/h3&gt;

&lt;p&gt;An agent is an LLM with a context window, some tools, and access to
shared content.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Component&lt;/th&gt;
      &lt;th&gt;What it is&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Which LLM (Sonnet, Haiku, Opus, a local model)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Instructions&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;How to behave, what to prioritize, what format to use&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;What the agent can do (web search, file access, code execution)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Session&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;The context window: working memory and turn history&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Connected spaces&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Shared content stores the agent can read from and write to&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;No classes, no role enums. What makes an agent a “research agent” vs. a
“writing agent” is the instructions it got and the spaces it can see.&lt;/p&gt;

&lt;p&gt;The session &lt;em&gt;is&lt;/em&gt; the agent’s identity. Creating a session = the agent is born. Wiping
it = the agent is dead. Same instructions, fresh session, different
agent. And spaces outlive agents. Whatever an agent wrote to a space is
still there after it’s gone.&lt;/p&gt;

&lt;p&gt;Agents can &lt;strong&gt;spawn&lt;/strong&gt; other agents, passing along a task, instructions,
and explicit access boundaries. The spawner controls what the new agent
can see and do. Everything else (how to decompose the work, whether to
spawn further, how to coordinate) is the new agent’s problem.&lt;/p&gt;

&lt;h3 id=&quot;spaces&quot;&gt;Spaces&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;space&lt;/strong&gt; is a shared, access-controlled content store. Agents read
and write &lt;strong&gt;atoms&lt;/strong&gt;, small units of content (a task, a message, a
research finding, a document section) that carry metadata and can
reference each other. Content is semantically searchable. Changes
trigger notifications to subscribers.&lt;/p&gt;

&lt;p&gt;There is no separate messaging system. A task assignment is an atom
update. A status report is a new atom. A question to a collaborator is
a message in a shared space. All communication is content in spaces.&lt;/p&gt;

&lt;h3 id=&quot;conventions&quot;&gt;Conventions&lt;/h3&gt;

&lt;p&gt;Everything above the raw mechanics is a &lt;strong&gt;convention&lt;/strong&gt;, a usage
pattern taught through instructions, not enforced by code.&lt;/p&gt;

&lt;p&gt;A “task list” isn’t a special type of space. It’s a regular space where
agents follow the task list convention: atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;owner&lt;/code&gt;
metadata, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt; references encode dependencies, agents scan for
unblocked pending tasks. A “chat” is a space where atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;role&lt;/code&gt;
metadata and a UI renders them. A “knowledge base” is a space where
atoms carry &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;confidence&lt;/code&gt; scores and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supports&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;contradicts&lt;/code&gt; references.&lt;/p&gt;

&lt;p&gt;Same spaces, same operations, different conventions.&lt;/p&gt;
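
&lt;p&gt;A sketch of the task list convention as plain data, with the scan
it enables (the dict shapes are illustrative, not the real atom API):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# A 'task list' is just atoms whose metadata follows a convention.
atoms = [
    {'id': 1, 'content': 'Design the export format',
     'metadata': {'status': 'done', 'owner': 'researcher'}, 'references': {}},
    {'id': 2, 'content': 'Implement the exporter',
     'metadata': {'status': 'pending', 'owner': None},
     'references': {'blocked_by': [1]}},
]

def unblocked_pending(atoms):
    # The convention: pending atoms whose blockers are all done.
    done = {a['id'] for a in atoms if a['metadata'].get('status') == 'done'}
    return [a for a in atoms
            if a['metadata'].get('status') == 'pending'
            and set(a['references'].get('blocked_by', [])) &lt;= done]

print(unblocked_pending(atoms))  # atom 2: its only blocker is done
&lt;/code&gt;&lt;/pre&gt;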

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph TB
    subgraph CON[&quot;Conventions (taught, not enforced)&quot;]
        TL[&quot;Task Lists&quot;] ~~~ CH[&quot;Channels&quot;] ~~~ CT[&quot;Chats&quot;] ~~~ DOC[&quot;Documents&quot;]
    end

    subgraph CORE[&quot;Core&quot;]
        S[&quot;Spaces&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;shared content + permissions +&amp;lt;br/&amp;gt;search + subscriptions&amp;lt;/small&amp;gt;&quot;]
        A[&quot;Agents&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;LLM + instructions + tools +&amp;lt;br/&amp;gt;session + connected spaces&amp;lt;/small&amp;gt;&quot;]
    end

    A --&amp;gt;|&quot;spawn&quot;| A
    A &amp;lt;--&amp;gt;|&quot;read / write&amp;lt;br/&amp;gt;atoms&quot;| S
    CON -.-|&quot;patterns built on&quot;| CORE

    style A fill:#4a6fa5,color:#fff
    style S fill:#e8d44d,color:#333
    style CON fill:#6b8cae,color:#fff
    style CORE fill:#f5f5f5,color:#333
    style TL fill:#93afc5,color:#333
    style CH fill:#93afc5,color:#333
    style CT fill:#93afc5,color:#333
    style DOC fill:#93afc5,color:#333
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An agent that can create other agents and share content stores with them
can express any coordination pattern (pipelines, fan-out, debates,
priority queues) without the architecture prescribing any of them.&lt;/p&gt;

&lt;h2 id=&quot;spawning-and-trust&quot;&gt;Spawning and trust&lt;/h2&gt;

&lt;p&gt;When an agent spawns another, it defines the trust boundary:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;spawn(
    task:         &quot;What to do&quot; (natural language),
    instructions: &quot;How to behave&quot; (reference to an instruction space),
    spaces:       { space_id: permission },
    tools:        [...],
    secrets:      [...],
    model:        &quot;which LLM&quot;
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The spawner says what the new agent can access and gives it a task,
with no role assignment, topology declaration, or workflow step number.&lt;/p&gt;

&lt;h3 id=&quot;trust-narrows-monotonically&quot;&gt;Trust narrows monotonically&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Privileges can only narrow down the spawn tree, never widen.&lt;/strong&gt;
Spaces, tools, secrets all narrow monotonically. No privilege
escalation without going back to a coordinator.&lt;/p&gt;
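
&lt;p&gt;Monotonic narrowing is cheap to check. A sketch, treating an access
grant as a space-to-permission map (names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;RANK = {'read': 0, 'append': 1, 'write': 2, 'grant': 3}

def narrowed(parent, child):
    # Valid spawn: every space the child gets, the parent holds at an
    # equal or higher level. Anything else is privilege escalation.
    return all(space in parent and RANK[level] &lt;= RANK[parent[space]]
               for space, level in child.items())

parent = {'research': 'write', 'docs': 'read'}
assert narrowed(parent, {'research': 'read'})   # narrower: allowed
assert not narrowed(parent, {'docs': 'write'})  # widening: rejected
&lt;/code&gt;&lt;/pre&gt;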

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph TD
    C[&quot;Coordinator&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;all tools, all spaces, all secrets&amp;lt;/small&amp;gt;&quot;]
    R[&quot;Research Agent&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space&amp;lt;/small&amp;gt;&quot;]
    W[&quot;Writer Agent&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;file tools, doc space&amp;lt;/small&amp;gt;&quot;]
    S1[&quot;Sub-researcher&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space (read only)&amp;lt;/small&amp;gt;&quot;]
    S2[&quot;Sub-researcher&amp;lt;br/&amp;gt;&amp;lt;small&amp;gt;web tools, research space (read only)&amp;lt;/small&amp;gt;&quot;]

    C --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows access)&quot;| R
    C --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows access)&quot;| W
    R --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows further)&quot;| S1
    R --&amp;gt;|&quot;spawn&amp;lt;br/&amp;gt;(narrows further)&quot;| S2

    style C fill:#4a6fa5,color:#fff
    style R fill:#6b8cae,color:#fff
    style W fill:#6b8cae,color:#fff
    style S1 fill:#93afc5,color:#333
    style S2 fill:#93afc5,color:#333
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every spawn is a trust boundary. Give a research agent web access but
no PII access: a data diode where information flows in one direction.
The spawner defines the boundary; the spawned agent operates within it.&lt;/p&gt;

&lt;h3 id=&quot;but-sometimes-a-child-needs-more&quot;&gt;But sometimes a child needs more&lt;/h3&gt;

&lt;p&gt;Real tasks don’t always fit the initial access grant. A research agent
discovers it needs to read a private document. A code agent realizes it
needs database credentials.&lt;/p&gt;

&lt;p&gt;The mechanism: ask the coordinator. Every sub-agent has a DM space
shared with its coordinator. It writes a request. The coordinator
evaluates whether the agent’s task justifies the access and grants or
denies. Privileges widen only through an explicit grant from an agent
with sufficient permission, like a manager approving an access request.&lt;/p&gt;

&lt;p&gt;No special permission API needed, just agents communicating through a space.&lt;/p&gt;

&lt;h3 id=&quot;identity-is-not-a-type&quot;&gt;Identity is not a type&lt;/h3&gt;

&lt;p&gt;Agent identity is shaped by instructions, not declared. The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;instructions&lt;/code&gt; field references a space containing behavioral guidance.
This could be a well-known default (“research-guidelines”), a fork
customized for this task, something created from scratch, or (if the
agent has write access) &lt;strong&gt;refined by the agent itself&lt;/strong&gt; over time.&lt;/p&gt;

&lt;p&gt;Pre-defined agent templates become seed content, not a locked-in
registry. Fork “research-guidelines,” tweak it for biotech, spawn a
specialized agent. All at runtime, no code changes.&lt;/p&gt;

&lt;h2 id=&quot;inside-a-space&quot;&gt;Inside a space&lt;/h2&gt;

&lt;h3 id=&quot;atoms&quot;&gt;Atoms&lt;/h3&gt;

&lt;p&gt;The unit of content is an &lt;strong&gt;atom&lt;/strong&gt;: a payload (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;content&lt;/code&gt;) with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;metadata&lt;/code&gt; (key-value pairs like status, owner, priority) and typed
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;references&lt;/code&gt; to other atoms (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supersedes&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;supports&lt;/code&gt;).
Atoms are versioned and semantically searchable. The system stores and
traverses references but doesn’t interpret their semantics. That’s the
convention layer’s job.&lt;/p&gt;

&lt;p&gt;Operations are the basics: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;put&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;update&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;deprecate&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;history&lt;/code&gt;. Every mutation includes a &lt;strong&gt;comment&lt;/strong&gt;, a
semantic description like a git commit message. Comments are what gets
published to subscribers, not atom content. An agent sees “added
findings from 3 review sites” and decides whether to pull the full atom.&lt;/p&gt;
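
&lt;p&gt;A toy version of the publish rule, enough to show that subscribers
see comments and not content (the API shape is mine, not the real one):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Space:
    def __init__(self):
        self.atoms = {}
        self.subscribers = []  # callables, one per subscribed agent

    def put(self, atom_id, content, comment):
        self.atoms[atom_id] = {'content': content, 'version': 1}
        self._publish(comment)

    def update(self, atom_id, content, comment):
        atom = self.atoms[atom_id]
        atom['content'], atom['version'] = content, atom['version'] + 1
        self._publish(comment)

    def _publish(self, comment):
        # Subscribers get the semantic comment only; each decides
        # whether the full atom is worth pulling into context.
        for notify in self.subscribers:
            notify(comment)

space = Space()
space.subscribers.append(lambda c: print('notified:', c))
space.put('findings', '...', comment='added findings from 3 review sites')
&lt;/code&gt;&lt;/pre&gt;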

&lt;h3 id=&quot;permissions&quot;&gt;Permissions&lt;/h3&gt;

&lt;p&gt;Four levels, each implying the ones above it
(&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grant&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append&lt;/code&gt; ⊃ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;read&lt;/strong&gt;: See content, search, subscribe&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;append&lt;/strong&gt;: Add new atoms (can’t modify existing)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;write&lt;/strong&gt;: Full mutation (update, replace, refine)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;grant&lt;/strong&gt;: Share access with other agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New spaces start private, and only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grant&lt;/code&gt; holders can share them.&lt;/p&gt;
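
&lt;p&gt;The implication chain is small enough to check in a few lines; a
minimal sketch, not any stated API:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Levels ordered weakest to strongest; holding a level implies
# every weaker one.
LEVELS = [&quot;read&quot;, &quot;append&quot;, &quot;write&quot;, &quot;grant&quot;]

def allows(held, needed):
    return LEVELS.index(held) &gt;= LEVELS.index(needed)

assert allows(&quot;write&quot;, &quot;append&quot;)      # write implies append
assert not allows(&quot;append&quot;, &quot;write&quot;)  # but not the reverse
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;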

&lt;h3 id=&quot;search&quot;&gt;Search&lt;/h3&gt;

&lt;p&gt;Semantic and structured, composable:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;search(
    query=&quot;frontend tasks&quot;,                        # semantic
    metadata={&quot;status&quot;: &quot;pending&quot;},                # exact match
    references={&quot;blocked_by&quot;: {&quot;status&quot;: &quot;done&quot;}}  # reference state
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One call: find atoms matching “frontend tasks” where status is pending
and all blockers are done. This is what makes conventions practical.
You can query a task list for available work without multiple round
trips.&lt;/p&gt;

&lt;h3 id=&quot;how-spaces-reach-agents&quot;&gt;How spaces reach agents&lt;/h3&gt;

&lt;p&gt;Spaces connect to an agent’s session in two ways. &lt;strong&gt;Injected&lt;/strong&gt; spaces
are always in context: instructions, personal memories, convention
descriptions. &lt;strong&gt;Queried&lt;/strong&gt; spaces are searched on demand: task lists,
knowledge bases, research findings. The distinction matters because
sessions are bounded but spaces are unbounded. You can’t inject
everything.&lt;/p&gt;

&lt;p&gt;How does an agent &lt;em&gt;know&lt;/em&gt; a convention? Each space carries a description
in its metadata: “This is a task list. Atoms have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;owner&lt;/code&gt;,
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;priority&lt;/code&gt;. Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blocked_by&lt;/code&gt; references for dependencies.” When an
agent connects, that description gets injected. The agent learns how to
use the space from the space itself.&lt;/p&gt;
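
&lt;p&gt;In pseudocode, session assembly could look like this; the attribute
names are invented, the injected/queried split is the point:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical assembly: injected spaces land in context wholesale,
# queried spaces contribute only their convention descriptions.
def build_context(agent):
    context = []
    for s in agent.injected_spaces:    # instructions, memories, conventions
        context.extend(s.all_atoms())  # small enough to carry everywhere
    for s in agent.queried_spaces:     # task lists, knowledge bases
        context.append(s.description)  # searched on demand instead
    return context
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;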

&lt;h2 id=&quot;content-is-coordination&quot;&gt;Content is coordination&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;There is no separate messaging primitive.&lt;/strong&gt; Everything is content in
spaces.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    subgraph &quot; &quot;
        direction LR
        A1[&quot;Research&amp;lt;br/&amp;gt;Agent&quot;] --&amp;gt;|put| S1([&quot;Findings&amp;lt;br/&amp;gt;Space&quot;])
        S1 --&amp;gt;|notification| A2[&quot;Writer&amp;lt;br/&amp;gt;Agent&quot;]
        A2 --&amp;gt;|put| S2([&quot;Document&amp;lt;br/&amp;gt;Space&quot;])
        S2 --&amp;gt;|notification| A3[&quot;Review&amp;lt;br/&amp;gt;Agent&quot;]
        A3 --&amp;gt;|update| S2
    end

    style S1 fill:#e8d44d,color:#333
    style S2 fill:#e8d44d,color:#333
    style A1 fill:#6b8cae,color:#fff
    style A2 fill:#6b8cae,color:#fff
    style A3 fill:#6b8cae,color:#fff
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When a research agent writes findings to a shared space, that &lt;em&gt;is&lt;/em&gt; the
communication. Other agents subscribed to the space see the update and
react. No separate “hey, I’m done” message needed, because the distinction
between “communication” and “work product” collapses.&lt;/p&gt;

&lt;p&gt;Every space change triggers notifications to subscribers. When one
arrives, the system runs an LLM turn. The agent sees the comment and
decides how to react. This is reactive activation from
&lt;a href=&quot;https://en.wikipedia.org/wiki/Blackboard_(design_pattern)&quot;&gt;blackboard architectures&lt;/a&gt;
(Hearsay-II, 1980) without a control shell. The system delivers
events; agents decide what matters.&lt;/p&gt;
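
&lt;p&gt;The activation loop itself is small; a sketch, assuming a notification
carries the space, the atom id, and the comment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Reactive activation: the system delivers the comment; an LLM turn
# decides whether the full atom is worth pulling.
def on_notification(agent, note):
    decision = agent.llm_turn(
        f&quot;Space {note.space_id} changed: '{note.comment}'. React?&quot;
    )
    if decision.pull_content:
        atom = note.space.get(note.atom_id)  # fetch only when it matters
        agent.llm_turn(f&quot;Full content: {atom.content}&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;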

&lt;h2 id=&quot;coordinators-are-topological-not-assigned&quot;&gt;Coordinators are topological, not assigned&lt;/h2&gt;

&lt;p&gt;The system needs entry points, places where external input enters the
agent ecosystem. I call these &lt;strong&gt;coordinators&lt;/strong&gt;, but the name is
misleading if you think of it as a role.&lt;/p&gt;

&lt;p&gt;A coordinator is any agent connected to a space with an external
interface. A TUI writes to a chat space? The agent watching that space
is a coordinator. A webhook writes to a webhook space? Also a
coordinator. The topology determines it.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-mermaid&quot;&gt;graph LR
    UI[&quot;TUI / Web UI&quot;] --&amp;gt; CS([&quot;Chat Space&quot;])
    WH[&quot;Webhook&quot;] --&amp;gt; WS([&quot;Webhook Space&quot;])
    CR[&quot;Cron&quot;] --&amp;gt; CRS([&quot;Cron Space&quot;])

    CS --&amp;gt; UA[&quot;User Agent&quot;]
    WS --&amp;gt; WA[&quot;Webhook Agent&quot;]
    CRS --&amp;gt; CA[&quot;Cron Agent&quot;]

    UA &amp;lt;--&amp;gt;|read/write| BUS([&quot;Coordinator Bus&quot;])
    WA &amp;lt;--&amp;gt;|read/write| BUS
    CA &amp;lt;--&amp;gt;|read/write| BUS

    style CS fill:#e8d44d,color:#333
    style WS fill:#e8d44d,color:#333
    style CRS fill:#e8d44d,color:#333
    style BUS fill:#d4a44d,color:#333
    style UA fill:#4a6fa5,color:#fff
    style WA fill:#4a6fa5,color:#fff
    style CA fill:#4a6fa5,color:#fff
    style UI fill:#888,color:#fff
    style WH fill:#888,color:#fff
    style CR fill:#888,color:#fff
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Coordinators can grant access, share spaces across agent boundaries,
and manage subagent lifecycles. They share a &lt;strong&gt;coordinator bus&lt;/strong&gt; for cross-coordinator communication.&lt;/p&gt;

&lt;h3 id=&quot;remind-me-tomorrow-at-9am&quot;&gt;“Remind me tomorrow at 9am”&lt;/h3&gt;

&lt;p&gt;This sounds trivial but exposes every seam in a multi-agent system. It
crosses coordinator boundaries, requires scheduling, needs shared
context.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You type “remind me tomorrow at 9am to review the Q3 report”&lt;/li&gt;
  &lt;li&gt;Chat UI writes to the &lt;strong&gt;chat space&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;user-agent&lt;/strong&gt; recognizes a cross-coordinator request, writes to
the &lt;strong&gt;coordinator bus&lt;/strong&gt;: “set reminder: 9am tomorrow, context: Q3
report review”&lt;/li&gt;
  &lt;li&gt;The &lt;strong&gt;cron-agent&lt;/strong&gt; (subscribed to the bus) picks it up, sets the
trigger, writes back an acknowledgment&lt;/li&gt;
  &lt;li&gt;User-agent confirms: “Done, I’ll remind you tomorrow at 9am”&lt;/li&gt;
  &lt;li&gt;Next morning: cron fires, cron-agent writes the reminder to the bus
with the original context&lt;/li&gt;
  &lt;li&gt;User-agent picks it up: “Time to review the Q3 report”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two coordinators talking through a space, using the same operations as
everything else.&lt;/p&gt;
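
&lt;p&gt;Traced as operations, the exchange is a handful of puts on the bus;
the metadata fields here are illustrative:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# user-agent, step 3: the request is just an atom on the bus
bus.put(
    content=&quot;set reminder: 9am tomorrow, context: Q3 report review&quot;,
    metadata={&quot;kind&quot;: &quot;reminder-request&quot;, &quot;requester&quot;: &quot;user-agent&quot;},
    comment=&quot;reminder request from chat&quot;,
)

# cron-agent, step 4: acknowledge with a reference back to the request
bus.put(
    content=&quot;reminder armed for 09:00 tomorrow&quot;,
    references={&quot;supports&quot;: [&quot;&lt;request-atom-id&gt;&quot;]},
    comment=&quot;ack: reminder scheduled&quot;,
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;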

&lt;h2 id=&quot;conventions-1&quot;&gt;Conventions&lt;/h2&gt;

&lt;p&gt;I’ve mentioned eight conventions so far. Here they are:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Convention&lt;/th&gt;
      &lt;th&gt;Analogous to&lt;/th&gt;
      &lt;th&gt;Key pattern&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Task List&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;GitHub Issues&lt;/td&gt;
      &lt;td&gt;Status/owner metadata + dependency refs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Channel&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Slack channel&lt;/td&gt;
      &lt;td&gt;Append-only message stream&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Chat&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Chat UI&lt;/td&gt;
      &lt;td&gt;Bidirectional, role-tagged turns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Document&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Google Docs&lt;/td&gt;
      &lt;td&gt;Progressive refinement with versioning&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Scratchpad&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Notepad&lt;/td&gt;
      &lt;td&gt;Private working memory&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Prompt Library&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Template registry&lt;/td&gt;
      &lt;td&gt;Reusable instruction + convention templates&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Knowledge Base&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Team wiki&lt;/td&gt;
      &lt;td&gt;Curated facts with evidence links&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Personal Memory&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;mem0&lt;/td&gt;
      &lt;td&gt;Auto-extracted facts, injected into context&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;New conventions emerge by writing new descriptions, not new code.&lt;/p&gt;
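
&lt;p&gt;That’s worth making literal. An invented ninth convention, defined
entirely by its description, assuming a hypothetical
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create&lt;/code&gt; operation for new spaces:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Invented example: a retro board is a description, not a schema.
spaces.create(
    name=&quot;retro-board&quot;,
    description=(
        &quot;This is a retrospective board. Atoms carry a 'column' metadata &quot;
        &quot;key: went-well, to-improve, or action. Use 'supports' references &quot;
        &quot;to link actions to the items that motivated them.&quot;
    ),
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;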

&lt;h2 id=&quot;code-as-the-interaction-model&quot;&gt;Code as the interaction model&lt;/h2&gt;

&lt;p&gt;Most agent frameworks give agents a fixed set of named tools like
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_file&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;send_email&lt;/code&gt;, and the agent picks which to
call. This works for simple things. But what if you need to search,
filter results, then conditionally update three items based on what you
found? That’s three tool calls with branching logic in between. State
management across turns. A new tool for every new operation.&lt;/p&gt;
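
&lt;p&gt;Here’s where this section is headed: that scenario expressed as a
single code action, a sketch using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;space&lt;/code&gt; capability introduced
below:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# One code action replaces three tool calls and the branching
# logic that would otherwise live across LLM turns.
pending = space.search(metadata={&quot;status&quot;: &quot;pending&quot;})
for task in pending:
    if &quot;deprecated-api&quot; in task.content:  # filter in code, not in turns
        space.update(
            atom_id=task.id,
            metadata={&quot;status&quot;: &quot;blocked&quot;},
            comment=&quot;blocked: depends on the deprecated API&quot;,
        )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;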

&lt;p&gt;&lt;a href=&quot;https://github.com/qwibitai/nanoclaw&quot;&gt;NanoClaw&lt;/a&gt; takes one approach:
keep the codebase small enough that the agent can rewrite its own tools.
But there’s a more general version of the same insight: let agents write
code as their primary way of acting.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2402.01030&quot;&gt;CodeAct&lt;/a&gt; (Wang et al., 2024) showed
that agents expressing actions as executable Python achieve up to 20%
higher success rates than structured tool calls. Hugging Face’s
&lt;a href=&quot;https://github.com/huggingface/smolagents&quot;&gt;smolagents&lt;/a&gt; makes this
production-grade. &lt;a href=&quot;https://arxiv.org/abs/2512.24601&quot;&gt;RLMs&lt;/a&gt; (Zhang et
al., 2025) take it further: agents &lt;em&gt;think&lt;/em&gt; in code, managing their own
context programmatically and
&lt;a href=&quot;https://www.primeintellect.ai/blog/rlm&quot;&gt;scaling to 10M+ tokens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spaces are the external environment, and code is how agents interact
with them.&lt;/strong&gt; Give agents a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;space&lt;/code&gt; capability object in a sandboxed
executor. Convention descriptions include code recipes. Agents adapt
and execute them:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Fan-out research with dynamic decomposition
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;findings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;space&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;initial landscape analysis&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;segments&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extract_segments&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;findings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;segment&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;segments&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;coordinator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spawn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Deep research on &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;segment&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;instructions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;research-guidelines&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;spaces&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;findings&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;append&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;web_search&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;web_fetch&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;New conventions don’t require new tools, only new code recipes. A
framework with fixed tools is limited by what the designers anticipated.
A framework with code and an environment is limited by what the agents
can express. That ceiling rises with every model generation.&lt;/p&gt;

&lt;h2 id=&quot;where-this-goes&quot;&gt;Where this goes&lt;/h2&gt;

&lt;p&gt;The architecture is designed but not yet implemented, though a few scenarios show what it enables:&lt;/p&gt;

&lt;p&gt;A morning briefing, assembled overnight. Cron-agent fires at 6am,
spawns research agents for your calendar, news, overnight emails. Each
writes to a shared briefing space. Synthesis agent compiles. When you
open the chat at 8am, it’s waiting.&lt;/p&gt;

&lt;p&gt;Or adaptive task decomposition. You ask for a competitive analysis.
The coordinator spawns one research agent, reads the initial findings,
realizes the landscape is broader than expected, spawns three more.
The task graph grew dynamically based on what the first agent found.
Next time, the same request might need only one researcher. The architecture
doesn’t care.&lt;/p&gt;

&lt;p&gt;The substrate’s correctness is almost trivial. The interesting part
starts when you let agents loose on it.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;Sutton, R. (2019). &lt;a href=&quot;http://www.incompleteideas.net/IncIdeas/BitterLesson.html&quot;&gt;The Bitter Lesson&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Wang, X. et al. (2024). &lt;a href=&quot;https://arxiv.org/abs/2402.01030&quot;&gt;Executable Code Actions Elicit Better LLM Agents&lt;/a&gt;. &lt;em&gt;ICML 2024&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;Zhang, T. et al. (2025). &lt;a href=&quot;https://arxiv.org/abs/2512.24601&quot;&gt;Recursive Language Models&lt;/a&gt;. MIT.&lt;/li&gt;
  &lt;li&gt;Prime Intellect. (2026). &lt;a href=&quot;https://www.primeintellect.ai/blog/rlm&quot;&gt;The Paradigm of 2026&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Hugging Face. (2025). &lt;a href=&quot;https://huggingface.co/blog/smolagents&quot;&gt;Introducing smolagents&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Erman, L.D. et al. (1980). The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty. &lt;em&gt;ACM Computing Surveys&lt;/em&gt;, 12(2).&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 16 Feb 2026 16:00:00 +0800</pubDate>
        <link>http://thirteen37.github.io/engineering/2026/02/16/agents-and-spaces.html</link>
        <guid isPermaLink="true">http://thirteen37.github.io/engineering/2026/02/16/agents-and-spaces.html</guid>
        
        <category>ai</category>
        
        <category>agents</category>
        
        <category>architecture</category>
        
        <category>llm</category>
        
        
        <category>engineering</category>
        
      </item>
    
  </channel>
</rss>
