Deep feature tour
The README tells the story; this page is the engineering-level companion for the skeptical reader. Below is the depth behind the headline subsystems — exact models, algorithms, crypto, schema versions, tool counts, and the deliberate fail-open / fail-closed asymmetries that make Cos more than a kanban with an LLM bolted on. Every headline number here is verified against source.
Triage & intelligence
On-device semantic search across the whole board, plus the natural-language surfaces an agent drives.
- On-device semantic embeddings via model2vec (`minishlab/potion-base-8M`, 256-d, no torch). A distilled static token-embedding model: `encode()` is a token lookup plus pooling, not a transformer forward pass. Warm load takes ~0.2 s, inference is near-instant, the one-time download is ~30 MB, and no GPU is needed. Vectors are L2-normalized, so the index's dot-product metric is exactly cosine.
- ANN index via turbovec (Google TurboQuant, 4-bit SIMD quantization, Rust-backed). `turbovec.IdMapIndex(dim=256, bit_width=4)` compresses each vector to 4 bits/dimension for SIMD top-K. To absorb quantization error, every query over-fetches `topn=max(50,k)` candidates, then re-ranks with an exact scorer before slicing to `k`. A stable string-id ↔ uint64 map survives index rebuilds.
- Hybrid scoring that beats fuzzy cosine when you mean it. On top of raw cosine: exact-id `+5.0`, id-substring `+3.0` (so `CASE-1` still matches inside `CASE-11`), title/name `+2.0`, token-Jaccard `×1.0`, weak-substring `+0.5`; a `why[]` array reports which signals fired. Typing a case id or a client's name always jumps the obvious hit to #1.
- Fail-OPEN by design: search never darkens the board. This is the deliberate inverse of the guard. If the sidecar is absent, cold, crashed, hijacked by a foreign process, or returns a garbage 200, the board falls through (both the `fetch` and the `res.json()` parse live in one `try` with an 800 ms `AbortSignal.timeout`) to an in-process keyword scan that re-applies the same filters and a JS mirror of the same hybrid scorer, still returns HTTP 200 with an identical envelope, and tags itself `engine:'keyword'`. The board works with no sidecar and no `uv` installed.
- Content-digest cache key, not `db.version`. The in-memory index rebuilds exactly when content changes, keyed on `blake2b` of the sorted per-doc `id:hash` lines, and deliberately not on `db.version` (which can decrease or repeat after a migration or `.bak` recovery and would serve stale vectors forever). Embedder change and archive-scope change fold into the same key.
- Double fallback so it runs anywhere NumPy runs. No model → a deterministic `blake2b` char 3–5-gram hashing embedder (same 256-d unit-norm output, process-stable unlike Python's salted `hash()`); no turbovec wheel → an exact NumPy brute-force index (`matmul` + `argsort`). The test suite parametrizes both backends.
- Batch multi-query dedupe-before-create. An agent fires up to `MAX_QUERIES=32` queries at once (the person, the topic, the deliverable); the card recurring across the most query angles is the existing matter, which enforces "one matter, one card." Records are rebuilt server-side from the in-hand DB, never the sidecar's projection, so a stale index can't poison results. Mandated in the agent tool descriptions ("ALWAYS SEARCH BEFORE create_case").
- One endpoint, four streams. Cases, tasks (`<caseId>::<tid>`), messages, and reminders are searched together; every hit flags its `nature`. Reminders have no archive, so done/dismissed ones stay findable; the case-lane status filter exempts them (different status space).
- Natural-language command palette (Cmd/Ctrl+K, three modes). JUMP (text → case), SPOTLIGHT (semantic search with keyword fallback), and COMMAND (a small grammar: `move <case> to <lane>`, `archive`, `add task … to …`, `complete …`, `create <work|life> case`, with lane synonyms like `wip → in_progress`). An unmatched command throws a `NoChange` sentinel that aborts the write, so a no-op never bumps `db.version` or fires a spurious live refresh.
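The hybrid scoring bullet above can be sketched in a few lines of Python. This is a minimal illustration, not the board's actual scorer: the weights come from the list above, but the tokenizer, the `doc` field names, the exact meaning of "title/name" and "weak-substring", and the choice to make exact-id and id-substring mutually exclusive are all assumptions.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (the tokenizer is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def hybrid_score(query: str, doc: dict, cosine: float) -> tuple[float, list[str]]:
    """Add lexical bonuses to a precomputed cosine and record which signals fired."""
    q = query.strip().lower()
    doc_id = doc["id"].lower()
    title = doc.get("title", "").lower()
    body = doc.get("body", "").lower()
    score, why = cosine, ["cosine"]

    if q == doc_id:                      # exact id: strongest signal
        score += 5.0
        why.append("exact-id")
    elif q and q in doc_id:              # e.g. CASE-1 inside CASE-11
        score += 3.0
        why.append("id-substring")

    if q and q in title:                 # assumed reading of "title/name"
        score += 2.0
        why.append("title")

    q_tok, d_tok = tokens(q), tokens(title + " " + body)
    if q_tok and d_tok:
        jaccard = len(q_tok & d_tok) / len(q_tok | d_tok)
        if jaccard > 0:
            score += 1.0 * jaccard       # token-Jaccard, weight x1.0
            why.append("token-jaccard")

    if q and q in body:                  # assumed reading of "weak-substring"
        score += 0.5
        why.append("weak-substring")
    return score, why
```

With these weights an exact id match outranks any realistic cosine gap, which is the behavior the bullet describes: a low-cosine `CASE-1` still sorts above a higher-cosine `CASE-11` for the query `CASE-1`.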
See also: Search reference.
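As an illustration of the content-digest cache key described above, the sketch below hashes the sorted per-doc `id:hash` lines with `blake2b` and folds the embedder name and archive scope into the same digest. The function name, the 16-byte digest size, and the way the two extra inputs are serialized are assumptions made for the example.

```python
import hashlib

def content_digest(doc_hashes: dict[str, str], embedder: str, include_archived: bool) -> str:
    """Cache key that changes exactly when content, embedder, or archive scope changes."""
    # Sorting makes the key independent of iteration order.
    lines = sorted(f"{doc_id}:{h}" for doc_id, h in doc_hashes.items())
    digest = hashlib.blake2b(digest_size=16)
    digest.update(f"embedder={embedder}\n".encode())
    digest.update(f"archived={int(include_archived)}\n".encode())
    digest.update("\n".join(lines).encode())
    return digest.hexdigest()

# Same content in a different order yields the same key, so no rebuild is triggered.
key_a = content_digest({"CASE-1": "aa", "CASE-2": "bb"}, "potion-base-8M", False)
key_b = content_digest({"CASE-2": "bb", "CASE-1": "aa"}, "potion-base-8M", False)
```

Unlike a monotonic counter, a key built this way cannot repeat for different content after a restore, which is the failure mode the bullet attributes to `db.version`.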
Safety & privacy
A fail-closed injection gate, unspoofable trust derivation, and defense on the rendering side too.
- Fail-CLOSED prompt-injection guard, the inverse posture of search. Untrusted email/tool-output is screened before the triage agent loads it as context. When the classifier is unreachable, scans return a non-error verdict (`UNAVAILABLE — treat as UNTRUSTED`) rather than a false all-clear. The verdict is deliberately not `isError`, because an error invites a blind retry while an explicit verdict forces the safe branch. A 2xx with unparseable JSON counts as offline. The fail-closed `api()` (4 s timeout) is not shared from the common toolkit precisely because the asymmetry is security-critical.
- Default classifier: Meta `Llama-Prompt-Guard-2-86M` (86 M-param mDeBERTa, 8 languages: en/fr/de/es/it/pl/pt/ru, gated under the Llama license, threshold 0.5). Measured separation: a benign French mail scores ~0.0008 vs EN/DE/FR injections at ~0.999. Presets also ship `qualifire/prompt-injection-sentinel` (ModernBERT-large, English-only) and a `heuristic-only` mode; any raw HF id is a passthrough escape hatch.
- Label-aware malicious-class resolution, with no hardcoded index. A 3-stage resolver reads the model's own `id2label` (keyword match → binary "the other label" fallback → `LABEL_1` last resort) so swapping classifiers needs no code change, and it never inverts.
- Dependency-free heuristic fallback that self-reports as degraded. No torch/transformers/network: calibrated regex/keyword detectors (override prompts, DAN/developer-mode personas, system-prompt exfiltration, credential leakage, send-to-URL channels) with explicit score bands. The name `heuristic-fallback` is the sole signal the board/MCP read as DEGRADED: a silent heuristic when you wanted the model is treated as "not done."
- Overlapping-window scanning: flag if ANY window is malicious. The real model windows on its own tokenizer (`MAX_TOKENS=512`, `WINDOW_OVERLAP=64`) so a split-across-paragraphs injection lands wholly inside one window; the heuristic windows on blank-line paragraphs. The verdict takes the max malicious score across all windows, decomposed into named segments (`subject`, `body#1`…).
- Auto-derived, unspoofable trust from genuine two-way correspondence. A correspondent becomes `trusted` only via (A) handshake, where they wrote in and you replied; (B) direct 1:1, as the sole `to` of a no-cc outbound; or (C) origination, as a recipient of a thread you started. A reply-all to someone else's thread never blanket-trusts the room. A message counts as outbound only when `outbound===true` (set solely from the Gmail SENT scan) and its `from` resolves to the principal, so a spoofed `From: <you>` can never mint trust. The agent never calls a trust verb; an unconfigured principal trusts nobody.
- Display-name spoof rejection in the trust-key extractor. The one place an attacker-controlled header becomes a trust key actively refuses the classic deception: a header like `ceo@corp.com <attacker@evil.com>` (display name itself contains `@`) is rejected outright (returns null), as are stray-bracket malformed forms. This deterministic header parsing is the substrate the whole "unspoofable trust" story rests on.
- Sender whitelist is a second axis, never a bypass. Content scan ("is this text injecting?") and sender knowledge ("do we know this sender?") are orthogonal: a trusted sender's mail is still scanned, so a compromised/forwarding account can't smuggle an injection through. The trust push fails open on the write side only (a missed push leaves a sender at the more-cautious "unknown"); the content gate stays fail-closed.
- Three never-conflated scan outcomes. A real verdict, a DISABLED passthrough (master toggle off — a deliberate user choice, non-error, no quarantine record), and an UNREACHABLE fail-closed verdict are kept rigorously distinct so "user chose OFF" is never confused with "gate silently failed."
- Master ON/OFF toggle with a board-enforced deps gate. One switch in `/security`; a fresh machine ships OFF. The sidecar always permits the flip (a deps-short model just scans degraded, never a false all-clear); the board disables turning ON until a network-free probe (`{torch, transformers, modelCached, hfToken, ready}`) reports ready, surfacing a copy-paste setup command. Atomic read-modify-write config store so neither key clobbers the other.
- Quarantine: content-hashed dedup, release ≠ dismiss, self-draining TTL. Quarantine ids are `Q-` + `blake2b(from\nsubject\nbody)` so a retried injection upserts a counter instead of flooding the queue (bodies capped at 16 000 chars). Clicking Release re-admits the exact Gmail thread to triage and trusts the sender without re-scanning (which would loop forever); Dismiss is inert. Released records auto-purge on a TTL (default 7 days, lock-free fast path, lexicographic UTC-timestamp compare, settable live) so the replay queue can't grow unbounded, and no scheduler is needed.
- Untrusted-markdown safe renderer, the rendering-side complement to the guard. All agent/email-derived prose renders through a hardened component (react-markdown + remark-gfm) that treats content as untrusted: no `rehype-raw` (inline `<script>`/`<img onerror>` render as inert text), default scheme-stripping of `javascript:`/`data:`, GFM task-list checkboxes are read-only, and images render as links, never inline `<img>`, so the board never auto-fetches a remote tracking pixel. Links open with `rel="noreferrer noopener"`.
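To make the overlapping-window idea from the list above concrete, here is a small sketch under stated assumptions. `MAX_TOKENS`, `WINDOW_OVERLAP`, and the 0.5 threshold are the values quoted above; the function names, the `>=` comparison, and the pluggable `score_window` callback (standing in for the real classifier) are hypothetical.

```python
from typing import Callable, Iterator, Sequence

MAX_TOKENS = 512
WINDOW_OVERLAP = 64

def windows(tokens: Sequence[str], size: int = MAX_TOKENS,
            overlap: int = WINDOW_OVERLAP) -> Iterator[Sequence[str]]:
    """Yield consecutive windows, each sharing `overlap` tokens with its predecessor."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    start = 0
    while True:
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break
        start += step

def scan(tokens: Sequence[str], score_window: Callable[[Sequence[str]], float],
         threshold: float = 0.5, size: int = MAX_TOKENS,
         overlap: int = WINDOW_OVERLAP) -> dict:
    """Flag the text if ANY window is malicious: the verdict is the max window score."""
    worst = max(score_window(w) for w in windows(tokens, size, overlap))
    return {"score": worst, "malicious": worst >= threshold}
```

The overlap matters because a phrase that straddles a hard window boundary is never seen whole by either neighbor; with overlap, any span no longer than the overlap plus one token is guaranteed to fall entirely inside at least one window.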
See also: Guard.
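The display-name spoof rejection described above can be illustrated with a short parser. This is a sketch of the stated rule only (reject a display name containing `@`, reject stray brackets, otherwise return the address); the function name, the address regex, and the lowercasing are assumptions, and the real extractor may differ.

```python
import re
from typing import Optional

# Deliberately strict shape check; an assumption, not a full RFC 5322 parser.
_ADDR = re.compile(r"^[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+$")

def trust_key(from_header: str) -> Optional[str]:
    """Return a normalized address to use as a trust key, or None if the header is suspect."""
    header = from_header.strip()
    if "<" in header or ">" in header:
        match = re.fullmatch(r"([^<>]*)<([^<>]+)>", header)
        if match is None:
            return None              # stray or unbalanced brackets
        display, addr = match.group(1), match.group(2).strip()
        if "@" in display:
            return None              # display name posing as an address
    else:
        addr = header
    return addr.lower() if _ADDR.match(addr) else None
```

Under these rules `trust_key("ceo@corp.com <attacker@evil.com>")` returns `None`, while `trust_key("Alice <alice@example.com>")` returns `"alice@example.com"`.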
Data & integrity
A zero-dependency JSON store with serious correctness properties — no database to run.
- Zero-dependency JSON store with crash-safe atomic writes. The entire board (cases, messages, events, reminders, priorities, labels) lives in one `cases.json`, written via POSIX `tmp + rename` (a reader sees the complete old or new file, never a partial). Each write copies to a `.bak`; `readDB()` migrates and validates on read and transparently falls back to `.bak` on any parse/validation failure. Pure Node `fs/promises`, no deps.
- Promise-chain write mutex: no lost updates, no colliding ids. A single module-level promise chain funnels every read-modify-write through `mutate()`, serializing concurrent agent + UI writes; an aborted body never reaches `writeDB` (transactional). Proven by a regression test that fires 25 parallel `create_case` + 25 `add_task` and asserts exact count growth and unique ids. Optional `expectedVersion` gives optimistic concurrency (`409` on conflict).
- Additive schema versioning with migrate-on-read (v8). `migrate()` upgrades any older file with no migration script: v4 added events, v5 reminders, v6 reminder labels/tasks + `message.reminderId`, v7 priorities + `case.starred`, v8 `message.url`. Each addition is an optional that rides through verbatim, so every prior version still reads with no transform. An independent zero-dep linter (`board-lint.mjs`) re-asserts 20+ structural invariants (id shapes, enum validity, bidirectional message↔case linkage, the full hierarchy contract) as a second source of truth.
- Initiative > Workstream > Case hierarchy as a flat tree, one invariant checker. All three tiers are `CaseRecord`s in one array and one id space (tier = `kind`, place = `parentId`); `hierarchyViolation()` is the single source of truth (no cycles, depth ≤ 3, tier rules) that the store throws on and the lint independently re-derives. Containers roll up progress (done/total leaves, summed tasks, distinct rolled-up message counts) over their non-archived descendants.
- Five kanban lanes from one enum. `urgent · todo · in_progress · waiting_for_input · done`: `VALID_CASE_STATUS` is the runtime guard, mirrored byte-for-byte in the MCP, lint, and Tailwind. Lane (workflow state) is deliberately distinct from Priority P0–P3 (importance).
- Soft-delete Trash with lazy retention auto-purge. One soft Delete verb (sets `archivedAt`); there is no hard-delete HTTP path (the old one orphaned emails and caused re-triage duplicates). `sweepExpiredTrash()` runs inside `mutate()` on every write and purges trash older than the retention window (default 30 days; `≤0` disables) via `cleanCases`, the sole permanent-removal primitive, which keeps any email still referenced by a surviving case or reminder. Reminders get a parallel two-stage lifecycle (soft-delete done/dismissed idle >7 d, then purge >30 d).
- Board-local generational backup ring (distinct from the off-site pipeline). Every write also drops `data/backups/cases-<ISO>.json` and prunes with a 3-tier time-based policy: keep all <36 h, newest-per-calendar-day for 30 days, floor of 50 snapshots. This deliberately replaced a count cap whose fatal flaw let a write-burst purge all real history in seconds.
- Per-case append-only audit trail (human vs agent vs system). Every write is attributed (`resolveActor()` defaults to `human`, upgrades to `agent` only on an explicit header/flag, and never trusts a body claiming `system`), capped at the last 50 entries, with `describeCaseChange()` rendering human-readable diffs (`todo→done`, `restored`). This is the load-bearing guardrail behind the agent's "never undo the user's manual board edits" contract.
- Catalog-backed label taxonomy with a reject-unknown contract. Structured, color-coded labels drawn from 20+ installable role packs (Manager, Founder, PM, Sales, IT, Legal…) over a fixed 12-color palette, distinct from freeform tags. Each `LabelDef` carries a first-class `description` (the field agents read to decide when a label applies); an unknown label id is rejected inside the lock, forcing agents to `list_labels` first.
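The crash-safe write and `.bak` fallback from the first bullet of this list follow a well-known pattern. The store itself is Node, so the sketch below is a Python transliteration of the idea, not the store's code: `write_db`, `read_db`, and the `validate` callback are hypothetical names, the `fsync` is an extra precaution, and it assumes the `.bak` holds the last complete file from before the overwrite.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path

def write_db(path: Path, db: dict) -> None:
    """Write via temp file + rename so readers see the old or the new file, never a partial."""
    if path.exists():
        # Assumption: the backup is a copy of the last complete file.
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(db, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)            # atomic within one filesystem
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise

def read_db(path: Path, validate) -> dict:
    """Return the first candidate that parses and validates: the main file, then the backup."""
    for candidate in (path, path.with_suffix(path.suffix + ".bak")):
        try:
            db = json.loads(candidate.read_text(encoding="utf-8"))
            validate(db)                 # expected to raise ValueError on a bad shape
            return db
        except (OSError, ValueError):
            continue
    raise RuntimeError("no readable store")
```

The temp file is created in the same directory as the target on purpose: a rename is only atomic when source and destination share a filesystem.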
Surfaces
The writable board plus its time/focus and knowledge surfaces — every one of them live.
- Live SSE board feed (`fs.watch` + 150 ms debounce). `GET /api/stream` watches `cases.json` and pushes one `change` event per write to every open client; it is the single producer behind every "optimistic + live refetch" behavior. It does a cheap raw parse for `db.version` only (returns `-1` on a transient mid-write read so clients skip rather than flap), emits an initial `hello{version}` and a 25 s heartbeat, and sets `x-accel-buffering:no` to defeat proxy buffering. `prefs.json` is deliberately split out of the versioned store so a sort toggle never bumps the version, never fires this event, and never churns the backup ring.
- Optimistic board with undo stack and race-safe revert. Every drag, lane move, archive, star, and bulk edit applies instantly; on failure it re-pulls authoritative state via `refetch()` rather than restoring a stale render snapshot (which could resurrect a card another writer removed). Cmd/Ctrl+Z and a toast Undo replay an inverse op. Intra-lane reorder uses a fractional-position bisect (one write for the moved card) with a rebase seeding path and explicit non-finite-position guards.
- Agent-native authoring: no manual "new case" composer. The `+New Case` button, `New▾` menu, and per-lane quick-add were all removed on purpose so `cases.json` doesn't bloat with blank spawns. Cases arrive only from the agent, inbox triage, or the command palette: the human curates and approves, the agent authors.
- `propose → approve/reject → commit` pipeline. An agent stages a mutation (`propose(verb, target?, payload, summary)` → `P-<n>` in `db.pending[]`); the propose-time route validates the verb against the exact `COMMITTABLE_VERBS` set so a bad verb fails loudly up front. On approve, `commitVerb()` re-validates the payload and runs the same store helpers the normal routes use, with decision + commit + status-flip in one `mutate()` critical section, so a throw aborts everything and the proposal stays pending. Re-deciding a settled proposal → `409`.
- Inbox triage as one pure selector. `selectInboxMessages` composes read-state and from/to/cc substring filters with two mutually-exclusive ordering modes (semantic-relevance vs date), all precedence in one unit-testable function. The non-obvious bit: the currently-open message is exempt from the read filter only, so auto-marking it read under an "unread" view doesn't make it vanish under the cursor.
- Unified color-coded activity/audit feed. One reverse-chronological stream flattening case audit rows + synthesized reminder/event lifecycle rows, with a 31-verb → 11-category Tailwind color map. Clicking a row opens that subject's detail drawer in place over the feed (no navigation), with focus restored on close. A fixed SSR `now` is threaded through every relative-time call to prevent hydration drift.
- Entity-360 pre-call brief pages. Resolving a person/company/topic name assembles one SSR brief in parallel: their vault knowledge page (over the same `/api/vault` the agent uses), every non-archived case whose `vaultLinks` include them, and a flattened relationship timeline (top 15). 404-safe ("Nothing on file"). Turns per-case `vaultLinks` into a reverse index with no separate datastore.
- Safe Gmail deep-linking back to source. Two import-free leaf modules gate untrusted link strings: `normalizeMessageUrl` accepts only absolute `http(s)` (drops `javascript:`/`data:`/relative), and `messageDeepLink` prefers the structured `url` but otherwise mines the body with a Gmail-thread-anchored regex so a marketing footer URL is never mistaken for the message's own link.
- Calendar, reminders, and priorities: three time/focus surfaces, single-field linking. An appointment, nudge, or starred favorite ties to work via exactly one field (`event.caseId` / `reminder.caseId` / `case.starred`); the reverse side is always derived by filtering, never a parallel array that can drift. `caseId` is validated against a real record inside the lock (closing the read-then-write TOCTOU). The calendar is its own stdio MCP (port 8003, 6 tools); reminder (8) and priority (5) verbs ride the existing board MCP with no new port. `get_priorities` lets the agent read the user's starred nodes + free-text notes so it triages toward what the user actually cares about. UTC-anchored due-buckets (Overdue/Today/Soon/Later) read correctly west of UTC.
- `needsAttention` self-auditing triage tray. A four-bucket projection over live cases: overdue, aging-waiting (idle >3 d), untriaged (todo, no tasks, no priority), and unlinked (not done, no knowledge attached); plus a separate 5-day `waiting_for_input` SLA breach rendered as a "Waiting Nd" chip. The board's "what have I neglected" layer.
- Board templates engine. `apply_template`/`list_templates` instantiate a pre-shaped case (status, priority, tags, seed-task checklist, e.g. a 6-step "Plugin onboarding checklist") through the same locked create path the agent uses, with every override validated against the enums and seed tasks store-minted.
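The fractional-position reorder mentioned in the list above can be sketched as follows. The midpoint rule and the non-finite guards follow the description; the function names, the unit gap at lane edges, and the exact trigger for a rebase (no representable midpoint left) are assumptions.

```python
import math
from typing import Optional

def position_between(prev: Optional[float], nxt: Optional[float], gap: float = 1.0) -> float:
    """Pick a sort position for a card dropped between two neighbors (None = lane edge)."""
    for p in (prev, nxt):
        if p is not None and not math.isfinite(p):
            raise ValueError("non-finite neighbor position; lane needs a rebase")
    if prev is None and nxt is None:
        return 0.0                       # first card in an empty lane
    if prev is None:
        return nxt - gap                 # dropped at the top
    if nxt is None:
        return prev + gap                # dropped at the bottom
    mid = (prev + nxt) / 2
    if not (prev < mid < nxt):
        raise ValueError("no room between neighbors; lane needs a rebase")
    return mid

def rebase(count: int, gap: float = 1.0) -> list[float]:
    """Reseed a lane with evenly spaced positions, preserving the current order."""
    return [i * gap for i in range(count)]
```

Only the moved card's position changes, which is why a reorder costs one write; the rebase path exists because repeated bisection between the same two neighbors eventually exhausts floating-point precision.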
See also: Board, Calendar, Reminders, Activity, Priorities.
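As a rough illustration of the link gate described for Gmail deep-linking, the sketch below accepts only absolute `http(s)` URLs and drops everything else. It is a Python stand-in for the idea behind `normalizeMessageUrl`, not that module's code; the function name and the choice to return the trimmed input unchanged are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

def normalize_message_url(raw: object) -> Optional[str]:
    """Return the URL if it is absolute http(s), otherwise None."""
    if not isinstance(raw, str):
        return None
    candidate = raw.strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None                      # malformed input, e.g. a bad IPv6 literal
    # Requiring both a web scheme and a host rejects javascript:, data:,
    # relative paths, and scheme-relative //host links alike.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return candidate
```

An allowlist of schemes is the safer direction here: anything not positively recognized as a web link is treated as absent rather than rendered.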
Knowledge vault
A local-first, private LLM-Wiki — the only MCP that actually runs an LLM, and the one with the cleverest sandboxing.
- LLM-Wiki re-synthesis engine (embedded Agent SDK). Unlike the thin fetch-wrapper MCPs, the vault server embeds `@anthropic-ai/claude-agent-sdk` and runs a full headless Claude Code `query()` session per tool call (`claude-sonnet-4-6`), cwd-scoped to the vault root. Implementing Karpathy's LLM-Wiki, every ingest re-synthesizes (rewrites, never appends) the 10–15 pages a substantive source touches, so back-links and consistency compound instead of accreting an append-only log. Two tools only: `ingest` + `query`.
- Quadruple anti-recursion firewall. Because the vault is itself bridged in the repo's `.mcp.json`, a naive inner session would re-mount this server and recurse forever. Four independent safeguards prevent it: `mcpServers:{}` + `strictMcpConfig:true` (inner agent mounts no MCP, can't read any `.mcp.json`), `disallowedTools` hard-denying the vault tools, `settingSources:['project']` pinned explicitly, and `cwd=COS_VAULT_DIR`. In addition, an in-process FIFO semaphore (`COS_VAULT_CONCURRENCY=2`) caps concurrent subprocesses.
- Knowledge-only editorial contract. The vault keeps timeless who/what/why and refuses any to-do, status, deadline, or checkbox (those go to the board). Enforced top to bottom: a "knowledge librarian" ingest persona, a query skill that declines pure open-work questions with a board pointer, and a lint that hunts stray `- [ ]` checkboxes and cross-domain leaks, flagging but never auto-restructuring.
- Domain-split wiki with a strong thematic index. Hard-partitioned `work/`, `life/`, `shared/` trees that never bleed (only truly-dual entities live once in `shared/`). Each domain is navigated through an index whose top-level sections are themes grouping concepts/entities/sources (a retrieval design, not a flat alphabetical list); every page sits under exactly one theme.
- Bidirectional vault↔board cross-linking + entity resolution. A case's `vaultLinks` names the pages it draws on; each page carries the inverse `cases:` frontmatter (the two arrays are lint-checked inverses), recorded by reference only, since the ingest agent has no board tool. A global `aliases.md` collapses one person's email, spoken name, and board name onto a single canonical wiki entity.
- Local-first & private: gitignored, off-site-encrypted durability. `.gitignore` patterns auto-ignore every real `vault/<name>` while keeping the synthetic template tracked, so PII can never be committed by accident; setup verifies `git check-ignore`. Durability is the encrypted off-site backup, explicitly not git. The one MCP making outbound Anthropic calls sources its `ANTHROPIC_API_KEY` from a gitignored file via a launch wrapper, so the secret never enters the installed plist.
- Arbitrary-file-read guard. `ingest` can attach on-device PDFs/images, but `validateFiles()` runs before the agent is invoked and accepts a path only if inside the vault or an explicit allowlist; tests assert `/etc/passwd` is refused with no LLM call.
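The arbitrary-file-read guard in the last bullet comes down to resolving each path and checking containment before anything else runs. The following is a sketch of that check in Python, not the vault server's `validateFiles()`: the function name, the `PermissionError`, and treating allowlist entries as either exact files or directory roots are assumptions.

```python
from pathlib import Path
from typing import Iterable

def validate_files(paths: Iterable[str], vault_root: str,
                   allowlist: Iterable[str] = ()) -> list[Path]:
    """Accept a path only if it resolves inside the vault or an allowlisted location."""
    roots = [Path(vault_root).resolve(), *(Path(entry).resolve() for entry in allowlist)]
    accepted = []
    for raw in paths:
        # resolve() collapses ".." segments and follows symlinks, so traversal
        # tricks are judged by where the path really points.
        real = Path(raw).resolve()
        if not any(real == root or root in real.parents for root in roots):
            raise PermissionError(f"refusing to attach {raw!r}: outside the vault and allowlist")
        accepted.append(real)
    return accepted
```

Running the check before the agent is invoked means a rejected path costs no LLM call and never reaches a tool that could read it.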
See also: Spec.
Voice intake
Your real on-device voice notes, ingested exactly-once.
- Reads the real OpenWhispr SQLite store read-only via the `sqlite3` CLI. Instead of a native npm binding, it shells out to the macOS-bundled `sqlite3 -readonly -json` against `~/Library/Application Support/open-whispr/transcriptions.db`: no native build, and WAL mode means safe concurrent reads while the app runs. Maps each row to its `OpenWhispr-<ts>-<id>.webm`, and reconciles both directions (a row whose audio vanished → `audio_missing`; a `.webm` with no row → synthetic `orphan:<file>`) so no recording is silently lost.
- Server-owned watermark for idempotent ingestion. OpenWhispr has no native "mark read," so the server invents one: a single `{id, created}` JSON marker written via temp-file + atomic rename. `list_transcripts` returns only notes after the watermark; `mark_processed` (called last, after the vault/board write lands) advances it, making the loop at-least-once and naturally idempotent. It stores the note's true `created`, never wall-clock now (which would push the watermark past older unprocessed notes and hide them forever).
- Numeric-aware ordering avoids a subtle data-loss bug. Real ids are integer PKs; a naive lexical compare mis-orders `'10' < '9'`, which on a same-second tie could hide a strictly-newer note. `compareIds` detects integer ids and compares numerically, encoded as a regression test (mark note 9, assert note 10 stays unprocessed).
- One spoken thought → cross-linked vault page and board case. The voice recipe routes each transcript through the ingest router: knowledge re-synthesizes wiki pages, an action creates/updates a board card with tasks and the right lane, and most notes produce both, cross-linked. A spoken name resolves through the vault alias map. Four-tier source resolution (`OPENWHISPR_FIXTURES` > real SQLite > legacy CLI > clean error) makes the whole path runnable fully offline against fixtures.
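The watermark and the numeric-aware id comparison described above fit in a few lines. This sketch mirrors the stated behavior (order by `created`, break ties by id, compare integer ids numerically); the Python names, the `isdecimal()` test for "integer id", and the assumption that `created` values are directly comparable are illustrative choices.

```python
from typing import Optional

def compare_ids(a: str, b: str) -> int:
    """Return -1, 0, or 1; integer ids compare numerically, anything else lexically."""
    if a.isdecimal() and b.isdecimal():
        x, y = int(a), int(b)
    else:
        x, y = a, b
    return (x > y) - (x < y)

def is_after(note: dict, mark: dict) -> bool:
    """True when the note sorts strictly after the {id, created} watermark."""
    if note["created"] != mark["created"]:
        return note["created"] > mark["created"]
    return compare_ids(str(note["id"]), str(mark["id"])) > 0

def unprocessed(notes: list, mark: Optional[dict]) -> list:
    """Notes the ingest loop still has to handle; no watermark means all of them."""
    if mark is None:
        return list(notes)
    return [note for note in notes if is_after(note, mark)]

# Same-second tie: note 10 must survive a watermark at note 9.
notes = [{"id": "8", "created": 90}, {"id": "9", "created": 100}, {"id": "10", "created": 100}]
pending = unprocessed(notes, {"id": "9", "created": 100})
```

With a plain string comparison, `"10" > "9"` is false, so note 10 would be filtered out as already processed; the numeric branch is what keeps it in `pending`.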
Agent-native architecture
One mutation path, two faces — and an MCP fleet that exposes the whole system to Claude.
- 5 core stdio MCP servers, 62 tools total. Verified counts: board 44 (v3.3.0), calendar 6, guard 6, openwhispr 4, vault 2. Every server is an `@modelcontextprotocol/sdk` `Server` over `StdioServerTransport`. The board is the giant (cases + 3-tier hierarchy, tasks, notes, messages, 8 reminder verbs, 5 priority verbs, label-bundle config, templates, and the propose/approve/reject queue), dispatched by a 44-case switch that exactly mirrors the 44-entry tool array.
- Optional WhatsApp add-on (13 tools). The external `whatsapp-mcp` (a Go whatsmeow bridge sidecar + a Python stdio MCP) exposes 13 tools (read contacts/chats/messages, fetch context and media, and `send_message`/`send_file`/`send_audio_message`) and is wired into both Claude clients the same launchd + supergateway way as the core five. It pairs to your phone's linked devices (QR), so it reads and sends as you: no second phone number, no bot persona. `"whatsapp"` is a first-class `MESSAGE_SOURCE` on the board, so WhatsApp threads become cases exactly like Gmail.
- Agent-native parity: every human gesture is the visual twin of a board verb. The browser writes via one `board-client.ts` mapping 1:1 to HTTP routes; the agent's twin path (the MCP) hits the same routes over fetch. There is no human-only or agent-only way to change the board, and every write funnels through the one atomic, version-guarded `mutate()` path. The single deliberate exception (destructive "Clean Done" mass-purge) is kept off the agent surface on purpose.
- Shared `mcp-kit` toolkit, relative-imported for launchd robustness. `err`/`text`/`str`/`baseUrl`/`start`/`makeBoardApi` are deduped out of four near-identical preambles and imported by relative path (not a bare `@cos/mcp-kit` specifier) precisely because launchd invokes each server as `node /abs/path/server.mjs` with no workspace-install dependence. The kit imports nothing from the SDK, so each server's bare specifier resolves against its own hoisted copy. `makeBoardApi` attributes every write two ways (header and body flag).
- npm workspace with one hoisted SDK. The 4 JS servers + `mcp-kit` share one root `node_modules` and lockfile (zero `@modelcontextprotocol` copies under any server). The Next.js board is deliberately excluded: adding it guts `board/node_modules` and 500s the dev server.
- Dual-client bridge: launchd + supergateway HTTP for Claude Code, direct stdio for Cowork. The core five are fronted by `supergateway --outputTransport streamableHttp --cors` under per-server launchd agents on ports 8001–8005 (Claude Code over `.mcp.json` `type:http`), while Cowork spawns them directly as `command` entries (it rejects HTTP `url` entries). `ensure-bridges.sh` (chained into the board's dev/start) bootstraps and kickstarts the bridges and the two `uv` sidecars, probing `/healthz` leniently and never tearing bridges down.
- Lightweight live-badge endpoints. `GET /api/unread-count` returns just `{unread, version}` so the sidebar's Inbox badge stays current off the SSE stream without refetching every message. Linking a message auto-derives a 90-char preview from the body when none is given.
Operations & setup
Install and run as invocable skills, with all machine config resolved from one loader.
- Git-anchored, zero-hardcode config loader. Every setup/skill step sources `config/load-config.sh` (POSIX sh): `REPO_ROOT` is derived from `git rev-parse --show-toplevel` (never stored: "you can't read a file inside the repo to discover where the repo is"), safe defaults are seeded before `cos.env` exists, derived URLs are computed after the override so a changed port propagates everywhere, and it deliberately does not source `secrets.env`.
- Four-file config split by concern and sensitivity. `cos.env` (machine paths/ports, gitignored) · `secrets.env` (the single `ANTHROPIC_API_KEY`, loaded only by the one process that needs it) · `settings.json` (board prefs incl. `principalEmail`) · `auto-sync.json` (the only committed one: a behavior default, not a machine value). Memoized resolvers add a security-relevant fail-safe: an unconfigured principal makes all trust derivation a safe no-op, so a fresh board auto-trusts nobody.
- Setup-as-executable-skills, dependency-ordered. The whole runbook is invocable Claude skills with frontmatter triggers, so "set up Guard on a new machine" or "restore after data loss" is an action, not a wiki page. `cos-setup` sequences `setup-vault → guard-setup → mcp-bridge-setup → backup-recovery` (each producing what the next needs), with a copy-pasteable CHECKPOINT that must pass before advancing. The skills encode hard-won operational knowledge (launchd can't expand `$VARS` or see an nvm shim → plists pin literal paths; pm2 6.x fails to fork → launchd is the supervisor).
- AES-256-GCM encrypted off-site backups, immutable git history. Zero-dependency `node:crypto`, self-describing on-disk frame `COSBAK1`: `MAGIC(8)|salt(16)|iv(12)|authTag(16)|ciphertext`, 32-byte key via `scryptSync` (N=2¹⁵, r=8, p=1) with a fresh random salt+IV every run. Snapshots push to a private GitHub repo (one `.enc` per run, never overwritten; git history is the versioned off-site record); the recovery key lives only in the macOS login Keychain. A failed `git push` degrades to "committed locally, re-push later" rather than losing the snapshot.
- The BOTH model: a guaranteed floor + opportunistic top-ups. Three callers of one `backup.mjs`: a launchd agent at 03:30 (the floor, runs in the gui domain so it can read the login Keychain), a manual "Back up now" (`?force=1`), and a fire-and-forget top-up from hot read routes when the newest snapshot is >12 h old. All serialized by an exclusive `.backup.lock` (atomic `openSync('wx')`, reclaimed if >120 s stale). Exit codes are a protocol: `0`=pushed, `2`=committed-locally-only (still success), `3`=benign lock-skip, other=hard fail.
- Fail-closed identity guards on both sides. `assertDefaultRepoOrRefuse()` runs first and refuses to snapshot+push unless pointed at the genuinely-configured repo (so a `/tmp` sandbox test can't trigger a real backup); the board's `isLiveBoard()` is a positive-identity check (data dir inside `REPO_ROOT/board/`, correct repo, on-disk anchor) gating board-spawned backups. The read path is fail-safe (never throws); the write path is fail-closed.
- Safe-by-default reversible restore. Restores are dry-run unless `--apply`, and verify three ways before any write (GCM auth tag → sha256 vs `MANIFEST.json` → every restored `*.json` parses). `--apply` first snapshots current state to `~/cos-recovery/pre-restore-<ts>/` (so even a restore is undoable), then prints the exact `launchctl kickstart` commands. A read-only `/backups` page merges 5+ independent sources into one fail-safe health envelope that never throws.
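The on-disk frame layout quoted above, `MAGIC(8)|salt(16)|iv(12)|authTag(16)|ciphertext`, is simple enough to parse with fixed offsets. The sketch below only splits a frame into its fields; key derivation and AES-GCM decryption are out of scope. Because `COSBAK1` is seven characters and the magic field is eight bytes, the code checks the seven-byte prefix and treats the eighth byte as opaque, which is an assumption, as are all the Python names.

```python
import struct
from typing import NamedTuple

MAGIC_PREFIX = b"COSBAK1"                   # 8th magic byte is not specified above
HEADER = struct.Struct("=8s16s12s16s")      # magic | salt | iv | authTag, no padding

class Frame(NamedTuple):
    salt: bytes
    iv: bytes
    auth_tag: bytes
    ciphertext: bytes

def parse_frame(blob: bytes) -> Frame:
    """Split an encrypted snapshot into its header fields and ciphertext."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated backup frame")
    magic, salt, iv, tag = HEADER.unpack_from(blob)
    if not magic.startswith(MAGIC_PREFIX):
        raise ValueError("not a COSBAK1 frame")
    return Frame(salt, iv, tag, blob[HEADER.size:])
```

The header is 52 bytes in total, so everything from offset 52 onward is ciphertext. A self-describing frame like this lets a restore tool reject a wrong or truncated file before it ever asks for the key.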
Quality
The depth is enforced, not asserted — by hermetic tests, an independent linter, and a generated docs site.
- 415 headless unit tests, all passing, with no build step. 18 `*.test.ts` files run directly over the board's TypeScript lib modules under `node --test --experimental-strip-types` (Node ≥22 strips types natively; a 6-line resolve hook retries extensionless specifiers). Determinism comes from injecting a fixed `now`; tests touch only in-memory fixtures.
- Hermetic, net-zero, self-skipping integration harness. `tests/run.sh` mktemps a sandbox, rsync-copies the board, seeds `COS_DATA_DIR` from a fixture, dead-ends the sidecar URLs to a closed port, boots `next dev` on 3999, and runs 14 numbered live-board steps (15 scripts: `api-*`, `board-lint`, `concurrency`, `guard-quarantine-release`), each independently gated (unit below Node 22 skip, python without `uv` skip, api steps never fall back to the live board). Every live test snapshots and restores `cases.json` in a `finally`, so a run is provably net-zero on every store.
- Independent zero-dep invariant linter + dual-backend python suites. `board-lint.mjs` re-asserts 20+ structural invariant groups as a second source of truth; `search/test_search.py` (26 functions) exercises both ANN backends (turbovec + numpy brute-force); `guard/test_guard.py` (65 functions) runs fully offline via the heuristic classifier.
- Fail-open / fail-closed properties tested as invariants. Search tests assert `GET`/`POST` are always 2xx and still find the seeded marker whether the sidecar is up or down (CI default: down), proving the board searches with no sidecar and no `uv`. Conversely, guard tests drive the sidecar to prove release upserts trust while dismiss is inert, and that the released queue drains once replayed.
- Golden fixtures asserted on structure, not prose. Because the LLM loop is non-deterministic, fixtures never diff wording: each `*.expected.json` lists invariants from a fixed six-verb vocabulary (`page-exists`, `entity-links-case`, `no-duplicate-thread`…) mapping 1:1 to the routing contract, with an explicit convergence criterion as the measurable "good enough" bar.
- Generated-docs pipeline (single source of truth). A two-stage generator ingests a design workflow's JSON, validates colors/ids, splices the cleaned `LABEL_BUNDLES` into TypeScript, then re-parses that to emit `docs/reference/labels.md`, which is stamped "do not hand-edit," wired into the MkDocs nav, and validated by the strict build, closing the design→code→docs loop.
- MkDocs Material site, strict-built and CI-published. `mkdocs build --strict --verbose` fails CI on any broken cross-link, deployed to GitHub Pages with least-privilege token scopes. CLAUDE.md elevates this to policy ("docs in MkDocs, not loose root `.md`"). Ships the full governance kit: MIT `LICENSE`, a substantive `SECURITY.md` documenting the designed security properties, and `CONTRIBUTING.md`.