Deep feature tour

The README tells the story; this page is the engineering-level companion for the skeptical reader. Below is the depth behind the headline subsystems — exact models, algorithms, crypto, schema versions, tool counts, and the deliberate fail-open / fail-closed asymmetries that make Cos more than a kanban with an LLM bolted on. Every headline number here is verified against source.

Triage & intelligence

On-device semantic search across the whole board, plus the natural-language surfaces an agent drives.

  • On-device semantic embeddings via model2vec (minishlab/potion-base-8M, 256-d, no torch). A distilled static token-embedding model: encode() is a token-lookup + pooling, not a transformer forward pass — warm load ~0.2 s, near-instant inference, ~30 MB one-time download, no GPU. Vectors are L2-normalized so the index's dot-product metric is exactly cosine.
  • ANN index via turbovec (Google TurboQuant, 4-bit SIMD quantization, Rust-backed). turbovec.IdMapIndex(dim=256, bit_width=4) compresses each vector to 4 bits/dimension for SIMD top-K. To absorb quantization error, every query over-fetches topn=max(50,k) candidates, then re-ranks with an exact scorer before slicing to k. A stable string-id ↔ uint64 map survives index rebuilds.
  • Hybrid scoring that beats fuzzy cosine when you mean it. On top of raw cosine: exact-id +5.0, id-substring +3.0 (so CASE-1 still matches inside CASE-11), title/name +2.0, token-Jaccard ×1.0, weak-substring +0.5 — and a why[] array reports which signals fired. Typing a case id or a client's name always jumps the obvious hit to #1.
  • Fail-OPEN by design: search never darkens the board. The deliberate inverse of the guard. If the sidecar is absent, cold, crashed, hijacked by a foreign process, or returns a garbage 200, the board falls through to an in-process keyword scan. Both the fetch and the res.json() parse live in one try with an 800 ms AbortSignal.timeout, so every one of those failures takes the same path. The fallback re-applies the same filters and a JS mirror of the same hybrid scorer, still returns HTTP 200 with an identical envelope, and tags itself engine:'keyword'. The board works with no sidecar and no uv installed.
  • Content-digest cache key, not db.version. The in-memory index rebuilds exactly when content changes, keyed on blake2b of the sorted per-doc id:hash lines — deliberately not db.version (which can decrease or repeat after a migration or .bak recovery and would serve stale vectors forever). Embedder change and archive-scope change fold into the same key.
  • Double fallback so it runs anywhere NumPy runs. No model → a deterministic blake2b char 3–5-gram hashing embedder (same 256-d unit-norm output, process-stable unlike Python's salted hash()); no turbovec wheel → an exact NumPy brute-force index (matmul + argsort). The test suite parametrizes both backends.
  • Batch multi-query dedupe-before-create. An agent fires up to MAX_QUERIES=32 queries at once (the person, the topic, the deliverable); the card recurring across the most query angles is the existing matter — enforcing "one matter, one card." Records are rebuilt server-side from the in-hand DB, never the sidecar's projection, so a stale index can't poison results. Mandated in the agent tool descriptions ("ALWAYS SEARCH BEFORE create_case").
  • One endpoint, four streams. Cases, tasks (<caseId>::<tid>), messages, and reminders are searched together; every hit flags its nature. Reminders have no archive so done/dismissed ones stay findable; the case-lane status filter exempts them (different status space).
  • Natural-language command palette (Cmd/Ctrl+K, three modes). JUMP (text → case), SPOTLIGHT (semantic search with keyword fallback), and COMMAND (a small grammar: move <case> to <lane>, archive, add task … to …, complete …, create <work|life> case, with lane synonyms like wip → in_progress). An unmatched command throws a NoChange sentinel that aborts the write so a no-op never bumps db.version or fires a spurious live refresh.
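
The hybrid scoring and over-fetch-then-re-rank steps in the list above can be sketched as follows. This is a minimal illustration, not the project's scorer: the names hybrid_score and rerank, the record fields ("id", "title", "text"), the tokenizer, and the order in which the bonuses are checked are all assumptions. Only the bonus weights come from the description above.

```python
import re

def _tokens(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def hybrid_score(query, doc, cosine):
    """Add lexical bonuses to a precomputed cosine similarity.

    Returns (score, why), where `why` names every signal that fired.
    """
    q = query.strip().lower()
    doc_id = doc["id"].lower()
    title = doc.get("title", "").lower()
    body = doc.get("text", "").lower()
    score, why = cosine, ["cosine"]
    if q:
        if q == doc_id:
            score += 5.0
            why.append("exact-id")
        elif q in doc_id:
            score += 3.0
            why.append("id-substring")
        if q in title:
            score += 2.0
            why.append("title")
        elif q in body:
            score += 0.5
            why.append("weak-substring")
    q_tok, d_tok = _tokens(q), _tokens(title + " " + body)
    if q_tok and d_tok:
        jaccard = len(q_tok & d_tok) / len(q_tok | d_tok)
        if jaccard > 0:
            score += 1.0 * jaccard
            why.append("token-jaccard")
    return score, why

def rerank(query, candidates, k):
    """Re-rank over-fetched (doc, cosine) pairs exactly, then slice to k."""
    scored = [(hybrid_score(query, doc, cos), doc) for doc, cos in candidates]
    scored.sort(key=lambda item: item[0][0], reverse=True)
    return [(doc["id"], score, why) for (score, why), doc in scored[:k]]
```

In this sketch a query of "CASE-1" lifts the card with that exact id above CASE-11 even when CASE-11 has the higher raw cosine, because the exact-id bonus outweighs the id-substring bonus.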

See also: Search reference.
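
The content-digest cache key from the list above might look like this in outline. The function name index_cache_key, the 16-byte digest size, and the way the embedder name and archive scope are folded in are assumptions made for illustration; the source only states that the key is a blake2b over the sorted per-doc id:hash lines and that embedder and archive-scope changes fold into the same key.

```python
import hashlib

def index_cache_key(doc_hashes, embedder, include_archived):
    """Derive a rebuild key from content, never from a version counter.

    `doc_hashes` maps doc id -> content hash (hypothetical shape).
    """
    h = hashlib.blake2b(digest_size=16)
    # Fold embedder identity and archive scope into the same key.
    h.update(f"embedder={embedder}\narchived={int(include_archived)}\n".encode())
    # Sorting makes the key independent of iteration order.
    for line in sorted(f"{doc_id}:{digest}" for doc_id, digest in doc_hashes.items()):
        h.update(line.encode() + b"\n")
    return h.hexdigest()
```

Because the key depends only on content, a store whose version counter moved backwards after a recovery still yields a fresh key whenever any document hash differs.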

Safety & privacy

A fail-closed injection gate, unspoofable trust derivation, and defense on the rendering side too.

  • Fail-CLOSED prompt-injection guard — the inverse posture of search. Untrusted email/tool-output is screened before the triage agent loads it as context. When the classifier is unreachable, scans return a non-error verdict (UNAVAILABLE — treat as UNTRUSTED) rather than a false all-clear — deliberately not isError, because an error invites a blind retry while an explicit verdict forces the safe branch. A 2xx with unparseable JSON counts as offline. The fail-closed api() (4 s timeout) is not shared from the common toolkit precisely because the asymmetry is security-critical.
  • Default classifier: Meta Llama-Prompt-Guard-2-86M (86 M-param mDeBERTa, 8 languages: en/fr/de/es/it/pl/pt/ru, gated under the Llama license, threshold 0.5). Measured separation: a benign French mail scores ~0.0008 vs EN/DE/FR injections at ~0.999. Presets also ship qualifire/prompt-injection-sentinel (ModernBERT-large, English-only) and a heuristic-only mode; any raw HF id is a passthrough escape hatch.
  • Label-aware malicious-class resolution — no hardcoded index. A 3-stage resolver reads the model's own id2label (keyword match → binary "the other label" fallback → LABEL_1 last resort) so swapping classifiers needs no code change, and it never inverts.
  • Dependency-free heuristic fallback that self-reports as degraded. No torch/transformers/network: calibrated regex/keyword detectors (override prompts, DAN/developer-mode personas, system-prompt exfiltration, credential leakage, send-to-URL channels) with explicit score bands. The name heuristic-fallback is the sole signal the board/MCP read as DEGRADED — a silent heuristic when you wanted the model is treated as "not done."
  • Overlapping-window scanning — flag if ANY window is malicious. The real model windows on its own tokenizer (MAX_TOKENS=512, WINDOW_OVERLAP=64) so a split-across-paragraphs injection lands wholly inside one window; the heuristic windows on blank-line paragraphs. The verdict takes the max malicious score across all windows, decomposed into named segments (subject, body#1…).
  • Auto-derived, unspoofable trust from genuine two-way correspondence. A correspondent becomes trusted only via (A) handshake: they wrote in and you replied; (B) direct 1:1: they are the sole To recipient of an outbound message with no Cc; or (C) origination: they are a recipient of a thread you started. A reply-all to someone else's thread never blanket-trusts the room. A message counts as outbound only when outbound===true (set solely from the Gmail SENT scan) and its from resolves to the principal, so a spoofed From: <you> can never mint trust. The agent never calls a trust verb; an unconfigured principal trusts nobody.
  • Display-name spoof rejection in the trust-key extractor. The one place an attacker-controlled header becomes a trust key actively refuses the classic deception: a header like ceo@corp.com <attacker@evil.com> (display name itself contains @) is rejected outright (returns null), as are stray-bracket malformed forms. This deterministic header parsing is the substrate the whole "unspoofable trust" story rests on.
  • Sender whitelist is a second axis, never a bypass. Content scan ("is this text injecting?") and sender knowledge ("do we know this sender?") are orthogonal — a trusted sender's mail is still scanned, so a compromised/forwarding account can't smuggle an injection through. The trust push fails open on the write side only (a missed push leaves a sender at the more-cautious "unknown"); the content gate stays fail-closed.
  • Three never-conflated scan outcomes. A real verdict, a DISABLED passthrough (master toggle off — a deliberate user choice, non-error, no quarantine record), and an UNREACHABLE fail-closed verdict are kept rigorously distinct so "user chose OFF" is never confused with "gate silently failed."
  • Master ON/OFF toggle with a board-enforced deps gate. One switch in /security; a fresh machine ships OFF. The sidecar always permits the flip (a deps-short model just scans degraded, never a false all-clear); the board disables turning ON until a network-free probe ({torch, transformers, modelCached, hfToken, ready}) reports ready, surfacing a copy-paste setup command. Atomic read-modify-write config store so neither key clobbers the other.
  • Quarantine: content-hashed dedup, release ≠ dismiss, self-draining TTL. Quarantine ids are Q- + blake2b(from\nsubject\nbody) so a retried injection upserts a counter instead of flooding the queue (bodies capped at 16 000 chars). Clicking Release re-admits the exact Gmail thread to triage and trusts the sender without re-scanning (which would loop forever); Dismiss is inert. Released records auto-purge on a TTL (default 7 days, lock-free fast path, lexicographic UTC-timestamp compare, settable live) so the replay queue can't grow unbounded — with no scheduler.
  • Untrusted-markdown safe renderer — the rendering-side complement to the guard. All agent/email-derived prose renders through a hardened component (react-markdown + remark-gfm) that treats content as untrusted: no rehype-raw (inline <script>/<img onerror> render as inert text), default scheme-stripping of javascript:/data:, GFM task-list checkboxes are read-only, and images render as links, never inline <img> so the board never auto-fetches a remote tracking pixel. Links open rel="noreferrer noopener".
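
The overlapping-window rule from the list above (flag if any window is malicious, report the maximum score) can be sketched as below. The window size and overlap are the constants quoted above; the function names, the pluggable score_window callback, and the use of >= against the threshold are assumptions, and the real model windows on its own tokenizer rather than on a plain token list.

```python
MAX_TOKENS = 512
WINDOW_OVERLAP = 64

def windows(tokens, size=MAX_TOKENS, overlap=WINDOW_OVERLAP):
    """Yield slices of `tokens`; consecutive slices share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    start = 0
    while True:
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break
        start += step

def scan(tokens, score_window, threshold=0.5, size=MAX_TOKENS, overlap=WINDOW_OVERLAP):
    """Score every window and keep the worst (maximum) malicious score."""
    worst = max(score_window(w) for w in windows(tokens, size, overlap))
    return {"malicious": worst >= threshold, "score": worst}
```

With an overlap of 64 tokens, a phrase shorter than the overlap that straddles a window boundary still appears whole in the following window, which is why the overlap matters more than the exact window size.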

See also: Guard.
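
The content-hashed quarantine ids described above could work roughly like this. The Q- prefix, the from\nsubject\nbody hash input, and the 16 000-character cap come from the text; the digest size, the record fields, the in-memory dict standing in for the queue, and capping the body before hashing are assumptions.

```python
import hashlib

BODY_CAP = 16_000  # characters, as stated above

def quarantine_id(sender, subject, body):
    digest = hashlib.blake2b(
        f"{sender}\n{subject}\n{body}".encode("utf-8"),
        digest_size=8,  # assumed length; the source does not state one
    ).hexdigest()
    return f"Q-{digest}"

def quarantine(queue, sender, subject, body):
    """Upsert: a retried identical message bumps a counter, not the queue length."""
    body = body[:BODY_CAP]
    qid = quarantine_id(sender, subject, body)
    record = queue.get(qid)
    if record is None:
        record = queue[qid] = {
            "id": qid, "from": sender, "subject": subject, "body": body, "count": 0,
        }
    record["count"] += 1
    return record
```

Deriving the id from content means the dedupe needs no lookup by sender or time window: the same injection always lands on the same record.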

Data & integrity

A zero-dependency JSON store with serious correctness properties — no database to run.

  • Zero-dependency JSON store with crash-safe atomic writes. The entire board (cases, messages, events, reminders, priorities, labels) lives in one cases.json, written via POSIX tmp + rename (a reader sees the complete old or new file, never a partial). Each write copies to a .bak; readDB() migrates+validates on read and transparently falls back to .bak on any parse/validation failure. Pure Node fs/promises — no deps.
  • Promise-chain write mutex — no lost updates, no colliding ids. A single module-level promise chain funnels every read-modify-write through mutate(), serializing concurrent agent + UI writes; an aborted body never reaches writeDB (transactional). Proven by a regression test that fires 25 parallel create_case + 25 add_task and asserts exact count growth and unique ids. Optional expectedVersion gives optimistic concurrency (409 on conflict).
  • Additive schema versioning with migrate-on-read (v8). migrate() upgrades any older file with no migration script: v4 added events, v5 reminders, v6 reminder labels/tasks + message.reminderId, v7 priorities + case.starred, v8 message.url — each an optional that rides through verbatim, so every prior version still reads with no transform. An independent zero-dep linter (board-lint.mjs) re-asserts 20+ structural invariants (id shapes, enum validity, bidirectional message↔case linkage, the full hierarchy contract) as a second source of truth.
  • Initiative > Workstream > Case hierarchy as a flat tree, one invariant checker. All three tiers are CaseRecords in one array and one id space (tier = kind, place = parentId); hierarchyViolation() is the single source of truth (no cycles, depth ≤ 3, tier rules) that the store throws on and the lint independently re-derives. Containers roll up progress (done/total leaves, summed tasks, distinct rolled-up message counts) over their non-archived descendants.
  • Five kanban lanes from one enum. The lanes are urgent · todo · in_progress · waiting_for_input · done. VALID_CASE_STATUS is the runtime guard, mirrored byte-for-byte in the MCP, lint, and Tailwind. Lane (workflow state) is deliberately distinct from Priority P0–P3 (importance).
  • Soft-delete Trash with lazy retention auto-purge. One soft Delete verb (sets archivedAt); there is no hard-delete HTTP path (the old one orphaned emails and caused re-triage duplicates). sweepExpiredTrash() runs inside mutate() on every write and purges trash older than the retention window (default 30 days; ≤0 disables) via cleanCases, the sole permanent-removal primitive — which keeps any email still referenced by a surviving case or reminder. Reminders get a parallel two-stage lifecycle (soft-delete done/dismissed idle >7 d, then purge >30 d).
  • Board-local generational backup ring (distinct from the off-site pipeline). Every write also drops data/backups/cases-<ISO>.json and prunes with a 3-tier time-based policy: keep all <36 h, newest-per-calendar-day for 30 days, floor of 50 snapshots. This deliberately replaced a count cap whose fatal flaw let a write-burst purge all real history in seconds.
  • Per-case append-only audit trail (human vs agent vs system). Every write is attributed (resolveActor() defaults to human, upgrades to agent only on an explicit header/flag, and never trusts a body claiming system), capped at the last 50 entries, with describeCaseChange() rendering human-readable diffs (todo→done, restored). This is the load-bearing guardrail behind the agent's "never undo the user's manual board edits" contract.
  • Catalog-backed label taxonomy with a reject-unknown contract. Structured, color-coded labels drawn from 20+ installable role packs (Manager, Founder, PM, Sales, IT, Legal…) over a fixed 12-color palette — distinct from freeform tags. Each LabelDef carries a first-class description (the field agents read to decide when a label applies); an unknown label id is rejected inside the lock, forcing agents to list_labels first.
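
The tmp + rename write and .bak fallback read from the list above are sketched here as a Python analogue of the Node store, which uses fs/promises. The function names, the minimal validation (an integer version field), and copying the previous file to .bak before the rename are assumptions made for illustration.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path

def write_db(path, db):
    """Write atomically: readers see the old file or the new one, never a partial."""
    path = Path(path)
    if path.exists():
        # Keep the last good copy next to the live file.
        shutil.copyfile(path, path.with_name(path.name + ".bak"))
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(db, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

def read_db(path):
    """Read the live file; fall back to .bak on any parse or validation failure."""
    path = Path(path)
    for candidate in (path, path.with_name(path.name + ".bak")):
        try:
            db = json.loads(candidate.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue
        if isinstance(db, dict) and isinstance(db.get("version"), int):
            return db
    raise RuntimeError("no readable store")
```

The temp file is created in the same directory as the target on purpose, since a rename is only atomic within one filesystem.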

See also: Migration, Labels.
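
A flat-tree invariant check in the spirit of hierarchyViolation() might be sketched as follows. The record fields id, kind, and parentId follow the description above; the lowercase kind values, the specific tier rule (a parent must be a strictly higher tier than its child), and the returned message format are assumptions, since the source only names the invariants.

```python
TIER_RANK = {"initiative": 0, "workstream": 1, "case": 2}  # assumed kind values
MAX_DEPTH = 3

def hierarchy_violation(records):
    """Return a description of the first violated invariant, or None."""
    by_id = {r["id"]: r for r in records}
    for record in records:
        seen = {record["id"]}
        node, depth = record, 1
        while node.get("parentId") is not None:
            parent = by_id.get(node["parentId"])
            if parent is None:
                return f"{node['id']}: parent {node['parentId']} does not exist"
            if parent["id"] in seen:
                return f"{record['id']}: cycle through {parent['id']}"
            if TIER_RANK[parent["kind"]] >= TIER_RANK[node["kind"]]:
                return f"{node['id']}: a {node['kind']} cannot sit under a {parent['kind']}"
            seen.add(parent["id"])
            node, depth = parent, depth + 1
            # With three strictly ordered tiers this cannot fire; it is kept
            # because the text lists depth as an invariant in its own right.
            if depth > MAX_DEPTH:
                return f"{record['id']}: deeper than {MAX_DEPTH} tiers"
    return None
```

Returning a message rather than raising lets one function serve both callers named above: a store can throw on a non-None result while a linter simply reports it.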

Surfaces

The writable board plus its time/focus and knowledge surfaces — every one of them live.

  • Live SSE board feed (fs.watch + 150 ms debounce). GET /api/stream watches cases.json and pushes one change event per write to every open client — the single producer behind every "optimistic + live refetch" behavior. It does a cheap raw parse for db.version only (returns -1 on a transient mid-write read so clients skip rather than flap), emits an initial hello{version} and a 25 s heartbeat, and sets x-accel-buffering:no to defeat proxy buffering. prefs.json is deliberately split out of the versioned store so a sort toggle never bumps the version, never fires this event, and never churns the backup ring.
  • Optimistic board with undo stack and race-safe revert. Every drag, lane move, archive, star, and bulk edit applies instantly; on failure it re-pulls authoritative state via refetch() rather than restoring a stale render snapshot (which could resurrect a card another writer removed). Cmd/Ctrl+Z and a toast Undo replay an inverse op. Intra-lane reorder uses a fractional-position bisect (one write for the moved card) with a rebase seeding path and explicit non-finite-position guards.
  • Agent-native authoring — no manual "new case" composer. The +New Case button, New▾ menu, and per-lane quick-add were all removed on purpose so cases.json doesn't bloat with blank spawns. Cases arrive only from the agent, inbox triage, or the command palette: the human curates and approves, the agent authors.
  • propose → approve/reject → commit pipeline. An agent stages a mutation: propose(verb, target?, payload, summary) → P-<n> in db.pending[]. The propose-time route validates the verb against the exact COMMITTABLE_VERBS set so a bad verb fails loudly up front. On approve, commitVerb() re-validates the payload and runs the same store helpers the normal routes use, with decision + commit + status-flip in one mutate() critical section, so a throw aborts everything and the proposal stays pending. Re-deciding a settled proposal → 409.
  • Inbox triage as one pure selector. selectInboxMessages composes read-state and from/to/cc substring filters with two mutually-exclusive ordering modes (semantic-relevance vs date), all precedence in one unit-testable function. The non-obvious bit: the currently-open message is exempt from the read filter only, so auto-marking it read under an "unread" view doesn't make it vanish under the cursor.
  • Unified color-coded activity/audit feed. One reverse-chronological stream flattening case audit rows + synthesized reminder/event lifecycle rows, with a 31-verb → 11-category Tailwind color map. Clicking a row opens that subject's detail drawer in place over the feed (no navigation), with focus restored on close. A fixed SSR now is threaded through every relative-time call to prevent hydration drift.
  • Entity-360 pre-call brief pages. Resolving a person/company/topic name assembles one SSR brief in parallel: their vault knowledge page (over the same /api/vault the agent uses), every non-archived case whose vaultLinks include them, and a flattened relationship timeline (top 15). 404-safe ("Nothing on file"). Turns per-case vaultLinks into a reverse index with no separate datastore.
  • Safe Gmail deep-linking back to source. Two import-free leaf modules gate untrusted link strings: normalizeMessageUrl accepts only absolute http(s) (drops javascript:/data:/relative), and messageDeepLink prefers the structured url but otherwise mines the body with a Gmail-thread-anchored regex so a marketing footer URL is never mistaken for the message's own link.
  • Calendar, reminders, and priorities — three time/focus surfaces, single-field linking. An appointment, nudge, or starred favorite ties to work via exactly one field (event.caseId / reminder.caseId / case.starred); the reverse side is always derived by filtering, never a parallel array that can drift. caseId is validated against a real record inside the lock (closing the read-then-write TOCTOU). The calendar is its own stdio MCP (port 8003, 6 tools); reminder (8) and priority (5) verbs ride the existing board MCP with no new port. get_priorities lets the agent read the user's starred nodes + free-text notes so it triages toward what the user actually cares about. UTC-anchored due-buckets (Overdue/Today/Soon/Later) read correctly west of UTC.
  • needsAttention self-auditing triage tray. A four-bucket projection over live cases — overdue, aging-waiting (idle >3 d), untriaged (todo, no tasks, no priority), unlinked (not done, no knowledge attached) — plus a separate 5-day waiting_for_input SLA breach rendered as a "Waiting Nd" chip. The board's "what have I neglected" layer.
  • Board templates engine. apply_template/list_templates instantiate a pre-shaped case (status, priority, tags, seed-task checklist — e.g. a 6-step "Plugin onboarding checklist") through the same locked create path the agent uses, with every override validated against the enums and seed tasks store-minted.
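
The fractional-position reorder mentioned in the list above can be illustrated with a midpoint rule. This is a sketch under assumptions: the names position_between and rebase, the unit gap at the lane edges, and raising when no room remains are illustrative choices, not the board's actual code.

```python
import math

def position_between(before, after, gap=1.0):
    """Pick a position for a card dropped between two neighbours.

    `before` / `after` are the neighbours' positions, or None at a lane edge.
    Only the moved card gets a new value, so one write suffices.
    """
    for p in (before, after):
        if p is not None and not math.isfinite(p):
            raise ValueError("non-finite neighbour position")
    if before is None and after is None:
        return 0.0
    if before is None:
        return after - gap
    if after is None:
        return before + gap
    mid = (before + after) / 2
    if not before < mid < after:
        # Float precision exhausted or neighbours out of order.
        raise ValueError("no room left; rebase the lane first")
    return mid

def rebase(count, gap=1.0):
    """Reseed a lane with evenly spaced positions."""
    return [i * gap for i in range(count)]
```

Repeated halving eventually runs out of floating-point precision, which is why a sketch like this needs a rebase path alongside the midpoint rule.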

See also: Board, Calendar, Reminders, Activity, Priorities.

Knowledge vault

A local-first, private LLM-Wiki — the only MCP that actually runs an LLM, and the one with the cleverest sandboxing.

  • LLM-Wiki re-synthesis engine (embedded Agent SDK). Unlike the thin fetch-wrapper MCPs, the vault server embeds @anthropic-ai/claude-agent-sdk and runs a full headless Claude Code query() session per tool call (claude-sonnet-4-6), cwd-scoped to the vault root. Implementing Karpathy's LLM-Wiki, every ingest re-synthesizes (rewrites, never appends) the 10–15 pages a substantive source touches, so back-links and consistency compound instead of accreting an append-only log. Two tools only: ingest + query.
  • Quadruple anti-recursion firewall. Because the vault is itself bridged in the repo's .mcp.json, a naive inner session would re-mount this server and recurse forever. Four independent safeguards prevent it: mcpServers:{} + strictMcpConfig:true (inner agent mounts no MCP, can't read any .mcp.json), disallowedTools hard-denying the vault tools, settingSources:['project'] pinned explicitly, and cwd=COS_VAULT_DIR. Plus an in-process FIFO semaphore (COS_VAULT_CONCURRENCY=2) caps concurrent subprocesses.
  • Knowledge-only editorial contract. The vault keeps timeless who/what/why and refuses any to-do, status, deadline, or checkbox (those go to the board). Enforced top to bottom: a "knowledge librarian" ingest persona, a query skill that declines pure open-work questions with a board pointer, and a lint that hunts stray - [ ] checkboxes and cross-domain leaks — flagging, never auto-restructuring.
  • Domain-split wiki with a strong thematic index. Hard-partitioned work/, life/, shared/ trees that never bleed (only truly-dual entities live once in shared/). Each domain is navigated through an index whose top-level sections are themes grouping concepts/entities/sources — a retrieval design, not a flat alphabetical list; every page sits under exactly one theme.
  • Bidirectional vault↔board cross-linking + entity resolution. A case's vaultLinks names the pages it draws on; each page carries the inverse cases: frontmatter (the two arrays are lint-checked inverses), recorded by reference only — the ingest agent has no board tool. A global aliases.md collapses one person's email, spoken name, and board name onto a single canonical wiki entity.
  • Local-first & private: gitignored, off-site-encrypted durability. .gitignore patterns auto-ignore every real vault/<name> while keeping the synthetic template tracked, so PII can never be committed by accident; setup verifies git check-ignore. Durability is the encrypted off-site backup, explicitly not git. The one MCP making outbound Anthropic calls sources its ANTHROPIC_API_KEY from a gitignored file via a launch wrapper, so the secret never enters the installed plist.
  • Arbitrary-file-read guard. ingest can attach on-device PDFs/images, but validateFiles() runs before the agent is invoked and accepts a path only if inside the vault or an explicit allowlist — tests assert /etc/passwd is refused with no LLM call.
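
The arbitrary-file-read guard above amounts to a containment check that runs before any model call. A minimal Python analogue, assuming a hypothetical validate_files signature and a PermissionError on refusal, could look like this:

```python
from pathlib import Path

def validate_files(paths, vault_root, allowlist=()):
    """Accept a path only if it resolves inside the vault or an allowlisted root."""
    roots = [Path(vault_root).resolve()] + [Path(a).resolve() for a in allowlist]
    accepted = []
    for raw in paths:
        # resolve() collapses ".." and follows symlinks before the check.
        real = Path(raw).resolve()
        if not any(real == root or root in real.parents for root in roots):
            raise PermissionError(f"refused: {raw}")
        accepted.append(real)
    return accepted
```

Resolving first is the important step: comparing raw strings would let a path such as vault/../outside.txt pass a naive prefix test.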

See also: Spec.

Voice intake

Your real on-device voice notes, ingested at-least-once and idempotently, so each note is effectively processed once.

  • Reads the real OpenWhispr SQLite store read-only via the sqlite3 CLI. Instead of a native npm binding, it shells out to the macOS-bundled sqlite3 -readonly -json against ~/Library/Application Support/open-whispr/transcriptions.db — no native build, and WAL mode means safe concurrent reads while the app runs. Maps each row to its OpenWhispr-<ts>-<id>.webm, and reconciles both directions (a row whose audio vanished → audio_missing; a .webm with no row → synthetic orphan:<file>) so no recording is silently lost.
  • Server-owned watermark for idempotent ingestion. OpenWhispr has no native "mark read," so the server invents one: a single {id, created} JSON marker written via temp-file + atomic rename. list_transcripts returns only notes after the watermark; mark_processed (called last, after the vault/board write lands) advances it — making the loop at-least-once and naturally idempotent. It stores the note's true created, never wall-clock now (which would push the watermark past older unprocessed notes and hide them forever).
  • Numeric-aware ordering avoids a subtle data-loss bug. Real ids are integer PKs; a naive lexical compare mis-orders '10' < '9', which on a same-second tie could hide a strictly-newer note. compareIds detects integer ids and compares numerically — encoded as a regression test (mark note 9, assert note 10 stays unprocessed).
  • One spoken thought → cross-linked vault page and board case. The voice recipe routes each transcript through the ingest router: knowledge re-synthesizes wiki pages, an action creates/updates a board card with tasks and the right lane, and most notes produce both, cross-linked. A spoken name resolves through the vault alias map. Four-tier source resolution (OPENWHISPR_FIXTURES > real SQLite > legacy CLI > clean error) makes the whole path runnable fully offline against fixtures.
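
The numeric-aware ordering and watermark filter described above can be sketched together. The marker shape {id, created} comes from the text; the Python names, the tie-break order (created first, then id), and the assumption that created values compare correctly as given (for example, ISO timestamps in one format) are illustrative.

```python
def compare_ids(a, b):
    """Three-way compare; integer-looking ids compare numerically."""
    if a.isdecimal() and b.isdecimal():
        x, y = int(a), int(b)
    else:
        x, y = a, b
    return (x > y) - (x < y)

def is_after_watermark(note, mark):
    """True if `note` is strictly newer than the {id, created} marker."""
    if mark is None:
        return True  # nothing processed yet
    if note["created"] != mark["created"]:
        return note["created"] > mark["created"]
    # Same-second tie: fall back to the id, compared numerically.
    return compare_ids(note["id"], mark["id"]) > 0
```

A plain string comparison would rank "10" below "9", so after marking note 9 a same-second note 10 would wrongly count as already processed; the numeric branch avoids that.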

Agent-native architecture

One mutation path, two faces — and an MCP fleet that exposes the whole system to Claude.

  • 5 core stdio MCP servers, 62 tools total. Verified counts: board 44 (v3.3.0), calendar 6, guard 6, openwhispr 4, vault 2. Every server is an @modelcontextprotocol/sdk Server over StdioServerTransport. The board is the giant — cases + 3-tier hierarchy, tasks, notes, messages, 8 reminder verbs, 5 priority verbs, label-bundle config, templates, and the propose/approve/reject queue — dispatched by a 44-case switch that exactly mirrors the 44-entry tool array.
  • Optional WhatsApp add-on (13 tools). The external whatsapp-mcp (a Go whatsmeow bridge sidecar + a Python stdio MCP) exposes 13 tools — read contacts/chats/messages, fetch context and media, and send_message/send_file/send_audio_message — and is wired into both Claude clients the same launchd + supergateway way as the core five. It pairs to your phone's linked devices (QR), so it reads and sends as you — no second phone number, no bot persona. "whatsapp" is a first-class MESSAGE_SOURCE on the board, so WhatsApp threads become cases exactly like Gmail.
  • Agent-native parity — every human gesture is the visual twin of a board verb. The browser writes via one board-client.ts mapping 1:1 to HTTP routes; the agent's twin path (the MCP) hits the same routes over fetch. There is no human-only or agent-only way to change the board, and every write funnels through the one atomic, version-guarded mutate() path. The single deliberate exception (destructive "Clean Done" mass-purge) is kept off the agent surface on purpose.
  • Shared mcp-kit toolkit, relative-imported for launchd robustness. err/text/str/baseUrl/start/makeBoardApi deduped out of four near-identical preambles — imported by relative path (not a bare @cos/mcp-kit specifier) precisely because launchd invokes each server as node /abs/path/server.mjs with no workspace-install dependence. The kit imports nothing from the SDK, so each server's bare specifier resolves against its own hoisted copy. makeBoardApi attributes every write two ways (header and body flag).
  • npm workspace with one hoisted SDK. The 4 JS servers + mcp-kit share one root node_modules and lockfile (zero @modelcontextprotocol copies under any server). The Next.js board is deliberately excluded — adding it guts board/node_modules and 500s the dev server.
  • Dual-client bridge: launchd + supergateway HTTP for Claude Code, direct stdio for Cowork. The core five are fronted by supergateway --outputTransport streamableHttp --cors under per-server launchd agents on ports 8001–8005 (Claude Code over .mcp.json type:http), while Cowork spawns them directly as command entries (it rejects HTTP url entries). ensure-bridges.sh (chained into the board's dev/start) bootstraps and kickstarts the bridges and the two uv sidecars, probing /healthz leniently and never tearing bridges down.
  • Lightweight live-badge endpoints. GET /api/unread-count returns just {unread, version} so the sidebar's Inbox badge stays current off the SSE stream without refetching every message. Linking a message auto-derives a 90-char preview from the body when none is given.

Operations & setup

Install and run as invocable skills, with all machine config resolved from one loader.

  • Git-anchored, zero-hardcode config loader. Every setup/skill step sources config/load-config.sh (POSIX sh): REPO_ROOT is derived from git rev-parse --show-toplevel (never stored — "you can't read a file inside the repo to discover where the repo is"), safe defaults are seeded before cos.env exists, derived URLs are computed after the override so a changed port propagates everywhere, and it deliberately does not source secrets.env.
  • Four-file config split by concern and sensitivity. cos.env (machine paths/ports, gitignored) · secrets.env (the single ANTHROPIC_API_KEY, loaded only by the one process that needs it) · settings.json (board prefs incl. principalEmail) · auto-sync.json (the only committed one — a behavior default, not a machine value). Memoized resolvers add a security-relevant fail-safe: an unconfigured principal makes all trust derivation a safe no-op, so a fresh board auto-trusts nobody.
  • Setup-as-executable-skills, dependency-ordered. The whole runbook is invocable Claude skills with frontmatter triggers, so "set up Guard on a new machine" or "restore after data loss" is an action, not a wiki page. cos-setup sequences setup-vault → guard-setup → mcp-bridge-setup → backup-recovery (each producing what the next needs), with a copy-pasteable CHECKPOINT that must pass before advancing. The skills encode hard-won operational knowledge (launchd can't expand $VARS or see an nvm shim → plists pin literal paths; pm2 6.x fails to fork → launchd is the supervisor).
  • AES-256-GCM encrypted off-site backups, immutable git history. Zero-dependency node:crypto, self-describing on-disk frame COSBAK1: MAGIC(8)|salt(16)|iv(12)|authTag(16)|ciphertext, 32-byte key via scryptSync (N=2¹⁵, r=8, p=1) with a fresh random salt+IV every run. Snapshots push to a private GitHub repo (one .enc per run, never overwritten — git history is the versioned off-site record); the recovery key lives only in the macOS login Keychain. A failed git push degrades to "committed locally, re-push later" rather than losing the snapshot.
  • The BOTH model: a guaranteed floor + opportunistic top-ups. Three callers of one backup.mjs — a launchd agent at 03:30 (the floor, runs in the gui domain so it can read the login Keychain), a manual "Back up now" (?force=1), and a fire-and-forget top-up from hot read routes when the newest snapshot is >12 h old. All serialized by an exclusive .backup.lock (atomic openSync('wx'), reclaimed if >120 s stale). Exit codes are a protocol: 0=pushed, 2=committed-locally-only (still success), 3=benign lock-skip, other=hard fail.
  • Fail-closed identity guards on both sides. assertDefaultRepoOrRefuse() runs first and refuses to snapshot+push unless pointed at the genuinely-configured repo (so a /tmp sandbox test can't trigger a real backup); the board's isLiveBoard() is a positive-identity check (data dir inside REPO_ROOT/board/, correct repo, on-disk anchor) gating board-spawned backups. The read path is fail-safe (never throws); the write path is fail-closed.
  • Safe-by-default reversible restore. Restores are dry-run unless --apply, and verify three ways before any write (GCM auth tag → sha256 vs MANIFEST.json → every restored *.json parses). --apply first snapshots current state to ~/cos-recovery/pre-restore-<ts>/ (so even a restore is undoable), then prints the exact launchctl kickstart commands. A read-only /backups page merges 5+ independent sources into one fail-safe health envelope that never throws.
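
The on-disk backup frame and key derivation described above can be illustrated with the Python standard library. This is a sketch, not the project's backup.mjs: the names Frame, parse_frame, and derive_key are hypothetical, the eighth magic byte is treated as opaque because the source does not say how the 7-character COSBAK1 tag is padded, and the maxmem value is an assumption needed for these scrypt parameters. Decryption itself is omitted because AES-GCM is not in the Python standard library.

```python
import hashlib
from typing import NamedTuple

MAGIC_LEN, SALT_LEN, IV_LEN, TAG_LEN = 8, 16, 12, 16
HEADER_LEN = MAGIC_LEN + SALT_LEN + IV_LEN + TAG_LEN  # 52 bytes

class Frame(NamedTuple):
    magic: bytes
    salt: bytes
    iv: bytes
    tag: bytes
    ciphertext: bytes

def parse_frame(blob):
    """Split MAGIC(8)|salt(16)|iv(12)|authTag(16)|ciphertext."""
    if len(blob) < HEADER_LEN:
        raise ValueError("truncated frame")
    if not blob.startswith(b"COSBAK1"):
        raise ValueError("bad magic")
    parts, offset = [], 0
    for length in (MAGIC_LEN, SALT_LEN, IV_LEN, TAG_LEN):
        parts.append(blob[offset:offset + length])
        offset += length
    return Frame(*parts, blob[offset:])

def derive_key(secret, salt):
    """32-byte key via scrypt with N=2**15, r=8, p=1."""
    return hashlib.scrypt(
        secret, salt=salt, n=2**15, r=8, p=1,
        maxmem=64 * 1024 * 1024,  # these parameters need a little over 32 MiB
        dklen=32,
    )
```

Because the salt and IV travel inside the frame, a restore needs only the file and the recovery key; nothing else has to be stored alongside the snapshot.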

Quality

The depth is enforced, not asserted — by hermetic tests, an independent linter, and a generated docs site.

  • 415 headless unit tests, all passing — no build step. 18 *.test.ts files run directly over the board's TypeScript lib modules under node --test --experimental-strip-types (Node ≥22 strips types natively; a 6-line resolve hook retries extensionless specifiers). Determinism comes from injecting a fixed now; tests touch only in-memory fixtures.
  • Hermetic, net-zero, self-skipping integration harness. tests/run.sh mktemps a sandbox, rsync-copies the board, seeds COS_DATA_DIR from a fixture, dead-ends the sidecar URLs to a closed port, boots next dev on 3999, and runs 14 numbered live-board steps (15 scripts: api-*, board-lint, concurrency, guard-quarantine-release). Each step is independently gated: unit tests skip below Node 22, Python suites skip without uv, and api steps never fall back to the live board. Every live test snapshots+restores cases.json in a finally, so a run is provably net-zero on every store.
  • Independent zero-dep invariant linter + dual-backend python suites. board-lint.mjs re-asserts 20+ structural invariant groups as a second source of truth; search/test_search.py (26 functions) exercises both ANN backends (turbovec + numpy brute-force); guard/test_guard.py (65 functions) runs fully offline via the heuristic classifier.
  • Fail-open / fail-closed properties tested as invariants. Search tests assert GET/POST are always 2xx and still find the seeded marker whether the sidecar is up or down (CI default: down) — proving the board searches with no sidecar and no uv. Conversely, guard tests drive the sidecar to prove release upserts trust while dismiss is inert, and that the released queue drains once replayed.
  • Golden fixtures asserted on structure, not prose. Because the LLM loop is non-deterministic, fixtures never diff wording — each *.expected.json lists invariants from a fixed six-verb vocabulary (page-exists, entity-links-case, no-duplicate-thread…) mapping 1:1 to the routing contract, with an explicit convergence criterion as the measurable "good enough" bar.
  • Generated-docs pipeline (single source of truth). A two-stage generator ingests a design workflow's JSON, validates colors/ids, splices the cleaned LABEL_BUNDLES into TypeScript, then re-parses that to emit docs/reference/labels.md — stamped "do not hand-edit," wired into the MkDocs nav, and validated by the strict build, closing the design→code→docs loop.
  • MkDocs Material site, strict-built and CI-published. mkdocs build --strict --verbose fails CI on any broken cross-link, deployed to GitHub Pages with least-privilege token scopes. CLAUDE.md elevates this to policy ("docs in MkDocs, not loose root .md"). Ships the full governance kit — MIT LICENSE, a substantive SECURITY.md documenting the designed security properties, and CONTRIBUTING.md.

Back to the docs home or the README.