Link - Scale

Practical Promise

Link is local personal memory software. The expected sweet spot is a real user's working wiki: hundreds to thousands of Markdown pages, raw sources, durable memories, and graph links. The product is designed so agents do not need to enumerate that whole corpus for normal work.

Search

Link uses SQLite FTS5 when the Python runtime's sqlite3 module includes it. The derived FTS index is stored as a sidecar under .link-cache/, so an unchanged wiki can reuse the search table across process starts.
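FTS5 support depends on how the local SQLite library was compiled, so a program has to probe for it at runtime. The sketch below shows one way to do that and to open a reusable sidecar index. The file name search.sqlite3 and the `pages(path, body)` schema are hypothetical, not Link's actual layout.

```python
import sqlite3
from pathlib import Path


def fts5_available() -> bool:
    """Return True if this Python's sqlite3 build can create FTS5 tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # SQLite reports "no such module: fts5" when FTS5 was not compiled in.
        return False
    finally:
        conn.close()


def open_search_sidecar(wiki_root: Path) -> sqlite3.Connection:
    """Open or create a derived FTS5 index under .link-cache/ (hypothetical file name)."""
    cache_dir = wiki_root / ".link-cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(cache_dir / "search.sqlite3")
    # IF NOT EXISTS lets a later process reuse the existing table instead of rebuilding it.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(path, body)")
    return conn
```

Because the index is derived from the Markdown pages, deleting the sidecar loses nothing; it only costs a rebuild.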

Cache reuse

Persistent page-cache records let repeated runs reuse unchanged parsed pages instead of rereading everything after every command. If SQLite FTS is unavailable, Link falls back to a token index and reports that backend in status and benchmark output.
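A minimal sketch of a persistent page cache, assuming a page is re-parsed only when its modification time or size changes. The JSON file name, the toy `parse_page` function, and the `[[wiki-link]]` syntax are hypothetical illustrations; Link's actual cache format is not described here.

```python
import json
import re
from pathlib import Path

CACHE_FILE = "page-cache.json"  # hypothetical file name under .link-cache/
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")  # assumes [[wiki-link]] syntax


def parse_page(text: str) -> dict:
    """Toy parser: the first line as the title, plus any [[wiki-links]]."""
    lines = text.splitlines()
    title = lines[0].lstrip("# ").strip() if lines else ""
    return {"title": title, "links": LINK_RE.findall(text)}


def load_pages(wiki_root: Path) -> tuple[dict, int]:
    """Parse every .md page, reusing cached results for unchanged files.

    Returns (parsed pages keyed by relative path, number of cache hits).
    """
    cache_path = wiki_root / ".link-cache" / CACHE_FILE
    old = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    new, hits = {}, 0
    for page in sorted(wiki_root.rglob("*.md")):
        rel = page.relative_to(wiki_root).as_posix()
        st = page.stat()
        key = [st.st_mtime_ns, st.st_size]  # cheap change signature
        entry = old.get(rel)
        if entry and entry["key"] == key:
            new[rel] = entry  # unchanged: reuse the parsed result
            hits += 1
        else:
            parsed = parse_page(page.read_text(encoding="utf-8"))
            new[rel] = {"key": key, "parsed": parsed}
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(new))  # entries for deleted pages drop out here
    return {rel: e["parsed"] for rel, e in new.items()}, hits
```

The hit count is the kind of number a status or benchmark report can surface as "cache reuse".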

Graph

Large graphs open as a bounded high-signal overview first. Users can search, filter, focus neighborhoods, or explicitly load more when they need it.
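One simple way to build a bounded high-signal overview is to keep only the highest-degree nodes and the edges between them. This is a sketch of that idea under the assumption that "high signal" means "high degree"; the `max_nodes` parameter and the returned fields are hypothetical.

```python
from collections import Counter


def graph_overview(edges: list[tuple[str, str]], max_nodes: int = 200) -> dict:
    """Keep the max_nodes highest-degree nodes and the edges among them."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Sort by degree (descending), then by name so the result is deterministic.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    keep = set(ranked[:max_nodes])
    return {
        "nodes": [n for n in ranked if n in keep],
        "edges": [(s, d) for s, d in edges if s in keep and d in keep],
        "truncated": len(degree) > max_nodes,  # signals that "load more" is available
        "total_nodes": len(degree),
    }
```

Returning `truncated` and `total_nodes` lets a UI tell the user what was left out instead of silently hiding it.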

Agent context

MCP and CLI query packets use micro, small, medium, and large budgets. Agents read the recall capsule first and then take follow-up actions as needed, instead of dumping every page into the model.
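The budget idea can be sketched as a greedy packer: always include the capsule, then add ranked snippets until the next one would exceed the cap. The token caps, the word-count token estimate, and the follow-up action names below are all hypothetical; Link's real budget sizes and tokenizer are not stated here.

```python
BUDGETS = {"micro": 200, "small": 800, "medium": 2000, "large": 6000}  # hypothetical caps


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_packet(capsule: str, ranked_snippets: list[str], budget: str = "small") -> dict:
    """Pack the capsule plus as many best-first snippets as fit in the budget."""
    limit = BUDGETS[budget]
    used = estimate_tokens(capsule)  # the capsule is always included
    included = []
    for snippet in ranked_snippets:  # assumed already sorted best-first
        cost = estimate_tokens(snippet)
        if used + cost > limit:
            break  # stop at the first miss so ranking order is preserved
        included.append(snippet)
        used += cost
    omitted = len(ranked_snippets) - len(included)
    return {
        "capsule": capsule,
        "snippets": included,
        "tokens": used,
        "omitted": omitted,
        # Hypothetical hints telling the agent how to ask for more.
        "follow_up_actions": ["search", "context", "graph"] if omitted else [],
    }
```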

Bounded Surfaces

The important scale question is not "Can Link list every page?" but "Do the default agent and UI paths stay bounded?"

Surface | Default behavior | Expansion path
--- | --- | ---
recall | Returns a budgeted context packet plus a tiny hybrid-ranked recall capsule. | Use follow-up actions for more memory, search, context, or graph detail.
admin(action="graph_summary") | Returns a bounded topic or high-degree graph neighborhood. | Increase limit/depth, or request the full graph only when needed.
Local graph UI | Starts capped, with sparse labels and motion limits for large wikis. | Use type filters, node search, focused neighborhoods, fullscreen, and explicit all-data loading.
All pages | Lists a bounded window grouped by page type. | Use paging, search, and type filters instead of scrolling the whole wiki.
Health | Shows readiness, validation, interrupted writes, search backend, and cache reuse. | Run repair commands only when health points to a specific issue; doctor uses the parsed page cache for backlink checks.
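The "All pages" row describes a bounded, paged window grouped by type. A minimal sketch of that listing pattern follows; the `path` and `type` keys and the default page size are hypothetical, not Link's data model.

```python
from itertools import groupby
from typing import Optional


def list_pages(pages: list[dict], page: int = 1, page_size: int = 50,
               page_type: Optional[str] = None) -> dict:
    """Return one bounded window of pages, grouped by page type."""
    selected = [p for p in pages if page_type is None or p["type"] == page_type]
    # Sort by type first so groupby sees each type as one contiguous run.
    selected.sort(key=lambda p: (p["type"], p["path"]))
    start = (page - 1) * page_size
    window = selected[start:start + page_size]
    groups = {t: [p["path"] for p in grp] for t, grp in groupby(window, key=lambda p: p["type"])}
    return {
        "groups": groups,
        "page": page,
        "total": len(selected),
        "has_more": start + page_size < len(selected),
    }
```

Only the window is materialized for display, so the response size stays fixed no matter how large the wiki grows.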

Measure Locally

Recall quality is measured too, not just speed: the repo ships a 1,176-case recall benchmark and a third-party LoCoMo retrieval track with reproduction commands. See benchmarks/RESULTS.md.

Use lnk benchmark on your real wiki. It reports cache time, persistent-cache reuse, search backend, search/query timing, graph payload shape, value evidence, and recommendations. The value section compares broad wiki body text with the bounded query packet so you can see whether Link is reducing context-budget waste.

lnk benchmark "agent memory"
lnk health
lnk status --validate
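The value comparison boils down to simple arithmetic: how much text a broad dump would send versus how much the bounded packet sends. The sketch below illustrates that ratio using word counts as a rough token proxy; it is a hypothetical helper, not the metric lnk benchmark actually computes.

```python
def value_evidence(page_bodies: list[str], packet_text: str) -> dict:
    """Compare broad wiki body text with a bounded query packet.

    Word counts stand in for tokens; a real tokenizer would give different numbers.
    """
    broad = sum(len(body.split()) for body in page_bodies)
    packet = len(packet_text.split())
    reduction = 1 - packet / broad if broad else 0.0  # fraction of context not sent
    return {"broad_tokens": broad, "packet_tokens": packet, "reduction": round(reduction, 3)}
```

For example, 1,000 words of page bodies against a 50-word packet gives a reduction of 0.95, meaning the packet sends about 5% of what a full dump would.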

From a source checkout, run a synthetic 10,000-page check without touching your real wiki:

python3 scripts/smoke_large_wiki.py --pages 10000

The smoke script prints the generated wiki path, the local viewer command, and graph URLs so you can inspect browser behavior manually.
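To show what a synthetic large wiki involves, here is a hypothetical generator that writes interlinked Markdown pages with a fixed random seed. It is not the contents of scripts/smoke_large_wiki.py; the page naming, `[[wiki-link]]` syntax, and parameters are assumptions for illustration.

```python
import random
from pathlib import Path


def make_synthetic_wiki(root: Path, pages: int = 10_000, links_per_page: int = 3,
                        seed: int = 0) -> Path:
    """Write `pages` Markdown files that link to each other with [[wiki-links]]."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same wiki
    wiki = root / "wiki"
    wiki.mkdir(parents=True, exist_ok=True)
    for i in range(pages):
        targets = rng.sample(range(pages), k=min(links_per_page, pages))
        # Skip self-links so every link points at a different page.
        links = " ".join(f"[[page-{t:05d}]]" for t in targets if t != i)
        body = f"# Page {i}\n\nSynthetic note {i}. Related: {links}\n"
        (wiki / f"page-{i:05d}.md").write_text(body, encoding="utf-8")
    return wiki
```

Writing into a temporary directory keeps the generated corpus fully separate from any real wiki.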

Large Wiki Habits

Prefer brief, query, and graph-summary over full exports.
Use page-type filters and search before opening full graph data.
Run lnk health after ingest or broad manual edits.
Run lnk doctor --fix only when health or validation points to a repairable issue.
Keep private raw sources and local wiki/log.md out of Git; use snapshot, team-sync, and compliance-export for reviewable sharing paths.

Current Limits

Link is not pretending to be an enterprise search cluster. These are the current boundaries to understand before betting a huge corpus on it:

Indexes are local and process-owned. Very large wikis still use memory proportional to the page/index size.
The SQLite FTS index is built locally from wiki pages rather than maintained as a separate hosted service.
The local HTTP viewer is for personal loopback use, not a multi-user hosted deployment.
Full graph exports are intentionally not the default for large wikis. Use bounded graph summaries first.
At tens of thousands of pages and beyond, persistent on-disk indexing and chunked graph loading become the next engineering frontier.

The near-term direction is clear: preserve the plain Markdown storage model while making cache, search, graph, and validation paths more incremental over time.