throughline¶
Stop re-explaining yourself to every new chat.
At a glance: ~33,700 LOC Python ยท 7 runtime modules ยท 1,840+ tests, zero external network calls in CI ยท 6 first-class vector store backends ยท 5 native rerankers ยท 16 LLM-provider presets ยท full threat model documented ยท MCP server with 7 tools covering save/recall/list-topics, three Reflection Layer tools (loose ends / consistency / drift), and a status/onboarding probe ยท vault portable across all AI tools (not locked to any one vendor) ยท running 24/7 against the maintainer's vault since v0.1.0.
v0.2.0 alpha โ running 24/7 in production. Issues welcome. Docs: https://jprodcc-rodc.github.io/throughline/
๐ Who this is for¶
throughline is built for people whose context is complex enough that re-explaining it costs real time, energy, or accuracy:
- People with multiple chronic conditions, polypharmacy, or layered medical history that no single chat ever fully captures.
- Long-term solo builders who keep losing the reasoning behind decisions they made months ago โ not the decision itself, the why.
- Neurodivergent thinkers (ADHD, autism, AuDHD, โฆ) whose working memory is already at capacity before the AI loop even starts.
If your use case is "occasional ChatGPT chats and I'd like them remembered," ChatGPT's built-in memory is probably enough. throughline is designed for the cases where built-in memory is not enough, and where carrying that context across model + provider switches matters more than zero-friction onboarding.
โจ What it does¶
Before throughline: Every new chat starts from zero. Your AI forgets your medical history, your project, your preferences, the conclusions you reached last month. You re-explain yourself every time. Past conversations pile up somewhere you never look again.
After throughline: Every conversation gets refined into a durable six-section Markdown card in your Obsidian vault. The next chat that overlaps automatically pulls the card back in. Your AI already knows you.
- Cards are plain Markdown โ grep them, edit them, read them in five years no matter what tool you use then.
- Taxonomy grows as you write โ 5 broad domains seed; system observes drift, proposes new ones for your approval.
- Zero lock-in โ 16 OpenAI-compatible LLM providers, 5 swappable vector stores, 5 swappable rerankers.
๐ก Why this exists¶
Most personal-knowledge tools either: - Record but don't synthesize (raw transcripts pile up) - Synthesize but lose personal context (generic answers about your own meds / projects / history) - Inject personal context but leak it into the public index (your RAG now has your address in it)
throughline separates mechanism (system provides) from content (you provide) at every layer, so you can safely share the engine without sharing yourself.
๐ Quickstart¶
Docker compose (evaluate in 5 minutes, no Python install)¶
git clone https://github.com/jprodcc-rodc/throughline.git
cd throughline
cp .env.example.compose .env # set ONE provider API key
docker compose up -d
docker compose run --rm daemon \
python -m throughline_cli import sample # 10 sample conversations
docker compose logs -f daemon
Default EMBEDDER=openai + RERANKER=skip keeps the image to
~400 MB. Add --build-arg INSTALL_LOCAL=1 for the full local
(bge-m3 + reranker) path. Full walkthrough in
docs/DEPLOYMENT.md ยง Docker compose.
Local install โ 1 command (--express)¶
git clone https://github.com/jprodcc-rodc/throughline.git
cd throughline
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-... # or any of 16 supported providers
python install.py --express # ~3 seconds, writes config, prints next steps
python rag_server/rag_server.py & # FastAPI on :8000
python daemon/refine_daemon.py & # watchdog โ refine โ vault writer
That's the whole install. --express auto-detects whichever LLM
provider env var you have exported (OPENAI_API_KEY,
ANTHROPIC_API_KEY, OPENROUTER_API_KEY, โฆ), picks bge-m3 local
embedder + local reranker + Qdrant + sensible budget cap, prints
per-conversation cost + daily cap, and writes config. No interactive
prompts. Run --express --dry-run to preview without committing.
Drop filter/openwebui_filter.py into OpenWebUI's Admin โ Functions
panel; set its RAG_SERVER_URL valve to your local server. Now your
chats refine into cards, the cards get indexed, and the next chat
that overlaps gets the relevant cards injected.
Local install โ full wizard (16 steps, when you want full control)¶
If you want to pick a different vector DB, change the privacy tier,
import an existing OpenWebUI / ChatGPT / Claude history, or override
any default --express chose for you:
python install.py # walks you through all 16 steps
python install.py --reconfigure # later: change a few without restarting
python install.py --dry-run # preview the full wizard, no save
Steps, in order: Python check โ mission (Full / RAG-only / Notes-only) โ vector DB โ LLM provider โ privacy level โ embedder + reranker โ prompt family โ import source + path โ import scan + cost estimate + explicit privacy consent โ refine tier (Skim / Normal / Deep) โ card structure โ live-LLM preview of your first card with optional 5-dial tuning โ taxonomy strategy โ daily USD cap โ summary + run import.
Why 16 steps and not 5: each one is a decision the user actually needs to control (vector DB / provider / privacy / refine tier / card structure / taxonomy / budget). A 5-step wizard would have to lump these into "everything else" buckets โ and lose every reviewer who picks
chromafor the vector store but then can't change which prompt family their refiner uses. Full reasoning + the back-nav /--reconfigure/--dry-run/ pairwise-coverage design indocs/WIZARD_DESIGN.md.
Re-run, health-check, sample¶
python install.py --reconfigure # change a few settings
python -m throughline_cli doctor # 13 checks with remediation
python -m throughline_cli import sample # 10 synthetic conversations
python -m throughline_cli taxonomy review # approve self-growth signals
python -m throughline_cli refine --dry-run <raw.md> # preview refiner prompt, no LLM call
python -m throughline_cli stats # screenshot-friendly summary
python -m throughline_cli cost # LLM spend dashboard
python -m throughline_cli config validate # lint config.toml for typos / enum drift
Obsidian is optional. The daemon writes plain Markdown + frontmatter; any editor reads it. Obsidian is recommended for the graph + linking UI, but nothing downstream requires it.
Manual install (no wizard)¶
If the wizard is too opinionated for your setup, the long-form guide
in docs/DEPLOYMENT.md walks the same flow by
hand: configure .env, start Qdrant via Docker, launch the RAG
server + daemon, install the Filter.
Pluggable backends¶
| Component | Default | Alternates (today) | Coming in v0.3 |
|---|---|---|---|
Embedder (EMBEDDER) |
bge-m3 (local) |
openai (real impl); jina / voyage / cohere (alias โ OpenAI-compatible endpoint with custom EMBEDDING_API_BASE) |
nomic / minilm as distinct native impls |
Reranker (RERANKER) |
bge-reranker-v2-m3 (local) |
cohere, voyage, jina, skip (all real impls โ separate HTTP clients, not aliases) |
โ |
Vector store (VECTOR_STORE) |
qdrant |
chroma, lancedb, sqlite_vec, duckdb_vss (all embedded, zero-server), pgvector (Postgres + pgvector extension) โ all real impls, all six closed against tracking issues |
โ |
LLM providers¶
16 preset routes. Wizard step 4 auto-detects whichever env var
you've already exported and pre-selects the matching preset.
Existing users with OPENROUTER_API_KEY set keep working with zero
config change. Every preset speaks the OpenAI-compatible
/v1/chat/completions shape; throughline_cli/providers.py is the
canonical registry with verified-as-of-date model lists.
| Region | Providers | Env var pattern |
|---|---|---|
| Direct (anywhere) | OpenAI ยท Anthropic ยท DeepSeek ยท xAI | OPENAI_API_KEY / ANTHROPIC_API_KEY / DEEPSEEK_API_KEY / XAI_API_KEY |
| Hosted open-weights | Together.ai ยท Fireworks.ai ยท Groq | TOGETHER_API_KEY / FIREWORKS_API_KEY / GROQ_API_KEY |
| China (ๅคง้ access) | SiliconFlow (็ก ๅบๆตๅจ) ยท Moonshot (Kimi) ยท DashScope (Alibaba Qwen) ยท Zhipu (ๆบ่ฐฑ GLM) ยท Doubao (ๅญ่่ฑๅ ) | SILICONFLOW_API_KEY / MOONSHOT_API_KEY / DASHSCOPE_API_KEY / ZHIPUAI_API_KEY / DOUBAO_API_KEY |
| Multi-vendor proxy | OpenRouter (one key โ 300+ models) | OPENROUTER_API_KEY |
| Local / self-hosted | Ollama ยท LM Studio | (no key โ local HTTP) |
| Escape hatch | Any OpenAI-compatible endpoint | THROUGHLINE_LLM_URL + THROUGHLINE_LLM_API_KEY |
Smoke-test the install (after step 16): ask something in
OpenWebUI that overlaps your existing notes. You should see
โก anchor pass or auto recall: mode=general ยท conf=0.82 ยท N cards
above the reply.
๐ Access points โ Filter form vs MCP server form¶
The same vault + daemon + RAG server can be reached two different ways. Pick whichever matches the chat client you actually use.
Form A โ OpenWebUI Filter (the original delivery surface)¶
A single filter/openwebui_filter.py paste-into-Admin install. Runs
in-band with every chat turn, surfaces a status badge above each
reply, supports /recall / /forget / @pte slash commands. Best
for: heavy OpenWebUI users who want context injected automatically.
Setup is the wizard flow above.
Form B โ MCP server (Claude Desktop / Code / Cursor / Continue.dev)¶
A separate throughline-mcp PyPI package exposing seven tools over
stdio:
- Vault ops โ
save_refined_card,recall_memory,list_topics - Reflection Layer โ
find_loose_ends,check_consistency,get_position_drift - Discovery / onboarding โ
throughline_status(snapshot of install state, fires on "what's in my throughline?" / "first time using this" questions)
save_refined_card is the zero-LLM-cost save path: the host LLM
(Claude / etc.) synthesizes the 6-section card on subscription budget,
the tool just files it to vault. The older daemon-refining path
(save_conversation) is intentionally NOT exposed via MCP to avoid
double-billing subscription users โ OpenWebUI Filter form still uses
the daemon pipeline as before.
Best for: users who work across multiple AI tools and want one vault that all of them can write into and recall from โ not locked to any single vendor's backend. Setup:
# Once on PyPI (planned for v0.3 release):
pip install throughline-mcp # auto-pulls throughline + fastmcp
# Today, from a fresh clone:
git clone https://github.com/jprodcc-rodc/throughline
cd throughline
pip install -e . # base
pip install -e mcp_server # adds throughline-mcp + fastmcp
# Then add 4 lines to ~/Library/Application Support/Claude/claude_desktop_config.json
# (or the Windows / Linux / Cursor / Continue equivalent)
Full per-client config + troubleshoot in
docs/MCP_SETUP.md. The pip install throughline[mcp]
extras flag still works as a transitive shortcut (mcp = ["throughline-mcp>=0.1"]).
Both forms write to the same vault, share the same daemon and rag_server, share the same Qdrant collection. Running both at once is a supported configuration; conversations saved through MCP get refined by the daemon and become recallable from the Filter, and vice versa.
How this is different from Anthropic's chat memory + Cowork "thread"¶
Anthropic shipped two different features in 2026 that overlap with throughline's surface area. Honest framing of the differences:
- Anthropic chat memory (March 2026): reactive recall of past conversation snippets, locked to Claude, on Anthropic's servers. Trigger: user asks. Object: text snippets. "Where did I leave off?"
- Anthropic Cowork persistent agent thread (April 9, 2026 GA): an autonomous agent thread that runs tasks in an isolated VM โ schedule recurring jobs, manage workflows, "do things while you work elsewhere." Trigger: user delegates. Object: pending tasks
- their state. Action-oriented.
- throughline Reflection Layer: introspective surfacing of
thinking states in your knowledge base โ
find_loose_ends(renamed fromfind_open_threadsto avoid collision with Cowork's "thread" naming) for unfinished questions,check_consistencyfor contradictions with past reasoning,get_position_driftfor stance evolution. Daemon scans your refined card vault offline, surfaces results in any MCP-aware host, vault stays on user's machine. Reflection-oriented.
One-line dichotomy: - Claude memory remembers what you said. - Cowork executes tasks for you. - throughline knows what you stopped thinking about.
Three different products solving three different problems. Use
all three together if it makes sense. Reflection Layer design at
docs/REFLECTION_LAYER_DESIGN.md.
๐ฐ Cost expectations¶
throughline costs whatever your chosen LLM / embedder / reranker providers charge โ there's no throughline subscription on top. The wizard surfaces the unit cost up-front so you can decide before committing.
Per-conversation cost (refined into a card)¶
| Tier | Approx cost per refine | What you get |
|---|---|---|
skim |
~$0.005 | One Haiku call, 2-section card |
normal (default) |
~$0.040 | Sonnet 6-section card with full taxonomy routing |
deep |
~$0.200 | Opus multi-pass + critique + provenance audit |
Numbers are calibrated against MODEL_PRICING in the daemon and a
~3K-input / ~2K-output-token typical refine call. Your actual spend
depends on (1) which provider you pick, (2) the conversation length
distribution, and (3) how much of your existing chat history you
backfill on the first import.
What this means in practice¶
We deliberately don't extrapolate "$X/month" โ real usage is bursty (10 hours one day, 20 idle days). Instead the wizard asks you to set a daily USD cap at step 16; the daemon pauses when the cap is hit and resumes at midnight. Cap suggestions:
- Light evaluator ($5/day cap) โ ~125 normal refines max/day. Plenty for a 50-conversation backfill + ongoing daily use.
- Daily user ($20/day cap, the default) โ ~500 normal refines max/day. Headroom for catching up on a year of history without flagging.
- Power user with deep tier ($50/day cap) โ ~250 deep refines or a mixed daily flow. Unusual; most users never hit it.
Run python -m throughline_cli cost after a few days for actual
measured spend per provider, per tier, per day.
Free local-only path¶
If you pick EMBEDDER=bge-m3 + RERANKER=bge-reranker-v2-m3 +
VECTOR_STORE=qdrant + ollama (or lm_studio) as the LLM
backend, nothing leaves your machine and nothing costs anything
beyond the disk + RAM + GPU time you already pay for. Hardware
floor: ~8 GB RAM for inference; a discrete GPU helps refine speed
but isn't required (CPU-only refines run at ~30โ60s per card).
This is the path the wizard's local_only privacy tier
pre-configures.
๐ What a refined card looks like¶
You ask in chat six months ago: "I lost 12kg on strict keto but my weight's creeping back even at <30g carbs/day. What's going on?"
Six months later you hit it again. Without throughline, the AI starts
from scratch. With throughline, the daemon already refined that
conversation into a card โ Keto rebound after 6 months โ three
mechanisms, not willpower โ and the next chat pulls it back in.
---
title: "Keto weight rebound after 6 months โ three mechanisms, not willpower"
date: 2026-04-02 20:42:00
knowledge_identity: personal_persistent
tags: [Health/Biohack, y/Mechanism, z/Node]
source_conversation_id: "sample-002-keto-rebound"
claim_provenance: user_stated
---
# Scene & Pain Point
After ~6 months of strict keto (<30g carbs/day) with 12kg lost, weight
is creeping back despite holding the same macro rules. Easy to read as
willpower failure; usually not.
# Core Knowledge & First Principles
Three compounding mechanisms, in order of likely magnitude:
1. Adaptive thermogenesis โ BMR drops 10-15% during weight loss.
2. Calorie creep โ fat-fueled meals are calorie-dense; satiety adapts.
3. Insulin-sensitivity recovery โ small glucose loads stored more efficiently.
# Detailed Execution Plan
- Track 7 days of intake honestly vs TDEE minus 15% adaptation deficit.
- Re-introduce measured portions; eyeballing stops working at 4-6 months.
- If the gap is real, the intervention is calories, not carbs.
# Pitfalls & Boundaries
- "Still under 30g carbs" โ "still hypocaloric". Protocol-adherence
doesn't equal caloric deficit.
# Insights & Mental Models
6-month rebound is a data story, not a discipline story. Recovering
biology doing exactly what it's designed to do.
# Length Summary
Keto rebound at month 6 = adaptive thermogenesis + portion drift +
insulin recovery. Fix is a measurement week, not more willpower.
This is the file you grep with ripgrep, embed for RAG, and re-read
in five years. The conversation it came from is one line in a daemon
log. Six structural sections + frontmatter (with knowledge_identity,
XYZ axis tags, and claim_provenance so retrieval can filter
user-stated facts from LLM speculation) โ every refined card has the
same shape so downstream tooling stays simple.
How is this different from mem0 / Letta / SuperMemory / OpenWebUI built-in memory?¶
Short answer: throughline produces durable, human-readable Markdown that lives in your file system. The others produce vectors that live in their store. Different point on the privacy / portability axis, and a different target user โ not a "better than" claim.
| throughline | mem0 | Letta | SuperMemory | OpenWebUI memory | |
|---|---|---|---|---|---|
| Local-only mode | yes (default) | yes (with self-host config) | yes (open-source server) | no (cloud service) | yes (built into OpenWebUI) |
| Source of truth | Markdown files in your vault | own DB | own DB | cloud DB | sqlite next to OpenWebUI |
| Taxonomy mechanism | LLM proposals + manual approval into 9-domain hierarchy | similarity-based recall (no taxonomy) | similarity-based recall | similarity-based recall | flat note list |
| If you migrate away | vault is plain Markdown, grep-able with rg |
DB export to JSON | DB export | depends on cloud export | sqlite dump |
| Designed for | individuals with complex personal context | app developers integrating memory APIs | agent / multi-step builders | consumers wanting smart memory | casual OpenWebUI users |
throughline is heavier to install (it's a daemon + RAG server + Filter,
not a SaaS subscription or a pip install + one line) โ and that's a
real cost. The trade-off you're paying it for: cards live in plain
Markdown you can grep, edit, or read in five years independent of any
tool decision you make today.
๐๏ธ Architecture¶
flowchart TD
user[User in OpenWebUI]
filter[Filter inlet<br/>3-tier gate]
rag[RAG server<br/>embed + rerank]
qdrant[(Vector store)]
llm[LLM]
exporter[Exporter]
raw[Raw Markdown]
daemon[Refine daemon]
vault[Markdown vault]
buffer[Buffer stub]
user -->|turn| filter
filter -->|cheap / judge / slash| rag
rag --> qdrant
rag -->|context cards| filter
filter -->|injected system prompt| llm
llm -->|reply + status badge| user
user -. conversation .-> exporter
exporter --> raw
raw --> daemon
daemon -->|formal note| vault
daemon -->|stub| buffer
daemon -->|upsert embeddings| qdrant
buffer -. human triage .-> vault
Two independent pipelines meet at the vector store and the Markdown vault on disk. The Filter pipeline runs per-turn, in-band with the conversation, and never writes to the vault. The daemon pipeline runs out-of-band, produces knowledge cards from completed conversations, and never reads live chat sessions. Filter bugs cannot corrupt the vault; daemon bugs cannot pollute a live reply.
Reflection Layer (v0.3) data flow¶
flowchart LR
vault[Markdown vault<br/>cards from refine pipeline]
daemon[Reflection Pass daemon<br/>8-stage offline pipeline]
state[State files<br/>JSON under $THROUGHLINE_STATE_DIR]
mcp[MCP tools<br/>Claude / Cursor / Continue.dev]
host[Host LLM<br/>in conversation]
vault -->|walks + parses| daemon
daemon -->|stage 3-7 LLM calls| daemon
daemon -->|writes| state
daemon -.->|--commit-writeback| vault
mcp -->|reads| state
mcp -->|tool result| host
host -->|find_loose_ends / check_consistency / get_position_drift| mcp
The Reflection Layer adds a third pipeline: an offline daemon
that aggregates card-level signals into cross-card metadata
(open threads, contradicting positions, drift trajectories). The
daemon writes JSON state files that MCP tools read in real-time.
LLM cost stays on the daemon's offline path; the MCP hot path
is sub-millisecond JSON read. See
docs/REFLECTION_LAYER_USER_GUIDE.md
for the user-facing workflow,
docs/RUNTIME_STATE_FILES.md for
state file schemas.
Beyond the diagram¶
Several load-bearing safety + quality layers most "AI memory" tools
skip โ the parts that take real engineering, not just embed + store
+ retrieve:
- 3-tier recall gate in the Filter (cheap heuristic โ Haiku judge โ explicit slash command) โ keeps RAG silent when the user's question doesn't actually want context, instead of dumping every remotely-relevant card into the prompt.
- 5-layer RAG integrity controls โ anti-pollution rules,
claim_provenancetagging, de-individualisation pre-write, explicit DATA-not-INSTRUCTIONS wrapping at retrieval, forbidden- prefix denylist before Qdrant upsert. Prevents stored cards from becoming a prompt-injection vector against future replies. - 4-layer Personal Context stack โ profile / preference data rides the Filter's injection path but never enters Qdrant. A daemon bug cannot leak it into public RAG; a Filter bug cannot corrupt it.
- Echo Guard โ detects redundant card refines (LLM rewrites of the same content) and rejects them so the vault doesn't balloon with re-summaries.
- Self-growing taxonomy with human-in-the-loop โ the system proposes taxonomy expansions based on observed conversation patterns; it never adopts them autonomously. Stops the LLM-emits- ten-variants-of-the-same-tag drift that degrades retrieval.
- Forward-slash path-normalisation invariant โ load-bearing Windows / *nix path mismatches become silent Qdrant point-id corruption otherwise.
- Pluggable backend ABCs โ
BaseEmbedder/BaseReranker/BaseVectorStorefactories with alias routing, lazy model load, and stub-on-missing-dep so the wizard can list options without crashing at import. - Honest threat model (
SECURITY.md+docs/THREAT_MODEL.md) โ what's defended against, what isn't, and why each scope cut was made.
Full breakdown of each layer:
docs/ARCHITECTURE.md. Why each design call
was made: docs/DESIGN_DECISIONS.md.
๐งญ Key engineering decisions¶
Five non-obvious calls a reviewer might want to see reasoning
behind. Full Alternatives considered โ Call โ Reason writeups
for all 13 decisions live in
docs/DESIGN_DECISIONS.md.
- Haiku RecallJudge instead of regex or Sonnet (#1) โ the per-turn recall gate calls a calibrated Haiku 4.5 judge at 93.8% accuracy, ~5ms latency, ~$0.0003 per turn. Regex-only would over-recall and pollute every prompt; Sonnet would 30ร the cost
- latency without measurable accuracy gain on the ground-truth set.
- Dual-write โ buffer stub + formal note (#3) โ every refined
conversation produces both a formal vault card AND a triage stub
in
00_Buffer/. The stub path lets the user defer routing decisions without losing data; formal-only would force premature taxonomy commitments on conversations that don't yet have a clear domain. - Forward-slash path normalisation as a load-bearing contract
(#4) โ
_norm_path()is called on every Qdrantpoint_idderivation; Windows back-slash leaks would silently corrupt the vector index and produce zero-recall on cross-platform vaults. Most projects treat this as a Day-2 concern; throughline tests it from Day 1. - Separate Qdrant collections for sensitive packs (#7) โ the pack runtime supports per-pack collection override so a medical pack's cards never share a collection with public-domain cards. A collection-drop wipes just that pack's data; a collection leak exposes just that pack. Limits blast radius of any single index compromise.
- Prompts hardcoded in Python, not YAML (#8) โ controversial but deliberate. Hot-reload feels nice in theory; in practice prompt-format mistakes surface as malformed JSON outputs that are nearly impossible to debug without source-control history. Hardcoded prompts get the same PR review + CI test pass as the code that calls them.
๐ Repository layout¶
throughline/ ~33,700 LOC Python, ~5,400 LOC tests (1,300+ cases)
โโโ daemon/ 3.3K LOC Refine daemon (slice โ refine โ route โ vault writer);
โ watchdog ingest, claim_provenance + de-individualization
โ rules, dial vocabulary renderer, taxonomy growth observer
โโโ rag_server/ 2.3K LOC FastAPI: /v1/embeddings, /v1/rerank, /v1/rag,
โ /refine_status. BaseEmbedder/Reranker/VectorStore ABCs
โ + 13 concrete impls (2+5+6).
โโโ throughline_cli/ 8.5K LOC Install wizard (16 steps + --express + --reconfigure +
โ --dry-run), 4 import adapters (chatgpt/claude/gemini/
โ openwebui-sqlite), doctor (13 checks), taxonomy CLI,
โ cost & stats dashboards, config validation
โโโ filter/ 2.2K LOC Single-file OpenWebUI Filter (paste-into-Admin install):
โ 3-tier recall gate, badge UI, /recall + /forget +
โ @pte slash commands, valves config schema
โโโ mcp_server/ 2.1K LOC MCP server entry. 7 tools โ vault ops
โ (save_conversation, recall_memory, list_topics),
โ Reflection Layer (find_loose_ends,
โ check_consistency, get_position_drift), and
โ throughline_status (discovery probe). Stdio
โ transport via fastmcp, talks to existing daemon
โ + rag_server (no shared-core changes)
โโโ packs/ 0.4K LOC Pack runtime (per-domain slicer/refiner/routing override)
โโโ scripts/ 0.7K LOC ingest_qdrant.py, derive_taxonomy.py, uninstall scripts
โโโ prompts/en/ Verbatim runtime prompt strings (8 refiner variants
โ across 3 tiers + slicer + 4 graders)
โโโ config/ .env.example, taxonomy template (5-domain minimal +
โ 9-domain example), forbidden_prefixes denylist,
โ launchd / systemd service templates
โโโ samples/ 10 synthetic conversations + recording recipe
โโโ docs/ ARCHITECTURE (700 lines), DEPLOYMENT, DESIGN_DECISIONS,
โ FAQ, THREAT_MODEL, TAXONOMY_GROWTH_DESIGN, TESTING,
โ FILTER_BADGE_REFERENCE, ONBOARDING_DATA_IMPORT,
โ ALPHA_USER_NOTES, CHINESE_STRIP_LOG, PHASE_6_CHECKLIST
โโโ fixtures/ 15.7K LOC Regression suite (`pytest fixtures/`); zero real
network calls in CI, every LLM/HTTP call mocked
Each top-level directory has its own README.md for local detail.
Regression tests: see docs/TESTING.md.
๐ Links¶
- Docs site โ full navigable documentation
- Architecture โ how the pieces fit
- Deployment โ end-to-end install
- Design decisions โ why each call was made
- Roadmap โ what's shipping next
- Changelog โ version history
๐ค Contributing¶
PRs welcome. See CONTRIBUTING.md. Good first
issues filter:
https://github.com/jprodcc-rodc/throughline/labels/good%20first%20issue.
๐ License¶
MIT โ do what you want, no warranty.
๐ Acknowledgments¶
Built on: - OpenWebUI โ the chat frontend - Qdrant โ default vector store (others swappable) - BAAI/bge-m3 + bge-reranker-v2-m3 โ default local embeddings + reranking - The LLM providers listed above โ bring whichever one you already pay for