FAQ¶
Questions that come up repeatedly on HN / Reddit / X / email. If your question isn't here, open a Discussion.
Table of contents¶
- What is the Reflection Layer? How is it different from chat memory?
- How is this different from ChatGPT's built-in memory?
- How is this different from Claude Projects?
- How is this different from mem0?
- Can I use this alongside mem0 / Letta / SuperMemory?
- Do I need Obsidian?
- Do I need OpenWebUI?
- Does this work with Claude.ai / ChatGPT.com / Gemini directly?
- How much does it cost to run?
- Does it work fully offline?
- What happens to my data?
- Can I self-host all of it?
- Why 16 LLM providers? Feature creep?
- Does it support Chinese / Japanese / other languages?
- What throughline is not
- What's the roadmap?
- I hit a bug. What do you want from me?
What is the Reflection Layer? How is it different from chat memory?¶
The Reflection Layer is throughline's v0.3 differentiator. It exposes three MCP tools that turn the existing card store into a thinking-state tracker, not just a memory tracker:
| Tool | Surfaces | Use case |
|---|---|---|
| `find_open_threads` | Cards with unresolved questions where no later card on the same topic answered them | "I want to think about X again" → Claude shows you what you stopped thinking about |
| `check_consistency` | Historical positions (with their original reasoning) on the topic of the user's current statement | "I'm going with X" → Claude surfaces the case you made AGAINST X two months ago, asks if anything changed |
| `get_position_drift` | Chronological trajectory of cards on a topic, with stance + reasoning per entry | "What's my current framework for X?" → Claude shows you the three reasoned phases your thinking went through |
How this differs from chat memory (Claude Desktop, ChatGPT, mem0, Letta, OpenMemory MCP, etc.):
| | Chat memory | Reflection Layer |
|---|---|---|
| Trigger | Reactive (you ask) | Proactive (daemon scans + flags) |
| Object | Conversation snippets / facts | Thinking states (reasoning posture) |
| Capability | "Find related conversations" | "Find unfinished thinking" |
| Data ownership | Vendor servers | Local vault |
| Cross-tool | Locked to one vendor | Vault works across all AI tools |
One-line distinction:

- Claude Desktop / ChatGPT memory remembers what you said
- throughline's Reflection Layer knows what you stopped thinking about
These are different needs in a user's head. The market currently blurs them — that's throughline's empty niche. The framing is memory loyal to your past thinking, not your present comfort.
For implementation details: docs/REFLECTION_LAYER_DESIGN.md; schema in docs/POSITION_METADATA_SCHEMA.md; runtime state files in docs/RUNTIME_STATE_FILES.md.
Engineering gate: clustering accuracy ≥75% pairwise on the maintainer's vault — cleared 2026-04-28 at 0.975.
Real-vault one-time cost (when the user opts into LLM-using stages 3 + 4): ~$0.01 / ¥0.07 on gemini-2.5-flash against a 72-card reflectable subset. Cache files dedupe so re-runs are essentially free.
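To make the first tool's contract concrete, here is a minimal sketch of the kind of scan `find_open_threads` implies (not throughline's actual implementation), assuming hypothetical card fields `topic`, `created`, `open_question`, and `resolves_topic`:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical card shape -- field names are illustrative, not throughline's schema.
@dataclass
class Card:
    topic: str
    created: date
    open_question: str | None  # unresolved question captured in the card, if any
    resolves_topic: bool       # does this card answer an earlier question on its topic?

def find_open_threads(cards: list[Card]) -> list[Card]:
    """Return cards with an unresolved question that no later card on the same topic answered."""
    open_threads = []
    for card in cards:
        if not card.open_question:
            continue
        answered_later = any(
            other.topic == card.topic
            and other.created > card.created
            and other.resolves_topic
            for other in cards
        )
        if not answered_later:
            open_threads.append(card)
    return open_threads
```

`check_consistency` and `get_position_drift` work over the same card store; they differ only in what they surface (conflicting stances vs. the chronological trajectory).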
How is this different from ChatGPT's built-in memory?¶
ChatGPT memory stores short facts ("user prefers metric units", "user is vegetarian") in OpenAI's database. Three differences:
- You can't read them in a text editor. throughline writes plain Markdown files in your vault. You can grep, back up, version-control, or open them in any editor — ChatGPT memory is a SaaS row you can't see.
- It captures thinking, not just labels. ChatGPT memory remembers WHAT you like; a throughline card captures the six-section reasoning that led you to a conclusion (pain point, mechanism, execution, pitfalls, insights, summary).
- It survives tool changes. When you move off ChatGPT in two years, the memory disappears. A vault of refined Markdown cards keeps working with whatever comes next.
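Because the cards are plain Markdown with YAML frontmatter, any script can read them the same way an editor or grep does. A minimal sketch, with the file path, frontmatter key, and section spellings as illustrative assumptions rather than throughline's fixed layout:

```python
from pathlib import Path
import yaml  # pip install pyyaml

SECTIONS = ["pain point", "mechanism", "execution", "pitfalls", "insights", "summary"]

def read_card(path: Path) -> tuple[dict, str]:
    """Split a vault card into (frontmatter dict, Markdown body)."""
    text = path.read_text(encoding="utf-8")
    _, fm, body = text.split("---", 2)   # assumes standard ----delimited frontmatter
    return yaml.safe_load(fm), body

# Hypothetical card path; grep, git, or any editor works on the same file.
frontmatter, body = read_card(Path("vault/cards/2026-04-25-example-card.md"))
missing = [s for s in SECTIONS if s.lower() not in body.lower()]
print(frontmatter.get("import_source"), "missing sections:", missing)
```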
How is this different from Claude Projects?¶
Claude Projects is a system prompt + file-attachment scope. Throughline produces the same kind of content Projects consume — refined cards you'd want the LLM to see — but:
- The refining is automatic. You don't paste cards into a Project; they're generated from chat conversations as you have them.
- It works across LLMs. A Claude Project only runs inside Claude.ai. Throughline's cards feed OpenWebUI, which can talk to any of 16 LLM providers; the cards themselves are model-agnostic.
- Cards are Markdown in your filesystem, not a container-attached asset you can't export cleanly.
How is this different from mem0?¶
mem0 is a Python library that stores vectors in a service-managed store. Great for application developers who want a drop-in memory API.
The difference is audience: throughline targets a vault-keeping individual, not an app builder. The Markdown-first design is the feature, not a workaround. If your end goal is "give my app a memory backend", mem0 is the better fit; if it's "never re-explain myself to a new chat + have a searchable journal I actually read", throughline is.
See the comparison table in the README.
Can I use this alongside mem0 / Letta / SuperMemory?¶
Yes, but the value-add diminishes. Throughline's cards already cover the recall dimension those tools specialise in. Running both doubles storage + cost + LLM token spend without obvious upside.
The one stackable case: use Letta for an agent's stateful execution memory (its to-do list, its current call graph) and throughline for the user's learned knowledge store. Different slots.
Do I need Obsidian?¶
No. The refine daemon writes plain Markdown files with YAML
frontmatter — same format VS Code, nvim, iA Writer, Typora,
Sublime, or Notepad can read. Obsidian is recommended because
its graph/linking UI matches the six-section card shape, but
it's not required. Nothing downstream depends on Obsidian's
.obsidian/ folder, canvas files, or plugin format.
Do I need OpenWebUI?¶
The full flywheel assumes OpenWebUI as the chat frontend (the Filter is an OpenWebUI Function). But two of the three missions are lighter:
- RAG-only: skip the Filter; you just want cards in Qdrant and a FastAPI endpoint (`/v1/rag`) your own code can hit (see the sketch below). No OpenWebUI needed.
- Notes-only: skip the vector store and the Filter. You just want the refine daemon writing Markdown from some conversation source (e.g. a chat export you drop into the raw directory).
See the wizard's step 2 (Mission) for the choice.
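For the RAG-only mission, a minimal sketch of "your own code can hit" might look like this; the endpoint path comes from the docs above, but the port, request fields, and response shape are assumptions, so check the rag_server's actual API before relying on them:

```python
import httpx

# Hypothetical local address and payload shape -- verify against the rag_server's real schema.
RAG_URL = "http://localhost:8000/v1/rag"

def query_vault(question: str, top_k: int = 5) -> list[dict]:
    """Ask the local RAG endpoint for the most relevant refined cards."""
    response = httpx.post(RAG_URL, json={"query": question, "top_k": top_k}, timeout=30)
    response.raise_for_status()
    return response.json().get("results", [])

for hit in query_vault("What was my framework for pricing experiments?"):
    print(hit)
```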
Does this work with Claude.ai / ChatGPT.com / Gemini directly?¶
Not live — those web apps don't expose a Filter-style hook point.
But the import adapters let you bulk-convert an exported history: `python -m throughline_cli import claude <zip>`, `import chatgpt`, or `import gemini` turn any official data export into raw Markdown the daemon refines the same way.
Once imported, keep future conversations flowing by running them through an OpenWebUI frontend that proxies the Claude / OpenAI / Gemini / etc. APIs. Same keys, different UI.
How much does it cost to run?¶
Two cost axes:
Upfront (one time):
- ~4.6 GB download for the bge-m3 embedder + bge-reranker-v2-m3 (if using the default local setup). Cloud embedders skip this.
- Docker install for Qdrant (~150 MB image).
Recurring:

- LLM API calls for refine. Rough numbers:
  - Skim tier: ~$0.005/conversation (Haiku-class, single call).
  - Normal tier: ~$0.015-0.05/conversation (Sonnet-class, slicer + refiner + router).
  - Deep tier: ~$0.20/conversation (Opus-class + critique pass).
- Qdrant + bge-m3 + daemon + rag_server all run locally after the initial model download. Zero recurring infra cost.
A heavy user (20 conversations/day on the Normal tier) spends roughly 20 × $0.015-0.05 ≈ $0.30-1.00/day, i.e. ~$10-30/month. The wizard's step 15 sets a daily USD cap the daemon enforces (pauses the queue when reached, resets at local midnight).
Use `python -m throughline_cli cost` for the live dashboard.
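As an illustration of the cap mechanism (not the daemon's actual bookkeeping), enforcement only needs three pieces: accumulate the day's spend, pause the queue once it meets the cap, and reset at local midnight:

```python
from datetime import date

class DailyCap:
    """Toy sketch of a daily USD spend cap; the real daemon's accounting will differ."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_today = 0.0
        self.day = date.today()

    def record(self, cost_usd: float) -> None:
        if date.today() != self.day:               # local midnight passed: reset the counter
            self.day, self.spent_today = date.today(), 0.0
        self.spent_today += cost_usd

    def queue_paused(self) -> bool:
        return self.spent_today >= self.cap_usd

cap = DailyCap(cap_usd=1.00)
cap.record(0.05)                                   # e.g. one Normal-tier refine
print("paused:", cap.queue_paused())
```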
Does it work fully offline?¶
Almost. The only mandatory cloud call is the LLM refine (and even
that can be a local Ollama via the ollama provider preset).
Every other component — embedder, reranker, vector store, vault
writer — runs locally by default.
For fully-offline mode: pick Ollama at wizard step 4, local
bge-m3 at step 7, local qdrant at step 3. No internet needed
beyond the first model downloads.
What happens to my data?¶
- Your conversations get written to a raw directory the daemon watches. Sent to the LLM provider you chose (one of 16) when refine fires. Never sent anywhere else.
- Refined cards land in your vault as plain Markdown. `de_individualization` rules in the refiner prompt replace private IPs, home paths, and personal emails with placeholders (`192.0.2.10`, `/path/to/...`, `user@example.com`) before the card is written (a rough sketch of the substitution follows this list).
- The `import_source` tag in every card's frontmatter lets you bulk-purge any batch: cards tagged `import sample-2026-04-25` can be removed with one `rg -l` + `rm`, or `throughline_cli uninstall` on a wider sweep.
- No telemetry. throughline makes zero outbound requests to any throughline-operated server. There isn't a throughline-operated server.
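A rough sketch of what those substitutions amount to, assuming simple regex rules; in throughline the rules live in the refiner prompt and are applied by the LLM during refine, so treat this only as an illustration of the target placeholders:

```python
import re

# Placeholder targets named in the docs; the patterns below are illustrative, not exhaustive.
RULES = [
    (re.compile(r"\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"), "192.0.2.10"),        # private IPv4 (10.x only here)
    (re.compile(r"/(?:home|Users)/[\w.-]+(?:/[\w.-]+)*"), "/path/to/..."),   # home-directory paths
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "user@example.com"),       # email addresses
]

def de_individualize(text: str) -> str:
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text

print(de_individualize("Reached 10.0.0.42 from /home/alice/notes, mailed alice@gmail.com"))
# -> "Reached 192.0.2.10 from /path/to/..., mailed user@example.com"
```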
See SECURITY.md for the disclosure channel +
THREAT_MODEL.md for the full attack-surface
enumeration.
Can I self-host all of it?¶
Yes:
- Qdrant — Docker container.
- RAG server — local FastAPI process.
- Daemon — local systemd / launchd service.
- Filter — pasted into your own OpenWebUI install.
- LLM — Ollama on localhost, or any OpenAI-compatible server reachable on your LAN (LM Studio, vLLM, text-generation-webui, TEI).
No throughline-owned infrastructure in the loop.
Why 16 LLM providers? Feature creep?¶
Because locking to one (historically OpenRouter) made the tool unusable for users who already pay for Anthropic / OpenAI / a Chinese-market provider, or who want to run entirely on Ollama. The abstraction is 200 lines of code; the value is "works with whatever you already pay for". The wizard auto-detects the env var you have set — most users only think about this once.
Does it support Chinese / Japanese / other languages?¶
Prompts ship in English only (prompts/en/). The codebase was
originally Chinese-first and de-localised for the open-source
release; see docs/CHINESE_STRIP_LOG.md
for what was stripped and what's re-introducible.
User-content handling is locale-neutral:

- Non-ASCII card titles, tags, and body text round-trip correctly.
- The taxonomy observer currently treats AI/代理 and AI/Agent as string-distinct tags; v0.3 adds semantic clustering.
- Frontmatter dates use ISO format, locale-independent.
To contribute translated prompts, see
prompts/README.md § Adding a new language.
What throughline is not¶
External reviewers (and good-faith first-time visitors) sometimes arrive expecting a SaaS-shaped tool and bounce off when throughline doesn't fit. Stating the negative space explicitly so nobody wastes their time:
throughline is not a SaaS, not a cloud product, not a multi-device sync layer, and not a team collaboration tool. It is a single-user, local-first system designed for one person managing their own context on their own machine. The vault you build is yours, on your disk, in plain Markdown — no account, no shared namespace, no admin panel.
What that means for specific shapes of need:
- "I need this to follow me across phone + laptop + tablet." throughline doesn't ship sync. The vault is a folder; layer Syncthing / Resilio / iCloud Drive / Obsidian Sync on top if you want cross-device. We may add an opt-in sync bridge later, but "your data, your machine" stays the core philosophy.
- "My team needs to share context." Wrong tool. throughline optimises for individual cognitive density, not for handing context off between people. A shared vault would dilute the privacy + identity scoping the refiner relies on.
- "I want a polished mobile app." Reading surface is OpenWebUI
- Obsidian; both have their own mobile stories. throughline's daemon + RAG server are server-side processes that wouldn't run on a phone anyway.
- "Can it replace ChatGPT / Claude / Gemini for me?" No. It augments whichever LLM you already pay for — it gives that LLM durable memory of you, not a UI to talk to it.
If any of those shapes is your blocker, throughline is the wrong
tool for you and we'd rather you find that out here than after
even the 1-command --express install.
What's the roadmap?¶
- v0.2.x (now): bug fixes + small polish against v0.2.0. Several items originally pencilled in for v0.3 already shipped here — see "Shipped in v0.2.x" in the full roadmap.
- v0.3: frontend decoupling (MCP server adapter, OpenAI-compatible proxy adapter), engineering hardening (path-invariant lint rule, recall-accuracy regression suite), stale-triage auto-archive, remaining native embedders (nomic / MiniLM), `taxonomy retag` CLI, PyPI release, hero screencast.
- v1.0: stability commitment on config.toml / CLI / ABI; shipped when someone OTHER than the author has been running it in production for meaningful hours and the recall-accuracy suite has 3+ model-version data points.
Full list: ROADMAP.md.
I hit a bug. What do you want from me?¶
Open a bug report. The template asks for:
- `python -m throughline_cli doctor` output (redact absolute paths).
- OS + Python version.
- Which component (wizard / daemon / rag_server / Filter / …).
- Which LLM provider + model you picked.
- Logs from `~/throughline_runtime/logs/refine_daemon.log` or rag_server stderr — redact paths and API keys before pasting.
That's usually enough for me to tell you what to try next.