# Changelog

All notable changes land here. Dates are UTC. For the full commit list behind any entry, run `git log vX..vY`.

This project follows Semantic Versioning; pre-1.0 minor bumps can include breaking config-shape changes.

Contributors: add new entries under [Unreleased] below. When the next version is cut, the maintainer renames that section to the new version number, dates it, and starts a fresh [Unreleased] block on top.
## [Unreleased]
### Added — P0-2 stance-based drift detection (algorithmic surface complete) (2026-04-29)
Phase 2's claim-level drift detector — the engine behind `get_position_drift` and `check_consistency` once it lands — shipped the algorithmic surface plus the LLM judge spec, but not yet the live LLM HTTP wrapper.

What's there:

- `mcp_server/claim_schema.py` — pydantic v2 `Claim` model with 9 LLM-emitted fields (subject, subject_canonical, predicate, object, stance ∈ [-1.0, +1.0], hedging ∈ [0.0, 1.0], raw_text, reasoning, kind) and 4 daemon-attached persistence fields (id, card_id, timestamp, source_turn_id). Strict validators on range, snake_case canonicalization, predicate enum (11 verbs), kind enum (stance / fact / preference / commitment).
- `mcp_server/drift_detector.py` — pure-function `detect_drift(current, vault, judge) → DriftReport | None` and `detect_consistency_issues(current, vault, judge) → list[ConsistencyReport]`. Five-gate algorithm: kind-eligibility → history-exists → |stance_delta| ≥ 0.6 → judge says drift → judge confidence ≥ 0.5. The severity formula damps by `max(prior.hedging, current.hedging)`. `Vault` and `LLMJudge` are Protocols, so storage backends and judge models swap freely. (A sketch of the gate sequence follows this list.)
- `prompts/en/claim_extraction.md` — extraction prompt that emits the Claim shape from a conversation slice. 5 worked examples, including the reverse cases (factual question, quoted_other) that produce empty arrays.
- `prompts/en/drift_judge.md` — judge prompt with 5 worked verdict examples and confidence calibration anchors. Outputs strict 3-field JSON consumed by `parse_judge_verdict()` / `safe_parse_judge_verdict()`.
- `docs/CLAIM_STANCE_SCORING.md` — design integration doc explaining how the new numeric stance/hedging extends the existing categorical `position_signal` block without breaking the 72+ already-refined cards in the maintainer's vault.
- 200+ new unit tests pinning the algorithm behaviour with a fully-mocked LLM judge (no network calls in CI). Suite: 2032 passed / 3 skipped in `fixtures/v0_2_0`.
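A minimal sketch of the five-gate sequence, for orientation. The gate order, thresholds, and damping term are quoted from this entry; everything else (the `history_for` / `is_drift` method names, the eligible-kind set, and the exact severity expression) is assumed for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Claim:
    kind: str                # stance / fact / preference / commitment
    stance: float            # [-1.0, +1.0]
    hedging: float           # [0.0, 1.0]
    subject_canonical: str


@dataclass
class DriftReport:
    prior: Claim
    current: Claim
    severity: float
    judge_confidence: float


class Vault(Protocol):
    def history_for(self, claim: Claim) -> list[Claim]: ...


class LLMJudge(Protocol):
    def is_drift(self, prior: Claim, current: Claim) -> tuple[bool, float]: ...


def detect_drift(current: Claim, vault: Vault, judge: LLMJudge) -> Optional[DriftReport]:
    # Gate 1: kind-eligibility (assumed set — the schema lists four kinds).
    if current.kind not in ("stance", "preference"):
        return None
    # Gate 2: history must exist for the same canonical subject.
    history = vault.history_for(current)
    if not history:
        return None
    prior = history[-1]
    # Gate 3: the raw stance swing must clear the 0.6 threshold.
    delta = abs(current.stance - prior.stance)
    if delta < 0.6:
        return None
    # Gate 4: the LLM judge must call it drift (not evolution / scope change).
    drifted, confidence = judge.is_drift(prior, current)
    if not drifted:
        return None
    # Gate 5: low-confidence verdicts are discarded.
    if confidence < 0.5:
        return None
    # Severity damped by the strongest hedge on either side (exact formula assumed).
    severity = delta * (1.0 - max(prior.hedging, current.hedging))
    return DriftReport(prior, current, severity, confidence)
```

Because `Vault` and `LLMJudge` are Protocols, tests can pin all five gates with plain in-memory fakes — which is how the entry's 200+ tests avoid network calls.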
What's deferred:

- The live LLM HTTP wrapper conforming to `LLMJudge`. The protocol is locked; the wrapper is a small standalone commit.
- Refiner prompt updates to emit numeric stance_score / hedging_score (gated on a focused review session, per the feedback rule against speculative refiner-prompt churn).
- Wiring `detect_drift` / `detect_consistency_issues` into `daemon/reflection_pass.py`.
Commits: 3b5d71d, f91a94a, 3c91e52, 04a19b6, 33ed64b.
### Added — Tool-triggering eval harness + P0-1 closure (2026-04-29)
Closes P0-1 of `private/SPEC_DEEP_OPTIMIZATION.md`. The Anthropic Messages API's tool-selection behaviour was previously evaluated by hand. Now there's a reproducible harness that scores per-tool F1 + false-positive rate against a fixture set of utterances.

What ships:

- `evals/tool-triggering/run_eval.py` — auto-detects the provider from env (OpenRouter or Anthropic direct), introspects `mcp_server.tools.*` into Anthropic tool defs (function signatures → JSON schema, full Decision-Guide docstrings → descriptions), runs the fixtures through the Messages API, and computes per-tool TP/FP/FN/TN. `--no-llm` dry-run validates fixtures + tool schemas without burning any API budget. (A scoring sketch follows this list.)
- `evals/tool-triggering/fixtures/{drift,consistency,loose_ends,recall}-{positive,negative}.jsonl` — 24 hand-written cases covering each of the 4 differentiation tools.
- `evals/tool-triggering/results/2026-04-29.{md,jsonl}` — baseline run + iteration 1 result, both committed for reproducibility.
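For reference, the scoring arithmetic behind "per-tool F1 + false-positive rate" — a minimal sketch; the `Counts` container and function names are illustrative, only the TP/FP/FN/TN inputs and the F1 ≥ 0.75 / FPR ≤ 0.15 spec target come from this entry:

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # tool fired on a positive fixture
    fp: int = 0  # tool fired on a negative fixture
    fn: int = 0  # tool stayed silent on a positive fixture
    tn: int = 0  # tool stayed silent on a negative fixture


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall; 0.0 when undefined.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def false_positive_rate(c: Counts) -> float:
    # Share of negative fixtures where the tool incorrectly fired.
    return c.fp / (c.fp + c.tn) if (c.fp + c.tn) else 0.0


def passes_spec(c: Counts) -> bool:
    # Spec target quoted in this entry: F1 >= 0.75 and FPR <= 0.15 per tool.
    return f1(c) >= 0.75 and false_positive_rate(c) <= 0.15
```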
Concurrent docstring upgrades. The 7 host-callable MCP tool
docstrings were rewritten in the spec's "Decision Guide" format
(CALL THIS PROACTIVELY WHEN: / DO NOT CALL WHEN: /
EXAMPLE TRIGGERS: / EXAMPLE NON-TRIGGERS:). The format is
now a contract enforced by 28 scaffold tests across all 7 tools.
Iteration trajectory on claude-sonnet-4-6 via OpenRouter:

| Tool | Baseline F1 | Iter 1 F1 |
|---|---|---|
| `get_position_drift` | 1.00 | 1.00 |
| `check_consistency` | 0.75 | 1.00 |
| `find_loose_ends` | 0.86 | 1.00 |
| `recall_memory` | 0.55 | 0.86 |
Spec target (F1 ≥ 0.75 + FPR ≤ 0.15 per tool) cleared on all 4
differentiation tools after one iteration of docstring tuning.
Two key fixes: recall_memory got HARD OVERRIDE non-triggers
covering fresh-start signals (English + Chinese); check_consistency
got an explicit BOUNDARY-WITH-drift section routing topic-decision
announcements to drift instead of double-firing.
Cost: ~$0.10 OpenRouter credit total for both runs (24 cases × ~10K tokens × 2 runs at Sonnet 4.6 pricing). The eval is small enough to grow alongside the fixture set without budget concern.
Pending: Task 1.3 (≥30 real conversation fixtures from the maintainer's Claude history) is the remaining open item — only the user can capture and label these from their actual sessions.
Commits: 2d499b9, 9ac33e3, 06da9ac, 18445c1, ddc8e85,
a778b9a, 765389d.
### Renamed — `find_open_threads` → `find_loose_ends` (Cowork collision) (2026-04-29)
Anthropic's Cowork persistent agent threads hit GA on April 9, 2026 — an autonomous task-execution agent running workflows in an isolated VM. Throughline's `find_open_threads` surfaces unfinished thinking in your knowledge base. Same word "thread", completely opposite shape — a naming collision that invites user confusion.
Renamed the public surface:
- MCP tool find_open_threads → find_loose_ends (file
find_open_threads.py → find_loose_ends.py via git mv)
- Slash prompt /threads → /loose_ends
- All docstring + doc references updated
Backward compat (preserves existing vault data):
- State file on disk kept as reflection_open_threads.json
- Card frontmatter status: open_thread kept
- daemon-internal helpers like reflection_explain._find_open_threads_entry
kept (no public surface)
README "How throughline differs from Anthropic" section rewritten to address all three Anthropic features that overlap our space: chat memory (March), Cowork persistent agent thread (April GA), and our Reflection Layer. Three-line dichotomy:
- Claude memory remembers what you said.
- Cowork executes tasks for you.
- throughline knows what you stopped thinking about.
Fix: c68f46a. Tests 1823 passed / 3 skipped.
### Added — `save_refined_card` MCP tool (zero-LLM-cost save path) (2026-04-29)
Real Claude Desktop dogfooding revealed that `save_conversation` charges ~$0.04 per save via the daemon's OpenRouter Sonnet refining call. For a user already paying for a Claude subscription, that's double-billing and real friction — "I'm using Claude already, why is throughline charging me again?" Bad OSS UX.
New tool resolves the economics. Decision flow:
1. User says save / remember / 记住 / 保存
2. Host LLM (Claude / etc.) synthesizes 6-section card from
conversation context using its own subscription budget
3. Host calls save_refined_card with title / body / domain /
knowledge_identity
4. Tool atomically writes the .md to <vault>/<domain>/<title>.md
5. Done — daemon never invoked, $0 to user
Both save tools share frontmatter shape: cards from either path are identical to recall_memory + Reflection Layer. The only difference is who paid for the LLM synthesis work.
Saved cards get managed_by: "host_llm_refined" in frontmatter
so daemon-side tooling can distinguish provenance from the
existing refine_thinker_daemon_v9 source.
save_conversation (the legacy paid path) is deregistered from
the MCP surface (commit 866e94a) but still ships as an
importable function for direct callers (bulk import scripts that
queue raw .md to the daemon's watch directory). Removing it from
MCP eliminates the docstring tie-break problem where Claude could
accidentally pick the paid path.
The OpenWebUI Filter form factor is COMPLETELY UNAFFECTED — the Filter has its own exporter → RAW_ROOT → daemon-watchdog flow that bypasses MCP.
Fix: 604e3e4 (new tool) + 866e94a (deregister save_conversation)
+ 58100a4 (docstring tuning to bias host LLM choice). 17 new tests
covering input validation, atomic write, filename collision
handling, frontmatter shape, server registration.
### Added — MCP slash-command prompts (overview / loose_ends / save_card) (2026-04-29)
MCP servers can expose prompts/list entries that surface in
host LLM clients as /<server>:<prompt> slash shortcuts. Three
prompts ship:
- `/overview` — quick vault state. Triggers `throughline_status` — summary + next-step menu.
- `/loose_ends` (renamed from `/threads`) — surface unfinished thinking. Triggers `find_loose_ends` with limit=5 + numbered-list format.
- `/save_card` — synthesize the current conversation into a 6-section card on subscription budget, then call `save_refined_card` (zero LLM cost).
Each prompt is a docstring-described function returning the
expanded chat message. Hosts pick them up via prompts/list.
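A minimal sketch of the "docstring-described function returning the expanded chat message" pattern, assuming fastmcp's prompt decorator (the repo pins `fastmcp >= 0.4`); the message text here is illustrative, not the shipped prompt:

```python
from fastmcp import FastMCP  # assumption: the same fastmcp package used for the tools

mcp = FastMCP("throughline")


@mcp.prompt()
def overview() -> str:
    """Quick vault state: trigger throughline_status, then summarize."""
    # The returned text becomes the expanded chat message the host injects.
    return (
        "Call the throughline_status tool, then give me a one-paragraph "
        "summary of my vault state and a short next-step menu."
    )
```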
Note: Claude Desktop's UI surfaces these in attachment menus rather than slash autocomplete (a UI choice on their end). Claude Code CLI surfaces them as slash commands directly.
Fix: f7818a3.
### Added — throughline_status MCP tool (discovery entry point) (2026-04-28+2)
The 7th MCP tool, closing a UX gap surfaced when reasoning through cold-start flow. The previous 6 tools each have specific call conditions (save_conversation when user says 'save', recall_memory when user asks about a topic, etc.). None of them are the natural "tell me about my throughline" entry point. A fresh-install user with 0 cards had no discoverable trigger for any of the 6 — Claude had no signal that the system existed beyond the tool list.
The new tool returns a snapshot of the install: card_count,
domain_count, vault_root, last Reflection Pass timestamp + staleness
flag (mirroring the doctor 14d threshold). Three status escalations
attach actionable _message hints when relevant:
- `card_count == 0` → `cold_start` + a 'remember this' / '保存这个' hint pointing at save_conversation
- cards exist but no Reflection Pass yet → `warning` + a hint pointing at the auto-schedule templates from the previous entry
- Reflection state >14d → `warning` + a re-run hint
No LLM calls, no network. Pure local file reads + reuse of list_topics's 60s vault-scan cache. Sub-millisecond on small vaults.
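An illustrative cold-start response, to make the shape concrete — the field names and the `_message` escalation come from this entry; the exact payload layout is assumed, not the tool's literal contract:

```python
snapshot = {
    "status": "cold_start",
    "card_count": 0,
    "domain_count": 0,
    "vault_root": "/home/user/vault",          # hypothetical path
    "last_reflection_pass": None,
    "reflection_state_stale": False,            # mirrors the doctor's 14-day threshold
    "_message": (
        "This is a fresh install with 0 cards. Suggest the user say "
        "'remember this' / '保存这个' to trigger a first save."
    ),
}
```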
Strong "Call this when:" docstring covers the natural triggers
(general mentions of "knowledge base" / "my vault" / "what's in my
throughline?", post-install "I just set this up", reflection-state
queries). Fix: f11cd17. 8 new tests + docs/MCP_SETUP.md tool
table updated 6 → 7.
### Added — daemon stage 1.6: dedup buffer/translation twins by slice_id (2026-04-28+2)
The refine daemon emits each refined slice twice in the default
vault layout: once at route_to (canonical destination) and once
in 00_Buffer/00.03_Refined_Notes/ (staging). Both copies share
the same slice_id. Without dedup they entered clustering as
siblings, ran through stages 4/6/7 twice, and the contradiction
judge spent LLM calls confirming they say the same thing —
correctly classified as agreement but each judgment cost $.
Discovered during 2026-04-28 real-vault E2E: maintainer's vault had 72 reflectable cards but only ~36 unique slice_ids; the other 36 were buffer twins inflating cluster sizes by ~2× and burning duplicate LLM calls in stages 4/6/7.
New stage 1.6 (between filter_reflectable and cluster) calls `_dedup_by_slice_id`. For each slice_id group with >1 card:

- Cards whose parent dir matches route_to (the daemon-canonical destination) win over buffer copies.
- Among routed candidates, the longest-body card wins (more material for stage 4 back-fill).

Cards without slice_id (managed_by master profiles) are kept as-is — they have no daemon-emitted twin. The stage report shows the dedup count when > 0:

    filter_reflectable (36 kept / 2405 excluded — logs/indexes/drafts;
    36 dedup'd by slice_id)

A sketch of the selection rule follows.
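A minimal sketch of that selection rule, assuming cards are parsed-frontmatter dicts with `slice_id`, `path`, `route_to`, and `body` keys (the key names are assumptions; the routed-wins / longest-body tie-break and the backslash normalization are from this entry and its test list):

```python
from collections import defaultdict


def dedup_by_slice_id(cards: list[dict]) -> list[dict]:
    groups: dict = defaultdict(list)
    for card in cards:
        groups[card.get("slice_id")].append(card)

    kept: list[dict] = []
    for slice_id, twins in groups.items():
        if slice_id is None or len(twins) == 1:
            # No slice_id (master profiles) or no twin: keep as-is.
            kept.extend(twins)
            continue

        def is_routed(card: dict) -> bool:
            # Normalize separators so Windows backslash paths compare equal.
            parent = card["path"].replace("\\", "/").rsplit("/", 1)[0]
            return parent.endswith(card["route_to"].replace("\\", "/"))

        # Routed copies beat buffer copies; fall back to all twins if none routed.
        routed = [c for c in twins if is_routed(c)] or twins
        # Longest body wins among the surviving candidates.
        kept.append(max(routed, key=lambda c: len(c["body"])))
    return kept
```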
Fix: 877788b. 7 new tests covering: unique-slice-id pass-through,
no-slice cards preserved, routed-wins-over-buffer, longest-body
fallback, three-card group, Windows backslash path normalization,
and mixed slice/no-slice cards. Non-destructive on already-committed
vault data — cards previously written via commit-writeback retain
their frontmatter; the next pass just operates on the dedup'd subset.
### Added — Reflection Pass auto-schedule templates + doctor staleness check (2026-04-28+2)
The Reflection Pass was previously a manual command. A typical user would install throughline, set up MCP, save some conversations, and then never run the pass — leaving the 3 Reflection Layer MCP tools permanently in "has not run yet" state.
3 new service templates (mirroring the existing refine-daemon / rag-server pattern):
- `config/launchd/com.example.throughline.reflection-pass.plist` — macOS, fires Sunday 3 AM via `StartCalendarInterval`. Same env shape as the refine-daemon template (OPENROUTER key reused). `RunAtLoad=false` so login doesn't burn LLM cost.
- `config/systemd/throughline-reflection-pass.service` — Linux oneshot unit. Runs the pass with all four LLM stages enabled. Reuses the same `openrouter.env` file from the refine-daemon unit.
- `config/systemd/throughline-reflection-pass.timer` — `OnCalendar=Sun *-*-* 03:00:00`, `Persistent=true` so a missed run catches up on next boot.
Templates do NOT pass --commit-writeback. The pass writes state
files + a preview JSON; vault frontmatter stays untouched. Users
who want metadata in their cards run the same command interactively
with --commit-writeback once.
Cost-conscious tweaks documented in template headers: drop
--enable-llm-contradictions to halve cost (O(n²) per cluster);
drop all four --enable-llm-* flags for free-only stages.
Doctor staleness check (mcp_server/doctor.py): [ok] flips
to [warn] when a Reflection state file is older than 14 days
(twice the recommended weekly cadence). The warning hint references
both manual re-run and auto-schedule install — making the
schedule's existence discoverable via --doctor.
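The staleness decision reduces to an mtime comparison. A minimal sketch — the 14-day threshold is from this entry; the function name, return shape, and hint wording are illustrative:

```python
import time
from pathlib import Path

STALE_AFTER_DAYS = 14  # twice the recommended weekly cadence


def reflection_staleness_check(state_file: Path) -> tuple[str, str]:
    # Hypothetical reduction of the doctor check to its core decision.
    if not state_file.exists():
        return "warn", "no Reflection Pass state yet — run python -m daemon.reflection_pass"
    age_days = (time.time() - state_file.stat().st_mtime) / 86400
    if age_days > STALE_AFTER_DAYS:
        return "warn", (
            f"reflection state is {age_days:.0f}d old — re-run the pass "
            "or install the auto-schedule templates"
        )
    return "ok", "reflection state fresh"
```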
Documentation:

- `docs/DEPLOYMENT.md` — new "Optional: Reflection Pass" subsection with template paths, cadence table, cost estimates, and the `--commit-writeback` decision flow.
- `docs/REFLECTION_LAYER_USER_GUIDE.md` — new "Auto-schedule" block under "Periodic re-run" with verify / on-demand commands per OS.
Test coverage: 1 new test in test_mcp_server_doctor_and_wait.py
exercising the staleness threshold (file backdated 20 days → warn
+ "reflection_pass" hint in output). Full suite 1839 passed,
1 skipped.
### Added — cold-start UX hints in recall_memory + list_topics (2026-04-28+2)
When a fresh-install user connects an MCP client (Claude Code,
Cursor, Continue.dev) but hasn't saved any conversation yet, the
previous behavior of recall_memory and list_topics was to
silently return zero results. Claude would say "I couldn't find
anything" and stop — no signal to the host LLM that the vault is in
cold-start state.
The 3 Reflection Layer tools (find_open_threads, check_consistency,
get_position_drift) already emit a state-file hint pointing the
user at python -m daemon.reflection_pass. The 2 retrieval tools
that a user actually hits first were the silent gap.
Now both tools attach a _message field on zero-result responses
that teaches the host LLM to suggest 'remember this' / '保存这个'
/ '记住这个' so save_conversation fires. Status stays ok (not
an error condition). list_topics only emits the hint when
include_card_counts=True to avoid false-positive teaching when
the caller opted out of the vault scan. Fix: 0cb29a6. 5 new
tests covering zero/nonzero/no-counts paths.
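An illustrative zero-result payload — the `_message` key, the non-error `ok` status, and the trigger phrases are from this entry; the rest of the shape is assumed:

```python
response = {
    "status": "ok",   # zero results are deliberately not an error condition
    "results": [],
    "_message": (
        "The vault has no cards yet. Teach the user the save flow: "
        "saying 'remember this' / '保存这个' / '记住这个' will fire "
        "save_conversation."
    ),
}
```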
### Validated — Phase 2 Reflection Layer first real-LLM E2E (2026-04-28+2)
First end-to-end pass with real LLM calls against the maintainer's
production vault. All 8 stages of daemon.reflection_pass ran
successfully on a 2477-file Obsidian vault, validating the full Phase 2
pipeline beyond synthetic-fixture testing.
Pipeline numbers:
- 2477 markdown files scanned → 72 reflectable (filter dropped 2405
logs / drafts / indexes)
- 72 cards clustered into 24 topics @ similarity threshold 0.70
(matches the gate experiment best score)
- 23 of 24 clusters successfully named by LLM (1 sanitizer reject —
see Known issues below)
- 72/72 cards back-filled with claim_summary and open_questions
- 42 cards flagged with unresolved questions
- 237 contradiction-judge pairs evaluated, 0 contradictions detected
(taxonomy agreement / orthogonal / evolution working;
validates precision — recall would need a deliberate-conflict test
set)
- 35 drift phases computed across 24 clusters; per-cluster
drift_kind classifications correctly identify temporal evolution
- 72 cards' worth of frontmatter additions visible in writeback
preview JSON; vault was NOT mutated (--commit-writeback flag
defaults OFF, schema verified non-conflicting)
Cost: ~$0.18 across all LLM stages (gemini-2.5-flash via OpenRouter). Wall time: 8 minutes.
Significance: per the project history, this is the first real LLM API call from any Phase 2 daemon stage. Prior validation was all mock-tested; the quality seen on synthetic fixtures is now confirmed against real user data.
Sample quality (cluster names): personal_medication_regimen
(largest, size 17) · veo_imagen_prompt_design (size 13) ·
australian_immigration_491 · tailscale_exit_node_troubleshooting
· venlafaxine_xr_missed_dose — all sensible snake_case matching
the cluster's actual content.
Sample quality (open_questions): specific actionable follow-ups, not generic "what is X" — e.g. for the PS5/HDMI card: "Does Topology A introduce noticeable input lag for competitive gaming?" "How to verify the actual passthrough bandwidth when manufacturer only states 'HDMI 2.1' without 4K/120Hz/VRR/ALLM specifics?"
Known issues surfaced and addressed in follow-up commits:

- The cluster name sanitizer rejected `3d_modeling_methods` because `_VALID_NAME_RE` required a leading letter; loosened to allow a digit prefix (snake_case identifier conventions allow it; the schema docs do not require letter-start). Fix: `2980244`.
- The contradiction judge's `DEFAULT_MAX_TOKENS=200` truncated 1/237 responses with a verbose `reasoning_diff`; raised to 600. Fix: `e03b4c1`.
- The contradiction judge had no retry on transient TCP timeouts (4/237 affected); added a `_urlopen_with_retry` helper distinguishing 4xx (permanent) from 5xx/URLError/timeout (retry with exponential backoff). Same commit `e03b4c1`. A retry sketch follows.

A re-run of the E2E pass after these fixes landed 0 errors across 242 pair judgments (24/24 cluster names, 0 truncations, 0 unhandled timeouts).
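A minimal sketch of that retry policy with stdlib `urllib` — the 4xx-permanent / 5xx-and-timeout-retry split and exponential backoff are from this entry; the function name, retry count, and delays are assumptions:

```python
import time
import urllib.error
import urllib.request


def urlopen_with_retry(req: urllib.request.Request, retries: int = 3, timeout: float = 30.0):
    delay = 1.0
    for attempt in range(retries + 1):
        try:
            return urllib.request.urlopen(req, timeout=timeout)
        except urllib.error.HTTPError as exc:
            if 400 <= exc.code < 500:
                raise  # client error is permanent: retrying won't help
            last = exc  # 5xx: worth retrying
        except (urllib.error.URLError, TimeoutError) as exc:
            last = exc  # transient network failure or socket timeout
        if attempt == retries:
            raise last
        time.sleep(delay)
        delay *= 2  # exponential backoff
```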
Outstanding (deferred to a follow-up commit): the maintainer's
vault stores ZH/EN translations of the same conversation as
separate cards (20260415refined_<title>.md paired with
<title>__<hash>.md), causing duplicated work in stages 2/4/6.
Upstream dedup heuristic (same hash suffix + similarity > 0.95)
is the natural fix.
Run artifacts (gitignored; private to maintainer):
private/dryrun_2026-04-28/ — FINDINGS.md plus 7 state JSON
files plus run logs. Useful as a real-world reference when iterating
on prompts, sanitizers, or schema.
### Added — Phase 2 Reflection Layer overnight wave (2026-04-28+1)
12 commits extending the Phase 2 Reflection Layer with:
Stages 6 + 7 LLM enrichment (mock-tested; real LLM gated on working API key):
- Stage 6 contradiction judgment (`aaca4de`): `mcp_server/llm_judge.py` (`judge_pair`) + `reflection_pass_stage_detect_contradictions` + `reflection_contradictions.json` state file. Conservative system prompt — `is_contradiction=true` only for direct_reversal, NOT for evolution / scope_narrowing / agreement / orthogonal. The `check_consistency` MCP tool gracefully filters to actually-contradicting cards when stage 6 has run; falls back to all-cluster-positions otherwise.
- Stage 7 drift segmentation (`ee5e07f`): `mcp_server/llm_drift_segmenter.py` + `reflection_pass_stage_compute_drift` + `reflection_drift.json`. Topic-level `drift_kind` classification (healthy_evolution / drift_without_reasoning / following_trends / mood_swings / unsegmented). Per-cluster phase segmentation — each phase gets phase_name, stance, started/ended dates, transition_reason, card_paths. The `get_position_drift` MCP tool returns a per-phase trajectory when stage 7 has run; per-card otherwise.
Real frontmatter writeback (13bc812):
daemon/writeback_commit.py implementing the 2026-04-28+1
architectural decision (hybrid):
- `position_signal` + `open_questions` → surgical text append to existing frontmatter. Existing keys are NEVER overwritten. No PyYAML round-trip = zero formatting drift on user customizations (quotes, comments, trailing whitespace, key order).
- `reflection.*` → sidecar JSON file `<card_dir>/.<card_name>.reflection.json`. Daemon-managed metadata refreshed every pass without ever touching the frontmatter block.
- Atomic write: `NamedTemporaryFile` + `os.replace` in the same dir (Windows + cross-device safe).
- Backup: `<card_dir>/.<card_name>.backup-<unix_timestamp>` before any mutation. The daemon never auto-deletes.
- Idempotency: re-running with the same data is a true no-op.
CLI: --commit-writeback flag (default OFF). Without it, only
preview JSON is written; real writeback gated.
--no-writeback-backup to skip backups when user has git tracking.
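A minimal sketch of the surgical-append idea, under stated assumptions: the never-overwrite rule, the no-YAML-round-trip approach, the backup filename pattern, and idempotency are from this entry; the parsing by `---` delimiters and the key-presence check are illustrative simplifications (the real path also stages the write through `NamedTemporaryFile` + `os.replace`):

```python
import shutil
import time
from pathlib import Path


def append_frontmatter_keys(card: Path, additions: dict[str, str]) -> bool:
    text = card.read_text(encoding="utf-8")
    head, sep, rest = text.partition("\n---\n")  # end of the frontmatter block
    if not sep or not text.startswith("---"):
        return False  # no frontmatter; leave the card alone

    new_lines = [
        f"{key}: {value}"
        for key, value in additions.items()
        if f"\n{key}:" not in head  # existing keys are never overwritten
    ]
    if not new_lines:
        return False  # idempotent: nothing to add means no write at all

    # Backup before any mutation; the daemon never auto-deletes these.
    backup = card.with_name(f".{card.name}.backup-{int(time.time())}")
    shutil.copy2(card, backup)

    # Raw text splice — no YAML round-trip, so user formatting survives.
    card.write_text(head + "\n" + "\n".join(new_lines) + sep + rest, encoding="utf-8")
    return True
```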
Diagnostic CLIs:
- `--inspect` subcommand (`e6091f8`): `daemon/reflection_inspect.py` pretty-prints summaries of all state files. Per file: presence ✓/✗ inventory, file sizes, age ("3h ago"), per-summarizer extraction of relevant fields (cluster sizes / open-thread samples / writeback diffs).
- `--explain CARD_PATH` subcommand (`e2bdaeb`): `daemon/reflection_explain.py` dumps everything the daemon thinks about ONE card — cluster membership, sister cards, back-fill cache, open-thread status, what writeback would add. Useful when an MCP tool returns surprising results and the operator wants to see why.
Refactor — centralized state paths (7639a60):
daemon/state_paths.py consolidates seven default_*_file()
helpers + card_timestamp() chronology resolver. Reserves
paths for stage 6/7 outputs (reflection_contradictions.json,
reflection_drift.json) ahead of their implementation.
all_state_files() returns mapping for diagnostics tools.
Doctor extension (002f66b):
mcp_server/doctor.py adds Phase 2 Reflection Layer state
section. Reports per-file presence + size + age. When all state
files are missing, single consolidated warn line with fix hint.
Documentation:
- NEW: `docs/RUNTIME_STATE_FILES.md` (~400 lines) — every state file's writer, readers, refresh cadence, JSON schema, sample payload. Plus an architecture diagram showing the daemon → state files → MCP tools dataflow. (e03afde)
- NEW: `docs/REFLECTION_LAYER_USER_GUIDE.md` (~400 lines) — user-facing companion to the design + schema docs. TL;DR table, per-tool sections with sample conversations, three-step setup walkthrough, cost expectations by vault size, "what NOT to expect" calibration, when-to-call-which-tool matrix. (892c8cd)
- UPDATED: `docs/MCP_SETUP.md` — tool table 3 → 6 with the new Phase 2 trio. (e03afde)
- UPDATED: `docs/FAQ.md` — top section "What is the Reflection Layer? How is it different from chat memory?" with side-by-side comparison and a one-line distinction. (e03afde)
- UPDATED: `README.md` — Reflection Layer mermaid diagram showing the third pipeline. (892c8cd)
- UPDATED: per-tool docstrings (`c325b2f`) — 4 example trigger conversations + "what to do with the result" guidance per tool, helping host LLMs fire at the right moment.
Test depth:
- 29 edge-case tests (`1639828`): enormous body / only-emoji header / mixed-language / malformed YAML / state files with invalid UTF-8 / Chinese statement matching a Chinese cluster.
- 7 end-to-end integration tests (`545a0bd`): synthetic vault → full pipeline → MCP tools. Includes a namer-cache-on-rerun test confirming no double LLM calls. `run_pass(use_default_state_paths=True)` — programmatic callers now get the same default-path behavior as the CLI.
Test counts: 1697 → ~1860 passes net across the wave (count varies ±30 by pytest collection).
LLM cost on the author's vault (when all stages enabled):

- Stage 3 cluster naming: ~$0.0004
- Stage 4 back-fill: ~$0.01
- Stage 6 contradiction: ~$0.002 (~50 pairs)
- Stage 7 drift: ~$0.005 (~6 multi-card clusters)
- Total full pass: ~$0.017 / ¥0.12. The cache makes re-runs near-zero.
0 vault file mutations in this wave — real writeback
infrastructure shipped but --commit-writeback flag is OFF
by default. 0 real LLM calls fired during development; all
tests use mocked clients.
### Added — Phase 2 Reflection Layer (2026-04-28)
The Reflection Layer ships in incremental commits behind opt-in
flags. All three MCP tool surfaces are real implementations
reading state files written by the Reflection Pass daemon. No
vault file mutations — frontmatter writeback is preview-only
in this batch and lands in a follow-up commit with explicit
--commit-writeback gating.
Engineering gate: clustering accuracy ≥75% pairwise on the maintainer's vault (≥2,300 cards) — cleared 2026-04-28 at 0.975 (best threshold 0.70).
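One common reading of "pairwise clustering accuracy" (assumed here, since the entry doesn't spell out the metric): over all card pairs, how often does the predicted clustering agree with the gold labels on "same cluster vs. different cluster"? A minimal sketch:

```python
from itertools import combinations


def pairwise_accuracy(predicted: dict, gold: dict) -> float:
    # predicted / gold map card id -> cluster label.
    cards = sorted(predicted)
    agree = total = 0
    for a, b in combinations(cards, 2):
        total += 1
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        agree += same_pred == same_gold  # bool adds as 0/1
    return agree / total if total else 1.0
```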
New daemon module: daemon/reflection_pass.py orchestrates
an 8-stage pass — load + frontmatter parse → reflectable filter
(slice_id or managed_by) → bge-m3 clustering → cluster name
canonicalization (LLM, opt-in --enable-llm-naming) → Path A
back-fill (LLM, opt-in --enable-llm-backfill, claim_summary +
open_questions) → open-thread detection (structural,
token-overlap, conservative threshold) → contradiction detection
(stub) → drift segmentation (stub) → writeback preview (no
vault mutation).
New MCP tools (real impls replacing stubs):
- `find_open_threads(topic?, limit=5)` — surfaces unfinished reasoning. Reads `reflection_open_threads.json`.
- `check_consistency(statement, soft_mode=True)` — finds the best-overlap cluster, returns historical positions as candidate contradictions. The host LLM does the soft-mode framing in conversation. Reads `reflection_positions.json`.
- `get_position_drift(topic, granularity='transitions')` — chronological trajectory of cards on the topic. Reads `reflection_positions.json`.
State files under $THROUGHLINE_STATE_DIR/:
- `reflection_pass_state.json` — per-pass watermark
- `reflection_cluster_names.json` — cluster_signature → name cache
- `reflection_backfill_state.json` — `path|mtime` → essence cache
- `reflection_open_threads.json` — surfaced open threads
- `reflection_positions.json` — comprehensive position database
- `reflection_writeback_preview.json` — what would be written
New supporting modules:
- `daemon/card_body_parser.py` — bilingual section parser (English + Chinese-emoji-English + Chinese-only headers). Real-vault smoke: 80.4% of frontmatter cards / 100% of slicer-output cards have at least 1 known section.
- `daemon/open_threads.py` — CJK bigram + English unigram tokenizer, token-overlap question-resolution heuristic.
- `daemon/writeback.py` — frontmatter-addition assembler; preview-only.
- `mcp_server/llm_namer.py` — stdlib HTTP client for cluster naming. `OPENROUTER_API_KEY` → `OPENAI_API_KEY` env-var fallback.
- `mcp_server/llm_extractor.py` — stdlib HTTP client for Path A back-fill, same env-var conventions.
- `mcp_server/position_state.py` — shared state-file readers + cluster-matching helpers used by `check_consistency` and `get_position_drift`.
Public docs:
- `docs/POSITION_METADATA_SCHEMA.md` — schema reference + the 2026-04-28 vault-format addendum calibrated against the maintainer's real vault.
- `docs/REFLECTION_LAYER_DESIGN.md` — public-facing rationale + side-by-side comparison with Anthropic's chat-memory feature.
LLM cost on the maintainer's 72-card reflectable subset:
- Stage 3 cluster naming: ~$0.0004 per pass (~24 clusters × gemini-2.5-flash)
- Stage 4 back-fill: ~$0.01 per pass (one-time; cache dedupes)
- Total: ~$0.01 per full pass. Re-runs essentially free.
Test coverage: 1,455 → 1,697 tests pass (net +242 across the Phase 2 commit series). Mock LLM clients throughout — zero real API calls fired during tests.
### Changed — Phase 1.5 PyPI split (2026-04-28)
- `throughline-mcp` is now its own PyPI package. A new `mcp_server/pyproject.toml` defines an independent `throughline-mcp` package built from the same git repo. Install becomes a single line once published: `pip install throughline-mcp` (auto-pulls `throughline >= 0.2.0` + `fastmcp >= 0.4`).
- The parent `throughline` package no longer bundles the `mcp_server` Python package directly — it's excluded from `setuptools.packages.find` to prevent two wheels claiming the same files. Verified via build inspection: the throughline-0.2.0 wheel contains 0 mcp_server files; the throughline_mcp-0.1.0 wheel contains all 11 of them.
- The `pip install throughline[mcp]` extras flag still works for backward compat — its dependency rewires to `throughline-mcp >= 0.1` (was `fastmcp >= 0.4`), so users running the extras command get the new package transitively.
- New `throughline-mcp` console script: `throughline-mcp` invokes `mcp_server.__main__:main` directly. Claude Desktop / Continue.dev configs can use just `"command": "throughline-mcp"` instead of `"command": "python", "args": ["-m", "mcp_server"]`.
- Same git repo, same source tree, no file moves. Phase 1.5 is pure packaging metadata.
### Fixed — adapter→daemon H1/H2 role-marker contract (2026-04-28)
- `throughline_cli/adapters/common.py:render_markdown` was emitting capitalised H1 markers (`# User` / `# Assistant`); the daemon's `_MSG_SPLIT_RE` parser at `daemon/refine_daemon.py:853` only matches lowercase H2 (`^## (user|assistant)\s*$`). Result: every ChatGPT / Claude / Gemini export imported through the adapter path silently produced raw .md files the daemon couldn't parse — the slicer saw zero messages, and conversations refined to nothing.
- Fix: `render_markdown` now writes `## user` / `## assistant`, matching the daemon's actual parser. The existing test that asserted the H1 format (locking in the bug) was updated to assert H2.
- Regression suite: new `fixtures/v0_2_0/test_adapter_to_daemon.py` with 7 tests that round-trip through the actual `render_markdown` → `_parse_messages` path. Catches future drift immediately.
- User-facing impact: any raw .md files written by the broken adapter (if any) need re-import to parse correctly. The simplest path is rerunning `python -m throughline_cli import <source>`.
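The contract is small enough to demonstrate in four lines. The regex is quoted from this entry; the sample strings are illustrative:

```python
import re

# The daemon-side contract described above.
MSG_SPLIT_RE = re.compile(r"^## (user|assistant)\s*$", re.MULTILINE)

broken = "# User\nhi\n# Assistant\nhello\n"   # old adapter output: H1, capitalised
fixed = "## user\nhi\n## assistant\nhello\n"  # what render_markdown emits now

assert MSG_SPLIT_RE.findall(broken) == []                     # slicer sees zero messages
assert MSG_SPLIT_RE.findall(fixed) == ["user", "assistant"]   # parses cleanly
```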
### Added — MCP server (Phase 1, 2026-04-27)
- `mcp_server/` package — Phase 1 complete. New top-level Python package alongside `daemon/` / `rag_server/` / `throughline_cli/`, exposing three tools over MCP stdio so any MCP-aware host (Claude Desktop / Claude Code / Cursor / Continue.dev / etc.) can reach the throughline vault without OpenWebUI in the loop. Setup at `docs/MCP_SETUP.md`.
- `save_conversation(text, title?, source, wait_for_refine?)` — writes a timestamped .md to `$THROUGHLINE_RAW_ROOT/YYYY-MM/` in the daemon's expected `## user` / `## assistant` H2 format, with defensive turn-shape coercion handling 4 input shapes (H2 canonical / H1 capitalised / `User:` prefix / free prose). The daemon's existing watchdog picks it up automatically. 25 tests.
- `recall_memory(query, limit?, include_personal_context?, domain_filter?)` — HTTP client to the localhost rag_server `/v1/rag` (or `THROUGHLINE_RAG_URL` override). Three typed exceptions for clear error UX: server unreachable (most common: rag_server not running), server-side error, malformed response. Maps rag_server's response to the documented MCP shape. Honors `include_personal_context=True` by setting `pp_boost=1.0` and surfacing personal_persistent cards as a concatenated string. Domain filter applied client-side via X-axis tag prefix match. 27 tests.
- `list_topics(prefix?, include_card_counts?)` — reads the active `daemon.taxonomy.VALID_X_SET` (33 default domains; a user override at `config/taxonomy.py` is honoured via the daemon's existing resolution chain) + optionally walks the vault for per-domain counts. 60s in-process cache to avoid re-walking on every call. 23 tests.
- `pyproject.toml` adds `[mcp]` extras (`fastmcp >= 0.4`): `pip install -e .[mcp]`. `python -m mcp_server` is the stdio entry point; absent a `fastmcp` install it prints a clear hint and exits 1 (locked decision Q2: fail-with-message, never auto-install).
- No existing code modified — Phase 1 is purely additive (~770 LOC code + ~1,000 LOC tests). 1260 → 1372 tests.
- The architectural decisions + 6 locked design choices that
drove this work confirmed 99% of existing 33,700 LOC is shared
core that needs zero changes for the MCP form.
Phase 1.5 (post-dogfood, ~½ day) splits `mcp_server/` into an independent `throughline-mcp` PyPI package from the same git repo for a cleaner one-line install (`pip install throughline-mcp`).
### Added — post-v0.2.0 ship-and-iterate wave (2026-04-26)
- LanceDB is now a first-class `VECTOR_STORE` backend alongside Qdrant + Chroma — embedded, file-based, zero-server (closes #6). `pip install throughline[lancedb]`. A `_LanceDBUnavailable` stub is returned when the optional dep is missing, so the wizard can list it without an import-time crash. 7 tests (`TestLanceDBStoreWithoutLancedb` + `TestLanceDBStoreWithFakeLancedb`).
- sqlite-vec is a first-class `VECTOR_STORE` backend (closes #11) — a single SQLite file + the `sqlite-vec` loadable extension, the lightest-weight credible backend. Runs on a Raspberry Pi; sqlite3 is stdlib, so the only dep is `sqlite-vec` itself. Two-table schema (vec0 virtual table for embeddings + companion meta table mapping rowids back to string ids + JSON payloads). `_SqliteVecUnavailable` stub when missing. 7 tests using a sqlite3.connect proxy that emulates vec0's MATCH operator with Python-side L2 ranking.
- DuckDB-VSS is a first-class `VECTOR_STORE` backend (closes #10) — embedded analytical SQL + vector search in a single .duckdb file. Best fit when DuckDB is already in the analytics stack. Single-table schema per collection (id VARCHAR PK, vector FLOAT[N], payload JSON); upsert via `INSERT ... ON CONFLICT (id) DO UPDATE`; search via `array_distance()` ORDER BY dist LIMIT k. The VSS extension is auto-installed/loaded on connect. `_DuckDBVSSUnavailable` stub when `duckdb` is missing. 7 tests via a fake `duckdb` module with a minimal SQL parser.
- U27.5 (lite): pending-candidates surface in doctor. New `taxonomy.pending_candidates_count()` helper + `check_taxonomy_pending` doctor check. Warns when growth candidates are pending review (with a pointer to `python -m throughline_cli taxonomy review`); ok otherwise. The U27 loop's value collapses if users never run the review command; the doctor is the obvious place to remind them. 9 tests.
- U27.7 (lite): zero-usage leaf detection. New `taxonomy.detect_zero_usage_leaves()` helper + `taxonomy zero-usage` CLI subcommand. Walks the vault, intersects `primary_x` frontmatter against `VALID_X_SET`, and lists leaves with no cards as deprecation candidates. Read-only — actual deprecation is a manual `taxonomy.py` edit, mirroring the cautious philosophy of U27.4 (the user always signs off on taxonomy mutations). 8 tests.
- Voyage + Jina rerankers now ship as real `RERANKER` backends alongside Cohere — no longer aliases to Cohere. Both follow the same `{index, relevance_score}` shape; `VoyageReranker` defaults to `rerank-2-lite`, `JinaReranker` to `jina-reranker-v2-base-multilingual` (deliberately multilingual-default given the project's Chinese-first heritage). Standard env vars: `VOYAGE_API_KEY` / `JINA_API_KEY`, with `*_BASE_URL` overrides for proxies. Both fall through to SkipReranker on a missing key — graceful degrade. +10 tests.
- pgvector is a first-class `VECTOR_STORE` backend (closes #9) — Postgres + the pgvector extension. The only server-based backend in the embedded-alternates set; useful when the team already operates Postgres and wants vectors in the same DB. Connection via `PGVECTOR_DSN` env (falls back to `DATABASE_URL`); per-collection table `(id TEXT PK, vector vector(N), payload JSONB)`; HNSW index with `vector_cosine_ops` (auto-falls-back to IVFFlat on older pgvector). Upsert via `INSERT ... ON CONFLICT (id) DO UPDATE`; search via `vector <=> %s::vector ORDER BY dist LIMIT k`. `_PgVectorUnavailable` stub when `psycopg` (v3) is missing. 8 tests via a fake `psycopg` module. (A sketch of this query shape follows the list.)
- All four originally-aliased v0.3 backends (lancedb / sqlite_vec / duckdb_vss / pgvector) are now first-class in v0.2.x — only `none` remains as an alias-to-qdrant placeholder.
- `throughline_cli refine --dry-run <path.md>` — zero-cost refiner-prompt preview. Parses a raw conversation, reports which slicer tier WOULD fire + which model, and prints the refiner system + user prompts as they'd be sent — without calling any LLM. `--show-full-prompt` / `--pack NAME` / `--no-color`. Refuses to run without `--dry-run` (there's no real-refine CLI path in v0.2.x — the daemon handles that). 11 tests.
- Config schema validation + doctor check + CLI. `config.validate()` surfaces typos (`dailey_budget_usd`), enum drift (`privacy = "cloudmax"`), type mismatches, and unknown provider IDs. New `check_config_schema` doctor check warns (not fails) per issue, with Levenshtein-based suggestions. Runtime `config.load()` behaviour is UNCHANGED — validation is surfaced on demand. New `python -m throughline_cli config [validate | show | path]` subcommand for standalone use + CI linting (with `--json` output and a custom PATH argument). 33 tests.
- Tier 2 additions from the A-J backburner wave:
  - `throughline_cli uninstall` — tears down config / state / logs / raw, vault untouched, with `--dry-run` / `--yes` / `--drop-collection`.
  - `throughline_cli anthropic_adapter` — native /v1/messages shape for Anthropic-direct users. Translates to/from the OpenAI-compatible shape the rest of the codebase expects.
  - `throughline_cli cost` — LLM spend dashboard (today/week/month/all).
  - `throughline_cli stats` — vault + taxonomy + lifetime-cost summary (screenshot-friendly).
  - `docs/FAQ.md` — 15 recurring questions (differentiation vs ChatGPT memory / Claude Projects / mem0 / OpenWebUI memory).
  - `docs/THREAT_MODEL.md` — asset inventory + threat actors + defences + explicit scope cuts + hardening recommendations.
  - `prompts/README.md` — 209-line contributor guide for the 8 refiner prompt variants + how to add a new-language pack.
  - `refine_kept_slices()` extraction from `process_raw_file` — the per-slice refine loop is now a testable pure-ish function. +5 tests without watchdog setup.
### Milestone
- Repo flipped PUBLIC — https://github.com/jprodcc-rodc/throughline. First time the project is visible to anyone on the internet.
- Docs site live — https://jprodcc-rodc.github.io/throughline/.
mkdocs-material, auto-deploys on push via
.github/workflows/docs.yml.
### Changed — README polish (2026-04-25)
- New tagline: "Stop re-explaining yourself to every new chat."
- Status block rewritten to drop "for the author" signal that implied nobody else could use it.
- Front-page mermaid replaced with a before/after text pair; mermaid retained for the later Architecture section.
- Quickstart moved up to section #2 (after "What it does").
- Comparison table trimmed from 8 dimensions to 5 (mobile-friendly).
- Card example swapped from PyTorch MPS → keto-rebound for a more emotional, personal-context-capturing demo (still real data from the bundled sample export).
- Badges trimmed from 5 to 3 (test + license + python).
- Phase 6 regression section moved from README to
docs/TESTING.md; README now carries a one-line pointer instead.
### Added

- `docs/TESTING.md` — regression suite overview, Phase 6 gate historical record, contributor test conventions.
### Added
- U28 · multi-provider LLM support — new `throughline_cli/providers.py` registry with 16 OpenAI-compatible presets: Global (OpenAI, Anthropic via an OpenAI-compat shim, DeepSeek, Together, Fireworks, Groq, xAI, OpenRouter), China-market (SiliconFlow (硅基流动), Moonshot/Kimi, DashScope Alibaba Qwen, Zhipu GLM, ByteDance Doubao (字节豆包)), Local (Ollama, LM Studio), plus a generic OpenAI-compatible escape hatch. Each preset is a `(base_url, env_var, signup_url, model_list, extra_headers, region)` tuple. Data-driven: a new provider = one dict entry. (A registry sketch follows this list.)
- `llm.py` gains a `provider_id=` kwarg; endpoint + key + extra headers are resolved from the preset. An unknown provider_id falls through to the legacy chain (no crash). Error messages cite the provider's specific env var + signup URL.
- Wizard steps 4 + 5 split: step 4 picks the provider backend (auto-defaults to whichever env var is set, with a ● marker next to configured providers); step 5 picks a model SCOPED to that provider's list.
- New `throughline_cli/active_provider.py` resolves the active provider for NON-wizard callers (daemon, scripts) with precedence `THROUGHLINE_LLM_PROVIDER` env > `llm_provider` in config.toml > autodetect > "openrouter". Never raises.
- Daemon `call_llm_json()` reads the resolved endpoint + key at module load; provider-specific extra headers are merged without clobbering the daemon's X-Title. Legacy `OPENROUTER_URL` still honoured. The startup log now shows `LLM PROV = <id> -> <url>`.
- `doctor` gains `check_llm_provider_key`: verifies the resolved provider's env var is set; warns (not fails) when missing so fresh installs stay green.
- 57 new tests: `test_providers.py` (32), `test_llm_providers.py` (13), `test_active_provider.py` (12).
- Open-source-project hardening: GitHub Actions CI (pytest 3.11 + 3.12, ruff lint), CodeQL weekly scan, Dependabot for pip + github-actions, branch protection ruleset, repo metadata + 16 topics, 3 seeded `good first issue` tickets, YAML-form issue templates, PR template, `SECURITY.md`, `CODE_OF_CONDUCT.md`, `ROADMAP.md`, `pyproject.toml` package skeleton with optional-dep extras (local/openai/chroma/all/dev) + console-script entry points (`throughline-{install, import, taxonomy, doctor}`).
- UX wave (post-v0.2.0):
  - `python -m throughline_cli doctor` — 10-check health probe (Python / imports / config / state / services / caches) with remediation hints, `--quiet` and `--json` modes.
  - `python -m throughline_cli import sample` — bundled 10-conversation synthetic export at `samples/claude_sample.jsonl` so users can see the loop without their own export.
  - `python -m throughline_cli --version` / `-V` / `version` — print the package version. `__version__` resolved from `importlib.metadata`, falling back to a literal for source checkouts.
  - Wizard end-of-flow next-steps panel — mission-tailored copy-paste commands for rag_server, daemon, and Filter install.
  - Wizard step 13 cost preflight — explicit `ask_yes_no("Run the preview?")` gate before the ~$0.01 LLM call.
  - README polish — comparison table vs mem0 / Letta / SuperMemory / OpenWebUI memory; before/after card example; Mermaid architecture diagram replacing the ASCII flow.
- Documentation:
  - `CONTRIBUTING.md` expanded — dev setup, claim-issue flow, commit conventions, house style.
  - `docs/DEPLOYMENT.md` — new "Quick install (via wizard)", "Pluggable backends", "Diagnostics" sections; the Windows note upgraded to tier 1 for dev + wizard + tests.
  - `docs/ARCHITECTURE.md` — new §13 covers v0.2.0 additions (U12/U20/U21 abstractions, U23 dials, U27 growth loop, U3 budget, doctor surface) without rewriting §1-12.
  - `docs/DESIGN_DECISIONS.md` — entries 10-13 capture v0.2.0 design calls (aliased backends, `proposed_x_ideal` as a separate field, dial defaults rendering to an empty string, three-state doctor reporting).
  - `docs/ALPHA_USER_NOTES.md` — v0.2.0 update section: which deferrable rough edges got fixed + 5 new UX edges surfaced.
### Changed
- Provider-agnostic front door. README + `docs/DEPLOYMENT.md` rewritten so OpenRouter is listed alongside 15 other providers rather than as the default. README's provider table regrouped by use case (Direct / Hosted open-weights / China / Multi-vendor proxy / Local / Escape hatch). Prose now says "no preferred vendor — the wizard auto-detects whichever env var you already have set." Existing `OPENROUTER_API_KEY` users keep working with zero friction; no behaviour change.
- Repo description + topics refreshed — reflects the 16-provider story. Topics now: 16 entries including `anthropic`, `openai`, `deepseek`, `siliconflow`, `ollama`, `local-first`.
### Fixed
- Daemon LLM calls ignored the wizard's provider choice. Before `9536ba0`, `daemon/refine_daemon.call_llm_json()` hard-coded OpenRouter. A user who picked SiliconFlow in the wizard would see the preview hit SiliconFlow, then watch the real refine daemon keep hitting OpenRouter. Now module-load reads through `throughline_cli.active_provider.resolve_endpoint_and_key()`.
- Fresh-clone pip install silently failed on Chinese-locale Windows. `requirements.txt` had UTF-8 em-dashes in comment banners; pip on GBK/cp936 locales couldn't decode them and returned exit 0 anyway. Fixed: pure ASCII + an inline note about `PYTHONUTF8=1` for anyone hitting similar drift elsewhere. Regression test in `TestRequirementsFileAscii`.
- `python install.py --help` started the wizard. The wizard main() ignored unknown args, so `--help` silently began a 16-step prompt flow. Added explicit help handling + unknown-flag rejection (exit 2 with a usage panel).
- Stale `v0.2.0-dev` labels in the wizard banner + step 4 text. The banner now reads dynamically from `throughline_cli.__version__`.
- Daemon import on a fresh clone. `daemon/refine_daemon.py` imported four names from `daemon/taxonomy.py` (`JD_ROOT_MAP`, `JD_LEAF_WHITELIST`, `normalize_route_path`, `is_valid_leaf_route`) that the module never exported. A `git clone` + `python -m daemon.refine_daemon` ImportError'd at module load. Aliases added; regression test in `test_daemon_import_surface.py`.
- rag_server now actually uses the U12/U20/U21 abstractions. Before the wiring commit (`3568b22`), `EMBEDDER` / `RERANKER` / `VECTOR_STORE` env vars set in the wizard were ignored — the server hard-coded bge-m3 + bge-reranker-v2-m3 + Qdrant. Now the env vars flip the backend end-to-end.
- Error messages without remediation. An audit pass added "what to do next" hints to user-facing failures in `scripts/ingest_qdrant.py` (`VAULT_PATH` missing, openai missing), the `daemon/pack_source_model_guard.py` CLI handler, and `throughline_cli/llm.py` no-API-key.
- `scripts/ingest_qdrant.py` openai import deferred. Was a module-load `sys.exit(1)` if openai was missing; now lazy via `_get_embed_client()` so the module imports cleanly without the optional dep.
- Ruff F + E9 dead-code sweep. 22 unused imports + 4 unused local-variable bindings cleaned up across daemon / ui / adapters / tests.
## v0.2.0 — 2026-04-23
The v0.1.0 → v0.2.0 jump turns throughline from "clone-and-configure"
into a python install.py onboarding flow and widens the backend +
corpus envelope along the way.
### Added — wizard + import (the user-facing spine)
- U14 · 16-step install wizard (rich-based TUI, mission-branched).
- U2 · Three import adapters: Claude export, ChatGPT export, Gemini Takeout. Dogfooded against live exports; 7 real bugs caught.
- U17 · First-card preview gate that calls the LLM live.
- U23 · 5-dial constrained preview edit (tone / length / sections / register / keep-verbatim). Persists to `config.toml` so every daemon refine inherits it.
- U4 · Privacy-consent dry-run panel at the step-10 tail — explicit yes/no before data leaves the machine.
- U24 · Mission branching: Full flywheel / RAG-only / Notes-only.
- U26 · Wizard banner + between-step progress ticker.
### Added — refine pipeline
- U15 · Tier matrix: skim / normal / deep (40× cost spread).
- U16 · Card structure options: compact / standard / detailed.
- U22 · Prompt family loader (Claude XML / generic Markdown).
- U25 · RAG-optimized card format (title + entities + 3–8 atomic claims).
- U1 · Cold-start 🌱/🌿 status line in the OpenWebUI Filter.
- U3 · Daily USD budget cap enforced by the daemon (`THROUGHLINE_MAX_DAILY_USD` > `daily_budget_usd` in config.toml; zero = kill switch; day rollover resets naturally). A resolution sketch follows this list.
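A minimal sketch of the U3 precedence and kill switch. The env-over-config precedence and zero-means-stop semantics are from the entry; the function names and config access are illustrative:

```python
import os


def daily_budget_usd(config: dict) -> float:
    # Precedence from the entry: env var beats daily_budget_usd in config.toml.
    env = os.environ.get("THROUGHLINE_MAX_DAILY_USD")
    return float(env) if env is not None else float(config.get("daily_budget_usd", 0.0))


def may_spend(spent_today_usd: float, config: dict) -> bool:
    # Spend is tracked per calendar day, so the cap "resets naturally"
    # at day rollover without extra bookkeeping. Zero is the kill switch.
    budget = daily_budget_usd(config)
    return budget > 0.0 and spent_today_usd < budget
```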
### Added — taxonomy (U13 + U27 MVP loop)
- U13 · `scripts/derive_taxonomy.py` — one-shot LLM derivation for users with 100+ cards.
- U27.1 · Skeletal 5-domain starter (`config/taxonomy.minimal.py`) for <100-card users.
- U27.2 · All 8 refiner prompts emit `proposed_x_ideal` alongside the constrained `primary_x`.
- U27.3 · `daemon/taxonomy_observer.py` appends every refine to `state/taxonomy_observations.jsonl`.
- U27.4 · `python -m throughline_cli taxonomy [review | reject]` closes the growth cycle.
### Added — swappable backends
- U12 · `rag_server/embedders.py` — `BgeM3Embedder` (local torch, lazy-load) + `OpenAIEmbedder` (stdlib urllib). Registry + alias map.
- U20 · `rag_server/rerankers.py` — `BgeRerankerV2M3` + `CohereReranker` + `SkipReranker`. Cohere realigns relevance-sorted results to input order.
- U21 · `rag_server/vector_stores.py` — `QdrantStore` (stdlib urllib) + `ChromaStore` (optional dep, stub on missing install). Alias routing for lancedb / duckdb_vss / sqlite_vec / pgvector until v0.3+ ships the real drivers.
- `rag_server.py` wires through the three factories so `EMBEDDER` / `RERANKER` / `VECTOR_STORE` env vars actually flip the backend end-to-end.
### Added — packaging + ergonomics
- U5 · "Obsidian is optional" callout in README + DEPLOYMENT.
- U6 · `bge-m3` preflight section for the ~4.6 GB model download.
- U8 · Uninstall scripts for mac/linux + windows.
### Fixed
- Daemon import surface — `JD_ROOT_MAP`, `JD_LEAF_WHITELIST`, `normalize_route_path`, `is_valid_leaf_route` are now exported as documented aliases. A fresh `git clone` can start the daemon without requiring a local `config/taxonomy.py` override.
### Tests
- 551 passed, 10 xfailed — up from 38 + 10 at v0.1.0.
### Not shipped (deferred to v0.3+)
- U27.5 — Filter outlet "N candidates pending" hint.
- U27.6 — `taxonomy retag` batch re-refine.
- U27.7 — Deprecation of zero-usage leaves + merge proposal.
- Real implementations of lancedb / duckdb_vss / sqlite_vec / pgvector (abstraction + alias routing is in place).
- Voyage / Jina / bge-reranker-v2-gemma dedicated reranker impls (currently alias to Cohere / bge-m3).
## v0.1.0 — 2026-04-23
First public release. Working flywheel: OpenWebUI → daemon refines conversations → Obsidian-style Markdown vault → Qdrant indexing → RAG server → OpenWebUI Filter.
- `daemon/refine_daemon.py` — watchdog-driven refine pipeline with 6-section knowledge cards, XYZ taxonomy, dedup, dual-write.
- `rag_server/` — FastAPI with bge-m3 embeddings + bge-reranker-v2-m3 cross-encoder + Qdrant retrieval + freshness / payload boosts.
- `scripts/ingest_qdrant.py` — one-shot vault → Qdrant ingest.
- `packs/` — pack-aware routing + policy override system.
- `Filter/` — OpenWebUI Filter function with status badge and forbidden-prefix guards.
Full release notes: https://github.com/jprodcc-rodc/throughline/releases/tag/v0.1.0