KnowledgeRefinery/docs/operational-notes.md
oho 38a99476d6 Knowledge Refinery: local-first semantic search & 3D concept visualization
macOS app for corpus ingestion, semantic search, and concept universe
visualization powered by local LLMs via LM Studio.

Architecture:
- Go daemon (17MB single binary, zero dependencies)
  - chi router, pure-Go SQLite, tiktoken tokenizer
  - 6-stage pipeline: scan → extract → chunk → embed → annotate → conceptualize
  - Brute-force cosine vector search in memory
  - 89 tests across 8 packages
- SwiftUI app (macOS 15+)
  - Multi-workspace management with auto-start daemons
  - Live pipeline progress, search, concept browser
  - WebGPU 3D universe renderer with Canvas2D fallback
  - Custom crystal app icon
2026-02-13 18:09:46 +01:00

2.6 KiB

Operational Notes

Data Locations

Each workspace has its own data directory under ~/.knowledge-refinery/workspaces/<id>/.

Item Path
Workspace root ~/.knowledge-refinery/workspaces/<id>/
SQLite DB ~/.knowledge-refinery/workspaces/<id>/refinery.db
Vector DB ~/.knowledge-refinery/workspaces/<id>/vectors/
Thumbnails ~/.knowledge-refinery/workspaces/<id>/thumbnails/
Temp files ~/.knowledge-refinery/workspaces/<id>/tmp/
PID file ~/.knowledge-refinery/workspaces/<id>/daemon.pid

Resetting

To start fresh, remove the data directory:

rm -rf ~/.knowledge-refinery

Monitoring

The daemon logs to stdout. Key log patterns:

  • Stage N: ... - Pipeline stage progress
  • Embedded batch N: X chunks - Embedding progress
  • ERROR - Errors during processing

Live Pipeline Monitoring (M8)

During pipeline execution, real-time progress is available via the enriched /ingest/status endpoint. The daemon maintains:

  • Live progress dict: Per-stage status (pending/running/done) with progress percentages
  • Counters: chunk_count, annotation_count, concept_count, edge_count
  • Activity log: 200-entry ring buffer; the last 50 events are returned via the API

The SwiftUI app polls at 1.5-second intervals and renders a full Pipeline Progress Panel with stage checkmarks, animated counters, and an auto-scrolling activity log. Polling auto-stops when the pipeline reaches idle/done state. The 3D universe auto-refreshes every 5 seconds during ingestion using mergeUniverse() for incremental node injection.

API Endpoints

Method Path Description
GET /health Health check
POST /volumes/add Add watched directory
GET /volumes/list List watched directories
DELETE /volumes/remove Remove watched directory
POST /ingest/start Start pipeline
GET /ingest/status Pipeline status
POST /search Vector search
GET /search/quick?q=... Quick search
GET /evidence/{asset_id} Get asset info
GET /evidence/chunk/{chunk_id} Get chunk details
GET /evidence/assets/all List all assets
GET /universe/snapshot Universe snapshot
POST /universe/focus Focus on node
POST /concepts/refine Refine concept
GET /concepts/list List concepts

Performance Considerations

  • Large files (>500MB) are skipped by default
  • Embedding batch size defaults to 32 (adjustable)
  • SQLite uses WAL mode for concurrent reads
  • Pipeline runs in a background thread
  • Incremental processing skips unchanged files (content hash comparison)