KnowledgeRefinery/docs/running.md
oho 38a99476d6 Knowledge Refinery: local-first semantic search & 3D concept visualization
macOS app for corpus ingestion, semantic search, and concept universe
visualization powered by local LLMs via LM Studio.

Architecture:
- Go daemon (17MB single binary, zero dependencies)
  - chi router, pure-Go SQLite, tiktoken tokenizer
  - 6-stage pipeline: scan → extract → chunk → embed → annotate → conceptualize
  - Brute-force cosine vector search in memory
  - 89 tests across 8 packages
- SwiftUI app (macOS 15+)
  - Multi-workspace management with auto-start daemons
  - Live pipeline progress, search, concept browser
  - WebGPU 3D universe renderer with Canvas2D fallback
  - Custom crystal app icon
2026-02-13 18:09:46 +01:00


Running Knowledge Refinery

Prerequisites

  • macOS Tahoe (26.x)
  • Python 3.12+
  • Xcode 26+
  • LM Studio running locally with at least one model loaded

1. Start LM Studio

  1. Open LM Studio
  2. Load an embedding model (e.g., nomic-embed-text-v1.5)
  3. Load a chat model (e.g., llama-3.2-3b-instruct or similar)
  4. Start the local server (default port 1234)
  5. Verify: curl http://127.0.0.1:1234/v1/models
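The curl check in step 5 can also be scripted. The following is a minimal sketch, assuming the LM Studio server answers /v1/models with an OpenAI-style body of the form {"data": [{"id": ...}]}; the helper names parse_model_ids and list_models are illustrative and not part of Knowledge Refinery.

```python
import json
import urllib.request

LM_STUDIO_URL = "http://127.0.0.1:1234/v1"  # default URL used in this guide


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body.

    Assumption: this response shape is not documented in this guide.
    """
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if isinstance(entry, dict) and "id" in entry
    ]


def list_models(base_url: str = LM_STUDIO_URL, timeout: float = 5.0) -> list:
    """Fetch <base_url>/models and return the ids of the loaded models."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling list_models() while the server is running should return at least one embedding model id and one chat model id; an empty list suggests that no model is loaded.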

2. Start the Daemon

cd daemon
source .venv/bin/activate
python -m knowledge_refinery.main

The daemon will:

  • Create data directory at ~/.knowledge-refinery/workspaces/<id>/
  • Initialize SQLite database
  • Connect to LM Studio
  • Write a PID file to {data_dir}/daemon.pid for process detection
  • Listen on its assigned port (default http://127.0.0.1:8742)
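The PID file is what makes it possible to detect a daemon that is already running. The sketch below shows one common way to do that check on macOS and other POSIX systems. It assumes the file holds the process ID as plain decimal text, which this guide does not specify, and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or malformed.

    Assumption: the file contains the PID as decimal text.
    """
    try:
        return int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def daemon_is_running(pid_file: Path) -> bool:
    """Report whether the process named in pid_file currently exists."""
    pid = read_pid(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs an existence check; nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A stale PID file (the process has exited but the file remains) yields False here, so a caller can safely start a fresh daemon in that case.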

Tip: Use Start All in the app toolbar to launch all workspace daemons at once. Each workspace runs an independent daemon with its own port and data directory. Once the app connects to a daemon, ingestion starts automatically.

Environment Variables

Variable          Default                    Description
KR_DATA_DIR       ~/.knowledge-refinery      Data directory
KR_LM_STUDIO_URL  http://127.0.0.1:1234/v1   LM Studio API URL
KR_PORT           8742                       Daemon port
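To illustrate how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is not the daemon's actual configuration code; DaemonConfig and load_config are hypothetical names, and only the variable names and defaults are taken from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DaemonConfig:
    data_dir: Path
    lm_studio_url: str
    port: int


def load_config(env=None) -> DaemonConfig:
    """Build a config from environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return DaemonConfig(
        data_dir=Path(env.get("KR_DATA_DIR", "~/.knowledge-refinery")).expanduser(),
        lm_studio_url=env.get("KR_LM_STUDIO_URL", "http://127.0.0.1:1234/v1"),
        port=int(env.get("KR_PORT", "8742")),
    )
```

Passing a plain dict instead of os.environ, for example load_config({"KR_PORT": "9000"}), makes it easy to try overrides without touching the shell environment.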

Verify Daemon

curl http://127.0.0.1:8742/health
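Right after launch the daemon may need a moment before it accepts requests, so a script may want to retry the health check instead of calling it once. The sketch below assumes that a healthy daemon answers /health with HTTP 200; the response body is not documented here and is ignored. The names http_ok and wait_until_healthy are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url succeeds with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # refused connection, timeout, or HTTP error status


def wait_until_healthy(probe, attempts: int = 20, interval: float = 0.5) -> bool:
    """Call probe() up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

With the default port this would be used as wait_until_healthy(lambda: http_ok("http://127.0.0.1:8742/health")).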

3. Run the macOS App

cd apps/macos/KnowledgeRefinery
swift run

Or open in Xcode:

open Package.swift

The app will:

  • Auto-start daemons for all workspaces on launch
  • Detect already-running daemons via PID files
  • Auto-restart crashed daemons (up to 3 times)
  • Show connection status in the toolbar
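The restart behavior in the list above amounts to a bounded retry policy. The app itself is written in Swift, so the following Python sketch is only an illustration of that policy and not the app's code; supervise, max_restarts, and backoff are hypothetical names, and treating a non-zero exit code as a crash is an assumption.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts: int = 3, backoff: float = 1.0) -> int:
    """Run command, relaunching it after a non-zero exit up to max_restarts times.

    Returns the total number of launches (the first start plus any restarts).
    """
    launches = 0
    while True:
        launches += 1
        exit_code = subprocess.run(command).returncode
        # Stop on a clean exit, or once the restart budget is used up.
        if exit_code == 0 or launches > max_restarts:
            return launches
        time.sleep(backoff)


# Example command, using the daemon start command shown earlier in this guide:
DAEMON_COMMAND = [sys.executable, "-m", "knowledge_refinery.main"]
```

With the default budget of 3, a daemon that fails on every start is launched four times in total before the supervisor gives up.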

4. Ingest Documents

  1. In the app, go to the Volumes tab
  2. Click Add Folder and select a directory
  3. Go to the Ingest tab and click Start Ingestion, or use Start All from the dashboard
  4. Watch live pipeline progress in the Pipeline Progress Panel:
    • Stage tracker: Each of the 6 stages (Scan, Extract, Chunk, Embed, Annotate, Conceptualize) shows a checkmark when complete or an animated progress bar when running
    • Animated counters: Live tallies for chunks, vectors, annotations, concepts, and edges
    • Interaction indicators: Visual status of App-to-Daemon and Daemon-to-LM Studio connections
    • Activity log: Auto-scrolling log of the last 50 pipeline events
  5. The dashboard card shows a compact spinner with the current stage name and chunk count
  6. The 3D universe auto-refreshes every 5 seconds during ingestion, using incremental node injection

The app polls /ingest/status every 1.5 seconds during pipeline execution and automatically stops polling when the pipeline completes.
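The same polling pattern can be reproduced from a script. This is a minimal sketch, not the app's implementation. It assumes that any "status" value other than "running" means the pipeline has finished; this guide only shows the "running" value, so the terminal values are not known. The function names are illustrative.

```python
import json
import time
import urllib.request


def fetch_status(base_url: str = "http://127.0.0.1:8742", timeout: float = 5.0) -> dict:
    """GET /ingest/status from the daemon and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/ingest/status", timeout=timeout) as resp:
        return json.load(resp)


def poll_until_done(fetch=fetch_status, interval: float = 1.5, on_update=print) -> dict:
    """Poll until the pipeline is no longer running, then return the last response.

    Assumption: every status other than "running" is terminal.
    """
    while True:
        status = fetch()
        on_update(status)
        if status.get("status") != "running":
            return status
        time.sleep(interval)
```

The fetch and on_update parameters exist so the loop can be exercised with canned responses, or wired to a progress display instead of print.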

Via API

# Add a volume
curl -X POST http://127.0.0.1:8742/volumes/add \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/documents"}'

# Start ingestion
curl -X POST http://127.0.0.1:8742/ingest/start \
  -H "Content-Type: application/json" \
  -d '{}'

# Check status (enriched response with live progress)
curl http://127.0.0.1:8742/ingest/status
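The two POST calls above can also be issued from Python with only the standard library. This is a sketch under assumptions: json_post and send are hypothetical helpers, and because the response format of these endpoints is not documented here, send returns the raw response text without parsing it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8742"  # default daemon address from this guide


def json_post(path: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the daemon API."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Adding a volume and starting ingestion then becomes send(json_post("/volumes/add", {"path": "/path/to/documents"})) followed by send(json_post("/ingest/start", {})).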

The enriched /ingest/status response includes:

{
  "status": "running",
  "stage": "embed",
  "chunk_count": 142,
  "annotation_count": 87,
  "concept_count": 12,
  "edge_count": 45,
  "live": {
    "scan": {"status": "done", "progress_pct": 100},
    "extract": {"status": "done", "progress_pct": 100},
    "chunk": {"status": "done", "progress_pct": 100},
    "embed": {"status": "running", "progress_pct": 64},
    "annotate": {"status": "pending", "progress_pct": 0},
    "conceptualize": {"status": "pending", "progress_pct": 0}
  },
  "activity_log": [
    {"timestamp": "2026-02-12T10:30:01Z", "message": "Scanning 3 volumes..."},
    {"timestamp": "2026-02-12T10:30:03Z", "message": "Found 47 files, 12 new"},
    "..."
  ]
}
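A script that consumes this response might reduce it to a single progress figure and a few recent log lines. The sketch below works on the fields shown in the example above. Averaging the six stages with equal weight is a choice made for this illustration and is not necessarily how the app computes progress; overall_progress and recent_messages are hypothetical names.

```python
STAGES = ["scan", "extract", "chunk", "embed", "annotate", "conceptualize"]


def overall_progress(status: dict) -> float:
    """Equal-weight mean of progress_pct over the six stages (missing stages count as 0)."""
    live = status.get("live", {})
    total = sum(live.get(stage, {}).get("progress_pct", 0) for stage in STAGES)
    return total / len(STAGES)


def recent_messages(status: dict, limit: int = 5) -> list:
    """Format the last `limit` activity log entries as 'timestamp message' strings."""
    # Skip anything that is not an object, such as a placeholder string.
    entries = [e for e in status.get("activity_log", []) if isinstance(e, dict)]
    return [f'{e.get("timestamp", "?")} {e.get("message", "")}' for e in entries[-limit:]]
```

For the example response, overall_progress averages 100, 100, 100, 64, 0, and 0, which places the pipeline at roughly 61 percent.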

5. Search

Use the Search tab in the app, or query the daemon directly:

curl -X POST http://127.0.0.1:8742/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "limit": 10}'

Troubleshooting

  • Daemon won't start: Check that port 8742 (or the port set in KR_PORT) is free
  • LM Studio unavailable: Ensure the LM Studio server is running on port 1234
  • No embeddings: Verify that an embedding model is loaded in LM Studio
  • App can't connect: Check that the daemon is running on the expected port
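Several of these checks come down to whether something is listening on a given local port. The following sketch tests that with a plain TCP connection attempt; port_in_use is a hypothetical helper, and the port numbers are the defaults from this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def describe_ports(daemon_port: int = 8742, lm_studio_port: int = 1234) -> dict:
    """Summarize which of the two default services are accepting connections."""
    return {
        "daemon": port_in_use(daemon_port),
        "lm_studio": port_in_use(lm_studio_port),
    }
```

If port_in_use(8742) is True before the daemon has been started, another process holds the port; if port_in_use(1234) is False, the LM Studio server is not running.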