mirror of https://github.com/saymrwulf/KnowledgeRefinery.git synced 2026-05-18 21:20:08 +00:00

oho 38a99476d6 Knowledge Refinery: local-first semantic search & 3D concept visualization

macOS app for corpus ingestion, semantic search, and concept universe
visualization powered by local LLMs via LM Studio.

Architecture:
- Go daemon (17MB single binary, zero dependencies)
  - chi router, pure-Go SQLite, tiktoken tokenizer
  - 6-stage pipeline: scan → extract → chunk → embed → annotate → conceptualize
  - Brute-force cosine vector search in memory
  - 89 tests across 8 packages
- SwiftUI app (macOS 15+)
  - Multi-workspace management with auto-start daemons
  - Live pipeline progress, search, concept browser
  - WebGPU 3D universe renderer with Canvas2D fallback
  - Custom crystal app icon

2026-02-13 18:09:46 +01:00

4.3 KiB


Running Knowledge Refinery

Prerequisites

macOS Tahoe (26.x)
Python 3.12+
Xcode 26+
LM Studio running locally with at least one model loaded

1. Start LM Studio

Open LM Studio
Load an embedding model (e.g., nomic-embed-text-v1.5)
Load a chat model (e.g., llama-3.2-3b-instruct or similar)
Start the local server (default port 1234)
Verify: curl http://127.0.0.1:1234/v1/models
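
The same check can be scripted. The following is a minimal sketch, assuming the endpoint returns the OpenAI-style list shape (`{"data": [{"id": ...}]}`). The helper names and the "embed" substring heuristic are illustrative assumptions, not part of Knowledge Refinery.

```python
import json
import urllib.request

LM_STUDIO_MODELS_URL = "http://127.0.0.1:1234/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Return the model identifiers from an OpenAI-style /v1/models payload."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def has_embedding_model(ids: list[str]) -> bool:
    """Heuristic (assumption): embedding models usually carry 'embed' in their id."""
    return any("embed" in model_id.lower() for model_id in ids)


def fetch_models(url: str = LM_STUDIO_MODELS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the model list; raises URLError if the server is down."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `has_embedding_model(model_ids(fetch_models()))` should be true before ingestion is started.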

2. Start the Daemon

cd daemon
source .venv/bin/activate
python -m knowledge_refinery.main

The daemon will:

Create data directory at ~/.knowledge-refinery/workspaces/<id>/
Initialize SQLite database
Connect to LM Studio
Write a PID file to {data_dir}/daemon.pid for process detection
Listen on its assigned port (default http://127.0.0.1:8742)

Tip: Use Start All in the app toolbar to launch all workspace daemons at once. Each workspace runs an independent daemon with its own port and data directory. After connection, ingestion auto-starts.

Environment Variables

Variable	Default	Description
`KR_DATA_DIR`	`~/.knowledge-refinery`	Data directory
`KR_LM_STUDIO_URL`	`http://127.0.0.1:1234/v1`	LM Studio API URL
`KR_PORT`	`8742`	Daemon port
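
As an illustration of how these variables and defaults combine, here is a minimal sketch. `DaemonConfig` and `load_config` are hypothetical names; the daemon's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class DaemonConfig:
    data_dir: Path
    lm_studio_url: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> DaemonConfig:
    """Build a config from the KR_* variables, falling back to the documented defaults."""
    return DaemonConfig(
        data_dir=Path(env.get("KR_DATA_DIR", "~/.knowledge-refinery")).expanduser(),
        lm_studio_url=env.get("KR_LM_STUDIO_URL", "http://127.0.0.1:1234/v1"),
        port=int(env.get("KR_PORT", "8742")),
    )
```

Passing an explicit mapping, for example `load_config({"KR_PORT": "9000"})`, overrides only that setting and leaves the other defaults in place.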

Verify Daemon

curl http://127.0.0.1:8742/health

3. Run the macOS App

cd apps/macos/KnowledgeRefinery
swift run

Or open in Xcode:

open Package.swift

The app will:

Auto-start daemons for all workspaces on launch
Detect already-running daemons via PID files
Auto-restart crashed daemons (up to 3 times)
Show connection status in the toolbar
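
The restart limit can be sketched as a small supervision loop. The app itself is written in Swift, so this Python version only illustrates the policy. `supervise`, `run_daemon`, and the backoff pause are assumptions, not the app's actual code.

```python
import time
from typing import Callable

MAX_RESTARTS = 3  # from the text: crashed daemons are restarted up to 3 times


def supervise(run_daemon: Callable[[], int],
              max_restarts: int = MAX_RESTARTS,
              backoff_seconds: float = 1.0) -> int:
    """Run the daemon, restarting after a non-zero exit up to max_restarts times.

    run_daemon blocks until the daemon exits and returns its exit code.
    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        exit_code = run_daemon()
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_seconds)  # hypothetical pause before relaunching
```

Capping the number of restarts keeps a daemon that fails on startup (for example, because its port is taken) from being relaunched forever.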

4. Ingest Documents

In the app, go to the Volumes tab
Click Add Folder and select a directory
Go to the Ingest tab and click Start Ingestion, or use Start All from the dashboard
Watch live pipeline progress in the Pipeline Progress Panel:
- Stage tracker: Each of the 6 stages (Scan, Extract, Chunk, Embed, Annotate, Conceptualize) shows a checkmark when complete or an animated progress bar when running
- Animated counters: Live tallies for chunks, vectors, annotations, concepts, and edges
- Interaction indicators: Visual status of App-to-Daemon and Daemon-to-LM Studio connections
- Activity log: Auto-scrolling log of the last 50 pipeline events
The dashboard card shows a compact spinner with the current stage name and chunk count
The 3D universe auto-refreshes every 5 seconds during ingestion, using incremental node injection

The app polls /ingest/status every 1.5 seconds during pipeline execution and automatically stops polling when the pipeline completes.
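
A client that follows the same polling pattern might look like the sketch below. It assumes that any `status` value other than `"running"` means the pipeline has finished, since the full set of status values is not listed here. `poll_until_done` and the injected `fetch_status` callable are hypothetical names.

```python
import time
from typing import Callable, Iterator

POLL_INTERVAL_SECONDS = 1.5  # interval stated in the text


def poll_until_done(fetch_status: Callable[[], dict],
                    interval: float = POLL_INTERVAL_SECONDS,
                    sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield each status snapshot, stopping once the pipeline is no longer running.

    Assumption: any "status" other than "running" means polling can stop.
    """
    while True:
        snapshot = fetch_status()
        yield snapshot
        if snapshot.get("status") != "running":
            return
        sleep(interval)
```

Passing the fetch and sleep functions in as parameters keeps the loop testable without a running daemon. In real use, `fetch_status` would issue a GET to `/ingest/status` and decode the JSON body.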

Via API

# Add a volume
curl -X POST http://127.0.0.1:8742/volumes/add \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/documents"}'

# Start ingestion
curl -X POST http://127.0.0.1:8742/ingest/start \
  -H "Content-Type: application/json" \
  -d '{}'

# Check status (enriched response with live progress)
curl http://127.0.0.1:8742/ingest/status
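
The same calls can be made from Python using only the standard library. This is a minimal sketch. `json_post` and `send` are hypothetical helper names, and `send` assumes the daemon replies with a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8742"  # default daemon address from this guide


def json_post(path: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the daemon."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the reply (assumes the daemon answers with JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Equivalents of the two POST calls above; pass either one to send().
add_volume = json_post("/volumes/add", {"path": "/path/to/documents"})
start_ingest = json_post("/ingest/start", {})
```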

The enriched /ingest/status response looks like this:

{
  "status": "running",
  "stage": "embed",
  "chunk_count": 142,
  "annotation_count": 87,
  "concept_count": 12,
  "edge_count": 45,
  "live": {
    "scan": {"status": "done", "progress_pct": 100},
    "extract": {"status": "done", "progress_pct": 100},
    "chunk": {"status": "done", "progress_pct": 100},
    "embed": {"status": "running", "progress_pct": 64},
    "annotate": {"status": "pending", "progress_pct": 0},
    "conceptualize": {"status": "pending", "progress_pct": 0}
  },
  "activity_log": [
    {"timestamp": "2026-02-12T10:30:01Z", "message": "Scanning 3 volumes..."},
    {"timestamp": "2026-02-12T10:30:03Z", "message": "Found 47 files, 12 new"},
    "..."
  ]
}
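
A response of this shape can be reduced to a one-line summary. The sketch below weights all six stages equally, which is an assumption made for illustration. The daemon does not report an overall percentage, and `overall_progress` and `summarize` are hypothetical names.

```python
STAGES = ["scan", "extract", "chunk", "embed", "annotate", "conceptualize"]


def overall_progress(status: dict) -> float:
    """Average progress_pct over the six stages; missing stages count as 0."""
    live = status.get("live", {})
    total = sum(live.get(stage, {}).get("progress_pct", 0) for stage in STAGES)
    return total / len(STAGES)


def summarize(status: dict) -> str:
    """Format the current stage, the averaged progress, and the chunk count."""
    stage = status.get("stage", "?")
    chunks = status.get("chunk_count", 0)
    return f"{stage}: {overall_progress(status):.0f}% overall, {chunks} chunks"
```

For the example response above, the stage percentages average to 364 / 6, so `summarize` reports the embed stage at roughly 61% overall with 142 chunks.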

5. Search

Use the Search tab in the app, or:

curl -X POST http://127.0.0.1:8742/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "limit": 10}'
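
The architecture summary describes search as brute-force cosine similarity over vectors held in memory. The idea can be sketched in a few lines of Python. The daemon itself is written in Go, so this illustrates the technique and is not its implementation. The function names and the dict-of-vectors layout are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query: list[float],
                       vectors: dict[str, list[float]],
                       limit: int = 10) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the top `limit`."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Every query scans all stored vectors, so cost grows linearly with corpus size. In exchange, there is no index to build or keep in sync.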

Troubleshooting

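Daemon won't start: Check that port 8742 is free, or set KR_PORT to a different port
LM Studio unavailable: Ensure the LM Studio server is running on port 1234 (curl http://127.0.0.1:1234/v1/models)
No embeddings: Verify that an embedding model is loaded in LM Studio
App can't connect: Check that the daemon is running on the expected port (curl http://127.0.0.1:8742/health)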
