mirror of
https://github.com/saymrwulf/KnowledgeRefinery.git
synced 2026-05-18 21:20:08 +00:00
macOS app for corpus ingestion, semantic search, and concept universe visualization powered by local LLMs via LM Studio. Architecture: - Go daemon (17MB single binary, zero dependencies) - chi router, pure-Go SQLite, tiktoken tokenizer - 6-stage pipeline: scan → extract → chunk → embed → annotate → conceptualize - Brute-force cosine vector search in memory - 89 tests across 8 packages - SwiftUI app (macOS 15+) - Multi-workspace management with auto-start daemons - Live pipeline progress, search, concept browser - WebGPU 3D universe renderer with Canvas2D fallback - Custom crystal app icon
4.3 KiB
4.3 KiB
Running Knowledge Refinery
Prerequisites
- macOS Tahoe (26.x)
- Python 3.12+
- Xcode 26+
- LM Studio running locally with at least one model loaded
1. Start LM Studio
- Open LM Studio
- Load an embedding model (e.g.,
nomic-embed-text-v1.5ortext-embedding-3-small) - Load a chat model (e.g.,
llama-3.2-3b-instructor similar) - Start the local server (default port 1234)
- Verify:
curl http://127.0.0.1:1234/v1/models
2. Start the Daemon
cd daemon
source .venv/bin/activate
python -m knowledge_refinery.main
The daemon will:
- Create data directory at
~/.knowledge-refinery/workspaces/<id>/ - Initialize SQLite database
- Connect to LM Studio
- Write a PID file to
{data_dir}/daemon.pidfor process detection - Listen on its assigned port (default
http://127.0.0.1:8742)
Tip: Use Start All in the app toolbar to launch all workspace daemons at once. Each workspace runs an independent daemon with its own port and data directory. After connection, ingestion auto-starts.
Environment Variables
| Variable | Default | Description |
|---|---|---|
KR_DATA_DIR |
~/.knowledge-refinery |
Data directory |
KR_LM_STUDIO_URL |
http://127.0.0.1:1234/v1 |
LM Studio API URL |
KR_PORT |
8742 |
Daemon port |
Verify Daemon
curl http://127.0.0.1:8742/health
3. Run the macOS App
cd apps/macos/KnowledgeRefinery
swift run
Or open in Xcode:
open Package.swift
The app will:
- Auto-start daemons for all workspaces on launch
- Detect already-running daemons via PID files
- Auto-restart crashed daemons (up to 3 times)
- Show connection status in the toolbar
4. Ingest Documents
- In the app, go to Volumes tab
- Click Add Folder and select a directory
- Go to Ingest tab and click Start Ingestion, or use Start All from the dashboard
- Watch live pipeline progress in the Pipeline Progress Panel:
- Stage tracker: Each of the 6 stages (Scan, Extract, Chunk, Embed, Annotate, Conceptualize) shows a checkmark when complete or an animated progress bar when running
- Animated counters: Live tallies for chunks, vectors, annotations, concepts, and edges
- Interaction indicators: Visual status of App-to-Daemon and Daemon-to-LM Studio connections
- Activity log: Auto-scrolling log of the last 50 pipeline events
- The dashboard card shows a compact spinner with the current stage name and chunk count
- The 3D universe auto-refreshes every 5 seconds during ingestion, using incremental node injection
The app polls /ingest/status every 1.5 seconds during pipeline execution and automatically stops polling when the pipeline completes.
Via API
# Add a volume
curl -X POST http://127.0.0.1:8742/volumes/add \
-H "Content-Type: application/json" \
-d '{"path": "/path/to/documents"}'
# Start ingestion
curl -X POST http://127.0.0.1:8742/ingest/start \
-H "Content-Type: application/json" \
-d '{}'
# Check status (enriched response with live progress)
curl http://127.0.0.1:8742/ingest/status
The enriched /ingest/status response includes:
{
"status": "running",
"stage": "embed",
"chunk_count": 142,
"annotation_count": 87,
"concept_count": 12,
"edge_count": 45,
"live": {
"scan": {"status": "done", "progress_pct": 100},
"extract": {"status": "done", "progress_pct": 100},
"chunk": {"status": "done", "progress_pct": 100},
"embed": {"status": "running", "progress_pct": 64},
"annotate": {"status": "pending", "progress_pct": 0},
"conceptualize": {"status": "pending", "progress_pct": 0}
},
"activity_log": [
{"timestamp": "2026-02-12T10:30:01Z", "message": "Scanning 3 volumes..."},
{"timestamp": "2026-02-12T10:30:03Z", "message": "Found 47 files, 12 new"},
"..."
]
}
5. Search
Use the Search tab in the app, or:
curl -X POST http://127.0.0.1:8742/search \
-H "Content-Type: application/json" \
-d '{"query": "machine learning", "limit": 10}'
Troubleshooting
- Daemon won't start: Check that port 8742 is free
- LM Studio unavailable: Ensure LM Studio server is running on port 1234
- No embeddings: Verify an embedding model is loaded in LM Studio
- App can't connect: Check daemon is running on the expected port