KnowledgeRefinery/README.md
oho 38a99476d6 Knowledge Refinery: local-first semantic search & 3D concept visualization
macOS app for corpus ingestion, semantic search, and concept universe
visualization powered by local LLMs via LM Studio.

Architecture:
- Go daemon (17MB single binary, zero dependencies)
  - chi router, pure-Go SQLite, tiktoken tokenizer
  - 6-stage pipeline: scan → extract → chunk → embed → annotate → conceptualize
  - Brute-force cosine vector search in memory
  - 89 tests across 8 packages
- SwiftUI app (macOS 15+)
  - Multi-workspace management with auto-start daemons
  - Live pipeline progress, search, concept browser
  - WebGPU 3D universe renderer with Canvas2D fallback
  - Custom crystal app icon
2026-02-13 18:09:46 +01:00

Knowledge Refinery

A local-first macOS Tahoe application that ingests heterogeneous document corpora, extracts structured knowledge via local LLMs (LM Studio), and provides semantic search with 3D concept visualization.

Installation

Prerequisites

  • macOS Tahoe (26.x) on Apple Silicon
  • Xcode or Xcode Command Line Tools (for Swift 6.2+)
  • Python 3.12+ (system Python or from python.org)
  • LM Studio from lmstudio.ai

One-Line Install

git clone <repo-url> && cd LongLocalTimeHorizonInfoRetrieval && bash scripts/install.sh

The install script will:

  1. Check all prerequisites
  2. Create a Python virtual environment and install dependencies
  3. Build the SwiftUI application
  4. Create a proper .app bundle
  5. Install to /Applications

Manual Build

# Set up daemon
cd daemon
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"

# Build app bundle
cd ..
make build

# Alternatively, skip the bundle and run in development mode (via swift run)
make app-run

LM Studio Setup

Before launching Knowledge Refinery:

  1. Open LM Studio
  2. Load models:
    • Chat: gemma-3-4b (or any chat model)
    • Embeddings: nomic-embed-text-v1.5 (768-dim)
  3. Start the local server on port 1234
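To confirm the embedding model is reachable once the server is up, a small script can talk to LM Studio's OpenAI-compatible HTTP API. The following is a minimal sketch, not the daemon's actual client: the helper names are hypothetical, and the model identifier is an assumption (LM Studio may list the loaded model under a different id, so check the server tab).

```python
import json
import urllib.request

# Assumption: LM Studio serves its OpenAI-compatible API under /v1 on port 1234.
BASE_URL = "http://localhost:1234/v1"
# Assumption: the id LM Studio reports for the loaded embedding model may differ.
EMBED_MODEL = "nomic-embed-text-v1.5"


def build_embedding_request(texts, model=EMBED_MODEL, base_url=BASE_URL):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout=10.0):
    """Send the request; this needs the LM Studio server to be running."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

With the server started, `len(embed(["hello world"])[0])` should report the embedding dimension of whichever model is loaded. A connection error here usually means the local server has not been started or is listening on a different port.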

Quick Start

  1. Launch Knowledge Refinery from Applications or Spotlight
  2. The dashboard shows LM Studio status (green = connected)
  3. Click New Workspace — name it, add data lake folders
  4. Click Start All to launch all workspace daemons and auto-start ingestion
  5. Watch live pipeline progress: stage tracker, animated counters, activity log
  6. Search, explore the concept universe, browse clusters

Architecture

  • SwiftUI Master Control App — Multi-workspace dashboard, LM Studio monitoring, daemon lifecycle, live pipeline visibility
  • Python Daemon (FastAPI) — Per-workspace instances with independent ports and data directories (~/.knowledge-refinery/workspaces/<id>/)
  • Live Pipeline Progress — 1.5s fast polling during ingestion, enriched /ingest/status with per-stage progress, counters, and activity log
  • LanceDB — Embedded vector store for semantic search
  • SQLite — Metadata, graph store, pipeline state
  • LM Studio — Local LLM inference (embeddings + chat)
  • WebGPU — 3D concept universe visualization with auto-refresh during ingestion
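The per-workspace layout and the 1.5s status polling described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the app's real client: the function names are hypothetical, and the `running` field in the status payload is an assumed shape, since the README does not document the `/ingest/status` response format.

```python
import time
from pathlib import Path

POLL_INTERVAL_S = 1.5  # fast polling interval during ingestion, per the README


def workspace_dir(workspace_id, home=None):
    """Data directory for one workspace: ~/.knowledge-refinery/workspaces/<id>/."""
    root = Path(home) if home is not None else Path.home()
    return root / ".knowledge-refinery" / "workspaces" / workspace_id


def status_url(port, host="127.0.0.1"):
    """Status endpoint of the daemon listening on this workspace's port."""
    return f"http://{host}:{port}/ingest/status"


def poll_until_done(fetch_status, on_update, interval=POLL_INTERVAL_S,
                    sleep=time.sleep, max_polls=None):
    """Poll until the daemon reports that ingestion has stopped.

    fetch_status: callable returning the decoded status payload as a dict.
    on_update:    callback invoked with every payload (e.g. to refresh a UI).
    Assumption: the payload has a boolean "running" field (hypothetical).
    Returns the final payload, or None if max_polls was reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = fetch_status()
        polls += 1
        on_update(status)
        if not status.get("running", False):
            return status
        sleep(interval)
    return None
```

Passing `fetch_status` and `sleep` in as parameters keeps the loop testable without a running daemon. In practice `fetch_status` would issue an HTTP GET against `status_url(port)` and decode the JSON body.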

Project Structure

apps/macos/KnowledgeRefinery/   SwiftUI macOS application
daemon/                         Python backend daemon
shared/                         Prompt templates, schemas
docs/                           Architecture and operational docs
scripts/                        Build and install scripts
test_corpus/                    Sample documents for testing
dist/                           Built .app bundle (after make build)

Development

make help          # Show all commands
make test          # Run daemon tests + Swift build check
make app-run       # Run app via swift run (dev mode)
make daemon-run    # Run daemon directly
make clean         # Remove build artifacts

Milestones

  • M1: Core ingestion + search + evidence
  • M2: LLM structured annotation
  • M3: Concept clustering + labeling
  • M4: WebGPU 3D Universe visualization
  • M5: Semantic zoom + lenses
  • M6: Extended format support (EPUB, archives, DICOM)
  • M7: Master Control App (multi-workspace, LM Studio monitoring, daemon lifecycle)
  • M8: Live Pipeline Visibility (real-time progress panel, activity log, universe auto-refresh)