# Running Knowledge Refinery

## Prerequisites

- macOS Tahoe (26.x)
- Python 3.12+
- Xcode 26+
- LM Studio running locally with at least one model loaded

## 1. Start LM Studio

1. Open LM Studio
2. Load an embedding model (e.g., `nomic-embed-text-v1.5` or `text-embedding-3-small`)
3. Load a chat model (e.g., `llama-3.2-3b-instruct` or similar)
4. Start the local server (default port 1234)
5. Verify: `curl http://127.0.0.1:1234/v1/models`

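The verification in step 5 can also be scripted. The sketch below assumes the LM Studio server answers `/v1/models` with the OpenAI-compatible list shape (`{"data": [{"id": ...}]}`); the helper names `model_ids` and `fetch_model_ids` are illustrative and not part of Knowledge Refinery.

```python
import json
import urllib.request

LM_STUDIO_URL = "http://127.0.0.1:1234/v1"  # default URL used in this guide


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style model list payload."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url: str = LM_STUDIO_URL, timeout: float = 5.0) -> list[str]:
    """Query <base_url>/models and return the listed model identifiers."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the server running, `fetch_model_ids()` should include both the embedding model and the chat model; if either is missing, revisit steps 2 and 3.
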
## 2. Start the Daemon

```bash
cd daemon
source .venv/bin/activate
python -m knowledge_refinery.main
```

The daemon will:

- Create its data directory at `~/.knowledge-refinery/workspaces/<id>/`
- Initialize the SQLite database
- Connect to LM Studio
- Write a PID file to `{data_dir}/daemon.pid` for process detection
- Listen on its assigned port (default `http://127.0.0.1:8742`)

> **Tip**: Use **Start All** in the app toolbar to launch all workspace daemons at once. Each workspace runs an independent daemon with its own port and data directory. After connection, ingestion auto-starts.

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `KR_DATA_DIR` | `~/.knowledge-refinery` | Data directory |
| `KR_LM_STUDIO_URL` | `http://127.0.0.1:1234/v1` | LM Studio API URL |
| `KR_PORT` | `8742` | Daemon port |

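To illustrate how the table's defaults apply, here is a minimal sketch that resolves the three variables from an environment mapping. The `Settings` class and `load_settings` function are hypothetical; how the daemon actually parses its configuration is not shown in this guide.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    data_dir: Path
    lm_studio_url: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the KR_* variables, falling back to the documented defaults."""
    return Settings(
        data_dir=Path(env.get("KR_DATA_DIR", "~/.knowledge-refinery")).expanduser(),
        lm_studio_url=env.get("KR_LM_STUDIO_URL", "http://127.0.0.1:1234/v1"),
        port=int(env.get("KR_PORT", "8742")),
    )
```

For example, starting the daemon with `KR_PORT=9000` set in the shell would correspond to `load_settings({"KR_PORT": "9000"}).port == 9000` in this sketch.
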
### Verify Daemon

```bash
curl http://127.0.0.1:8742/health
```

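Because the daemon needs a moment to start, a script may want to retry the health check instead of calling it once. The sketch below treats any HTTP 200 from `/health` as healthy; the response body is not documented here, so it is ignored. `wait_for_health` is an illustrative name.

```python
import time
import urllib.error
import urllib.request

# Talk to the local daemon directly, ignoring any system proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url: str = "http://127.0.0.1:8742/health",
                    attempts: int = 10, delay: float = 0.5) -> bool:
    """Return True as soon as the endpoint answers with HTTP 200."""
    for attempt in range(attempts):
        try:
            with _opener.open(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or a non-2xx answer
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the defaults this waits roughly five seconds before giving up, which is an arbitrary choice and can be tuned.
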
## 3. Run the macOS App

```bash
cd apps/macos/KnowledgeRefinery
swift run
```

Or open in Xcode:
```bash
open Package.swift
```

The app will:

- Auto-start daemons for all workspaces on launch
- Detect already-running daemons via PID files
- Auto-restart crashed daemons (up to 3 times)
- Show connection status in the toolbar

## 4. Ingest Documents

1. In the app, go to the **Volumes** tab
2. Click **Add Folder** and select a directory
3. Go to the **Ingest** tab and click **Start Ingestion**, or use **Start All** from the dashboard
4. Watch live pipeline progress in the **Pipeline Progress Panel**:
   - **Stage tracker**: Each of the 6 stages (Scan, Extract, Chunk, Embed, Annotate, Conceptualize) shows a checkmark when complete or an animated progress bar when running
   - **Animated counters**: Live tallies for chunks, vectors, annotations, concepts, and edges
   - **Interaction indicators**: Visual status of App-to-Daemon and Daemon-to-LM Studio connections
   - **Activity log**: Auto-scrolling log of the last 50 pipeline events
5. The dashboard card shows a compact spinner with the current stage name and chunk count
6. The 3D universe auto-refreshes every 5 seconds during ingestion, using incremental node injection

The app polls `/ingest/status` every 1.5 seconds during pipeline execution and automatically stops polling when the pipeline completes.

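The polling behaviour described above can be approximated outside the app as follows. This is a sketch under assumptions: the only status value shown in this guide is `"running"`, so the loop treats any other value as "stopped", and `poll_until_done` and `fetch_ingest_status` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ingest_status(base_url: str = "http://127.0.0.1:8742") -> dict:
    """GET /ingest/status from the daemon and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/ingest/status", timeout=5.0) as resp:
        return json.load(resp)


def poll_until_done(fetch_status: Callable[[], dict],
                    interval: float = 1.5,
                    max_polls: int = 1000) -> dict:
    """Call fetch_status repeatedly until the pipeline is no longer running."""
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status.get("status") != "running":
            break  # assumed: any other value means the pipeline has stopped
        time.sleep(interval)
        status = fetch_status()
    return status
```

Calling `poll_until_done(fetch_ingest_status)` blocks until ingestion stops and returns the last payload. Passing the fetch function as a parameter keeps the loop easy to exercise without a running daemon.
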
### Via API

```bash
# Add a volume
curl -X POST http://127.0.0.1:8742/volumes/add \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/documents"}'

# Start ingestion
curl -X POST http://127.0.0.1:8742/ingest/start \
  -H "Content-Type: application/json" \
  -d '{}'

# Check status (enriched response with live progress)
curl http://127.0.0.1:8742/ingest/status
```

The enriched `/ingest/status` response includes:

```json
{
  "status": "running",
  "stage": "embed",
  "chunk_count": 142,
  "annotation_count": 87,
  "concept_count": 12,
  "edge_count": 45,
  "live": {
    "scan": {"status": "done", "progress_pct": 100},
    "extract": {"status": "done", "progress_pct": 100},
    "chunk": {"status": "done", "progress_pct": 100},
    "embed": {"status": "running", "progress_pct": 64},
    "annotate": {"status": "pending", "progress_pct": 0},
    "conceptualize": {"status": "pending", "progress_pct": 0}
  },
  "activity_log": [
    {"timestamp": "2026-02-12T10:30:01Z", "message": "Scanning 3 volumes..."},
    {"timestamp": "2026-02-12T10:30:03Z", "message": "Found 47 files, 12 new"},
    "..."
  ]
}
```

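A script consuming this payload might condense it into a few headline numbers. The sketch below relies only on the fields shown in the example response; weighting all six stages equally for an overall percentage is an assumption made for illustration and may not match how the app computes progress.

```python
STAGES = ["scan", "extract", "chunk", "embed", "annotate", "conceptualize"]


def summarize(status: dict) -> dict:
    """Condense an /ingest/status payload into a few headline numbers."""
    live = status.get("live", {})
    percents = [live.get(stage, {}).get("progress_pct", 0) for stage in STAGES]
    done = [s for s in STAGES if live.get(s, {}).get("status") == "done"]
    return {
        "stage": status.get("stage"),
        "stages_done": len(done),
        # Equal weight per stage: an illustrative choice, not the app's formula.
        "overall_pct": round(sum(percents) / len(STAGES), 1),
    }
```

For the example response above, three stages are done and the per-stage percentages sum to 364, so the equally weighted overall figure comes to about 60.7%.
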
## 5. Search

Use the **Search** tab in the app, or:

```bash
curl -X POST http://127.0.0.1:8742/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "limit": 10}'
```

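The same request can be issued from Python using only the standard library. The sketch below mirrors the `curl` call; the shape of the search response is not documented in this guide, so `search` returns the decoded JSON unchanged. Both function names are illustrative.

```python
import json
import urllib.request


def build_search_request(query: str, limit: int = 10,
                         base_url: str = "http://127.0.0.1:8742") -> urllib.request.Request:
    """Build the POST /search request with a JSON body."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 10):
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_search_request(query, limit), timeout=30.0) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without a running daemon.
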
## Troubleshooting

- **Daemon won't start**: Check that port 8742 is free
- **LM Studio unavailable**: Ensure the LM Studio server is running on port 1234
- **No embeddings**: Verify that an embedding model is loaded in LM Studio
- **App can't connect**: Check that the daemon is running on the expected port
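
Several of these checks come down to whether something is listening on a local port. A quick probe, sketched here with hypothetical helper names and the default ports from this guide, could look like this:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(daemon_port: int = 8742, lm_studio_port: int = 1234) -> dict:
    """Report which of the two local services are reachable."""
    return {
        "daemon": is_port_open("127.0.0.1", daemon_port),
        "lm_studio": is_port_open("127.0.0.1", lm_studio_port),
    }
```

If `diagnose()["daemon"]` is true before the daemon has been started, another process already holds the port, which would explain a failed start.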