# Tichy

A self-contained, privacy-focused RAG (Retrieval-Augmented Generation) system in Go. All data stays local - nothing is sent to external LLM providers.

## Requirements

- **Go 1.24.4+**
- **Docker and Docker Compose**
- **NVIDIA GPU with CUDA support** (required for llama.cpp inference with the default docker-compose.yml)
  - For CPU-only inference, use the `ghcr.io/ggerganov/llama.cpp:server` image and remove `runtime: nvidia` and the NVIDIA environment variables from the llm and embeddings services (an illustrative sketch is included in the appendix at the end of this README)
- **GGUF Models**:
  - Main LLM model (e.g., Gemma 3 12B)
  - Embedding model (e.g., nomic-embed-text v1.5)

## Quick Start

### 1. Prepare Models

Place your GGUF models in a directory of your choice (e.g., `~/models/llama/`):

```bash
mkdir -p ~/models/llama
# Copy your models to:
#   ~/models/llama/google_gemma-3-12b-it-Q8_0.gguf
#   ~/models/llama/nomic-embed-text-v1.5.Q8_0.gguf
```

Update the volume paths in `docker-compose.yml` if you use a different location.

### 2. Start Services

Start PostgreSQL, the LLM server, and the embeddings server:

```bash
docker compose up -d
```

Verify the services are running:

```bash
docker compose ps
```

### 3. Configure Environment

Copy and configure the environment file:

```bash
cp examples/insurellm/.env .env
# Edit .env if needed to adjust URLs, ports, or chunk sizes
```

### 4. Build and Run

Build the application:

```bash
make build
```

Or use Docker to run commands without building locally:

```bash
docker compose run --rm tichy db up
docker compose run --rm tichy ingest --source /mnt/cwd/examples/insurellm/knowledge-base/ --mode text
```

Initialize the database:

```bash
./tichy db up
```

Ingest documents:

```bash
./tichy ingest --source ./examples/insurellm/knowledge-base/ --mode text
```

### 5. Start Chatting

Start an interactive chat session:

```bash
./tichy chat
```

Or with markdown rendering:

```bash
./tichy chat --markdown
```

## Usage Examples

### Ingest Documents

```bash
./tichy ingest --source ./path/to/documents/ --mode text
```

### Interactive Chat

```bash
./tichy chat
> When was InsureLLM founded?
```

### Generate Tests

```bash
./tichy tests generate --num 20 --output tests.json
```

### Evaluate RAG Performance

```bash
./tichy tests evaluate --input tests.json
```

## Services

- **PostgreSQL + pgvector**: Vector database (port 5432)
- **LLM Server**: llama.cpp inference server (port 8080)
- **Embeddings Server**: llama.cpp embeddings server (port 8081)

A health-check sketch for the two llama.cpp servers is included in the appendix.

## Configuration

Key environment variables in `.env`:

- `DATABASE_URL`: PostgreSQL connection string
- `LLM_SERVER_URL`: LLM inference endpoint
- `EMBEDDING_SERVER_URL`: Embeddings endpoint
- `SYSTEM_PROMPT_TEMPLATE`: Path to the system prompt template
- `CHUNK_SIZE`: Document chunk size (default: 500)
- `CHUNK_OVERLAP`: Chunk overlap (default: 100)
- `TOP_K`: Number of results to retrieve (default: 10)

An illustrative sample `.env` is included in the appendix.

## Acknowledgments

The example insurance knowledge base in `examples/insurellm/` is derived from the dataset provided by the [LLM Engineering course](https://github.com/ed-donner/llm_engineering).

## License

BSD 3-Clause - see [LICENSE](LICENSE) for details.
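## Appendix

### Sample .env (illustrative)

A minimal `.env` sketch, referenced from the Configuration section above. The ports and defaults mirror the values documented in this README; the database user, password, database name, and template path are illustrative assumptions - copy `examples/insurellm/.env` for a working starting point.

```bash
# Illustrative values - adjust to your setup.
# The user, password, and database name below are assumptions.
DATABASE_URL=postgres://tichy:tichy@localhost:5432/tichy
# llama.cpp inference server (port 8080 per the Services section)
LLM_SERVER_URL=http://localhost:8080
# llama.cpp embeddings server (port 8081 per the Services section)
EMBEDDING_SERVER_URL=http://localhost:8081
# This path is an assumption - point it at your actual template file
SYSTEM_PROMPT_TEMPLATE=./examples/insurellm/system_prompt.tmpl
# Defaults per the Configuration section
CHUNK_SIZE=500
CHUNK_OVERLAP=100
TOP_K=10
```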
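### CPU-Only Service Sketch

A rough sketch of what the Requirements note on CPU-only inference amounts to in `docker-compose.yml`. The service name, volume path, and port are assumptions taken from the rest of this README - adapt your existing `llm` and `embeddings` entries rather than copying this verbatim.

```yaml
# Sketch of a CPU-only "llm" service; the "embeddings" service
# changes the same way (image swap, GPU settings removed).
services:
  llm:
    image: ghcr.io/ggerganov/llama.cpp:server
    # Note: no `runtime: nvidia` and no NVIDIA_* environment variables.
    volumes:
      - ~/models/llama:/models
    ports:
      - "8080:8080"
```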
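### Verifying the llama.cpp Servers

As noted under Services, both model servers are llama.cpp HTTP servers, which expose a `/health` endpoint. This is a quick way to confirm the models have finished loading before running `tichy` commands (the ports below are the defaults from the Services section):

```bash
# Each endpoint should return HTTP 200 once its model is loaded
# (a 503 typically means the model is still loading).
curl http://localhost:8080/health   # LLM server
curl http://localhost:8081/health   # embeddings server
```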