# Lemonade Runtime - Quick Start

Get up and running with Lemonade in under 5 minutes.

## Prerequisites

1. **Install Lemonade SDK**:

   ```bash
   pip install lemonade-sdk
   ```

   Or using uv (recommended):

   ```bash
   uv pip install lemonade-sdk
   ```

## Step 1: Download a Model

Before starting the server, download a model using the `lemonade-server-dev pull` command:

```bash
# For a small, fast model (0.6B parameters)
uv run lemonade-server-dev pull user.Qwen3-0.6B \
  --checkpoint unsloth/Qwen3-0.6B-GGUF \
  --recipe llamacpp

# For a balanced model (4B parameters, recommended)
uv run lemonade-server-dev pull user.Qwen3-4B \
  --checkpoint unsloth/Qwen3-4B-GGUF:Q4_K_M \
  --recipe llamacpp

# For a powerful model (8B parameters)
uv run lemonade-server-dev pull user.Qwen3-8B \
  --checkpoint unsloth/Qwen3-8B-GGUF:Q4_K_M \
  --recipe llamacpp
```

### Recommended Models to Try

**Small & Fast (< 1GB):**

```bash
# Qwen3-0.6B - Great for quick responses
uv run lemonade-server-dev pull user.Qwen3-0.6B \
  --checkpoint unsloth/Qwen3-0.6B-GGUF \
  --recipe llamacpp

# Qwen3-1.7B - Small reasoning model
uv run lemonade-server-dev pull user.Qwen3-1.7B \
  --checkpoint unsloth/Qwen3-1.7B-GGUF:Q4_K_M \
  --recipe llamacpp

# Llama-3.2-1B - Meta's small model
uv run lemonade-server-dev pull user.Llama-3.2-1B-Instruct \
  --checkpoint unsloth/Llama-3.2-1B-Instruct-GGUF:Q4_K_M \
  --recipe llamacpp
```

**Balanced (2-5GB):**

```bash
# Qwen3-4B - Best balance of speed and quality
uv run lemonade-server-dev pull user.Qwen3-4B \
  --checkpoint unsloth/Qwen3-4B-GGUF:Q4_K_M \
  --recipe llamacpp

# Gemma-3-4b-it - Google's instruction-tuned model with vision
uv run lemonade-server-dev pull user.Gemma-3-4b-it \
  --checkpoint bartowski/Gemma-3-4b-it-GGUF:Q4_K_M \
  --recipe llamacpp
```

**Powerful (5GB+):**

```bash
# Qwen3-8B - High-quality reasoning
uv run lemonade-server-dev pull user.Qwen3-8B \
  --checkpoint unsloth/Qwen3-8B-GGUF:Q4_K_M \
  --recipe llamacpp

# DeepSeek-Qwen3-8B - Advanced reasoning capabilities
uv run lemonade-server-dev pull user.DeepSeek-Qwen3-8B \
  --checkpoint unsloth/DeepSeek-Qwen3-8B-GGUF:Q4_K_M \
  --recipe llamacpp

# Qwen2.5-Coder-32B - Large coding model
uv run lemonade-server-dev pull user.Qwen2.5-Coder-32B-Instruct \
  --checkpoint unsloth/Qwen2.5-Coder-32B-Instruct-GGUF:Q4_K_M \
  --recipe llamacpp
```

**Specialized:**

```bash
# Devstral-Small-2507 - Coding and tool-calling
uv run lemonade-server-dev pull user.Devstral-Small-2507 \
  --checkpoint unsloth/Devstral-Small-2507-GGUF:Q4_K_M \
  --recipe llamacpp

# Qwen2.5-VL-7B - Vision model
uv run lemonade-server-dev pull user.Qwen2.5-VL-7B-Instruct \
  --checkpoint unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M \
  --recipe llamacpp
```

**Check Downloaded Models:**

```bash
uv run lemonade-server-dev list
```

## Step 2: Start Lemonade Server

```bash
# From the llamafarm project root
LEMONADE_MODEL=user.Qwen3-4B nx start lemonade
```

This starts Lemonade on port 11534 with the llama.cpp backend (recommended for GGUF models).

> **Note:** The `nx start lemonade` command automatically picks up configuration from your `llamafarm.yaml`. Currently, Lemonade must be started manually from the project root. In the future, Lemonade will run as a container and be auto-started by the LlamaFarm server.
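Once the server is up, you can sanity-check it directly over HTTP. The snippet below is a minimal smoke test that assumes Lemonade exposes standard OpenAI-compatible routes (`/models`, `/chat/completions`) under the base URL used in this guide (`http://127.0.0.1:11534/v1`); if your Lemonade version serves the API under a different prefix, adjust the paths accordingly.

```bash
# Smoke test (assumes OpenAI-compatible routes under /v1 on port 11534)

# List the models the running server knows about
curl http://127.0.0.1:11534/v1/models

# Send one chat completion request to the model pulled in Step 1
curl http://127.0.0.1:11534/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "user.Qwen3-4B",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```

If both commands return JSON rather than a connection error, the server is ready for the configuration step below.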
## Step 3: Configure Your Project

Create or update `llamafarm.yaml`:

```yaml
version: v1
name: my-lemonade-project
namespace: default

runtime:
  models:
    - name: lemon
      description: "Lemonade local model"
      provider: lemonade
      model: user.Qwen3-4B # Use the model you downloaded
      base_url: "http://127.0.0.1:11534/v1"
      default: true
      lemonade:
        backend: llamacpp
        port: 11534
        context_size: 32768

prompts:
  - role: system
    content: "You are a helpful assistant."
```

## Step 4: Chat!

```bash
lf chat "What is the capital of France?"
```

## Multi-Model Setup

You can configure multiple Lemonade models by running separate instances on different ports:

```yaml
runtime:
  models:
    - name: fast
      description: "Fast Lemonade model"
      provider: lemonade
      model: user.Qwen3-0.6B
      base_url: "http://127.0.0.1:11534/v1"
      default: true
      lemonade:
        backend: llamacpp
        port: 11534

    - name: powerful
      description: "Powerful Lemonade model"
      provider: lemonade
      model: user.Qwen3-8B
      base_url: "http://127.0.0.1:11535/v1"
      lemonade:
        backend: llamacpp
        port: 11535
```

Start each instance (from the llamafarm project root):

```bash
# Terminal 1
LEMONADE_MODEL=user.Qwen3-0.6B LEMONADE_PORT=11534 nx start lemonade

# Terminal 2
LEMONADE_MODEL=user.Qwen3-8B LEMONADE_PORT=11535 nx start lemonade
```

> **Note:** In the future, multiple Lemonade instances will run as containers and be auto-started by the LlamaFarm server.

## Custom Backends

### Using ONNX (cross-platform)

```bash
LEMONADE_BACKEND=onnx nx start lemonade
```

Or in `llamafarm.yaml`:

```yaml
runtime:
  models:
    - name: onnx-model
      description: "ONNX model"
      provider: lemonade
      model: Phi-3-mini-4k-instruct-onnx
      base_url: "http://127.0.0.1:11534/v1"
      default: true
      lemonade:
        backend: onnx
        port: 11534
```

### Using Transformers (PyTorch)

```bash
LEMONADE_BACKEND=transformers nx start lemonade
```

## Troubleshooting

### Port Already in Use

```bash
LEMONADE_PORT=11535 nx start lemonade
```

### Lemonade Not Installed

```bash
pip install lemonade-sdk
# or
uv pip install lemonade-sdk
```

### Model Not Found

Make sure you've downloaded the model first:

```bash
uv run lemonade-server-dev list   # Check what's downloaded
uv run lemonade-server-dev pull user.ModelName --checkpoint ... --recipe llamacpp
```

### Check Health

Visit: http://localhost:8000/health

Look for the "lemonade" component status.

## Next Steps

- Read the full [README.md](./README.md) for advanced configuration
- Check the [example config](./example.llamafarm.yaml)
- See available models: `uv run lemonade-server-dev list`
- Learn about model recipes: https://lemonade-server.ai/docs/server/server_models/
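## Verifying Multiple Instances

If you use the multi-model setup above, a quick way to confirm each instance is serving the model you expect is to query each port directly. This is a sketch that assumes the same OpenAI-compatible `/v1/models` route as the `base_url` values in the config; adjust the ports to match your `llamafarm.yaml`.

```bash
# Ports are illustrative; they match the multi-model example above.
# Assumes each instance exposes an OpenAI-compatible /v1/models endpoint.
for port in 11534 11535; do
  echo "Instance on port ${port}:"
  curl -s "http://127.0.0.1:${port}/v1/models"
  echo
done
```

If a port does not respond, re-check the `LEMONADE_PORT` value used when starting that instance and whether something else is already bound to it (see Troubleshooting above).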