# RAG Coding Harness Guide

This repo runs as a terminal UI coding harness. The default interface is the fullscreen TUI; use `--classic` if your terminal does not support alternate-screen rendering.

## 0. From a fresh GitHub clone

If you just downloaded or pulled the repo, you do **not** need to publish/install anything first. Run it from the checkout:

```bash
cd smallcode
npm install
node bin/smallcode.js --help
```

Then start a local model server and create a `.env` file in the directory where you will run the harness. For testing against LM Studio's default local server this usually looks like:

```bash
cat > .env <<'ENV'
SMALLCODE_MODEL=your-loaded-model-name
SMALLCODE_BASE_URL=http://localhost:1234/v1
ENV
```

Start the UI from the checkout with:

```bash
node bin/smallcode.js
```

If you want the normal `smallcode` command instead of typing `node bin/smallcode.js`, link the checkout once:

```bash
npm link
smallcode
```

Use `node bin/smallcode.js --classic` or `smallcode --classic` if your terminal has trouble with the fullscreen interface.

## 1. Start a local model server

SmallCode talks to any OpenAI-compatible local endpoint.

### LM Studio

1. Open LM Studio.
2. Download/load a coder model such as Qwen Coder or another 8B-35B local coding model.
3. Start the **Local Server**.
4. Note the model name shown by LM Studio and the base URL, usually `http://localhost:1234/v1`.

Create `.env` in the project you want to edit:

```bash
SMALLCODE_MODEL=your-loaded-model-name
SMALLCODE_BASE_URL=http://localhost:1234/v1
```

### llama.cpp server

Start llama.cpp with its OpenAI-compatible server, then point SmallCode at it:

```bash
SMALLCODE_MODEL=local-model
SMALLCODE_BASE_URL=http://localhost:8080/v1
```

## 2. Run the UI harness

From the project directory you want the agent to edit:

```bash
smallcode
```

If you are developing from this repository checkout instead of a global install:

```bash
node bin/smallcode.js
```

Useful launch modes:

```bash
smallcode --classic                    # readline UI instead of fullscreen UI
smallcode -P "fix the parser bug"      # one-shot prompt
smallcode --non-interactive "refactor" # stdin/script friendly mode
smallcode --resume                     # continue previous session
```

Inside the UI:

- Type your task and press Enter.
- Use `/help` for commands.
- Use `/plan` to inspect the active plan.
- Use `/undo` to revert the last edit.
- Use `/quit` to exit.

## 3. Create the local GitHub RAG database

The scraper is a Python pipeline at `scripts/rag_scraper.py`. It shallow-clones or updates repositories, walks source files, and emits **snippet-sized chunks** around functions/classes/types plus sliding-window chunks for files without clear symbols. It does not index whole files as a single blob.

### Fast starter corpus

If you do not create a config file, the indexer uses the built-in `starter` preset: a curated multi-language set of popular frameworks/libraries across Python, JavaScript/TypeScript, Go, Rust, Java, C#, Ruby, PHP, and C/C++.

```bash
npm run rag:index
```

### Larger broad corpus

For a bigger language-modeling corpus, use the `broad` preset. This scrapes more well-known, high-signal codebases, but takes longer and uses more disk space.

```bash
npm run rag:index -- --preset broad
```

The curated presets live in `src/rag/curated_repos.json`, so you can review or change the selected repositories.

### Custom corpus

Create `.smallcode/rag/repos.json` in the workspace where you run SmallCode:

```json
{
  "preset": "starter",
  "repos": [
    "https://github.com/owner/framework-example.git",
    "https://github.com/owner/language-examples.git",
    { "url": "/absolute/path/to/local/repo", "tags": ["local", "examples"] }
  ],
  "maxFilesPerRepo": 1000,
  "maxSnippetsPerRepo": 4000,
  "chunkLines": 80,
  "overlap": 20
}
```

Set `"preset": "none"` if you only want your own repos.

Optional fields:

```json
{
  "cacheDir": ".smallcode/rag/repos",
  "indexPath": ".smallcode/rag/index.json",
  "languages": ["python", "typescript", "go"],
  "maxFileBytes": 250000,
  "minChars": 120,
  "repos": ["https://github.com/owner/repo.git"]
}
```

After package installation, the same command is also available as:

```bash
smallcode-rag-index --preset broad
```

The indexer saves `.smallcode/rag/index.json` by default.

## 4. How code search works

SmallCode searches **code snippets**, not full files. The pipeline stores each snippet with repo, language, path, symbol name, start/end lines, tags, term frequencies, and a sparse local hashed vector.

At query time the retriever runs a hybrid search:

1. **BM25 lexical search** over identifiers, paths, symbols, tags, and snippet text. This is strong for exact APIs, framework names, error names, and language constructs.
2. **Local hashed-vector similarity** over the same snippet text. This is dependency-free and helps related naming patterns match even when exact words differ.
3. The final rank combines BM25 and vector scores, then injects only the top bounded snippets into model context.

This approach is intentionally fast enough for local models and large local corpora without requiring a separate vector database or cloud embeddings.

## 5. Use RAG in the harness

Once `.smallcode/rag/index.json` exists, start the UI normally:

```bash
smallcode
```

For each user turn, SmallCode now:

1. plans/classifies the request,
2. retrieves similar snippets from the local RAG index,
3. injects the best snippets into the model context,
4. asks the local model to do one step at a time through the normal tool loop.

No cloud embedding service is required. The default embedding path is dependency-free and optimized for fast local startup.

## 6. Optional web fallback when RAG is weak

Web search is disabled by default. Enable it only when you want the model to search externally after local RAG confidence is low:

```bash
SMALLCODE_WEB_BROWSE=true smallcode
```

When enabled, low-confidence RAG context tells the model to use `web_search` with a GitHub/code-example query if it gets blocked.

## 7. Speed tips for local models

- Prefer the fullscreen UI (`smallcode`) for normal work; use `--classic` only for terminal compatibility issues.
- Keep `SMALLCODE_CACHE_SPLIT` at its default (`true`) so llama.cpp-style KV cache reuse is not invalidated by dynamic context.
- Keep RAG repos focused when you need fast indexing; use `--preset broad` when you intentionally want a large reference corpus.
- Use 8B-35B coder models; very small models often fail multi-step tool use.
- If the model struggles, ask for a smaller concrete task first, then continue with follow-up prompts.