# Hands-On AI: LLM Reference Guide

This file is a single-file reference you can paste into an AI assistant (Claude,
ChatGPT, and so on). Once it is in the context window, the assistant can answer
questions and write correct code for the `hands-on-ai` package.

## Overview

Hands-On AI is a provider-agnostic, educational AI toolkit that teaches how
modern AI systems work by building small, readable versions of them. It works
with any OpenAI-compatible LLM provider.

- Package name: `hands-on-ai`
- Version: 0.4.0
- Python: 3.10+
- Install: `pip install hands-on-ai` (one install includes everything; there are
  no optional extras to add)

It has five modules, meant to be learned in order:

1. `chat`: prompting, system prompts, personalities, and multi-turn conversation
2. `rag`: retrieval-augmented generation over your own documents
3. `agent`: tool use and step-by-step reasoning
4. `workflow`: orchestrating multi-step tasks as folders of stages
5. `eval`: judging output quality with an LLM (LLM-as-judge)

## The core idea

Every module is the same shape underneath: one call to the model
(`get_response`), wrapped in plain Python. Chat adds a system prompt. RAG adds
code to fetch context. An agent adds a loop that runs your functions. A workflow
adds folders. The model is the only "AI"; the rest is ordinary code. The package
is deliberately small and educational, not a production framework.

## Configuration

Configure with environment variables (highest priority), a config file
(`~/.hands-on-ai/config.json`), or built-in defaults.

```python
import os
os.environ["HANDS_ON_AI_SERVER"] = "http://localhost:11434"   # provider base URL (/v1 is auto-appended)
os.environ["HANDS_ON_AI_API_KEY"] = "your-api-key"            # omit for local Ollama
os.environ["HANDS_ON_AI_MODEL"] = "llama3"                    # chat model
os.environ["HANDS_ON_AI_EMBEDDING_MODEL"] = "nomic-embed-text" # used by RAG
os.environ["HANDS_ON_AI_LOG"] = "debug"                       # optional logging
```

Defaults: server `http://localhost:11434` (local Ollama), model `llama3`,
embedding model `nomic-embed-text`. In notebooks set these before the first call.

## Module 1: chat

### get_response

```
get_response(prompt, model=None, system="You are a helpful assistant.",
             personality="friendly", stream=False, retries=2, return_usage=False)
```

Stateless single-turn call. Returns the response string. With
`return_usage=True` it returns a `(response, usage)` tuple, where `usage` is a
dict of token counts (or None if the provider does not report them).

```python
from hands_on_ai.chat import get_response

print(get_response("What is machine learning?"))
print(get_response("Explain gravity.", system="You are a pirate. Answer in pirate slang."))

text, usage = get_response("Hello", return_usage=True)
```

### Personality bots

A "bot" is just `get_response` with a fixed system prompt. Available bots:

```python
from hands_on_ai.chat import (
    friendly_bot, sarcastic_bot, pirate_bot, shakespeare_bot, teacher_bot,
    coach_bot, caveman_bot, hacker_bot, therapist_bot, grumpy_professor_bot,
    alien_bot, emoji_bot, coder_bot,
)
print(pirate_bot("Tell me about the ocean."))
```

Create your own by writing a function that calls `get_response` with a system
prompt:

```python
def detective_bot(prompt):
    return get_response(prompt, system="You are a brilliant detective. Reason from clues and evidence.")
```

### Conversation (multi-turn memory)

An LLM is stateless: each `get_response` call is independent. To remember earlier
turns, resend the transcript. `Conversation` does that for you.

```python
from hands_on_ai.chat import Conversation

chat = Conversation(system="You are a helpful tutor.")
chat.ask("My name is Sam.")
print(chat.ask("What is my name?"))   # remembers "Sam"
print(chat.total_tokens)              # tokens used across the conversation
chat.reset()                          # clear history, keep the system prompt
```

Save and resume a conversation:

```python
chat.save("chat.json")
later = Conversation.load("chat.json")
```

### Streaming, token usage, and caching

Stream a response as it is generated:

```python
from hands_on_ai.chat import stream_response
for chunk in stream_response("Tell me a short story"):
    print(chunk, end="", flush=True)
```

See token usage: `get_response(prompt, return_usage=True)` returns
`(text, usage)`, and `get_last_usage()` returns usage from the most recent call.
The CLI flag `chat ask "..." --usage` prints it. The interactive REPL
(`chat interactive`) streams token-by-token automatically.

Opt-in response caching: set `HANDS_ON_AI_CACHE` to a directory and identical
`(model, system, prompt)` calls return a saved answer instead of calling the
model again (reproducible, free reruns, works offline once warmed).

```bash
export HANDS_ON_AI_CACHE=~/.hands-on-ai/cache
```

### Chat CLI

```bash
chat ask "What is AI?"                # one question
chat ask "Explain loops" --personality coder
chat interactive                     # REPL
chat bots                            # list personalities
chat doctor                          # connection test
```

## Module 2: rag

RAG indexes your documents, retrieves the most relevant chunks for a question,
and puts them in the prompt so the model answers from your sources. It needs an
embedding model in addition to the chat model.

### Python API

The helpers live in `hands_on_ai.rag.utils`:

```python
from hands_on_ai.rag.utils import (
    load_text_file, chunk_text, get_embeddings,
    save_index_with_sources, get_top_k,
)

# Build an index
chunks, sources = [], []
for path in ["notes.txt"]:
    text = load_text_file(path)            # supports .txt, .md, .pdf, .docx
    for chunk in chunk_text(text):
        chunks.append(chunk)
        sources.append(path)
vectors = get_embeddings(chunks)
save_index_with_sources(vectors, chunks, sources, "index.npz")

# Retrieve and answer
results, scores = get_top_k("What is the topic?", "index.npz", k=3, return_scores=True)
# results is a list of (chunk, source) tuples
context = "\n\n".join(chunk for chunk, _ in results)
from hands_on_ai.chat import get_response
print(get_response(f"Answer from this context only:\n{context}\n\nQuestion: What is the topic?"))
```

Key signature: `get_top_k(query, index_path, k=3, return_scores=False)`. It
embeds the query, loads the index, runs the similarity search, and returns
`(chunk, source)` tuples (plus a list of scores when `return_scores=True`).

Note: there are no `index_documents` or `query_documents` functions. Use the
`hands_on_ai.rag.utils` helpers above, or the CLI.

### RAG CLI

```bash
rag index path/to/docs/              # writes ~/.hands-on-ai/index.npz by default
rag ask "What does TCP do?" --context   # -c shows retrieved context
rag ask "Compare TCP and UDP" --scores  # -s shows similarity scores
rag ask "..." --k 5                   # number of chunks to retrieve
rag interactive
```

## Module 3: agent

An agent is a loop: the model decides which tool to call, your Python runs it,
and the result goes back to the model. Tools are just functions.

### The tool contract

A tool is a function that takes a single string and returns a string. Register it
with `register_tool(name, description, function)`. The model reads the
description to decide when and how to use the tool.

```python
from hands_on_ai.agent import run_agent, register_tool

def shout(text):
    """Return the text in uppercase."""
    return text.upper()

register_tool("shout", "Convert text to uppercase. Input: the text.", shout)
print(run_agent("Use the shout tool on 'hello world'"))
```

`run_agent(prompt, model=None, format="auto", max_iterations=5, verbose=False)`.
With `format="auto"` the library routes small local models to a JSON tool-calling
format (more reliable for them) and capable models to ReAct.

### Built-in tools

```python
from hands_on_ai.agent import run_agent, register_calculator_tool
register_calculator_tool()
print(run_agent("What is 23 * 19? Use the calculator tool."))
```

Safety note: do NOT build a calculator with `eval`. Emptying `__builtins__` is
not a sandbox and is escapable to remote code execution. The built-in calculator
parses expressions with `ast` and walks only the allowed nodes. Write tools that
parse input safely and return a string; handle errors by returning an error
string rather than raising.

### Agent CLI

```bash
agent ask "Calculate the area of a circle with radius 5"
agent interactive
```

## Module 4: workflow

Orchestrate multi-step tasks as a folder of numbered stages ("folders over
frameworks"). One model walks the stages in order, writing a readable file at
each step. It is sequential and human-in-the-loop by design.

```
workspace/
├── CONTEXT.md            # optional shared system prompt
├── references/           # stable rules (the "factory")
└── stages/
    ├── 01_research/
    │   ├── CONTEXT.md    # instructions for this stage
    │   └── output/       # output.md is written here
    └── 02_draft/
        ├── CONTEXT.md
        └── output/
```

```python
from hands_on_ai.workflow import Pipeline, init_workspace

init_workspace("essay", ["research", "draft"], system="You are a writing assistant.")
# edit stages/01_research/CONTEXT.md and stages/02_draft/CONTEXT.md

pipe = Pipeline("essay")
pipe.status()        # [ ] 01_research   [ ] 02_draft
pipe.run_next()      # runs stage 01, writes output.md, then stops for review
pipe.run_next()      # runs stage 02 using stage 01's (reviewed) output
```

`run_next()` runs one stage and stops so you can read or edit the output file
before continuing. `run_all()` runs the rest without stopping. Each stage's
prompt is built from its `CONTEXT.md`, any `references/` files, and the previous
stage's output.

## Module 5: eval

Use an LLM to score an output against criteria you define (the LLM-as-judge
pattern). Useful for "is this answer good?" without hand-writing graders.

```python
from hands_on_ai.eval import judge

verdict = judge(
    output="Paris is the capital of France.",
    criteria="accurate and concise",
    question="What is the capital of France?",
)
print(verdict["score"])      # an int 1..scale (default scale 5), or None
print(verdict["reasoning"])  # one short sentence
```

`judge(output, criteria, question=None, model=None, scale=5)` returns a dict with
`score`, `reasoning`, and `raw`. Treat the score as a signal, not a verdict:
judges can be inconsistent, so use clear criteria and average several runs.

## Providers

Any OpenAI-compatible provider works by setting `HANDS_ON_AI_SERVER`,
`HANDS_ON_AI_API_KEY`, and `HANDS_ON_AI_MODEL`.

- Local: Ollama (`http://localhost:11434`, no key), LocalAI, vLLM
- Cloud: OpenAI (`https://api.openai.com`), OpenRouter (`https://openrouter.ai/api`),
  Together AI (`https://api.together.xyz`), Groq (`https://api.groq.com/openai`),
  Google Gemini (`https://generativelanguage.googleapis.com/v1beta/openai`),
  Hugging Face
- For an authenticated Ollama server, set both the server URL and the API key.

## Educational philosophy

Hands-On AI is small on purpose. It captures the essence of each idea in code you
can read in an afternoon, and leaves out production concerns (retries, dashboards,
streaming at scale, vector databases, evaluation harnesses). It is an on-ramp:
once the ideas click, move to production tools (LlamaIndex and a vector database
for RAG; Instructor or Pydantic AI for structured output; LangChain, LangGraph,
or a provider's agent SDK for orchestration). The point is understanding, not
shipping.

## Troubleshooting

```bash
handsonai doctor      # checks connection, shows resolved config, lists models
```

- Connection errors: check the server URL and (for cloud) the API key.
- Model not found: for Ollama, run `ollama pull <model>` first.
- RAG needs an embedding model: `ollama pull nomic-embed-text` for local Ollama.

## Learn more

- Documentation: https://michael-borck.github.io/hands-on-ai/
- Source: https://github.com/michael-borck/hands-on-ai
- DeepWiki (AI chat interface to the repo): https://deepwiki.com/michael-borck/hands-on-ai