# Hands-On AI: LLM Reference Guide This file is a single-file reference you can paste into an AI assistant (Claude, ChatGPT, and so on). Once it is in the context window, the assistant can answer questions and write correct code for the `hands-on-ai` package. ## Overview Hands-On AI is a provider-agnostic, educational AI toolkit that teaches how modern AI systems work by building small, readable versions of them. It works with any OpenAI-compatible LLM provider. - Package name: `hands-on-ai` - Version: 0.4.0 - Python: 3.10+ - Install: `pip install hands-on-ai` (one install includes everything; there are no optional extras to add) It has five modules, meant to be learned in order: 1. `chat`: prompting, system prompts, personalities, and multi-turn conversation 2. `rag`: retrieval-augmented generation over your own documents 3. `agent`: tool use and step-by-step reasoning 4. `workflow`: orchestrating multi-step tasks as folders of stages 5. `eval`: judging output quality with an LLM (LLM-as-judge) ## The core idea Every module is the same shape underneath: one call to the model (`get_response`), wrapped in plain Python. Chat adds a system prompt. RAG adds code to fetch context. An agent adds a loop that runs your functions. A workflow adds folders. The model is the only "AI"; the rest is ordinary code. The package is deliberately small and educational, not a production framework. ## Configuration Configure with environment variables (highest priority), a config file (`~/.hands-on-ai/config.json`), or built-in defaults. ```python import os os.environ["HANDS_ON_AI_SERVER"] = "http://localhost:11434" # provider base URL (/v1 is auto-appended) os.environ["HANDS_ON_AI_API_KEY"] = "your-api-key" # omit for local Ollama os.environ["HANDS_ON_AI_MODEL"] = "llama3" # chat model os.environ["HANDS_ON_AI_EMBEDDING_MODEL"] = "nomic-embed-text" # used by RAG os.environ["HANDS_ON_AI_LOG"] = "debug" # optional logging ``` Defaults: server `http://localhost:11434` (local Ollama), model `llama3`, embedding model `nomic-embed-text`. In notebooks set these before the first call. ## Module 1: chat ### get_response ``` get_response(prompt, model=None, system="You are a helpful assistant.", personality="friendly", stream=False, retries=2, return_usage=False) ``` Stateless single-turn call. Returns the response string. With `return_usage=True` it returns a `(response, usage)` tuple, where `usage` is a dict of token counts (or None if the provider does not report them). ```python from hands_on_ai.chat import get_response print(get_response("What is machine learning?")) print(get_response("Explain gravity.", system="You are a pirate. Answer in pirate slang.")) text, usage = get_response("Hello", return_usage=True) ``` ### Personality bots A "bot" is just `get_response` with a fixed system prompt. Available bots: ```python from hands_on_ai.chat import ( friendly_bot, sarcastic_bot, pirate_bot, shakespeare_bot, teacher_bot, coach_bot, caveman_bot, hacker_bot, therapist_bot, grumpy_professor_bot, alien_bot, emoji_bot, coder_bot, ) print(pirate_bot("Tell me about the ocean.")) ``` Create your own by writing a function that calls `get_response` with a system prompt: ```python def detective_bot(prompt): return get_response(prompt, system="You are a brilliant detective. Reason from clues and evidence.") ``` ### Conversation (multi-turn memory) An LLM is stateless: each `get_response` call is independent. To remember earlier turns, resend the transcript. `Conversation` does that for you. ```python from hands_on_ai.chat import Conversation chat = Conversation(system="You are a helpful tutor.") chat.ask("My name is Sam.") print(chat.ask("What is my name?")) # remembers "Sam" print(chat.total_tokens) # tokens used across the conversation chat.reset() # clear history, keep the system prompt ``` Save and resume a conversation: ```python chat.save("chat.json") later = Conversation.load("chat.json") ``` ### Streaming, token usage, and caching Stream a response as it is generated: ```python from hands_on_ai.chat import stream_response for chunk in stream_response("Tell me a short story"): print(chunk, end="", flush=True) ``` See token usage: `get_response(prompt, return_usage=True)` returns `(text, usage)`, and `get_last_usage()` returns usage from the most recent call. The CLI flag `chat ask "..." --usage` prints it. The interactive REPL (`chat interactive`) streams token-by-token automatically. Opt-in response caching: set `HANDS_ON_AI_CACHE` to a directory and identical `(model, system, prompt)` calls return a saved answer instead of calling the model again (reproducible, free reruns, works offline once warmed). ```bash export HANDS_ON_AI_CACHE=~/.hands-on-ai/cache ``` ### Chat CLI ```bash chat ask "What is AI?" # one question chat ask "Explain loops" --personality coder chat interactive # REPL chat bots # list personalities chat doctor # connection test ``` ## Module 2: rag RAG indexes your documents, retrieves the most relevant chunks for a question, and puts them in the prompt so the model answers from your sources. It needs an embedding model in addition to the chat model. ### Python API The helpers live in `hands_on_ai.rag.utils`: ```python from hands_on_ai.rag.utils import ( load_text_file, chunk_text, get_embeddings, save_index_with_sources, get_top_k, ) # Build an index chunks, sources = [], [] for path in ["notes.txt"]: text = load_text_file(path) # supports .txt, .md, .pdf, .docx for chunk in chunk_text(text): chunks.append(chunk) sources.append(path) vectors = get_embeddings(chunks) save_index_with_sources(vectors, chunks, sources, "index.npz") # Retrieve and answer results, scores = get_top_k("What is the topic?", "index.npz", k=3, return_scores=True) # results is a list of (chunk, source) tuples context = "\n\n".join(chunk for chunk, _ in results) from hands_on_ai.chat import get_response print(get_response(f"Answer from this context only:\n{context}\n\nQuestion: What is the topic?")) ``` Key signature: `get_top_k(query, index_path, k=3, return_scores=False)`. It embeds the query, loads the index, runs the similarity search, and returns `(chunk, source)` tuples (plus a list of scores when `return_scores=True`). Note: there are no `index_documents` or `query_documents` functions. Use the `hands_on_ai.rag.utils` helpers above, or the CLI. ### RAG CLI ```bash rag index path/to/docs/ # writes ~/.hands-on-ai/index.npz by default rag ask "What does TCP do?" --context # -c shows retrieved context rag ask "Compare TCP and UDP" --scores # -s shows similarity scores rag ask "..." --k 5 # number of chunks to retrieve rag interactive ``` ## Module 3: agent An agent is a loop: the model decides which tool to call, your Python runs it, and the result goes back to the model. Tools are just functions. ### The tool contract A tool is a function that takes a single string and returns a string. Register it with `register_tool(name, description, function)`. The model reads the description to decide when and how to use the tool. ```python from hands_on_ai.agent import run_agent, register_tool def shout(text): """Return the text in uppercase.""" return text.upper() register_tool("shout", "Convert text to uppercase. Input: the text.", shout) print(run_agent("Use the shout tool on 'hello world'")) ``` `run_agent(prompt, model=None, format="auto", max_iterations=5, verbose=False)`. With `format="auto"` the library routes small local models to a JSON tool-calling format (more reliable for them) and capable models to ReAct. ### Built-in tools ```python from hands_on_ai.agent import run_agent, register_calculator_tool register_calculator_tool() print(run_agent("What is 23 * 19? Use the calculator tool.")) ``` Safety note: do NOT build a calculator with `eval`. Emptying `__builtins__` is not a sandbox and is escapable to remote code execution. The built-in calculator parses expressions with `ast` and walks only the allowed nodes. Write tools that parse input safely and return a string; handle errors by returning an error string rather than raising. ### Agent CLI ```bash agent ask "Calculate the area of a circle with radius 5" agent interactive ``` ## Module 4: workflow Orchestrate multi-step tasks as a folder of numbered stages ("folders over frameworks"). One model walks the stages in order, writing a readable file at each step. It is sequential and human-in-the-loop by design. ``` workspace/ ├── CONTEXT.md # optional shared system prompt ├── references/ # stable rules (the "factory") └── stages/ ├── 01_research/ │ ├── CONTEXT.md # instructions for this stage │ └── output/ # output.md is written here └── 02_draft/ ├── CONTEXT.md └── output/ ``` ```python from hands_on_ai.workflow import Pipeline, init_workspace init_workspace("essay", ["research", "draft"], system="You are a writing assistant.") # edit stages/01_research/CONTEXT.md and stages/02_draft/CONTEXT.md pipe = Pipeline("essay") pipe.status() # [ ] 01_research [ ] 02_draft pipe.run_next() # runs stage 01, writes output.md, then stops for review pipe.run_next() # runs stage 02 using stage 01's (reviewed) output ``` `run_next()` runs one stage and stops so you can read or edit the output file before continuing. `run_all()` runs the rest without stopping. Each stage's prompt is built from its `CONTEXT.md`, any `references/` files, and the previous stage's output. ## Module 5: eval Use an LLM to score an output against criteria you define (the LLM-as-judge pattern). Useful for "is this answer good?" without hand-writing graders. ```python from hands_on_ai.eval import judge verdict = judge( output="Paris is the capital of France.", criteria="accurate and concise", question="What is the capital of France?", ) print(verdict["score"]) # an int 1..scale (default scale 5), or None print(verdict["reasoning"]) # one short sentence ``` `judge(output, criteria, question=None, model=None, scale=5)` returns a dict with `score`, `reasoning`, and `raw`. Treat the score as a signal, not a verdict: judges can be inconsistent, so use clear criteria and average several runs. ## Providers Any OpenAI-compatible provider works by setting `HANDS_ON_AI_SERVER`, `HANDS_ON_AI_API_KEY`, and `HANDS_ON_AI_MODEL`. - Local: Ollama (`http://localhost:11434`, no key), LocalAI, vLLM - Cloud: OpenAI (`https://api.openai.com`), OpenRouter (`https://openrouter.ai/api`), Together AI (`https://api.together.xyz`), Groq (`https://api.groq.com/openai`), Google Gemini (`https://generativelanguage.googleapis.com/v1beta/openai`), Hugging Face - For an authenticated Ollama server, set both the server URL and the API key. ## Educational philosophy Hands-On AI is small on purpose. It captures the essence of each idea in code you can read in an afternoon, and leaves out production concerns (retries, dashboards, streaming at scale, vector databases, evaluation harnesses). It is an on-ramp: once the ideas click, move to production tools (LlamaIndex and a vector database for RAG; Instructor or Pydantic AI for structured output; LangChain, LangGraph, or a provider's agent SDK for orchestration). The point is understanding, not shipping. ## Troubleshooting ```bash handsonai doctor # checks connection, shows resolved config, lists models ``` - Connection errors: check the server URL and (for cloud) the API key. - Model not found: for Ollama, run `ollama pull ` first. - RAG needs an embedding model: `ollama pull nomic-embed-text` for local Ollama. ## Learn more - Documentation: https://michael-borck.github.io/hands-on-ai/ - Source: https://github.com/michael-borck/hands-on-ai - DeepWiki (AI chat interface to the repo): https://deepwiki.com/michael-borck/hands-on-ai