# Humble AI CLI

Lightweight terminal client for conversational LLM sessions with OpenAI or Ollama backends. The app preserves chat context, streams responses, and keeps a local history of every conversation.

## Features

- Interactive REPL with streaming responses and “Thinking…” indicators.
- Remembers conversation context per session and persists transcripts to `~/.humble-ai-cli/sessions/`.
- Optionally chunks context messages when `ollamaContextChunkSize` is set for Ollama models, splitting at the configured BPE token limit to avoid oversized prompts while keeping OpenAI runs unchunked.
- Limits how many prior turns are sent back to the model via the `contextRetentionTurns` setting (defaults to the last 3 turns, supports disabling or sending the full history).
- Works with either OpenAI or Ollama providers as defined in `~/.humble-ai-cli/config.json`.
- Ships with a built-in system prompt and appends optional user rules from `~/.humble-ai-cli/user-rules.md`.
- Runs scripted conversations via `humble-ai-cli exec <workflow-file>`, executing multiple prompts from a workflow file in a single session.
- Built-in slash commands:
  - `/help` – show available commands.
  - `/new` – start a fresh session (clears in-memory history).
  - `/set-model` – select the active model from configured entries.
  - `/set-tool-mode` – switch MCP tool calls between manual confirmation and auto execution.
  - `/mcp` – display enabled MCP servers and the tools they expose.
  - `/toggle-mcp` – enable or disable MCP servers defined in `mcp-servers.json`.
  - `/exit` – quit the program; pressing `Ctrl+D` on an empty prompt exits as well, while `Ctrl+C` only cancels an in-progress response.

## Quick Start

1. Configure at least one model in `~/.humble-ai-cli/config.json` (the CLI stays idle until a model is defined). Set a single entry to `"active": true` and include the required provider fields such as `apiKey` (OpenAI) or `baseUrl` (Ollama). Enable automatic MCP tool execution up front with `"toolCallMode": "auto"` if you do not want to confirm each call.
2. (Optional) Register MCP servers in `~/.humble-ai-cli/mcp-servers.json` under a top-level `mcpServers` object. Launch the CLI and use `/toggle-mcp` to enable or disable entries without editing the file, and `/mcp` to list the active servers and their functions.
3. Run the CLI (`go run ./...` from the repo or the `humble-ai-cli` binary) and start chatting. Use `/set-model` to switch between configured models at runtime.
4. To script a conversation, copy `WORKFLOW.sample.md`, update only the model in the `# CONFIGS` section (and MCP servers if needed), then execute `humble-ai-cli exec <workflow-file>`. The workflow file bundles config, MCP definitions, and prompts so you can run a repeatable session in one command.

## Prerequisites

- Go 1.25.2 (or a Go toolchain that supports a compatible `go` version). Verify with: `go version`
- Network access to your chosen provider (OpenAI API or local/remote Ollama).
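A quick sanity check before configuring anything. The second command is only relevant if you run a local Ollama instance; `/api/tags` is simply one convenient endpoint to confirm the server is listening:

```bash
go version                                # toolchain check; expect go1.25.x or newer
curl -s http://localhost:11434/api/tags   # optional: confirm a local Ollama server responds
```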
## Configuration

Create the configuration directory if it does not exist:

```bash
mkdir -p ~/.humble-ai-cli/sessions
```

Add provider and model details to `~/.humble-ai-cli/config.json`, for example:

```json
{
  "models": [
    {
      "name": "gpt-5",
      "provider": "openai",
      "apiKey": "sk-...",
      "active": true
    },
    {
      "name": "qwen3:30b-a3b-instruct-2507-q4_K_M",
      "provider": "ollama",
      "baseUrl": "http://localhost:11434"
    }
  ],
  "ollamaContextChunkSize": 0,
  "contextRetentionTurns": 5,
  "ollamaNumCtx": 50000,
  "logLevel": "info",
  "toolCallMode": "auto"
}
```

`ollamaContextChunkSize` controls how many BPE tokens can be included in a single context message before it is split into multiple chunks when using Ollama. Omit the field or set it to 0 (the default) to keep chunking disabled; provide a positive value to enable chunking at that token limit. OpenAI models always send context as-is regardless of the value.

`contextRetentionTurns` decides how many user-assistant turn pairs are sent alongside each new user prompt. Omit the field to keep the default of the last 3 turns, set it to 0 to send only the latest user prompt without prior context, or use a negative value to always send the full history (no trimming). When a dangling user-only entry exists in history it is treated as a full turn for retention purposes.

`ollamaNumCtx` sets the `num_ctx` option sent to the Ollama chat API. Provide a positive number to cap the model context size, or omit/set it to 0 to fall back to the CLI default of 30000 tokens (the value is always sent to Ollama as `num_ctx`).

Optional: add additional guardrails in `~/.humble-ai-cli/user-rules.md`. The CLI creates the file if missing and appends its contents to the built-in system prompt for every request.

Set `active` to `true` for the model you want the CLI to use by default. Only one model should be active at a time.

Set `toolCallMode` to `auto` to automatically run approved MCP tool calls without the confirmation prompt (the default `manual` mode keeps the confirmation step). You can also adjust this within the CLI via `/set-tool-mode auto` or `/set-tool-mode manual`.

## Workflow Execution

- Run `humble-ai-cli exec path/to/workflow.md` to execute a predefined conversation and exit after completion.
- Format (see `WORKFLOW.sample.md` and the sketch after this list): the `# CONFIGS` section may include `## Basic Config` (JSON matching `config.json`) and `## MCP Servers` (JSON with the `mcpServers` object from `mcp-servers.json`). If either subsection is missing, the CLI falls back to `$HOME/.humble-ai-cli/config.json` or `$HOME/.humble-ai-cli/mcp-servers.json` respectively.
- When a Basic Config block is present, only that JSON is used—missing fields take program defaults rather than values from local config files. It must declare an active model; OpenAI entries require `apiKey` and Ollama entries require `baseUrl`. Missing required fields or malformed workflow files produce an error and abort execution.
- The optional `# USER RULES` section replaces `$HOME/.humble-ai-cli/user-rules.md` during workflow execution. If the section exists with content, it is appended to the system prompt; if the section is present but empty, no user rules are added. If the section is absent, the CLI falls back to the local `user-rules.md` file (creating it if missing).
- Under `# WORKFLOWS`, each `##` heading marks a step; only the body text becomes the user prompt. Steps run in order within a single session so later prompts reuse earlier context.
- Workflow output shows only the final answer for each prompt. Startup guides, waiting messages, and MCP call summaries are suppressed unless `toolCallMode` is `manual`, in which case MCP call details and the Y/N confirmation prompt still appear before execution.
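An illustrative workflow file following the structure described above. The model entry, user rule, and step prompts are placeholders, not the contents of `WORKFLOW.sample.md`:

````markdown
# CONFIGS

## Basic Config
```json
{
  "models": [
    {
      "name": "qwen3:30b-a3b-instruct-2507-q4_K_M",
      "provider": "ollama",
      "baseUrl": "http://localhost:11434",
      "active": true
    }
  ],
  "toolCallMode": "auto"
}
```

## MCP Servers
```json
{
  "mcpServers": {}
}
```

# USER RULES
Answer concisely and mention when an MCP tool result was used.

# WORKFLOWS

## Summarize the project
List the three most important features of this repository.

## Expand on the second feature
Explain the second feature from the previous answer in one short paragraph.
````

Because both prompts run in the same session, the second step can refer back to the first step's answer.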
### Logging

- Logs are written to `~/.humble-ai-cli/logs/application-hac-YYYY-MM-DD.log`.
- Set `logLevel` (debug, info, warn, error) in `config.json` to control verbosity. Debug level includes detailed LLM and MCP traces.

## MCP Server Configuration

- Ensure the config directory exists: `mkdir -p ~/.humble-ai-cli`.
- Define every MCP server inside a single file `~/.humble-ai-cli/mcp-servers.json`. The root object must contain `mcpServers`, whose properties are server names. Example:

```json
{
  "mcpServers": {
    "calculator": {
      "description": "Adds or subtracts numbers for quick estimates.",
      "enabled": true,
      "command": "/usr/local/bin/mcp-calculator",
      "args": ["--port=0"],
      "env": { "API_TOKEN": "secret" }
    },
    "remote-sse": {
      "description": "Hosted SSE tool endpoint.",
      "url": "https://your-server.example.com/mcp/sse",
      "type": "sse",
      "headers": { "Authorization": "Bearer token" }
    },
    "remote-http": {
      "description": "Streamable HTTP endpoint.",
      "url": "https://your-server.example.com/mcp",
      "type": "streamable-http"
    }
  }
}
```

- `command` servers spawn a local process (passing `args` and `env`).
- `type` controls how the CLI connects:
  - Allowed values: `stdio`, `sse`, `streamable-http`.
  - If only `command` is provided you can omit `type` (defaults to `stdio`), but if you set it explicitly it must be `stdio`.
  - If a `url` is provided you must set `type` to `sse` or `streamable-http`. When both `command` and `url` exist, `type` decides which transport is used (`stdio` → command, `sse`/`streamable-http` → URL).
- For remote servers, set `headers` to forward HTTP headers (if omitted, existing `env` entries are still forwarded for compatibility). SSE endpoints that never emit the required `endpoint` event are automatically retried using the streamable HTTP protocol, so existing configs remain resilient.
- When the LLM requests a tool call, the CLI prints the server name and description. In `manual` mode it then asks `Call now? (Y/N)`; in `auto` mode it executes immediately after printing the summary. Toggle the behaviour with `/set-tool-mode`.
- On first launch the CLI auto-creates an empty `~/.humble-ai-cli/user-rules.md` (if missing) and lists all enabled MCP servers so the LLM understands which tools are available.
- Use `/toggle-mcp` inside the CLI to quickly enable or disable specific MCP servers without manually editing the JSON file.

### Prompting Example

```
Please double-check the shipping fee by calling the MCP `shipping-calculator` tool with
{ "weightKg": 1.8, "distanceKm": 120 } and summarize the total cost.
```

The assistant will pause at the confirmation step, run the MCP tool after approval, and then incorporate the tool result into its answer.

## Running the CLI

From the project root:

```bash
go run ./...
```

Follow the on-screen prompt to enter questions or slash commands. If no active model is set, the app guides you through `/set-model`.

## Testing

Execute all tests (requires Go toolchain):

```bash
go test ./...
```
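For tighter iteration you can narrow the run with standard `go test` flags; the `TestConfig` pattern below is only an example, so substitute the names used in this repository:

```bash
go test -v ./...               # verbose output for the whole module
go test -run TestConfig ./...  # run only tests whose names match the pattern
```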
## Building

Use the provided build script to generate platform-specific binaries with embedded version metadata:

```bash
./build.sh
```

The script derives the version from the latest git tag (fallback `dev`), injects both the version and the build timestamp via `-ldflags`, and outputs the following artifacts under `dist/`:

- `humble-ai-cli` (linux/amd64)
- `humble-ai-cli_amd64.exe` (windows/amd64)
- `humble-ai-cli_arm64.exe` (windows/arm64)

If you only need a quick local build for your current platform, you can still run `go build -o humble-ai-cli ./...`, but version/date values will remain at their defaults (`dev`, `unknown`).
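If you want version metadata without the script, a manual build along these lines should work. The `-X` targets (`main.version`, `main.buildDate`) are assumptions; check `build.sh` for the variables it actually sets:

```bash
# Derive a version string from the latest git tag, falling back to "dev"
VERSION=$(git describe --tags --abbrev=0 2>/dev/null || echo dev)
BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Local build with metadata injected via -ldflags (variable names are assumptions)
go build -ldflags "-X main.version=${VERSION} -X main.buildDate=${BUILD_DATE}" -o humble-ai-cli ./...

# Cross-compile example for one of the script's targets (windows/amd64)
GOOS=windows GOARCH=amd64 go build -ldflags "-X main.version=${VERSION}" -o dist/humble-ai-cli_amd64.exe ./...
```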