# Ollama JavaScript Library

The Ollama JavaScript library provides the easiest way to integrate your JavaScript project with [Ollama](https://github.com/jmorganca/ollama).

## Getting Started

```
npm i ollama
```

## Usage

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
```

### Browser Usage

To use the library without Node.js, import the browser module.

```javascript
import ollama from 'ollama/browser'
```

## Streaming responses

Response streaming can be enabled by setting `stream: true`. This modifies the function to return an `AsyncGenerator` where each part is an object in the stream.

```javascript
import ollama from 'ollama'

const message = { role: 'user', content: 'Why is the sky blue?' }
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [message],
  stream: true,
})
for await (const part of response) {
  process.stdout.write(part.message.content)
}
```

## Cloud Models

Run larger models by offloading to Ollama’s cloud while keeping your local workflow. [You can see models currently available on Ollama's cloud here.](https://ollama.com/search?c=cloud)

### Run via local Ollama

1) Sign in (one-time):

```
ollama signin
```

2) Pull a cloud model:

```
ollama pull gpt-oss:120b-cloud
```

3) Use as usual (offloads automatically):

```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama()

const response = await ollama.chat({
  model: 'gpt-oss:120b-cloud',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
})
for await (const part of response) {
  process.stdout.write(part.message.content)
}
```

### Cloud API (ollama.com)

Access cloud models directly by pointing the client at `https://ollama.com`.

1) Create an [API key](https://ollama.com/settings/keys), then set the `OLLAMA_API_KEY` environment variable:

```
export OLLAMA_API_KEY=your_api_key
```

2) Generate a response via the cloud API:

```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama({
  host: 'https://ollama.com',
  headers: { Authorization: 'Bearer ' + process.env.OLLAMA_API_KEY },
})

const response = await ollama.chat({
  model: 'gpt-oss:120b',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
})
for await (const part of response) {
  process.stdout.write(part.message.content)
}
```

## API

The Ollama JavaScript library's API is designed around the [Ollama REST API](https://github.com/jmorganca/ollama/blob/main/docs/api.md).

### chat

```javascript
ollama.chat(request)
```

- `request` `<Object>`: The request object containing chat parameters.
  - `model` `<string>`: The name of the model to use for the chat.
  - `messages` `<Message[]>`: Array of message objects representing the chat history.
    - `role` `<string>`: The role of the message sender ('user', 'system', or 'assistant').
    - `content` `<string>`: The content of the message.
    - `images` `<Uint8Array[] | string[]>`: (Optional) Images to be included in the message, either as Uint8Array or base64 encoded strings.
    - `tool_name` `<string>`: (Optional) The name of the tool that was executed, to inform the model of the result.
  - `format` `<string>`: (Optional) Set the expected format of the response (`json`).
  - `stream` `<boolean>`: (Optional) When true an `AsyncGenerator` is returned.
  - `think` `<boolean | string>`: (Optional) Enable model thinking. Use `true`/`false` or specify a level. Requires model support.
  - `logprobs` `<boolean>`: (Optional) Return log probabilities for tokens. Requires model support.
  - `top_logprobs` `<number>`: (Optional) Number of top log probabilities to return per token when `logprobs` is enabled.
  - `keep_alive` `<string | number>`: (Optional) How long to keep the model loaded. A number (seconds) or a string with a duration unit suffix ("300ms", "1.5h", "2h45m", etc.)
  - `tools` `<Tool[]>`: (Optional) A list of tools the model may call.
  - `options` `<Options>`: (Optional) Options to configure the runtime.
- Returns: `<ChatResponse>`
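A minimal sketch of tool calling with `chat`: the `get_weather` tool, its result, and the model name below are placeholders, and a tool-capable model is assumed to be available locally. The model's requested calls come back on `response.message.tool_calls`, and results are returned to it as `role: 'tool'` messages with `tool_name` set.

```javascript
import ollama from 'ollama'

// Hypothetical tool definition; `parameters` is a JSON Schema object.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'The city to look up' },
        },
        required: ['city'],
      },
    },
  },
]

const messages = [{ role: 'user', content: 'What is the weather in Toronto?' }]
const response = await ollama.chat({ model: 'llama3.1', messages, tools })

if (response.message.tool_calls?.length) {
  // Keep the assistant turn, then answer each call with a `role: 'tool'` message.
  messages.push(response.message)
  for (const call of response.message.tool_calls) {
    const result = 'sunny, 22 degrees' // placeholder for a real lookup using call.function.arguments
    messages.push({ role: 'tool', tool_name: call.function.name, content: result })
  }
  const followUp = await ollama.chat({ model: 'llama3.1', messages })
  console.log(followUp.message.content)
}
```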
### generate

```javascript
ollama.generate(request)
```

- `request` `<Object>`: The request object containing generate parameters.
  - `model` `<string>`: The name of the model to use.
  - `prompt` `<string>`: The prompt to send to the model.
  - `suffix` `<string>`: (Optional) The text that comes after the inserted text.
  - `system` `<string>`: (Optional) Override the model system prompt.
  - `template` `<string>`: (Optional) Override the model template.
  - `raw` `<boolean>`: (Optional) Bypass the prompt template and pass the prompt directly to the model.
  - `images` `<Uint8Array[] | string[]>`: (Optional) Images to be included, either as Uint8Array or base64 encoded strings.
  - `format` `<string>`: (Optional) Set the expected format of the response (`json`).
  - `stream` `<boolean>`: (Optional) When true an `AsyncGenerator` is returned.
  - `think` `<boolean | string>`: (Optional) Enable model thinking. Use `true`/`false` or specify a level. Requires model support.
  - `logprobs` `<boolean>`: (Optional) Return log probabilities for tokens. Requires model support.
  - `top_logprobs` `<number>`: (Optional) Number of top log probabilities to return per token when `logprobs` is enabled.
  - `keep_alive` `<string | number>`: (Optional) How long to keep the model loaded. A number (seconds) or a string with a duration unit suffix ("300ms", "1.5h", "2h45m", etc.)
  - `options` `<Options>`: (Optional) Options to configure the runtime.
- Returns: `<GenerateResponse>`
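`generate` is the completion-style counterpart to `chat`: it takes a single prompt rather than a message history, and the generated text comes back on `response.response`. A minimal sketch (the model name is only an example):

```javascript
import ollama from 'ollama'

const response = await ollama.generate({
  model: 'llama3.1',
  prompt: 'Why is the sky blue?',
})
console.log(response.response)
```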
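The `suffix` parameter enables fill-in-the-middle completion with code models trained for infill: the model generates the text that belongs between `prompt` and `suffix`. A sketch, assuming an infill-capable model such as `codellama:7b-code` has been pulled:

```javascript
import ollama from 'ollama'

// The model fills in the function body between the prompt and the suffix.
const response = await ollama.generate({
  model: 'codellama:7b-code',
  prompt: 'def compute_gcd(a, b):\n',
  suffix: '    return result\n',
  options: { temperature: 0 },
})
console.log(response.response)
```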
### pull

```javascript
ollama.pull(request)
```

- `request` `<Object>`: The request object containing pull parameters.
  - `model` `<string>`: The name of the model to pull.
  - `insecure` `<boolean>`: (Optional) Pull from servers whose identity cannot be verified.
  - `stream` `<boolean>`: (Optional) When true an `AsyncGenerator` is returned.
- Returns: `<ProgressResponse>`

### push

```javascript
ollama.push(request)
```

- `request` `<Object>`: The request object containing push parameters.
  - `model` `<string>`: The name of the model to push.
  - `insecure` `<boolean>`: (Optional) Push to servers whose identity cannot be verified.
  - `stream` `<boolean>`: (Optional) When true an `AsyncGenerator` is returned.
- Returns: `<ProgressResponse>`

### create

```javascript
ollama.create(request)
```

- `request` `<Object>`: The request object containing create parameters.
  - `model` `<string>`: The name of the model to create.
  - `from` `<string>`: The base model to derive from.
  - `stream` `<boolean>`: (Optional) When true an `AsyncGenerator` is returned.
  - `quantize` `<string>`: Quantization precision level (`q8_0`, `q4_K_M`, etc.).
  - `template` `<string>`: (Optional) The prompt template to use with the model.
  - `license` `<string | string[]>`: (Optional) The license(s) associated with the model.
  - `system` `<string>`: (Optional) The system prompt for the model.
  - `parameters` `<Record<string, unknown>>`: (Optional) Additional model parameters as key-value pairs.
  - `messages` `<Message[]>`: (Optional) Initial chat messages for the model.
  - `adapters` `<Record<string, string>>`: (Optional) A key-value map of LoRA adapter configurations.
- Returns: `<ProgressResponse>`

Note: The `files` parameter is not currently supported in `ollama-js`.

### delete

```javascript
ollama.delete(request)
```

- `request` `<Object>`: The request object containing delete parameters.
  - `model` `<string>`: The name of the model to delete.
- Returns: `<StatusResponse>`

### copy

```javascript
ollama.copy(request)
```

- `request` `<Object>`: The request object containing copy parameters.
  - `source` `<string>`: The name of the model to copy from.
  - `destination` `<string>`: The name of the model to copy to.
- Returns: `<StatusResponse>`

### list

```javascript
ollama.list()
```

- Returns: `<ListResponse>`

### show

```javascript
ollama.show(request)
```

- `request` `<Object>`: The request object containing show parameters.
  - `model` `<string>`: The name of the model to show.
  - `system` `<string>`: (Optional) Override the model system prompt returned.
  - `template` `<string>`: (Optional) Override the model template returned.
  - `options` `<Options>`: (Optional) Options to configure the runtime.
- Returns: `<ShowResponse>`

### embed

```javascript
ollama.embed(request)
```

- `request` `<Object>`: The request object containing embedding parameters.
  - `model` `<string>`: The name of the model used to generate the embeddings.
  - `input` `<string> | <string[]>`: The input used to generate the embeddings.
  - `truncate` `<boolean>`: (Optional) Truncate the input to fit the maximum context length supported by the model.
  - `keep_alive` `<string | number>`: (Optional) How long to keep the model loaded. A number (seconds) or a string with a duration unit suffix ("300ms", "1.5h", "2h45m", etc.)
  - `options` `<Options>`: (Optional) Options to configure the runtime.
- Returns: `<EmbedResponse>`

### web search

- Web search capability requires an Ollama account. [Sign up on ollama.com](https://ollama.com/signup)
- Create an API key by visiting https://ollama.com/settings/keys

```javascript
ollama.webSearch(request)
```

- `request` `<Object>`: The search request parameters.
  - `query` `<string>`: The search query string.
  - `max_results` `<number>`: (Optional) Maximum results to return (default 5, max 10).
- Returns: `<WebSearchResponse>`

### web fetch

```javascript
ollama.webFetch(request)
```

- `request` `<Object>`: The fetch request parameters.
  - `url` `<string>`: The URL to fetch.
- Returns: `<WebFetchResponse>`

### ps

```javascript
ollama.ps()
```

- Returns: `<ListResponse>`

### version

```javascript
ollama.version()
```

- Returns: `<string>`

### abort

```javascript
ollama.abort()
```

This method will abort **all** streamed generations currently running with the client instance. If there is a need to manage streams with timeouts, it is recommended to have one Ollama client per stream.

All asynchronous threads listening to streams (typically the `for await (const part of response)`) will throw an `AbortError` exception. See [examples/abort/abort-all-requests.ts](examples/abort/abort-all-requests.ts) for an example.

## Custom client

A custom client can be created with the following fields:

- `host` `<string>`: (Optional) The Ollama host address. Default: `"http://127.0.0.1:11434"`.
- `fetch` `<Object>`: (Optional) The fetch library used to make requests to the Ollama host.
- `headers` `<Object>`: (Optional) Custom headers to include with every request.

```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama({ host: 'http://127.0.0.1:11434' })
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
```

## Custom Headers

You can set custom headers that will be included with every request:

```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama({
  host: 'http://127.0.0.1:11434',
  headers: {
    Authorization: 'Bearer <api key>',
    'X-Custom-Header': 'custom-value',
    'User-Agent': 'MyApp/1.0',
  },
})
```

## Building

To build the project files run:

```sh
npm run build
```