---
name: run-llm-inference
title: "Run LLM Inference"
description: "Run large language model inference through the Telnyx Inference API using an OpenAI-compatible chat completions interface from Node.js. Works as both an HTTP server and a CLI tool."
language: nodejs
framework: express
telnyx_products: [Inference]
channel: [ai]
---

# Run LLM Inference

Run large language model inference through the Telnyx Inference API using an OpenAI-compatible chat completions interface from Node.js. Works as both an HTTP server and a CLI tool.

## Why Telnyx

Telnyx is an **AI Communications Infrastructure** platform: voice, messaging, SIP, AI, and IoT on one private, global network. Inference runs on Telnyx-owned hardware co-located with the telephony network, which keeps response latency low. It is exposed through an OpenAI-compatible API, and the same API key also reaches voice, SMS, and SIP.

## Telnyx API Endpoints Used

- **Chat Completions**: `POST /v2/ai/chat/completions` -- [API reference](https://developers.telnyx.com/api/inference/inference-embedding/post-chat-completions)
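
As a rough illustration of what a call to this endpoint could look like from Node.js, the sketch below builds the request and sends it with the global `fetch` available in Node.js 18+. The base URL `https://api.telnyx.com` and the helper names `buildChatRequest` and `chatCompletion` are assumptions made for this example, not code taken from `server.js`.

```javascript
// Assumed base URL for the Telnyx v2 API; confirm it against the API reference.
const TELNYX_BASE_URL = "https://api.telnyx.com";
const DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct";

// Hypothetical helper: build the URL and fetch() options for a chat completion.
function buildChatRequest(apiKey, messages, options = {}) {
  const { model = DEFAULT_MODEL, ...params } = options;
  return {
    url: `${TELNYX_BASE_URL}/v2/ai/chat/completions`,
    init: {
      method: "POST",
      headers: {
        // The API key is sent as a Bearer token.
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Optional generation parameters (max_tokens, temperature) pass through.
      body: JSON.stringify({ model, messages, ...params }),
    },
  };
}

// Hypothetical helper: send the request and return the parsed JSON response.
async function chatCompletion(apiKey, messages, options) {
  const { url, init } = buildChatRequest(apiKey, messages, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Inference API error: ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without contacting the API.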

## Architecture

```
  HTTP request / CLI arg
        │
        ▼
  ┌────────────────────┐
  │  Express app        │
  │  /inference/chat    │
  │  /inference/ask     │
  └─────────┬──────────┘
            │  POST /v2/ai/chat/completions
            ▼
  ┌────────────────────┐
  │  Telnyx Inference   │
  └─────────┬──────────┘
            │
            └──► completion text
```

## Environment Variables

Copy `.env.example` to `.env` and fill in:

| Variable | Type | Example | Required | Description | Where to get it |
|----------|------|---------|----------|-------------|-----------------|
| `TELNYX_API_KEY` | `string` | `KEY0123456789ABCDEF` | **yes** | Telnyx API v2 key (used as the Bearer token) | [Portal](https://portal.telnyx.com/api-keys) |
| `AI_MODEL` | `string` | `meta-llama/Llama-3.3-70B-Instruct` | no | Default model slug. Falls back to `meta-llama/Llama-3.3-70B-Instruct` | [Inference docs](https://developers.telnyx.com/docs/inference) |
| `PORT` | `number` | `5000` | no | HTTP port for server mode. Defaults to `5000` | — |
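
The defaults in the table could be applied with a small loader like the one below. This is a minimal sketch, assuming the variables are already present in `process.env` (for example via `node --env-file=.env` on recent Node.js versions, or a dotenv-style loader). The function name `loadConfig` and its error messages are hypothetical.

```javascript
// Hypothetical helper: read settings from an environment object and apply
// the defaults documented in the table above.
function loadConfig(env = process.env) {
  const apiKey = env.TELNYX_API_KEY;
  if (!apiKey) {
    // The only required variable; fail fast with a readable message.
    throw new Error("TELNYX_API_KEY is required");
  }

  // Treat an unset or empty PORT as the documented default of 5000.
  const port = Number.parseInt(env.PORT || "5000", 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`);
  }

  return {
    apiKey,
    model: env.AI_MODEL || "meta-llama/Llama-3.3-70B-Instruct",
    port,
  };
}
```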

## Setup

```bash
git clone https://github.com/team-telnyx/telnyx-code-examples.git
cd telnyx-code-examples/run-llm-inference-nodejs
cp .env.example .env    # ← fill in your credentials
npm install
node server.js          # starts on http://localhost:5000
```

### CLI mode

Pass a question as a command-line argument to print a single answer without starting the server:

```bash
node server.js "What is the capital of France?"
```

To force server mode explicitly, pass `--serve`:

```bash
node server.js --serve
```
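
One way the choice between the two modes could be made is sketched below. It assumes that no arguments or `--serve` means server mode, and that any other arguments are joined with spaces into a single question. `parseMode` is a hypothetical name, and the actual logic in `server.js` may differ.

```javascript
// Hypothetical helper: decide between CLI and server mode from process.argv.
function parseMode(argv) {
  // argv[0] is the node binary and argv[1] is the script path.
  const args = argv.slice(2);

  // No arguments, or an explicit --serve flag, starts the HTTP server.
  if (args.length === 0 || args.includes("--serve")) {
    return { mode: "server" };
  }

  // Otherwise treat all arguments as one question (assumed behavior).
  return { mode: "cli", question: args.join(" ") };
}
```

Calling `parseMode(process.argv)` at startup would then decide whether to listen on `PORT` or to send the question and print the answer.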

## API Reference

### `GET /health`

Liveness check. Returns the configured default model.

```bash
curl http://localhost:5000/health
```

**Response:**

```json
{
  "status": "ok",
  "model": "meta-llama/Llama-3.3-70B-Instruct"
}
```

### `POST /inference/chat`

Run a full chat completion. Pass a `messages` array (OpenAI-compatible) and optional generation parameters. Returns the raw Telnyx Inference response.

```bash
curl -X POST http://localhost:5000/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about the ocean."}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
```

**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Vast and endless blue\nWaves whisper to the shoreline\nMoonlight on the deep"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 18,
    "total_tokens": 41
  }
}
```
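
The request validation behind this route could look roughly like the following. It is a sketch only: the helper name `validateChatBody` and the `400` status are assumptions, while the error message matches the one listed under Troubleshooting.

```javascript
// Hypothetical helper: validate a /inference/chat body and build the payload
// to forward to the Telnyx chat completions endpoint.
function validateChatBody(body, defaultModel) {
  if (!body || !Array.isArray(body.messages) || body.messages.length === 0) {
    return {
      ok: false,
      status: 400, // assumed status code for a malformed request
      error: "Request body must include 'messages' array",
    };
  }

  // Use the caller's model if given; otherwise fall back to the default.
  // Remaining fields (max_tokens, temperature, ...) are forwarded unchanged.
  const { messages, model = defaultModel, ...params } = body;
  return { ok: true, payload: { model, messages, ...params } };
}
```

In an Express handler, a failed check would translate to `res.status(result.status).json({ error: result.error })`, and a successful one would send `result.payload` upstream.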

### `POST /inference/ask`

Ask a single question and get just the answer text back. An optional `system_prompt` steers the model.

```bash
curl -X POST http://localhost:5000/inference/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the capital of France?",
    "system_prompt": "Answer in one word."
  }'
```

**Response:**

```json
{
  "answer": "Paris."
}
```
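
Conceptually, this route wraps the question in a `messages` array and then pulls the text out of the first choice in the completion. The sketch below shows that mapping. The helper names `buildAskMessages` and `extractAnswer` are hypothetical, and returning an empty string for a missing choice is an assumption.

```javascript
// Hypothetical helper: turn a question and optional system prompt into
// an OpenAI-compatible messages array.
function buildAskMessages(question, systemPrompt) {
  const messages = [];
  if (systemPrompt) {
    messages.push({ role: "system", content: systemPrompt });
  }
  messages.push({ role: "user", content: question });
  return messages;
}

// Hypothetical helper: read the answer text from a chat completion response.
// Falls back to an empty string if the expected fields are absent.
function extractAnswer(completion) {
  return completion?.choices?.[0]?.message?.content ?? "";
}
```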

## Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| Connection refused on port 5000 | App isn't running, or `PORT` differs | Run `node server.js` and confirm the port from the startup log. Set `PORT` in `.env` if 5000 is taken. |
| `401 Unauthorized` / `Inference API error: 401` | `TELNYX_API_KEY` is missing or invalid | Generate a new key at [portal.telnyx.com/api-keys](https://portal.telnyx.com/api-keys). Telnyx keys start with `KEY`. |
| `Inference API error: 404` | The requested `model` slug is not available | Use `meta-llama/Llama-3.3-70B-Instruct` or check the model list in the [Inference docs](https://developers.telnyx.com/docs/inference). |
| `Request body must include 'messages' array` | `POST /inference/chat` called without `messages` | Include a non-empty `messages` array in the JSON body. |
| `Request body must include 'question'` | `POST /inference/ask` called without `question` | Include a `question` string in the JSON body. |
| Slow or timed-out responses | Large `max_tokens` or a large model | Lower `max_tokens` or pick a smaller model. |
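
For the slow-response case, a client-side timeout can stop a request from hanging indefinitely. The sketch below uses the standard `AbortSignal.timeout()` with the global `fetch` in Node.js 18+. The helper names and the 30-second default are illustrative assumptions and are not part of the example app.

```javascript
// Hypothetical helper: return a copy of fetch() options with a timeout signal
// attached, leaving the original options object untouched.
function withTimeout(init, ms) {
  return { ...init, signal: AbortSignal.timeout(ms) };
}

// Hypothetical helper: fetch that rejects if no response arrives within `ms`.
// The 30000 ms default is an arbitrary illustrative value.
async function fetchWithTimeout(url, init = {}, ms = 30000) {
  return fetch(url, withTimeout(init, ms));
}
```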

## Related Examples

- [build-voice-ai-agent-nodejs](https://raw.githubusercontent.com/team-telnyx/telnyx-code-examples/main/build-voice-ai-agent-nodejs/README.md) - Voice AI agent in Node.js
- [create-ai-assistant-nodejs](https://raw.githubusercontent.com/team-telnyx/telnyx-code-examples/main/create-ai-assistant-nodejs/README.md) - Create a managed AI Assistant
- [run-llm-inference-python](https://raw.githubusercontent.com/team-telnyx/telnyx-code-examples/main/run-llm-inference-python/README.md) - The same example in Python

## Resources

- [Inference Guide](https://developers.telnyx.com/docs/inference)
- [Chat Completions API reference](https://developers.telnyx.com/api/inference/inference-embedding/post-chat-completions)
- [Node.js SDK](https://developers.telnyx.com/development/sdk/node)
- [Telnyx Inference product page](https://telnyx.com/products/inference)
- [Inference pricing](https://telnyx.com/pricing/inference)