# How to use the Apple Foundation Model from Python

Call Apple's on-device Foundation Model from Python using the official `openai` SDK, pointed at a local `apfel --serve`. 100% on-device, zero API cost, no network required for inference.

This guide shows the canonical patterns: one-shot completion, streaming, JSON mode, error handling, tool calling, and a real text-summarization example. Every code block was run against a live apfel server; the output below each snippet is the real unedited stdout.

Runnable scripts + tests: [Arthur-Ficial/apfel-guides-lab/scripts/python](https://github.com/Arthur-Ficial/apfel-guides-lab/tree/main/scripts/python).

## Prerequisites

- macOS 26+ Tahoe, Apple Silicon, Apple Intelligence enabled
- `brew install apfel`
- `apfel --serve` running (default port `11434`)
- Python 3.11+
- `pip install openai` (or `uv add openai`)

## 1. One-shot chat completion

Point the `openai` SDK at your local apfel server and call `chat.completions.create`:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="apple-foundationmodel",
    messages=[
        {"role": "user", "content": "In one sentence, what is the Swift programming language?"},
    ],
    max_tokens=80,
)

print((response.choices[0].message.content or "").strip())
```

Real output:

```text
Swift is a modern, high-performance, and safe programming language developed by Apple for developing iOS, macOS, watchOS, and tvOS applications.
```

Lab script: [`01_oneshot.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/01_oneshot.py).

## 2. Streaming

Pass `stream=True` and iterate. Guard against empty `choices` on the final usage chunk:

```python
import sys
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="apple-foundationmodel",
    messages=[{"role": "user", "content": "List three Apple silicon chips, one per line."}],
    max_tokens=80,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    sys.stdout.write(delta)
    sys.stdout.flush()
print()
```

Real output:

```text
Apple M1  
Apple M2  
Apple M3
```

Lab script: [`02_stream.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/02_stream.py).

## 3. JSON mode / structured output

Request `response_format: {"type": "json_object"}` and parse. apfel may wrap output in markdown fences - the fence-strip regex below handles both cases cleanly:

```python
import json, re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="apple-foundationmodel",
    messages=[{
        "role": "user",
        "content": "Return JSON with fields 'chip', 'year', 'cores'. Describe the Apple M1 chip. Return ONLY JSON.",
    }],
    response_format={"type": "json_object"},
    max_tokens=120,
)

raw = (response.choices[0].message.content or "").strip()
raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw, flags=re.MULTILINE).strip()
data = json.loads(raw)
print(json.dumps(data, indent=2, sort_keys=True))
```

Real output:

```json
{
  "chip": "Apple M1",
  "cores": 8,
  "year": 2020
}
```

Lab script: [`03_json.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/03_json.py).

## 4. Error handling

apfel returns honest HTTP errors for unsupported features. Embeddings return `501`:

```python
from openai import APIStatusError, OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

try:
    client.embeddings.create(
        model="apple-foundationmodel",
        input="apfel runs 100% on-device.",
    )
except APIStatusError as e:
    print(f"Got expected error: HTTP {e.status_code} - {e.message}")
```

Real output:

```text
Got expected error: HTTP 501 - Error code: 501 - {'error': {'message': "Embeddings not supported by Apple's on-device model.", 'type': 'invalid_request_error'}}
```

Lab script: [`04_errors.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/04_errors.py).

## 5. Tool calling

Define a tool schema, send a prompt, handle the tool call, post the result, get the final answer:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature in Celsius for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str, **_: object) -> str:
    fake = {"Vienna": 14, "Cupertino": 19, "Tokyo": 11}
    return json.dumps({"city": city, "temp_c": fake.get(city, 15)})


messages = [{"role": "user", "content": "What is the temperature in Vienna right now?"}]

first = client.chat.completions.create(
    model="apple-foundationmodel", messages=messages, tools=TOOLS, max_tokens=256,
)

msg = first.choices[0].message
messages.append(msg.model_dump(exclude_none=True))

if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    final = client.chat.completions.create(
        model="apple-foundationmodel", messages=messages, max_tokens=120,
    )
    print((final.choices[0].message.content or "").strip())
```

Real output:

```text
The current temperature in Vienna is 14°C.
```

Lab script: [`05_tools.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/05_tools.py).

## 6. Real example - summarize a file from stdin

```python
import sys
from openai import OpenAI

text = sys.stdin.read().strip()
if not text:
    sys.exit("usage: cat file.txt | python 06_example.py")

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="apple-foundationmodel",
    messages=[
        {"role": "system", "content": "You are a concise summarizer. Reply with one short paragraph."},
        {"role": "user", "content": f"Summarize:\n\n{text}"},
    ],
    max_tokens=150,
)
print((response.choices[0].message.content or "").strip())
```

```bash
cat README.md | python 06_example.py
```

Real output (summarizing a paragraph about the M1 chip):

```text
The Apple M1 chip, released in November 2020, was Apple's first ARM-based system-on-a-chip for Mac computers. It features an 8-core CPU with four performance and four efficiency cores, plus an integrated GPU with up to 8 cores. The chip combines CPU, GPU, memory, and neural engine on a single die, delivering significant performance-per-watt improvements over the Intel chips it replaced.
```

Lab script: [`06_example.py`](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/scripts/python/06_example.py).

## Troubleshooting

- **`Connection refused` on port 11434** - run `apfel --serve` first.
- **`Embeddings not supported`** - apfel is text-only; use sentence-transformers or another embedder for vectors.
- **`JSONDecodeError` in JSON mode** - keep the fence-strip regex; apfel sometimes wraps JSON in `` ```json ... ``` ``.
- **Empty streaming output** - make sure your client handles the final `usage` chunk with empty `choices`. The `if not chunk.choices: continue` above covers it.
- **Model refuses a tool call** - small on-device models occasionally decline. Retry the whole call.

## Tested with

- apfel v1.0.3
- macOS 26.3.1, Apple Silicon
- Python 3.11 / openai 2.31.0
- Date: 2026-04-16

Full runnable test suite + captured outputs: [apfel-guides-lab/tests/test_python.py](https://github.com/Arthur-Ficial/apfel-guides-lab/blob/main/tests/test_python.py).

## See also

- [nodejs.md](nodejs.md) - same thing from Node.js
- [ruby.md](ruby.md) / [php.md](php.md) - same thing from Ruby / PHP
- [bash-curl.md](bash-curl.md) - raw HTTP, no SDK
- [Arthur-Ficial/apfel-guides-lab](https://github.com/Arthur-Ficial/apfel-guides-lab) - runnable proof for all ten languages