---
name: huggingface
description: |
  Import GGUF models from HuggingFace into Ollama. Pull models directly
  using the hf.co/ prefix, track download progress, and use imported
  models for inference.
---

# HuggingFace Model Import

## Overview

Ollama can directly pull GGUF models from HuggingFace using the `hf.co/` prefix. This enables access to thousands of quantized models beyond the official Ollama library.

## Quick Reference

| Action | Syntax |
|--------|--------|
| Pull model | `hf.co/{org}/{repo}[:{quantization}]` (tag optional) |
| List models | `ollama.list()` |
| Use model | Same as any Ollama model |
| Delete model | `ollama.delete("hf.co/...")` |

## Model Naming Format

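```
hf.co/{organization}/{repository}:{quantization}
```

The `:{quantization}` tag is optional. Without it, Ollama uses `Q4_K_M` if the repository contains that file and otherwise picks another quantization available in the repository. GGUF repositories are conventionally, but not necessarily, named with a `-GGUF` suffix.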

**Examples:**

```
hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
hf.co/TheBloke/Llama-2-7B-Chat-GGUF:Q4_K_M
hf.co/microsoft/Phi-3-mini-4k-instruct-gguf:Q4_K_M
```
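
When scripting imports it can help to build and parse these references in one place. The helpers below (`hf_model_name`, `split_hf_model_name`) are hypothetical, not part of the `ollama` library; they only manipulate strings in the format shown above, and omitting the tag lets Ollama choose a default quantization.

```python
from typing import Optional, Tuple

PREFIX = "hf.co/"


def hf_model_name(org: str, repo: str, quant: Optional[str] = None) -> str:
    """Build an Ollama model reference for a GGUF repository on HuggingFace."""
    name = f"{PREFIX}{org}/{repo}"
    # Without a tag, Ollama chooses a default quantization itself
    return f"{name}:{quant}" if quant else name


def split_hf_model_name(name: str) -> Tuple[str, str, Optional[str]]:
    """Split an hf.co/ reference into (org, repo, quantization or None)."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not an hf.co/ model reference: {name!r}")
    path, _, quant = name[len(PREFIX):].partition(":")
    org, _, repo = path.partition("/")
    if not org or not repo:
        raise ValueError(f"expected hf.co/<org>/<repo>[:<quant>], got {name!r}")
    return org, repo, quant or None
```

For example, `hf_model_name("TheBloke", "Llama-2-7B-Chat-GGUF", "Q4_K_M")` produces the second example above.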

## Common Quantizations

| Quantization | Size | Quality | Use Case |
|--------------|------|---------|----------|
| Q2_K | Smallest | Lowest | Testing only |
| Q4_K_M | Medium | Good | Recommended default |
| Q5_K_M | Larger | Better | Quality-focused |
| Q6_K | Large | High | Near-original quality |
| Q8_0 | Largest | Highest | Maximum quality |
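
A minimal sketch for choosing among these, assuming you have read each file's size off the repository's "Files" tab: take the highest-quality quantization (ordered as in the table) whose file fits a size budget. File size understates real memory use, since the context/KV cache needs extra room, so leave headroom in the budget. `best_quant_that_fits` and `QUALITY_ORDER` are hypothetical names.

```python
from typing import Dict, Optional

# Quantizations from the table above, ordered from lowest to highest quality
QUALITY_ORDER = ["Q2_K", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def best_quant_that_fits(file_sizes_gb: Dict[str, float], budget_gb: float) -> Optional[str]:
    """Return the highest-quality quantization whose file fits in budget_gb.

    file_sizes_gb maps a tag such as "Q4_K_M" to its .gguf file size in GB,
    e.g. as listed in the repository's Files tab. Tags missing from
    QUALITY_ORDER are ignored; returns None when nothing fits.
    """
    for tag in reversed(QUALITY_ORDER):
        size = file_sizes_gb.get(tag)
        if size is not None and size <= budget_gb:
            return tag
    return None
```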

## Pull Model from HuggingFace

### With Progress Tracking

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

print(f"Pulling {HF_MODEL}...")

last_status = ""
for progress in ollama.pull(HF_MODEL, stream=True):
    status = progress.get("status", "")
    digest = progress.get("digest", "")
    total = progress.get("total")

    # Each layer has its own "pulling <digest>" status, so printing only on
    # status changes yields one line per phase instead of one per update
    if status != last_status:
        if status == "pulling manifest":
            print(f"  {status}")
        elif status.startswith("pulling") and digest:
            short_digest = digest.split(":")[-1][:12]  # drop the "sha256:" prefix
            size_mb = (total / 1024 / 1024) if total else 0
            # Skip small metadata layers (e.g. template, params)
            if size_mb > 100:
                print(f"  pulling {short_digest}... ({size_mb:.0f} MB)")
        elif status in ["verifying sha256 digest", "writing manifest", "success"]:
            print(f"  {status}")

        last_status = status

print("Model pulled successfully!")
```
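
If you prefer a live percentage over one line per layer, the streamed updates also carry `completed` and `total` byte counts while a layer downloads. This sketch redraws a single console line; `format_progress` and `pull_with_percent` are hypothetical helpers, and `client` is assumed to be either the `ollama` module itself or an `ollama.Client` instance (both expose `pull`).

```python
from typing import Optional


def format_progress(status: str, completed: Optional[int], total: Optional[int]) -> str:
    """Render one pull update; byte counts are shown only while a layer downloads."""
    if total:
        done = completed or 0
        pct = 100 * done / total
        return f"{status}: {done / 1024**2:.0f}/{total / 1024**2:.0f} MB ({pct:.1f}%)"
    return status


def pull_with_percent(client, model: str) -> None:
    """Stream a pull, redrawing a single console line for each update.

    client is the ollama module or an ollama.Client instance.
    """
    for progress in client.pull(model, stream=True):
        line = format_progress(
            progress.get("status") or "",
            progress.get("completed"),
            progress.get("total"),
        )
        # "\r" returns to the line start; padding to 80 columns overwrites
        # leftovers from a longer previous line
        print("\r" + line.ljust(80), end="", flush=True)
    print()
```

Usage: `pull_with_percent(ollama, HF_MODEL)` after `import ollama`.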

### Simple Pull

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# Non-streaming (blocks until complete)
ollama.pull(HF_MODEL)
print("Model pulled!")
```

## Verify Installation

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

models = ollama.list()
model_names = [m.get("model", "") for m in models.get("models", [])]

# HF imports are listed under their full hf.co/... name; compare
# case-insensitively as a precaution
hf_model_installed = any(name.lower() == HF_MODEL.lower() for name in model_names)

if hf_model_installed:
    print("Model is installed!")
else:
    print("Model not found")

# Show every model pulled from HuggingFace
for name in model_names:
    if name.startswith("hf.co/"):
        print(f"  Name: {name}")
```
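
The same `list()` check can guard a pull so that re-running a notebook only downloads the model the first time. A minimal sketch with hypothetical helpers `installed_models` and `ensure_model`; `client` is the `ollama` module or an `ollama.Client`.

```python
from typing import Set


def installed_models(client) -> Set[str]:
    """Names of all local models; client is the ollama module or an ollama.Client."""
    return {m.get("model", "") for m in client.list().get("models", [])}


def ensure_model(client, model: str) -> bool:
    """Pull model only if it is not installed yet; return True if a pull ran."""
    # Case-insensitive comparison is a precaution, not a documented requirement
    if model.lower() in {name.lower() for name in installed_models(client)}:
        return False
    client.pull(model)  # non-streaming: blocks until the download completes
    return True
```

Usage: `ensure_model(ollama, HF_MODEL)` at the top of a notebook.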

## Show Model Details

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

model_info = ollama.show(HF_MODEL)

print(f"Model: {HF_MODEL}")
if "details" in model_info:
    details = model_info["details"]
    print(f"Family: {details.get('family', 'N/A')}")
    print(f"Parameter Size: {details.get('parameter_size', 'N/A')}")
    print(f"Quantization: {details.get('quantization_level', 'N/A')}")
```
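
The reported `quantization_level` offers a quick sanity check that the pulled file matches the tag you asked for. This sketch assumes Ollama reports the level with the same spelling as the tag (e.g. `Q4_K_M`); if your model reports it differently, compare more loosely. `reported_quant` and `quant_matches_tag` are hypothetical.

```python
def reported_quant(client, model: str) -> str:
    """Return the quantization level Ollama reports for an installed model."""
    details = client.show(model).get("details") or {}
    return details.get("quantization_level", "") or ""


def quant_matches_tag(client, model: str) -> bool:
    """True if the reported quantization equals the :tag in the model name."""
    tag = model.rpartition(":")[2] if ":" in model else ""
    return bool(tag) and reported_quant(client, model).upper() == tag.upper()
```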

## Use Imported Model

### Generate Text

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

result = ollama.generate(
    model=HF_MODEL,
    prompt="What is the capital of France?"
)
print(result["response"])
```
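
For longer answers, passing `stream=True` to `generate` yields partial chunks whose `response` field holds the next piece of text. A small sketch (hypothetical helper `generate_streaming`, with `client` being the `ollama` module or an `ollama.Client`) that prints as it goes and returns the full text:

```python
def generate_streaming(client, model: str, prompt: str) -> str:
    """Print a completion as it is generated and return the full text."""
    pieces = []
    for chunk in client.generate(model=model, prompt=prompt, stream=True):
        piece = chunk["response"]
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)
```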

### Chat Completion

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# Nous-Hermes-2 was trained on ChatML; Ollama formats the messages using the chat template from the GGUF file
response = ollama.chat(
    model=HF_MODEL,
    messages=[
        {"role": "system", "content": "You are Hermes 2, a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in two sentences."}
    ]
)
print(response["message"]["content"])
```
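
`chat` is stateless: each call sends the whole conversation in `messages`. A minimal multi-turn sketch therefore keeps the history in a list and appends every reply to it. `chat_turn` is a hypothetical helper; `client` is the `ollama` module or an `ollama.Client`.

```python
from typing import Dict, List


def chat_turn(client, model: str, history: List[Dict[str, str]], user_text: str) -> str:
    """Send one user message with the full history; record and return the reply."""
    history.append({"role": "user", "content": user_text})
    response = client.chat(model=model, messages=history)
    reply = response["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```

Start `history` with the system message, then call `chat_turn(ollama, HF_MODEL, history, "...")` once per user turn.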

## Delete Imported Model

```python
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

ollama.delete(HF_MODEL)
print("Model deleted!")
```
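
To clean up several imports at once, this sketch finds every installed `hf.co/` model and deletes it only when `dry_run=False`, so the default call just previews what would be removed. `remove_hf_models` is a hypothetical helper; `client` is the `ollama` module or an `ollama.Client`.

```python
from typing import Iterable, List


def remove_hf_models(client, keep: Iterable[str] = (), dry_run: bool = True) -> List[str]:
    """Delete every installed hf.co/ model except those in keep.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the names it would remove.
    """
    keep_lower = {name.lower() for name in keep}
    names = [m.get("model", "") for m in client.list().get("models", [])]
    targets = [n for n in names if n.startswith("hf.co/") and n.lower() not in keep_lower]
    if not dry_run:
        for name in targets:
            client.delete(name)
    return targets
```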

## Popular HuggingFace Models

### General Purpose

| Model | HuggingFace Path | Size |
|-------|------------------|------|
| Nous-Hermes-2-Mistral | `hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M` | 4.4 GB |
| Llama-2-7B-Chat | `hf.co/TheBloke/Llama-2-7B-Chat-GGUF:Q4_K_M` | 4.1 GB |
| Mistral-7B-Instruct | `hf.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF:Q4_K_M` | 4.4 GB |

### Code Models

| Model | HuggingFace Path | Size |
|-------|------------------|------|
| CodeLlama-7B | `hf.co/TheBloke/CodeLlama-7B-Instruct-GGUF:Q4_K_M` | 4.1 GB |
| Phind-CodeLlama | `hf.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF:Q4_K_M` | 20 GB |
| WizardCoder | `hf.co/TheBloke/WizardCoder-Python-7B-V1.0-GGUF:Q4_K_M` | 4.1 GB |

### Small/Fast Models

| Model | HuggingFace Path | Size |
|-------|------------------|------|
| Phi-3-mini | `hf.co/microsoft/Phi-3-mini-4k-instruct-gguf:Q4_K_M` | 2.4 GB |
| TinyLlama | `hf.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF:Q4_K_M` | 0.7 GB |

## Finding Models on HuggingFace

1. Go to [huggingface.co/models](https://huggingface.co/models)
2. Filter by:
   - **Library:** GGUF
   - **Task:** Text Generation
3. Look for repositories whose names end in `-GGUF` or `-gguf` (a common convention, not a requirement)
4. Check the "Files" tab for available quantizations

## Troubleshooting

### Model Not Found

**Symptom:** Error pulling model

**Check:**
- Repository exists on HuggingFace
- Repository has GGUF files
- Quantization tag is correct

```python
# Verify HuggingFace URL
# https://huggingface.co/{org}/{repo}/tree/main
```
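
These checks can also be scripted with the `huggingface_hub` package (an extra dependency, assumed installed). Its `list_repo_files` raises an error when the repository does not exist, and the `.gguf` file names reveal which quantization tags are available. The filename pattern (`name.Q4_K_M.gguf`, `name-q4.gguf`) is a convention rather than a rule, so `quant_tags` is a heuristic that will miss unusual names and split multi-part files; both helpers are hypothetical.

```python
import re
from typing import Iterable, List

# Matches the common "<name>.Q4_K_M.gguf" / "<name>-q4.gguf" naming convention
_QUANT_RE = re.compile(
    r"[._-]((?:IQ|Q)\d[A-Z0-9_]*|BF16|FP16|F16|F32)\.gguf$", re.IGNORECASE
)


def quant_tags(filenames: Iterable[str]) -> List[str]:
    """Return the sorted, upper-cased quantization tags found in .gguf filenames."""
    tags = set()
    for name in filenames:
        match = _QUANT_RE.search(name)
        if match:
            tags.add(match.group(1).upper())
    return sorted(tags)


def available_quants(repo_id: str) -> List[str]:
    """List quantizations in a Hub repo, e.g. "TheBloke/Llama-2-7B-Chat-GGUF".

    Raises an error from huggingface_hub if the repository does not exist
    (or is private/gated and no token is configured).
    """
    from huggingface_hub import list_repo_files  # requires: pip install huggingface_hub

    return quant_tags(list_repo_files(repo_id))
```

An empty result from `available_quants` suggests the repository has no GGUF files in the expected naming style.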

### Download Fails

**Symptom:** Download interrupted or fails

**Fix:**
- Check internet connection
- Try again (Ollama resumes partial downloads)
- Check disk space
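
Because Ollama resumes partial downloads, a simple retry loop with backoff is often enough for flaky connections. In this sketch (hypothetical helper `pull_with_retries`, `client` being the `ollama` module or an `ollama.Client`), catching `Exception` is deliberately broad, so a permanent error such as a misspelled repository is also retried before the last failure is re-raised.

```python
import time


def pull_with_retries(client, model: str, attempts: int = 3, base_delay: float = 5.0) -> None:
    """Retry a pull, doubling the wait after each failed attempt."""
    for attempt in range(1, attempts + 1):
        try:
            client.pull(model)
            return
        except Exception as exc:  # broad on purpose: network and server errors
            if attempt == attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"pull failed ({exc}); retry {attempt}/{attempts - 1} in {delay:.0f}s")
            time.sleep(delay)
```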

### Wrong Prompt Format

**Symptom:** Model gives poor responses

**Fix:**
- Check model card for correct prompt template
- Some models require specific formats (ChatML, Alpaca, etc.)

```python
# ChatML format example (Nous-Hermes-2)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

# The Ollama server turns these role/content messages into the model's prompt format using its chat template
```
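
To see which template Ollama actually applies, `show()` also returns the model's `template`. The sketch below fetches it and checks for the ChatML turn markers `<|im_start|>` and `<|im_end|>`; a missing or unexpected template is a likely cause of poor responses. `chat_template` and `looks_like_chatml` are hypothetical helpers, and `client` is the `ollama` module or an `ollama.Client`.

```python
def chat_template(client, model: str) -> str:
    """Return the prompt template Ollama applies to chat messages for model."""
    return client.show(model).get("template") or ""


def looks_like_chatml(template: str) -> bool:
    """ChatML templates wrap each turn in <|im_start|> ... <|im_end|> tokens."""
    return "<|im_start|>" in template and "<|im_end|>" in template
```

Usage: `looks_like_chatml(chat_template(ollama, HF_MODEL))` should be true for a ChatML model such as Nous-Hermes-2.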

## When to Use This Skill

Use when:
- You need a model not in the official Ollama library
- Testing specific model variants
- Using specialized/fine-tuned models
- Comparing different quantizations

## Resources

- [Ollama Import Docs](https://docs.ollama.com/import)
- [HuggingFace Ollama Integration](https://huggingface.co/docs/hub/ollama)
- [TheBloke's GGUF Models](https://huggingface.co/TheBloke)

## Cross-References

- `bazzite-ai-jupyter:ollama` - Using imported models
- `bazzite-ai-jupyter:chat` - REST API for model management