# FunASR Model Selection Guide

Use this guide when you are choosing a first model, comparing FunASR with Whisper or a cloud ASR provider, or deciding which model alias to expose through the OpenAI-compatible API.

## Fast default path

If you are unsure, start with **SenseVoice-Small**:

```python
from funasr import AutoModel

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
    device="cuda",  # use "cpu" for a portable smoke test
)
result = model.generate(input="meeting.wav")
```

It is the best first choice for demos, private APIs, multilingual transcription, speaker-aware meeting transcripts, and agent voice input. Switch only when your workload has a clear requirement such as Mandarin production accuracy, streaming latency, or LLM-based ASR experiments.

## Decision table

| Need | Start with | Why | Next doc |
|---|---|---|---|
| Fast multilingual private transcription | SenseVoice-Small | Strong default with ASR, emotion tags, audio event tags, and CPU viability. | [README quick start](../README.md#quick-start) |
| Mandarin production ASR | Paraformer-Large | Mature Chinese ASR path with VAD and punctuation. | [Tutorial](./tutorial/README.md) |
| English-only route in the OpenAI API example | `paraformer-en` alias | Smaller English route for API compatibility checks. | [OpenAI API example](../examples/openai_api/) |
| LLM-based ASR or 31-language experiments | Fun-ASR-Nano | LLM-based model path; use vLLM when decoder throughput matters. | [vLLM guide](./vllm_guide.md) |
| Live captions or call-center streams | Runtime WebSocket service | Designed for long-lived streaming sessions and partial results. | [Runtime service docs](../runtime/readme.md) |
| Batch archive processing | SenseVoice-Small or Paraformer-Large | Stable offline transcription path; caller owns manifests, retries, and logs. | [Batch ASR example](../examples/batch_asr_improved.py) |
| Migration from Whisper/cloud ASR | SenseVoice-Small first, then benchmark alternatives | Gives a strong baseline before deeper model-specific tuning. | [Migration guide](./migration_from_whisper.md) |

## OpenAI-compatible API aliases

The `examples/openai_api` server exposes short aliases so application teams do not need to know model repository IDs:

| Alias | Underlying path | Use when |
|---|---|---|
| `sensevoice` | `iic/SenseVoiceSmall` | You want the default private speech API with multilingual ASR, event tags, and good CPU/GPU behavior. |
| `paraformer` | `paraformer-zh` with VAD and punctuation | You want a Mandarin-oriented production route. |
| `paraformer-en` | `paraformer-en` with VAD | You want a compact English route in OpenAI-style clients. |
| `fun-asr-nano` | `FunAudioLLM/Fun-ASR-Nano-2512` | You are evaluating LLM-based ASR, 31-language coverage, or vLLM acceleration. |

Check the live service before wiring clients:

```bash
curl http://localhost:8000/v1/models
python examples/openai_api/smoke_test.py --base-url http://localhost:8000 --model sensevoice
```

For SDK, JavaScript, workflow, Postman, OpenAPI, Docker, and Kubernetes paths, start from the [OpenAI API example](../examples/openai_api/).

## Runtime choice by workload

| Workload | Runtime path | Notes |
|---|---|---|
| Notebook or one-off evaluation | Python `AutoModel` | Shortest path for install, model download, and output-shape checks. |
| Internal HTTP service | OpenAI-compatible API | Reuse OpenAI-style clients, Dify, n8n, LangChain, AutoGen, and HTTP nodes. |
| Repeatable local container demo | Docker Compose API | CPU-first smoke test; adapt the image before using CUDA. |
| Internal cluster service | Kubernetes API template | Private `ClusterIP`, persistent model cache, `/health` probes, and port-forward smoke test. |
| Live audio | Runtime WebSocket service | Validate chunk size, VAD, endpointing, reconnects, and client backpressure with real audio. |
| LLM-based ASR throughput | vLLM path for Fun-ASR-Nano | vLLM accelerates autoregressive decoding; it does not apply to non-autoregressive Paraformer. |

See the [deployment matrix](./deployment_matrix.md) when you are choosing between these paths.

## Benchmark before committing

Do not choose a model from a single clean demo file. Use a small representative set first:

- 20-50 audio files that cover short clips, long meetings, silence, noise, overlapping speakers, domain vocabulary, and target languages.
- Record model name, model revision, FunASR version, device, CPU/GPU type, CUDA/PyTorch version, runtime path, batch size, and whether warmup/model download time is excluded.
- Track quality with your normal WER/CER or human review process, not only transcript readability.
- Track latency, throughput, memory, failures, and upload size limits together.
- Keep at least one public sample for smoke tests and at least one private realistic sample for deployment validation.

For migration work, use the [migration benchmark example](../examples/migration/) and the [migration guide](./migration_from_whisper.md).

## Practical recommendations

- Start with SenseVoice-Small for demos, private APIs, agent voice input, and multilingual workloads.
- Use Paraformer when your production traffic is primarily Mandarin and you want the mature non-autoregressive ASR path.
- Use Fun-ASR-Nano when you specifically want the LLM-based model path or vLLM acceleration experiments.
- Use the streaming runtime when partial results and long-lived connections matter more than a single final transcript.
- Keep model aliases stable in production runbooks so benchmark results and bug reports are reproducible.
- Open a [Deployment Help issue](https://github.com/modelscope/FunASR/issues/new?template=deployment_help.md) with model, device, command, logs, audio duration, and runtime path when you get stuck.