# vllm-mlx Model Scorecard

*Auto-generated on 2026-03-05 00:32 UTC*

> **Tested on**: Apple M3 Ultra (256GB)
>
> **Methodology**: All suites use `enable_thinking: false`. Cache cleared between suites. See [README](README.md) for details.

## Comparison Table

| Model | Quant | RAM | TTFT | Decode (short) | Decode (long) | Tools | Coding | Reasoning | General | Avg | Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Devstral-Small-2-4bit | 4bit | 13.4 GB | 291ms | 22.5 t/s | 47.2 t/s | 17% | 90% | 70% | 70% | 62% | 2026-03-04 |
| GLM-4.5-Air-4bit | 4bit | 60.3 GB | 708ms | 27.5 t/s | 53.6 t/s | 73% | 90% | 70% | 80% | 78% | 2026-03-04 |
| GLM-4.7-Flash-8bit | 8bit | 31.9 GB | 362ms | 33.4 t/s | 57.2 t/s | 73% | 100% | 90% | 50% | 78% | 2026-03-04 |
| GPT-OSS-20B-mxfp4-q8 | mxfp4-q8 | 12.1 GB | 339ms | 85.7 t/s | 124.2 t/s | 80% | 20% | 60% | 90% | 62% | 2026-03-05 |
| Hermes-3-Llama-3.1-8B-4bit | 4bit | 4.6 GB | 152ms | 69.3 t/s | 122.9 t/s | 17% | 20% | 30% | 40% | 27% | 2026-03-04 |
| MiniMax-M2.5-4bit | 4bit | 128.9 GB | 1.3s | 46.2 t/s | 49.9 t/s | 87% | 10% | 80% | 90% | 67% | 2026-03-04 |
| Mistral-Small-3.2-4bit | 4bit | 13.4 GB | 1.1s | 27.6 t/s | 47.2 t/s | 17% | 80% | 60% | 60% | 54% | 2026-03-04 |
| Qwen3-0.6B-4bit | 4bit | 0.4 GB | 94ms | 78.3 t/s | 364.7 t/s | 30% | 20% | 20% | 30% | 25% | 2026-03-04 |
| Qwen3-Coder-Next-4bit | 4bit | 44.9 GB | 473ms | 41.5 t/s | 73.5 t/s | 90% | 90% | 70% | 70% | 80% | 2026-03-04 |
| Qwen3-Coder-Next-6bit | 6bit | 64.8 GB | 642ms | 34.6 t/s | 65.6 t/s | 87% | 90% | 80% | 70% | 82% | 2026-03-04 |
| Qwen3.5-122B-A10B-8bit | 8bit | 129.8 GB | 1.3s | 19.4 t/s | 42.7 t/s | 87% | 90% | 90% | 90% | 89% | 2026-03-04 |
| Qwen3.5-122B-A10B-mxfp4 | mxfp4 | 65.0 GB | 714ms | 26.3 t/s | 57 t/s | 90% | 90% | 80% | 90% | 88% | 2026-03-04 |
| Qwen3.5-27B-4bit | 4bit | 15.3 GB | 453ms | 17.7 t/s | 37.7 t/s | 83% | 90% | 50% | 80% | 76% | 2026-03-04 |
| Qwen3.5-35B-A3B-4bit | 4bit | 19.6 GB | 322ms | 31.7 t/s | 95.2 t/s | 87% | 90% | 50% | 70% | 74% | 2026-03-04 |
| Qwen3.5-35B-A3B-8bit | 8bit | 36.9 GB | 456ms | 32.4 t/s | 80 t/s | 90% | 90% | 80% | 80% | 85% | 2026-03-04 |
| Qwen3.5-4B-4bit | 4bit | 2.4 GB | 196ms | 43 t/s | 157.6 t/s | 73% | 50% | 50% | 50% | 56% | 2026-03-04 |
| Qwen3.5-9B-4bit | 4bit | 5.1 GB | 228ms | 35.4 t/s | 106.4 t/s | 83% | 70% | 60% | 70% | 71% | 2026-03-04 |

## Details

### Devstral-Small-2-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=291ms, warm=106ms
- **Decode**: short=22.5 t/s, long=47.2 t/s
- **RAM**: active=13.4 GB, peak=13.4 GB
- **Tool Calling**: 17% (5/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 70% (7/10)
- **General**: 70% (7/10)
- **Eval time**: 323.4s

### GLM-4.5-Air-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: glm47
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser glm47`
- **Date**: 2026-03-04
- **TTFT**: cold=708ms, warm=107ms
- **Decode**: short=27.5 t/s, long=53.6 t/s
- **RAM**: active=60.3 GB, peak=60.3 GB
- **Tool Calling**: 73% (22/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 70% (7/10)
- **General**: 80% (8/10)
- **Eval time**: 305.5s

### GLM-4.7-Flash-8bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: glm47
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser glm47`
- **Date**: 2026-03-04
- **TTFT**: cold=362ms, warm=113ms
- **Decode**: short=33.4 t/s, long=57.2 t/s
- **RAM**: active=31.9 GB, peak=31.9 GB
- **Tool Calling**: 73% (22/30)
- **Coding**: 100% (10/10)
- **Reasoning**: 90% (9/10)
- **General**: 50% (5/10)
- **Eval time**: 230.6s

### GPT-OSS-20B-mxfp4-q8

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: harmony
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser harmony`
- **Date**: 2026-03-05
- **TTFT**: cold=339ms, warm=112ms
- **Decode**: short=85.7 t/s, long=124.2 t/s
- **RAM**: active=12.1 GB, peak=12.6 GB
- **Tool Calling**: 80% (24/30)
- **Coding**: 20% (2/10)
- **Reasoning**: 60% (6/10)
- **General**: 90% (9/10)
- **Eval time**: 197.6s

### Hermes-3-Llama-3.1-8B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=152ms, warm=72ms
- **Decode**: short=69.3 t/s, long=122.9 t/s
- **RAM**: active=4.6 GB, peak=4.7 GB
- **Tool Calling**: 17% (5/30)
- **Coding**: 20% (2/10)
- **Reasoning**: 30% (3/10)
- **General**: 40% (4/10)
- **Eval time**: 111.3s

### MiniMax-M2.5-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: minimax
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser minimax`
- **Date**: 2026-03-04
- **TTFT**: cold=1.3s, warm=136ms
- **Decode**: short=46.2 t/s, long=49.9 t/s
- **RAM**: active=128.9 GB, peak=128.9 GB
- **Tool Calling**: 87% (26/30)
- **Coding**: 10% (1/10)
- **Reasoning**: 80% (8/10)
- **General**: 90% (9/10)
- **Eval time**: 610.7s

### Mistral-Small-3.2-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=1.1s, warm=104ms
- **Decode**: short=27.6 t/s, long=47.2 t/s
- **RAM**: active=13.4 GB, peak=13.7 GB
- **Tool Calling**: 17% (5/30)
- **Coding**: 80% (8/10)
- **Reasoning**: 60% (6/10)
- **General**: 60% (6/10)
- **Eval time**: 369.5s

### Qwen3-0.6B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=94ms, warm=73ms
- **Decode**: short=78.3 t/s, long=364.7 t/s
- **RAM**: active=0.4 GB, peak=0.4 GB
- **Tool Calling**: 30% (9/30)
- **Coding**: 20% (2/10)
- **Reasoning**: 20% (2/10)
- **General**: 30% (3/10)
- **Eval time**: 38.8s

### Qwen3-Coder-Next-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=473ms, warm=27ms
- **Decode**: short=41.5 t/s, long=73.5 t/s
- **RAM**: active=44.9 GB, peak=45.0 GB
- **Tool Calling**: 90% (27/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 70% (7/10)
- **General**: 70% (7/10)
- **Eval time**: 218.9s

### Qwen3-Coder-Next-6bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=642ms, warm=29ms
- **Decode**: short=34.6 t/s, long=65.6 t/s
- **RAM**: active=64.8 GB, peak=64.8 GB
- **Tool Calling**: 87% (26/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 80% (8/10)
- **General**: 70% (7/10)
- **Eval time**: 250.8s

### Qwen3.5-122B-A10B-8bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=1.3s, warm=32ms
- **Decode**: short=19.4 t/s, long=42.7 t/s
- **RAM**: active=129.8 GB, peak=129.9 GB
- **Tool Calling**: 87% (26/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 90% (9/10)
- **General**: 90% (9/10)
- **Eval time**: 342.5s

### Qwen3.5-122B-A10B-mxfp4

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=714ms, warm=27ms
- **Decode**: short=26.3 t/s, long=57 t/s
- **RAM**: active=65.0 GB, peak=65.1 GB
- **Tool Calling**: 90% (27/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 80% (8/10)
- **General**: 90% (9/10)
- **Eval time**: 261.5s

### Qwen3.5-27B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=453ms, warm=29ms
- **Decode**: short=17.7 t/s, long=37.7 t/s
- **RAM**: active=15.3 GB, peak=15.4 GB
- **Tool Calling**: 83% (25/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 50% (5/10)
- **General**: 80% (8/10)
- **Eval time**: 451.7s

### Qwen3.5-35B-A3B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=322ms, warm=33ms
- **Decode**: short=31.7 t/s, long=95.2 t/s
- **RAM**: active=19.6 GB, peak=19.6 GB
- **Tool Calling**: 87% (26/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 50% (5/10)
- **General**: 70% (7/10)
- **Eval time**: 168.2s

### Qwen3.5-35B-A3B-8bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=456ms, warm=30ms
- **Decode**: short=32.4 t/s, long=80 t/s
- **RAM**: active=36.9 GB, peak=36.9 GB
- **Tool Calling**: 90% (27/30)
- **Coding**: 90% (9/10)
- **Reasoning**: 80% (8/10)
- **General**: 80% (8/10)
- **Eval time**: 186.0s

### Qwen3.5-4B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=196ms, warm=29ms
- **Decode**: short=43 t/s, long=157.6 t/s
- **RAM**: active=2.4 GB, peak=2.5 GB
- **Tool Calling**: 73% (22/30)
- **Coding**: 50% (5/10)
- **Reasoning**: 50% (5/10)
- **General**: 50% (5/10)
- **Eval time**: 111.8s

### Qwen3.5-9B-4bit

- **Hardware**: Apple M3 Ultra (256GB)
- **Parser**: hermes
- **Server flags**: `--enable-auto-tool-choice --tool-call-parser hermes`
- **Date**: 2026-03-04
- **TTFT**: cold=228ms, warm=28ms
- **Decode**: short=35.4 t/s, long=106.4 t/s
- **RAM**: active=5.1 GB, peak=5.2 GB
- **Tool Calling**: 83% (25/30)
- **Coding**: 70% (7/10)
- **Reasoning**: 60% (6/10)
- **General**: 70% (7/10)
- **Eval time**: 179.8s

---

## How to Add Your Results

1. Start vllm-mlx with your model: `vllm-mlx serve --port 8000`
2. Run the eval: `python evals/run_eval.py --model "" --quantization `
3. Your results are saved to `evals/results/.json`
4. Regenerate this table: `python evals/generate_scorecard.py`
5. Submit a PR with your JSON file!
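
Before submitting, it can help to sanity-check your numbers by hand. The Avg column in the comparison table appears to be the unweighted mean of the four suite pass rates (Tools, Coding, Reasoning, General), rounded to a whole percent. That is inferred from the published rows, not from the generator's source, so treat the sketch below as an assumption. The helpers `pass_rate` and `average_score` are hypothetical and not part of vllm-mlx.

```python
# Hypothetical helpers, not part of vllm-mlx: recompute scorecard percentages
# from raw pass counts so a new entry can be checked by hand.


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate of one suite as a percentage."""
    return 100 * passed / total


def average_score(suites: dict[str, tuple[int, int]]) -> int:
    """Unweighted mean of the per-suite pass rates, rounded to a whole percent.

    Assumption: every suite counts equally, regardless of how many
    test cases it contains (30 for tools, 10 for the others).
    """
    rates = [pass_rate(passed, total) for passed, total in suites.values()]
    return round(sum(rates) / len(rates))


# (passed, total) pairs copied from the Devstral-Small-2-4bit entry above.
devstral = {
    "tools": (5, 30),
    "coding": (9, 10),
    "reasoning": (7, 10),
    "general": (7, 10),
}

print(round(pass_rate(*devstral["tools"])))  # 17
print(average_score(devstral))               # 62
```

Both printed values match the Devstral-Small-2-4bit row (Tools 17%, Avg 62%). Because the tool-calling suite has three times as many cases as the others, a mean weighted by case count would give a different figure, so the equal weighting matters when comparing models.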