Rank,Model,Multi Turn Overall Acc,Base,Miss Func,Miss Param,Long Context
1,xLAM-2-70b-fc-r (FC),77.38%,82.50%,77.00%,74.00%,76.00%
2,xLAM-2-8b-fc-r (FC),70.00%,76.00%,72.00%,65.00%,67.00%
3,xLAM-2-32b-fc-r (FC),69.50%,81.50%,72.50%,67.50%,56.50%
4,Claude-Opus-4-5-20251101 (FC),68.38%,81.00%,64.00%,58.00%,70.50%
5,GLM-4.6 (FC thinking),68.00%,74.50%,68.00%,63.00%,66.50%
6,Gemini-3-Pro-Preview (FC),63.12%,69.00%,63.00%,56.50%,64.00%
7,BitAgent-Bounty-8B,62.38%,75.00%,49.50%,68.00%,57.00%
8,o3-2025-04-16 (Prompt),62.25%,68.00%,63.50%,54.50%,63.00%
9,Claude-Sonnet-4-5-20250929 (FC),61.37%,69.00%,65.00%,52.50%,59.00%
10,Gemini-3-Pro-Preview (Prompt),60.75%,64.50%,60.00%,54.50%,64.00%
11,Grok-4-1-fast-reasoning (FC),58.87%,70.50%,59.50%,43.00%,62.50%
12,xLAM-2-3b-fc-r (FC),58.38%,71.50%,59.00%,57.50%,45.50%
13,Arch-Agent-32B,54.25%,64.50%,58.00%,53.00%,41.50%
14,Claude-Haiku-4-5-20251001 (FC),53.62%,63.50%,42.50%,52.50%,56.00%
15,Nanbeige4-3B-Thinking-2511 (FC),51.12%,58.50%,54.00%,45.00%,47.00%
16,Moonshotai-Kimi-K2-Instruct (FC),50.63%,62.00%,41.00%,44.50%,55.00%
17,Command A Reasoning (FC),50.12%,61.50%,41.00%,49.50%,48.50%
18,Qwen3-32B (FC),47.87%,56.00%,52.50%,40.00%,43.00%
19,Grok-4-0709 (Prompt),47.00%,55.50%,46.00%,36.00%,50.50%
20,Grok-4-1-fast-non-reasoning (FC),46.75%,58.00%,39.50%,37.50%,52.00%
21,Qwen3-235B-A22B-Instruct-2507 (FC),45.38%,57.50%,35.00%,33.50%,55.50%
22,DeepSeek-V3.2-Exp (Prompt + Thinking),44.88%,55.00%,49.00%,27.00%,48.50%
23,Qwen3-235B-A22B-Instruct-2507 (Prompt),44.62%,54.00%,42.50%,31.50%,50.50%
24,GPT-5.2-2025-12-11 (Prompt),43.75%,54.50%,40.50%,33.50%,46.50%
25,Qwen3-32B (Prompt),43.25%,54.00%,46.00%,36.50%,36.50%
26,Qwen3-8B (FC),41.75%,50.50%,42.00%,40.00%,34.50%
27,o4-mini-2025-04-16 (FC),41.75%,51.00%,30.00%,40.50%,45.50%
28,Nanbeige3.5-Pro-Thinking (FC),40.00%,56.00%,34.00%,29.00%,41.00%
29,GPT-4.1-2025-04-14 (FC),38.88%,47.50%,32.50%,32.50%,43.00%
30,ToolACE-2-8B (FC),38.38%,49.00%,28.00%,30.50%,46.00%
31,DeepSeek-V3.2-Exp (FC),37.38%,41.50%,39.50%,33.50%,35.00%
32,Gemini-2.5-Flash (FC),36.25%,41.50%,36.00%,32.00%,35.50%
33,xLAM-2-1b-fc-r (FC),36.00%,45.50%,36.00%,37.00%,25.50%
34,Arch-Agent-3B,34.88%,42.00%,37.50%,31.00%,29.00%
35,Qwen3-14B (FC),34.75%,39.00%,34.00%,33.50%,32.50%
36,GPT-5-nano-2025-08-07 (FC),34.50%,44.00%,23.50%,32.50%,38.00%
37,GPT-4.1-mini-2025-04-14 (FC),34.13%,43.50%,22.50%,30.50%,40.00%
38,Grok-4-0709 (FC),33.88%,44.00%,19.00%,28.50%,44.00%
39,Qwen3-8B (Prompt),33.38%,41.50%,38.50%,27.00%,26.50%
40,Qwen3-30B-A3B-Instruct-2507 (FC),30.00%,43.50%,10.50%,25.00%,41.00%
41,Command A (FC),29.50%,38.00%,23.00%,32.00%,25.00%
42,GPT-5.2-2025-12-11 (FC),28.12%,36.50%,18.00%,27.50%,30.50%
43,GPT-5-mini-2025-08-07 (FC),27.50%,36.50%,17.00%,23.50%,33.00%
44,Arch-Agent-1.5B,26.62%,35.50%,27.50%,21.50%,22.00%
45,Qwen3-14B (Prompt),26.13%,16.50%,37.50%,31.00%,19.50%
46,Hammer2.1-7b (FC),23.87%,24.00%,28.50%,21.50%,21.50%
47,GPT-4.1-nano-2025-04-14 (FC),23.62%,39.50%,7.50%,17.50%,30.00%
48,Qwen3-30B-A3B-Instruct-2507 (Prompt),23.50%,33.00%,16.00%,16.00%,29.00%
49,Qwen3-4B-Instruct-2507 (FC),22.12%,26.50%,21.00%,15.50%,25.50%
50,Llama-3.3-70B-Instruct (FC),21.50%,26.00%,19.00%,14.50%,26.50%
51,Qwen3-4B-Instruct-2507 (Prompt),20.50%,24.50%,21.50%,16.00%,20.00%
52,Llama-4-Maverick-17B-128E-Instruct-FP8 (FC),20.25%,27.00%,22.00%,14.00%,18.00%
53,Gemini-2.5-Flash (Prompt),16.75%,14.50%,16.50%,17.50%,18.50%
54,o4-mini-2025-04-16 (Prompt),16.62%,16.50%,18.00%,17.50%,14.50%
55,Hammer2.1-3b (FC),16.50%,22.00%,12.50%,16.00%,15.50%
56,Claude-Opus-4-5-20251101 (Prompt),16.12%,20.50%,9.00%,21.50%,13.50%
57,Hammer2.1-1.5b (FC),15.62%,20.00%,16.50%,9.50%,16.50%
58,o3-2025-04-16 (FC),14.75%,16.50%,11.50%,14.50%,16.50%
59,Mistral-Small-2506 (Prompt),14.75%,20.50%,17.00%,9.50%,12.00%
60,mistral-large-2411 (FC),14.12%,18.50%,11.50%,13.00%,13.50%
61,mistral-large-2411 (Prompt),13.75%,20.00%,5.00%,11.00%,19.00%
62,Gemini-2.5-Flash-Lite (FC),13.50%,20.00%,1.50%,15.00%,17.50%
63,Mistral-small-2506 (FC),11.50%,17.50%,6.00%,10.50%,12.00%
64,Llama-3.1-8B-Instruct (Prompt),11.12%,13.00%,9.00%,9.50%,13.00%
65,Qwen3-1.7B (FC),11.00%,15.00%,6.00%,12.00%,11.00%
66,Gemma-3-27b-it (Prompt),10.75%,16.50%,4.50%,8.00%,14.00%
67,Mistral-Medium-2505 (FC),10.75%,15.50%,7.00%,7.50%,13.00%
68,CoALM-70B,10.62%,11.00%,14.00%,9.00%,8.50%
69,Mistral-Medium-2505,9.88%,13.50%,6.50%,6.00%,13.50%
70,GPT-4.1-2025-04-14 (Prompt),9.75%,10.50%,11.00%,8.00%,9.50%
71,Llama-4-Scout-17B-16E-Instruct (FC),9.00%,12.00%,7.00%,7.50%,9.50%
72,Command R7B (FC),8.25%,12.00%,0.50%,10.50%,10.00%
73,CoALM-8B,8.00%,10.00%,7.00%,8.00%,7.00%
74,Open-Mistral-Nemo-2407 (FC),7.75%,12.50%,6.50%,7.50%,4.50%
75,Gemini-2.5-Flash-Lite (Prompt),7.63%,10.00%,5.00%,6.50%,9.00%
76,Granite-3.1-8B-Instruct (FC),7.50%,11.50%,2.00%,7.50%,9.00%
77,Granite-3.2-8B-Instruct (FC),7.38%,9.50%,3.00%,8.00%,9.00%
78,Falcon3-10B-Instruct (FC),6.50%,6.50%,9.50%,5.00%,5.00%
79,Gemma-3-12b-it (Prompt),5.75%,6.50%,7.50%,5.00%,4.00%
80,GPT-5-mini-2025-08-07 (Prompt),5.50%,5.50%,5.00%,4.50%,7.00%
81,Granite-20b-FunctionCalling (FC),5.38%,9.00%,3.00%,6.50%,3.00%
82,Falcon3-7B-Instruct (FC),5.00%,7.00%,4.00%,5.00%,4.00%
83,Llama-3.2-3B-Instruct (FC),4.00%,5.00%,3.50%,4.00%,3.50%
84,MiniCPM3-4B-FC (FC),3.88%,6.50%,2.00%,4.50%,2.50%
85,Phi-4 (Prompt),3.88%,9.00%,0.00%,3.50%,3.00%
86,Qwen3-0.6B (FC),3.62%,5.50%,2.00%,3.00%,4.00%
87,MiniCPM3-4B (Prompt),3.50%,4.50%,4.50%,2.00%,3.00%
88,Hammer2.1-0.5b (FC),2.88%,4.50%,0.50%,4.00%,2.50%
89,RZN-T (Prompt),2.88%,4.50%,2.00%,2.50%,2.50%
90,Bielik-11B-v2.3-Instruct (Prompt),2.62%,4.50%,0.50%,3.00%,2.50%
91,Granite-4.0-350m (FC),2.50%,5.00%,0.50%,2.50%,2.00%
92,GPT-4.1-mini-2025-04-14 (Prompt),2.50%,1.50%,4.50%,2.50%,1.50%
93,Amazon-Nova-2-Lite-v1:0 (FC),2.12%,2.50%,1.50%,2.00%,2.50%
94,GPT-4.1-nano-2025-04-14 (Prompt),2.00%,2.50%,1.00%,2.50%,2.00%
95,Amazon-Nova-Pro-v1:0 (FC),1.88%,1.50%,0.50%,2.50%,3.00%
96,Claude-Haiku-4-5-20251001 (Prompt),1.75%,1.50%,0.00%,4.00%,1.50%
97,Claude-Sonnet-4-5-20250929 (Prompt),1.62%,2.00%,0.00%,3.00%,1.50%
98,Qwen3-0.6B (Prompt),1.38%,1.50%,1.50%,1.50%,1.00%
99,Amazon-Nova-Micro-v1:0 (FC),1.38%,1.50%,1.00%,2.00%,1.00%
100,Falcon3-3B-Instruct (FC),1.00%,1.50%,0.50%,0.50%,1.50%
101,Open-Mistral-Nemo-2407 (Prompt),0.75%,0.50%,1.00%,0.00%,1.50%
102,GPT-5-nano-2025-08-07 (Prompt),0.75%,1.00%,1.00%,0.00%,1.00%
103,palmyra-x-004 (FC),0.38%,0.50%,0.00%,0.50%,0.50%
104,Gemma-3-4b-it (Prompt),0.38%,0.50%,0.00%,0.50%,0.50%
105,Gemma-3-1b-it (Prompt),0.00%,0.00%,0.00%,0.00%,0.00%
106,Ministral-8B-Instruct-2410 (FC),0.00%,0.00%,0.00%,0.00%,0.00%
107,Llama-3.2-1B-Instruct (FC),0.00%,0.00%,0.00%,0.00%,0.00%
108,Llama-3.1-Nemotron-Ultra-253B-v1 (FC),0.00%,0.00%,0.00%,0.00%,0.00%
109,Falcon3-1B-Instruct (FC),0.00%,0.00%,0.00%,0.00%,0.00%