SlopRank Dashboard

Model Rankings

Rank  Model                               Score     Confidence Interval
1     o1-preview                          0.179404  0.179064 [0.155861, 0.200286]
2     gpt-4o                              0.178305  0.178198 [0.156553, 0.200585]
3     deepseek-chat                       0.167105  0.166883 [0.144720, 0.191760]
4     gemini-2.0-flash-thinking-exp-1219  0.164732  0.165014 [0.142054, 0.187329]
5     claude-3-5-sonnet-latest            0.155571  0.155903 [0.133843, 0.177003]
6     gemini-exp-1206                     0.154884  0.154936 [0.133611, 0.179618]
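The dashboard does not state how these intervals were obtained. A common choice, assumed here and not confirmed for SlopRank, is the percentile bootstrap: resample the underlying judgments with replacement many times, recompute the statistic on each resample, and read off the 2.5th and 97.5th percentiles. The sketch below does this for an arbitrary statistic (the mean by default) over a hypothetical list of per-judgment values; `bootstrap_ci` and its parameters are illustrative names. It also returns the mean of the bootstrap estimates, which may or may not be what the first number in the Confidence Interval column reports.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float] = statistics.mean,
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap (assumed method, not SlopRank's own code).

    Returns (mean of bootstrap estimates, lower bound, upper bound).
    """
    if not data or n_resamples < 1:
        raise ValueError("need non-empty data and at least one resample")
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    n = len(data)
    # Each estimate is the statistic computed on one resample drawn with replacement.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Indices of the alpha/2 and 1 - alpha/2 percentiles in the sorted estimates.
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return statistics.mean(estimates), estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-judgment values for a single model.
    sample = [0.18, 0.16, 0.21, 0.17, 0.19, 0.15, 0.20, 0.18]
    center, lower, upper = bootstrap_ci(sample, n_resamples=2000)
    print(f"{center:.6f} [{lower:.6f}, {upper:.6f}]")
```

With a fixed `seed`, repeated runs give identical intervals, which keeps a dashboard stable between refreshes.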

Statistical Significance

Comparison                                                      Significance
o1-preview vs gpt-4o                                            Not significant
gpt-4o vs deepseek-chat                                         Not significant
deepseek-chat vs gemini-2.0-flash-thinking-exp-1219             Not significant
gemini-2.0-flash-thinking-exp-1219 vs claude-3-5-sonnet-latest  Not significant
claude-3-5-sonnet-latest vs gemini-exp-1206                     Not significant
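Every adjacent pair is marked "Not significant", but the dashboard does not name the test behind that verdict. A rough consistency check, not necessarily the method SlopRank uses, is whether the adjacent 95% intervals from the Model Rankings table overlap. Overlapping intervals do not by themselves prove that a difference is not significant, so treat this only as a sanity check. The interval values below are copied from that table; the function names are illustrative.

```python
# Confidence intervals (lower, upper) copied from the Model Rankings table, in rank order.
INTERVALS = {
    "o1-preview": (0.155861, 0.200286),
    "gpt-4o": (0.156553, 0.200585),
    "deepseek-chat": (0.144720, 0.191760),
    "gemini-2.0-flash-thinking-exp-1219": (0.142054, 0.187329),
    "claude-3-5-sonnet-latest": (0.133843, 0.177003),
    "gemini-exp-1206": (0.133611, 0.179618),
}


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def adjacent_overlaps(intervals):
    """Yield (model_a, model_b, overlaps) for each pair of adjacent ranks.

    Relies on dicts preserving insertion order (rank order here).
    """
    names = list(intervals)
    for a, b in zip(names, names[1:]):
        yield a, b, intervals_overlap(intervals[a], intervals[b])


if __name__ == "__main__":
    for a, b, overlap in adjacent_overlaps(INTERVALS):
        print(f"{a} vs {b}: {'overlap' if overlap else 'disjoint'}")
```

Running it reports an overlap for all five adjacent pairs. The intervals are wide enough that even rank 1 (o1-preview, lower bound 0.155861) and rank 6 (gemini-exp-1206, upper bound 0.179618) overlap, so the ordering in the overall table carries considerable uncertainty.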

Rankings by Category

Creativity

Rank  Model                               Score
1     o1-preview                          8.8571
2     gemini-exp-1206                     8.8333
3     deepseek-chat                       8.5000
4     gemini-2.0-flash-thinking-exp-1219  8.0455
5     gpt-4o                              7.9231
6     claude-3-5-sonnet-latest            6.8571

Economic

Rank  Model                               Score
1     gpt-4o                              8.3333
2     deepseek-chat                       8.0000
3     gemini-exp-1206                     8.0000
4     claude-3-5-sonnet-latest            7.8889
5     gemini-2.0-flash-thinking-exp-1219  7.7500
6     o1-preview                          7.5000

Knowledge

Rank  Model                               Score
1     gemini-2.0-flash-thinking-exp-1219  7.0000
2     claude-3-5-sonnet-latest            6.8571
3     gemini-exp-1206                     6.5714
4     gpt-4o                              6.1667
5     o1-preview                          5.8333
6     deepseek-chat                       4.3333

Medical

Rank  Model                               Score
1     gpt-4o                              8.5000
2     deepseek-chat                       7.1667
3     gemini-exp-1206                     6.7143
4     o1-preview                          6.2000
5     gemini-2.0-flash-thinking-exp-1219  6.1429
6     claude-3-5-sonnet-latest            5.0000

Reasoning

Rank  Model                               Score
1     o1-preview                          8.8000
2     deepseek-chat                       8.7667
3     gemini-exp-1206                     8.6111
4     gpt-4o                              8.2121
5     gemini-2.0-flash-thinking-exp-1219  8.2069
6     claude-3-5-sonnet-latest            6.9655

Technical

Rank  Model                               Score
1     gemini-exp-1206                     9.2500
2     o1-preview                          8.6667
3     deepseek-chat                       8.5000
4     claude-3-5-sonnet-latest            8.0000
5     gemini-2.0-flash-thinking-exp-1219  7.3333
6     gpt-4o                              7.0000
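The category scores are on a different scale from the overall scores and look like averages of individual ratings; 8.8571, for example, matches 62/7. That reading is an assumption, since the dashboard does not say how the category scores are computed. Under it, each per-category table can be rebuilt by averaging the ratings for every (category, model) pair and sorting best-first, as in this sketch. `rank_by_category` and the sample ratings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_category(ratings):
    """Rank models per category by mean rating (assumed aggregation).

    ratings: iterable of (category, model, score) triples.
    Returns {category: [(rank, model, mean_score), ...]} sorted best-first.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for category, model, score in ratings:
        grouped[category][model].append(score)

    tables = {}
    for category, per_model in grouped.items():
        means = [(model, mean(scores)) for model, scores in per_model.items()]
        # Stable sort: models with equal means keep their first-seen order.
        means.sort(key=lambda item: item[1], reverse=True)
        tables[category] = [
            (rank, model, round(avg, 4))
            for rank, (model, avg) in enumerate(means, start=1)
        ]
    return tables


if __name__ == "__main__":
    # Hypothetical (category, model, rating) triples on a 1-10 scale.
    sample = [
        ("Economic", "model-a", 8), ("Economic", "model-a", 9),
        ("Economic", "model-b", 7), ("Economic", "model-b", 9),
        ("Medical", "model-a", 6), ("Medical", "model-b", 9),
    ]
    for category, rows in rank_by_category(sample).items():
        print(category)
        for rank, model, score in rows:
            print(f"  {rank}  {model}  {score:.4f}")
```

Because Python's sort is stable, tied means keep their input order and receive consecutive ranks. The Economic table has such a tie (deepseek-chat and gemini-exp-1206, both 8.0000, at ranks 2 and 3); the dashboard does not say which tie-breaking rule it applied.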

Endorsement Graph

[Figure: Endorsement Graph]
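The dashboard shows only the graph and does not explain how endorsements become the overall Score. One hint: the six overall scores sum to 1.000001, which is consistent with a probability distribution over models such as a PageRank vector. Assuming that, the sketch below runs weighted PageRank by power iteration over a directed endorsement graph, where `edges[u][v]` is the (hypothetical) weight of an endorsement from model u to model v. The damping factor of 0.85 is the conventional default, not a value taken from SlopRank, and the example graph is made up.

```python
from typing import Dict, Hashable


def pagerank(
    edges: Dict[Hashable, Dict[Hashable, float]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[Hashable, float]:
    """Weighted PageRank by power iteration (assumed scoring method).

    edges[u][v] is the weight of an endorsement from u to v.
    Nodes with no outgoing weight (dangling nodes) spread their rank uniformly.
    The returned scores sum to 1.
    """
    node_set = set(edges)
    for targets in edges.values():
        node_set.update(targets)
    nodes = sorted(node_set, key=str)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {u: sum(edges.get(u, {}).values()) for u in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed evenly to every node.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in edges.items():
            if out_weight[u] == 0:
                continue
            for v, w in targets.items():
                # u passes rank to v in proportion to the endorsement weight.
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical endorsement weights between three placeholder models.
    example = {
        "model-a": {"model-b": 2.0, "model-c": 1.0},
        "model-b": {"model-a": 1.0, "model-c": 1.0},
        "model-c": {"model-a": 1.0},
    }
    for model, score in sorted(pagerank(example).items(), key=lambda kv: -kv[1]):
        print(f"{model}: {score:.6f}")
```

Power iteration keeps the total rank at exactly 1 in each step, which is why a PageRank-style score column would add up to 1 apart from rounding.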