---
name: elo-ratings-math
description: Explains the mathematical principles behind Elo rating systems, including expected score calculation, rating updates, and the K-factor. Use when implementing or understanding competitive rating systems.
---

# Elo Ratings Mathematics

## Overview

The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games. Originally developed by Arpad Elo for chess, it is now used in many competitive contexts, including sports, video games, and online platforms.

## Core Mathematical Principles

### 1. Expected Score Formula

The expected score for a player is the average score they are predicted to earn against an opponent, based on the rating difference between the two. In games without draws it equals the probability of winning; where draws are possible it equals the win probability plus half the draw probability.

**Formula:**

```
E_A = 1 / (1 + 10^((R_B - R_A) / 400))
```

Where:
- `E_A` = Expected score for player A (strictly between 0 and 1)
- `R_A` = Current rating of player A
- `R_B` = Current rating of player B
- `10^x` = 10 raised to the power of x

**Interpretation:**
- E_A close to 1.0 means player A is expected to win almost certainly (the value 1.0 is approached but never reached)
- E_A = 0.5 means both players are equally matched
- E_A close to 0.0 means player A is expected to lose almost certainly

**Example:**

If player A has rating 1600 and player B has rating 1400:

```
E_A = 1 / (1 + 10^((1400 - 1600) / 400))
E_A = 1 / (1 + 10^(-200 / 400))
E_A = 1 / (1 + 10^(-0.5))
E_A = 1 / (1 + 0.316)
E_A ≈ 0.76
```

Player A is expected to score 0.76 (a 76% chance of winning if draws are ignored).

### 2. Rating Update Formula

After a game, ratings are updated based on the actual outcome compared to the expected outcome.

**Formula:**

```
R'_A = R_A + K × (S_A - E_A)
```

Where:
- `R'_A` = New rating for player A
- `R_A` = Old rating for player A
- `K` = K-factor (determines rating volatility)
- `S_A` = Actual score (1 for win, 0.5 for draw, 0 for loss)
- `E_A` = Expected score (from the formula above)

**The Update Difference:**

```
ΔR_A = K × (S_A - E_A)
```

This difference represents:
- Positive value: Player performed better than expected (rating increases)
- Negative value: Player performed worse than expected (rating decreases)
- Zero: Player performed exactly as expected (no rating change)

### 3. The K-Factor

The K-factor controls how much ratings can change after each game. A single game moves a rating by at most K points.

**Common K-factor values:**
- **K = 32**: High volatility, used for beginners or provisional ratings
- **K = 24**: Medium volatility, used for intermediate players
- **K = 16**: Low volatility, used for established/expert players
- **K = 10**: Very stable, used for top-level players

**Adaptive K-factor example (simplified from the FIDE chess system):**

```
K = 40 if games_played < 30
K = 20 if rating < 2400
K = 10 if rating >= 2400
```

The rules are checked in order, so the first matching line applies. FIDE's regulations include further conditions that are omitted here.

### 4. Rating Difference and Win Probability

The relationship between rating difference and expected score. The win probability column equals the expected score only when draws are ignored:

| Rating Difference | Expected Score | Win Probability |
|-------------------|----------------|-----------------|
| 0                 | 0.50           | 50%             |
| 50                | 0.57           | 57%             |
| 100               | 0.64           | 64%             |
| 200               | 0.76           | 76%             |
| 300               | 0.85           | 85%             |
| 400               | 0.91           | 91%             |
| 500               | 0.95           | 95%             |
| 600               | 0.97           | 97%             |

**Formula for any rating difference:**

```
Win_Probability = 1 / (1 + 10^(-ΔR / 400))
```

Where `ΔR = R_A - R_B`

### 5. Two-Player Zero-Sum Property

In a two-player game where both players use the same K-factor, the rating changes are equal and opposite:

```
ΔR_A = -ΔR_B
```

This is because:

```
E_A + E_B = 1
S_A + S_B = 1  (for wins, losses, and draws alike)
```

Therefore:

```
ΔR_A = K × (S_A - E_A)
ΔR_B = K × (S_B - E_B)
     = K × ((1 - S_A) - (1 - E_A))
     = -K × (S_A - E_A)
     = -ΔR_A
```

If the two players have different K-factors (for example, one is provisional), the changes are no longer equal in size.

## Comprehensive Example

**Scenario:** Player A (rating 1800) plays Player B (rating 1700), K = 32

**Step 1: Calculate Expected Scores**

```
E_A = 1 / (1 + 10^((1700 - 1800) / 400))
E_A = 1 / (1 + 10^(-0.25))
E_A = 1 / (1 + 0.562)
E_A ≈ 0.64

E_B = 1 - E_A ≈ 0.36
```

**Step 2: Actual Outcome - Player B Wins (upset!)**

```
S_A = 0 (loss)
S_B = 1 (win)
```

**Step 3: Calculate Rating Changes**

```
ΔR_A = 32 × (0 - 0.64) = 32 × (-0.64) = -20.48 ≈ -20
ΔR_B = 32 × (1 - 0.36) = 32 × (0.64) = 20.48 ≈ +20
```

**Step 4: New Ratings**

```
R'_A = 1800 + (-20) = 1780
R'_B = 1700 + 20 = 1720
```

Player B gained 20 points for the upset victory, while player A lost 20 points.

## Multi-Player Extensions

For games with more than two players, the Elo system can be extended:

**Pairwise Comparison Method:**

Each player's rating change is the sum of their changes against all opponents:

```
ΔR_i = K × Σ(S_ij - E_ij)
```

Where:
- `i` = player being rated
- `j` = each opponent (the sum runs over all `j ≠ i`)
- `S_ij` = actual score against opponent j
- `E_ij` = expected score against opponent j

## Mathematical Properties

**1. Conservation of Rating Points:** In a closed system with only two-player games, the total number of rating points remains constant, provided both players in each game use the same K-factor and no rounding or rating floor is applied.

**2. Logistic Distribution:** The expected score formula uses a logistic curve, which creates smooth probability transitions.

**3. Rating Scale Calibration:** The choice of 400 in the formula means a 400-point difference corresponds to 10:1 odds (an expected score of about 0.91 versus 0.09).

**4. Convergence:** Over many games, ratings tend toward players' true skill levels. A larger K-factor converges faster but produces noisier ratings.

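
The expected score and update formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `expected_score` and `update_ratings` are hypothetical, both players are assumed to share one K-factor, and no rounding or rating floor is applied.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B: 1 / (1 + 10^((R_B - R_A) / scale))."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return (new_a, new_b) after one game.

    score_a is A's actual score: 1 for a win, 0.5 for a draw, 0 for a loss.
    Assumption: both players share the same K-factor, so the update is zero-sum.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)  # ΔR_A = K × (S_A - E_A)
    return rating_a + delta, rating_b - delta  # ΔR_B = -ΔR_A


# Scenario from the comprehensive example: A (1800) loses to B (1700), K = 32.
new_a, new_b = update_ratings(1800, 1700, score_a=0.0)
print(round(new_a, 2), round(new_b, 2))
```

Run on the scenario from the comprehensive example, this gives roughly 1779.5 and 1720.5, which the worked example rounds to 1780 and 1720. The two new ratings still sum to 3500, as the zero-sum property predicts.
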
## Implementation Considerations

When implementing Elo ratings:

1. **Initial Ratings:** Typically start players at 1200, 1500, or 1600
2. **Minimum Ratings:** Consider setting a floor (e.g., 100) to prevent negative ratings
3. **Rating Inflation/Deflation:** Monitor average ratings over time
4. **Provisional Periods:** Use higher K-factors for new players
5. **Inactivity Decay:** Consider rating decay for inactive players
6. **Draw Handling:** Use S = 0.5 for both players in draws

## Extensions and Variants

**Glicko and Glicko-2:**

Adds rating deviation (RD) to account for uncertainty:

```
RD² = rating variance (higher = more uncertain)
```

**TrueSkill:**

Microsoft's system using Bayesian inference with skill mean (μ) and skill standard deviation (σ).

**Elo with Home Advantage:**

Add a constant to the home player's rating in expected score calculation:

```
E_home = 1 / (1 + 10^((R_away - (R_home + H)) / 400))
```

Where H is the home advantage (typically 30-100 points).

## References

- Elo, A. E. (1978). *The Rating of Chessplayers, Past and Present*
- FIDE Handbook: Rating Regulations
- Glickman, M. E. (1999). "Parameter estimation in large dynamic paired comparison experiments"