{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Analysis of European Football Teams with a Regression

Model - Scikit-learn\n", "\n", "

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can download the dataset from Kaggle.\n", "\n", "+25,000 matches\n", "+10,000 players\n", "Championship clubs from 11 countries\n", "Seasons 2008 to 2016\n", "Team and player attributes\n", "Detailed match events (goal types, positions, corners, dribbles, fouls, cards, etc.) for +10,000 matches." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Importing the Required Libraries

\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import sqlite3\n", "import pandas as pd \n", "from sklearn.tree import DecisionTreeRegressor\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import mean_squared_error\n", "from math import sqrt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Loading Data from the Database into a Pandas DataFrame\n", "

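As a rough sketch of what this loading step does, the example below builds a tiny in-memory SQLite database and queries it with the standard sqlite3 module. The rows are made up for illustration and only three of the real columns are included; the notebook itself reads the full Player_Attributes table from database.sqlite.

```python
import sqlite3

# In-memory database standing in for database.sqlite (illustrative data only).
cnx = sqlite3.connect(':memory:')
cnx.execute(
    'CREATE TABLE Player_Attributes '
    '(id INTEGER PRIMARY KEY, overall_rating REAL, potential REAL)'
)
rows = [(1, 67.0, 71.0), (2, 62.0, 66.0), (3, None, None)]
cnx.executemany('INSERT INTO Player_Attributes VALUES (?, ?, ?)', rows)

# Total number of rows, and the number of rows with a missing rating.
total = cnx.execute('SELECT COUNT(*) FROM Player_Attributes').fetchone()[0]
missing = cnx.execute(
    'SELECT COUNT(*) FROM Player_Attributes WHERE overall_rating IS NULL'
).fetchone()[0]
print(total, missing)
cnx.close()
```

Counting NULL values directly in SQL is one way to get a first impression of data quality before the table is loaded into pandas.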
\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Open the database connection with sqlite3\n", "cnx = sqlite3.connect('database.sqlite')\n", "df = pd.read_sql_query(\"SELECT * FROM Player_Attributes\", cnx)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id player_fifa_api_id player_api_id date overall_rating \\\n", "0 1 218353 505942 2016-02-18 00:00:00 67.0 \n", "1 2 218353 505942 2015-11-19 00:00:00 67.0 \n", "2 3 218353 505942 2015-09-21 00:00:00 62.0 \n", "3 4 218353 505942 2015-03-20 00:00:00 61.0 \n", "4 5 218353 505942 2007-02-22 00:00:00 61.0 \n", "\n", " potential preferred_foot attacking_work_rate defensive_work_rate crossing \\\n", "0 71.0 right medium medium 49.0 \n", "1 71.0 right medium medium 49.0 \n", "2 66.0 right medium medium 49.0 \n", "3 65.0 right medium medium 48.0 \n", "4 65.0 right medium medium 48.0 \n", "\n", " ... vision penalties marking standing_tackle sliding_tackle \\\n", "0 ... 54.0 48.0 65.0 69.0 69.0 \n", "1 ... 54.0 48.0 65.0 69.0 69.0 \n", "2 ... 54.0 48.0 65.0 66.0 69.0 \n", "3 ... 53.0 47.0 62.0 63.0 66.0 \n", "4 ... 53.0 47.0 62.0 63.0 66.0 \n", "\n", " gk_diving gk_handling gk_kicking gk_positioning gk_reflexes \n", "0 6.0 11.0 10.0 8.0 8.0 \n", "1 6.0 11.0 10.0 8.0 8.0 \n", "2 6.0 11.0 10.0 8.0 8.0 \n", "3 5.0 10.0 9.0 7.0 7.0 \n", "4 5.0 10.0 9.0 7.0 7.0 \n", "\n", "[5 rows x 42 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "183978" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "silmeden_once = df.shape[0]\n", "silmeden_once" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['id', 'player_fifa_api_id', 'player_api_id', 'date', 'overall_rating',\n", " 'potential', 'preferred_foot', 'attacking_work_rate',\n", " 'defensive_work_rate', 'crossing', 'finishing', 'heading_accuracy',\n", " 'short_passing', 'volleys', 'dribbling', 'curve', 'free_kick_accuracy',\n", " 'long_passing', 'ball_control', 'acceleration', 'sprint_speed',\n", " 'agility', 'reactions', 'balance', 
'shot_power', 'jumping', 'stamina',\n", " 'strength', 'long_shots', 'aggression', 'interceptions', 'positioning',\n", " 'vision', 'penalties', 'marking', 'standing_tackle', 'sliding_tackle',\n", " 'gk_diving', 'gk_handling', 'gk_kicking', 'gk_positioning',\n", " 'gk_reflexes'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Defining the Features Used in the Analysis\n", "

\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "features = [\n", " 'potential', 'crossing', 'finishing', 'heading_accuracy',\n", " 'short_passing', 'volleys', 'dribbling', 'curve', 'free_kick_accuracy',\n", " 'long_passing', 'ball_control', 'acceleration', 'sprint_speed',\n", " 'agility', 'reactions', 'balance', 'shot_power', 'jumping', 'stamina',\n", " 'strength', 'long_shots', 'aggression', 'interceptions', 'positioning',\n", " 'vision', 'penalties', 'marking', 'standing_tackle', 'sliding_tackle',\n", " 'gk_diving', 'gk_handling', 'gk_kicking', 'gk_positioning',\n", " 'gk_reflexes']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Defining the Target (y)\n", "

\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "target = ['overall_rating']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "Data Cleaning

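The cleaning step counts the rows that contain at least one missing value, drops them, and checks that the row count fell by exactly that number. A minimal sketch on a made-up three-column frame (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy frame with two incomplete rows (None becomes NaN).
toy = pd.DataFrame({
    'overall_rating': [67.0, None, 61.0, 74.0],
    'potential': [71.0, 66.0, None, 76.0],
    'crossing': [49.0, 49.0, 48.0, 80.0],
})

rows_before = toy.shape[0]
# any(axis=1) flags a row as soon as one of its columns is missing.
rows_with_missing = int(toy.isnull().any(axis=1).sum())

cleaned = toy.dropna()
rows_after = cleaned.shape[0]

# The number of dropped rows must match the number of flagged rows.
assert rows_before - rows_after == rows_with_missing
print(rows_before, rows_with_missing, rows_after)
```

Note that dropna() without arguments removes a row if any column is missing, including columns that are not used as features later on.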
\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3624" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bos_satir_sayisi = df.isnull().any(axis=1).sum()\n", "bos_satir_sayisi" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "180354" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.dropna()\n", "sildikten_sonra = df.shape[0]\n", "sildikten_sonra" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3624" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This difference should equal bos_satir_sayisi = 3624\n", "silmeden_once - sildikten_sonra " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the result matches the expected value: 183978 - 180354 = 3624 rows were dropped." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Separating the Features and the Target Values\n", "

\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = df[features]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y = df[target]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(180354, 34)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.shape" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "potential 66.0\n", "crossing 49.0\n", "finishing 44.0\n", "heading_accuracy 71.0\n", "short_passing 61.0\n", "volleys 44.0\n", "dribbling 51.0\n", "curve 45.0\n", "free_kick_accuracy 39.0\n", "long_passing 64.0\n", "ball_control 49.0\n", "acceleration 60.0\n", "sprint_speed 64.0\n", "agility 59.0\n", "reactions 47.0\n", "balance 65.0\n", "shot_power 55.0\n", "jumping 58.0\n", "stamina 54.0\n", "strength 76.0\n", "long_shots 35.0\n", "aggression 63.0\n", "interceptions 41.0\n", "positioning 45.0\n", "vision 54.0\n", "penalties 48.0\n", "marking 65.0\n", "standing_tackle 66.0\n", "sliding_tackle 69.0\n", "gk_diving 6.0\n", "gk_handling 11.0\n", "gk_kicking 10.0\n", "gk_positioning 8.0\n", "gk_reflexes 8.0\n", "Name: 2, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.iloc[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us look at the target values." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " overall_rating\n", "0 67.0\n", "1 67.0\n", "2 62.0\n", "3 61.0\n", "4 61.0\n", "5 74.0\n", "6 74.0\n", "7 73.0\n", "8 73.0\n", "9 73.0\n", "10 73.0\n", "11 74.0\n", "12 73.0\n", "13 71.0\n", "14 71.0\n", "15 71.0\n", "16 70.0\n", "17 70.0\n", "18 70.0\n", "19 70.0\n", "20 70.0\n", "21 70.0\n", "22 69.0\n", "23 69.0\n", "24 69.0\n", "25 69.0\n", "26 69.0\n", "27 69.0\n", "28 69.0\n", "29 68.0\n", "30 65.0\n", "31 64.0\n", "32 54.0\n", "33 51.0\n", "34 52.0\n", "35 47.0\n", "36 53.0\n", "37 53.0\n", "38 65.0\n", "39 66.0\n", "40 66.0\n", "41 67.0\n", "42 68.0\n", "43 68.0\n", "44 68.0\n", "45 69.0\n", "46 70.0\n", "47 71.0\n", "48 70.0\n", "49 70.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.head(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Splitting the Dataset into Training and Test Sets

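With test_size=0.33, scikit-learn rounds the test-set size up, so the 180354 cleaned rows give 59517 test rows, which matches the count that y_test.describe() reports later in the notebook. The pure-Python sketch below illustrates the idea of a seeded shuffle-and-cut split. The helper name simple_split is hypothetical, and its shuffling is not the one train_test_split performs, so the selected rows would differ.

```python
import math
import random

def simple_split(n_rows, test_size, seed):
    # Hypothetical helper: returns (train_indices, test_indices).
    n_test = math.ceil(test_size * n_rows)
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(180354, 0.33, seed=324)
print(len(train_idx), len(test_idx))
```

Fixing the seed, as random_state=324 does in the notebook, makes the split reproducible from one run to the next.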
\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=324)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "(1) Linear Regression:\n", "

\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regression = LinearRegression()\n", "regression.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Making Predictions with the Linear Regression Model\n", "

\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 66.51284879],\n", " [ 79.77234615],\n", " [ 66.57371825],\n", " ..., \n", " [ 69.23780133],\n", " [ 64.58351696],\n", " [ 73.6881185 ]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_prediction = regression.predict(X_test)\n", "y_prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "What Is the Mean of the Target Values to Be Predicted?\n", "

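The mean and standard deviation of the target are useful as a yardstick: a model that always predicts the mean has an RMSE equal to the population standard deviation of the target. The sketch below checks this on a few ratings copied from the y.head(50) output above. On the real test set the standard deviation is about 7.04, so a useful model should score well below that.

```python
import math
import statistics

ratings = [67.0, 67.0, 62.0, 61.0, 61.0, 74.0]
mean_rating = statistics.fmean(ratings)

# RMSE of a baseline that predicts the mean for every row.
squared_errors = [(r - mean_rating) ** 2 for r in ratings]
baseline_rmse = math.sqrt(sum(squared_errors) / len(ratings))

# This equals the population standard deviation of the ratings.
print(baseline_rmse, statistics.pstdev(ratings))
```

The std row printed by describe() is the sample standard deviation, which is practically identical to the population value for a set of 59517 rows.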
\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " overall_rating\n", "count 59517.000000\n", "mean 68.635818\n", "std 7.041297\n", "min 33.000000\n", "25% 64.000000\n", "50% 69.000000\n", "75% 73.000000\n", "max 94.000000" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The target has a mean of about 68.64. We expect the predicted values to fall around this figure as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Measuring the Accuracy of the Linear Regression Model with Root Mean Square Error\n", "\n", "

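RMSE is the square root of the mean of the squared differences between the true and predicted values, so it is expressed in the same units as the rating itself. A hand-rolled sketch with made-up numbers, equivalent in spirit to sqrt(mean_squared_error(...)):

```python
import math

def rmse(y_true, y_pred):
    # Root of the average squared difference between pairs.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

y_true = [67.0, 62.0, 74.0]
y_pred = [66.0, 62.0, 76.0]
error = rmse(y_true, y_pred)       # sqrt((1 + 0 + 4) / 3)
perfect = rmse(y_true, y_true)     # 0.0 for a perfect prediction
print(error, perfect)
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones.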
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the result were 0, the model would have predicted every value perfectly; the closer the RMSE is to 0, the better." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rmse = sqrt(mean_squared_error(y_true = y_test, y_pred = y_prediction))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.805303046855223\n" ] } ], "source": [ "print(rmse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "(2) Decision Tree:\n", "

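A tree as deep as max_depth=20 can memorise much of its training data, so it may be worth comparing the error on the training set with the error on the test set. The sketch below does this on synthetic data. The data, the noise level, and random_state=0 are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(600, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 5, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=324)

tree = DecisionTreeRegressor(max_depth=20, random_state=0)
tree.fit(X_train, y_train)

train_rmse = np.sqrt(mean_squared_error(y_train, tree.predict(X_train)))
test_rmse = np.sqrt(mean_squared_error(y_test, tree.predict(X_test)))
# A large gap between the two suggests the tree is overfitting.
print(train_rmse, test_rmse)
```

If the gap is large, a smaller max_depth or a larger min_samples_leaf are the usual knobs to try.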
\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeRegressor(criterion='mse', max_depth=20, max_features=None,\n", " max_leaf_nodes=None, min_impurity_split=1e-07,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n", " splitter='best')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regression = DecisionTreeRegressor(max_depth=20)\n", "regression.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Making Predictions with the Decision Tree Model\n", "

\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 62. , 84. , 62.38666667, ..., 71. ,\n", " 62. , 73. ])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_prediction = regression.predict(X_test)\n", "y_prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "For comparison: what is the mean of the expected target values in the test set?\n", "

\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " overall_rating\n", "count 59517.000000\n", "mean 68.635818\n", "std 7.041297\n", "min 33.000000\n", "25% 64.000000\n", "50% 69.000000\n", "75% 73.000000\n", "max 94.000000" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Measuring the Accuracy of the Decision Tree Model with Root Mean Square Error\n", "\n", "

\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rmse = sqrt(mean_squared_error(y_true = y_test, y_pred = y_prediction))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.461295383115322\n" ] } ], "source": [ "print(rmse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion:\n", "\n", "#### The Decision Tree model gave a better result than the Linear Regression model (RMSE of about 1.46 versus 2.81)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }