{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:2.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Avrupa Futbol Takımlarının Regresyon (Regression)  <br><br> Modeli İle Analizi -Scikit-learn\n",
    "\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dataset' i <a href=\"https://www.kaggle.com\">Kaggle</a>'dan indirebilirsiniz. \n",
    "\n",
    "+25,000 maç\n",
    "+10,000 futbolcu\n",
    "11 Ülkenin şampiyon kulüpleri\n",
    "2008 - 2016 Sezonu\n",
    "Takım ve Futbolcuların Özellikleri\n",
    "Ayrıntılı Maç Detayları (Gol Tipleri, pozisyonlar, korner, çalım , faul, kartlar vs...)  +10,000 maç için."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Gerekli Küphanelerin İçe Aktarımı<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "import pandas as pd \n",
    "from sklearn.tree import DecisionTreeRegressor\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from math import sqrt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Veritabanından Veriyi Pandas DataFrame'ine Çekme\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sqlite3 ile veritabanı bağlantısı kurma\n",
    "cnx = sqlite3.connect('database.sqlite')\n",
    "df = pd.read_sql_query(\"SELECT * FROM Player_Attributes\", cnx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>player_fifa_api_id</th>\n",
       "      <th>player_api_id</th>\n",
       "      <th>date</th>\n",
       "      <th>overall_rating</th>\n",
       "      <th>potential</th>\n",
       "      <th>preferred_foot</th>\n",
       "      <th>attacking_work_rate</th>\n",
       "      <th>defensive_work_rate</th>\n",
       "      <th>crossing</th>\n",
       "      <th>...</th>\n",
       "      <th>vision</th>\n",
       "      <th>penalties</th>\n",
       "      <th>marking</th>\n",
       "      <th>standing_tackle</th>\n",
       "      <th>sliding_tackle</th>\n",
       "      <th>gk_diving</th>\n",
       "      <th>gk_handling</th>\n",
       "      <th>gk_kicking</th>\n",
       "      <th>gk_positioning</th>\n",
       "      <th>gk_reflexes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>218353</td>\n",
       "      <td>505942</td>\n",
       "      <td>2016-02-18 00:00:00</td>\n",
       "      <td>67.0</td>\n",
       "      <td>71.0</td>\n",
       "      <td>right</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "      <td>49.0</td>\n",
       "      <td>...</td>\n",
       "      <td>54.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>218353</td>\n",
       "      <td>505942</td>\n",
       "      <td>2015-11-19 00:00:00</td>\n",
       "      <td>67.0</td>\n",
       "      <td>71.0</td>\n",
       "      <td>right</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "      <td>49.0</td>\n",
       "      <td>...</td>\n",
       "      <td>54.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>218353</td>\n",
       "      <td>505942</td>\n",
       "      <td>2015-09-21 00:00:00</td>\n",
       "      <td>62.0</td>\n",
       "      <td>66.0</td>\n",
       "      <td>right</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "      <td>49.0</td>\n",
       "      <td>...</td>\n",
       "      <td>54.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>66.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>218353</td>\n",
       "      <td>505942</td>\n",
       "      <td>2015-03-20 00:00:00</td>\n",
       "      <td>61.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>right</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "      <td>48.0</td>\n",
       "      <td>...</td>\n",
       "      <td>53.0</td>\n",
       "      <td>47.0</td>\n",
       "      <td>62.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>66.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>218353</td>\n",
       "      <td>505942</td>\n",
       "      <td>2007-02-22 00:00:00</td>\n",
       "      <td>61.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>right</td>\n",
       "      <td>medium</td>\n",
       "      <td>medium</td>\n",
       "      <td>48.0</td>\n",
       "      <td>...</td>\n",
       "      <td>53.0</td>\n",
       "      <td>47.0</td>\n",
       "      <td>62.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>66.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 42 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   id  player_fifa_api_id  player_api_id                 date  overall_rating  \\\n",
       "0   1              218353         505942  2016-02-18 00:00:00            67.0   \n",
       "1   2              218353         505942  2015-11-19 00:00:00            67.0   \n",
       "2   3              218353         505942  2015-09-21 00:00:00            62.0   \n",
       "3   4              218353         505942  2015-03-20 00:00:00            61.0   \n",
       "4   5              218353         505942  2007-02-22 00:00:00            61.0   \n",
       "\n",
       "   potential preferred_foot attacking_work_rate defensive_work_rate  crossing  \\\n",
       "0       71.0          right              medium              medium      49.0   \n",
       "1       71.0          right              medium              medium      49.0   \n",
       "2       66.0          right              medium              medium      49.0   \n",
       "3       65.0          right              medium              medium      48.0   \n",
       "4       65.0          right              medium              medium      48.0   \n",
       "\n",
       "      ...       vision  penalties  marking  standing_tackle  sliding_tackle  \\\n",
       "0     ...         54.0       48.0     65.0             69.0            69.0   \n",
       "1     ...         54.0       48.0     65.0             69.0            69.0   \n",
       "2     ...         54.0       48.0     65.0             66.0            69.0   \n",
       "3     ...         53.0       47.0     62.0             63.0            66.0   \n",
       "4     ...         53.0       47.0     62.0             63.0            66.0   \n",
       "\n",
       "   gk_diving  gk_handling  gk_kicking  gk_positioning  gk_reflexes  \n",
       "0        6.0         11.0        10.0             8.0          8.0  \n",
       "1        6.0         11.0        10.0             8.0          8.0  \n",
       "2        6.0         11.0        10.0             8.0          8.0  \n",
       "3        5.0         10.0         9.0             7.0          7.0  \n",
       "4        5.0         10.0         9.0             7.0          7.0  \n",
       "\n",
       "[5 rows x 42 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "183978"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "silmeden_once = df.shape[0]\n",
    "silmeden_once"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['id', 'player_fifa_api_id', 'player_api_id', 'date', 'overall_rating',\n",
       "       'potential', 'preferred_foot', 'attacking_work_rate',\n",
       "       'defensive_work_rate', 'crossing', 'finishing', 'heading_accuracy',\n",
       "       'short_passing', 'volleys', 'dribbling', 'curve', 'free_kick_accuracy',\n",
       "       'long_passing', 'ball_control', 'acceleration', 'sprint_speed',\n",
       "       'agility', 'reactions', 'balance', 'shot_power', 'jumping', 'stamina',\n",
       "       'strength', 'long_shots', 'aggression', 'interceptions', 'positioning',\n",
       "       'vision', 'penalties', 'marking', 'standing_tackle', 'sliding_tackle',\n",
       "       'gk_diving', 'gk_handling', 'gk_kicking', 'gk_positioning',\n",
       "       'gk_reflexes'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Analizde Kullanacağımız Özellikleri (Features) Tanımlama\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "features = [\n",
    "       'potential', 'crossing', 'finishing', 'heading_accuracy',\n",
    "       'short_passing', 'volleys', 'dribbling', 'curve', 'free_kick_accuracy',\n",
    "       'long_passing', 'ball_control', 'acceleration', 'sprint_speed',\n",
    "       'agility', 'reactions', 'balance', 'shot_power', 'jumping', 'stamina',\n",
    "       'strength', 'long_shots', 'aggression', 'interceptions', 'positioning',\n",
    "       'vision', 'penalties', 'marking', 'standing_tackle', 'sliding_tackle',\n",
    "       'gk_diving', 'gk_handling', 'gk_kicking', 'gk_positioning',\n",
    "       'gk_reflexes']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Y (Target-Sonuç) Tanımlama\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "target = ['overall_rating']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "Data Temizleme<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3624"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bos_satir_sayisi = df.isnull().any(axis=1).sum()\n",
    "bos_satir_sayisi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "180354"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.dropna()\n",
    "sildikten_sonra = df.shape[0]\n",
    "sildikten_sonra"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3624"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Bu işlemin sonucu bos_satir_sayisi = 3624 değerine eşit olmalı\n",
    "silmeden_once - sildikten_sonra "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gördüğünüz gibi sonuç istediğimiz değere eşit çıktı"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Özellikleri (Features) ve Target Değerlerini Ayrıştıma\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X = df[features]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y = df[target]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(180354, 34)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "potential             66.0\n",
       "crossing              49.0\n",
       "finishing             44.0\n",
       "heading_accuracy      71.0\n",
       "short_passing         61.0\n",
       "volleys               44.0\n",
       "dribbling             51.0\n",
       "curve                 45.0\n",
       "free_kick_accuracy    39.0\n",
       "long_passing          64.0\n",
       "ball_control          49.0\n",
       "acceleration          60.0\n",
       "sprint_speed          64.0\n",
       "agility               59.0\n",
       "reactions             47.0\n",
       "balance               65.0\n",
       "shot_power            55.0\n",
       "jumping               58.0\n",
       "stamina               54.0\n",
       "strength              76.0\n",
       "long_shots            35.0\n",
       "aggression            63.0\n",
       "interceptions         41.0\n",
       "positioning           45.0\n",
       "vision                54.0\n",
       "penalties             48.0\n",
       "marking               65.0\n",
       "standing_tackle       66.0\n",
       "sliding_tackle        69.0\n",
       "gk_diving              6.0\n",
       "gk_handling           11.0\n",
       "gk_kicking            10.0\n",
       "gk_positioning         8.0\n",
       "gk_reflexes            8.0\n",
       "Name: 2, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.iloc[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sonuç (Target) değerlerine bakalım."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>overall_rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>67.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>67.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>61.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>61.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>74.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>74.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>74.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>73.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>68.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>65.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>64.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>54.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>51.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>52.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>47.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>53.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>53.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>65.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>66.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>66.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>67.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>68.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>68.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>68.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>69.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>70.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    overall_rating\n",
       "0             67.0\n",
       "1             67.0\n",
       "2             62.0\n",
       "3             61.0\n",
       "4             61.0\n",
       "5             74.0\n",
       "6             74.0\n",
       "7             73.0\n",
       "8             73.0\n",
       "9             73.0\n",
       "10            73.0\n",
       "11            74.0\n",
       "12            73.0\n",
       "13            71.0\n",
       "14            71.0\n",
       "15            71.0\n",
       "16            70.0\n",
       "17            70.0\n",
       "18            70.0\n",
       "19            70.0\n",
       "20            70.0\n",
       "21            70.0\n",
       "22            69.0\n",
       "23            69.0\n",
       "24            69.0\n",
       "25            69.0\n",
       "26            69.0\n",
       "27            69.0\n",
       "28            69.0\n",
       "29            68.0\n",
       "30            65.0\n",
       "31            64.0\n",
       "32            54.0\n",
       "33            51.0\n",
       "34            52.0\n",
       "35            47.0\n",
       "36            53.0\n",
       "37            53.0\n",
       "38            65.0\n",
       "39            66.0\n",
       "40            66.0\n",
       "41            67.0\n",
       "42            68.0\n",
       "43            68.0\n",
       "44            68.0\n",
       "45            69.0\n",
       "46            70.0\n",
       "47            71.0\n",
       "48            70.0\n",
       "49            70.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y.head(50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Dataset' i Eğitim (Train) ve Test Kümelerine Ayrıma <br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=324)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "(1) Lineer Regresyon (Linear Regression): \n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "regression = LinearRegression()\n",
    "regression.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Lineer Regresyon Modeli İle Tahmin Yapma\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 66.51284879],\n",
       "       [ 79.77234615],\n",
       "       [ 66.57371825],\n",
       "       ..., \n",
       "       [ 69.23780133],\n",
       "       [ 64.58351696],\n",
       "       [ 73.6881185 ]])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_prediction = regression.predict(X_test)\n",
    "y_prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Tahmin Edilmesi Gereken Sonuç (Target) Değerinin Ortalamısı Nedir?  \n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>overall_rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>59517.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>68.635818</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>7.041297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>33.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>64.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>69.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>73.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>94.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       overall_rating\n",
       "count    59517.000000\n",
       "mean        68.635818\n",
       "std          7.041297\n",
       "min         33.000000\n",
       "25%         64.000000\n",
       "50%         69.000000\n",
       "75%         73.000000\n",
       "max         94.000000"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "68.63 oranında bir ortalamaya sahip. Tahmin değerlerimizinde bu civarda çıkmasını bekliyoruz."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Lineer Regresyon Modelinin Doğruluğunu Root Mean Square Error Kullanarak Bulma\n",
    "\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Eğer sonuç 0 çıkarsa Modelimiz mükemmel bir tahmin yapmıştır."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "rmse = sqrt(mean_squared_error(y_true = y_test, y_pred = y_prediction))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.805303046855223\n"
     ]
    }
   ],
   "source": [
    "print(rmse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "(2) Karar Ağacı (Decision Tree):\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DecisionTreeRegressor(criterion='mse', max_depth=20, max_features=None,\n",
       "           max_leaf_nodes=None, min_impurity_split=1e-07,\n",
       "           min_samples_leaf=1, min_samples_split=2,\n",
       "           min_weight_fraction_leaf=0.0, presort=False, random_state=None,\n",
       "           splitter='best')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "regression = DecisionTreeRegressor(max_depth=20)\n",
    "regression.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Karar Ağacı Modeli İle Tahmin Yapma\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 62.        ,  84.        ,  62.38666667, ...,  71.        ,\n",
       "        62.        ,  73.        ])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_prediction = regression.predict(X_test)\n",
    "y_prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "For comparision: What is the mean of the expected target value in test set ?\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>overall_rating</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>59517.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>68.635818</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>7.041297</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>33.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>64.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>69.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>73.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>94.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       overall_rating\n",
       "count    59517.000000\n",
       "mean        68.635818\n",
       "std          7.041297\n",
       "min         33.000000\n",
       "25%         64.000000\n",
       "50%         69.000000\n",
       "75%         73.000000\n",
       "max         94.000000"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-family: Arial; font-size:1.75em;color:purple; font-style:bold\"><br>\n",
    "\n",
    "Karar Ağacı Modelinin Doğruluğunu Root Mean Square Error Kullanarak Bulma\n",
    "\n",
    "<br><br></p>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "rmse = sqrt(mean_squared_error(y_true = y_test, y_pred = y_prediction))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.461295383115322\n"
     ]
    }
   ],
   "source": [
    "print(rmse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sonuç:\n",
    "\n",
    "#### Karar Ağacı Modeli (Decision Tree) , Linear REgression Modeline göre daha iyi bir sonuç verdi."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}