{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple Inputs in Keras\n", "> In this chapter, you will extend your 2-input model to 3 inputs, and learn how to use Keras' summary and plot functions to understand the parameters and topology of your neural networks. By the end of the chapter, you will understand how to extend a 2-input model to 3 inputs and beyond. This is the summary of the lecture \"Advanced Deep Learning with Keras\", via DataCamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Tensorflow-Keras, Deep_Learning]\n", "- image: images/team_strength_model.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (8, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Three-input models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make an input layer for home vs. away\n", "Now you will improve the model you used in the previous chapter for regular season games. There is a well-documented home-team advantage in basketball, so you will add a new input to your model to capture this effect.\n", "\n", "This model will have three inputs: team_id_1, team_id_2, and home. The team IDs are integers that you look up in your team strength model from the previous chapter, and home is a binary variable: 1 if team_1 is playing at home, 0 if not." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " season team_1 team_2 home score_diff score_1 score_2 won\n", "0 1985 3745 6664 0 17 81 64 1\n", "1 1985 126 7493 1 7 77 70 1\n", "2 1985 288 3593 1 7 63 56 1\n", "3 1985 1846 9881 1 16 70 54 1\n", "4 1985 2675 10298 1 12 86 74 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "games_season = pd.read_csv('./dataset/games_season.csv')\n", "games_season.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " season team_1 team_2 home seed_diff score_diff score_1 score_2 won\n", "0 1985 288 73 0 -3 -9 41 50 0\n", "1 1985 5929 73 0 4 6 61 55 1\n", "2 1985 9884 73 0 5 -4 59 63 0\n", "3 1985 73 288 0 3 9 50 41 1\n", "4 1985 3920 410 0 1 -9 54 63 0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "games_tourney = pd.read_csv('./dataset/games_tourney.csv')\n", "games_tourney.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"Team-Strength-Model\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "input_1 (InputLayer) [(None, 1)] 0 \n", "_________________________________________________________________\n", "Team-Strength (Embedding) (None, 1, 1) 10888 \n", "_________________________________________________________________\n", "flatten (Flatten) (None, 1) 0 \n", "=================================================================\n", "Total params: 10,888\n", "Trainable params: 10,888\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "from tensorflow.keras.layers import Embedding, Input, Flatten\n", "from tensorflow.keras.models import Model\n", "\n", "# Count the number of unique teams\n", "n_teams = np.unique(games_season['team_1']).shape[0]\n", "\n", "# Create an embedding layer\n", "team_lookup = Embedding(input_dim=n_teams,\n", " output_dim=1,\n", " input_length=1,\n", " name='Team-Strength')\n", "\n", "# Create an input layer for the team ID\n", "teamid_in = Input(shape=(1, ))\n", "\n", "# Look up the input in the team strength embedding layer\n", "strength_lookup = team_lookup(teamid_in)\n", "\n", "# Flatten the output\n", "strength_lookup_flat = Flatten()(strength_lookup)\n", "\n", "# Combine the operations into a single, reusable model\n", "team_strength_model = Model(teamid_in, strength_lookup_flat, name='Team-Strength-Model')\n", "\n", "team_strength_model.summary()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Concatenate, Dense\n", "\n", "# Create an Input for each team\n", "team_in_1 = Input(shape=(1, ), name='Team-1-In')\n", "team_in_2 = Input(shape=(1, ), name='Team-2-In')\n", "\n", "# Create an input for home vs away\n", "home_in = Input(shape=(1, ), name='Home-In')\n", "\n", "# Look up both team inputs in the same (shared) team strength model\n", "team_1_strength = team_strength_model(team_in_1)\n", "team_2_strength = team_strength_model(team_in_2)\n", "\n", "# Combine the team strengths with the home input using a Concatenate layer,\n", "# then add a Dense layer\n", "\n", "out = Concatenate()([team_1_strength, team_2_strength, home_in])\n", "out = Dense(1)(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a model and compile it\n", "Now that you have the input and output layers for the 3-input model, wrap them in a Keras `Model`, and then compile the model so you can fit it to data and use it to make predictions on new data.\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Make a model\n", "model = Model([team_in_1, team_in_2, home_in], out)\n", "\n", "# Compile the model\n", "model.compile(optimizer='adam', loss='mean_absolute_error')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fit the model and evaluate\n", "Now that you've defined a new model, fit it to the regular season basketball data.\n", "\n", "Then take the fitted `model` (trained on the regular season data) and evaluate it on data for tournament games (`games_tourney`)."
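, "\n",
 "The `loss` and `val_loss` printed by `fit` and the number returned by `evaluate` are all mean absolute errors, since the model was compiled with `loss='mean_absolute_error'`: the average of `|actual score_diff - predicted score_diff|`, in points. A minimal NumPy sketch of that computation, using made-up numbers rather than the course data:\n",
 "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "# Hypothetical actual and predicted score differences (not taken from games_tourney)\n",
 "y_true = np.array([17.0, 7.0, -9.0, 4.0])\n",
 "y_pred = np.array([12.0, 9.0, -3.0, 4.0])\n",
 "\n",
 "# Mean absolute error: average absolute gap between actual and predicted values\n",
 "mae = np.mean(np.abs(y_true - y_pred))\n",
 "print(mae)  # 3.25\n",
 "```\n",
 "\n",
 "Because the absolute value is taken, a prediction that is 6 points too high costs exactly as much as one that is 6 points too low."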
 ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "138/138 [==============================] - 0s 2ms/step - loss: 12.2876 - val_loss: 11.5519\n", "11.686502456665039\n" ] } ], "source": [ "# Fit the model to the games_season dataset\n", "model.fit([games_season['team_1'], games_season['team_2'], games_season['home']],\n", " games_season['score_diff'],\n", " epochs=1, verbose=True, validation_split=0.1, batch_size=2048)\n", "\n", "# Evaluate the model on the games_tourney dataset\n", "print(model.evaluate([games_tourney['team_1'], games_tourney['team_2'], games_tourney['home']],\n", " games_tourney['score_diff'], verbose=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarizing and plotting models\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model summaries\n", "In this exercise, you will take a closer look at the summary of one of your 3-input models, available as `model`. Note how many layers the model has, how many parameters it has, and how many of those parameters are trainable/non-trainable. Because both team inputs pass through the same `Team-Strength-Model`, its 10,888 embedding weights are counted only once; the remaining 4 parameters belong to the final `Dense` layer (3 weights and 1 bias).\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"model\"\n", "__________________________________________________________________________________________________\n", "Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", "Team-1-In (InputLayer) [(None, 1)] 0 \n", "__________________________________________________________________________________________________\n", "Team-2-In (InputLayer) [(None, 1)] 0 \n", "__________________________________________________________________________________________________\n", "Team-Strength-Model (Model) (None, 1) 10888 Team-1-In[0][0] \n", " Team-2-In[0][0] \n", "__________________________________________________________________________________________________\n", "Home-In (InputLayer) [(None, 1)] 0 \n", "__________________________________________________________________________________________________\n", "concatenate (Concatenate) (None, 3) 0 Team-Strength-Model[1][0] \n", " Team-Strength-Model[2][0] \n", " Home-In[0][0] \n", "__________________________________________________________________________________________________\n", "dense (Dense) (None, 1) 4 concatenate[0][0] \n", "==================================================================================================\n", "Total params: 10,892\n", "Trainable params: 10,892\n", "Non-trainable params: 0\n", "__________________________________________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting models\n", "In addition to summarizing your model, you can also plot your model to get a more intuitive sense of it. Note that `plot_model` needs the `pydot` package and Graphviz to be installed."
 ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from tensorflow.keras.utils import plot_model\n", "\n", "# Plot the model graph to a PNG file\n", "plot_model(model, to_file='../images/team_strength_model.png')\n", "\n", "# Display the image\n", "data = plt.imread('../images/team_strength_model.png')\n", "plt.imshow(data);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stacking models\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add the model predictions to the tournament data\n", "In lesson 1 of this chapter, you used the regular season model to make predictions on the tournament dataset, and got pretty good results! Try to improve your predictions for the tournament by modeling the tournament specifically.\n", "\n", "You'll use the prediction from the regular season model as an input to the tournament model. This is a form of \"model stacking.\"\n", "\n", "To start, take the regular season `model` you fit earlier in this chapter and predict on the tournament data. Add this prediction to the tournament data as a new column." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Predict tournament score differences with the regular season model\n", "games_tourney['pred'] = model.predict([games_tourney['team_1'],\n", " games_tourney['team_2'],\n", " games_tourney['home']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an input layer with multiple columns\n", "In this exercise, you will look at a different way to create models with multiple inputs. This method only works for purely numeric data, but it's a much simpler approach to making multivariate neural networks.\n", "\n", "Now you have three numeric columns in the tournament dataset: `'seed_diff'`, `'home'`, and `'pred'`. In this exercise, you will create a neural network that uses a single input layer to process all three of these numeric inputs.\n", "\n", "This model should have a single output to predict the tournament game score difference."
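, "\n",
 "Under the hood, `Dense(1)` applied to `Input(shape=(3, ))` stores a weight matrix of shape (3, 1) and a single bias. Each prediction is therefore a weighted sum of the three columns plus a constant, which gives 4 trainable parameters. A minimal NumPy sketch of that forward pass, with made-up weights standing in for the values Keras would learn:\n",
 "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "# Two hypothetical tournament rows, columns ordered as [home, seed_diff, pred]\n",
 "X = np.array([[0.0, -3.0, -5.0],\n",
 "              [0.0, 4.0, 2.5]])\n",
 "\n",
 "# Hypothetical kernel (shape (3, 1)) and bias (shape (1,)), as Dense(1) would store them\n",
 "w = np.array([[1.0], [0.5], [2.0]])\n",
 "b = np.array([0.25])\n",
 "\n",
 "# Dense(1) with its default linear activation: one output per row\n",
 "y_hat = X @ w + b\n",
 "print(y_hat.shape)  # (2, 1)\n",
 "print(y_hat.ravel())\n",
 "```\n",
 "\n",
 "This is why the single-input approach suits purely numeric data: every column is simply multiplied by a weight, so categorical IDs such as team numbers are better handled by an embedding, as in the team strength model."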
 ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Create an input layer with 3 columns\n", "input_tensor = Input(shape=(3, ))\n", "\n", "# Pass it to a Dense layer with 1 unit\n", "output_tensor = Dense(1)(input_tensor)\n", "\n", "# Create a model\n", "model = Model(input_tensor, output_tensor)\n", "\n", "# Compile the model\n", "model.compile(optimizer='adam', loss='mean_absolute_error')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fit the model\n", "Now that you've enriched the tournament dataset and built a model to make use of the new data, fit that model to the tournament training data.\n", "\n", "Note that this `model` has only one input layer that is capable of handling all 3 inputs, so you can pass a single three-column DataFrame as the input instead of a list.\n", "\n", "Tournament games are split into a training set and a test set: games from the 2010 season and earlier are in the training set, and games after 2010 are in the test set." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Split by season: 2010 and earlier for training, after 2010 for testing\n", "games_tourney_train = games_tourney[games_tourney['season'] <= 2010]\n", "games_tourney_test = games_tourney[games_tourney['season'] > 2010]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "103/103 [==============================] - 0s 715us/step - loss: 9.2063\n" ] } ], "source": [ "# Fit the model on the three numeric columns\n", "model.fit(games_tourney_train[['home', 'seed_diff', 'pred']],\n", " games_tourney_train['score_diff'],\n", " epochs=1,\n", " verbose=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate the model\n", "Now that you've fit your model to the tournament training data, evaluate it on the tournament test data. Recall that the tournament test data contains games from after 2010.\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30/30 [==============================] - 0s 763us/step - loss: 9.1752\n", "9.175172805786133\n" ] } ], "source": [ "# Evaluate the model on the games_tourney_test dataset\n", "print(model.evaluate(games_tourney_test[['home', 'seed_diff', 'pred']],\n", " games_tourney_test['score_diff'],\n", " verbose=True))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }