{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Course 2 week 1 lecture notebook Exercise 03\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Combine features\n", "\n", "In this exercise, you will practice how to combine features in a pandas dataframe. This will help you in the graded assignment at the end of the week. \n", "\n", "In addition, you will explore why it makes more sense to multiply two features rather than add them in order to create interaction terms.\n", "\n", "First, you will generate some data to work with." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import pandas\n", "import pandas as pd\n", "\n", "# Import a pre-defined function that generates data\n", "from utils import load_data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Generate features and labels\n", "X, y = load_data(100)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeSystolic_BPDiastolic_BPCholesterol
077.19634078.78420887.02656982.760275
163.529850105.17167683.39611380.923284
269.003986117.58225991.16196692.915422
382.63821094.13120869.47042395.766098
478.346286105.38518687.250583120.868124
\n", "
" ], "text/plain": [ " Age Systolic_BP Diastolic_BP Cholesterol\n", "0 77.196340 78.784208 87.026569 82.760275\n", "1 63.529850 105.171676 83.396113 80.923284\n", "2 69.003986 117.582259 91.161966 92.915422\n", "3 82.638210 94.131208 69.470423 95.766098\n", "4 78.346286 105.385186 87.250583 120.868124" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Age', 'Systolic_BP', 'Diastolic_BP', 'Cholesterol'], dtype='object')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "feature_names = X.columns\n", "feature_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combine strings\n", "Even though you can visually see feature names and type the name of the combined feature, you can programmatically create interaction features so that you can apply this to any dataframe.\n", "\n", "Use f-strings to combine two strings. There are other ways to do this, but Python's f-strings are quite useful." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "name1: Age\n", "name2: Systolic_BP\n" ] } ], "source": [ "name1 = feature_names[0]\n", "name2 = feature_names[1]\n", "\n", "print(f\"name1: {name1}\")\n", "print(f\"name2: {name2}\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Age_&_Systolic_BP'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Combine the names of two features into a single string, separated by '_&_' for clarity\n", "combined_names = f\"{name1}_&_{name2}\"\n", "combined_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add two columns\n", "- Add the values from two columns and put them into a new column.\n", "- You'll do something similar in this week's assignment." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeSystolic_BPDiastolic_BPCholesterolAge_&_Systolic_BP
077.1963478.78420887.02656982.760275155.980548
163.52985105.17167683.39611380.923284168.701526
\n", "
" ], "text/plain": [ " Age Systolic_BP Diastolic_BP Cholesterol Age_&_Systolic_BP\n", "0 77.19634 78.784208 87.026569 82.760275 155.980548\n", "1 63.52985 105.171676 83.396113 80.923284 168.701526" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[combined_names] = X['Age'] + X['Systolic_BP']\n", "X.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Why we multiply two features instead of adding\n", "\n", "Why do you think it makes more sense to multiply two features together rather than adding them together?\n", "\n", "Please take a look at two features, and compare what you get when you add them, versus when you multiply them together." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
v1v2v1 + v2v1 x v2
01100101100
11200201200
21300301300
32100102200
42200202400
52300302600
63100103300
73200203600
83300303900
\n", "
" ], "text/plain": [ " v1 v2 v1 + v2 v1 x v2\n", "0 1 100 101 100\n", "1 1 200 201 200\n", "2 1 300 301 300\n", "3 2 100 102 200\n", "4 2 200 202 400\n", "5 2 300 302 600\n", "6 3 100 103 300\n", "7 3 200 203 600\n", "8 3 300 303 900" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate a small dataset with two features\n", "df = pd.DataFrame({'v1': [1,1,1,2,2,2,3,3,3],\n", " 'v2': [100,200,300,100,200,300,100,200,300]\n", " })\n", "\n", "# add the two features together\n", "df['v1 + v2'] = df['v1'] + df['v2']\n", "\n", "# multiply the two features together\n", "df['v1 x v2'] = df['v1'] * df['v2']\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It may not be immediately apparent how adding or multiplying makes a difference; either way you get unique values for each of these operations.\n", "\n", "To view the data in a more helpful way, rearrange the data (pivot it) so that:\n", "- feature 1 is the row index \n", "- feature 2 is the column name. \n", "- Then set the sum of the two features as the value. \n", "\n", "Display the resulting data in a heatmap." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Import seaborn in order to use a heatmap plot\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "v1 + v2\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
v2100200300
v1
1101201301
2102202302
3103203303
\n", "
" ], "text/plain": [ "v2 100 200 300\n", "v1 \n", "1 101 201 301\n", "2 102 202 302\n", "3 103 203 303" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEGCAYAAABFBX+4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAARuUlEQVR4nO3df7BndV3H8edrlwVLEMIFwt0tcIIMUgFzoUxDyQTGcfvDIWyUhRjvVFRQzBjgH0STReZgOE5M28AoDiOCkFJhuRpqWgstuIKwGuuv2HVhB0F+RKH33nd/fM/it22/997v7vfe7/24z8fOmXu+n3O+53zm/PHe9/f9+ZxzUlVIktqxZNwdkCQNx8AtSY0xcEtSYwzcktQYA7ckNWa/cXdgkP32X+F0l3n2heUnj7sLP/RO+uKfjbsL+4RlR/x09vYY33/063OOOcuWv3ivz7c3zLglqTGLNuOWpAU1PTXuHsyZgVuSAKYmx92DOTNwSxJQNT3uLsyZgVuSAKYN3JLUFjNuSWqMg5OS1BgzbklqSzmrRJIa4+CkJDXGUokkNcbBSUlqjBm3JDXGwUlJaoyDk5LUlipr3JLUFmvcktQYSyWS1BgzbklqzNT3x92DOTNwSxI0VSrxZcGSBL1SyVyXGSRZleSOJA8kuT/JhV37CUk2JNmUZGOS1V17krwvyZYk9yY5abaumnFLEowy454ELq6qe5IcBNydZD3wbuCKqvpEkjO7z6cCZwDHdMvJwDXd34EM3JIEIwvcVbUd2N6tP5VkM7ACKOAF3W4HA9/u1tcA11dVARuSHJLkyO44u2XgliSghhicTDIBTPQ1rauqdbvZ7yjgROBO4CLgn5K8h16Z+he63VYAD/V9bWvXZuCWpBkNMR2wC9L/L1D3S3IgcAtwUVU9meRPgN+vqluSnAVcC/zynnTVwUlJgl6pZK7LLJIsoxe0b6iqW7vmtcDO9ZuB1d36NmBV39dXdm0DGbglCUY5qyT0sunNVXVV36ZvA7/Urb8OeLBbvw04p5tdcgrwxEz1bbBUIkk9o5tV8irgbcB9STZ1bZcBbweuTrIf8D/8oEZ+O3AmsAV4BjhvthMYuCUJRnbLe1V9HsiAza/Yzf4FXDDMOQzckgQw6YsUJKktPmRKkhrT0LNKDNySBGbcktSchjLuBZ/HnWTWqS6StOBGNI97IYzjBpwrBm1IMtE97nDj9PR/LWSfJO3rJifnvozZvJRKktw7aBNwxKDv9d//v9/+K2oeuiZJu1fthJz5qnEfAbwBeHyX9gD/Ok/nlKQ911CNe74C998DB1bVpl03JPnMPJ1Tkvbcvh64q+r8Gbb9+nycU5L2yiIYdJwrpwNKEsDU1Lh7MGcGbkkCSyWS1BwDtyQ1xhq3JLWlpp3HLUltsVQiSY1xVokkNcaMW5IaY+CWpMb4kClJaowZtyQ1xumAktQYZ5VIUlvKUokkNcZSiSQ1xmeVSFJjzLglqTGTDk5KUlsslUhSYyyVSFJbnA4oSa0x45akxjQUuJeMuwOStChMTc19mUGSVUnuSPJAkvuTXLjL9ouTVJLl3eckeV+SLUnuTXLSbF0145YkRvrOyUng4qq6J8lBwN1J1lfVA0lWAb8C/Gff/mcAx3TLycA13d+BzLglCXqlkrkuM6iq7VV1T7f+FLAZWNFtfi/wDqD/IGuA66tnA3BIkiNnOoeBW5Kg9zzuOS5JJpJs7FsmdnfIJEcBJwJ3JlkDbKuqL+2y2wrgob7PW/lBoN8tSyWSBEMNTlbVOmDdTPskORC4BbiIXvnkMnplkr1m4JYkGOmskiTL6AXtG6rq1iQvBY4GvpQEYCVwT5LVwDZgVd/XV3ZtAxm4JQmoqdHcgJNeZL4W2FxVVwFU1X3A4X37fBP4uap6NMltwO8kuZHeoOQTVbV9pnMs2sCdcXdgH7B0STt3ijUrDiM1Y3QZ96uAtwH3JdnUtV1WVbcP2P924ExgC/AMcN5sJ1i0gVuSFtKopgNW1eeZJfesqqP61gu4YJhzGLglCZq6c9LALUkADVUODdySBNRkO5HbwC1JYMYtSa0Z4bNK5p2BW5LAjFuSWmPGLUmtMeOWpLbU5Lh7MHcGbkkCyoxbkhpj4JaktphxS1JjDNyS1Jiaaudh0gZuScKMW5KaU9Nm3JLUFDNuSWpMlRm3JDXFjFuSGjPtrBJJaouDk5LUGAO3JDWm2nkct4FbksCMW5Ka43RASWrMlLNKJKktZtyS1Bhr3JLUGGeVSFJjzLglqTFT00vG3YU5M3BLEm2VSvbov5gkrx91RyRpnKYrc17GbU9/G1w72w5JXpLktCQH7tJ++h6eU5LmTVXmvIzbwFJJktsGbQJeONNBk/wecAGwGbg2yYVV9fFu858C/7gHfZWkeTOqUkmSVcD1wBFAAeuq6uokhwIfAY4CvgmcVVWPJwlwNXAm8AxwblXdM9M5Zqpxvxp4K/D0rv0CVs/S97cDr6iqp5McBXw0yVFVdXX3/d1KMgFMACxZejBLljx/ltNI0miMsAQyCVxcVfckOQi4O8l64Fzg01V1ZZJLgEuAPwTOAI7plpOBa7q/A80UuDcAz1TVZ3fdkOSrs3R8SVU9DVBV30xyKr3g/ZPMELirah2wDmDZ/isaGiqQ1LpRzSqpqu3A9m79qSSbgRXAGuDUbrcPAp+hF7jXANdXVQEbkhyS5MjuOLs1sKdVdUZV3ZHkD5Ks2GXba2bp+yNJTujb/2ngjcBy4KWzfFeSFlwNsSSZSLKxb5nY3TG7isOJwJ3AEX3B+GF6pRToBfWH+r62tWsbaC7TAQ8CPpnkMXr1mZur6pFZvnMOvZ8Lz6mqSeCcJH89h3NK0oIaplTSXx0YpJuYcQtwUVU92StlP/f9SrLHVYVZfxtU1RVVdTy9wcYjgc8m+dQs39laVQ8P2PaFPeqpJM2jUc4qSbKMXtC+oapu7ZofSXJkt/1IYEfXvg1Y1ff1lV3bQMMUdXbQS++/Axw+xPckadGbHmKZSTdL5Fpgc1Vd1bfpNmBtt74W+Hhf+znpOQV4Yqb6NsyhVJLkt4GzgMOAm4G3V9UDs31PklpSg+dNDOtVwNuA+5Js6touA64EbkpyPvAtenEV4HZ6UwG30JsOeN5sJ5hLjXsVvRrNpln3lKRGTY5oOmBVfZ7Bs+dO283+Ra8UPWezBu6qunSYA0pSi0aYcc87HzIlScxeu15MDNyShBm3JDXHjFuSGjNlxi1JbWnozWUGbkkCmDbjlqS2tPQ4UgO3JOHgpCQ1ZzqWSiSpKVPj7sAQDNyShLNKJKk5ziqRpMY4q0SSGmOpRJIa43RASWrMlBm3JLXFjFuSGmPglqTGjOiVkwvCwC1JmHFLUnO85V2SGuM8bklqjKUSSWqMgVuSGuOzSiSpMda4JakxzioZgTT0GqFWLUlLPw4btWTpuHugOZpuqFiyaAO3JC0kByclqTHt5NsGbkkCzLglqTmTDY35GLglibZKJUvG3QFJWgymh1hmk+S6JDuSfHmX9t9N8pUk9yd5d1/7pUm2JPlqkjfMdnwzbkli5NMBPwC8H7h+Z0OS1wJrgJdX1bNJDu/ajwPOBo4HXgR8KsmxVTVwarkZtyTRK5XMdZn1WFWfAx7bpfm3gCur6tlunx1d+xrgxqp6tqq+AWwBVs90fAO3JDFcqSTJRJKNfcvEHE5xLPDqJHcm+WySV3btK4CH+vbb2rUNZKlEkoCpIUolVbUOWDfkKfYDDgVOAV4J3JTkxUMe47kDSdI+bwHmcW8Fbq2qAu5KMg0sB7YBq/r2W9m1DWSpRJKAGuLfHvoY8FqAJMcC+wOPArcBZyc5IMnRwDHAXTMdyIxbkhhtxp3kw8CpwPIkW4HLgeuA67opgt8D1nbZ9/1JbgIeACaBC2aaUQIGbkkCRjsdsKreMmDTWwfs/y7gXXM9voFbkmjrzkkDtyQBkw2FbgO3JMHeDDouOAO3JOFjXSWpOWbcktQYM25JasxUmXFLUlN8y7skNcYatyQ1xhq3JDXGUokkNcZSiSQ1xlklktQYSyWS1BgHJyWpMda4JakxlkokqTHl4KQktWXKjFuS2mKpBEiyGqiq+vckxwGnA1+pqtvn65yStKf2+VJJksuBM4D9kqwHTgbuAC5JcmL3RmNJWjTMuOHNwAnAAcDDwMqqejLJe4A7GfAa+iQTwATA0qWHsGTp8+epe5L0fzkdECaragp4JsnXqupJgKr67yQD57lX1TpgHcD+B6xs5ypKap63vMP3kvxoVT0DvGJnY5KDaesGJUn7CEsl8JqqehagqvoD9TJg7TydU5L22D4fuHcG7d20Pwo8Oh/nlKS9sc/PKpGk1uzzGbcktcZZJZLUmKlqZ96EgVuSsMYtSc2xxi1JjbHGLUmNmW6oVLJk3B2QpMWghvg3myTXJdmR5Mt9bX+R5CtJ7k3yt0kO6dt2aZItSb6a5A2zHd/ALUn0ZpXMdZmDD9B7lHW/9cDPVtXLgP8ALgXoHnt9NnB8952/SrJ0poMbuCWJXqlkrstsqupzwGO7tH2yqia7jxuAld36GuDGqnq2qr4BbAFWz3R8A7ckMVypJMlEko19y8SQp/sN4BPd+grgob5tW7u2gRyclCSGG5zsfwT1sJK8E5gEbtiT74OBW5KAhZkOmORc4I3AafWDO362Aav6dlvZtQ1kqUSSgKmamvOyJ5KcDrwDeFP3roKdbgPOTnJAkqOBY4C7ZjqWGbckMdpb3pN8GDgVWJ5kK3A5vVkkBwDrkwBsqKrfrKr7k9wEPECvhHJB9waxgQzcksRob3mvqrfspvnaGfZ/FwPexbs7Bm5JwodMSVJzWrrl3cAtSfiQKUlqji9SkKTGWOOWpMZY45akxphxS1JjfHWZJDXGjFuSGuOsEklqjIOTktQYSyWS1BjvnJSkxphxS1JjWqpxp6X/ZRa7JBPdu+g0T7zG889rvPj56rLRGvZNzxqe13j+eY0XOQO3JDXGwC1JjTFwj5Z1wfnnNZ5/XuNFzsFJSWqMGbckNcbALUmNMXAPIcl1SXYk+XJf26FJ1id5sPv7Y117krwvyZYk9yY5aXw9b0OSVUnuSPJAkvuTXNi1e41HKMnzktyV5Evddb6iaz86yZ3d9fxIkv279gO6z1u67UeNs/8ycA/rA8Dpu7RdAny6qo4BPt19BjgDOKZbJoBrFqiPLZsELq6q44BTgAuSHIfXeNSeBV5XVS8HTgBOT3IK8OfAe6vqp4DHgfO7/c8HHu/a39vtpzEycA+hqj4HPLZL8xrgg936B4Ff7Wu/vno2AIckOXJhetqmqtpeVfd0608Bm4EVeI1HqrteT3cfl3VLAa8DPtq173qdd17/jwKnJckCdVe7YeDee0dU1fZu/WHgiG59BfBQ335buzbNQfdz/ETgTrzGI5dkaZJNwA5gPfA14LtVNdnt0n8tn7vO3fYngBcubI/Vz8A9QtWbW+n8yr2U5EDgFuCiqnqyf5vXeDSqaqqqTgBWAquBl4y5SxqCgXvvPbLz53n3d0fXvg1Y1bffyq5NM0iyjF7QvqGqbu2avcbzpKq+C9wB/Dy9UtPOJ4b2X8vnrnO3/WDgOwvcVfUxcO+924C13fpa4ON97ed0Mx9OAZ7o+7mv3ejqptcCm6vqqr5NXuMRSnJYkkO69R8BXk9vPOEO4M3dbrte553X/83AP5d37o2Vd04OIcmHgVOB5cAjwOXAx4CbgJ8AvgWcVVWPdUHo/fRmoTwDnFdVG8fR71Yk+UXgX4D7gJ1vbr2MXp3bazwiSV5Gb7BxKb3k7aaq+uMkLwZuBA4Fvgi8taqeTfI84EP0xhweA86uqq+Pp/cCA7ckNcdSiSQ1xsAtSY0xcEtSYwzcktQYA7ckNcbArWYkOSHJv3VPtLs3ya+Nu0/SODgdUM1Iciy9u94fTPIi4G7gZ7q7/6R9hhm3FqUkVya5oO/zHwFvqqoHAarq2/RufT9sPD2UxsfArcXqI8BZfZ/P6toASLIa2J/eU+2kfcp+s+8iLbyq+mKSw7uSyGH0HuT/EDz3oKkPAWuranqm40g/jAzcWsxupvdQox+ny7aTvAD4B+Cd3csTpH2Og5NatJIcD/wNvYd6/RK9R4l+Avi7qvrLcfZNGidr3Fq0qup+4CBgW/e41rOA1wDnJtnULSeMtZPSGJhxS1JjzLglqTEGbklqjIFbkhpj4Jakxhi4JakxBm5JaoyBW5Ia879ZcqFWENK3uQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Pivot the data so that v1 + v2 is the value\n", "\n", "df_add = df.pivot(index='v1',\n", " columns='v2',\n", " values='v1 + v2'\n", " )\n", "print(\"v1 + v2\\n\")\n", "display(df_add)\n", "print()\n", "sns.heatmap(df_add);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that it doesn't seem like you can easily distinguish clearly when you vary feature 1 (which ranges from 1 to 3), since feature 2 is so much larger in magnitude (100 to 300). This is because you added the two features together." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View the 'multiply' interaction\n", "\n", "Now pivot the data so that:\n", "- feature 1 is the row index \n", "- feature 2 is the column name. \n", "- The values are 'v1 x v2' \n", "\n", "Use a heatmap to visualize the table." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "v1 x v2\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
v2100200300
v1
1100200300
2200400600
3300600900
\n", "
" ], "text/plain": [ "v2 100 200 300\n", "v1 \n", "1 100 200 300\n", "2 200 400 600\n", "3 300 600 900" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEKCAYAAAAyx7/DAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAATQUlEQVR4nO3df7BndX3f8eeL30n8sYCIuEsK6irBEhekBJuGOjBOwDgu6SglHQtlGNdOSaLTTltJZ2rslE7smBKdzDC9BhWsEZD8cGtIGkQ0/RGwJGwoP2JZFWd3u7BRflgkJd573/3j+7n67e398b13v9/93g/7fDCf+Z7v55zvOZ85M7z3fd/nc85JVSFJ6scR0x6AJGltDNyS1BkDtyR1xsAtSZ0xcEtSZwzcktQZA7ckjVmS9yZ5MMlDSd7X+k5IcmeSR9vn8a0/ST6aZHeSB5Kcs9r+DdySNEZJ/jrwbuA84A3A25K8Bng/cFdVbQXuat8BLgG2trYDuGG1Yxi4JWm8fgy4t6qeq6pZ4MvA3wG2Aze1bW4CLm3L24Gba+AeYFOSU1Y6wFGTGffBO+qYzd7SOWFnnXDatIfwgnf2cSv+/6cx+fhjt+dg9/G9b3195JhzzEmvfg+D7HjBTFXNtOUHgeuSnAj8JfBW4D7g5Kra37Z5HDi5LW8G9gzta2/r288yNmzglqSNqgXpmWXWPZLkQ8AfAt8FdgFzi7apJOtOTi2VSBLA/NzobRVVdWNVvbGqLgCeAv4n8MRCCaR9Hmib7wNOHfr5lta3LAO3JAHMzY7eVpHk5e3zRxnUt38T2Alc2Ta5EvhcW94JXNFml5wPPDNUUlmSpRJJAqrmx7m732o17u8B11TV00l+BbgtydXAN4HL2rZ3MKiD7waeA65abecGbkkCmB9f4K6qn1qi79vARUv0F3DNWvZv4JYkgPFm3BNl4JYkGOmi40Zh4JYkMOOWpN7UCLNFNgoDtyTBWC9OTpqBW5LAUokkdceLk5LUGTNuSeqMFyclqTNenJSkvlRZ45akvljjlqTOWCqRpM6YcUtSZ+a+N+0RjMzALUlgqUSSumOpRJI6Y8YtSZ0xcEtSX8qLk5LUGWvcktQZSyWS1BkzbknqjBm3JHXGjFuSOjPrixQkqS9m3JLUGWvcktQZM25J6kxHGfcRh/qASa461MeUpFXV/Ohtyg554AY+uNyKJDuS3Jfkvvn57x7KMUk63M3Ojt6mbCKlkiQPLLcKOHm531XVDDADcNQxm2sCQ5OkpdV4Qk6S1wG3DnW9CviXwCbg3cBftP5fqqo72m+uBa4G5oBfrKr/tNIxJlXjPhn4aeCpRf0B/tuEjilJ6zemGndVfRXYBpDkSGAf8DvAVcD1VfXh4e2TnAlcDrweeCXwhSSvraq55Y4xqcD9eeBFVbVr8YokX5rQMSVp/SZzcfIi4GtV9c0ky22zHbilqp4HvpFkN3Ae8MfL/WAiNe6qurqq/ssy6/7eJI4pSQdlDRcnh6/HtbZjmb1eDnxm6PvPJ3kgyceTHN/6NgN7hrbZ2/qWNY2Lk5K08czNjdyqaqaqzh1qM4t3l+QY4O3AZ1vXDcCrGZRR9gO/ut6hOo9bkmASpZJLgD+tqicAFj4BknyMQUkZBjXwU4d+t6X1LcuMW5JgELhHbaP5OYbKJElOGVr3s8CDbXkncHmSY5OcDmwFvrLSjs24JQnGemNNkh8B3gK8Z6j73ybZBhTw2MK6qnooyW3Aw8AscM1KM0rAwC1JANT8+G4dqarvAicu6vv7K2x/HXDdqPs3cEsSdPWsEgO3JMFgxkgnDNySBGbcktQdA7ckdWZMD5k6FAzckgRm3JLUnTFOB5w0A7ckgbNKJKk3ZalEkjpjqUSSOrMBXgI8KgO3JIEZtyR1Z9aLk5LUF0slktQZSyWS1BenA0pSb8y4JakzBm5J6oy3vEtSX8b5zslJM3BLElgqkaTuOKtEkjpjxi1JnTFwS1Jfas5SyUE764TTpj2EF7z3HH36tIfwgvfOs/ZMewgalRm3JPXF6YCS1BsDtyR1pp8St4FbkgBqtp/IbeCWJDDjlqTe9HRx8ohpD0CSNoT5NbRVJNmU5PYkf57kkSRvSnJCkjuTPNo+j2/bJslHk+xO8kCSc1bbv4Fbkhhk3KO2EXwE+IOqOgN4A/AI8H7grqraCtzVvgNcAmxtbQdww2o7N3BLEowt407yUuAC4EaAqvqrqnoa2A7c1Da7Cbi0LW8Hbq6Be4BNSU5Z6RgGbkkCanb0lmRHkvuG2o6hXZ0O/AXwiST3J/mNJD8CnFxV+9s2jwMnt+XNwPAttntb37K8OClJQK1hVklVzQAzy6w+CjgH+IWqujfJR/hBWWTh95Vk3VdDzbglCcZ5cXIvsLeq7m3fb2cQyJ9YKIG0zwNt/T7g1KHfb2l9yzJwSxKDjHvUtuJ+qh4H9iR5Xeu6CHgY2Alc2fquBD7XlncCV7TZJecDzwyVVJZkqUSSWFupZAS/AHw6yTHA14GrGCTKtyW5GvgmcFnb9g7grcBu4Lm27YoM3JIE1FzGt6+qXcC5S6y6aIltC7hmLfs3cEsSY8+4J8rALUlAzY8v4540A7ckYcYtSd2pMuOWpK6YcUtSZ+bHOKtk0gzckoQXJyWpOwZuSepM9fMCHAO3JIEZtyR1x+mAktSZOWeVSFJfzLglqTPWuCWpM84qkaTOmHFLUmfm5vt5k6OBW5Loq1Syrn9ikrxl3AORpGmar4zcpm29fxvcuNoGSc5IclGSFy3qv3idx5SkianKyG3ali2VJNm53CrgxJV2muQXGbz88hHgxiTvraqFV9H/G+AP1jFWSZqYnkolK9W4fwp4F/Dsov4A562y33cDb6yqZ5OcBtye5LSq+kj7/ZKS7AB2AGx58at42Q+/YpXDSNJ4bIQSyKhWCtz3AM9V1ZcXr0jy1VX2e0RVPQtQVY8leTOD4P3XWCFwV9UMMANw9it+sqN//yT1rqdZJcuOtKouqaq7k/zjJJsXrbtglf0+kWTb0PbPAm8DXgacdTADlqRJqDW0aRtlOuCLgT9M8iRwK/DZqnpild9cAcwOd1TVLHBFkn+/rpFK0gT1VCpZ9W+DqvpgVb2ewcXGU4AvJ/nCKr/ZW1WPL7Puv65rpJI0QS+IWSVLOAA8DnwbePlkhiNJ09HRS95Xz7iT/KMkXwLuYjAN8N1V9eOTHpgkHUpFRm7TNkrGfSrwvqraNenBSNK0zG6AEsioVg3cVXXtoRiIJE3TRsikR+VDpiSJvmrcBm5Joq+Mu59bhSRpgubX0EaR5Mgk9yf5fPv+ySTfSLKrtW2tP0k+mmR3kgeSnLPavs24JQmYG3/G/V4GD9p7yVDfP62q2xdtdwmwtbWfAG5on8sy45YkYD6jt9Uk2QL8DPAbIxx6O3BzDdwDbEpyyko/MHBLEjBPRm5JdiS5b6jtWLS7XwP+Gf9/ZeW6Vg65PsmxrW8zsGdom72tb1kGbklibQ+ZqqqZqjp3qM0s7CfJ24ADVfUniw5xLXAG8DeAE4B/vt6xGrglibFenPxJ4O1JHgNuAS5M8h+qan8rhzwPfIIfvNdgH4MbHRdsaX3LMnBLEjCfjNxWUlXXVtWWqjoNuBz4YlW9a6FunSTApcCD7Sc7GTw5NUnOB56pqv0rHcNZJZIEzE3+EJ9OchKDl8nsAv5h678DeCuwG3gOuGq1HRm4JYnRZousVVV9CfhSW75wmW2KwWOzR2bgliQGs0p6YeCWJDbGK8lGZeCWJCZTKpkUA7ck4dMBJak7c2bcktQXM25J6oyBW5I609ErJw3ckgRm3JLUnUNwy/vYGLglCedxS1J3LJVIUmcM3JLUGZ9VIkmdscYtSZ1xVskYnH3cim+n1xi886w9q2+kg/KST3xi2kPQiOY7KpZs2MAtSYeSFyclqTP95NsGbkkCzLglqTuz6SfnNnBLEpZKJKk7lkokqTNOB5SkzvQTtg3ckgRYKpGk7sx1lHMbuCUJM25J6k6ZcUtSX8y4JakzPU0HPGLaA5CkjaDW0FaS5LgkX0nyZ0keSvLB1n96knuT7E5ya5JjWv+x7fvutv601cZq4JYkYJYaua3ieeDCqnoDsA24OMn5wIeA66vqNcBTwNVt+6uBp1r/9W27FRm4JYnBxclR/1txPwPPtq9Ht1bAhcDtrf8m4NK2vL19p62/KMmKL1IzcEsSg4uTo7YkO5LcN9R2DO8ryZFJdgEHgDuBrwFPV9Vs22QvsLktbwb2ALT1zwAnrjRWL05KEmubDlhVM8DMCuvngG1JNgG/A5xx0AMcYsYtSawt4x5VVT0N3A28CdiUZCFZ3gLsa8v7gFMB2vqXAt9eab8GbkkC5qpGbitJclLLtEnyQ8BbgEcYBPB3tM2uBD7Xlne277T1X6xa+SCWSiSJsc7jPgW4KcmRDJLj26rq80keBm5J8q+B+4Eb2/Y3Ap9Ksht4Erh8tQMYuCWJ8d3yXlUPAGcv0f914Lwl+v8P8M61HMPALUl4y7skdaenW94N3JKETweUpO6sNltkIzFwSxKWSiSpO16clKTOWOOWpM5YKpGkzqxyl/mGYuCWJGDOjFuS+mKpBEhyHoOXQfz3JGcCFwN/XlV3TOqYkrReh32pJMkHgEuAo5LcCfwEg0cavj/J2VV13SSOK0nrZcY9eKbsNuBY4HFgS1V9J8mHgXuBJQN3e/3PDoC/ecLZvO7Fr5rQ8CTp/9XTdMBJvUhhtqrmquo54GtV9R2AqvpLVpjnXlUzVXVuVZ1r0JZ0KI3rRQqHwqQy7r9K8sMtcL9xoTPJS+nrBiVJhwlLJXBBVT0PUFXDgfpofvCKHknaMA77wL0QtJfo/xbwrUkcU5IOxmE/q0SSenPYZ9yS1JueZpUYuCUJmKt+5k0YuCUJa9yS1B1r3JLUGWvcktSZeUslktQXM25J6oyzSiSpM5ZKJKkzlkokqTNm3JLUGTNuSerMXM1Newgjm9QbcCSpK1U1cltNko8nOZDkwaG+X06yL8mu1t46tO7aJLuTfDXJT6+2fzNuSWLst7x/Evh14OZF/ddX1YeHO5KcCVwOvB54JfCFJK+tWv5PADNuSWK8GXdV/RHw5IiH3g7cUlXPV9U3gN3AeSv9wMAtSQxmlYzakuxIct9Q2zHiYX4+yQOtlHJ869sM7BnaZm/rW5aBW5IYzCoZ+b+qmao6d6jNjHCIG4BXA9uA/cCvrnes1rglicnf8l5VTywsJ/kY8Pn2dR9w6tCmW1rfssy4JYnx1riXkuSUoa8/CyzMONkJXJ7k2CSnA1uBr6y0LzNuSWK8d04m+QzwZuBlSfYCHwDenGQbUMBjwHsAquqhJLcBDwOzwDUrzSgBA7ckAeN9dVlV/dwS3TeusP11wHWj7t/ALUn46jJJ6o4vC5akzvgiBUnqjI91laTOWCqRpM74PG5J6owZtyR1pqcad3r6V2ajS7JjxIfNaJ08x5PnOd74fFbJeI36aEetn+d48jzHG5yBW5I6Y+CWpM4YuMfLuuDkeY4nz3O8wXlxUpI6Y8YtSZ0xcEtSZwzca9DezHwgyYNDfSckuTPJo+3z+NafJB9Nsru91fmc6Y28D0lOTXJ3koeTPJTkva3fczxGSY5L8pUkf9bO8wdb/+lJ7m3n89Ykx7T+Y9v33W39adMcvwzca/VJ4OJFfe8H7qqqrcBd7TvAJQzeHbeVwbzYGw7RGHs2C/yTqjoTOB+4JsmZeI7H7Xngwqp6A4M3jl+c5HzgQ8D1VfUa4Cng6rb91cBTrf/6tp2myMC9BlX1R8CTi7q3Aze15ZuAS4f6b66Be4BNi14WqkWqan9V/Wlb/t/AI8BmPMdj1c7Xs+3r0a0VcCFwe+tffJ4Xzv/twEVJcoiGqyUYuA/eyVW1vy0/DpzcljcDe4a229v6NIL25/jZwL14jscuyZFJdgEHgDuBrwFPV9Vs22T4XH7/PLf1zwAnHtoRa5iBe4xqMLfS+ZUHKcmLgN8C3ldV3xle5zkej6qaq6ptwBbgPOCMKQ9Ja2DgPnhPLPx53j4PtP59wKlD221pfVpBkqMZBO1PV9Vvt27P8YRU1dPA3cCbGJSaFp4YOnwuv3+e2/qXAt8+xEPVEAP3wdsJXNmWrwQ+N9R/RZv5cD7wzNCf+1pCq5veCDxSVf9uaJXneIySnJRkU1v+IeAtDK4n3A28o222+DwvnP93AF8s79ybKu+cXIMknwHeDLwMeAL4APC7wG3AjwLfBC6rqidbEPp1BrNQngOuqqr7pjHuXiT5W8B/Bv4HsPDm1l9iUOf2HI9Jkh9ncLHxSAbJ221V9a+SvAq4BTgBuB94V1U9n+Q44FMMrjk8CVxeVV+fzugFBm5J6o6lEknqjIFbkjpj4Jakzhi4JakzBm5J6oyBW91Isi3JH7cn2j2Q5O9Oe0zSNDgdUN1I8loGd70/muSVwJ8AP9bu/pMOG2bc2pCS/EqSa4a+/zLw9qp6FKCq/heDW99Pms4IpekxcGujuhW4bOj7Za0PgCTnAccweKqddFg5avVNpEOvqu5P8vJWEjmJwYP898D3HzT1KeDKqppfaT/SC5GBWxvZZxk81OgVtGw7yUuA3wP+RXt5gnTY8eKkNqwkrwc+xuChXn+bwaNEfx/4j1X1a9McmzRN1ri1YVXVQ8CLgX3tca2XARcA/yDJrta2TXWQ0hSYcUtSZ8y4JakzBm5J6oyBW5I6Y+CWpM4YuCWpMwZuSeqMgVuSOvN/AdERFKbmo1DHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df_mult = df.pivot(index='v1',\n", " columns='v2',\n", " values='v1 x v2'\n", " )\n", "print('v1 x v2')\n", "display(df_mult)\n", "print()\n", "sns.heatmap(df_mult);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how when you multiply the features, the heatmap looks more like a 'grid' shape instead of three vertical bars. \n", "\n", "This means that you are more clearly able to make a distinction as feature 1 varies from 1 to 2 to 3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Discussion\n", "\n", "When you find the interaction between two features, you ideally hope to see how varying one feature makes an impact on the interaction term. This is better achieved by multiplying the two features together rather than adding them together. \n", "\n", "Another way to think of this is that you want to separate the feature space into a \"grid\", which you can do by multiplying the features together.\n", "\n", "In this week's assignment, you will create interaction terms!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This is the end of this practice section.\n", "\n", "Please continue on with the lecture videos!\n", "\n", "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }