{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-variable Linear Regression\n", "> In this post, we cover the basic concept of multi-variable linear regression. Unlike simple linear regression, multi-variable linear regression has several independent variables (features), so its hypothesis differs from the one we saw in previous posts.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Tensorflow, Machine_Learning]\n", "- image: " ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (10, 8)\n", "plt.style.use('seaborn')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothesis\n", "\n", "In simple linear regression, we expressed the hypothesis like this:\n", "\n", "$$ H(x) = W x + b $$\n", "\n", "But most real-world problems involve several variables. For the case with 3 variables, the hypothesis expands to\n", "\n", "$$ H(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b $$\n", "\n", "The cost function for this hypothesis also changes accordingly from the one we saw in the previous post:\n", "\n", "$$ \\text{cost}(W, b) = \\frac{1}{m} \\sum_{i=1}^{m} \\left( H(x_1^{(i)}, x_2^{(i)}, x_3^{(i)}) - y^{(i)} \\right)^2 $$\n", "\n", "## Multi-variables\n", "\n", "In general, if we have $n$ variables for regression, the hypothesis will be\n", "\n", "$$ H(x_1, x_2, x_3, \\dots, x_n) = w_1 x_1 + w_2 x_2 + w_3 x_3 + \\dots + w_n x_n + b $$\n", "\n", "Writing every term out like this quickly becomes unwieldy. Instead, we can express it as a matrix multiplication. If $X$ is the vector of the $x$ values and $W$ is the vector of the weights, the hypothesis in matrix form is\n", "\n", "$$ H(X) = W \\cdot X = \\begin{bmatrix} w_1 & w_2 & \\dots & w_n \\end{bmatrix} \\cdot \\begin{bmatrix} x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_n \\end{bmatrix} = w_1 x_1 + w_2 x_2 + \\dots + w_n x_n $$\n", "\n", "Or, equivalently, we can reverse the order of $W$ and $X$:\n", "\n", "$$ H(X) = X \\cdot W = \\begin{bmatrix} x_1 & x_2 & \\dots & x_n \\end{bmatrix} \\cdot \\begin{bmatrix} w_1 \\\\ w_2 \\\\ \\vdots \\\\ w_n \\end{bmatrix} = w_1 x_1 + w_2 x_2 + \\dots + w_n x_n $$\n", "\n", "> Note: We omit the bias term ($b$) for simplicity. If you want to keep the bias explicit, just increase each shape by 1 and append the bias term:\n", "\n", "$$ H(X) = X \\cdot W = \\begin{bmatrix} x_1 & x_2 & \\dots & x_n & 1 \\end{bmatrix} \\cdot \\begin{bmatrix} w_1 \\\\ w_2 \\\\ \\vdots \\\\ w_n \\\\ b \\end{bmatrix} = w_1 x_1 + w_2 x_2 + \\dots + w_n x_n + b $$\n", "\n", "One advantage of matrix multiplication is parallelization. So far we have written the formula for only a single row of $X$. What if $X$ has many rows? The same formula extends naturally:\n", "\n", "$$ H(X) = X \\cdot W = \\begin{bmatrix} x_{11} & x_{12} & \\dots & x_{1n} \\\\ x_{21} & x_{22} & \\dots & x_{2n} \\end{bmatrix} \\cdot \\begin{bmatrix} w_1 \\\\ w_2 \\\\ \\vdots \\\\ w_n \\end{bmatrix} = \\begin{bmatrix} w_1 x_{11} + w_2 x_{12} + \\dots + w_n x_{1n} \\\\ w_1 x_{21} + w_2 x_{22} + \\dots + w_n x_{2n} \\end{bmatrix} $$\n", "\n", "We can also give the weight term more than one column, which amounts to adding another layer of weight vectors. With matrix multiplication we never need to expand these sums by hand, and GPUs (short for Graphics Processing Units) are especially fast at matrix multiplication thanks to their architecture." ] },
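{ "cell_type": "markdown", "metadata": {}, "source": [ "Before moving on, here is a quick sanity check of the matrix form. This is only an illustrative sketch: the numbers and the names `X_demo`, `W_demo`, and `by_hand` are made up for this cell alone. The point is that `tf.matmul(X, W)` returns exactly the row-wise sums $w_1 x_1 + w_2 x_2 + \\dots + w_n x_n$ written out above." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sanity check of the matrix-form hypothesis (arbitrary illustrative numbers)\n", "X_demo = tf.constant([[1., 2., 3.],\n", "                      [4., 5., 6.]])          # 2 samples x 3 variables\n", "W_demo = tf.constant([[10.], [20.], [30.]])   # 3 weights, shape (3, 1)\n", "\n", "# The same values written out term by term: w1*x1 + w2*x2 + w3*x3 for each row\n", "by_hand = tf.constant([[10.*1. + 20.*2. + 30.*3.],\n", "                       [10.*4. + 20.*5. + 30.*6.]])\n", "\n", "print(tf.matmul(X_demo, W_demo).numpy())   # matrix form\n", "print(by_hand.numpy())                     # expanded form -- same values: 140. and 320." ] },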
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-variable Linear Regression in Tensorflow\n", "Suppose we have three independent variables. Then the hypothesis will be\n", "\n", "$$ H(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b $$\n", "\n", "We set up the dataset, initialize the weights randomly, and define the hypothesis in tensorflow." ] },
{ "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Data\n", "x1 = [73., 93., 89., 96., 73.]\n", "x2 = [80., 88., 91., 98., 66.]\n", "x3 = [75., 93., 90., 100., 70.]\n", "y = [152., 185., 180., 196., 142.]\n", "\n", "# Random initial weights\n", "w1 = tf.Variable(tf.random.normal([1]))\n", "w2 = tf.Variable(tf.random.normal([1]))\n", "w3 = tf.Variable(tf.random.normal([1]))\n", "b = tf.Variable(tf.random.normal([1]))\n", "\n", "# Hypothesis\n", "h = w1 * x1 + w2 * x2 + w3 * x3 + b" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can now build the training process with gradient descent." ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 0 | cost: 7312.8384\n", "epoch: 100 | cost: 35.9158\n", "epoch: 200 | cost: 34.0776\n", "epoch: 300 | cost: 32.3360\n", "epoch: 400 | cost: 30.6861\n", "epoch: 500 | cost: 29.1231\n", "epoch: 600 | cost: 27.6422\n", "epoch: 700 | cost: 26.2393\n", "epoch: 800 | cost: 24.9103\n", "epoch: 900 | cost: 23.6510\n" ] } ], "source": [ "learning_rate = 0.00001\n", "\n", "for e in range(1000):\n", "    # Record operations so that gradients of the cost can be computed\n", "    with tf.GradientTape() as tape:\n", "        h = w1 * x1 + w2 * x2 + w3 * x3 + b\n", "        cost = tf.reduce_mean(tf.square(h - y))\n", "\n", "    # Calculate the gradient of the cost with respect to each weight\n", "    w1_grad, w2_grad, w3_grad, b_grad = tape.gradient(cost, [w1, w2, w3, b])\n", "\n", "    # Update the weights\n", "    w1.assign_sub(learning_rate * w1_grad)\n", "    w2.assign_sub(learning_rate * w2_grad)\n", "    w3.assign_sub(learning_rate * w3_grad)\n", "    b.assign_sub(learning_rate * b_grad)\n", "\n", "    if e % 100 == 0:\n", "        print(\"epoch: {:5} | cost: {:12.4f}\".format(e, cost.numpy()))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can also express this in matrix multiplication form. To do so, we first merge the dataset into a single numpy array." ] },
{ "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "data = np.array([\n", "    [73., 80., 75., 152.],\n", "    [93., 88., 93., 185.],\n", "    [89., 91., 90., 180.],\n", "    [96., 98., 100., 196.],\n", "    [73., 66., 70., 142.]\n", "], dtype=np.float32)\n", "\n", "X = data[:, :-1]\n", "y = data[:, [-1]]\n", "\n", "W = tf.Variable(tf.random.normal(shape=[X.shape[1], 1]))\n", "b = tf.Variable(tf.random.normal(shape=[1]))\n", "\n", "# Express the hypothesis as a predict function\n", "def predict(X):\n", "    return tf.matmul(X, W) + b" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "> Note: it may look confusing that `y` is built by slicing with `[-1]` (a list index) rather than plain `-1`. Slicing with `[-1]` keeps `y` in matrix form as a column of shape `(5, 1)` instead of a flat `(5,)` vector; the shape checks a couple of cells below show the difference." ] },
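{ "cell_type": "markdown", "metadata": {}, "source": [ "One practical reason to keep `y` in column form: the cost used below subtracts `y` from `predict(X)`, which has shape `(5, 1)`. A flat `y` of shape `(5,)` would silently broadcast that subtraction into a `(5, 5)` matrix and give a wrong cost. The quick check below is only illustrative (`y_flat` is a throwaway name, not used elsewhere)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Shape check: a flat target would broadcast against the (5, 1) predictions\n", "y_flat = data[:, -1]                 # shape (5,), not what we want\n", "\n", "print((predict(X) - y).shape)        # (5, 1) -> element-wise residuals\n", "print((predict(X) - y_flat).shape)   # (5, 5) -> silent broadcasting bug" ] },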
" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5,)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[:, -1].shape" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 1)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[:, [-1]].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same learning process with gradient descent is applied," ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 0 | cost: 103242.9219\n", "epoch: 100 | cost: 1.8692\n", "epoch: 200 | cost: 1.8591\n", "epoch: 300 | cost: 1.8491\n", "epoch: 400 | cost: 1.8392\n", "epoch: 500 | cost: 1.8294\n", "epoch: 600 | cost: 1.8198\n", "epoch: 700 | cost: 1.8103\n", "epoch: 800 | cost: 1.8009\n", "epoch: 900 | cost: 1.7917\n", "epoch: 1000 | cost: 1.7825\n", "epoch: 1100 | cost: 1.7735\n", "epoch: 1200 | cost: 1.7645\n", "epoch: 1300 | cost: 1.7557\n", "epoch: 1400 | cost: 1.7469\n", "epoch: 1500 | cost: 1.7382\n", "epoch: 1600 | cost: 1.7296\n", "epoch: 1700 | cost: 1.7211\n", "epoch: 1800 | cost: 1.7127\n", "epoch: 1900 | cost: 1.7044\n" ] } ], "source": [ "learning_rate = 0.00001\n", "\n", "for e in range(2000):\n", " # Record the gradient history of the cost function\n", " with tf.GradientTape() as tape:\n", " cost = tf.reduce_mean(tf.square(predict(X) - y))\n", " \n", " # Calculate the gradient of each weight\n", " W_grad, b_grad = tape.gradient(cost, [W, b])\n", " \n", " # update the weight\n", " W.assign_sub(learning_rate * W_grad)\n", " b.assign_sub(learning_rate * b_grad)\n", " \n", " if e % 100 == 0:\n", " print(\"epoch: {:5} | cost: {:12.4f}\".format(e, cost.numpy()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see from the result, cost is decreased significantly while 100 epoch are passed. And you can also notice the advantage from matrix multiplication that we don't need to define weight vector manually. ($w_1, w_2, w_3 \\to W$)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "In this post, we expand the linear regression from single variable to multi variables. And using matrix multiplication notation, it helps to operate gradient descent effectively." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }