{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "4fba706f-796a-408b-a294-d640c4821d99", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt\n", "from scipy.optimize import curve_fit" ] }, { "cell_type": "markdown", "id": "2b327e19-d1c1-4d83-94b9-873dc2517188", "metadata": {}, "source": [ "# Linear VS Nonlinear Model " ] }, { "cell_type": "markdown", "id": "3cd25926-6806-48fb-a745-748df2152162", "metadata": {}, "source": [ "Take a look at these models\n", "$$\n", "\\begin{aligned}\n", "&Y_{i}=\\beta_{1}+\\beta_{2}\\left(\\frac{1}{X_{i}}\\right)+u_{i}\\\\\n", "&Y_{i}=\\beta_{1}+\\beta_{2} \\ln X_{i}+u_{i}\\\\\n", "&\\ln Y_{i}=\\beta_{1}+\\beta_{2} X_{i}+u_{i}\\\\\n", "&\\ln Y_{i}=\\ln \\beta_{1}+\\beta_{2} \\ln X_{i}+u_{i}\\\\\n", "&\\ln Y_{i}=\\beta_{1}-\\beta_{2}\\left(\\frac{1}{X_{i}}\\right)+u_{i}\n", "\\end{aligned}\n", "$$\n", "The variables enter in nonlinear form, but every model is linear in its parameters (in the $4$th model we can define $\\alpha_1=\\ln{\\beta_1}$). Models that are linear in parameters, or that can be made so with some mathematical manipulation, are called **intrinsically linear models**.\n" ] }, { "cell_type": "markdown", "id": "e4027fde-8158-4037-a41e-c4f2c560d592", "metadata": {}, "source": [ "How about these two models?\n", "$$\n", "\\begin{aligned}\n", "&Y_{i}=e^{\\beta_{1}+\\beta_{2} X_{i}+u_{i}} \\\\\n", "&Y_{i}=\\frac{1}{1+e^{\\beta_{1}+\\beta_{2} X_{i}+u_{i}}}\n", "\\end{aligned}\n", "$$\n", "The first one is easily converted into linear form by taking the natural log\n", "$$\n", "\\ln{Y_i}=\\beta_{1}+\\beta_{2} X_{i}+u_{i}\n", "$$\n", "The second one is a bit trickier; we will deal with it in more detail in the chapter on binary choice models. But rest assured that with a little manipulation the model becomes\n", "$$\n", "\\ln \\left(\\frac{1-Y_{i}}{Y_{i}}\\right)=\\beta_{1}+\\beta_{2} X_{i}+u_{i}\n", "$$\n", "which is also intrinsically linear." 
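, "\n", "\n", "The manipulation behind the second model can be sketched as follows, writing $Z_i=\\beta_{1}+\\beta_{2} X_{i}+u_{i}$ as shorthand (the symbol $Z_i$ is introduced here only for illustration). Since $0<Y_i<1$ holds by construction, the log in the last step is well defined:\n", "$$\n", "\\begin{aligned}\n", "Y_{i}&=\\frac{1}{1+e^{Z_{i}}}\\\\\n", "\\frac{1}{Y_{i}}&=1+e^{Z_{i}}\\\\\n", "\\frac{1-Y_{i}}{Y_{i}}&=e^{Z_{i}}\\\\\n", "\\ln \\left(\\frac{1-Y_{i}}{Y_{i}}\\right)&=Z_{i}=\\beta_{1}+\\beta_{2} X_{i}+u_{i}\n", "\\end{aligned}\n", "$$"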
] }, { "cell_type": "markdown", "id": "5df6706e-a6da-48aa-8628-32ae4604bf2f", "metadata": {}, "source": [ "The next two models are nonlinear in their parameters as written. The first is an **intrinsically nonlinear model**: no transformation turns it into linear form. The second is nonlinear in $\\beta_2$ as it stands, although it becomes linear if we estimate $\\alpha_2=\\beta_2^{3}$ instead.\n", "$$\n", "\\begin{aligned}\n", "&Y_{i}=\\beta_{1}+\\left(0.75-\\beta_{1}\\right) e^{-\\beta_{2}\\left(X_{i}-2\\right)}+u_{i} \\\\\n", "&Y_{i}=\\beta_{1}+\\beta_{2}^{3} X_{i}+u_{i}\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "markdown", "id": "63fe3d16-e606-416c-b514-c3e0a987a8bc", "metadata": {}, "source": [ "Can we transform the Cobb-Douglas model into linear form? The first version below can be, by taking the natural log. The second has an additive disturbance term, which makes it intrinsically nonlinear.\n", "$$\n", "\\begin{aligned}\n", "&Y_{i}=\\beta_{1} X_{2 i}^{\\beta_{2}} X_{3 i}^{\\beta_{3}} u_{i}\\\\\n", "&Y_{i}=\\beta_{1} X_{2 i}^{\\beta_{2}} X_{3 i}^{\\beta_{3}}+ u_{i}\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "markdown", "id": "2e0ee933-0309-49d7-8816-f8209fc02701", "metadata": {}, "source": [ "Here is another famous economic model, the _constant elasticity of substitution_ (CES) production function.\n", "$$\n", "Y_{i}=A\\left[\\delta K_{i}^{-\\beta}+(1-\\delta) L_{i}^{-\\beta}\\right]^{-1 / \\beta}u_i\n", "$$\n", "No matter what you do with it, it can't be transformed into linear form, so it is intrinsically nonlinear." ] }, { "cell_type": "markdown", "id": "10de1167-ae4b-4809-860b-6bdbaa85ea74", "metadata": {}, "source": [ "# OLS On A Nonlinear Model " ] }, { "cell_type": "markdown", "id": "0a57e6a5-d680-45c8-9b0a-9ad0bb62d888", "metadata": {}, "source": [ "Consider an intrinsically nonlinear model\n", "$$\n", "Y_{i}=\\beta_{1} e^{\\beta_{2}X_{i}}+u_{i}\n", "$$\n", "Apply the least squares principle and minimize the $RSS$\n", "$$\n", "\\begin{gathered}\n", "\\sum_{i=0}^n u_{i}^{2}=\\sum_{i=0}^n\\left(Y_{i}-\\beta_{1} e^{\\beta_{2} X_{i}}\\right)^{2}\n", "\\end{gathered}\n", "$$\n", "Taking partial derivatives with respect to $\\beta_1$ and $\\beta_2$, the first-order conditions 
are\n", "$$\n", "\\begin{gathered}\n", "\\frac{\\partial \\sum_{i=0}^n u_{i}^{2}}{\\partial \\beta_{1}}=2 \\sum_{i=0}^n\\left(Y_{i}-\\beta_{1} e^{\\beta_{2} X_{i}}\\right)\\left(-e^{\\beta_{2} X_{i}}\\right) =0\\\\\n", "\\frac{\\partial \\sum_{i=0}^n u_{i}^{2}}{\\partial \\beta_{2}}=2 \\sum_{i=0}^n\\left(Y_{i}-\\beta_{1} e^{\\beta_{2} X_{i}}\\right)\\left(-\\beta_{1} e^{\\beta_{2} X_{i}} X_{i}\\right)=0\n", "\\end{gathered}\n", "$$" ] }, { "cell_type": "markdown", "id": "59ca3c1e-af9f-43a1-94da-6bb1a8fe7a2e", "metadata": {}, "source": [ "Collecting terms and denoting the estimated coefficients as $b_1$ and $b_2$\n", "$$\n", "\\begin{aligned}\n", "\\sum_{i=0}^n Y_{i} e^{b_{2} X_{i}} &=b_{1} \\sum_{i=0}^n e^{2 {b}_{2} X_{i}} \\\\\n", "\\sum_{i=0}^n Y_{i} X_{i} e^{b_{2} X_{i}} &={b}_{1} \\sum_{i=0}^n X_{i} e^{2 {b}_{2} X_{i}}\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "markdown", "id": "43c087fa-061e-47f2-b50b-08eac64cf9d4", "metadata": {}, "source": [ "These normal equations have no **closed-form solution**, i.e. $b_1$ and $b_2$ cannot be obtained simply by plugging in data, because the unknowns are expressed in terms of the unknowns. So even with these formulas in hand, we cannot type them into Python and read off the answer; they must be solved numerically." ] }, { "cell_type": "markdown", "id": "adb925ae-0c0e-46c1-9aeb-ec9e6db05ca5", "metadata": {}, "source": [ "# Gauss-Newton Iterative Method " ] }, { "cell_type": "markdown", "id": "db723ebb-cedc-4ff4-9c49-4a8962547edf", "metadata": {}, "source": [ "We will not go into the details of this algorithm here, as they would confuse more than clarify. In short, the **Gauss-Newton iterative method** gradually approaches the optimal coefficients: it starts from an initial set of parameter values, evaluates the $RSS$, then moves to a new set of values chosen from a linear approximation of the model around the current ones. The algorithm keeps updating the parameters until the $RSS$ shows no significant improvement." 
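, "\n", "\n", "For readers who want a glimpse of a single iteration, here is a rough sketch of the textbook Gauss-Newton update for the model $Y_{i}=\\beta_{1} e^{\\beta_{2}X_{i}}+u_{i}$. The notation ($b^{(k)}$ for the estimates at iteration $k$, $r$ for the residual vector, $J$ for the matrix whose $i$th row is $J_i$) is introduced here for illustration only, and software implementations typically add safeguards such as step-size control.\n", "$$\n", "\\begin{aligned}\n", "r_{i}&=Y_{i}-b_{1}^{(k)} e^{b_{2}^{(k)} X_{i}}\\\\\n", "J_{i}&=\\begin{bmatrix} e^{b_{2}^{(k)} X_{i}} & b_{1}^{(k)} X_{i} e^{b_{2}^{(k)} X_{i}} \\end{bmatrix}\\\\\n", "b^{(k+1)}&=b^{(k)}+\\left(J^{\\top} J\\right)^{-1} J^{\\top} r\n", "\\end{aligned}\n", "$$\n", "Each step is an OLS regression of the current residuals on the columns of $J$, so the method can be viewed as a sequence of linear regressions."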
] }, { "cell_type": "markdown", "id": "141a694a-b3fb-4b92-bd01-27153d2ffabd", "metadata": {}, "source": [ "Define the function\n", "$$\n", "Y_{i}=\\beta_{1} e^{\\beta_{2}X_{i}}\n", "$$" ] }, { "cell_type": "code", "execution_count": 2, "id": "b229a5bc-9e36-4d23-b7c3-b66f9ff0cac5", "metadata": {}, "outputs": [], "source": [ "def exp_func(x, beta1, beta2):\n", " return beta1 * np.exp(beta2 * x)" ] }, { "cell_type": "markdown", "id": "c58d2d63-aeb7-4eec-b96d-94b81b3df151", "metadata": {}, "source": [ "Simulate data $Y$, then estimate the parameters with SciPy's `curve_fit` function. For unconstrained problems like this one, `curve_fit` defaults to the Levenberg-Marquardt algorithm, a damped variant of Gauss-Newton." ] }, { "cell_type": "code", "execution_count": 3, "id": "a6f1f5b7-9716-4fe3-8727-a2110ee6cc0f", "metadata": {}, "outputs": [], "source": [ "# Simulate data from the true model with beta1 = 2, beta2 = 3\n", "xdata = np.linspace(0, 1, 50)\n", "y = exp_func(xdata, 2, 3)\n", "\n", "# Add Gaussian noise with standard deviation 5\n", "y_noise = 5 * np.random.randn(len(y))\n", "ydata = y + y_noise\n", "\n", "# Estimate the parameters by nonlinear least squares\n", "popt, pcov = curve_fit(exp_func, xdata, ydata)\n", "\n", "fig, ax = plt.subplots(figsize=(12, 8))\n", "ax.scatter(xdata, ydata, label=\"data\", color=\"purple\")\n", "ax.plot(xdata, exp_func(xdata, *popt), lw=3, color=\"tomato\", label=\"fit\")\n", "ax.set_title(r\"$Y_{i}=\\beta_{1} e^{\\beta_{2}X_{i}}$\")\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "1eb8bb44-08ea-4a52-8baa-1c4b13602fc6", "metadata": {}, "source": [ "Since this is an elementary course on econometrics, we will not go any deeper into this topic. In Advanced Econometrics, we will discuss nonlinear regression much more extensively." ] }, { "cell_type": "markdown", "id": "2e705046-4f1d-45c9-bb9d-46d52c2574ca", "metadata": {}, "source": [ "# Shanghai Covid " ] }, { "cell_type": "code", "execution_count": 62, "id": "f5eaf3dc-2bc3-40cc-9e40-e8693c0cf08a", "metadata": {}, "outputs": [], "source": [ "df_shcovid = pd.read_excel(\"Shanghai Covid.xlsx\")" ] }, { "cell_type": "code", "execution_count": 63, "id": "636c604b-9656-4dd0-a845-d668f90d7260", "metadata": {}, "outputs": [], "source": [ "df_shcovid.columns = [\"Date\", \"Cases\"]\n", "df_shcovid = df_shcovid.dropna()" ] }, { "cell_type": "markdown", "id": "6c3cace8-ec00-47bc-84ef-110fe5452989", "metadata": {}, "source": [ "Define the function\n", "$$\n", "Y_{i}= \\beta_1 e^{\\beta_{2}X_{i}}\n", "$$" ] }, { "cell_type": "markdown", "id": "efeea1ba-6199-48c1-94dd-a3445a8bf6ad", "metadata": {}, "source": [ "Take the natural log of both sides.
\n", "$$\n", "\\ln{Y_i}=\\ln{\\beta_1}+\\beta_2 X_i \n", "$$" ] }, { "cell_type": "code", "execution_count": 80, "id": "0c8e47ec-3d58-480c-a3bf-7975c6cbb302", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: Cases R-squared: 0.959\n", "Model: OLS Adj. R-squared: 0.957\n", "Method: Least Squares F-statistic: 463.3\n", "Date: Thu, 07 Apr 2022 Prob (F-statistic): 2.65e-15\n", "Time: 15:01:54 Log-Likelihood: -3.9519\n", "No. Observations: 22 AIC: 11.90\n", "Df Residuals: 20 BIC: 14.09\n", "Df Model: 1 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 5.4038 0.125 43.157 0.000 5.143 5.665\n", "x1 0.2197 0.010 21.524 0.000 0.198 0.241\n", "==============================================================================\n", "Omnibus: 3.277 Durbin-Watson: 0.593\n", "Prob(Omnibus): 0.194 Jarque-Bera (JB): 1.649\n", "Skew: -0.355 Prob(JB): 0.438\n", "Kurtosis: 1.862 Cond. No. 
23.8\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "logY = np.log(df_shcovid[\"Cases\"])\n", "X = np.arange(len(logY))\n", "\n", "# Regress log cases on a time index plus a constant\n", "X = sm.add_constant(X)\n", "model = sm.OLS(logY, X).fit()\n", "\n", "print_model = model.summary()\n", "print(print_model)" ] }, { "cell_type": "code", "execution_count": 90, "id": "5eb93636-1fe2-4e11-8838-4f72e4465d76", "metadata": {}, "outputs": [], "source": [ "# The intercept estimates ln(beta_1), so exponentiate it to recover beta_1\n", "ln_beta_1, beta_2 = model.params\n", "beta_1 = np.exp(ln_beta_1)\n", "\n", "# Fitted values on the original (non-log) scale\n", "Y = beta_1 * np.exp(beta_2 * X[:, 1])" ] }, { "cell_type": "code", "execution_count": 94, "id": "7d4e08db-63de-4ec6-ae73-e0c5429aab65", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(12, 5), nrows=1, ncols=2)\n", "fig.autofmt_xdate(rotation=45)\n", "# Left panel: observed cases with the fitted exponential curve\n", "ax[0].scatter(df_shcovid[\"Date\"], df_shcovid[\"Cases\"])\n", "ax[0].plot(df_shcovid[\"Date\"], Y)\n", "ax[0].set_ylabel(\"No-Symptom Case\")\n", "# Right panel: log cases with the fitted OLS line\n", "ax[1].scatter(df_shcovid[\"Date\"], logY)\n", "ax[1].plot(df_shcovid[\"Date\"], model.fittedvalues)\n", "ax[1].set_ylabel(\"Log No-Symptom Case\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "8e2543b8-b4b8-4b86-8e38-9913e4385af6", "metadata": {}, "source": [ "From the regression output, the fitted model is\n", "$$\n", "\\ln{Y_i}= 5.4038 + 0.2197 X_{i}\n", "$$\n", "or, on the original scale with $e^{5.4038}\\approx 222.2$,\n", "$$\n", "Y_{i}\\approx 222.2 e^{0.22X_{i}}\n", "$$" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }