{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [] }, { "cell_type": "markdown", "id": "a6be4a95", "metadata": { "id": "a6be4a95" }, "source": [ "# Running the First Regression in Python" ] }, { "cell_type": "markdown", "id": "b7890b49", "metadata": { "id": "b7890b49" }, "source": [ "Suppose this is your first time writing code, and you want to run a simple regression on two series of asset prices to find the equity beta. Let's use a step-by-step approach to complete the task.\n", "\n", " Step 1: Download two assets' prices from the web\n", " Step 2: Put them into matrix form\n", " Step 3: Run OLS\n", " Step 4: Plot the result" ] }, { "cell_type": "markdown", "id": "d844db66", "metadata": { "id": "d844db66" }, "source": [ "### Step 1: Download data\n", "We will use the yfinance package (https://pypi.org/project/yfinance/) to download Yahoo Finance data from the web. We need to (1) install and (2) import this package." 
] }, { "cell_type": "code", "execution_count": null, "id": "47c20bb7", "metadata": { "id": "47c20bb7" }, "outputs": [], "source": [ "!pip install yfinance # installs yfinance; put # in front of this line to skip it if the package is already installed\n", "import yfinance as yf # to import" ] }, { "cell_type": "code", "execution_count": 3, "id": "479eb94a", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "479eb94a", "outputId": "f4bb78f6-0070-43fd-d54a-339ddec9e1c4" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[*********************100%***********************] 1 of 1 completed\n", "[*********************100%***********************] 1 of 1 completed\n" ] } ], "source": [ "# download monthly adjusted close prices for the stock (TSLA) and the market proxy (SPY)\n", "mystock = yf.download(\"TSLA\", start=\"2011-01-01\", end=\"2022-05-31\", interval='1mo')['Adj Close'].rename('TSLA')\n", "index = yf.download(\"SPY\", start=\"2011-01-01\", end=\"2022-05-31\", interval='1mo')['Adj Close'].rename('SPY')" ] }, { "cell_type": "markdown", "id": "9621734b", "metadata": { "id": "9621734b" }, "source": [ "### Step 2: Put the two time series into one matrix\n", "We need the pandas module, so let's install and import it. https://pandas.pydata.org/" ] }, { "cell_type": "code", "execution_count": 4, "id": "6b7ec623", "metadata": { "id": "6b7ec623" }, "outputs": [], "source": [ "#!pip install pandas # Actually, you already have this if you installed Anaconda.\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 5, "id": "d16cf519", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 455 }, "id": "d16cf519", "outputId": "de9b7b39-5378-4ccf-ff78-3c9e03266ecd" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " TSLA SPY\n", "Date \n", "2011-02-01 -0.008714 0.034737\n", "2011-03-01 0.161574 -0.004206\n", "2011-04-01 -0.005405 0.033431\n", "2011-05-01 0.092029 -0.011214\n", "2011-06-01 -0.033510 -0.021720\n", "... ... 
...\n", "2022-01-01 -0.113609 -0.049413\n", "2022-02-01 -0.070768 -0.029517\n", "2022-03-01 0.238009 0.034377\n", "2022-04-01 -0.191945 -0.084935\n", "2022-05-01 -0.129197 0.002257\n", "\n", "[136 rows x 2 columns]" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "# combine the two price series into one table (a pandas DataFrame)\n", "data = pd.concat([mystock, index], axis=1)\n", "\n", "# drop missing observations\n", "data2 = data.dropna()\n", "\n", "# compute monthly returns and drop the first observation, which has no prior price\n", "data3 = data2.pct_change().dropna()\n", "data3" ] }, { "cell_type": "markdown", "id": "68175734", "metadata": { "id": "68175734" }, "source": [ "### Step 3: Run OLS\n", "We need to install and import the statsmodels module. https://www.statsmodels.org/stable/index.html" ] }, { "cell_type": "code", "execution_count": 6, "id": "5dcb73ad", "metadata": { "id": "5dcb73ad" }, "outputs": [], "source": [ "#!pip install statsmodels\n", "import statsmodels.formula.api as smf\n", "import statsmodels.api as sm" ] }, { "cell_type": "code", "execution_count": 7, "id": "ddfd0025", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ddfd0025", "outputId": "d8ce23ce-dfe5-431e-98c9-a294b99f98d5" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: TSLA R-squared: 0.159\n", "Model: OLS Adj. R-squared: 0.153\n", "Method: Least Squares F-statistic: 25.35\n", "Date: Fri, 26 Aug 2022 Prob (F-statistic): 1.51e-06\n", "Time: 06:44:56 Log-Likelihood: 54.624\n", "No. 
Observations: 136 AIC: -105.2\n", "Df Residuals: 134 BIC: -99.42\n", "Df Model: 1 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0319 0.015 2.198 0.030 0.003 0.061\n", "SPY 1.7553 0.349 5.035 0.000 1.066 2.445\n", "==============================================================================\n", "Omnibus: 43.835 Durbin-Watson: 1.592\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 109.887\n", "Skew: 1.285 Prob(JB): 1.37e-24\n", "Kurtosis: 6.576 Cond. No. 24.9\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "# run OLS\n", "formula = 'TSLA ~ SPY' # dependent variable ~ independent variable\n", "results = smf.ols(formula, data3).fit() # fit the model by OLS\n", "print(results.summary()) # print the regression table" ] }, { "cell_type": "markdown", "id": "6839b273", "metadata": { "id": "6839b273" }, "source": [ "### beta of TSLA = 1.7553 (the SPY coefficient in the regression output)" ] }, { "cell_type": "markdown", "id": "a3f1572a", "metadata": { "id": "a3f1572a" }, "source": [ "### Step 4: Plot the result\n", "We need to install and import the matplotlib module. https://matplotlib.org/" ] }, { "cell_type": "code", "execution_count": 8, "id": "0b0e60bb", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 426 }, "id": "0b0e60bb", "outputId": "23bc81df-dfdf-4757-c5a6-dbb18ab49359" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ], "source": [ "#!pip install matplotlib # again, if you installed Anaconda, you already have this.\n", "import matplotlib.pyplot as plt\n", "\n", "# partial regression plots of TSLA against each regressor in the fitted model\n", "fig, ax=plt.subplots(figsize=(10,6))\n", "fig = sm.graphics.plot_partregress_grid(results, fig=fig)" ] }, { "cell_type": "markdown", "id": "f58ce303", "metadata": { "id": "f58ce303" }, "source": [ "### Extra 1: using the scipy module, we can get the same beta!" ] }, { "cell_type": "code", "execution_count": 9, "id": "69ce9117", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "69ce9117", "outputId": "f92795cd-cf90-41d9-e6e9-881e0dcf186a" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "1.7553\n", "0.0319\n", "0.4\n", "0.0\n" ] } ], "source": [ "#!pip install scipy\n", "from scipy import stats\n", "\n", "# linregress(x, y) returns slope, intercept, correlation, p-value, and standard error of the slope\n", "beta,alpha,r_value,p_value,std_err = stats.linregress(data3['SPY'],data3[\"TSLA\"])\n", "\n", "print(beta.round(4))\n", "print(alpha.round(4))\n", "print(r_value.round(2))\n", "print(p_value.round(4))" ] }, { "cell_type": "markdown", "id": "b192016a", "metadata": { "id": "b192016a" }, "source": [ "### Extra 2: using a beta formula, we can get the same beta.\n", "\n", "$$\n", "\\beta_{tsla} = \\frac{\\sigma_{tsla,spy}}{\\sigma_{spy}^2}\n", "$$\n", "\n", "Here $\\sigma_{tsla,spy}$ is the covariance between TSLA and SPY returns, and $\\sigma_{spy}^2$ is the variance of SPY returns.\n" ] }, { "cell_type": "code", "execution_count": 38, "id": "642827eb", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "642827eb", "outputId": "6a698632-0057-4813-9059-2ebaaa0babc0" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " TSLA SPY\n", "TSLA 0.376962 0.034166\n", "SPY 0.034166 0.019465\n", "\n", "\n", "1.7553\n" ] } ], "source": [ "# annualized covariance matrix (monthly covariance x 12); the scaling cancels in the ratio below\n", "cov = data3.cov() * 12\n", "print(cov)\n", "print('\\n') # print a blank line\n", "# beta = cov(TSLA, SPY) / var(SPY)\n", "print(round(cov.iloc[0,1]/cov.iloc[1,1], 4))" ] }, { "cell_type": "markdown", "id": "e3ff4c65", "metadata": { "id": "e3ff4c65" }, "source": [ "### Extra 3: using linear algebra, 
we can get the same beta.\n", "We need to import numpy. You probably have it already, so skip the installation and just import it. https://numpy.org/" ] }, { "cell_type": "code", "source": [ "# Warnings are distracting, so the lines below suppress them. You do not need to do this.\n", "import warnings\n", "warnings.simplefilter(action='ignore', category=FutureWarning)" ], "metadata": { "id": "ApyNN1N27OM9" }, "id": "ApyNN1N27OM9", "execution_count": 32, "outputs": [] }, { "cell_type": "code", "execution_count": 34, "id": "c7903709", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "c7903709", "outputId": "1991f267-a746-4de1-b615-78ce415c2cd7" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "1.7553\n" ] } ], "source": [ "import numpy as np\n", "\n", "X = data3['SPY']\n", "y = data3['TSLA']\n", "X_ols = sm.add_constant(X) # add a column of ones for the intercept\n", "#print(X_ols)\n", "\n", "# compute the OLS coefficients with matrix operations: (X'X)^(-1) X'y\n", "beta = np.linalg.inv(X_ols.T.dot(X_ols)).dot(X_ols.T.dot(y))\n", "print(round(beta[1], 4)) # beta[0] is the intercept, beta[1] is the slope on SPY" ] }, { "cell_type": "code", "source": [], "metadata": { "id": "joV__6ek6nwX" }, "id": "joV__6ek6nwX", "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "colab": { "name": "First_Regression.ipynb", "provenance": [], "collapsed_sections": [], "include_colab_link": true } }, "nbformat": 4, "nbformat_minor": 5 }