{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Time Series Data\n", "> If you want to predict patterns from data over time, there are special considerations to take in how you choose and construct your model. This chapter covers how to gain insights into the data before fitting your model, as well as best practices in using predictive modeling for time series data. This is a summary of the lecture \"Machine Learning for Time Series Data in Python\", via DataCamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Time_Series_Analysis, Machine_Learning]\n", "- image: images/price_percentile.png" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "plt.rcParams['figure.figsize'] = (10, 5)\n", "plt.style.use('fivethirtyeight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting data over time\n", "- Correlation and regression\n", " - Regression is similar to calculating correlation, with some key differences\n", " - Regression: A process that results in a formal model of the data\n", " - Correlation: A statistic that describes the data. 
Less information than a regression model\n", "- Correlation between variables often changes over time\n", " - Time series often have patterns that change over time\n", " - Two time series that seem correlated at one moment may not remain so over time.\n", "- Scoring regression models\n", " - Two most common methods:\n", " - Correlation ($r$)\n", " - Coefficient of Determination ($R^2$)\n", " - The value of $R^2$ is bounded above by 1 and has no lower bound\n", " - Values closer to 1 mean the model does a better job of predicting outputs \\\n", " $R^2 = 1 - \\frac{\\text{error}(model)}{\\text{variance}(testdata)}$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| date | EBAY | YHOO |\n", "|---|---|---|\n", "| 2010-01-04 | 23.900000 | 17.100000 |\n", "| 2010-01-05 | 23.650000 | 17.230000 |\n", "| 2010-01-06 | 23.500000 | 17.170000 |\n", "| 2010-01-07 | 23.229998 | 16.700001 |\n", "| 2010-01-08 | 23.509999 | 16.700001 |\n", "
\n", "| date | AAPL | ABT | AIG | AMAT | ARNC | BAC | BSX | C | CHK | CMCSA | ... | QCOM | RF | SBUX | T | V | VZ | WFC | XOM | XRX | YHOO |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 2010-01-04 | 214.009998 | 54.459951 | 29.889999 | 14.30 | 16.650013 | 15.690000 | 9.01 | 3.40 | 28.090001 | 16.969999 | ... | 46.939999 | 5.42 | 23.049999 | 28.580000 | 88.139999 | 33.279869 | 27.320000 | 69.150002 | 8.63 | 17.100000 |\n", "| 2010-01-05 | 214.379993 | 54.019953 | 29.330000 | 14.19 | 16.130013 | 16.200001 | 9.04 | 3.53 | 28.970002 | 16.740000 | ... | 48.070000 | 5.60 | 23.590000 | 28.440001 | 87.129997 | 33.339868 | 28.070000 | 69.419998 | 8.64 | 17.230000 |\n", "| 2010-01-06 | 210.969995 | 54.319953 | 29.139999 | 14.16 | 16.970013 | 16.389999 | 9.16 | 3.64 | 28.650002 | 16.620001 | ... | 47.599998 | 5.67 | 23.420000 | 27.610001 | 85.959999 | 31.919873 | 28.110001 | 70.019997 | 8.56 | 17.170000 |\n", "| 2010-01-07 | 210.580000 | 54.769952 | 28.580000 | 14.01 | 16.610014 | 16.930000 | 9.09 | 3.65 | 28.720002 | 16.969999 | ... | 48.980000 | 6.17 | 23.360001 | 27.299999 | 86.760002 | 31.729875 | 29.129999 | 69.800003 | 8.60 | 16.700001 |\n", "| 2010-01-08 | 211.980005 | 55.049952 | 29.340000 | 14.55 | 17.020014 | 16.780001 | 9.00 | 3.59 | 28.910002 | 16.920000 | ... | 49.470001 | 6.18 | 23.280001 | 27.100000 | 87.000000 | 31.749874 | 28.860001 | 69.519997 | 8.57 | 16.700001 |\n", "
5 rows × 50 columns
\n", "