{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "\n", "Author: Mariya Mansurova, Analyst & developer in Yandex.Metrics team. Translated by Ivan Zakharov, ML enthusiast.
This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment #9 (demo)\n", "## Time series analysis\n", "\n", "**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis) + [solution](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis-solution).**\n", "\n", "**Fill in the cells marked with \"Your code here\" and submit your answers to the questions through the [web form](https://docs.google.com/forms/d/1UYQ_WYSpsV3VSlZAzhSN_YXmyjV7YlTP8EYMg8M8SoM/edit).**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "import numpy as np\n", "import pandas as pd\n", "import requests\n", "from plotly import __version__\n", "from plotly import graph_objs as go\n", "from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot\n", "\n", "print(__version__)  # need 1.9.0 or greater\n", "init_notebook_mode(connected=True)\n", "\n", "\n", "def plotly_df(df, title=\"\"):\n", "    # Draw every column of the DataFrame as a separate line against the index\n", "    data = []\n", "\n", "    for column in df.columns:\n", "        trace = go.Scatter(x=df.index, y=df[column], mode=\"lines\", name=column)\n", "        data.append(trace)\n", "\n", "    layout = dict(title=title)\n", "    fig = dict(data=data, layout=layout)\n", "    iplot(fig, show_link=False)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Data preparation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"../../data/wiki_machine_learning.csv\", sep=\" \")\n", "df = df[df[\"count\"] != 0]\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting with FB Prophet\n", "We will train the model on all observations except the last 30 days and predict the number of page views for those 30 days."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.date = pd.to_datetime(df.date)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plotly_df(df.set_index(\"date\")[[\"count\"]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fbprophet import Prophet  # note: the package was renamed to `prophet` in version 1.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions = 30\n", "\n", "df = df[[\"date\", \"count\"]]\n", "df.columns = [\"ds\", \"y\"]\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1:** What is the predicted number of views of the wiki page on January 20? Round to the nearest integer.\n", "\n", "- 4947\n", "- 3426\n", "- 5229\n", "- 2744" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Estimate the quality of the prediction on the last 30 points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 2: What is MAPE equal to?**\n", "\n", "- 34.5\n", "- 42.42\n", "- 5.39\n", "- 65.91\n", "\n", "**Question 3: What is MAE equal to?**\n", "\n", "- 355\n", "- 4007\n", "- 600\n", "- 903" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting with ARIMA" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import statsmodels.api as sm\n", "from scipy import stats\n", "\n", "plt.rcParams[\"figure.figsize\"] = (15, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 4: Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? 
What is the p-value?**\n", "\n", "- Series is stationary, p_value = 0.107\n", "- Series is not stationary, p_value = 0.107\n", "- Series is stationary, p_value = 0.001\n", "- Series is not stationary, p_value = 0.001" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Next, we turn to the construction of the SARIMAX model (`sm.tsa.statespace.SARIMAX`).
Question 5: Which parameters are the best for the model according to the `AIC` criterion?**\n", "\n", "- D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1\n", "- D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1\n", "- D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1\n", "- D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }