{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"../../img/ods_stickers.jpg\" />\n",
    "    \n",
    "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
    "\n",
    "Author: Mariya Mansurova, Analyst & developer in Yandex.Metrics team. Translated by Ivan Zakharov, ML enthusiast. <br>This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center> Assignment #9 (demo)\n",
    "## <center> Time series analysis\n",
    "\n",
    "**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis) + [solution](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis-solution).**\n",
    "\n",
    "**Fill cells marked with \"Your code here\" and submit your answers to the questions through the [web form](https://docs.google.com/forms/d/1UYQ_WYSpsV3VSlZAzhSN_YXmyjV7YlTP8EYMg8M8SoM/edit).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests\n",
    "from plotly import __version__\n",
    "from plotly import graph_objs as go\n",
    "from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot\n",
    "\n",
    "print(__version__)  # need 1.9.0 or greater\n",
    "init_notebook_mode(connected=True)\n",
    "\n",
    "\n",
    "def plotly_df(df, title=\"\"):\n",
    "    data = []\n",
    "\n",
    "    for column in df.columns:\n",
    "        trace = go.Scatter(x=df.index, y=df[column], mode=\"lines\", name=column)\n",
    "        data.append(trace)\n",
    "\n",
    "    layout = dict(title=title)\n",
    "    fig = dict(data=data, layout=layout)\n",
    "    iplot(fig, show_link=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Data preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"../../data/wiki_machine_learning.csv\", sep=\" \")\n",
    "df = df[df[\"count\"] != 0]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting with FB Prophet\n",
    "We will train at first 5 months and predict the number of trips for June."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.date = pd.to_datetime(df.date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plotly_df(df.set_index(\"date\")[[\"count\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fbprophet import Prophet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = 30\n",
    "\n",
    "df = df[[\"date\", \"count\"]]\n",
    "df.columns = [\"ds\", \"y\"]\n",
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**<font color='red'>Question 1:</font>** What is the prediction of the number of views of the wiki page on January 20? Round to the nearest integer.\n",
    "\n",
    "- 4947\n",
    "- 3426\n",
    "- 5229\n",
    "- 2744"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Estimate the quality of the prediction with the last 30 points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**<font color='red'>Question 2:</font> What is MAPE equal to?**\n",
    "\n",
    "- 34.5\n",
    "- 42.42\n",
    "- 5.39\n",
    "- 65.91\n",
    "\n",
    "**<font color='red'>Question 3:</font> What is MAE equal to?**\n",
    "\n",
    "- 355\n",
    "- 4007\n",
    "- 600\n",
    "- 903"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting with ARIMA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import statsmodels.api as sm\n",
    "from scipy import stats\n",
    "\n",
    "plt.rcParams[\"figure.figsize\"] = (15, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**<font color='red'>Question 4:</font> Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?**\n",
    "\n",
    "- Series is stationary, p_value = 0.107\n",
    "- Series is not stationary, p_value = 0.107\n",
    "- Series is stationary, p_value = 0.001\n",
    "- Series is not stationary, p_value = 0.001"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Next, we turn to the construction of the SARIMAX model (`sm.tsa.statespace.SARIMAX`).<br> <font color='red'>Question 5:</font> What parameters are the best for the model according to the `AIC` criterion?**\n",
    "\n",
    "- D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1\n",
    "- D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1\n",
    "- D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1\n",
    "- D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}