\n",
"\n",
" \n",
"## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
"\n",
"Author: Mariya Mansurova, Analyst & developer in Yandex.Metrics team. Translated by Ivan Zakharov, ML enthusiast. This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#
Assignment #9 (demo)\n",
"##
Time series analysis\n",
"\n",
"**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis) + [solution](https://www.kaggle.com/kashnitsky/a9-demo-time-series-analysis-solution).**\n",
"\n",
"**Fill cells marked with \"Your code here\" and submit your answers to the questions through the [web form](https://docs.google.com/forms/d/1UYQ_WYSpsV3VSlZAzhSN_YXmyjV7YlTP8EYMg8M8SoM/edit).**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"import numpy as np\n",
"import pandas as pd\n",
"import requests\n",
"from plotly import __version__\n",
"from plotly import graph_objs as go\n",
"from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot\n",
"\n",
"print(__version__) # need 1.9.0 or greater\n",
"init_notebook_mode(connected=True)\n",
"\n",
"\n",
"def plotly_df(df, title=\"\"):\n",
" data = []\n",
"\n",
" for column in df.columns:\n",
" trace = go.Scatter(x=df.index, y=df[column], mode=\"lines\", name=column)\n",
" data.append(trace)\n",
"\n",
" layout = dict(title=title)\n",
" fig = dict(data=data, layout=layout)\n",
" iplot(fig, show_link=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Data preparation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"../../data/wiki_machine_learning.csv\", sep=\" \")\n",
"df = df[df[\"count\"] != 0]\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predicting with FB Prophet\n",
"We will train at first 5 months and predict the number of trips for June."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.date = pd.to_datetime(df.date)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plotly_df(df.set_index(\"date\")[[\"count\"]])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fbprophet import Prophet"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions = 30\n",
"\n",
"df = df[[\"date\", \"count\"]]\n",
"df.columns = [\"ds\", \"y\"]\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 1:** What is the prediction of the number of views of the wiki page on January 20? Round to the nearest integer.\n",
"\n",
"- 4947\n",
"- 3426\n",
"- 5229\n",
"- 2744"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Estimate the quality of the prediction with the last 30 points."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 2: What is MAPE equal to?**\n",
"\n",
"- 34.5\n",
"- 42.42\n",
"- 5.39\n",
"- 65.91\n",
"\n",
"**Question 3: What is MAE equal to?**\n",
"\n",
"- 355\n",
"- 4007\n",
"- 600\n",
"- 903"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predicting with ARIMA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import statsmodels.api as sm\n",
"from scipy import stats\n",
"\n",
"plt.rcParams[\"figure.figsize\"] = (15, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 4: Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?**\n",
"\n",
"- Series is stationary, p_value = 0.107\n",
"- Series is not stationary, p_value = 0.107\n",
"- Series is stationary, p_value = 0.001\n",
"- Series is not stationary, p_value = 0.001"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Next, we turn to the construction of the SARIMAX model (`sm.tsa.statespace.SARIMAX`). Question 5: What parameters are the best for the model according to the `AIC` criterion?**\n",
"\n",
"- D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1\n",
"- D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1\n",
"- D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1\n",
"- D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You code here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}