{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# About this Notebook\n", "\n", "Bayesian temporal matrix factorization is a type of Bayesian matrix factorization that achieves state-of-the-art results on challenging imputation and prediction problems. In the following, we will discuss:\n", "\n", "- What the proposed Bayesian temporal matrix factorization (BTMF for short) is?\n", "\n", "- How to implement BTMF mainly using Python Numpy with high efficiency?\n", "\n", "- How to develop a spatiotemporal prediction model by adapting BTMF?\n", "\n", "- How to make predictions with real-world spatiotemporal datasets?\n", "\n", "If you want to understand what is BTMF and its modeling tricks in detail, our paper is for you:\n", "\n", "> Xinyu Chen, Lijun Sun (2019). **Bayesian temporal factorization for multidimensional time series prediction**.\n", "\n", "## Quick Run\n", "\n", "This notebook is publicly available for any usage at our data imputation project. Please click [**transdim**](https://github.com/xinychen/transdim).\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.random import multivariate_normal as mvnrnd\n", "from scipy.stats import wishart\n", "from numpy.linalg import inv as inv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Matrix Computation Concepts\n", "\n", "## 1) Kronecker product\n", "\n", "- **Definition**:\n", "\n", "Given two matrices $A\\in\\mathbb{R}^{m_1\\times n_1}$ and $B\\in\\mathbb{R}^{m_2\\times n_2}$, then, the **Kronecker product** between these two matrices is defined as\n", "\n", "$$A\\otimes B=\\left[ \\begin{array}{cccc} a_{11}B & a_{12}B & \\cdots & a_{1m_2}B \\\\ a_{21}B & a_{22}B & \\cdots & a_{2m_2}B \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{m_11}B & a_{m_12}B & \\cdots & a_{m_1m_2}B \\\\ \\end{array} \\right]$$\n", "where the symbol $\\otimes$ denotes Kronecker product, and the size of resulted $A\\otimes B$ is $(m_1m_2)\\times (n_1n_2)$ (i.e., $m_1\\times m_2$ columns and $n_1\\times n_2$ rows).\n", "\n", "- **Example**:\n", "\n", "If $A=\\left[ \\begin{array}{cc} 1 & 2 \\\\ 3 & 4 \\\\ \\end{array} \\right]$ and $B=\\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10 \\\\ \\end{array} \\right]$, then, we have\n", "\n", "$$A\\otimes B=\\left[ \\begin{array}{cc} 1\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] & 2\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] \\\\ 3\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] & 4\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] \\\\ \\end{array} \\right]$$\n", "\n", "$$=\\left[ \\begin{array}{cccccc} 5 & 6 & 7 & 10 & 12 & 14 \\\\ 8 & 9 & 10 & 16 & 18 & 20 \\\\ 15 & 18 & 21 & 20 & 24 & 28 \\\\ 24 & 27 & 30 & 32 & 36 & 40 \\\\ \\end{array} \\right]\\in\\mathbb{R}^{4\\times 6}.$$\n", "\n", "## 2) Khatri-Rao product (kr_prod)\n", "\n", "- **Definition**:\n", "\n", "Given two matrices $A=\\left( \\boldsymbol{a}_1,\\boldsymbol{a}_2,...,\\boldsymbol{a}_r \\right)\\in\\mathbb{R}^{m\\times r}$ and $B=\\left( \\boldsymbol{b}_1,\\boldsymbol{b}_2,...,\\boldsymbol{b}_r \\right)\\in\\mathbb{R}^{n\\times r}$ with same number of columns, then, the **Khatri-Rao product** (or **column-wise Kronecker product**) between $A$ and $B$ is given as follows,\n", "\n", "$$A\\odot B=\\left( \\boldsymbol{a}_1\\otimes \\boldsymbol{b}_1,\\boldsymbol{a}_2\\otimes \\boldsymbol{b}_2,...,\\boldsymbol{a}_r\\otimes \\boldsymbol{b}_r \\right)\\in\\mathbb{R}^{(mn)\\times r}$$\n", "where the symbol $\\odot$ denotes Khatri-Rao product, and $\\otimes$ denotes Kronecker product.\n", "\n", "- **Example**:\n", "\n", "If $A=\\left[ \\begin{array}{cc} 1 & 2 \\\\ 3 & 4 \\\\ \\end{array} \\right]=\\left( \\boldsymbol{a}_1,\\boldsymbol{a}_2 \\right)$ and $B=\\left[ \\begin{array}{cc} 5 & 6 \\\\ 7 & 8 \\\\ 9 & 10 \\\\ \\end{array} \\right]=\\left( \\boldsymbol{b}_1,\\boldsymbol{b}_2 \\right)$, then, we have\n", "\n", "$$A\\odot B=\\left( \\boldsymbol{a}_1\\otimes \\boldsymbol{b}_1,\\boldsymbol{a}_2\\otimes \\boldsymbol{b}_2 \\right)$$\n", "\n", "$$=\\left[ \\begin{array}{cc} \\left[ \\begin{array}{c} 1 \\\\ 3 \\\\ \\end{array} \\right]\\otimes \\left[ \\begin{array}{c} 5 \\\\ 7 \\\\ 9 \\\\ \\end{array} \\right] & \\left[ \\begin{array}{c} 2 \\\\ 4 \\\\ \\end{array} \\right]\\otimes \\left[ \\begin{array}{c} 6 \\\\ 8 \\\\ 10 \\\\ \\end{array} \\right] \\\\ \\end{array} \\right]$$\n", "\n", "$$=\\left[ \\begin{array}{cc} 5 & 12 \\\\ 7 & 16 \\\\ 9 & 20 \\\\ 15 & 24 \\\\ 21 & 32 \\\\ 27 & 40 \\\\ \\end{array} \\right]\\in\\mathbb{R}^{6\\times 2}.$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def kr_prod(a, b):\n", " return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 5 12]\n", " [ 7 16]\n", " [ 9 20]\n", " [15 24]\n", " [21 32]\n", " [27 40]]\n" ] } ], "source": [ "A = np.array([[1, 2], [3, 4]])\n", "B = np.array([[5, 6], [7, 8], [9, 10]])\n", "print(kr_prod(A, B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3) Computing Covariance Matrix (cov_mat)\n", "\n", "For any matrix $X\\in\\mathbb{R}^{m\\times n}$, cov_mat can return a $n\\times n$ covariance matrix for special use in the following." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def cov_mat(mat):\n", " dim1, dim2 = mat.shape\n", " new_mat = np.zeros((dim2, dim2))\n", " mat_bar = np.mean(mat, axis = 0)\n", " for i in range(dim1):\n", " new_mat += np.einsum('i, j -> ij', mat[i, :] - mat_bar, mat[i, :] - mat_bar)\n", " return new_mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2: Bayesian Temporal Matrix Factorization (BTMF)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2):\n", " \"\"\"Bayesian Temporal Matrix Factorization, BTMF.\"\"\"\n", " W = init[\"W\"]\n", " X = init[\"X\"]\n", " theta = init[\"theta\"]\n", " \n", " d = theta.shape[0]\n", " dim1, dim2 = sparse_mat.shape\n", " pos = np.where((dense_mat != 0) & (sparse_mat == 0))\n", " position = np.where(sparse_mat != 0)\n", " binary_mat = np.zeros((dim1, dim2))\n", " binary_mat[position] = 1\n", " \n", " beta0 = 1\n", " nu0 = rank\n", " mu0 = np.zeros((rank))\n", " W0 = np.eye(rank)\n", " tau = 1\n", " alpha = 1e-6\n", " beta = 1e-6\n", " \n", " mat_hat = np.zeros((dim1, dim2 + 1))\n", " W_plus = np.zeros((dim1, rank))\n", " X_plus = np.zeros((dim2, rank))\n", " X_new = np.zeros((dim2 + 1, rank))\n", " X_new_plus = np.zeros((dim2 + 1, rank))\n", " theta_plus = np.zeros((d, rank))\n", " mat_hat_plus = np.zeros((dim1, dim2 + 1))\n", " for iters in range(maxiter1):\n", " W_bar = np.mean(W, axis = 0)\n", " var_mu_hyper = (dim1 * W_bar + beta0 * mu0)/(dim1 + beta0)\n", " var_W_hyper = inv(inv(W0) + cov_mat(W) + dim1 * beta0/(dim1 + beta0) \n", " * np.outer(W_bar - mu0, W_bar - mu0))\n", " var_W_hyper = (var_W_hyper + var_W_hyper.T)/2\n", " var_Lambda_hyper = wishart(df = dim1 + nu0, scale = var_W_hyper, seed = None).rvs()\n", " var_mu_hyper = mvnrnd(var_mu_hyper, inv((dim1 + beta0) * var_Lambda_hyper))\n", " \n", " var1 = X.T\n", " var2 = kr_prod(var1, var1)\n", " var3 = (tau * np.matmul(var2, binary_mat.T).reshape([rank, rank, dim1]) \n", " + np.dstack([var_Lambda_hyper] * dim1))\n", " var4 = (tau * np.matmul(var1, sparse_mat.T)\n", " + np.dstack([np.matmul(var_Lambda_hyper, var_mu_hyper)] * dim1)[0, :, :])\n", " for i in range(dim1):\n", " var_Lambda = var3[:, :, i]\n", " inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)\n", " var_mu = np.matmul(inv_var_Lambda, var4[:, i])\n", " W[i, :] = mvnrnd(var_mu, inv_var_Lambda)\n", " if iters + 1 > maxiter1 - maxiter2:\n", " W_plus += W\n", " \n", " mat0 = X[0 : np.max(time_lags), :]\n", " mat = np.matmul(mat0.T, mat0)\n", " new_mat = np.zeros((dim2 - np.max(time_lags), rank))\n", " for t in range(dim2 - np.max(time_lags)):\n", " new_mat[t, :] = (X[t + np.max(time_lags), :]\n", " - np.einsum('ij, ij -> j', theta, X[t + np.max(time_lags) - time_lags, :]))\n", " mat += np.matmul(new_mat.T, new_mat)\n", " var_W_hyper = inv(inv(W0) + mat)\n", " var_W_hyper = (var_W_hyper + var_W_hyper.T)/2\n", " Lambda_x = wishart(df = dim2 + nu0, scale = var_W_hyper, seed = None).rvs()\n", " \n", " var1 = W.T\n", " var2 = kr_prod(var1, var1)\n", " var3 = tau * np.matmul(var2, binary_mat).reshape([rank, rank, dim2]) + np.dstack([Lambda_x] * dim2)\n", " var4 = tau * np.matmul(var1, sparse_mat)\n", " for t in range(dim2):\n", " Mt = np.zeros((rank, rank))\n", " Nt = np.zeros(rank)\n", " if t < np.max(time_lags):\n", " Qt = np.zeros(rank)\n", " else:\n", " Qt = np.matmul(Lambda_x, np.einsum('ij, ij -> j', theta, X[t - time_lags, :]))\n", " if t < dim2 - np.min(time_lags):\n", " if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):\n", " index = list(range(0, d))\n", " else:\n", " index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]\n", " for k in index:\n", " Ak = theta[k, :]\n", " Mt += np.multiply(np.outer(Ak, Ak), Lambda_x)\n", " theta0 = theta.copy()\n", " theta0[k, :] = 0\n", " var5 = (X[t + time_lags[k], :] \n", " - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))\n", " Nt += np.matmul(np.matmul(np.diag(Ak), Lambda_x), var5)\n", " var_mu = var4[:, t] + Nt + Qt\n", " var_Lambda = var3[:, :, t] + Mt\n", " inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)\n", " X[t, :] = mvnrnd(np.matmul(inv_var_Lambda, var_mu), inv_var_Lambda)\n", " if iters + 1 > maxiter1 - maxiter2:\n", " X_plus += X\n", " \n", " mat_hat = np.matmul(W, X.T)\n", " if iters + 1 > maxiter1 - maxiter2:\n", " X_new[0 : dim2, :] = X.copy()\n", " X_new[dim2, :] = np.einsum('ij, ij -> j', theta, X_new[dim2 - time_lags, :])\n", " X_new_plus += X_new\n", " mat_hat_plus += np.matmul(W, X_new.T)\n", " rmse = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos]) ** 2)/dense_mat[pos].shape[0])\n", " \n", " var_alpha = alpha + 0.5 * sparse_mat[position].shape[0]\n", " error = sparse_mat - mat_hat\n", " var_beta = beta + 0.5 * np.sum(error[position] ** 2)\n", " tau = np.random.gamma(var_alpha, 1/var_beta)\n", "\n", " theta_bar = np.mean(theta, axis = 0)\n", " var_mu_hyper = (d * theta_bar + beta0 * mu0)/(d + beta0)\n", " var_W_hyper = inv(inv(W0) + cov_mat(theta) + d * beta0/(d + beta0)\n", " * np.outer(theta_bar - mu0, theta_bar - mu0))\n", " var_W_hyper = (var_W_hyper + var_W_hyper.T)/2\n", " var_Lambda_hyper = wishart(df = d + nu0, scale = var_W_hyper, seed = None).rvs()\n", " var_mu_hyper = mvnrnd(var_mu_hyper, inv((d + beta0) * var_Lambda_hyper))\n", " \n", " for k in range(d):\n", " theta0 = theta.copy()\n", " theta0[k, :] = 0\n", " mat0 = np.zeros((dim2 - np.max(time_lags), rank))\n", " for L in range(d):\n", " mat0 += np.matmul(X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :], \n", " np.diag(theta0[L, :]))\n", " VarPi = X[np.max(time_lags) : dim2, :] - mat0\n", " var1 = np.zeros((rank, rank))\n", " var2 = np.zeros(rank)\n", " for t in range(np.max(time_lags), dim2):\n", " B = X[t - time_lags[k], :]\n", " var1 += np.multiply(np.outer(B, B), Lambda_x)\n", " var2 += np.matmul(np.matmul(np.diag(B), Lambda_x), VarPi[t - np.max(time_lags), :])\n", " var_Lambda = var1 + var_Lambda_hyper\n", " inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)\n", " var_mu = np.matmul(inv_var_Lambda, var2 + np.matmul(var_Lambda_hyper, var_mu_hyper))\n", " theta[k, :] = mvnrnd(var_mu, inv_var_Lambda)\n", " if iters + 1 > maxiter1 - maxiter2:\n", " theta_plus += theta\n", "\n", " if (iters + 1) % 200 == 0 and iters < maxiter1 - maxiter2:\n", " print('Iter: {}'.format(iters + 1))\n", " print('RMSE: {:.6}'.format(rmse))\n", " print()\n", "\n", " W = W_plus/maxiter2\n", " X = X_plus/maxiter2\n", " X_new = X_new_plus/maxiter2\n", " theta = theta_plus/maxiter2\n", " mat_hat = mat_hat_plus/maxiter2\n", " if maxiter1 >= 100:\n", " final_mape = np.sum(np.abs(dense_mat[pos] - mat_hat[pos])/dense_mat[pos])/dense_mat[pos].shape[0]\n", " final_rmse = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos]) ** 2)/dense_mat[pos].shape[0])\n", " print('Imputation MAPE: {:.6}'.format(final_mape))\n", " print('Imputation RMSE: {:.6}'.format(final_rmse))\n", " print()\n", " \n", " return mat_hat, W, X_new, theta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3: Online Prediction Problem\n", "\n", "**Question**: How to update $\\boldsymbol{x}_{t}\\in\\mathbb{R}^{r}$?\n", "\n", "Fixing spatial factor matrix $W\\in\\mathbb{R}^{m\\times r}$.\n", "\n", "Bayesian model:\n", "\n", "$$y_{it}\\sim\\mathcal{N}\\left(\\boldsymbol{w}_{i}^{T}\\boldsymbol{x}_{t},\\tau^{-1}\\right),$$\n", "\n", "$$\\boldsymbol{x}_{t}\\sim\\mathcal{N}\\left(\\sum_{k=1}^{d}\\boldsymbol{\\theta}_{k}\\circledast\\boldsymbol{x}_{t-h_k},\\Lambda_{x}^{-1}\\right),$$\n", "\n", "$$\\tau\\sim\\text{Gamma}\\left(\\alpha,\\beta\\right),$$\n", "\n", "$$\\Lambda_{x}\\sim\\mathcal{W}\\left(W_0,\\nu_0\\right).$$\n", "\n", "The posterior distribution of $\\Lambda_{x}$ is $\\mathcal{W}\\left(W_{x}^{*},\\nu_{x}^{*}\\right)$ in which\n", "\n", "$$\\left(W_{x}^{*}\\right)^{-1}=W_0^{-1}+\\left(\\boldsymbol{x}_{t}-\\tilde{\\boldsymbol{x}}_{t}\\right)\\left(\\boldsymbol{x}_{t}-\\tilde{\\boldsymbol{x}}_{t}\\right)^{T},\\nu_{x}^{*}=\\nu_{0}+1,$$\n", "where $\\tilde{\\boldsymbol{x}}_{t}=\\sum_{k=1}^{d}\\boldsymbol{\\theta}_{k}\\circledast\\boldsymbol{x}_{t-h_{k}}$.\n", "\n", "The posterior distribution of $\\boldsymbol{x}_{t}$ is $\\mathcal{N}\\left(\\boldsymbol{\\mu}_{x}^{*},\\left(\\Lambda_{x}^{*}\\right)^{-1}\\right)$ in which\n", "\n", "$$\\Lambda_{x}^{*}=\\tau \\sum_{i\\in \\Omega_{t}} \\boldsymbol{w}_{i} \\boldsymbol{w}_{i}^{T}+\\Lambda_{x},~\\boldsymbol{\\mu}_{x}^{*}=\\left(\\Lambda_{x}^{*}\\right)^{-1}\\left(\\tau \\sum_{i\\in \\Omega_{t}} \\boldsymbol{w}_{i} y_{it}+\\Lambda_{x} \\tilde{\\boldsymbol{x}}_{t}\\right).$$\n", "\n", "The posterior distribution of $\\tau$ is $\\text{Gamma}\\left(\\alpha^{*},\\beta^{*}\\right)$ in which\n", "\n", "$$\\alpha^{*}=\\frac{1}{2}|\\Omega|+\\alpha,~\\beta^{*}=\\frac{1}{2} \\sum_{i \\in \\Omega_{t}}\\left(y_{i t}-\\boldsymbol{w}_{i}^{T} \\boldsymbol{x}_{t}\\right)^{2}+\\beta.$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def OnlineBTMF(sparse_vec, init, time_lags, maxiter1, maxiter2):\n", " \"\"\"Online Bayesain Temporal Matrix Factorization\"\"\"\n", " W = init[\"W\"]\n", " X = init[\"X\"]\n", " theta = init[\"theta\"]\n", "\n", " d = time_lags.shape[0]\n", " dim = sparse_vec.shape[0]\n", " t, rank = X.shape\n", " position = np.where(sparse_vec != 0)\n", " binary_vec = np.zeros(dim)\n", " binary_vec[position] = 1\n", "\n", " tau = 1\n", " alpha = 1e-6\n", " beta = 1e-6\n", " nu0 = rank\n", " W0 = np.eye(rank)\n", " var_mu0 = np.einsum('ij, ij -> j', theta, X[t - 1 - time_lags, :])\n", "\n", " X_new_plus = np.zeros((t + 1, rank))\n", " mat_hat_plus = np.zeros((W.shape[0], t + 1))\n", " for iters in range(maxiter1):\n", " vec0 = X[t - 1, :] - var_mu0\n", " Lambda_x = wishart(df = nu0 + 1, scale = inv(inv(W0) + np.outer(vec0, vec0)), seed = None).rvs()\n", " \n", " var1 = W.T\n", " var2 = kr_prod(var1, var1)\n", " var_mu = tau * np.matmul(var1, sparse_vec) + np.matmul(Lambda_x, var_mu0)\n", " inv_var_Lambda = inv(tau * np.matmul(var2, binary_vec).reshape([rank, rank]) + Lambda_x)\n", " X[t - 1, :] = mvnrnd(np.matmul(inv_var_Lambda, var_mu), inv_var_Lambda)\n", " \n", " tau = np.random.gamma(alpha + 0.5 * sparse_vec[position].shape[0], \n", " 1/(beta + 0.5 * np.sum((sparse_vec - np.matmul(W, X[t - 1, :]))[position] ** 2)))\n", " \n", " X_new = np.zeros((t + 1, rank))\n", " if iters + 1 > maxiter1 - maxiter2:\n", " X_new[0 : t, :] = X.copy()\n", " X_new[t, :] = np.einsum('ij, ij -> j', theta, X_new[t - time_lags, :])\n", " X_new_plus += X_new\n", " mat_hat_plus += np.matmul(W, X_new.T)\n", "\n", " X_new = X_new_plus/maxiter2\n", " mat_hat = mat_hat_plus/maxiter2\n", " return mat_hat, X_new" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter):\n", " T = dense_mat.shape[1]\n", " start_time = T - pred_time_steps\n", " dense_mat0 = dense_mat[:, 0 : start_time]\n", " sparse_mat0 = sparse_mat[:, 0 : start_time]\n", " dim1, dim2 = sparse_mat0.shape\n", " d = time_lags.shape[0]\n", " mat_hat = np.zeros((dim1, pred_time_steps))\n", " \n", " for t in range(pred_time_steps):\n", " if t == 0:\n", " init = {\"W\": 0.1 * np.random.rand(dim1, rank), \"X\": 0.1 * np.random.rand(dim2, rank),\n", " \"theta\": 0.1 * np.random.rand(d, rank)}\n", " mat, W, X, theta = BTMF(dense_mat0, sparse_mat0, init, rank, time_lags, maxiter[0], maxiter[1])\n", " else:\n", " sparse_vec = sparse_mat[:, start_time + t - 1]\n", " if np.where(sparse_vec > 0)[0].shape[0] > rank:\n", " init = {\"W\": W, \"X\": X[- np.max(time_lags) :, :], \"theta\": theta}\n", " mat, X = OnlineBTMF(sparse_vec, init, time_lags, maxiter[2], maxiter[3])\n", " else:\n", " X0 = np.zeros(X.shape)\n", " X0[: -1, :] = X[- np.max(time_lags) :, :]\n", " X0[-1, :] = np.einsum('ij, ij -> j', theta, X[-1 - time_lags, :])\n", " X = X0.copy()\n", " mat = np.matmul(W, X.T)\n", " mat_hat[:, t] = mat[:, -1]\n", " if (t + 1) % 40 == 0:\n", " print('Time step: {}'.format(t + 1))\n", "\n", " small_dense_mat = dense_mat[:, start_time : dense_mat.shape[1]]\n", " pos = np.where(small_dense_mat > 0)\n", " final_mape = np.sum(np.abs(small_dense_mat[pos] - \n", " mat_hat[pos])/small_dense_mat[pos])/small_dense_mat[pos].shape[0]\n", " final_rmse = np.sqrt(np.sum((small_dense_mat[pos] - \n", " mat_hat[pos]) ** 2)/small_dense_mat[pos].shape[0])\n", " print('Final MAPE: {:.6}'.format(final_mape))\n", " print('Final RMSE: {:.6}'.format(final_rmse))\n", " print()\n", " return mat_hat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4: Data Organization\n", "\n", "## 1) Matrix Structure\n", "\n", "We consider a dataset of $m$ discrete time series $\\boldsymbol{y}_{i}\\in\\mathbb{R}^{f},i\\in\\left\\{1,2,...,m\\right\\}$. The time series may have missing elements. We express spatio-temporal dataset as a matrix $Y\\in\\mathbb{R}^{m\\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),\n", "\n", "$$Y=\\left[ \\begin{array}{cccc} y_{11} & y_{12} & \\cdots & y_{1f} \\\\ y_{21} & y_{22} & \\cdots & y_{2f} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ y_{m1} & y_{m2} & \\cdots & y_{mf} \\\\ \\end{array} \\right]\\in\\mathbb{R}^{m\\times f}.$$\n", "\n", "## 2) Tensor Structure\n", "\n", "We consider a dataset of $m$ discrete time series $\\boldsymbol{y}_{i}\\in\\mathbb{R}^{nf},i\\in\\left\\{1,2,...,m\\right\\}$. The time series may have missing elements. We partition each time series into intervals of predifined length $f$. We express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),\n", "\n", "$$Y_{i}=\\left[ \\begin{array}{cccc} y_{11} & y_{12} & \\cdots & y_{1f} \\\\ y_{21} & y_{22} & \\cdots & y_{2f} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ y_{n1} & y_{n2} & \\cdots & y_{nf} \\\\ \\end{array} \\right]\\in\\mathbb{R}^{n\\times f},i=1,2,...,m,$$\n", "\n", "therefore, the resulting structure is a tensor $\\mathcal{Y}\\in\\mathbb{R}^{m\\times n\\times f}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 5: Experiments on Guangzhou Data Set" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:100: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:147: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:148: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: nan\n", "Imputation RMSE: nan\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.10697\n", "Final RMSE: 4.2706\n", "\n", "Running time: 1979 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 144])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0859967\n", "Imputation RMSE: 3.6448\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.110258\n", "Final RMSE: 4.4329\n", "\n", "Running time: 2358 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 144])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0869872\n", "Imputation RMSE: 3.6998\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.111931\n", "Final RMSE: 4.4873\n", "\n", "Running time: 2649 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 144])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.102854\n", "Imputation RMSE: 4.54364\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.11122\n", "Final RMSE: 4.49188\n", "\n", "Running time: 2073 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 144])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.113144\n", "Imputation RMSE: 5.12058\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.119727\n", "Final RMSE: 4.90671\n", "\n", "Running time: 1923 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 144])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of short-term rolling prediction with missing values using BTMF:\n", "\n", "| scenario |rank|time_lags| maxiter | mape | rmse |\n", "|:----------|-----:|---------:|---------:|-----------:|----------:|\n", "|**Original data**| 30 | (1,2,144) | (100,100,500,500) | **0.1070** | **4.27**|\n", "|**20%, RM**| 30 | (1,2,144) | (100,100,500,500) | **0.1103** | **4.43**|\n", "|**40%, RM**| 30 | (1,2,144) | (100,100,500,500) | **0.1119** | **4.49**|\n", "|**20%, NM**| 30 | (1,2,144) | (100,100,500,500) | **0.1112** | **4.49**|\n", "|**40%, NM**| 30 | (1,2,144) | (100,100,500,500) | **0.1197** | **4.91**|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 6: Experiments on Birmingham Data Set" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:100: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:147: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:148: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: nan\n", "Imputation RMSE: nan\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.318002\n", "Final RMSE: 161.113\n", "\n", "Running time: 173 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "rank = 10\n", "time_lags = np.array([1, 2, 18])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.1\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.080144\n", "Imputation RMSE: 21.0435\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.320708\n", "Final RMSE: 167.157\n", "\n", "Running time: 173 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "rank = 10\n", "time_lags = np.array([1, 2, 18])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.3\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0815558\n", "Imputation RMSE: 28.372\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.31212\n", "Final RMSE: 166.872\n", "\n", "Running time: 175 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "rank = 10\n", "time_lags = np.array([1, 2, 18])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.1\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.129272\n", "Imputation RMSE: 30.629\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.329965\n", "Final RMSE: 170.479\n", "\n", "Running time: 173 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "rank = 10\n", "time_lags = np.array([1, 2, 18])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.3\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.155757\n", "Imputation RMSE: 71.7693\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.357067\n", "Final RMSE: 173.645\n", "\n", "Running time: 173 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "rank = 10\n", "time_lags = np.array([1, 2, 18])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of short-term rolling prediction with missing values using BTMF:\n", "\n", "| scenario |rank|time_lags| maxiter | mape | rmse |\n", "|:----------|-----:|---------:|---------:|-----------:|----------:|\n", "|**Original data**| 10 | (1,2,18) | (200,100,1100,100) | **0.3180** | **161.11**|\n", "|**10%, RM**| 10 | (1,2,18) | (200,100,1100,100) | **0.3207** | **167.16**|\n", "|**30%, RM**| 10 | (1,2,18) | (200,100,1100,100) | **0.3121** | **166.87**|\n", "|**10%, NM**| 10 | (1,2,18) | (200,100,1100,100) | **0.3300** | **170.48**|\n", "|**30%, NM**| 10 | (1,2,18) | (200,100,1100,100) | **0.3571** | **173.65**|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 7: Experiments on Hangzhou Data Set" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:100: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:147: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:148: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: nan\n", "Imputation RMSE: nan\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.301681\n", "Final RMSE: 40.8733\n", "\n", "Running time: 495 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "rank = 10\n", "time_lags = np.array([1, 2, 108])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.240128\n", "Imputation RMSE: 41.3058\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.323368\n", "Final RMSE: 48.2007\n", "\n", "Running time: 497 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "rank = 10\n", "time_lags = np.array([1, 2, 108])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.250495\n", "Imputation RMSE: 43.5886\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.349265\n", "Final RMSE: 49.2994\n", "\n", "Running time: 495 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "rank = 10\n", "time_lags = np.array([1, 2, 108])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.279115\n", "Imputation RMSE: 90.332\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.308464\n", "Final RMSE: 49.1956\n", "\n", "Running time: 497 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "rank = 10\n", "time_lags = np.array([1, 2, 108])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.292671\n", "Imputation RMSE: 77.7601\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.296947\n", "Final RMSE: 51.6401\n", "\n", "Running time: 493 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "rank = 10\n", "time_lags = np.array([1, 2, 108])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of short-term rolling prediction with missing values using BTMF:\n", "\n", "| scenario |rank|time_lags| maxiter | mape | rmse |\n", "|:----------|-----:|----------:|----------:|-----------:|----------:|\n", "|**Original data**| 30 | (1,2,108) | (200,100,1100,100) | **0.3017** | **40.87**|\n", "|**20%, RM**| 30 | (1,2,108) | (200,100,1100,100) | **0.3234** | **48.20**|\n", "|**40%, RM**| 30 | (1,2,108) | (200,100,1100,100) | **0.3493** | **49.30**|\n", "|**20%, NM**| 30 | (1,2,108) | (200,100,1100,100) | **0.3085** | **49.20**|\n", "|**40%, NM**| 30 | (1,2,108) | (200,100,1100,100) | **0.2969** | **51.64**|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 8: Experiments on Seattle Data Set" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "RM_mat = RM_mat.values\n", "\n", "missing_rate = 0.2\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(RM_mat + 0.5 - missing_rate)\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0671328\n", "Imputation RMSE: 4.08922\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0813108\n", "Final RMSE: 4.90261\n", "\n", "Running time: 3509 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 288])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "RM_mat = RM_mat.values\n", "\n", "missing_rate = 0.4\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(RM_mat + 0.5 - missing_rate)\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0685051\n", "Imputation RMSE: 4.14694\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0840672\n", "Final RMSE: 5.07584\n", "\n", "Running time: 3431 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 288])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "NM_mat = NM_mat.values\n", "\n", "missing_rate = 0.2\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))\n", "for i1 in range(binary_tensor.shape[0]):\n", " for i2 in range(binary_tensor.shape[1]):\n", " binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0826717\n", "Imputation RMSE: 4.94159\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0796434\n", "Final RMSE: 4.84381\n", "\n", "Running time: 3153 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 288])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "NM_mat = NM_mat.values\n", "\n", "missing_rate = 0.4\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))\n", "for i1 in range(binary_tensor.shape[0]):\n", " for i2 in range(binary_tensor.shape[1]):\n", " binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: 0.0887443\n", "Imputation RMSE: 5.48192\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0846578\n", "Final RMSE: 5.12452\n", "\n", "Running time: 3133 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 288])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "NM_mat = NM_mat.values\n", "\n", "missing_rate = 0.0\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))\n", "for i1 in range(binary_tensor.shape[0]):\n", " for i2 in range(binary_tensor.shape[1]):\n", " binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:100: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:147: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:148: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Imputation MAPE: nan\n", "Imputation RMSE: nan\n", "\n", "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0789641\n", "Final RMSE: 4.78312\n", "\n", "Running time: 3203 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "rank = 30\n", "time_lags = np.array([1, 2, 288])\n", "maxiter = np.array([200, 100, 1100, 100])\n", "small_dense_mat = dense_mat[:, dense_mat.shape[1] - pred_time_steps : dense_mat.shape[1]]\n", "mat_hat = st_prediction(dense_mat, sparse_mat, pred_time_steps, rank, time_lags, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of short-term rolling prediction with missing values using BTMF:\n", "\n", "| scenario |rank|time_lags| maxiter | mape | rmse |\n", "|:----------|-----:|---------:|---------:|-----------:|----------:|\n", "|**Original data**| 30 | (1,2,288) | (200,100,1100,100) | **0.0790** | **4.78**|\n", "|**20%, RM**| 30 | (1,2,288) | (200,100,1100,100) | **0.0813** | **4.90**|\n", "|**40%, RM**| 30 | (1,2,288) | (200,100,1100,100) | **0.0841** | **5.08**|\n", "|**20%, NM**| 30 | (1,2,108) | (200,100,1100,100) | **0.0796** | **4.84**|\n", "|**40%, NM**| 30 | (1,2,108) | (200,100,1100,100) | **0.0847** | **5.12**|\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }