{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Organization: Matrix Structure\n", "\n", ">**Reference**: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. [*Temporal regularized matrix factorization for high-dimensional time series prediction*](http://www.cs.utexas.edu/~rofuyu/papers/tr-mf-nips.pdf). 30th Conference on Neural Information Processing Systems (*NIPS 2016*), Barcelona, Spain.\n", "\n", "We consider a dataset of $m$ discrete time series $\\boldsymbol{y}_{i}\\in\\mathbb{R}^{f},i\\in\\left\\{1,2,...,m\\right\\}$. The time series may have missing elements. We express the spatio-temporal dataset as a matrix $Y\\in\\mathbb{R}^{m\\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),\n", "\n", "$$Y=\\left[ \\begin{array}{cccc} y_{11} & y_{12} & \\cdots & y_{1f} \\\\ y_{21} & y_{22} & \\cdots & y_{2f} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ y_{m1} & y_{m2} & \\cdots & y_{mf} \\\\ \\end{array} \\right]\\in\\mathbb{R}^{m\\times f}.$$\n", "\n", "# Temporal Regularized Matrix Factorization (TRMF)\n", "The Temporal Regularized Matrix Factorization (TRMF) framework incorporates temporal dependencies into matrix factorization models by using well-studied time series models to describe the temporal dependencies\n", "among the latent factors $\\left\\{\\boldsymbol{x}_t\\right\\}$ explicitly. Such models take the form:\n", "\n", "$$\\boldsymbol{x}_{t}\\approx\\sum_{l\\in\\mathcal{L}}\\boldsymbol{\\theta}_{l}\\circledast\\boldsymbol{x}_{t-l}$$\n", "\n", "where this autoregressive (AR) model is specified by a lag set $\\mathcal{L}=\\left\\{l_1,l_2,...,l_d\\right\\}$ (e.g., $\\mathcal{L}=\\left\\{1,2,144\\right\\}$) and weights $\\boldsymbol{\\theta}_{l}\\in\\mathbb{R}^{r},\\forall l$, and we further define\n", "\n", "$$\\mathcal{R}_{AR}\\left(X\\mid 
\\mathcal{L},\\Theta,\\eta\\right)=\\frac{1}{2}\\sum_{t=l_d+1}^{f}\\left(\\boldsymbol{x}_{t}-\\sum_{l\\in\\mathcal{L}}\\boldsymbol{\\theta}_{l}\\circledast\\boldsymbol{x}_{t-l}\\right)^T\\left(\\boldsymbol{x}_{t}-\\sum_{l\\in\\mathcal{L}}\\boldsymbol{\\theta}_{l}\\circledast\\boldsymbol{x}_{t-l}\\right)+\\frac{\\eta}{2}\\sum_{t=1}^{f}\\boldsymbol{x}_{t}^T\\boldsymbol{x}_{t}.$$\n", "\n", "Thus, TRMF-AR is given by solving\n", "\n", "$$\\min_{W,X,\\Theta}\\frac{1}{2}\\underbrace{\\sum_{(i,t)\\in\\Omega}\\left(y_{it}-\\boldsymbol{w}_{i}^T\\boldsymbol{x}_{t}\\right)^2}_{\\text{sum of squared residual errors}}+\\lambda_{w}\\underbrace{\\mathcal{R}_{w}\\left(W\\right)}_{W-\\text{regularizer}}+\\lambda_{x}\\underbrace{\\mathcal{R}_{AR}\\left(X\\mid \\mathcal{L},\\Theta,\\eta\\right)}_{\\text{AR-regularizer}}+\\lambda_{\\theta}\\underbrace{\\mathcal{R}_{\\theta}\\left(\\Theta\\right)}_{\\Theta-\\text{regularizer}}$$\n", "\n", "where $\\mathcal{R}_{w}\\left(W\\right)=\\frac{1}{2}\\sum_{i=1}^{m}\\boldsymbol{w}_{i}^T\\boldsymbol{w}_{i}$ and $\\mathcal{R}_{\\theta}\\left(\\Theta\\right)=\\frac{1}{2}\\sum_{l\\in\\mathcal{L}}\\boldsymbol{\\theta}_{l}^T\\boldsymbol{\\theta}_{l}$ are regularization terms." 
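, "\n", "\n", "As a worked illustration (a sketch, assuming $\\circledast$ denotes the element-wise product, which is how the code in this notebook combines `theta` with lagged rows of `X`), the example lag set $\\mathcal{L}=\\left\\{1,2,144\\right\\}$ gives\n", "\n", "$$\\boldsymbol{x}_{t}\\approx\\boldsymbol{\\theta}_{1}\\circledast\\boldsymbol{x}_{t-1}+\\boldsymbol{\\theta}_{2}\\circledast\\boldsymbol{x}_{t-2}+\\boldsymbol{\\theta}_{144}\\circledast\\boldsymbol{x}_{t-144},$$\n", "\n", "so the first sum in $\\mathcal{R}_{AR}$ starts at $t=l_d+1=145$, the first time step for which all three lagged vectors exist."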
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.linalg import inv as inv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Matrix Computation Concepts\n", "\n", "## Kronecker product\n", "\n", "- **Definition**:\n", "\n", "Given two matrices $A\\in\\mathbb{R}^{m_1\\times n_1}$ and $B\\in\\mathbb{R}^{m_2\\times n_2}$, the **Kronecker product** between these two matrices is defined as\n", "\n", "$$A\\otimes B=\\left[ \\begin{array}{cccc} a_{11}B & a_{12}B & \\cdots & a_{1n_1}B \\\\ a_{21}B & a_{22}B & \\cdots & a_{2n_1}B \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{m_11}B & a_{m_12}B & \\cdots & a_{m_1n_1}B \\\\ \\end{array} \\right]$$\n", "where the symbol $\\otimes$ denotes the Kronecker product, and the size of the resulting $A\\otimes B$ is $(m_1m_2)\\times (n_1n_2)$ (i.e., $m_1\\times m_2$ rows and $n_1\\times n_2$ columns).\n", "\n", "- **Example**:\n", "\n", "If $A=\\left[ \\begin{array}{cc} 1 & 2 \\\\ 3 & 4 \\\\ \\end{array} \\right]$ and $B=\\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10 \\\\ \\end{array} \\right]$, then we have\n", "\n", "$$A\\otimes B=\\left[ \\begin{array}{cc} 1\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] & 2\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] \\\\ 3\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] & 4\\times \\left[ \\begin{array}{ccc} 5 & 6 & 7\\\\ 8 & 9 & 10\\\\ \\end{array} \\right] \\\\ \\end{array} \\right]$$\n", "\n", "$$=\\left[ \\begin{array}{cccccc} 5 & 6 & 7 & 10 & 12 & 14 \\\\ 8 & 9 & 10 & 16 & 18 & 20 \\\\ 15 & 18 & 21 & 20 & 24 & 28 \\\\ 24 & 27 & 30 & 32 & 36 & 40 \\\\ \\end{array} \\right]\\in\\mathbb{R}^{4\\times 6}.$$\n", "\n", "## Khatri-Rao product (kr_prod)\n", "\n", "- **Definition**:\n", "\n", "Given two matrices $A=\\left( \\boldsymbol{a}_1,\\boldsymbol{a}_2,...,\\boldsymbol{a}_r 
\\right)\\in\\mathbb{R}^{m\\times r}$ and $B=\\left( \\boldsymbol{b}_1,\\boldsymbol{b}_2,...,\\boldsymbol{b}_r \\right)\\in\\mathbb{R}^{n\\times r}$ with the same number of columns, the **Khatri-Rao product** (or **column-wise Kronecker product**) between $A$ and $B$ is given as follows,\n", "\n", "$$A\\odot B=\\left( \\boldsymbol{a}_1\\otimes \\boldsymbol{b}_1,\\boldsymbol{a}_2\\otimes \\boldsymbol{b}_2,...,\\boldsymbol{a}_r\\otimes \\boldsymbol{b}_r \\right)\\in\\mathbb{R}^{(mn)\\times r}$$\n", "where the symbol $\\odot$ denotes the Khatri-Rao product, and $\\otimes$ denotes the Kronecker product.\n", "\n", "- **Example**:\n", "\n", "If $A=\\left[ \\begin{array}{cc} 1 & 2 \\\\ 3 & 4 \\\\ \\end{array} \\right]=\\left( \\boldsymbol{a}_1,\\boldsymbol{a}_2 \\right)$ and $B=\\left[ \\begin{array}{cc} 5 & 6 \\\\ 7 & 8 \\\\ 9 & 10 \\\\ \\end{array} \\right]=\\left( \\boldsymbol{b}_1,\\boldsymbol{b}_2 \\right)$, then we have\n", "\n", "$$A\\odot B=\\left( \\boldsymbol{a}_1\\otimes \\boldsymbol{b}_1,\\boldsymbol{a}_2\\otimes \\boldsymbol{b}_2 \\right)$$\n", "\n", "$$=\\left[ \\begin{array}{cc} \\left[ \\begin{array}{c} 1 \\\\ 3 \\\\ \\end{array} \\right]\\otimes \\left[ \\begin{array}{c} 5 \\\\ 7 \\\\ 9 \\\\ \\end{array} \\right] & \\left[ \\begin{array}{c} 2 \\\\ 4 \\\\ \\end{array} \\right]\\otimes \\left[ \\begin{array}{c} 6 \\\\ 8 \\\\ 10 \\\\ \\end{array} \\right] \\\\ \\end{array} \\right]$$\n", "\n", "$$=\\left[ \\begin{array}{cc} 5 & 12 \\\\ 7 & 16 \\\\ 9 & 20 \\\\ 15 & 24 \\\\ 21 & 32 \\\\ 27 & 40 \\\\ \\end{array} \\right]\\in\\mathbb{R}^{6\\times 2}.$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def kr_prod(a, b):\n", "    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 5 12]\n", " [ 7 16]\n", " [ 9 20]\n", " [15 24]\n", " [21 32]\n", " [27 40]]\n" ] } 
], "source": [ "import numpy as np\n", "A = np.array([[1, 2], [3, 4]])\n", "B = np.array([[5, 6], [7, 8], [9, 10]])\n", "print(kr_prod(A, B))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def TRMF(dense_mat, sparse_mat, init, time_lags, lambda_w, lambda_x, lambda_theta, eta, maxiter):\n", "    W = init[\"W\"]\n", "    X = init[\"X\"]\n", "    theta = init[\"theta\"]\n", "\n", "    dim1, dim2 = sparse_mat.shape\n", "    binary_mat = np.zeros((dim1,dim2))\n", "    position = np.where((sparse_mat != 0))\n", "    binary_mat[position] = 1\n", "    pos = np.where((dense_mat != 0) & (sparse_mat == 0))\n", "    d = len(time_lags)\n", "    r = theta.shape[1]\n", "\n", "    mape = np.zeros(maxiter)\n", "    rmse = np.zeros(maxiter)\n", "    for iter in range(maxiter):\n", "        var1 = X.T\n", "        var2 = kr_prod(var1,var1)\n", "        var3 = np.matmul(var2,binary_mat.T)\n", "        var4 = np.matmul(var1,sparse_mat.T)\n", "        for i in range(dim1):\n", "            W[i,:] = np.matmul(inv((var3[:,i].reshape([r,r]))+lambda_w * np.eye(r)), var4[:,i])\n", "\n", "        var1 = W.T\n", "        var2 = kr_prod(var1,var1)\n", "        var3 = np.matmul(var2, binary_mat)\n", "        var4 = np.matmul(var1, sparse_mat)\n", "        for t in range(dim2):\n", "            Mt = np.zeros((r,r))\n", "            Nt = np.zeros(r)\n", "            if t < max(time_lags):\n", "                Pt = np.zeros((r,r))\n", "                Qt = np.zeros(r)\n", "            else:\n", "                Pt = np.eye(r)\n", "                Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])\n", "            if t < dim2 - np.min(time_lags):\n", "                if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):\n", "                    index = list(range(0, d))\n", "                else:\n", "                    index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]\n", "                for k in index:\n", "                    theta0 = theta.copy()\n", "                    theta0[k, :] = 0\n", "                    Mt = Mt + np.diag(theta[k, :]**2)\n", "                    Nt = Nt + np.multiply(theta[k,:],(X[t+time_lags[k], :]\n", "                                                     - np.einsum('ij, ij -> j', theta0,\n", "                                                                 X[t + time_lags[k] - time_lags, :])))\n", "                X[t,:] = np.matmul(inv(var3[:, t].reshape([r,r])\n", "                                       + lambda_x 
* Pt + lambda_x * Mt + lambda_x * eta * np.eye(r)),\n", "                                   (var4[:, t] + lambda_x * Qt + lambda_x * Nt))\n", "            elif t >= dim2 - np.min(time_lags):\n", "                X[t, :] = np.matmul(inv(var3[:, t].reshape([r, r]) + lambda_x * Pt\n", "                                        + lambda_x * eta * np.eye(r)), (var4[:, t] + Qt))\n", "        for k in range(d):\n", "            var1 = X[np.max(time_lags) - time_lags[k] : dim2 - time_lags[k], :]\n", "            var2 = inv(np.diag(np.einsum('ij, ij -> j', var1, var1)) + (lambda_theta / lambda_x) * np.eye(r))\n", "            var3 = np.zeros(r)\n", "            for t in range(np.max(time_lags) - time_lags[k], dim2 - time_lags[k]):\n", "                var3 = var3 + np.multiply(X[t, :],\n", "                                          (X[t + time_lags[k], :]\n", "                                           - np.einsum('ij, ij -> j', theta, X[t + time_lags[k] - time_lags, :])\n", "                                           + np.multiply(theta[k, :], X[t,:])))\n", "            theta[k, :] = np.matmul(var2,var3)\n", "\n", "        mat_hat = np.matmul(W, X.T)\n", "        mape[iter] = np.sum(np.abs(dense_mat[pos] - mat_hat[pos]) / dense_mat[pos]) / dense_mat[pos].shape[0]\n", "        rmse[iter] = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos])**2)/dense_mat[pos].shape[0])\n", "    return W, X, theta" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def OnlineTRMF(sparse_vec, init, lambda_x, time_lags):\n", "    W = init[\"W\"]\n", "    X = init[\"X\"]\n", "    theta = init[\"theta\"]\n", "    dim = sparse_vec.shape[0]\n", "    t, rank = X.shape\n", "    position = np.where(sparse_vec != 0)\n", "    binary_vec = np.zeros(dim)\n", "    binary_vec[position] = 1\n", "\n", "    xt_tilde = np.einsum('ij, ij -> j', theta, X[t - 1 - time_lags, :])\n", "    var1 = W.T\n", "    var2 = kr_prod(var1, var1)\n", "    var_mu = np.matmul(var1, sparse_vec) + lambda_x * xt_tilde\n", "    inv_var_Lambda = inv(np.matmul(var2, binary_vec).reshape([rank, rank]) + lambda_x * np.eye(rank))\n", "    X[t - 1, :] = np.matmul(inv_var_Lambda, var_mu)\n", "    mat_hat = np.matmul(W, X.T)\n", "    return X" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def st_prediction(dense_mat, 
sparse_mat, time_lags, lambda_w, lambda_x, lambda_theta, eta,\n", "                  rank, pred_time_steps, maxiter):\n", "    start_time = dense_mat.shape[1] - pred_time_steps\n", "    dense_mat0 = dense_mat[:, 0 : start_time]\n", "    sparse_mat0 = sparse_mat[:, 0 : start_time]\n", "    dim1 = sparse_mat0.shape[0]\n", "    dim2 = sparse_mat0.shape[1]\n", "    mat_hat = np.zeros((dim1, pred_time_steps))\n", "\n", "    for t in range(pred_time_steps):\n", "        if t == 0:\n", "            init = {\"W\": 0.1 * np.random.rand(dim1, rank), \"X\": 0.1 * np.random.rand(dim2, rank),\n", "                    \"theta\": 0.1 * np.random.rand(time_lags.shape[0], rank)}\n", "            W, X, theta = TRMF(dense_mat0, sparse_mat0, init, time_lags,\n", "                               lambda_w, lambda_x, lambda_theta, eta, maxiter)\n", "            X0 = np.zeros((dim2 + t + 1, rank))\n", "            X0[0 : dim2 + t, :] = X.copy()\n", "            X0[dim2 + t, :] = np.einsum('ij, ij -> j', theta, X0[dim2 + t - time_lags, :])\n", "        else:\n", "            sparse_vec = sparse_mat[:, start_time + t - 1]\n", "            if np.where(sparse_vec > 0)[0].shape[0] > rank:\n", "                init = {\"W\": W, \"X\": X0[- np.max(time_lags) - 1 :, :], \"theta\": theta}\n", "                X = OnlineTRMF(sparse_vec, init, lambda_x/dim2, time_lags)\n", "                X0 = np.zeros((np.max(time_lags) + 1, rank))\n", "                X0[0 : np.max(time_lags), :] = X[1 :, :].copy()\n", "                X0[np.max(time_lags), :] = np.einsum('ij, ij -> j', theta, X0[np.max(time_lags) - time_lags, :])\n", "            else:\n", "                X0 = np.zeros((np.max(time_lags) + 1, rank))\n", "                X0[0 : np.max(time_lags), :] = X[1 :, :]\n", "                X0[np.max(time_lags), :] = np.einsum('ij, ij -> j', theta, X0[np.max(time_lags) - time_lags, :])\n", "        mat_hat[:, t] = np.matmul(W, X0[-1, :])\n", "        if (t + 1) % 40 == 0:\n", "            print('Time step: {}'.format(t + 1))\n", "\n", "    small_dense_mat = dense_mat[:, start_time : dense_mat.shape[1]]\n", "    pos = np.where(small_dense_mat != 0)\n", "    final_mape = np.sum(np.abs(small_dense_mat[pos] -\n", "                               mat_hat[pos])/small_dense_mat[pos])/small_dense_mat[pos].shape[0]\n", "    final_rmse = np.sqrt(np.sum((small_dense_mat[pos] -\n", 
" mat_hat[pos]) ** 2)/small_dense_mat[pos].shape[0])\n", " print('Final MAPE: {:.6}'.format(final_mape))\n", " print('Final RMSE: {:.6}'.format(final_rmse))\n", " print()\n", " return mat_hat" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", 
"output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.106524\n", "Final RMSE: 4.29955\n", "\n", "Running time: 417 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "time_lags = np.array([1, 2, 144])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# 
=============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.106232\n", "Final RMSE: 4.3062\n", "\n", "Running time: 404 seconds\n" ] } ], "source": [ "import 
time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "time_lags = np.array([1, 2, 144])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] 
\n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.10619\n", "Final RMSE: 4.30295\n", "\n", "Running time: 386 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "time_lags = np.array([1, 2, 144])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = 
random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],\n", "#                                                                    random_tensor.shape[1]\n", "#                                                                    * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", "    for i2 in range(tensor.shape[1]):\n", "        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]\n", "                                    * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 
360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.106395\n", "Final RMSE: 4.29308\n", "\n", "Running time: 385 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "time_lags = np.array([1, 2, 144])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario 
by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", "    for i2 in range(tensor.shape[1]):\n", "        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]\n", "                                    * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Final MAPE: 0.107114\n", "Final RMSE: 4.32297\n", "\n", "Running time: 385 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 144 * 5\n", "time_lags = np.array([1, 2, 144])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,\n", "                        eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"**Experiment results** of spatial-temporal data prediction using TRMF:\n", "\n", "| scenario |rank|Lambda_w|Lambda_x|Lambda_theta|eta|maxiter| mape | rmse |\n", "|:----------|-----:|---------:|---------:|-------------:|----:|----------:|-----:|-----:|\n", "|**Original data**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.1065**| **4.30**|\n", "|**20%, RM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.1062**| **4.31**|\n", "|**40%, RM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.1062**| **4.30**|\n", "|**20%, NM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.1064**| **4.29**|\n", "|**40%, NM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.1071**| **4.32**|\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " 
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.326294\n", "Final RMSE: 174.248\n", "\n", "Running time: 42 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "time_lags = np.array([1, 2, 18])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 100\n", "lambda_x = 100\n", "lambda_theta = 100\n", "eta = 0.01\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = 
random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.1\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.326723\n", "Final RMSE: 171.685\n", "\n", "Running time: 37 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "time_lags = np.array([1, 2, 18])\n", "dim1, dim2 = sparse_mat.shape\n", 
"rank = 10\n", "lambda_w = 100\n", "lambda_x = 100\n", "lambda_theta = 100\n", "eta = 0.01\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.3\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", 
"sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.344155\n", "Final RMSE: 181.166\n", "\n", "Running time: 37 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "time_lags = np.array([1, 2, 18])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 100\n", "lambda_x = 100\n", "lambda_theta = 100\n", "eta = 0.01\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.1\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", 
"# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.319528\n", "Final RMSE: 169.295\n", "\n", "Running time: 39 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "time_lags = np.array([1, 2, 18])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 100\n", "lambda_x = 100\n", "lambda_theta = 100\n", "eta = 0.01\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", 
"print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.3\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in 
double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Final MAPE: 0.33093\n", "Final RMSE: 175.635\n", "\n", "Running time: 38 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 18 * 7\n", "time_lags = np.array([1, 2, 18])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 100\n", "lambda_x = 100\n", "lambda_theta = 100\n", "eta = 0.01\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of spatial-temporal data prediction using TRMF:\n", "\n", "| scenario |rank|Lambda_w|Lambda_x|Lambda_theta|eta|maxiter| mape | rmse |\n", "|:----------|-----:|---------:|---------:|-------------:|----:|----------:|-----:|-----:|\n", "|**Original data**| 10 | 100 | 100 | 100 | 0.01 | 200 | **0.3263**| **174.25**|\n", "|**10%, RM**| 10 | 100 | 100 | 100 | 0.01 | 200 | **0.3267**| **171.69**|\n", "|**30%, RM**| 10 | 100 | 100 | 100 | 0.01 | 200 | **0.3442**| **181.17**|\n", "|**10%, NM**| 10 | 100 | 100 | 100 | 0.01 | 200 | **0.3195**| **169.30**|\n", "|**30%, NM**| 10 | 100 | 100 | 100 | 0.01 | 200 | **0.3309**| **175.64**|\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = 
scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.0\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time 
step: 480\n", "Time step: 520\n", "Final MAPE: 0.277726\n", "Final RMSE: 39.9873\n", "\n", "Running time: 65 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "time_lags = np.array([1, 2, 108])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 1000\n", "lambda_x = 1000\n", "lambda_theta = 1000\n", "eta = 0.05\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# 
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.275878\n", "Final RMSE: 40.7251\n", "\n", "Running time: 64 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "time_lags = np.array([1, 2, 108])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 1000\n", "lambda_x = 1000\n", "lambda_theta = 1000\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = 
scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", " random_tensor.shape[1] \n", " * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "# binary_tensor = np.zeros(tensor.shape)\n", "# for i1 in range(tensor.shape[0]):\n", "# for i2 in range(tensor.shape[1]):\n", "# binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", "# * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", 
"Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.266774\n", "Final RMSE: 47.8046\n", "\n", "Running time: 64 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "time_lags = np.array([1, 2, 108])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 1000\n", "lambda_x = 1000\n", "lambda_theta = 1000\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.2\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", 
"binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.265751\n", "Final RMSE: 45.2281\n", "\n", "Running time: 75 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "time_lags = np.array([1, 2, 108])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 1000\n", "lambda_x = 1000\n", "lambda_theta = 1000\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "\n", "tensor = 
scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')\n", "tensor = tensor['tensor']\n", "random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')\n", "random_matrix = random_matrix['random_matrix']\n", "random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')\n", "random_tensor = random_tensor['random_tensor']\n", "\n", "dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])\n", "missing_rate = 0.4\n", "\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0], \n", "# random_tensor.shape[1] \n", "# * random_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros(tensor.shape)\n", "for i1 in range(tensor.shape[0]):\n", " for i2 in range(tensor.shape[1]):\n", " binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] \n", " * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", 
"output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Final MAPE: 0.287804\n", "Final RMSE: 41.0237\n", "\n", "Running time: 75 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 108 * 5\n", "time_lags = np.array([1, 2, 108])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 10\n", "lambda_w = 1000\n", "lambda_x = 1000\n", "lambda_theta = 1000\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of spatial-temporal data prediction using TRMF:\n", "\n", "| scenario |rank|Lambda_w|Lambda_x|Lambda_theta|eta|maxiter| mape | rmse |\n", "|:----------|-----:|---------:|---------:|-------------:|----:|--------:|-----:|-----:|\n", "|**Original data**| 10 | 1000 | 1000 | 1000 | 0.05 | 200 | **0.2777**| **39.99**|\n", "|**20%, RM**| 10 | 1000 | 1000 | 1000 | 0.03 | 200 | **0.2759**| **40.73**|\n", "|**40%, RM**| 10 | 1000 | 1000 | 1000 | 0.03 | 200 | **0.2668**| **47.80**|\n", "|**20%, NM**| 10 | 1000 | 1000 | 1000 | 0.03 | 200 | **0.2658**| **45.23**|\n", "|**40%, NM**| 10 | 1000 | 1000 | 1000 | 0.03 | 200 | **0.2878**| **41.02**|\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "RM_mat = RM_mat.values\n", "\n", 
"missing_rate = 0.2\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(RM_mat + 0.5 - missing_rate)\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0795357\n", "Final RMSE: 4.89963\n", "\n", "Running time: 324 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "time_lags = np.array([1, 2, 288])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", 
"mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "RM_mat = RM_mat.values\n", "\n", "missing_rate = 0.4\n", "# =============================================================================\n", "### Random missing (RM) scenario\n", "### Set the RM scenario by:\n", "binary_mat = np.round(RM_mat + 0.5 - missing_rate)\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", 
"Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0795249\n", "Final RMSE: 4.89829\n", "\n", "Running time: 298 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "time_lags = np.array([1, 2, 288])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)\n", "dense_mat = dense_mat.values\n", "NM_mat = NM_mat.values\n", "\n", "missing_rate = 0.2\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))\n", "for i1 in range(binary_tensor.shape[0]):\n", " for i2 in range(binary_tensor.shape[1]):\n", " binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ 
"/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0794329\n", "Final RMSE: 4.89428\n", "\n", "Running time: 303 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "time_lags = np.array([1, 2, 288])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)\n", "NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)\n", "dense_mat 
= dense_mat.values\n", "NM_mat = NM_mat.values\n", "\n", "missing_rate = 0.4\n", "# =============================================================================\n", "### Non-random missing (NM) scenario\n", "### Set the NM scenario by:\n", "binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))\n", "for i1 in range(binary_tensor.shape[0]):\n", " for i2 in range(binary_tensor.shape[1]):\n", " binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)\n", "binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])\n", "# =============================================================================\n", "\n", "sparse_mat = np.multiply(dense_mat, binary_mat)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars\n", "/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Time step: 40\n", "Time step: 80\n", "Time step: 120\n", "Time step: 160\n", "Time step: 200\n", "Time step: 240\n", "Time step: 280\n", "Time step: 320\n", "Time step: 360\n", "Time step: 400\n", "Time step: 440\n", "Time step: 480\n", "Time step: 520\n", "Time step: 560\n", "Time step: 600\n", "Time step: 640\n", "Time step: 680\n", "Time step: 720\n", "Time step: 760\n", "Time step: 800\n", "Time step: 840\n", "Time step: 880\n", "Time step: 920\n", "Time step: 960\n", "Time step: 1000\n", "Time step: 1040\n", "Time step: 1080\n", "Time step: 1120\n", "Time step: 1160\n", "Time step: 1200\n", "Time step: 1240\n", "Time step: 1280\n", "Time step: 1320\n", "Time step: 1360\n", "Time step: 1400\n", "Time step: 1440\n", "Final MAPE: 0.0795817\n", "Final 
RMSE: 4.89878\n", "\n", "Running time: 310 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "pred_time_steps = 288 * 5\n", "time_lags = np.array([1, 2, 288])\n", "dim1, dim2 = sparse_mat.shape\n", "rank = 30\n", "lambda_w = 500\n", "lambda_x = 500\n", "lambda_theta = 500\n", "eta = 0.03\n", "d = time_lags.shape[0]\n", "\n", "maxiter = 200\n", "mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta, \n", " eta, rank, pred_time_steps, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Experiment results** of spatio-temporal data prediction using TRMF:\n", "\n", "| Scenario | `rank` | `lambda_w` | `lambda_x` | `lambda_theta` | `eta` | `maxiter` | MAPE | RMSE |\n", "|:----------|-----:|---------:|---------:|-------------:|----:|----------:|-----:|-----:|\n", "|**Original data**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.0796** | **4.90**|\n", "|**20%, RM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.0795** | **4.90**|\n", "|**40%, RM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.0795** | **4.90**|\n", "|**20%, NM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.0794** | **4.89**|\n", "|**40%, NM**| 30 | 500 | 500 | 500 | 0.03 | 200 | **0.0796** | **4.90**|\n", "\n", "In each run shown above, `st_prediction` receives `dense_mat` for both of its matrix arguments, so the masked `sparse_mat` is not passed to the model; this is consistent with the near-identical scores across scenarios.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }