{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# About This Notebook\n", "\n", "This notebook shows how to implement **Low-Rank Tensor Completion with Truncated Nuclear Norm minimization (LRTC-TNN)** on some real-world data sets. For an in-depth discussion of LRTC-TNN, please see our article [1].\n", "\n", "
\n", "\n", "[1] Xinyu Chen, Jinming Yang, Lijun Sun (2020). A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation. arXiv.2003.10271. [PDF] \n", "\n", "
\n", "\n", "\n", "## Quick Run\n", "\n", "This notebook is publicly available for any usage at our data imputation project. Please check out [**transdim - GitHub**](https://github.com/xinychen/transdim).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Low-Rank Tensor Completion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by importing the necessary dependencies." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from numpy.linalg import inv as inv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tensor Unfolding (`ten2mat`) and Matrix Folding (`mat2ten`)\n", "\n", "Using numpy reshape to perform 3rd rank tensor unfold operation. [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def ten2mat(tensor, mode):\n", " return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor size:\n", "(3, 2, 4)\n", "original tensor:\n", "[[[ 1 2 3 4]\n", " [ 3 4 5 6]]\n", "\n", " [[ 5 6 7 8]\n", " [ 7 8 9 10]]\n", "\n", " [[ 9 10 11 12]\n", " [11 12 13 14]]]\n", "\n", "(1) mode-1 tensor unfolding:\n", "[[ 1 3 2 4 3 5 4 6]\n", " [ 5 7 6 8 7 9 8 10]\n", " [ 9 11 10 12 11 13 12 14]]\n", "\n", "(2) mode-2 tensor unfolding:\n", "[[ 1 5 9 2 6 10 3 7 11 4 8 12]\n", " [ 3 7 11 4 8 12 5 9 13 6 10 14]]\n", "\n", "(3) mode-3 tensor unfolding:\n", "[[ 1 5 9 3 7 11]\n", " [ 2 6 10 4 8 12]\n", " [ 3 7 11 5 9 13]\n", " [ 4 8 12 6 10 14]]\n" ] } ], "source": [ "X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], \n", " [[5, 6, 7, 8], [7, 8, 9, 10]], \n", " [[9, 10, 11, 12], [11, 12, 13, 14]]])\n", "print('tensor size:')\n", "print(X.shape)\n", "print('original tensor:')\n", "print(X)\n", "print()\n", "print('(1) mode-1 tensor unfolding:')\n", "print(ten2mat(X, 0))\n", "print()\n", "print('(2) mode-2 tensor unfolding:')\n", "print(ten2mat(X, 1))\n", "print()\n", "print('(3) mode-3 tensor unfolding:')\n", "print(ten2mat(X, 2))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def mat2ten(mat, tensor_size, mode):\n", " index = list()\n", " index.append(mode)\n", " for i in range(tensor_size.shape[0]):\n", " if i != mode:\n", " index.append(i)\n", " return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Singular Value Thresholding (SVT) for TNN" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def svt_tnn(mat, alpha, rho, theta):\n", " \"\"\"This is a Numpy dependent singular value thresholding (SVT) process.\"\"\"\n", " u, s, v = np.linalg.svd(mat, full_matrices = 0)\n", " vec = s.copy()\n", " vec[theta :] = s[theta :] - alpha / rho\n", " vec[vec < 0] = 0\n", " return u @ np.diag(vec) @ v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Understanding these codes**:\n", "\n", "- **`line 1`**: Necessary inputs including any input matrix $\\boldsymbol{X}$, weight of Truncated Nuclear Norm (TNN) regularization $\\alpha$, learning rate $\\rho$, and positive integer number $\\theta$ for nuclear norm truncation.\n", "\n", "- **`line 2`**: Compute the Singular Value Decomposition (SVD) for 
any matrix $\\boldsymbol{X}$ with `numpy.linalg.svd` (i.e., the SVD function in NumPy's linear algebra module).\n", "\n", "- **`lines 3-5`**: Shrink the singular values beyond the first $\\theta$, i.e., $\\sigma_{\\theta+1},\\sigma_{\\theta+2},\\ldots$, with the following rule:\n", "\n", "\\begin{equation}\n", "\\sigma_{i}=\\left[\\sigma_{i}(\\boldsymbol{X})-\\frac{\\alpha}{\\rho}\\right]_{+},\\quad i>\\theta.\n", "\\end{equation}\n", "\n", "- **`line 6`**: Return the resulting matrix." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A more efficient alternative**:\n", "\n", "The following implementation computes the SVT-TNN operator more efficiently when the input matrix is very fat or very tall: it obtains the left singular vectors from the much smaller Gram matrix $\\boldsymbol{X}\\boldsymbol{X}^\\top$ (or applies itself to the transpose), and it discards all components whose singular values do not exceed $\\alpha/\\rho$." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def svt_tnn(mat, alpha, rho, theta):\n", "    tau = alpha / rho\n", "    [m, n] = mat.shape\n", "    if 2 * m < n:\n", "        # Fat matrix: obtain left singular vectors from the small m-by-m Gram matrix.\n", "        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)\n", "        s = np.sqrt(s)\n", "        idx = np.sum(s > tau)\n", "        mid = np.zeros(idx)\n", "        mid[:theta] = 1\n", "        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]\n", "        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)\n", "    elif m > 2 * n:\n", "        # Tall matrix: work on the transpose instead.\n", "        return svt_tnn(mat.T, alpha, rho, theta).T\n", "    # Roughly square matrix: plain SVD, keeping only singular values above tau.\n", "    u, s, v = np.linalg.svd(mat, full_matrices = 0)\n", "    idx = np.sum(s > tau)\n", "    vec = s[:idx].copy()\n", "    vec[theta:idx] = s[theta:idx] - tau\n", "    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
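\n", "**Sanity check (optional)**: a minimal sketch, not executed in this notebook, comparing the fast `svt_tnn` above with the direct SVD-based version from the earlier cell (rewritten below under the hypothetical name `svt_tnn_naive`). The two coincide whenever at least $\\theta$ singular values exceed $\\alpha/\\rho$, which is the regime in the experiments below.\n", "\n", "```python\n", "import numpy as np\n", "\n", "def svt_tnn_naive(mat, alpha, rho, theta):\n", "    # Direct version: full SVD, then shrink every singular value beyond the first theta.\n", "    u, s, v = np.linalg.svd(mat, full_matrices = 0)\n", "    vec = s.copy()\n", "    vec[theta :] = np.maximum(s[theta :] - alpha / rho, 0)\n", "    return u @ np.diag(vec) @ v\n", "\n", "np.random.seed(0)\n", "mat = np.random.rand(30, 200)  # fat matrix, exercises the Gram-matrix branch\n", "alpha, rho, theta = 1, 10, 5   # tau = alpha / rho = 0.1, below every singular value here\n", "diff = svt_tnn(mat, alpha, rho, theta) - svt_tnn_naive(mat, alpha, rho, theta)\n", "print(np.max(np.abs(diff)))    # should be at machine-precision level\n", "```\n", "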
\n", "\n", "
\n", "\n", "> Note that $$\\mathrm{MAPE}=\\frac{1}{n} \\sum_{i=1}^{n} \\frac{\\left|y_{i}-\\hat{y}_{i}\\right|}{y_{i}} \\times 100, \\quad\\mathrm{RMSE}=\\sqrt{\\frac{1}{n} \\sum_{i=1}^{n}\\left(y_{i}-\\hat{y}_{i}\\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\\hat{y}_i$ are the actual value and its estimation, respectively." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def compute_rmse(var, var_hat):\n", " return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def compute_mape(var, var_hat):\n", " return np.sum(np.abs(var - var_hat) / var) / var.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define LRTC-TNN Function with `Numpy`" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter):\n", " \"\"\"Low-Rank Tenor Completion with Truncated Nuclear Norm, LRTC-TNN.\"\"\"\n", " \n", " dim = np.array(sparse_tensor.shape)\n", " pos_missing = np.where(sparse_tensor == 0)\n", " pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))\n", " \n", " X = np.zeros(np.insert(dim, 0, len(dim))) # \\boldsymbol{\\mathcal{X}}\n", " T = np.zeros(np.insert(dim, 0, len(dim))) # \\boldsymbol{\\mathcal{T}}\n", " Z = sparse_tensor.copy()\n", " last_tensor = sparse_tensor.copy()\n", " snorm = np.sqrt(np.sum(sparse_tensor ** 2))\n", " it = 0\n", " while True:\n", " rho = min(rho * 1.05, 1e5)\n", " for k in range(len(dim)):\n", " X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)\n", " Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]\n", " T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))\n", " tensor_hat = np.einsum('k, kmnt -> mnt', alpha, X)\n", " tol = np.sqrt(np.sum((tensor_hat - last_tensor) ** 2)) / snorm\n", " last_tensor = tensor_hat.copy()\n", " it += 1\n", " if (it + 1) % 50 == 0:\n", " print('Iter: {}'.format(it + 1))\n", " print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))\n", " print()\n", " if (tol < epsilon) or (it >= maxiter):\n", " break\n", "\n", " print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))\n", " print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))\n", " print()\n", " \n", " return tensor_hat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Understanding these codes**:\n", "\n", "- **`line 18-19`**: Update $\\boldsymbol{\\mathcal{Z}}_{k}^{l+1},k=1,2,3$.\n", "\n", "- **`line 20-22`**: Update $\\boldsymbol{\\mathcal{X}}_{k}^{l+1}$ by\n", "\n", "\\begin{equation}\n", "\\boldsymbol{\\mathcal{X}}_{k}^{l+1}=\\mathcal{P}_{\\Omega}(\\boldsymbol{\\mathcal{Y}})+\\mathcal{P}_{\\Omega}^{\\perp}\\left(\\boldsymbol{\\mathcal{Z}}_{k}^{l+1}-\\frac{1}{\\rho}\\boldsymbol{\\mathcal{T}}_{k}^{l}\\right),k=1,2,3.\n", "\\end{equation}\n", "\n", "- **`line 23`**: Update $\\boldsymbol{\\mathcal{T}}_{k}^{l+1}$ by\n", "\n", "\\begin{equation}\n", "\\boldsymbol{\\mathcal{T}}_{k}^{l+1}=\\boldsymbol{\\mathcal{T}}_{k}^{l}+\\rho_k\\left(\\boldsymbol{\\mathcal{X}}_{k}^{l+1}-\\boldsymbol{\\mathcal{Z}}_{k}^{l+1}\\right).\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Guangzhou urban traffic speed data set" ] }, { "cell_type": 
"code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 5.51797\n", "\n", "Iter: 100\n", "RMSE: 3.00147\n", "\n", "Imputation MAPE: 0.070093\n", "Imputation RMSE: 2.99615\n", "\n", "Running time: 115 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 5.32424\n", "\n", "Iter: 100\n", "RMSE: 3.58887\n", "\n", "Imputation MAPE: 0.0836949\n", "Imputation RMSE: 3.57911\n", "\n", "Running time: 151 seconds\n", "\n", "Missing rate = 0.9\n", "Iter: 50\n", "RMSE: 4.05595\n", "\n", "Iter: 100\n", "RMSE: 4.05\n", "\n", "Imputation MAPE: 0.0950771\n", "Imputation RMSE: 4.05026\n", "\n", "Running time: 167 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import time\n", "import scipy.io\n", "\n", "## 30% RM\n", "r = 0.3\n", "print('Missing rate = {}'.format(r))\n", "missing_rate = r\n", "\n", "## Random Missing (RM)\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "np.random.seed(1000)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.25\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()\n", "\n", "## 70% RM\n", "r = 0.7\n", "print('Missing rate = {}'.format(r))\n", "missing_rate = r\n", "\n", "## Random Missing (RM)\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "np.random.seed(1000)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.2\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()\n", "\n", "## 90% RM\n", "r = 0.9\n", "print('Missing rate = {}'.format(r))\n", "missing_rate = r\n", "\n", "## Random Missing (RM)\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "np.random.seed(1000)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "theta = 0.1\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 5.06925\n", "\n", "Iter: 100\n", "RMSE: 4.08494\n", "\n", "Imputation MAPE: 0.0965249\n", "Imputation RMSE: 4.08516\n", "\n", "Running time: 152 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 5.14929\n", "\n", "Iter: 100\n", "RMSE: 4.30086\n", "\n", "Imputation MAPE: 0.101477\n", "Imputation RMSE: 4.30102\n", "\n", "Running time: 123 seconds\n", "\n" ] } ], "source": [ "import numpy 
as np\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " ## Non-random Missing (NM)\n", " dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", " dim1, dim2, dim3 = dense_tensor.shape\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.05\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "RMSE: 5.27782\n", "\n", "Iter: 100\n", "RMSE: 3.97509\n", "\n", "Imputation MAPE: 0.094045\n", "Imputation RMSE: 3.96451\n", "\n", "Running time: 147 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import time\n", "import scipy.io\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.3\n", "\n", "## Block-out Missing (BM)\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "\n", "dim_time = dim2 * dim3\n", "block_window = 6\n", "vec = np.random.rand(int(dim_time / block_window))\n", "temp = np.array([vec] * block_window)\n", "vec = temp.reshape([dim2 * dim3], order = 'F')\n", "\n", "sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.10\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hangzhou metro passenger flow data set" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 25.2411\n", "\n", "Iter: 100\n", "RMSE: 24.944\n", "\n", "Imputation MAPE: 0.186277\n", "Imputation RMSE: 24.9491\n", "\n", "Running time: 15 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 29.1678\n", "\n", "Iter: 100\n", "RMSE: 29.5341\n", "\n", "Imputation MAPE: 0.201632\n", "Imputation RMSE: 29.5459\n", "\n", "Running time: 19 seconds\n", "\n", "Missing rate = 0.9\n", "Iter: 50\n", "RMSE: 37.7603\n", "\n", "Iter: 100\n", "RMSE: 38.0407\n", "\n", "Imputation MAPE: 0.229517\n", "Imputation RMSE: 38.0515\n", "\n", "Running time: 23 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7, 0.9]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " ## Random Missing (RM)\n", " dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", " dim1, dim2, dim3 = dense_tensor.shape\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.10\n", " epsilon = 
1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 49.406\n", "\n", "Iter: 100\n", "RMSE: 47.6061\n", "\n", "Imputation MAPE: 0.193862\n", "Imputation RMSE: 47.5992\n", "\n", "Running time: 21 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 42.8749\n", "\n", "Iter: 100\n", "RMSE: 41.8305\n", "\n", "Imputation MAPE: 0.226381\n", "Imputation RMSE: 41.8327\n", "\n", "Running time: 14 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " ## Non-random Missing (NM)\n", " dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", " dim1, dim2, dim3 = dense_tensor.shape\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.10\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "RMSE: 28.8777\n", "\n", "Iter: 100\n", "RMSE: 29.2744\n", "\n", "Imputation MAPE: 0.21374\n", "Imputation RMSE: 29.2814\n", "\n", "Running time: 20 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import time\n", "import scipy.io\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.3\n", "\n", "## Block-out Missing (BM)\n", "dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "\n", "dim_time = dim2 * dim3\n", "block_window = 6\n", "vec = np.random.rand(int(dim_time / block_window))\n", "temp = np.array([vec] * block_window)\n", "vec = temp.reshape([dim2 * dim3], order = 'F')\n", "\n", "sparse_tensor = mat2ten(ten2mat(dense_tensor, 0) * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.10\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Seattle freeway traffic speed data set" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 5.86304\n", "\n", "Iter: 100\n", "RMSE: 3.1126\n", "\n", "Imputation MAPE: 0.0480528\n", "Imputation RMSE: 3.10965\n", "\n", "Running time: 263 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 5.89289\n", "\n", "Iter: 100\n", "RMSE: 3.79913\n", "\n", "Imputation MAPE: 0.0613629\n", "Imputation RMSE: 3.7982\n", "\n", "Running time: 228 seconds\n", "\n", "Missing rate = 0.9\n", "Iter: 50\n", "RMSE: 4.86097\n", "\n", 
"Iter: 100\n", "RMSE: 4.81283\n", "\n", "Imputation MAPE: 0.081929\n", "Imputation RMSE: 4.81347\n", "\n", "Running time: 260 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7, 0.9]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " ## Random missing (RM)\n", " dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0'].transpose(0, 2, 1)\n", " dim1, dim2, dim3 = dense_tensor.shape\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.30\n", " if r > 0.8:\n", " rho = 5e-5\n", " theta = 0.10\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 5.10438\n", "\n", "Iter: 100\n", "RMSE: 4.43494\n", "\n", "Imputation MAPE: 0.074171\n", "Imputation RMSE: 4.43533\n", "\n", "Running time: 249 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 6.14372\n", "\n", "Iter: 100\n", "RMSE: 5.4049\n", "\n", "Imputation MAPE: 0.0928467\n", "Imputation RMSE: 5.40446\n", "\n", "Running time: 239 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " ## Non-random Missing (NM)\n", " dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0'].transpose(0, 2, 1)\n", " dim1, dim2, dim3 = dense_tensor.shape\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.05\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "RMSE: 6.60377\n", "\n", "Iter: 100\n", "RMSE: 5.69047\n", "\n", "Imputation MAPE: 0.0981021\n", "Imputation RMSE: 5.69791\n", "\n", "Running time: 220 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import scipy.io\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.3\n", "\n", "## Block-out Missing (BM)\n", "dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0'].transpose(0, 2, 1)\n", "dim1, dim2, dim3 = dense_tensor.shape\n", "block_window = 12\n", "vec = np.random.rand(int(dim2 * dim3 / block_window))\n", "temp = np.array([vec] * block_window)\n", "vec = temp.reshape([dim2 * dim3], order = 'F')\n", "sparse_tensor = mat2ten(dense_mat * np.round(vec + 0.5 - missing_rate)[None, :], np.array([dim1, dim2, dim3]), 0)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.30\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running 
time: %d seconds'%(end - start))\n", "print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Portland highway traffic volume data set" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 16.6038\n", "\n", "Iter: 100\n", "RMSE: 15.6579\n", "\n", "Imputation MAPE: 0.172064\n", "Imputation RMSE: 15.6594\n", "\n", "Running time: 693 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 19.5941\n", "\n", "Iter: 100\n", "RMSE: 19.2446\n", "\n", "Imputation MAPE: 0.204501\n", "Imputation RMSE: 19.2494\n", "\n", "Running time: 698 seconds\n", "\n", "Missing rate = 0.9\n", "Iter: 50\n", "RMSE: 23.7422\n", "\n", "Iter: 100\n", "RMSE: 24.1852\n", "\n", "Imputation MAPE: 0.244044\n", "Imputation RMSE: 24.1911\n", "\n", "Running time: 646 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7, 0.9]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " # Random Missing (RM)\n", " dense_mat = np.load('../datasets/Portland-data-set/volume.npy')\n", " dim1, dim2 = dense_mat.shape\n", " dim = np.array([dim1, 96, 31])\n", " dense_tensor = mat2ten(dense_mat, dim, 0)\n", " np.random.seed(1000)\n", " sparse_tensor = mat2ten(dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate), dim, 0)\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.10\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Missing rate = 0.3\n", "Iter: 50\n", "RMSE: 19.2963\n", "\n", "Iter: 100\n", "RMSE: 18.5759\n", "\n", "Imputation MAPE: 0.193184\n", "Imputation RMSE: 18.5804\n", "\n", "Running time: 722 seconds\n", "\n", "Missing rate = 0.7\n", "Iter: 50\n", "RMSE: 38.5421\n", "\n", "Iter: 100\n", "RMSE: 38.2218\n", "\n", "Imputation MAPE: 0.269787\n", "Imputation RMSE: 38.2252\n", "\n", "Running time: 656 seconds\n", "\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import time\n", "import scipy.io\n", "\n", "for r in [0.3, 0.7]:\n", " print('Missing rate = {}'.format(r))\n", " missing_rate = r\n", "\n", " # Non-random Missing (NM)\n", " dense_mat = np.load('../datasets/Portland-data-set/volume.npy')\n", " dim1, dim2 = dense_mat.shape\n", " dim = np.array([dim1, 96, 31])\n", " dense_tensor = mat2ten(dense_mat, dim, 0)\n", " np.random.seed(1000)\n", " sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim[2]) + 0.5 - missing_rate)[:, None, :]\n", "\n", " start = time.time()\n", " alpha = np.ones(3) / 3\n", " rho = 1e-5\n", " theta = 0.10\n", " epsilon = 1e-4\n", " maxiter = 100\n", " LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", " end = time.time()\n", " print('Running time: %d seconds'%(end - start))\n", " print()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "RMSE: 23.3329\n", "\n", "Iter: 100\n", "RMSE: 23.0543\n", "\n", "Imputation MAPE: 0.230816\n", "Imputation RMSE: 23.0534\n", "\n", "Running time: 697 seconds\n", "\n" ] } ], "source": 
[ "import numpy as np\n", "import scipy.io\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.3\n", "\n", "## Block-out Missing (BM)\n", "dense_mat = np.load('../datasets/Portland-data-set/volume.npy')\n", "dim1, dim2 = dense_mat.shape\n", "dim = np.array([dim1, 96, 31])\n", "dense_tensor = mat2ten(dense_mat, dim, 0)\n", "block_window = 4\n", "vec = np.random.rand(int(dim2 / block_window))\n", "temp = np.array([vec] * block_window)\n", "vec = temp.reshape([dim2], order = 'F')\n", "sparse_tensor = mat2ten(dense_mat * np.round(vec + 0.5 - missing_rate)[None, :], dim, 0)\n", "\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "theta = 0.05\n", "epsilon = 1e-4\n", "maxiter = 100\n", "LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))\n", "print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### License\n", "\n", "
\n", "This work is released under the MIT license.\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "nbTranslate": { "displayLangs": [ "*" ], "hotkey": "alt-t", "langInMainMenu": true, "sourceLang": "en", "targetLang": "fr", "useGoogleTranslate": true } }, "nbformat": 4, "nbformat_minor": 2 }