{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayesian Probabilistic Matrix Factorization\n", "\n", "**Published**: December 31, 2020\n", "\n", "**Author**: Xinyu Chen [[**GitHub homepage**](https://github.com/xinychen)]\n", "\n", "**Download**: This Jupyter notebook is at our GitHub repository. If you want to evaluate the code, please download the notebook from the [**transdim**](https://github.com/xinychen/transdim/blob/master/imputer/HaLRTC.ipynb) repository.\n", "\n", "This notebook shows how to implement the High-accuracy Low-Rank Tensor Completion (HaLRTC) on some real-world data sets. For an in-depth discussion of HaLRTC, please see [1].\n", "\n", "
\n", "\n", "[1] Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye (2013). Tensor completion for estimating missing values in visual data IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220. [PDF]\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def ten2mat(tensor, mode):\n", " return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')\n", "\n", "def mat2ten(mat, dim, mode):\n", " index = list()\n", " index.append(mode)\n", " for i in range(dim.shape[0]):\n", " if i != mode:\n", " index.append(i)\n", " return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)\n", "\n", "def svt(mat, tau):\n", " u, s, v = np.linalg.svd(mat, full_matrices = False)\n", " vec = s - tau\n", " vec[vec < 0] = 0\n", " return np.matmul(np.matmul(u, np.diag(vec)), v)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def compute_mape(var, var_hat):\n", " return np.sum(np.abs(var - var_hat) / var) / var.shape[0]\n", "\n", "def compute_rmse(var, var_hat):\n", " return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def HaLRTC_imputer(dense_tensor, sparse_tensor, alpha: list, rho: float, epsilon: float, maxiter: int):\n", " dim = np.array(sparse_tensor.shape)\n", " if np.isnan(sparse_tensor).any() == False:\n", " pos_miss = np.where(sparse_tensor == 0)\n", " pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))\n", " elif np.isnan(sparse_tensor).any() == True:\n", " pos_test = np.where((dense_tensor != 0) & (np.isnan(sparse_tensor)))\n", " sparse_tensor[np.isnan(sparse_tensor)] = 0\n", " pos_miss = np.where(sparse_tensor == 0)\n", " dense_test = dense_tensor[pos_test]\n", " del dense_tensor\n", " tensor_hat = sparse_tensor.copy()\n", " B = [np.zeros(sparse_tensor.shape) for _ in range(len(dim))]\n", " Y = [np.zeros(sparse_tensor.shape) for _ in range(len(dim))]\n", " last_ten = sparse_tensor.copy()\n", " snorm = np.linalg.norm(sparse_tensor)\n", " \n", " it = 0\n", " while True:\n", " rho = min(rho * 1.05, 1e5)\n", " for k in range(len(dim)):\n", " B[k] = mat2ten(svt(ten2mat(tensor_hat + Y[k] / rho, k), alpha[k] / rho), dim, k)\n", " tensor_hat[pos_miss] = ((sum(B) - sum(Y) / rho) / 3)[pos_miss]\n", " for k in range(len(dim)):\n", " Y[k] = Y[k] - rho * (B[k] - tensor_hat)\n", " tol = np.linalg.norm((tensor_hat - last_ten)) / snorm\n", " last_ten = tensor_hat.copy()\n", " it += 1\n", " if it % 50 == 0:\n", " print('Iter: {}'.format(it))\n", " print('Tolerance: {:.6}'.format(tol))\n", " print('MAPE: {:.6}'.format(compute_mape(dense_test, tensor_hat[pos_test])))\n", " print('RMSE: {:.6}'.format(compute_rmse(dense_test, tensor_hat[pos_test])))\n", " print()\n", " if (tol < epsilon) or (it >= maxiter):\n", " break\n", " \n", " print('Total iteration: {}'.format(it))\n", " print('Tolerance: {:.6}'.format(tol))\n", " print('MAPE: {:.6}'.format(compute_mape(dense_test, tensor_hat[pos_test])))\n", " print('RMSE: {:.6}'.format(compute_rmse(dense_test, tensor_hat[pos_test])))\n", " print()\n", " \n", " return tensor_hat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on Guangzhou Speed Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $214\\times 61\\times 144$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']\n", "dim = 
dense_tensor.shape\n", "missing_rate = 0.4 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.0016919\n", "MAPE: 0.0880788\n", "RMSE: 3.59247\n", "\n", "Total iteration: 59\n", "Tolerance: 5.54723e-05\n", "MAPE: 0.0886173\n", "RMSE: 3.61067\n", "\n", "Running time: 32 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $214\\times 61\\times 144$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.6 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total iteration: 30\n", "Tolerance: 9.40289e-05\n", "MAPE: 0.0982231\n", "RMSE: 3.9581\n", "\n", "Running time: 16 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $214\\times 61\\times 144$ (road segment, day, time of day)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Non-random missing (NM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", 
"execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00159878\n", "MAPE: 0.108402\n", "RMSE: 4.3662\n", "\n", "Total iteration: 61\n", "Tolerance: 9.28742e-05\n", "MAPE: 0.108782\n", "RMSE: 4.37534\n", "\n", "Running time: 34 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on Hangzhou Flow Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $80\\times 25\\times 108$ (metro station, day, time of day)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.000118962\n", "MAPE: 0.190242\n", "RMSE: 31.8082\n", "\n", "Total iteration: 54\n", "Tolerance: 8.03007e-05\n", "MAPE: 0.190249\n", "RMSE: 31.8102\n", "\n", "Running time: 2 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $80\\times 25\\times 108$ (metro station, day, time of day)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.6 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.000112243\n", "MAPE: 0.20047\n", "RMSE: 36.1853\n", "\n", "Total iteration: 58\n", "Tolerance: 7.86866e-05\n", "MAPE: 0.200866\n", "RMSE: 36.1915\n", "\n", "Running time: 2 seconds\n" ] } ], "source": [ "import 
time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $80\\times 25\\times 108$ (metro station, day, time of day)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Non-random missing (NM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total iteration: 46\n", "Tolerance: 7.6348e-05\n", "MAPE: 0.214628\n", "RMSE: 53.1454\n", "\n", "Running time: 2 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on Seattle Speed Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $323\\times 28\\times 288$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00163238\n", "MAPE: 0.0657551\n", "RMSE: 3.75497\n", "\n", "Total iteration: 61\n", "Tolerance: 7.7105e-05\n", "MAPE: 0.0675857\n", "RMSE: 3.83403\n", "\n", "Running time: 46 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $323\\times 28\\times 
"- Tensor size: $323\\times 28\\times 288$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.6 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00247575\n", "MAPE: 0.0767219\n", "RMSE: 4.23934\n", "\n", "Total iteration: 64\n", "Tolerance: 7.80975e-05\n", "MAPE: 0.079017\n", "RMSE: 4.33587\n", "\n", "Running time: 46 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $323\\times 28\\times 288$ (road segment, day, time of day)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0']\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Non-random missing (NM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00120819\n", "MAPE: 0.100643\n", "RMSE: 5.22702\n", "\n", "Total iteration: 64\n", "Tolerance: 9.95834e-05\n", "MAPE: 0.101906\n", "RMSE: 5.27444\n", "\n", "Running time: 45 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on London Movement Speed Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $35912\\times 30\\times 24$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.4\n", "\n", "dense_mat = 
np.load('../datasets/London-data-set/hourly_speed_mat.npy')\n", "binary_mat = dense_mat.copy()\n", "binary_mat[binary_mat != 0] = 1\n", "pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])\n", "dense_mat = dense_mat[pos[0], :]\n", "\n", "## Random missing (RM)\n", "random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])\n", "binary_mat = np.round(random_mat + 0.5 - missing_rate)\n", "sparse_mat = np.multiply(dense_mat, binary_mat)\n", "\n", "dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])\n", "sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])\n", "del dense_mat, sparse_mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00111649\n", "MAPE: 0.0907538\n", "RMSE: 2.16498\n", "\n", "Total iteration: 60\n", "Tolerance: 9.10963e-05\n", "MAPE: 0.091131\n", "RMSE: 2.17446\n", "\n", "Running time: 776 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $35912\\times 30\\times 24$ (road segment, day, time of day)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.6\n", "\n", "dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')\n", "binary_mat = dense_mat.copy()\n", "binary_mat[binary_mat != 0] = 1\n", "pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])\n", "dense_mat = dense_mat[pos[0], :]\n", "\n", "## Random missing (RM)\n", "random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])\n", "binary_mat = np.round(random_mat + 0.5 - missing_rate)\n", "sparse_mat = np.multiply(dense_mat, binary_mat)\n", "\n", "dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])\n", "sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])\n", "del dense_mat, sparse_mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00241757\n", "MAPE: 0.0942166\n", "RMSE: 2.25369\n", "\n", "Total iteration: 66\n", "Tolerance: 8.58548e-05\n", "MAPE: 0.0950641\n", "RMSE: 2.27341\n", "\n", "Running time: 891 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { 
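"cell_type": "markdown", "metadata": {}, "source": [ "**Note on the non-random missing (NM) pattern**: in the NM scenario below, whole days are removed at once, i.e., for each road segment and each day a single draw decides whether all 24 hourly values of that day are observed, so each (segment, day) block is either fully observed or fully missing. The double loop used in the NM cell to build the mask is equivalent to a single vectorized call; a sketch using the variables defined in that cell (not part of the original code):\n", "\n", "```python\n", "## Non-random missing (NM), vectorized construction of the same mask\n", "binary_mat = np.repeat(np.round(random_mat + 0.5 - missing_rate), 24, axis = 1)\n", "sparse_mat = np.multiply(dense_mat, binary_mat)\n", "```" ] }, {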
"cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $35912\\times 30\\times 24$ (road segment, day, time of day)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "missing_rate = 0.4\n", "\n", "dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')\n", "binary_mat = dense_mat.copy()\n", "binary_mat[binary_mat != 0] = 1\n", "pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])\n", "dense_mat = dense_mat[pos[0], :]\n", "\n", "## Non-random missing (NM)\n", "binary_mat = np.zeros(dense_mat.shape)\n", "random_mat = np.random.rand(dense_mat.shape[0], 30)\n", "for i1 in range(dense_mat.shape[0]):\n", " for i2 in range(30):\n", " binary_mat[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - missing_rate)\n", "sparse_mat = np.multiply(dense_mat, binary_mat)\n", "\n", "dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])\n", "sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])\n", "del dense_mat, sparse_mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-5}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00103334\n", "MAPE: 0.0982027\n", "RMSE: 2.34015\n", "\n", "Total iteration: 59\n", "Tolerance: 9.23072e-05\n", "MAPE: 0.0983797\n", "RMSE: 2.34535\n", "\n", "Running time: 759 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-5\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on New York Taxi Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 30\\times 1464$ (origin, destination, time)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor'].astype(np.float32)\n", "dim = dense_tensor.shape\n", "missing_rate = 0.4 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00122498\n", "MAPE: 0.504978\n", "RMSE: 6.84282\n", "\n", "Total iteration: 87\n", "Tolerance: 9.28379e-05\n", "MAPE: 0.509911\n", "RMSE: 6.84419\n", "\n", "Running time: 54 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", 
"epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 30\\times 1464$ (origin, destination, time)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor'].astype(np.float32)\n", "dim = dense_tensor.shape\n", "missing_rate = 0.6 # Random missing (RM)\n", "sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00150918\n", "MAPE: 0.515299\n", "RMSE: 8.12735\n", "\n", "Total iteration: 89\n", "Tolerance: 9.40833e-05\n", "MAPE: 0.520701\n", "RMSE: 8.12889\n", "\n", "Running time: 48 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 30\\times 1464$ (origin, destination, time)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "import scipy.io\n", "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor']\n", "dim = dense_tensor.shape\n", "nm_tensor = np.random.rand(dim[0], dim[1], dim[2])\n", "missing_rate = 0.4 # Non-random missing (NM)\n", "binary_tensor = np.zeros(dense_tensor.shape)\n", "for i1 in range(dim[0]):\n", " for i2 in range(dim[1]):\n", " for i3 in range(61):\n", " binary_tensor[i1, i2, i3 * 24 : (i3 + 1) * 24] = np.round(nm_tensor[i1, i2, i3] + 0.5 - missing_rate)\n", "sparse_tensor = dense_tensor * binary_tensor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.00121294\n", "MAPE: 0.510509\n", "RMSE: 7.0283\n", "\n", "Total iteration: 87\n", "Tolerance: 9.4833e-05\n", "MAPE: 0.515144\n", "RMSE: 7.03015\n", "\n", "Running time: 48 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = 
time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation on Pacific Temperature Data\n", "\n", "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 84\\times 396$ (grid, grid, time)\n", "- Random missing (RM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)\n", "pos = np.where(dense_tensor[:, 0, :] > 50)\n", "dense_tensor[pos[0], :, pos[1]] = 0\n", "random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], dense_tensor.shape[2])\n", "missing_rate = 0.4\n", "\n", "## Random missing (RM)\n", "binary_tensor = np.round(random_tensor + 0.5 - missing_rate)\n", "sparse_tensor = dense_tensor.copy()\n", "sparse_tensor[binary_tensor == 0] = np.nan\n", "sparse_tensor[sparse_tensor == 0] = np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.000424517\n", "MAPE: 0.00663588\n", "RMSE: 0.23659\n", "\n", "Total iteration: 78\n", "Tolerance: 9.14475e-05\n", "MAPE: 0.00670003\n", "RMSE: 0.236096\n", "\n", "Running time: 24 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 84\\times 396$ (grid, grid, time)\n", "- Random missing (RM)\n", "- 60% missing rate\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)\n", "pos = np.where(dense_tensor[:, 0, :] > 50)\n", "dense_tensor[pos[0], :, pos[1]] = 0\n", "random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], dense_tensor.shape[2])\n", "missing_rate = 0.6\n", "\n", "## Random missing (RM)\n", "binary_tensor = np.round(random_tensor + 0.5 - missing_rate)\n", "sparse_tensor = dense_tensor.copy()\n", "sparse_tensor[binary_tensor == 0] = np.nan\n", "sparse_tensor[sparse_tensor == 0] = np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.000666835\n", "MAPE: 0.00954841\n", "RMSE: 0.338751\n", "\n", "Total iteration: 78\n", "Tolerance: 9.55115e-05\n", "MAPE: 0.00977159\n", "RMSE: 0.345411\n", "\n", "Running time: 24 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho 
= 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Scenario setting**:\n", "\n", "- Tensor size: $30\\times 84\\times 396$ (grid, grid, time)\n", "- Non-random missing (NM)\n", "- 40% missing rate\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1000)\n", "\n", "dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)\n", "pos = np.where(dense_tensor[:, 0, :] > 50)\n", "dense_tensor[pos[0], :, pos[1]] = 0\n", "random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], int(dense_tensor.shape[2] / 3))\n", "missing_rate = 0.4\n", "\n", "## Non-random missing (NM)\n", "binary_tensor = np.zeros(dense_tensor.shape)\n", "for i1 in range(dense_tensor.shape[0]):\n", " for i2 in range(dense_tensor.shape[1]):\n", " for i3 in range(int(dense_tensor.shape[2] / 3)):\n", " binary_tensor[i1, i2, i3 * 3 : (i3 + 1) * 3] = np.round(random_tensor[i1, i2, i3] + 0.5 - missing_rate)\n", "sparse_tensor = dense_tensor.copy()\n", "sparse_tensor[binary_tensor == 0] = np.nan\n", "sparse_tensor[sparse_tensor == 0] = np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Model setting**:\n", "\n", "- $\\boldsymbol{\\alpha}=\\left(\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right)$\n", "- $\\rho=10^{-4}$\n", "- $\\epsilon =10^{-4}$\n", "- The number of iterations: 200" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter: 50\n", "Tolerance: 0.000423441\n", "MAPE: 0.00701627\n", "RMSE: 0.249312\n", "\n", "Total iteration: 78\n", "Tolerance: 9.44678e-05\n", "MAPE: 0.00724445\n", "RMSE: 0.252975\n", "\n", "Running time: 24 seconds\n" ] } ], "source": [ "import time\n", "start = time.time()\n", "alpha = np.ones(3) / 3\n", "rho = 1e-4\n", "epsilon = 1e-4\n", "maxiter = 200\n", "tensor_hat = HaLRTC_imputer(dense_tensor, sparse_tensor, alpha, rho, epsilon, maxiter)\n", "end = time.time()\n", "print('Running time: %d seconds'%(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### License\n", "\n", "
\n", "This work is released under the MIT license.\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }