{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -*- coding: utf-8 -*-\n", "# Copyright 2019 - 2024 United Kingdom Research and Innovation\n", "# Copyright 2019 - 2024 Technical University of Denmark\n", "# Copyright 2019 - 2022 The University of Manchester\n", "# Copyright 2019 - 2022 The University of Bath\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "#\n", "# Authored by: Claire Delplancke (University of Bath)\n", "# Evangelos Papoutsellis (UKRI-STFC)\n", "# Gemma Fardell (UKRI-STFC)\n", "# Laura Murgatroyd (UKRI-STFC) \n", "# Margaret Duff (UKRI-STFC)\n", "# Hannah Robarts (UKRI-STFC)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stochastic Primal Dual Hybrid Gradient Algorithm: Tomography\n", "\n", " In this demo, we learn how to use the Stochastic Primal-Dual Hybrid (SPDHG) algorithm introduced in [Chambolle et al.](https://arxiv.org/abs/1706.04957), see as well [Ehrhardt et al.](https://arxiv.org/abs/1808.07150), [Papoutsellis et al.](https://arxiv.org/abs/2102.06126), [Gutierrez et al.](https://arxiv.org/abs/2012.01255). This algorithm is a stochastic version of the Primal-Dual Hybrid Gradient (PDHG) Algorithm by [Chambolle & Pock](https://hal.archives-ouvertes.fr/hal-00490826/document). We focus on Tomography Reconstruction under an edge-preserving prior, i.e., the __Total variation regularisation__, see [ROF](https://en.wikipedia.org/wiki/Total_variation_denoising). \n", "\n", " ## Why SPDHG?\n", "\n", "In the previous demo, we presented __PDHG__ for tomography reconstruction.\n", "\n", "__SPDHG__ is a stochastic version of __PDHG__. As PDHG, SPDHG can solve convex, non necessarily smooth optimization problems and is provenly convergent. \n", "\n", "The central improvement with respect to PDHG is that SPDHG can deal with __subsets of data__, making it computationally more efficient, so that it can deal with **larger datasets**.\n", "\n", "For CT and PET for example, the data can be divided into a collection of partial sinograms, each of them corresponding to a subset of the total set of angles. Accordingly, the projection operator $A: X \\rightarrow Y$ can be decomposed as $A=(A_1, \\dots, A_n): X \\rightarrow Y=Y_1\\times \\dots \\times Y_n$ where $n$ is the number of subsets and each $A_i:X \\rightarrow Y_i$ is a __partial__ projection.\n", "\n", "At each iteration, SPDHG computes only one partial forward and back-projection $A_i,\\,A_i^*$ for a randomly chosen index $i$, whereas PDHG computes $A$ and $A^*$. Hence SPDHG requires only a fraction of the computations of PDHG per iteration, which makes it faster per iteration.\n", "\n", "\n", "## Learning objectives\n", "\n", "1. Implement the subsetting\n", "1. Implement default set-up\n", "1. Compare PDHG and SPDHG: notion of epoch, visual comparison, measures (primal objective function)\n", "1. 
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Tomography reconstruction\n", "\n", "The minimisation problem for tomography reconstruction with Total Variation (TV) regularisation is\n", "\n", "$$ \\underset{u}{\\operatorname{argmin}} \\mathcal{O}(u)= \\frac{1}{2}\\| A u - g\\|^{2} + \\alpha\\,\\mathrm{TV}(u) + \\mathbb{I}_{\\{u\\geq 0\\}}(u) \\tag{1}$$\n", "\n", "\n", "where:\n", "\n", "1. TV is defined as $$\\mathrm{TV}(u) = \\|\\nabla u \\|_{2,1} = \\sum \\sqrt{ (\\partial_{y}u)^{2} + (\\partial_{x}u)^{2} }$$\n", "1. $g$ is the acquisition data obtained from the detector.\n", "1. $A$ is the projection operator ( _Radon transform_ ) that maps from an image space to an acquisition space, i.e., $A : X \\rightarrow Y$, where $X$ is an __ImageGeometry__ and $Y$ an __AcquisitionGeometry__ (a block of partial geometries once the data is partitioned).\n", "1. $\\alpha$ is a regularising parameter that controls the trade-off between the fidelity and the regulariser terms.\n", "1. The functional $\\mathbb{I}_{\\{u\\geq 0\\}}(u) : = \n", "\\begin{cases}\n", "0, & \\text{ if } u\\geq 0\\\\\n", "\\infty , & \\text{ otherwise}\n", "\\quad\n", "\\end{cases}\n", "$, $\\quad$ is a positivity constraint for the minimiser $u$.\n", "\n", "\n", "### SPDHG form\n", "\n", "In order to use the SPDHG algorithm for the problem above, we need to express our minimisation problem in the following form:\n", "\n", "$$\\min_{u} \\sum_{i=1}^n F_i(K_i u) + G(u) \\tag{2}$$\n", "\n", "where we assume that:\n", "\n", "1. $F_i$, $G$ are __convex__ functionals\n", " \n", " - $F_i: Y_i \\rightarrow \\mathbb{R}$ \n", " \n", " - $G: X \\rightarrow \\mathbb{R}$\n", " \n", " \n", "2. $K_i$ is a continuous linear operator acting from a space $X$ to another space $Y_i$:\n", "\n", "$$K_i : X \\rightarrow Y_i \\quad $$ \n", "\n", "with operator norm defined as $$\\| K_i \\| = \\max\\{ \\|K_i x\\|_{Y_i} : \\|x\\|_{X}\\leq 1 \\}.$$ \n", "\n", "**Note**: We use the [Power Method](https://en.wikipedia.org/wiki/Power_iteration) to approximate the norms of the $K_i$'s and, if needed, of $\\nabla$; a minimal sketch is given after the imports below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import libraries\n", "\n", "from cil.optimisation.algorithms import PDHG, SPDHG\n", "from cil.optimisation.operators import GradientOperator, BlockOperator\n", "from cil.optimisation.functions import IndicatorBox, BlockFunction, L2NormSquared, MixedL21Norm\n", "\n", "from cil.io import ZEISSDataReader\n", "\n", "from cil.processors import Slicer, Binner, TransmissionAbsorptionConverter\n", "\n", "from cil.plugins.astra.operators import ProjectionOperator\n", "from cil.plugins.ccpi_regularisation.functions import FGP_TV\n", "\n", "from cil.utilities.display import show2D\n", "from cil.utilities.jupyter import islicer\n", "\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import os" ] },
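{ "cell_type": "markdown", "metadata": {}, "source": [ "### Aside: the power method\n", "\n", "As noted above, operator norms such as `K.norm()` are approximated with the power method. The following cell is a minimal NumPy sketch of the idea (using the NumPy import above), with a plain matrix `M` as a stand-in for a linear operator; all names here are illustrative and not part of the reconstruction pipeline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal power-method sketch: estimate ||M|| (the largest singular value)\n", "# by repeatedly applying M^T M and re-normalising the iterate\n", "rng = np.random.default_rng(0)\n", "M = rng.standard_normal((50, 40))  # stand-in for a linear operator K_i\n", "\n", "x = rng.standard_normal(40)\n", "for _ in range(100):\n", "    x = M.T @ (M @ x)              # one application of K^T K\n", "    x /= np.linalg.norm(x)         # keep the iterate normalised\n", "\n", "norm_estimate = np.sqrt(np.linalg.norm(M.T @ (M @ x)))\n", "print(norm_estimate, np.linalg.norm(M, 2))  # the two values should agree" ] },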
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Data information\n", "\n", "In this demo, we use the **Walnut** dataset found in [Jørgensen et al.](https://zenodo.org/record/4822516#.YLXyAJMzZp8). In total, there are 6 individual micro computed tomography datasets in the native Zeiss TXRM/TXM format. The six datasets were acquired at the 3D Imaging Center at Technical University of Denmark in 2014 (HDTomo3D in 2016) as part of the ERC-funded project High-Definition Tomography (HDTomo) headed by Prof. Per Christian Hansen.\n", "\n", "This example requires the dataset walnut.zip from https://zenodo.org/record/4822516 :\n", "\n", " https://zenodo.org/record/4822516/files/walnut.zip\n", "\n", "If running locally, please download the data and update the `filename` variable below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filename = \"/mnt/materials/SIRF/Fully3D/CIL/Walnut/valnut_2014-03-21_643_28/tomo-A/valnut_tomo-A.txrm\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "reader = ZEISSDataReader()\n", "reader.set_up(file_name=filename)\n", "data3D = reader.read()\n", "\n", "# Reorder data to match the default order for the Astra operator\n", "data3D.reorder('astra')\n", "\n", "# Get image and acquisition geometries\n", "ag3D = data3D.geometry\n", "ig3D = ag3D.get_ImageGeometry()\n", "\n", "# Extract the central vertical slice\n", "data2D = data3D.get_slice(vertical='centre')\n", "\n", "# Keep every 10th projection angle\n", "sliced_data = Slicer(roi={'angle':(0,1601,10)})(data2D)\n", "\n", "# Crop the background edges of the detector and bin horizontally by 2\n", "binned_data = Binner(roi={'horizontal':(120,-120,2)})(sliced_data)\n", "\n", "# Convert transmission to absorption data\n", "data = TransmissionAbsorptionConverter()(binned_data)\n", "\n", "# Remove circular artifacts by subtracting the mean of a background region\n", "data -= np.mean(data.as_array()[80:100,0:30])\n", "\n", "# Get image and acquisition geometries for one slice\n", "ag2D = data.geometry\n", "ag2D.set_angles(ag2D.angles, initial_angle=0.2, angle_unit='radian')\n", "ig2D = ag2D.get_ImageGeometry()\n", "\n", "A = ProjectionOperator(ig2D, ag2D, device = \"gpu\")" ] },
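{ "cell_type": "markdown", "metadata": {}, "source": [ "Before setting up the reconstruction, it is worth a quick (optional) look at the preprocessed absorption sinogram:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display the preprocessed sinogram as a quick sanity check\n", "show2D(data, title='Preprocessed sinogram', cmap='inferno')" ] },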
{ "cell_type": "markdown", "metadata": {}, "source": [ "## How to setup and run SPDHG\n", "\n", "In order to set up and run SPDHG, we need to define the following:\n", "\n", "- The operator $K=(K_1,\\dots,K_n)$.\n", "- The functions $F=(F_1,\\dots,F_n)$ and $G$.\n", "- The maximum number of iterations.\n", "\n", "The setup and run of SPDHG:\n", "\n", "` spdhg = SPDHG(f = F, g = G, operator = K)`\n", "\n", "` spdhg.run(number_of_iterations)`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can express our [problem](#Tomography-reconstruction) in the [form](#SPDHG-form), if we let\n", "\n", "1. $K = (A_1,\\dots,A_n) \\quad \\Longleftrightarrow \\quad $ `K = BlockOperator(A_i)` \n", "\n", "1. $F_i: Y_i \\rightarrow \\mathbb{R}, \\text{ with } F_i(y_i) := \\frac{1}{2}\\| y_i - g_i \\|^{2}, \\quad \\Longleftrightarrow \\quad$ for all $i$'s, `F_i = 0.5*L2NormSquared(data[i])` and `F = BlockFunction(F_i)` \n", "\n", "1. $G: X \\rightarrow \\mathbb{R}, \\text{ with } G(x) := \\alpha\\|\\nabla x\\|_{2,1} + \\mathbb{I}_{\\{x\\geq 0\\}}(x), \\quad \\Longleftrightarrow \\quad$ ` G = alpha * FGP_TV()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define operator $K$, functions $F$ and $G$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define number of subsets\n", "n_subsets = 10\n", "\n", "# Sequentially partition the data into the number of subsets using the data partitioner\n", "partitioned_data = data.partition(n_subsets, 'sequential')\n", "\n", "\n", "# Initialise the list containing the F_i's\n", "f_subsets = []\n", "\n", "\n", "# Define F_i's\n", "for i in range(n_subsets):\n", "    # Define F_i and put into list\n", "    fi = 0.5*L2NormSquared(b = partitioned_data[i])\n", "    f_subsets.append(fi)\n", "\n", "# Define F\n", "F = BlockFunction(*f_subsets)\n", "\n", "\n", "# Create the projection operator, K, for the partitioned data\n", "ageom_subset = partitioned_data.geometry\n", "K = ProjectionOperator(ig2D, ageom_subset)\n", "\n", "\n", "# Define G (set the constraint that the solution must be non-negative)\n", "alpha = 0.025\n", "G = alpha * FGP_TV(nonnegativity=True)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup and run SPDHG\n", "\n", "__Note:__ in this example, there are some parameters which we don't pass, so that SPDHG will use its default, pre-defined parameters. These parameters are:\n", "\n", "1. The probabilities $p_i$ of choosing each subset $i$. By default these probabilities are uniform: if there are $n$ subsets, $p_i=1/n$. We'll see another choice in the part about the *explicit set-up*.\n", "1. The step-sizes of the algorithm. To learn more about setting step-sizes, refer to the last part of the tutorial." ] },
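{ "cell_type": "markdown", "metadata": {}, "source": [ "__Note:__ relying on the uniform default is equivalent to passing the probabilities explicitly, e.g.\n", "\n", "` spdhg = SPDHG(f = F, g = G, operator = K, prob = [1/n_subsets]*n_subsets)`\n", "\n", "This line is shown for illustration only; below we simply rely on the default." ] },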
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup and run SPDHG for 50 iterations\n", "spdhg = SPDHG(f = F, g = G, operator = K,\n", "              update_objective_interval = 10)\n", "spdhg.run(50)\n", "\n", "spdhg_recon = spdhg.solution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show reconstruction result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show2D(spdhg_recon, cmap='inferno')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison to PDHG\n", "\n", "PDHG and SPDHG converge to the same set of solutions, but each SPDHG iteration is cheaper. How, then, can we compare them? We want to compare PDHG and SPDHG **for the same amount of computational effort**. Remember that at each iteration, PDHG computes one entire forward and back-projection, while SPDHG computes only a partial one. For example, if we have 10 subsets, $10$ iterations of SPDHG have the same computational cost as one iteration of PDHG.\n", "\n", "In order to have a fair comparison between both algorithms, we therefore define the notion of an **epoch**, where one epoch corresponds to one full forward and back-projection. Hence, one epoch of PDHG corresponds to one iteration, while one epoch of SPDHG corresponds, in our case, to $10$ iterations.\n", "\n", "Let's set up and run PDHG and SPDHG for the same number of epochs and compare the results.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_epoch = 5\n", "\n", "# Setup and run PDHG\n", "normK = K.norm()\n", "tau = 1/normK\n", "sigma = 1/normK\n", "pdhg = PDHG(f = F, g = G, operator = K,\n", "            update_objective_interval = 1, tau=tau, sigma=sigma)\n", "pdhg.run(n_epoch)\n", "\n", "pdhg_recon = pdhg.solution\n", "\n", "# Setup and run SPDHG\n", "spdhg = SPDHG(f = F, g = G, operator = K,\n", "              update_objective_interval = 1 * n_subsets)\n", "spdhg.run(n_epoch * n_subsets)\n", "\n", "spdhg_recon = spdhg.solution\n", "\n", "show2D([pdhg_recon, spdhg_recon], cmap='inferno', title = [\"PDHG, {} epochs \".format(n_epoch),\"SPDHG, {} epochs \".format(n_epoch)])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After 5 epochs, the SPDHG reconstruction appears visually closer to convergence than the PDHG reconstruction.\n", "\n", "To compare the speeds of PDHG and SPDHG further, let's use a quantitative measure of convergence, given by the value of the objective function $\\mathcal{O}(u)$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nepoch = 20\n", "pdhg.run(nepoch)\n", "pdhg_measures = pdhg.objective\n", "\n", "spdhg.run(nepoch*n_subsets)\n", "spdhg_measures = spdhg.objective" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the objective values against epochs\n", "epoch_range = np.arange(0, pdhg.iteration + pdhg.update_objective_interval, pdhg.update_objective_interval)\n", "plt.semilogy(epoch_range, pdhg_measures, label='PDHG')\n", "plt.semilogy(epoch_range, spdhg_measures, label='SPDHG')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Objective function')\n", "plt.legend()" ] },
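{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick check of the epoch bookkeeping, we can compare the iteration counters directly (one PDHG iteration per epoch, `n_subsets` SPDHG iterations per epoch); this assumes the same `iteration` attribute used in the plotting cell above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Epochs completed so far by each algorithm\n", "print('PDHG epochs :', pdhg.iteration)\n", "print('SPDHG epochs:', spdhg.iteration / n_subsets)" ] },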
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Explicit set-up\n", "\n", "As for PDHG, there are various ways to put our reconstruction [problem](#Tomography-reconstruction):\n", "$$ \\underset{u}{\\operatorname{argmin}} \\mathcal{O}(u)= \\frac{1}{2}\\| A u - g\\|^{2} + \\alpha\\,\\mathrm{TV}(u) + \\mathbb{I}_{\\{u\\geq 0\\}}(u) $$\n", "into the SPDHG-appropriate [form](#SPDHG-form):\n", "$$\\min_{u} \\sum_{i=1}^n F_i(K_i u) + G(u).$$ \n", "\n", "*Implicit set-up*\n", "\n", "In the first part of the tutorial, we chose the so-called *implicit* set-up, where\n", "\n", "1. $K = (A_1,\\dots,A_n) \\quad \\Longleftrightarrow \\quad $ `K = BlockOperator(A_i)` \n", "\n", "1. $F_i: Y_i \\rightarrow \\mathbb{R}, \\text{ with } F_i(y_i) := \\frac{1}{2}\\| y_i - g_i \\|^{2}, \\quad \\Longleftrightarrow \\quad$ for all $i$'s, `F_i = 0.5*L2NormSquared(data[i])` and `F = BlockFunction(F_i)` \n", "\n", "1. $G: X \\rightarrow \\mathbb{R}, \\text{ with } G(x) := \\alpha\\|\\nabla x\\|_{2,1} + \\mathbb{I}_{\\{x\\geq 0\\}}(x), \\quad \\Longleftrightarrow \\quad$ ` G = alpha * FGP_TV()`.\n", "\n", "*Explicit set-up*\n", "\n", "As for PDHG, another way is to include the gradient of the regulariser $\\alpha\\,\\mathrm{TV}(u)=\\alpha \\|\\nabla u\\|_{2,1}$ into the operator $K$:\n", "\n", "1. $K = (A_1,\\dots,A_n, \\nabla) \\quad \\Longleftrightarrow \\quad $ `K = BlockOperator(A_1,...,A_n, Grad)` \n", "\n", "1. For all $1\\leq i \\leq n$, $F_i: Y_i \\rightarrow \\mathbb{R}, \\text{ with } F_i(y_i) := \\frac{1}{2}\\| y_i - g_i \\|^{2}$, and $F_{n+1}(z)=\\alpha\\|z\\|_{2,1}$, $\\quad \\Longleftrightarrow \\quad$ \n", "\n", "for all $1\\leq i \\leq n$, `F_i = 0.5*L2NormSquared(data[i])`, `F_{n+1} = alpha * MixedL21Norm()` and `F = BlockFunction(F_i)` \n", "\n", "1. $G: X \\rightarrow \\mathbb{R}, \\text{ with } G(x) := \\mathbb{I}_{\\{x\\geq 0\\}}(x), \\quad \\Longleftrightarrow \\quad$ ` G = IndicatorBox(lower=0)`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-uniform sampling\n", "\n", "This time, we also use non-uniform sampling: half of the time we update the dual variable corresponding to the TV term, so that $p_{n+1}=1/2$, and half of the time we update one of the dual variables corresponding to a subset, so that $p_i=1/(2n)$ for $1\\leq i \\leq n$. \n", "\n", "__Comment 1:__ if you don't want to use the default uniform sampling, the probabilities you specify need to sum to $1$; see the sanity check after the next cell.\n", "\n", "__Comment 2:__ the set-up and the sampling determine how many iterations make up one epoch. With the sampling defined above, the partial forward operators are used **only half of the time** on average, so that $1$ epoch (that is, the unit of one entire forward and back-projection) corresponds to twice the number of subsets in iterations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define number of subsets\n", "n_subsets = 10\n", "\n", "# Sequentially partition the data into the number of subsets using the data partitioner\n", "partitioned_data = data.partition(n_subsets, 'sequential')\n", "\n", "# Initialise the lists containing the F_i's and A_i's\n", "f_subsets = []\n", "A_subsets = []\n", "\n", "# Define F_i's and A_i's - data fit part\n", "for i in range(n_subsets):\n", "    # Define F_i and put into list\n", "    fi = 0.5 * L2NormSquared(b = partitioned_data[i])\n", "    f_subsets.append(fi)\n", "    # Define A_i and put into list\n", "    ageom_subset = partitioned_data[i].geometry\n", "    Ai = ProjectionOperator(ig2D, ageom_subset)\n", "    A_subsets.append(Ai)\n", "\n", "# Define F_{n+1} and A_{n+1} - regularisation part\n", "f_reg = ig2D.spacing[0] * alpha * MixedL21Norm()  # take into account the pixel size with ig2D.spacing\n", "Grad = GradientOperator(A_subsets[0].domain, backend='c', correlation='SpaceChannel')\n", "\n", "# Define F and K\n", "F = BlockFunction(*f_subsets, f_reg)\n", "K = BlockOperator(*A_subsets, Grad)\n", "\n", "# Define G\n", "G = IndicatorBox(lower=0)\n", "\n", "# Define probabilities\n", "prob = [1 / (2 * n_subsets)] * n_subsets + [1 / 2]\n", "\n", "# Setup and run SPDHG for 5 epochs (one epoch = 2 * n_subsets iterations here)\n", "spdhg_explicit = SPDHG(f = F, g = G, operator = K,\n", "                       update_objective_interval = 2 * n_subsets, prob=prob)\n", "spdhg_explicit.run(5 * 2 * n_subsets)" ] },
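{ "cell_type": "markdown", "metadata": {}, "source": [ "As promised in Comment 1 above, a quick sanity check that the sampling probabilities sum to one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The probabilities passed to SPDHG must sum to 1\n", "assert abs(sum(prob) - 1.0) < 1e-12\n", "print(sum(prob))" ] },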
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show result and compare to implicit framework\n", "spdhg_explicit_recon = spdhg_explicit.get_output()\n", "\n", "show2D([spdhg_explicit_recon, spdhg_recon], cmap='inferno', title= ['Explicit 5 epochs', 'Implicit 5 epochs'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Comparison of objective function\n", "epoch_range = np.arange(0,6)\n", "plt.semilogy(epoch_range, spdhg_explicit.objective, label='explicit SPDHG')\n", "plt.semilogy(epoch_range, spdhg_measures[0:len(epoch_range)], label='implicit SPDHG')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Objective function')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improve the speed of explicit reconstruction by re-normalizing\n", "\n", "You can observe that the explicit reconstruction converges more slowly than the implicit one. One reason is that the block operator $K$ used in the explicit framework is formed by partial operators $A_{\\text{subsets}}[i]$ and $\\nabla$ with potentially very different norms. Empirically, it therefore makes sense to renormalise the partial operators so that they all have norm one. To leave the objective function unchanged, we also need to rescale the functions $F_i$ according to the following identities:\n", "$$ \\| K u - g\\|_2^2 = \\|K\\|^2 \\cdot \\left\\| \\frac{K}{\\|K\\|} u - \\frac{g}{\\|K\\|}\\right\\|_2^2$$\n", "$$ \\| \\nabla u\\|_{2,1} = \\|\\nabla \\| \\cdot \\left\\| \\frac{\\nabla}{\\|\\nabla\\|} u\\right\\|_{2,1}$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define number of subsets\n", "n_subsets = 10\n", "\n", "# Sequentially partition the data into the number of subsets using the data partitioner\n", "partitioned_data = data.partition(n_subsets, 'sequential')\n", "\n", "# Initialise the lists containing the F_i's and A_i's\n", "f_subsets = []\n", "A_subsets = []\n", "\n", "# Define F_i's and A_i's - data fit part\n", "for i in range(n_subsets):\n", "    ageom_subset = partitioned_data[i].geometry\n", "    Ai = ProjectionOperator(ig2D, ageom_subset)\n", "    normi = Ai.norm()\n", "    A_subsets.append((1 / normi) * Ai)\n", "    # Define rescaled F_i and put into list\n", "    fi = 0.5 * normi**2 * L2NormSquared(b = (1 / normi) * partitioned_data[i])\n", "    f_subsets.append(fi)\n", "\n", "# Define F_{n+1} and A_{n+1} - regularisation part\n", "Grad = GradientOperator(A_subsets[0].domain, backend='c', correlation='SpaceChannel')\n", "normGrad = Grad.norm()\n", "Grad_renorm = (1 / normGrad) * Grad\n", "f_reg = normGrad * ig2D.spacing[0] * alpha * MixedL21Norm()  # take into account the pixel size with ig2D.spacing\n", "\n", "\n", "# Define F and K (G and prob are reused from the previous set-up)\n", "F = BlockFunction(*f_subsets, f_reg)\n", "K = BlockOperator(*A_subsets, Grad_renorm)\n", "\n", "# Setup and run SPDHG for 5 epochs\n", "spdhg_explicit = SPDHG(f = F, g = G, operator = K,\n", "                       update_objective_interval = 2 * n_subsets, prob=prob)\n", "spdhg_explicit.run(5 * 2 * n_subsets)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show result and compare to implicit framework\n", "spdhg_explicit_recon = spdhg_explicit.get_output()\n", "show2D([spdhg_explicit_recon, spdhg_recon], cmap='inferno', title= ['Renormalized Explicit 5 epochs','Implicit 5 epochs'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Comparison of objective function\n", "epoch_range = np.arange(0,6)\n", "plt.semilogy(epoch_range, spdhg_explicit.objective, label='Renormalized Explicit SPDHG')\n", "plt.semilogy(epoch_range, spdhg_measures[0:len(epoch_range)], label='Implicit SPDHG')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Objective function')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## When to use explicit or implicit framework?\n", "\n", "1. Set-up: the implicit framework is generally easier to set up.\n", "\n", "1. Time per iteration: the time per iteration is greater in the implicit framework; indeed, at each iteration one needs to compute the proximal operator of the TV, which is done in a (hidden) inner loop. This does not occur in the explicit framework, where all the proximal operators have closed-form expressions (hence the name).\n", "\n", "1. Speed of convergence: there is no theoretical result on which framework is faster. In the example above, one can observe that renormalised explicit SPDHG converges at roughly the same speed as implicit SPDHG.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Theoretical background\n", "\n", "\n", "**Algorithm**\n", "\n", "SPDHG is described as follows:\n", "\n", "1. **Input:** step-sizes $\\tau$ and $(\\sigma_i)_{1\\leq i \\leq n}$; probabilities $p_i$\n", "1. **Initialize:** $x=0$, $y=0$, $z = \\bar{z} = A^T y = 0$\n", "1. **Iterate** for $m=1, \\dots$:\n", "\n", "    1. Update the primal variable: \n", "    $$x = \\text{prox}_{\\tau G}(x - \\tau \\bar{z}) $$\n", "    1. Pick an index $i$ at random with probability $p_i$\n", "    1. Update the dual variable with index $i$: \n", "    $$ y_i^+ = \\text{prox}_{\\sigma_i f_i^*}(y_i + \\sigma_i A_i x) $$\n", "    1. Update the secondary variables: \n", "    $$ d z = A_i^T(y_i^+ - y_i), \\quad z = z + d z, \\quad y_i = y_i^+ $$\n", "    1. Perform the extrapolation step: \n", "    $$ \\bar{z}= z + (1/p_i) \\cdot d z $$\n", "\n", "**Admissible step-sizes**\n", "\n", "SPDHG converges to the set of solutions of the saddle-point problem (in terms of Bregman distance [ref] and almost surely [ref1, ref2]) if the following condition on the step-sizes $\\tau, (\\sigma_i)_{1\\leq i \\leq n}$ is satisfied:\n", "$$ \\tau \\, \\sigma_i \\, \\|K_i\\|^2 < p_i^2, \\quad 1 \\leq i \\leq n$$\n", "\n", "The CIL default implementation uses $\\sigma_i = \\rho/\\|K_i\\|$ for all $i$'s and $\\tau =\\rho\\, \\min_i p_i^2/\\|K_i\\|$ with $\\rho=0.99$.\n", "\n", "\n", "Other admissible choices are $\\sigma_i = \\gamma \\rho/\\|K_i\\|$ for all $i$'s and $\\tau =\\rho\\, \\min_i p_i^2/\\|K_i\\| / \\gamma$ with $\\rho=0.99$. Changing the value of $\\gamma$ adjusts the balance between primal and dual convergence. It can be set in the following way:\n", "\n", "` spdhg = SPDHG(f = F, g = G, operator = K, gamma = gamma)`\n", "\n", "A toy sketch of the full update loop, including a check of the step-size condition, is given below." ] },
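{ "cell_type": "markdown", "metadata": {}, "source": [ "### Aside: a toy SPDHG loop\n", "\n", "To make the update rule and the step-size condition concrete, here is a minimal, self-contained NumPy sketch of SPDHG for a toy least-squares problem $\\min_x \\sum_{i=1}^n \\frac{1}{2}\\|A_i x - g_i\\|^2$ with $G=0$ (so $\\text{prox}_{\\tau G}$ is the identity). All names are illustrative; in practice CIL's `SPDHG` class implements this loop for you." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Toy problem: n blocks of a consistent least-squares system, G = 0\n", "rng = np.random.default_rng(1)\n", "n, d, m = 4, 20, 10                      # blocks, unknowns, rows per block\n", "A_blocks = [rng.standard_normal((m, d)) for _ in range(n)]\n", "x_true = rng.standard_normal(d)\n", "g_blocks = [Ai @ x_true for Ai in A_blocks]\n", "\n", "# Default step-sizes quoted above: sigma_i = rho/||K_i||, tau = rho * min_i p_i^2/||K_i||\n", "p = [1 / n] * n                          # uniform sampling probabilities\n", "norms = [np.linalg.norm(Ai, 2) for Ai in A_blocks]\n", "rho = 0.99\n", "sigma = [rho / nrm for nrm in norms]\n", "tau = rho * min(pi**2 / nrm for pi, nrm in zip(p, norms))\n", "\n", "# Verify the admissible step-size condition tau * sigma_i * ||K_i||^2 < p_i^2\n", "assert all(tau * si * nrm**2 < pi**2 for si, nrm, pi in zip(sigma, norms, p))\n", "\n", "x = np.zeros(d)\n", "y = [np.zeros(m) for _ in range(n)]\n", "z = np.zeros(d)\n", "zbar = np.zeros(d)\n", "\n", "for it in range(2000):\n", "    x = x - tau * zbar                   # prox_{tau G} is the identity here (G = 0)\n", "    i = rng.choice(n, p=p)               # pick a block with probability p_i\n", "    # prox_{sigma f_i^*} for f_i = 0.5*||. - g_i||^2 is v -> (v - sigma*g_i)/(1 + sigma)\n", "    v = y[i] + sigma[i] * (A_blocks[i] @ x)\n", "    y_new = (v - sigma[i] * g_blocks[i]) / (1 + sigma[i])\n", "    dz = A_blocks[i].T @ (y_new - y[i])  # partial back-projection\n", "    y[i] = y_new\n", "    z = z + dz\n", "    zbar = z + dz / p[i]                 # extrapolation step\n", "\n", "# The data are consistent, so the objective should approach zero\n", "print(sum(0.5 * np.linalg.norm(Ai @ x - gi)**2 for Ai, gi in zip(A_blocks, g_blocks)))" ] }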
 ], "metadata": { "kernelspec": { "display_name": "cil_demos", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 4 }