{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deuteration of H$_3^+$\n", "\n", "The objective of this notebook is to perform the kinetic analysis shown in the figure below (Figure 9 from [E. Hugo et al. (2009) _J. Chem. Phys._ **130**, 164302](http://dx.doi.org/10.1063/1.3089422)).\n", "\n", "![Deuteration figure](https://leeping.github.io/che155/assets/images/week05/hugo-f9.jpg)\n", "\n", "## Inspecting the Data\n", "\n", "In this experiment, H$_3^+$ ions react with excess HD at 13.5 K to produce H$_2$D$^+$, D$_2$H$^+$, and D$_3^+$ through a series of elementary deuteration reactions. The data from the figure above were digitized, and this folder contains a csv file containing the ion count as a function of time for each species. First, we can import and inspect the data using `pandas.read_csv()`. Since the csv files do not contain column headers, we supply our own headers with the `names` keyword argument. Then, since the units of time in the csv file are ms, we divide the time column by 1000 to convert to seconds. This will be convenient down the road." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import pandas as pd\n", "\n", "h3 = pd.read_csv('H3+.csv',names=['time','number'])\n", "h2d = pd.read_csv('H2D+.csv',names=['time','number'])\n", "d2h = pd.read_csv('D2H+.csv',names=['time','number'])\n", "d3 = pd.read_csv('D3+.csv',names=['time','number'])\n", "\n", "for df in [h3,h2d,d2h,d3]:\n", " df['time'] = df['time']/1000.\n", "\n", "h3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll make a scatterplot of the data using our usual matplotlib methods. When working with experimental data, it is usually a good idea to plot the data first so that you know what you're working with." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "\n", "ax.scatter(h3['time'],h3['number'],color='#000000',label=r'H$_3^+$')\n", "ax.scatter(h2d['time'],h2d['number'],color='#ffbf00',label=r'H$_2$D$^+$')\n", "ax.scatter(d2h['time'],d2h['number'],color='#022851',label=r'D$_2$H$^+$')\n", "ax.scatter(d3['time'],d3['number'],color='#c10230',label=r'D$_3^+$')\n", "\n", "ax.set_xlabel(\"Time (s)\")\n", "ax.set_ylabel(\"Number\")\n", "ax.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the original paper, the data were plotted on a log scale. Logarithmic scaling is excellent for displaying exponential trends, as they appear linear on a log scale. The integrated rate law for H$_3^+$ shows that it is predicted to decay exponentially, so it should appear as a line on a logarithmic plot with a negative slope. In `matplotlib`, the axis scale can be set using [`Axes.set_yscale`](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_yscale.html)/[`Axes.set_xscale`](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xscale.html), which takes `'linear'` and `'log'` as possible options." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax.set_yscale('log')\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Kinetic Model\n", "\n", "In the introduction to this notebook, we performed a simple kinetic analysis of this system. 
Assuming that HD is in large excess, pseudo-first-order conditions apply, and the number densities for each species are:\n", "\n", "$$ [\\text{H}_3^+](t) = [\\text{H}_3^+]_0e^{-k_1t} $$\n", "\n", "$$ [\\text{H}_2\\text{D}^+](t) = \\frac{[\\text{H}_3^+]_0k_1}{k_2-k_1}\\left(e^{-k_1t} - e^{-k_2t}\\right) $$\n", "\n", "$$ [\\text{D}_2\\text{H}^+](t) = \\frac{[\\text{H}_3^+]_0k_1k_2}{k_2-k_1}\\left(\\frac{e^{-k_1t}-e^{-k_3t}}{k_3-k_1} - \\frac{e^{-k_2t} - e^{-k_3t}}{k_3-k_2}\\right) $$\n", "\n", "$$ [\\text{D}_3^+](t) = [\\text{H}_3^+]_0 - [\\text{H}_3^+](t) - [\\text{H}_2\\text{D}^+](t) - [\\text{D}_2\\text{H}^+](t) $$\n", "\n", "In these equations, the square brackets represent number density (molecules per cubic centimeter), and the rate coefficients are the pseudo-first-order constants, which are related to the true second-order rate coefficients $k_n^{(2)}$ as follows:\n", "\n", "$$ k_n \\equiv k_n^{(2)}[\\text{HD}] $$\n", "\n", "According to the paper, \\[HD\\] was $6.3\\times 10^{10}$ cm$^{-3}$, and the best fit to the experimental data yielded the second-order rate coefficients:\n", "\n", "| Constant | Value (10$^{-9}$ cm$^3$ s$^{-1}$) |\n", "| --- | --- |\n", "| $k_1^{(2)}$ | 1.30 |\n", "| $k_2^{(2)}$ | 1.30 |\n", "| $k_3^{(2)}$ | 1.05 |\n", "\n", "Although we don't know what the initial number density of H$_3^+$ is, from the graph in the paper it seems close to 950 cm$^{-3}$, so we can use that as a guess. We can build the kinetic model easily." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hd = 6.3e10 #number density of HD\n", "\n", "#pseudo-first order rate coefficients\n", "k1 = 1.30e-9*hd\n", "k2 = 1.30e-9*hd\n", "k3 = 1.05e-9*hd\n", "\n", "#initial density of H3+, estimated from the figure\n", "h30 = 950\n", "\n", "#time: 0 to 150 ms\n", "t = np.linspace(0,150e-3,1000)\n", "\n", "#rate equations\n", "h3t = (h30*np.exp(-k1*t))\n", "h2dt = (k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t)))\n", "d2ht = (k1*k2*h30/(k2-k1)*((np.exp(-k1*t) - np.exp(-k3*t))/(k3-k1) -\n", "        (np.exp(-k2*t)-np.exp(-k3*t))/(k3-k2)))\n", "d3t = (h30-h3t-h2dt-d2ht)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code raises an exception! Our model fails whenever any two of the rate coefficients are equal, because that leads to division by zero. This is one pitfall of numerical computing: you have to be very careful about potential divide-by-zero errors. It is also a flaw in the approach we're using! Integrated rate equations are susceptible to these situations in general. Next week we will see a completely different strategy for avoiding this problem, but for now we're going to stick with the integrated rate equations, so we need to handle it. One possibility is to rederive the rate laws for each possible pair of equal rate coefficients and select which one we use based on a series of equality tests. That would be very tedious, and in this case it turns out to be unnecessary.\n", "\n", "In any experiment, there is uncertainty. The Hugo et al. paper did not explicitly state the uncertainties on their rate coefficients, but we could assume they are comparable to the last significant digit. So to fix our problem we can just make `k1` and `k2` very slightly different, by an amount smaller than the likely experimental uncertainty."
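, "\n", "As an aside, another way to avoid the division by zero is to switch to the limiting form of the rate law whenever two rate coefficients are effectively equal; for example, as $k_2 \\to k_1$, $[\\text{H}_2\\text{D}^+](t) \\to [\\text{H}_3^+]_0 k_1 t e^{-k_1t}$. A minimal sketch of that idea for the H$_2$D$^+$ channel (shown for illustration only; the rest of this notebook uses the small perturbation instead):\n", "\n", "```python\n", "def h2d_model(t, h30, k1, k2):\n", "    #fall back to the k1 == k2 limiting form to avoid dividing by zero\n", "    if np.isclose(k1, k2):\n", "        return h30*k1*t*np.exp(-k1*t)\n", "    return k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t))\n", "```\n"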
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hd = 6.3e10 #number density of HD\n", "\n", "#pseudo-first order rate coefficients\n", "k1 = 1.30e-9*hd\n", "k2 = 1.3001e-9*hd\n", "k3 = 1.05e-9*hd\n", "\n", "#initial density of H3+\n", "h30 = 950\n", "\n", "#time: 0 to 150 ms\n", "t = np.linspace(0,150e-3,1000)\n", "\n", "#rate equations\n", "h3t = (h30*np.exp(-k1*t))\n", "h2dt = (k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t)))\n", "d2ht = (k1*k2*h30/(k2-k1)*((np.exp(-k1*t) - np.exp(-k3*t))/(k3-k1) - (np.exp(-k2*t)-np.exp(-k3*t))/(k3-k2)))\n", "d3t = (h30-h3t-h2dt-d2ht)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot the data, comparing the model to the experiment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "\n", "ax.scatter(h3['time'],h3['number'],color='#000000',label=r'H$_3^+$')\n", "ax.scatter(h2d['time'],h2d['number'],color='#ffbf00',label=r'H$_2$D$^+$')\n", "ax.scatter(d2h['time'],d2h['number'],color='#022851',label=r'D$_2$H$^+$')\n", "ax.scatter(d3['time'],d3['number'],color='#c10230',label=r'D$_3^+$')\n", "\n", "ax.set_xlabel(\"Time (s)\")\n", "ax.set_ylabel(\"Number\")\n", "ax.legend()\n", "ax.plot(t,h3t,c='#000000')\n", "ax.plot(t,h2dt,c='#ffbf00')\n", "ax.plot(t,d2ht,c='#022851')\n", "ax.plot(t,d3t,c='#c10230')\n", "ax.set_ylim(.01,2000)\n", "ax.set_yscale('log')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, our simple model is not perfectly describing the data. Compared with the model in the paper, our model overestimates the amount of H$_3^+$ present, and it does not capture the slight curves in the H$_2$D$^+$ and D$_2$H$^+$ density at the longer times. Still, it's not bad, and the log scale somewhat exaggerates the differences. If we make the same plot on a linear scale, we see the agreement is quite good, although it could be better." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax.set_yscale('linear')\n", "ax.set_ylim(-10,1000)\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, we can switch which rate coefficient we altered just to make sure the behavior does not vary wildly, and indeed, it does not, as shown below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hd = 6.3e10 #number density of HD\n", "\n", "#pseudo-first order rate coefficients\n", "k1 = 1.3001e-9*hd\n", "k2 = 1.30e-9*hd\n", "k3 = 1.05e-9*hd\n", "\n", "#initial density of H3+\n", "h30 = 920\n", "\n", "#time: 0 to 150 ms\n", "t = np.linspace(0,150e-3,1000)\n", "\n", "#rate equations\n", "h3t = (h30*np.exp(-k1*t))\n", "h2dt = (k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t)))\n", "d2ht = (k1*k2*h30/(k2-k1)*((np.exp(-k1*t) - np.exp(-k3*t))/(k3-k1) - (np.exp(-k2*t)-np.exp(-k3*t))/(k3-k2)))\n", "d3t = (h30-h3t-h2dt-d2ht)\n", "\n", "fig, ax = plt.subplots()\n", "\n", "ax.scatter(h3['time'],h3['number'],color='#000000',label=r'H$_3^+$')\n", "ax.scatter(h2d['time'],h2d['number'],color='#ffbf00',label=r'H$_2$D$^+$')\n", "ax.scatter(d2h['time'],d2h['number'],color='#022851',label=r'D$_2$H$^+$')\n", "ax.scatter(d3['time'],d3['number'],color='#c10230',label=r'D$_3^+$')\n", "\n", "ax.set_xlabel(\"Time (s)\")\n", "ax.set_ylabel(\"Number\")\n", "ax.legend()\n", "ax.plot(t,h3t,c='#000000')\n", "ax.plot(t,h2dt,c='#ffbf00')\n", "ax.plot(t,d2ht,c='#022851')\n", "ax.plot(t,d3t,c='#c10230')\n", "ax.set_ylim(.01,2000)\n", "ax.set_yscale('log')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, it is a good idea to view the residuals between the model and the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def residuals(data,model):\n", " return data - model\n", "\n", "def rms(data,model):\n", " return np.sqrt(np.sum(residuals(data,model)**2))\n", "\n", "fig,ax = plt.subplots()\n", "\n", "ax.scatter(h3['time'],residuals(h3['number'],h30*np.exp(-k1*h3['time'])),color='#000000',label=r'H$_3^+$')\n", "ax.scatter(h2d['time'],residuals(h2d['number'],k1*h30/(k2-k1)*(np.exp(-k1*h2d['time']) - np.exp(-k2*h2d['time']))),color='#ffbf00',label=r'H$_2$D$^+$')\n", "ax.scatter(d2h['time'],residuals(d2h['number'],k1*k2*h30/(k2-k1)*((np.exp(-k1*d2h['time']) - np.exp(-k3*d2h['time']))/(k3-k1) - (np.exp(-k2*d2h['time'])-np.exp(-k3*d2h['time']))/(k3-k2))),color='#022851',label=r'D$_2$H$^+$')\n", "\n", "t2 = d3['time']\n", "h3t = (h30*np.exp(-k1*t2))\n", "h2dt = (k1*h30/(k2-k1)*(np.exp(-k1*t2) - np.exp(-k2*t2)))\n", "d2ht = (k1*k2*h30/(k2-k1)*((np.exp(-k1*t2) - np.exp(-k3*t2))/(k3-k1) - (np.exp(-k2*t2)-np.exp(-k3*t2))/(k3-k2)))\n", "d3t = (h30-h3t-h2dt-d2ht)\n", "ax.scatter(t2,residuals(d3['number'],d3t),color='#c10230',label=r'D$_3^+$')\n", "\n", "ax.legend()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rate coefficients from curve fitting\n", "\n", "Like last week, we can use `scipy.optimize.curve_fit` to fit x,y data to a model function. We can employ this approach to try and improve the agreement between our model function and the experimental data. Notice that \\[H$_3^+$\\](t) only depends on \\[H$_3^+$\\]$_0$ and $k_1$. That means we can get a value for $k_1$ just by fitting the H$_3^+$ data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.optimize as opt\n", "\n", "#c0 = [H3+]_0, and k = k1\n", "def h3p_fit(t,c0,k):\n", " return c0*np.exp(-k*t)\n", "\n", "#set initial parameter estimates to [H3+]_0 = 1000 and k_1^(2) = 1e-9, so k_1 = 1e-9*[HD]\n", "popt, pcov = opt.curve_fit(h3p_fit,h3['time'],h3['number'],p0=[1000,1e-9*hd])\n", "\n", "fig,(ax1,ax2) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "\n", "ax2.scatter(h3['time'],h3['number'])\n", "ax2.plot(t,h3p_fit(t,*popt))\n", "ax2.set_xlabel('time (s)')\n", "ax2.set_ylabel(\"Number\")\n", "\n", "ax1.scatter(h3['time'],residuals(h3['number'],h3p_fit(h3['time'],*popt)))\n", "ax1.set_xlim(ax2.get_xlim())\n", "ax1.set_ylabel('Residuals')\n", "\n", "\n", "print(f'[H3+]0: {popt[0]:.2f} +/- {np.sqrt(pcov[0][0]):.2f}, k = {popt[1]/hd:.2e} +/- {np.sqrt(pcov[1][1])/hd:.2e}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To compare whether this new fit is \"better\" than the original model, we can compare the root-mean-square (RMS) values of the residuals. From these results, we can say that the new model more closely matches the experimental data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "h30*np.exp(-k1*h3['time'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "h3['number']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rms_orig = rms(h3['number'],h30*np.exp(-k1*h3['time']))\n", "rms_new = rms(h3['number'],h3p_fit(h3['time'],*popt))\n", "\n", "print(f'Old RMS: {rms_orig:.2f}; New RMS: {rms_new:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have values for \\[H$_3^+$\\]$_0$ and $k_1$, we can use those values in the model for \\[H$_2$D$^+$\\] to extract $k_2$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "h30 = 919.31\n", "k1 = 1.32e-9*hd\n", "\n", "def h2d_fit(t,k2):\n", " return k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t))\n", "\n", "popt,pcov = opt.curve_fit(h2d_fit,h2d['time'],h2d['number'],p0=[2e-9*hd])\n", "\n", "fig,(ax1,ax2) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "\n", "ax2.scatter(h2d['time'],h2d['number'])\n", "ax2.plot(t,h2d_fit(t,*popt))\n", "ax2.set_xlabel('time (s)')\n", "ax2.set_ylabel(\"Number\")\n", "\n", "ax1.scatter(h2d['time'],residuals(h2d['number'],h2d_fit(h2d['time'],*popt)))\n", "ax1.set_xlim(ax2.get_xlim())\n", "ax1.set_ylabel('Residuals')\n", "\n", "print(f'k2 = {popt[0]/hd:.2e} +/- {np.sqrt(pcov[0][0])/hd:.2e}; rms = {rms(h2d[\"number\"],h2d_fit(h2d[\"time\"],*popt)):.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is good, but the fit might be able to be improved by also optimizing \\[H$_3^+$\\]$_0$ and $k_1$ as well." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def h2d_fit(t,h30,k1,k2):\n", " return k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t))\n", "\n", "#note, setting the initial values of k1 and k2 equal to each other would cause an error\n", "popt,pcov = opt.curve_fit(h2d_fit,h2d['time'],h2d['number'],\n", " p0=[950,1.5e-9*hd,1e-9*hd])\n", "\n", "fig,(ax1,ax2) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "\n", "ax2.scatter(h2d['time'],h2d['number'])\n", "ax2.plot(t,h2d_fit(t,*popt))\n", "ax2.set_xlabel('time (s)')\n", "ax2.set_ylabel(\"Number\")\n", "\n", "ax1.scatter(h2d['time'],residuals(h2d['number'],h2d_fit(h2d['time'],*popt)))\n", "ax1.set_xlim(ax2.get_xlim())\n", "ax1.set_ylabel('Residuals')\n", "\n", "for i,x in enumerate(['[H3+]0','k1','k2']):\n", " den = hd\n", " if i==0:\n", " den = 1.\n", " print(f'{x} = {popt[i]/den:.2e} +/- {np.sqrt(pcov[i][i])/den:.2e}')\n", "\n", "print(f'rms = {rms(h2d[\"number\"],h2d_fit(h2d[\"time\"],*popt)):.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fit initially looks better, but when you look at the parameters, there are some problems. The $1\\sigma$ uncertainties on the parameter estimates are larger, and in the case of \\[H$_3^+$\\]$_0$, the error is larger than the parameter value! Clearly this model cannot do a good job of extracing the initial density of H$_3^+$. The main issue is that the covariance matrix has very large off-diagonal elements that indicate strong correlations between parameters. This means that the model performs almost identically well if two parameters are varied together. For instance, the `pcov[0][1]` element shows the covariance between \\[H$_3^+$\\]$_0$ and $k_1$ as -1.6e7, which is midway between the variances of those two parameters (given by `pcov[0][0]` and `pcov[1][1]`, respectively). This means that the model performs nearly equally well if \\[H$_3^+$\\]$_0$ increases and $k_1$ decreases." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another issue: if we take the values of \\[H$_3^+$\\]$_0$ and $k_1$ from this model and use them to predict \\[H$_3^+$\\], we see that the results are much worse than when we only fit H$_3^+$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,(ax1,ax2) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "\n", "ax2.scatter(h3['time'],h3['number'])\n", "ax2.plot(t,h3p_fit(t,*popt[0:2]))\n", "ax2.set_xlabel('time (s)')\n", "ax2.set_ylabel(\"Number\")\n", "\n", "ax1.scatter(h3['time'],residuals(h3['number'],h3p_fit(h3['time'],*popt[0:2])))\n", "ax1.set_xlim(ax2.get_xlim())\n", "ax1.set_ylabel('Residuals')\n", "\n", "\n", "print(f'rms = {rms(h3[\"number\"],h3p_fit(h3[\"time\"],*popt[0:2])):.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simultaneous fitting with `scipy.optimize.least_squares`\n", "\n", "We've seen that the `curve_fit` approach we have been using is somewhat limited because it only handles one curve at a time. What we'd really like to do is optimize all 4 parameters (\\[H$_3^+$\\]$_0$, $k_1$, $k_2$, and $k_3$) while minimizing the squared differences between _all_ of our experimental data points. 
This is beyond the capabilities of the `curve_fit` function, which is designed to fit a single set of y data.\n", "\n", "We can accomplish this simultaneous fitting task with the help of the [`scipy.optimize.least_squares`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html) function. Take a moment to look at the documentation. The operation of `least_squares` may be a bit nonintuitive, especially if you are used to `curve_fit`. Instead of supplying a fit function, x data points, and y data points, when using `least_squares` you provide a function that produces an array containing the _residuals_ of the model function(s) given an array of parameter values `x`. Conceptually, `least_squares` calculates a total cost function `F` (by default, this is equal to `0.5*np.sum(residuals**2)`) and attempts to minimize it by varying the parameters. To decide how to vary the parameters, it evaluates the derivative of the cost function with respect to each parameter. By default, this is done by finite differences: calculating `F` with slightly different parameter values `x` and computing $\\Delta F/\\Delta x_i$. If you can compute the derivatives of the residuals analytically, you can instead supply them yourself as a Jacobian matrix (argument: `jac`). We could do this for our system: we would need to evaluate the derivatives of each rate equation with respect to each parameter. However, the finite difference method is usually sufficient, and the main advantage of providing the Jacobian is speed: it reduces the number of times the residuals have to be evaluated, which matters when they are expensive to compute. For us they are cheap, so finite differences are fine. The optimization terminates when one of the following conditions is met (each can be controlled by an optional argument):\n", "\n", "- `max_nfev`: When the number of function evaluations exceeds a certain value (by default, 100 times the number of parameters)\n", "- `ftol`: When the change in the cost function satisfies `dF < ftol*F`. By default, `ftol=1e-8`\n", "- `xtol`: When the change in the parameters becomes small. \"Small\" means different things depending on the exact algorithm used, so see the documentation for details\n", "- `gtol`: When the norm of the gradient of the cost function is small. Again, see the documentation for exact details, as this varies by algorithm\n", "- If an error occurs in the calculation\n", "\n", "The function provides three different solvers: `lm`, `trf` (the default), and `dogbox`, which each have their own advantages and disadvantages, and in some cases additional optional arguments. Sticking with the default `trf` or with `dogbox` allows you to optionally provide `bounds` on each of the parameters: upper and lower limits that the optimization must respect.\n", "\n", "It is also important to understand the `loss` argument: this controls how the cost function `F` is computed from the residuals. The default, `linear`, gives an ordinary least squares solution. More generally, a loss function `rho` is applied to each squared residual before the sum is taken: `F = 0.5 * np.sum( rho(residuals**2) )`. This can be used to make a fit more robust in the presence of outliers, among other more advanced types of fitting. The `soft_l1` option, for example, takes a squared residual `z` and calculates `2 * ((1 + z)**0.5 - 1)`. 
This function behaves like least squares for small residuals but is proportional to the absolute value of the residuals when they are larger, making the cost function less sensitive to large outliers. Most often, though, sticking with the default is a good idea.\n", "\n", "Finally, `least_squares` accepts optional `args` and `kwargs` parameters that are passed through to the residual function; we will use those to pass in the experimental data. Here, we'll construct our residual function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#redefine individual fit functions in case they've been changed above\n", "def h3p_fit(t,c0,k):\n", "    return c0*np.exp(-k*t)\n", "\n", "def h2d_fit(t,h30,k1,k2):\n", "    return k1*h30/(k2-k1)*(np.exp(-k1*t) - np.exp(-k2*t))\n", "\n", "#make similar functions for d2h+ and d3+\n", "def d2h_fit(t,h0,k1,k2,k3):\n", "    return (k1*k2*h0/(k2-k1)*((np.exp(-k1*t) - np.exp(-k3*t))/(k3-k1) - (np.exp(-k2*t)-np.exp(-k3*t))/(k3-k2)))\n", "\n", "def d3_fit(t,h0,k1,k2,k3):\n", "    return h0 - h3p_fit(t,h0,k1) - h2d_fit(t,h0,k1,k2) - d2h_fit(t,h0,k1,k2,k3)\n", "\n", "#residual function for least_squares. x contains the parameters [h30,k1,k2,k3], and we'll use kwargs to pass the data\n", "def total_fit(x,*args,**kwargs):\n", "    h3 = kwargs['h3']\n", "    h2d = kwargs['h2d']\n", "    d2h = kwargs['d2h']\n", "    d3 = kwargs['d3']\n", "\n", "    #compute residuals for each curve\n", "    r1 = residuals(h3['number'],h3p_fit(h3['time'],*x[0:2]))\n", "    r2 = residuals(h2d['number'],h2d_fit(h2d['time'],*x[0:3]))\n", "    r3 = residuals(d2h['number'],d2h_fit(d2h['time'],*x))\n", "    r4 = residuals(d3['number'],d3_fit(d3['time'],*x))\n", "\n", "    #concatenate all residuals into a single array and return\n", "    return np.concatenate([r1,r2,r3,r4])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument of `total_fit` is the array of parameter values (\\[H$_3^+$\\]$_0$, $k_1$, $k_2$, and $k_3$), in that order. To get the data points into the function, we can pass them through either the `args` or the `kwargs` parameter. In this case, I chose to place the `h3`, `h2d`, `d2h`, and `d3` dataframes into a dictionary that is passed as `kwargs`. Inside the `total_fit` function, we extract the data from `kwargs` using the keys that we set in the dictionary. The function then computes the residuals for each curve and concatenates them into a single array, which is returned.\n", "\n", "With this in place, we can now create the data dictionary and invoke the `least_squares` optimization. We pass initial guesses for the parameters using the `x0` argument and the data using `kwargs`, and I've set `verbose=1` so that a line summarizing the optimization is printed at the end. The results of the optimization are captured in the `result` variable. In the code below, `result.fun` contains the return value of the `total_fit` function (i.e., the residuals) evaluated with the optimized parameters; I use that to print the norm of the residual vector (the square root of the sum of the squared residuals)."
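, "\n", "Passing the data through `kwargs` is only one option; an equivalent pattern is to bind the data inside a small wrapper function (a closure) so that `least_squares` only ever sees the parameter array. A sketch of that alternative, for comparison (the wrapper name `make_residual_fn` is just for illustration; the cells below stick with `kwargs`):\n", "\n", "```python\n", "#alternative: bind the data in a wrapper instead of passing kwargs to least_squares\n", "def make_residual_fn(h3, h2d, d2h, d3):\n", "    def fn(x):\n", "        return total_fit(x, h3=h3, h2d=h2d, d2h=d2h, d3=d3)\n", "    return fn\n", "\n", "#result = opt.least_squares(make_residual_fn(h3, h2d, d2h, d3), x0=[950,1.4e-9*hd,1.3e-9*hd,1e-9*hd])\n", "```\n"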
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = {\n", " 'h3' : h3,\n", " 'h2d' : h2d,\n", " 'd2h' : d2h,\n", " 'd3' : d3\n", "}\n", "\n", "total_fit([950,1.4e-9*hd,1.3e-9*hd,1e-9*hd],**data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "result = opt.least_squares(total_fit,x0=[950,1.4e-9*hd,1.3e-9*hd,1e-9*hd],\n", " verbose=1,kwargs=data)\n", "print(np.sqrt(np.sum(result.fun**2)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we see that the optimization completed because the change in the cost function from one step to the next was sufficiently small. This is usually what we want: it means that all parameters have likely reached their optimal values. If, instead, `xtol` was the termination condition, it likely means that we would need to rescale one or more of the parameters to minimize the cost function. We can also see that the optimization was fast: only 5 function evaluations were required. In part this is because we had good initial guesses for the parameters. The results are sensitive to this initial choice: bad guesses can cause the optimization to find a local minimum that is not the best fit, or can cause the fit to fail. For this reason, it is very important to have a good strategy for coming up with initial guesses! Here we could take values from the paper, but if we didn't have those, we could have reasoned out approximate values by plotting the rate equations with different trial values, or by reasoning out what the values need to be based on the form of the data.\n", "\n", "The `result` object contains many useful outputs as described in the documentation. Among these, the most important are `x`, the optimized parameters, and `jac`, which contains the Jacobian $\\bar{J}$ evaluated at the solution (the matrix of derivatives). This matrix can be used to compute the covariance matrix $\\bar{\\sigma}^2$, which is the inverse of the Hessian $\\bar{H}$ of the cost function (the matrix of second derivatives of the cost function with respect to the parameters):\n", "\n", "$$ \\bar{\\sigma}^2 = \\bar{H}^{-1} \\approx (\\bar{J}^T\\bar{J})^{-1} $$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy.linalg as la\n", "\n", "pcov = la.inv(result.jac.T @ result.jac)\n", "\n", "for i,x in enumerate(['[H3+]0','k1','k2','k3']):\n", " den = hd\n", " if i==0:\n", " den = 1.\n", " print(f'{x} = {result.x[i]/den:.2e} +/- {np.sqrt(pcov[i][i])/den:.2e}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are much more satisfactory: all of the parameters have small uncertainties, and we can see that the off-diagonal terms in the covariance matrix are fairly small." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can make a plot of the complete model and residuals." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, (ax1,ax2) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "\n", "ax2.scatter(h3['time'],h3['number'],color='#000000',label=r'H$_3^+$')\n", "ax2.scatter(h2d['time'],h2d['number'],color='#ffbf00',label=r'H$_2$D$^+$')\n", "ax2.scatter(d2h['time'],d2h['number'],color='#022851',label=r'D$_2$H$^+$')\n", "ax2.scatter(d3['time'],d3['number'],color='#c10230',label=r'D$_3^+$')\n", "\n", "ax2.set_xlabel(\"Time (s)\")\n", "ax2.set_ylabel(\"Number\")\n", "ax2.legend()\n", "ax2.plot(t,h3p_fit(t,*result.x[0:2]),c='#000000')\n", "ax2.plot(t,h2d_fit(t,*result.x[0:3]),c='#ffbf00')\n", "ax2.plot(t,d2h_fit(t,*result.x),c='#022851')\n", "ax2.plot(t,d3_fit(t,*result.x),c='#c10230')\n", "\n", "ax1.scatter(h3['time'],residuals(h3['number'],h3p_fit(h3['time'],*result.x[0:2])),color='#000000')\n", "ax1.scatter(h2d['time'],residuals(h2d['number'],h2d_fit(h2d['time'],*result.x[0:3])),color='#ffbf00')\n", "ax1.scatter(d2h['time'],residuals(d2h['number'],d2h_fit(d2h['time'],*result.x)),color='#022851')\n", "ax1.scatter(d3['time'],residuals(d3['number'],d3_fit(d3['time'],*result.x)),color='#c10230')\n", "ax1.set_xlim(ax2.get_xlim())\n", "ax1.set_ylabel('Residuals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or on a log scale:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax2.set_yscale('log')\n", "ax2.set_ylim(.01,2000)\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final result, though not perfect, is the best fit we can make given the integrated rate equations that we are using. It is best to use this global fitting approach because it makes maximum use of how the fit parameters are related to all of the experimental data, not just a part of it. For example, the best probe of the \\[H$_3^+$\\]$_0$ parameter is actually \\[D$_3^+$\\], because at long times, our model predicts that essentially all of the starting H$_3^+$ is converted to D$_3^+$. This in turn yields a different value (932) than we obtained from the fit of H$_3^+$ itself (919), which in turn affects $k_1$, and so on. All of the parameters are coupled, so fitting all of the data simultanously gives the best estimate of the parameter values.\n", "\n", "Clearly, though, despite these efforts the model does not completely account for the trends seen in the data. One possibility is that \\[HD\\] is not actually constant, but starts to run out near the end of the experiment. That would cause the conversion of D$_2$H$^+$ $\\to$ D$_3^+$ to slow down. In order to work this in, we would need to rederive the rate equations, and it is possible that there is no closed-form solution to the integrals that come out of that process. Another potential issue is that ion traps are not perfect: some ions are lost over time either because they escape the trap or because they are neutralized by another reaction. Adding that process in would require refderiving integrated rate equations yet again, and that becomes very difficult! If we had a more complicated system of reactions, it would be completely impossible to use our current approach. We will look at another method next week." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 4 }