{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NMR Analysis of POSS-triol Catalyzed Indole Addition Reaction\n", "\n", "In this notebook we will analyze $^{19}$F NMR kinetics data for the reaction shown below:\n", "\n", "![Indole Reaction](https://leeping.github.io/che155/assets/images/week04/poss-indole.png)\n", "\n", "The trifluoromethlynitrostyrene reactant **A**, the internal standard **D**, and product **E** were observed with NMR. The kinetics data are contained inside the `data` folder, which contains multiple subfolders. Each folder is a single time point, and the name of the folder is the time in minutes. Inside each time folder are three csv files:\n", "- `S.csv` contains the data for the internal standard **D** at -113.2 ppm\n", "- `R.csv` contains the data for the reactant **A** at -62.9 ppm\n", "- `P.csv` contains the data for the product **E** at -62.2 ppm\n", "\n", "## Inspecting the Data\n", "\n", "The first thing to do is take a look at the data files. We'll choose the first folder (`0007`) and read in the three csv files using `pandas`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "\n", "R = pd.read_csv('data/0007/R.csv')\n", "P = pd.read_csv('data/0007/P.csv')\n", "S = pd.read_csv('data/0007/S.csv')\n", "\n", "R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each file contains a column `'delta'` with the chemical shifts, and `'y`' with the NMR data points. Using matplotlib, we can easily plot the three dataframes. The units of the x axis are chemical shift ($\\delta$) in ppm, and the units on the y axis are arbitrary so it is fine to leave that axis unlabeled." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,df in zip(axes,[S,R,P]):\n", " ax.plot(df['delta'],df['y'])\n", " ax.set_xlabel(r'$\\delta$ (ppm)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the labels on the x axis are overlapping. This is because matplotlib is using too many ticks for the number of digits being displayed. One way to fix this is to set the tick positions manually. We can take the first $\\delta$ value, the middle $\\delta$ value, and the last $\\delta$ value if we want 3 ticks:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,df in zip(axes,[S,R,P]):\n", " ax.plot(df['delta'],df['y'])\n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " x = df['delta'].to_numpy()\n", " ax.set_xticks([x[0],x[x.size//2],x[-1]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can customize the [tick locator](https://matplotlib.org/gallery/ticks_and_spines/tick-locators.html) for the plot axis. More complete documentation is available in the [Ticker API](https://matplotlib.org/3.3.3/api/ticker_api.html). 
To achieve similar results as manually locating 3 ticks, we can use the `LinearLocator` with `numticks` set to 3:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,df in zip(axes,[S,R,P]):\n", " ax.plot(df['delta'],df['y'])\n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " ax.xaxis.set_major_locator(plt.LinearLocator(numticks=3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More aesthetically pleasing results can be obtained with the `MaxNLocator`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,df in zip(axes,[S,R,P]):\n", " ax.plot(df['delta'],df['y'])\n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " ax.xaxis.set_major_locator(plt.MaxNLocator(5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It would be tedious to inspect all of the data one-by-one, so instead we can load in all the data at once and plot it. First we want to get a list of all of the folders. We could type it in manually, or make use of python's [`os.listdir`](https://docs.python.org/3/library/os.html#os.listdir) function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "dirs = os.listdir('data')\n", "dirs.sort()\n", "dirs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the list of folders in hand, we can create a list of dataframes for each molecule, and graph them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "P = []\n", "R = []\n", "S = []\n", "\n", "for d in dirs:\n", " P.append(pd.read_csv(f'data/{d}/P.csv'))\n", " R.append(pd.read_csv(f'data/{d}/R.csv'))\n", " S.append(pd.read_csv(f'data/{d}/S.csv'))\n", "\n", "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,l in zip(axes,[S,R,P]):\n", " for df in l:\n", " ax.plot(df['delta'],df['y'])\n", " \n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " ax.xaxis.set_major_locator(plt.MaxNLocator(5))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a bit hard to tell what's going on in these plots. One way to make it cleaner is to highlight the first and last time points, and make the other ones mostly transparent and gray:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(1,3,figsize=(18,5))\n", "for ax,l in zip(axes,[S,R,P]):\n", " for i,df in enumerate(l):\n", " if i == 0:\n", " ax.plot(df['delta'],df['y'],c='#022851',label=f't = {int(dirs[i])} min')\n", " elif i == len(dirs)-1:\n", " ax.plot(df['delta'],df['y'],c='#c10230',label=f't = {int(dirs[i])} min')\n", " else:\n", " ax.plot(df['delta'],df['y'],color='#00000011')\n", " \n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " ax.xaxis.set_major_locator(plt.MaxNLocator(5))\n", " \n", "axes[0].legend()\n", "axes[1].legend()\n", "axes[2].legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that each peak drifts to the right over time; this is a well-known phenomenon in $^{19}$F NMR spectroscopy. We are less interested in the drift, but more interested in the area beneath the peaks. 
The area under the internal standard curve remains more or less constant, while the reactant decreases over time and the product increases as we would expect.\n", "\n", "Another common visualization for data that has both a frequency and time axis is a waterfall plot. Matplotlib does not have a built-in waterfall plot function, but it is possible to make them. The implementation below is a bit fancy because it also colors the data. It makes use of the [`matplotlib.colors.Normalize`](https://matplotlib.org/api/_as_gen/matplotlib.colors.Normalize.html) object to make sure that each line on the plot uses the same color scale as well as the [`matplotlib.collections.LineCollection`](https://matplotlib.org/gallery/shapes_and_collections/line_collection.html) object to represent a set of lines on the plot. Each individual spectrum is converted into a LineCollection, which contains a line connecting each (x,z) pair of data points in the spectrum." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib.collections import LineCollection\n", "\n", "\n", "def waterfall(fig,ax,Z,x,y):\n", " norm = plt.Normalize(Z.min(),Z.max())\n", " \n", " #loop over each column in Z\n", " #we want to add a LineCollection for each spectrum\n", " #the linecollection consists of a series of line segments connecting the points\n", " #each line segment has 2 xy values: the first goes from (x0,y0) to (x1,y1); the next from (x1,y1) to (x2,y2), and so on\n", " for i,z in enumerate(Z[:,].T):\n", " \n", " #create an array of (x,y) values, insert a dummy axis that will be used for concatenation\n", " #pz.shape = (N,1,2) where N = number of points in the spectrum\n", " #pz[0] is an array of length 1 that contains an array [x0,z0]\n", " pz = np.array([x,z]).T.reshape(-1,1,2)\n", " \n", " #we want each segment to be an array with shape(2,2) that will contain the values [[x_n,zn],[x_n+1,z_n+1]]\n", " #so we take the list of all points but the last, and tack on a list of all points but the first.\n", " #using np.concatenate, we perform the concatenation along axis 1\n", " #as a result, p will have shape(2N-1,2,2)\n", " p = np.concatenate([pz[:-1],pz[1:]],axis=1)\n", " \n", " #create the line collection, and color each segment with the average of its two z values\n", " lc = LineCollection(p, cmap='jet', norm=norm, array=(z[1:]+z[:-1])/2, linewidths=2,alpha=0.5)\n", " \n", " #add the collection to the plot, giving it a y value corresponding to the appropriate time value\n", " ax.add_collection3d(lc,zs=y[i],zdir='y')\n", " \n", " ax.set_xlim3d(x.min(),x.max())\n", " ax.xaxis.set_major_locator(plt.MaxNLocator(4))\n", " ax.set_xlabel(r'$\\delta$ (ppm)')\n", " ax.set_ylim3d(y.min(),y.max())\n", " ax.yaxis.set_major_locator(plt.MaxNLocator(4))\n", " ax.set_ylabel(r't (min)')\n", " ax.set_zlim3d(Z.min(),Z.max())\n", " fig.colorbar(lc)\n", "\n", "#create 2D numpy array containing all the spectra for each molecule:\n", "Parr = np.zeros((len(P[0]['delta']),len(dirs)))\n", "Rarr = np.zeros((len(R[0]['delta']),len(dirs)))\n", "Sarr = np.zeros((len(S[0]['delta']),len(dirs)))\n", "for i in range(0,len(dirs)):\n", " Parr[:,i] = P[i]['y'].to_numpy()\n", " Rarr[:,i] = R[i]['y'].to_numpy()\n", " Sarr[:,i] = S[i]['y'].to_numpy()\n", "\n", "#create array for the chemical shift axis; it is the same for all time points.\n", "Pdelta = P[0]['delta']\n", "Rdelta = R[0]['delta']\n", "Sdelta = S[0]['delta']\n", "\n", "#create an array of time values\n", "time 
= np.array(dirs,dtype=int)\n", "\n", "\n", "fig = plt.figure(figsize=(20,6),dpi=200)\n", "ax = fig.add_subplot(131,projection='3d')\n", "waterfall(fig,ax,Sarr,Sdelta,time)\n", "ax.view_init(elev=15,azim=-80)\n", "ax.set_title('Internal Standard')\n", "\n", "ax = fig.add_subplot(132,projection='3d')\n", "waterfall(fig,ax,Rarr,Rdelta,time)\n", "ax.view_init(elev=15,azim=-80)\n", "ax.set_title('Reactant')\n", "\n", "ax = fig.add_subplot(133,projection='3d')\n", "waterfall(fig,ax,Parr,Pdelta,time)\n", "ax.view_init(elev=15,azim=-80)\n", "ax.set_title('Product')\n", "\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, we can also use `pcolormesh` for a false color plot. There is a lot of value in being comfortable with visualizing the data in different ways!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Sarr[:,0].T.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "plt.Axes.pcolormesh?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,axes = plt.subplots(3,1,figsize=(8,9))\n", "\n", "p1 = axes[0].pcolormesh(Sdelta,time,Sarr.T,shading='gouraud',cmap='cividis')\n", "fig.colorbar(p1,ax=axes[0])\n", "axes[0].set_title(\"Standard\")\n", "axes[0].xaxis.set_major_locator(plt.MaxNLocator(5))\n", "axes[0].set_xlabel(r'$\\delta$ (ppm)')\n", "axes[0].set_ylabel('t (min)')\n", "\n", "p2 = axes[1].pcolormesh(Rdelta,time,Rarr.T,shading='gouraud',cmap='cividis')\n", "axes[1].set_title(\"Reactant\")\n", "axes[1].xaxis.set_major_locator(plt.MaxNLocator(5))\n", "axes[1].set_xlabel(r'$\\delta$ (ppm)')\n", "axes[1].set_ylabel('t (min)')\n", "fig.colorbar(p2,ax=axes[1])\n", "\n", "p3 = axes[2].pcolormesh(Pdelta,time,Parr.T,shading='gouraud',cmap='cividis')\n", "axes[2].set_title(\"Product\")\n", "axes[2].xaxis.set_major_locator(plt.MaxNLocator(5))\n", "axes[2].set_xlabel(r'$\\delta$ (ppm)')\n", "axes[2].set_ylabel('t (min)')\n", "fig.colorbar(p3,ax=axes[2])\n", "\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Peak Integration\n", "\n", "Our goal is to determine the concentration of the reactant and product as a function of time. In $^{19}$F NMR spectroscopy, the peak area $A$ is proportional to the concentration of the molecule \\[**X**\\] times the number of F atoms that give rise to the peak $N_F$:\n", "\n", "$$ A = \\chi N_F[\\textbf{X}] $$\n", "\n", "where $\\chi$ is the instrumental response factor (which is equal for each peak). We have 3 molecules: the internal standard **S** which has a known concentration of 0.11 M, the reactant **R**, and the product **P**. The signals from **R** and **P** cone from a trifluoromethyl group, while the signal from **S** comes from a single F atom. Therefore we can write:\n", "\n", "$$ A_\\textbf{S} = \\chi 0.11\\text{ M} $$\n", "$$ A_\\textbf{R} = \\chi 3[\\textbf{R}] $$\n", "$$ A_\\textbf{P} = \\chi 3[\\textbf{P}] $$\n", "\n", "Dividing equations cancels $\\chi$ and allows us to solve for the concentrations of **R** and **P**:\n", "\n", "$$ [\\textbf{R}] = \\frac{0.11\\text{ M}}{3}\\frac{A_\\textbf{R}}{A_\\textbf{S}}, \\quad [\\textbf{P}] = \\frac{0.11\\text{ M}}{3}\\frac{A_\\textbf{P}}{A_\\textbf{S}} $$\n", "\n", "To determine the concentrations, we need to calculate the peak areas. 
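(As a quick numerical check of the formula above, using made-up areas: if $A_\\textbf{R}/A_\\textbf{S} = 2.0$, then $[\\textbf{R}] = \\frac{0.11\\text{ M}}{3}\\times 2.0 \\approx 0.073$ M.)\n", "\n", "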
For a peak described by a 1D function $f(x)$, the peak area is given by its integral:\n", "\n", "$$ A = \\int_{-\\infty}^{\\infty} f(x)\\,\\text{d}x $$\n", "\n", "However, we do not have a continuous function $f(x)$; we have a discrete set of $N$ data points $f(x_1), f(x_2), \\ldots, f(x_N)$. Instead, we can approximate the integral as a [Riemann Sum](https://en.wikipedia.org/wiki/Riemann_sum), which is like adding up a set of rectangles located at each datapoint:\n", "\n", "$$ \\int_a^b f(x)\\,\\text{d}x \\approx \\sum_i f(x_i) \\Delta x_i $$\n", "\n", "where $a$ and $b$ are the endpoints of the integration, and $\\Delta x_i$ is $|x_i - x_{i+1}|$, the spacing between points $x_i$ and $x_{i+1}$. A discrete Riemann sum can be computed as either a left-sum or a right-sum, illustrated below as a function of the point spacing:\n", "\n", "$$ \\text{Left sum} = \\sum_{i=1}^{N-1} f(x_i) \\Delta x_i, \\quad \\text{Right sum} = \\sum_{i=1}^{N-1} f(x_{i+1}) \\Delta x_i$$\n", "\n", "| Left Sum | Right Sum |\n", "| --- | --- |\n", "| [![Left Sum Image 09glasgow09 CC-BY-SA-3.0](https://upload.wikimedia.org/wikipedia/commons/1/19/Riemann_sum_%28leftbox%29.gif)](https://commons.wikimedia.org/w/index.php?curid=7697902) | [![Right Sum Image 09glasgow09 CC-BY-SA-3.0](https://upload.wikimedia.org/wikipedia/commons/6/61/Riemann_sum_%28rightbox%29.gif)](https://commons.wikimedia.org/w/index.php?curid=7697920) |\n", "\n", "The smaller $\\Delta x_i$, the more accurate the integral, and in the limit that $\\Delta x_i \\to 0$, all three methods yield the same value, which is equal to the integral. As a special case, if all of the data points are evenly spaced, then $\\Delta x$ is a constant, and the integral is proportional to the sum of the data points (show here as the left sum):\n", "\n", "$$ \\int_a^b f(x)\\,\\text{d}x \\quad \\approx \\quad \\sum_{i=1}^{N-1} f(x_i) \\Delta x_i \\quad = \\quad \\Delta x\\sum_{i=1}^{N-1} f(x_i) \\quad \\propto \\quad \\sum_{i=1}^{N-1} f(x_i) $$\n", "\n", "Fortunately for us, our data are evenly spaced. And since we only care about the ratios of the areas, the factors of $\\Delta x$ will cancel when we calculate concentration anyways, so we can make use of the [`numpy.cumsum`](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html) function. This function takes an array as an argument and returns an array whose elements are the cumulative sum of the elements of the input. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.cumsum(np.array([1,2,3,4,5,6]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to estimate the area under the curve $f(x) = x$ from $x=1$ to $x=6$, with $\\Delta x = 1$ we can use `numpy.cumsum`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = np.array([1,2,3,4,5,6])\n", "ls = np.cumsum(fx[:-1]) #add all the points except the last\n", "rs = np.cumsum(fx[1:]) # add all the points except the first\n", "\n", "print(f'Left sum: {ls}, Integral = {ls[-1]}')\n", "print(f'Right sum: {rs}, Integral = {rs[-1]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exact value of the integral of course is 17.5. 
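(This follows from $\\int_1^6 x\\,\\text{d}x = \\frac{6^2 - 1^2}{2} = 17.5$.)\n", "\n", "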
As we make $\\Delta x$ smaller, the left and right sums both asymptotically approach the correct answer (as long as we multiply by $\\Delta x$):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = np.linspace(1,6,100)\n", "ls = np.cumsum(fx[:-1])*(fx[1]-fx[0]) #add all the points except the last\n", "rs = np.cumsum(fx[1:])*(fx[1]-fx[0]) # add all the points except the first\n", "\n", "print(f'Left sum: {ls}, Integral = {ls[-1]}')\n", "print(f'Right sum: {rs}, Integral = {rs[-1]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's apply this to one of our curves and visualize the result (note that the x data starts at -113.05 and ends at -113.25, so the data go from right to left). Note that the first and last point of each spectrum are 0, so to show the difference between left and right integration, we're slicing those points off of the ends." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "\n", "fig,(ax0,ax1) = plt.subplots(1,2,figsize=(12,4))\n", "ax0.plot(fx,fy)\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_xlabel(r'$\\delta$ (ppm)')\n", "ax0.set_title('Peak')\n", "\n", "ls = np.cumsum(fy[:-1])\n", "rs = np.cumsum(fy[1:])\n", "ax1.plot(fx[:-1],ls,color='#022851',label=f'Left Sum')\n", "ax1.plot(fx[1:],rs,color='#ffbf00',label=f'Right Sum')\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "ax1.set_title('Cumulative Sum')\n", "ax1.legend()\n", "\n", "print(f'Left sum = {ls[-1]:.1f}, Right sum = {rs[-1]:.1f}')\n", "print(f'Left integral = {ls[-1]*np.abs(fx[1]-fx[0]):.1f}, Right integral = {rs[-1]*np.abs(fx[1]-fx[0]):.1f}')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have so many data points that are spaced so closely together that the left and right sums agree within 0.1%; the differences are in the 6th digit of the integral. However, if we had fewer data points, or if the points were not evenly spaced, the differences might have been larger. In such cases, the [Trapozoidal Rule](https://en.wikipedia.org/wiki/Trapezoidal_rule) can be used. In practice, it gives a value in between the left sum integral and the right sum integral. When the points are evenly spaced, the trapezoid integration is exactly the average of the two.\n", "\n", "The [`scipy.integrate`](https://docs.scipy.org/doc/scipy/reference/integrate.html) module contains a number of functions for numerical integration. When using discrete samples instead of analytical functions, there are three common functions to use:\n", "- [`scipy.integrate.trapezoid`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.trapezoid.html) Integrates using the trapezoid rule and returns the value of the integral. This function was renamed in Scipy 1.6.0; its old name was `trapz`, and that name is used below.\n", "- [`scipy.integrate.cumulative_trapezoid`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.cumulative_trapezoid.html) Similar to `cumsum`, but contains the cumulative trapezoid rule integral instead of the cumulative sum. 
It was also renamed in Scipy 1.6.0; its old name was `cumtrapz`.\n", "- [`scipy.integrate.simpson`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simpson.html) Integrates using [Simpson's Rule](https://en.wikipedia.org/wiki/Simpson%27s_rule), which approximates the function as a quadratic polynomial between points instead of as a line as in the trapezoid rule." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.integrate as spi\n", "\n", "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "\n", "fig,(ax0,ax1) = plt.subplots(1,2,figsize=(12,4))\n", "ax0.plot(fx,fy)\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_xlabel(r'$\\delta$ (ppm)')\n", "ax0.set_title('Peak')\n", "#ax0.set_xlim(-113.17,-113.13)\n", "\n", "ls = np.cumsum(fy[:-1])*np.abs(fx[1]-fx[0])\n", "rs = np.cumsum(fy[1:])*np.abs(fx[1]-fx[0])\n", "ts = -spi.cumtrapz(fy,fx)\n", "ax1.plot(fx[:-1],ls,color='#022851',label=f'Left Sum Integral')\n", "ax1.plot(fx[1:],rs,color='#ffbf00',label=f'Right Sum Integral')\n", "ax1.plot(fx[:-1],ts,color='#c10230',label=f'Trapezoidal')\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "ax1.set_title('Integral')\n", "ax1.legend()\n", "\n", "print(f'Left =\\t {ls[-1]:.1f}\\nRight =\\t {rs[-1]:.1f}\\nTrapz =\\t {-spi.trapz(fy,fx):.1f}\\nSimps =\\t {-spi.simps(fy,fx):.1f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, the method we choose to integrate this peak makes little difference! However, there is one potential issue with numerical integration which becomes more apparent if we zoom in vertically." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "\n", "fig,(ax0,ax1) = plt.subplots(1,2,figsize=(12,4))\n", "ax0.plot(fx,fy)\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_xlabel(r'$\\delta$ (ppm)')\n", "ax0.set_title('Peak')\n", "ax0.set_ylim(-1e6,1e7)\n", "\n", "ls = np.cumsum(fy[:-1])*np.abs(fx[1]-fx[0])\n", "rs = np.cumsum(fy[1:])*np.abs(fx[1]-fx[0])\n", "ts = -spi.cumtrapz(fy,fx)\n", "ax1.plot(fx[:-1],ls,color='#022851',label=f'Left Sum Integral')\n", "ax1.plot(fx[1:],rs,color='#ffbf00',label=f'Right Sum Integral')\n", "ax1.plot(fx[:-1],ts,color='#c10230',label=f'Trapezoidal')\n", "ax1.set_ylim(2.9e6,3.1e6)\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "ax1.set_title('Integral')\n", "ax1.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a second peak at about -113.21 ppm that is also contributing to the value of the integral, probably arising from an impurity in the internal standard or one of the other reagents in the system. From the plot on the right, we can see that it contributes ~1% to the total integral, which might be a bit of a concern. It is also not such an easy task to eliminate the peak! We could change the integration limits, but because the interloper peak is sitting on the side of the peak of interest, we can't exclude it without also excluding the peak we are interested in. By eye, we can visualize what the spectrum would look like if that peak weren't there, so it would be nice to find a mathematical approach that reconstructs the spectrum without it."
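, "\n", "\n", "As a rough sketch of how large the interloper's contribution is (the cutoff at -113.19 ppm is an arbitrary choice made here for illustration), we could split the integral at a chemical shift between the two peaks:\n", "\n", "```python\n", "cut = fx > -113.19  # fx runs from about -113.05 down to -113.25, so this selects the main-peak side\n", "main = -spi.trapz(fy[cut], fx[cut])    # main peak region\n", "tail = -spi.trapz(fy[~cut], fx[~cut])  # region containing the small peak\n", "print(f'fraction of the area beyond the cutoff: {tail/(main + tail):.3%}')\n", "```\n", "\n", "This is only a crude estimate, since the wing of the main peak also extends past the cutoff; a cleaner separation requires a lineshape model, which is the subject of the next section."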
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lineshape Functions\n", "\n", "A **lineshape function** is a mathematical function that describes the shape of a peak in spectrum. In spectroscopy (like NMR), we know from quantum mechanics that the frequency of light needed to cause a spectroscopic transition must equal the difference in energy between quantum states. However, there are some physical mechanisms that result in line broadening. In NMR, the raw signal is a **Free Induction Decay**, which is a cosine-like wave that exponentally decays over time. That exponential decay causes the spectral line in an NMR spectrum to take on a form very close to a **Lorentzian** lineshape:\n", "\n", "$$ f(\\delta) = \\frac{A}{\\pi}\\frac{w}{(\\delta - \\delta_0)^2 + w^2} $$\n", "\n", "where:\n", "- $A$ is the peak area\n", "- $w$ is the Half Width at Half Maximum (HWHM)\n", "- $\\delta_0$ is the center of the peak\n", "\n", "The Lorentzian function looks like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def lor(x,A,w,x0):\n", " #print(f\"Hello from inside function: A = {A:.4e} w = {w:.4f} x0 = {x0:.4f}\")\n", " return A/np.pi*w/((x-x0)**2. + w**2.)\n", "\n", "A = 1.\n", "w = 0.6\n", "x0 = 5.0\n", "xx = np.linspace(0,10,1000)\n", "lory = lor(xx,A,w,x0)\n", "\n", "\n", "fig,ax = plt.subplots(dpi=200)\n", "ax.plot(xx,lory,'k-')\n", "\n", "halfmax = lor(x0+w,A,w,x0)\n", "ax.plot([x0,x0+w],[halfmax,halfmax],'k:')\n", "ax.annotate('$w$',xy=(x0+w/2,halfmax-0.1*halfmax),ha='center')\n", "ax.plot([x0,x0],[0,2.2*halfmax],'k:')\n", "ax.annotate(r'$\\delta_0$',xy=(x0+0.2,2.1*halfmax),ha='left')\n", "ax.annotate(r'area = $A$',xy=(x0-w/2,0.5*halfmax),xytext=(0.1,0.5),textcoords='axes fraction',arrowprops={'arrowstyle':'->'})\n", "ax.annotate(r'$f(\\delta_0 + w) = \\frac{1}{2}f(\\delta_0)$',xy=(x0+w,halfmax),xytext=(0.9,0.6),textcoords='axes fraction',arrowprops={'arrowstyle':'->'},ha='right')\n", "yl = ax.get_ylim()\n", "ax.set_ylim(0,yl[1])\n", "\n", "legtext = 'Lorentzian lineshape\\n'\n", "legtext += r'$f(\\delta) = \\frac{A}{\\pi}\\frac{w}{(\\delta-\\delta_0)^2 + w^2}$'\n", "legtext += f'\\n$A$ = {A:.1f}\\n$\\\\delta_0$ = {x0:.1f}\\n$w$ = {w:.1f}'\n", "ax.annotate(legtext,xy=(0.05,0.95),xycoords='axes fraction',va='top')\n", "\n", "ax.set_xlabel(r'$\\delta$')\n", "ax.set_ylabel(r'$f(\\delta)$')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the Lorentzian lineshape function to model the data. If we take the integrals that we calculated above and use that as the area, and guess at values for the HWHM and peak center, we can get reasonable agreement by just plugging in good guesses." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "xx = np.linspace(fx[0],fx[-1],1000)\n", "\n", "fig,ax = plt.subplots()\n", "ax.scatter(fx,fy,color='black')\n", "ax.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax.set_xlabel(r'$\\delta$ (ppm)')\n", "ax.plot(xx,lor(xx,3.0e6,.005,-113.15),color='#ffbf00')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To quantify the \"goodness of fit\", we can take a look at the **residuals** between the model and the real data. 
For a data set consisting of points $(x_i,y_i)$ modeled by a function $f(x)$, the $i$th residual $r_i$ is defined as:\n", "\n", "$$ r_i = y_i - f(x_i) $$\n", "\n", "In a \"good\" fit, the residuals are scattered around 0 with no systematic trends. We can add a residual plot to the graph to visualize them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "xx = np.linspace(fx[0],fx[-1],1000)\n", "\n", "fig,(ax0,ax1) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "ax1.scatter(fx,fy,color='black')\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "ax1.plot(xx,lor(xx,3.0e6,.005,-113.15),color='#ffbf00')\n", "\n", "ax0.scatter(fx,fy-lor(fx,3.0e6,.005,-113.15),color='black')\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_ylabel('Residuals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, our model can be improved. The way to do this rigorously is by varying the model parameters $A$, $w$, and $\\delta_0$ and minimizing the sum of the squares of the residuals:\n", "\n", "$$ SS = \\sum_i r_i^2 $$\n", "\n", "This process is called **least squares optimization**, and as discussed in this week's introduction, our model is nonlinear in the parameters, so the optimization has no closed-form solution and must be carried out iteratively. Fortunately, the [`scipy.optimize`](https://docs.scipy.org/doc/scipy/reference/optimize.html) module contains a wide variety of functions that are designed to minimize, maximize, or find zeros of various functions or datasets. For the case of data that depend only on one independent variable, the [`scipy.optimize.curve_fit`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) function is especially convenient. We provide the function, the x and y data points, and initial guesses for the fit parameters, and then scipy carries out the nonlinear least squares optimization for us.\n", "\n", "Take a look at the function documentation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import scipy.optimize as opt\n", "\n", "opt.curve_fit?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the list of arguments, `f` is the model function (in our case `lor`). That function needs to be in a particular form: its first argument must be the independent variable, and the remaining arguments are the parameters that will be optimized. Our `lor` function already satisfies this requirement. `xdata` and `ydata` are the data to fit, and `p0` is a list of initial guesses for the parameters. The size of `p0` must be the same as the number of parameters in the model function (i.e., one less than the total number of arguments). The parameters `sigma` and `absolute_sigma` can be used if the data points have error bars already associated with them. The `bounds` parameter can be used to place restrictions on the values of the parameters during the optimization by providing a pair of sequences: the first holds the minimum allowed value for each parameter, and the second holds the maximum. For instance, we know that the peak area must be a positive number, so we could bound it between `0` and `np.inf`. If `bounds` is not set, then the parameters are allowed to take on any value, and sometimes the fit can go crazy (especially when the initial guesses are poor). 
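As a concrete sketch (the numeric limits are arbitrary, chosen only to illustrate the format), bounds for our three-parameter `lor` model could look like `bounds=([0, 0, -114], [np.inf, 1, -112])`: one sequence of lower limits and one of upper limits, in the same order as the parameters. 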
`method` can be used to select a specific optimization algorithm (the default is usually fine), and `jac` allows you to pass a function that computes the Jacobian matrix if desired. Passing the Jacobian can speed up the calculation and make it slightly more accurate, but otherwise, the algorithm itself will approximate the Jacobian using finite differences, so it is usually not required.\n", "\n", "The function returns 2 arrays (conventionally named `popt` and `pcov`, respectively) which contain the optimized parameter values and the covariance matrix, which is related to the uncertainties of the parameters. We'll discuss this more in a bit.\n", "\n", "Now we'll use `curve_fit` to model our data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "\n", "popt,pcov = opt.curve_fit(lor,fx,fy,p0=[3e6,.001,-113.15])\n", "\n", "print(popt)\n", "print(pcov)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the optimized parameters, we can now plot the best fit model and residuals:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fx = S[0]['delta'].to_numpy()[1:-1]\n", "fy = S[0]['y'].to_numpy()[1:-1]\n", "\n", "popt,pcov = opt.curve_fit(lor,fx,fy,p0=[3e6,.001,-113.15])\n", "\n", "xx = np.linspace(fx[0],fx[-1],1000)\n", "\n", "fig,(ax0,ax1) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "ax1.scatter(fx,fy,color='black')\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "\n", "#note: we can use python's argument expansion to pass popt directly to the lor function\n", "ax1.plot(xx,lor(xx,*popt),color='#ffbf00')\n", "\n", "ax0.scatter(fx,fy-lor(fx,*popt),color='black')\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_ylabel('Residuals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By eye, the model looks pretty good, though there is some extra structure in the residuals. This likely comes from 3 sources:\n", "1. The actual lineshape of the data is probably described better by the [Voigt profile](https://en.wikipedia.org/wiki/Voigt_profile) than by a Lorentzian.\n", "2. For technical reasons, the NMR spectrum is not perfectly symmetric. This has to do with how the raw signals from the NMR are processed to convert them into spectra.\n", "3. There is at least 1 impurity peak at -113.21 ppm, and there might be a second one at -113.17 ppm.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax1.set_ylim(-1e6,1e7)\n", "fig.patch.set_facecolor('white') #workaround for dark jupyterlab theme\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One noticeable deviation is that the \"wings\" of the model are a bit too wide; this could be improved by using a more sophisticated lineshape model like the Voigt function mentioned before. For simplicity, we will not do that here. However, we can incorporate the extra peak by adding a second Lorentzian to the model function. This time we'll use the bounds parameter as an example. Without it, there is a chance that the model will decide that the second peak has both a negative area and width. While that may seem like a problem, note that `lor(x,A,w,x0) = lor(x,-A,-w,x0)`, so there is no real difference. Still, it's nice for the numbers to both be positive."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def lor2(x,A1,w1,x01,A2,w2,x02):\n", " return lor(x,A1,w1,x01) + lor(x,A2,w2,x02)\n", "\n", "#bounds is a 2-tuple of arrays: first array contains min values and second contains max values\n", "bounds = ([0,.001,-113.25,0,.001,-113.25],[np.inf,.01,-113.05,np.inf,.01,-113.05])\n", "\n", "popt,pcov = opt.curve_fit(lor2,fx,fy,p0=[3e6,.004,-113.15,3e4,.004,-113.21],bounds=bounds)\n", "\n", "xx = np.linspace(fx[0],fx[-1],1000)\n", "\n", "fig,(ax0,ax1) = plt.subplots(2,1,figsize=(6,6),gridspec_kw={'height_ratios': [1,3]})\n", "ax1.scatter(fx,fy,color='black')\n", "ax1.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax1.set_xlabel(r'$\\delta$ (ppm)')\n", "ax1.set_ylim(-1e6,1e7)\n", "#note: we can use python's argument expansion to pass popt directly to the lor function\n", "ax1.plot(xx,lor2(xx,*popt),color='#ffbf00')\n", "\n", "ax0.scatter(fx,fy-lor2(fx,*popt),color='black')\n", "ax0.xaxis.set_major_locator(plt.MaxNLocator(5))\n", "ax0.set_ylabel('Residuals')\n", "\n", "print(popt)\n", "print(pcov)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `pcov` array is the [**covariance matrix**](https://en.wikipedia.org/wiki/Covariance_matrix), and it can be used to obtain estimates of the uncertainties in the model parameters. The diagonal elements of the covariance matrix give the variances of the model parameters, and the off-diagonal elements are related to the correlation between the parameters (see the linked article for the relationship between the covariance matrix and the correlation matrix). We can obtain the 1$\\sigma$ uncertainties on the parameters by taking the square root of the diagonal elements. In other words, this gives a measure of how much each parameter would need to change in order to double the standard deviation of the fit residuals, which is a good measure of how reliable the parameters are." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "psigma = np.sqrt(np.diag(pcov))\n", "\n", "print(f'A = {popt[0]:.1f} +/- {psigma[0]:.1f}')\n", "print(f'w = {popt[1]:.7f} +/- {psigma[1]:.7f}')\n", "print(f'x0 = {popt[2]:.7f} +/- {psigma[2]:.7f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After this analysis, we can see that the difference between the peak areas of the large peak with and without the extra small peak are virtually identical. But we also see that the estimate of the peak area is a few percent larger than our estimate from direct integration. It would be desirable to improve the lineshape model for the best quantitative analysis, but for now we will proceed by using both integration and curve fitting for the rest of our analysis.\n", "\n", "## Kinetic Analysis\n", "\n", "To do the kinetic analysis, we need to extract the areas for all of the peaks as a function of time." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def integrate_peak(x,y,method=spi.trapz):\n", " return np.abs(method(y,x)) #return absolute value since delta x is negative\n", "\n", "int_list = []\n", "\n", "for (t,r,p,s) in zip(time,R,P,S):\n", " il = [t]\n", " il.append(integrate_peak(r['delta'],r['y']))\n", " il.append(integrate_peak(p['delta'],p['y']))\n", " il.append(integrate_peak(s['delta'],s['y']))\n", " int_list.append(il)\n", " \n", "int_df = pd.DataFrame(int_list,columns=['t','R','P','S'])\n", "int_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have the peak integrals as a function of time, and we can calculate the concentrations of R and P using the equations above. We'll also calculate \\[R\\] + \\[P\\], which should remain constant over time. Then, make a scatter plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "int_df['[R]'] = int_df['R']/3./int_df['S']*0.11\n", "int_df['[P]'] = int_df['P']/3./int_df['S']*0.11\n", "int_df['[R+P]'] = int_df['[R]']+int_df['[P]']\n", "\n", "fig,ax = plt.subplots()\n", "ax.scatter(int_df['t'],int_df['[R]'],label='[R]')\n", "ax.scatter(int_df['t'],int_df['[P]'],label='[P]')\n", "ax.scatter(int_df['t'],int_df['[R+P]'],label='[R]+[P]')\n", "\n", "ax.set_xlabel('t (min)')\n", "ax.set_ylabel('Concentration (M)')\n", "ax.legend()\n", "int_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can try to determine the reaction order in **R** by regression analysis of \\[R\\] vs time, ln \\[R\\] vs time, and 1/\\[R\\] vs time using `pingouin`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pingouin as pg\n", "\n", "fig,(ax1,ax2,ax3) = plt.subplots(1,3,figsize=(18,5))\n", "\n", "xx = np.linspace(0,np.max(int_df['t']),1000)\n", "\n", "ax1.scatter(int_df['t'],int_df['[R]'])\n", "ax1.set_xlabel('t (min)')\n", "ax1.set_ylabel('[R] (M)')\n", "\n", "lr = pg.linear_regression(int_df['t'],int_df['[R]'])\n", "ax1.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax1.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')\n", "\n", "\n", "ax2.scatter(int_df['t'],np.log(int_df['[R]']))\n", "ax2.set_xlabel('t (min)')\n", "ax2.set_ylabel('ln [R]')\n", "lr = pg.linear_regression(int_df['t'],np.log(int_df['[R]']))\n", "ax2.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax2.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')\n", "\n", "ax3.scatter(int_df['t'],1./int_df['[R]'])\n", "ax3.set_xlabel('t (min)')\n", "ax3.set_ylabel('1/[R] (M$^{-1}$)')\n", "lr = pg.linear_regression(int_df['t'],1./int_df['[R]'])\n", "ax3.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax3.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a tough call, but the linear correlation is best in the third plot, which suggests the reaction is second-order in the reactant. We can also perform a similar analysis with curve fitting, and it has the added bonus that we have a measure of the uncertainty of each peak area so we can calculate errors on the concentrations using [propagation of uncertainty](https://en.wikipedia.org/wiki/Propagation_of_uncertainty). 
Given peak areas $R$, $P$, and $S$ and their uncertainties $dR$, $dP$, and $dS$, the uncertainties on the concentrations are:\n", "\n", "$$ d[\\textbf{R}] = [\\textbf{R}]\\sqrt{ \\left(\\frac{dR}{R}\\right)^2 + \\left(\\frac{dS}{S}\\right)^2 + \\left(\\frac{d[\\textbf{S}]}{[\\textbf{S}]}\\right)^2} $$\n", "\n", "$$ d[\\textbf{P}] = [\\textbf{P}]\\sqrt{ \\left(\\frac{dP}{P}\\right)^2 + \\left(\\frac{dS}{S}\\right)^2 + \\left(\\frac{d[\\textbf{S}]}{[\\textbf{S}]}\\right)^2} $$\n", "\n", "$$ d([\\textbf{R}] + [\\textbf{P}]) = \\sqrt{d[\\textbf{R}]^2 + d[\\textbf{P}]^2} $$\n", "\n", "From our initial analysis of $S$ and $dS$, we know that the ratio $dS/S$ is about 0.0001. We would expect similar uncertainties for all of the integrals. What remains is the last term: the uncertainty of the concentration of the internal standard. We have no information to assess this quantity, as it depends on how the sample was prepared. It is probably close to 1%, however, which would make it the leading source of uncertainty in the analysis. However, for now we will just neglect that term and assume that the internal standard concentration is known exactly (i.e., $d[\\textbf{S}] = 0$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fit_peak(x,y,w_guess):\n", " popt,pcov = opt.curve_fit(lor,x,y,p0=[0.5*w_guess*np.max(y),w_guess,x[np.argmax(y)]])\n", " \n", " return popt[0],np.sqrt(pcov[0][0])\n", "\n", "int_list = []\n", "\n", "for (t,r,p,s) in zip(time,R,P,S):\n", " il = [t]\n", " A,dA = fit_peak(r['delta'],r['y'],.004)\n", " il.append(A)\n", " il.append(dA)\n", " A,dA = fit_peak(p['delta'],p['y'],.004)\n", " il.append(A)\n", " il.append(dA)\n", " A,dA = fit_peak(s['delta'],s['y'],.004)\n", " il.append(A)\n", " il.append(dA)\n", " int_list.append(il)\n", " \n", "int_df = pd.DataFrame(int_list,columns=['t','R','dR','P','dP','S','dS'])\n", "int_df['[R]'] = int_df['R']/3/int_df['S']*0.11\n", "int_df['d[R]'] = int_df['[R]']*np.sqrt( (int_df['dR']/int_df['R'])**2 + (int_df['dS']/int_df['S'])**2)\n", "int_df['[P]'] = int_df['P']/3/int_df['S']*0.11\n", "int_df['d[P]'] = int_df['[P]']*np.sqrt( (int_df['dP']/int_df['P'])**2 + (int_df['dS']/int_df['S'])**2)\n", "int_df['[R+P]'] = int_df['[R]']+int_df['[P]']\n", "int_df['d[R+P]'] = np.sqrt(int_df['d[R]']**2 + int_df['d[P]']**2)\n", "\n", "fig,ax = plt.subplots()\n", "ax.errorbar(int_df['t'],int_df['[R]'],yerr=int_df['d[R]'],fmt='o',label='[R]')\n", "ax.errorbar(int_df['t'],int_df['[P]'],yerr=int_df['d[P]'],fmt='o',label='[P]')\n", "ax.errorbar(int_df['t'],int_df['[R+P]'],yerr=int_df['d[R+P]'],fmt='o',label='[R]+[P]')\n", "\n", "ax.set_xlabel('t (min)')\n", "ax.set_ylabel('Concentration (M)')\n", "ax.legend()\n", "\n", "int_df\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though we plotted the error bars, they're smaller than the sizes of the markers!" 
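, "\n", "\n", "If we want to put a number on just how small they are, one quick check (a sketch using the columns computed above) is the typical relative uncertainty:\n", "\n", "```python\n", "rel = int_df['d[R]'] / int_df['[R]']\n", "print(f'median relative uncertainty in [R]: {rel.median():.3%}')\n", "```"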
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig,(ax1,ax2,ax3) = plt.subplots(1,3,figsize=(18,5))\n", "\n", "xx = np.linspace(0,np.max(int_df['t']),1000)\n", "\n", "ax1.scatter(int_df['t'],int_df['[R]'])\n", "ax1.set_xlabel('t (min)')\n", "ax1.set_ylabel('[R] (M)')\n", "\n", "lr = pg.linear_regression(int_df['t'],int_df['[R]'])\n", "ax1.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax1.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')\n", "\n", "\n", "ax2.scatter(int_df['t'],np.log(int_df['[R]']))\n", "ax2.set_xlabel('t (min)')\n", "ax2.set_ylabel('ln [R]')\n", "lr = pg.linear_regression(int_df['t'],np.log(int_df['[R]']))\n", "ax2.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax2.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')\n", "\n", "ax3.scatter(int_df['t'],1./int_df['[R]'])\n", "ax3.set_xlabel('t (min)')\n", "ax3.set_ylabel('1/[R] (M$^{-1}$)')\n", "lr = pg.linear_regression(int_df['t'],1./int_df['[R]'])\n", "ax3.plot(xx,lr['coef'][0]+lr['coef'][1]*xx,'k-')\n", "ax3.annotate(f'R$^2$ = {lr[\"r2\"][0]:.4f}',xy=(0.5,0.9),xycoords='axes fraction')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With curve fitting, we obtain very similar results as we did with direct integration. Notice how quick this analysis is! We analyzed several hours of experimental data in seconds, once the code was written. This type of NMR experiment would be repeated several times with different reactant concentrations and different catalysts, so having a processing pipeline that can analyze the experimental results quickly is very helpful. The tables and figures produced here can do directly into reports and publications, the Jupyter notebook clearly shows how the analysis was done along the way. It also ensures that the data are processed reproducibly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 4 }