{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 0. Introduction to fitting spectral data\n", "These tutorial notebooks will give a basic introduction to peak fitting of XRD scattering spectra using the the xrdfit package.\n", "\n", "Some example spectra can be found in the `example_data` folder in the root folder of the `xrd_fit` package and will be used here to demonstrate the various functions of the scripts. To use these notebooks you should proceed in order, executing each cell at a time, since some cells depend on variables defined in previous cells.\n", "\n", "## 0.1 Required packages\n", "\n", "If you installed `xrdfit` using pip all of these packages should already be installed.\n", "\n", "`xrdfit` uses the `lmfit` package to do the fitting: https://lmfit.github.io/. It also uses `Pandas` for loading data, `Numpy` for data processing, `matplotlib` for plotting, `dill` for saving fits to disk and `tqdm` for progress bars.\n", "\n", "## 0.2. Data format\n", "\n", "These scripts read data from tab separated text files. The format of the file is the standard format output by DAWN (https://dawnsci.org/about/). DAWN is a data analysis and processing program commonly used for processing synchrotron data. \n", "\n", "The data file does not have any column headings or title line. The first column of the data file is the two theta diffraction angle. Each subsequent column corresponds to a cake of intensity data from the diffraction pattern.\n", "\n", "In this analysis we specify that a high-energy SXRD (synchrotron x-ray diffraction) experiment in transmission setup produces a circular **diffraction pattern**\n", "$$I = f(\\chi, 2\\theta)$$\n", "where $I$ is beam intensity, $\\chi$ is the azimuthal angle and $2\\theta$ is the scattering angle. The intensity is measured at a discreet number of azimuthal angles which results in the pattern being made up of a number of sectors which we call **cakes** (since the sectors look like a slice of cake).\n", "\n", "\"Illustration\n", " \n", "By convention the **cakes** in the input data files are ordered clockwise, but no information is provided about the location of the first cake. This means the angle of the first cake must be provided as part of the analysis.\n", "\n", "\n", "## 0.3. Peak model fit\n", "\n", "The model fit used in `xrdfit` is the Pseudo-Voigt (PV) model. This is a combination of a Gaussian and Lorentzian distribution function which is commonly used for the fitting of spectroscopic peaks since it is reliably robust to peak broadening. `xrdfit` uses the PV model from `lmfit` (https://lmfit.github.io/lmfit-py/builtin_models.html#pseudovoigtmodel). The model parameters are as follows:\n", "\n", "* sigma $(\\sigma)$ - This controls the width of the fitted peak\n", "* fraction $(\\alpha)$ - The ratio of the Gaussian and Lorentzian components. \n", "* center $(\\mu)$ - The center of the fitted peak\n", "* amplitude ($A$) - The amplitude of the Gaussian and Lorentzian components\n", "* background - In addition to the PV parameters there is a constant background parameter to model the background detector noise.\n", "\n", "There are three parameters which are derived from combinations of the above parameters:\n", "* Height - The y-value at the maximum of the fitted peak\n", "* FWHM - The full width at half maximum of the fitted peak\n", "* SNR - Signal to noise ratio - the peak height divided by the background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Analysing a single diffraction pattern\n", "## 1.1 Loading and plotting a spectrum\n", "\n", "In this and following notebooks we use some sample SXRD data from the `example_data` folder in the root of the `xrdfit` package to demonstrate the fitting functions. \n", "\n", "As the first in a script, we import the required fitting functions from the `spectrum_fitting` module of the `xrdfit` package." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# Set the matplotlib backend and make the plots a bit bigger\n", "%matplotlib inline\n", "import matplotlib\n", "matplotlib.rcParams['figure.figsize'] = [8, 6]\n", "\n", "from xrdfit.spectrum_fitting import PeakParams, FitSpectrum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By convention the first cake in the data file is at the \"East\" position, 90° clockwise from vertical. However, depending on how your data was captured, this may be different for your dataset and so when loading data you must provide the angle of the first cake as a parameter. This is given in degrees clockwise from vertical. The example dataset begins at 90°." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "first_cake_angle = 90" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **spectrum** is a scan of intensity at a set azimuthal angle (cake) across a range of two theta angles.\n", "A **spectrum** can have one or more **peaks** in intensity which are a group of one or more **maxima**. A singlet peak has one maximum, a doublet peak has two maxima etc. In these tutorials these terms are used very specifically, so the distinction is important to understand how the fits work and what parameters to provide.\n", "\n", "We start by analysing a single file which contains a single **diffraction pattern**.\n", "\n", "First we define the path to the file to be analysed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_path = '../example_data/adc_041_7Nb_NDload_700C_15mms_00001.dat'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To fit the peaks in a spectrum, the diffraction pattern must first be loaded into a `FitSpectrum` object. The data for the whole diffraction pattern is stored in the instance of the object. In this case we name our instance *spectral_data*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data = FitSpectrum(file_path, first_cake_angle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A confirmation message is shown when the data is loaded.\n", "\n", "We can then use the `plot_polar` method to plot the entire diffraction pattern:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "spectral_data.plot_polar()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The diffraction pattern is plotted with the cake number specified next to each segment. In the example data file we have used there are 36 cakes. The number of cakes is dependent on the input dataset (which is affected by how the user captured the data). The number of cakes in your dataset is determined automatically from the number of columns in the input dataset. The plot_polar command automatically adjusts to show the correct number of cakes and cake numbers according to the data file and first cake angle specified.\n", "\n", "\n", "In order to plot a spectrum of intensity against $2\\theta$ at a particular azimuthal angle, use the `plot` method of a `FitSpectrum` object, specifying which cake to plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.plot(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are using the interactive matplotlib backend then it should be possible to zoom and pan the graph directly.\n", "If you are using a non-interactive backend then you can specify the x-axis limits by passing these as a Tuple to the `plot` \n", "method. To show the raw data points, use the *show_points* parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.plot(1, x_range=(2.7, 3), show_points=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 Fitting a single peak from a single cake\n", "\n", "First we consider fitting a **spectrum** from a single cake. We will cover combining data from multiple cakes later.\n", "\n", "To begin a fit, we need to define the **peak bounds** and the **maxima name**. The **peak bounds** specify where the peak is in the spectrum and the **maxima name** is a unique identifier used to refer to the maxima being fitted. All of the data between the upper and lower peak bound is used in the fit. We provide this data by initialising a `PeakParams` object. Generally speaking, is a good idea to include at least a few data points either side of the peak so the fit can find the baseline. \n", "\n", "Lets begin by fitting the above peak in the range 2.75 to 2.95." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_params = PeakParams((2.75, 2.95), '(10-10)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the peak is only a singlet peak we do not need to provide any additional parameters other than the **peak bounds**. The fitting is able to automatically fit the maximum in a singlet peak without much help.\n", "\n", "Now we have specified where the peak is, the method `fit_peaks` does the fitting. We use the peak params we specified above and the number of the cake we want to fit (in this case cake 1) as parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.fit_peaks(peak_params, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results of the fit are stored in a `PeakFit` object which is stored in `FitSpectrum` object (this instance of FitSpectrum is called *spectral_data*). The fit parameters can be viewed by using the values parameter of the `PeakFit` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.fitted_peaks[0].result.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that internally in lmfit, the parameters are labelled using the syntax, `maximum_X_Y` where `X` is the zero based numerical index of the peak and `Y` is the paramter type.\n", "\n", "To get a peak by name we can use the `get_fit` method of the `FitSpectrum` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.get_fit(\"(10-10)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returns a `PeakFit` object. This object has a number of useful methods.\n", "\n", "`PeakFit.result` is an lmfit `ModelResult` object. This gives detailed results of the fit - further documentation can be found at: https://lmfit.github.io/lmfit-py/model.html#the-modelresult-class" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.get_fit(\"(10-10)\").result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get at the parameters of the fit we can use `ModelResult.values`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.get_fit(\"(10-10)\").result.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*values* is just a python dictionary so can be subscripted to get a particular parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.get_fit(\"(10-10)\").result.values['maximum_0_center']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want a plot of the fit the `PeakFit` object has a `plot` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.get_fit(\"(10-10)\").plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes on Choosing Peak Bounds for single peaks\n", "The peak bounds you choose to select a peak are used to tell the fitting code the data range to fit over. The peak bounds are also used to set the initial values of the parameters for the fit. For optimal fitting speed, try to set the peak bounds such that the range of the peak bounds is approximately 8 times the full width at half maximum (FWHM) of the peak bounds. Don't worry about this too much, since the fit result should not be sensitive to small changes in peak bounds and the fit will work if the peak bounds are set between 3 to 20 times the FWHM.\n", "\n", "In this case the FWHM for this peak is 0.025 which means that peak bounds with range between 0.075 and 0.5 should work. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.fit_peaks(PeakParams((2.82, 2.90), '3 times FWHM'), 1)\n", "spectral_data.get_fit(\"3 times FWHM\").plot()\n", "spectral_data.fit_peaks(PeakParams((2.75, 2.95), '8 times FWHM'), 1)\n", "spectral_data.get_fit(\"8 times FWHM\").plot()\n", "spectral_data.fit_peaks(PeakParams((2.55, 3.05), '20 times FWHM'), 1)\n", "spectral_data.get_fit(\"20 times FWHM\").plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.3 Fitting multiple peaks simultaneously" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can be more convenient to specify multiple peaks and fit them all at once. Here we try to fit the first 4 peaks in the spectrum at once. First we get a good zoomed in view to see where the peaks are. As before, the arguments here are *cake_number* and *x_range*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.plot(1, (2.7, 4.3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By providing a list of `PeakParams`, we can fit multiple peaks simultaneously. Once we start specifying more than one set of peak params it can be a little confusing to set the peak bounds. To make sure that we have set the peak params correctly we can use the `plot_peak_params` method of the `FitSpectrum` object to summarise the PeakParams we have set. As with any plotting function in `xrdfit`, we can use the *show_points* parameter to see the raw data points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_params = [PeakParams((2.75, 2.95), '1'),\n", " PeakParams((3.02, 3.15), '2'),\n", " PeakParams((3.15, 3.35), '3'),\n", " PeakParams((4.13, 4.30), '4')]\n", "\n", "spectral_data.plot_peak_params(peak_params, 1, x_range=(2.7, 4.3), show_points=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The grey hashed region indicates the peak bounds of the `PeakParams`. The green dashed line indicates the start of the peak bounds and the red dashed line indicates the end of the peak bounds. The code is looking for a peak in this area. The name of the peak as specified in the PeakParams is shown above each shaded area.\n", "\n", "Now we have specified the `PeakParams`, they are passed to the `fit_peaks` method as before to do the fitting.\n", "\n", "Each fitted peak is stored as a `PeakFit` object in the `FitSpectrum`. To plot the fits to multiple peaks we iterate over the list of `PeakFit`s, calling the plot method of each." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.fit_peaks(peak_params, 1)\n", "\n", "print(spectral_data.fitted_peaks)\n", "\n", "for fit in spectral_data.fitted_peaks:\n", " fit.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may notice that the third peak is fitted poorly since it is really two peaks close together. It would not be possible to fit the two peaks separately as two singlet peaks since they are too close together. We call cases like these **multiplet** peaks, a single peak with multiple **maxima**. In order to fit multiplet peaks we must provide the fitting with a little more information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.4 Fitting peaks with multiple maxima\n", "\n", "First lets zoom in to get a better view of the doublet peak" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.plot(1, (3.15, 3.32), show_points=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To tell the fitting algorithm to fit two **maxima** in a single peak, we need to specify where one peak starts and the other ends. This means providing the `PeakParams` object with information about the position of the maximum for each peak - the **maxima bounds**.\n", "\n", "We need to provide a min and max bound for each peak. The min and max values bound a small region where the peak maximum can be found. While the **peak bounds** can be reasonably wide to include some baseline points, the **maxima bounds** can be specified quite tightly since they are just used to distinguish the maxima centers within the peak.\n", "\n", "The **maxima bounds** must be provided as a list of Tuples, one for each maxima in the peak. Each tuple consists of a max and min bound. The maxima bounds should be provided from left to right.\n", "\n", "In addition, we must also provide one name for each maxima. Previously it was possible to provide a single string for a single maxima. In the case of a peak with more than one maxima the **maxima names** must be provided as a list of strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "maxima_bounds = [(3.19, 3.22), (3.24, 3.26)]\n", "\n", "peak_params = PeakParams((3.14, 3.32), ['2', '3'], maxima_bounds)\n", "\n", "spectral_data.plot_peak_params(peak_params, 1, (3.0, 3.5), show_points=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we have plotted the *peak_params* provided for the double peak. As before, the grey shaded area shows the **peak bounds** but this time we can see that the green and red dashed lines show the **maxima bounds**.\n", "\n", "In the single peak example in section 1.3 above, the green and red lines were coincident with the edges of the shaded region. This is because for a singlet peak the **maxima bounds** are set to the **peak bounds** by default.\n", "\n", "We can now do the fit as before using the `fit_peaks` method.\n", "\n", "To plot a multiplet peak you can specify the name of either of the maxima - in this case ``spectral_data.get_fit('2').plot()`` and ``spectral_data.get_fit('3').plot()`` would plot the same graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spectral_data.fit_peaks(peak_params, 1)\n", "spectral_data.get_fit('2 3').plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have set the `PeakParams`, it may be useful to print them out to check exactly what peak params have been set or to output them as part of a report. Printing the `PeakParams` object gives the string representation used to create it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(peak_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes on Choosing Maxima Bounds for multiplet peaks\n", "\n", "While we can be fairly imprecise when specifying the peak bounds, the maxima bounds must be specified a little more carefully. You should make sure that the highest data point in the maximum is captured between the min and max bounds and ideally the maxima bounds should be specified at the FWHM of the peak.\n", "\n", "The maxima bounds are used to determine the starting parameters for the fitting function and so setting them well can significantly affect the quality of the fit and the fitting speed.\n", "\n", "Here we show the effects of setting the maxima bounds in different places:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_params_1 = PeakParams((3.14, 3.32), [\"Good\", \"bounds\"], [(3.19, 3.22), (3.24, 3.26)])\n", "peak_params_2 = PeakParams((3.14, 3.32), [\"Wide\", \"bounds\"],[(3.14, 3.25), (3.2, 3.32)])\n", "peak_params_3 = PeakParams((3.14, 3.32), [\"Narrow\", \"bounds\"], [(3.205, 3.21), (3.25, 3.255)])\n", "\n", "spectral_data.plot_peak_params(peak_params_1, 1, (3.0, 3.5), show_points=True, label_angle=45)\n", "spectral_data.plot_peak_params(peak_params_2, 1, (3.0, 3.5), show_points=True, label_angle=45)\n", "spectral_data.plot_peak_params(peak_params_3, 1, (3.0, 3.5), show_points=True, label_angle=45)\n", "\n", "import time\n", "\n", "# Time the good bounds fit\n", "start_1 = time.perf_counter()\n", "spectral_data.fit_peaks(peak_params_1, 1)\n", "stop_1 = time.perf_counter()\n", "spectral_data.get_fit(\"Good\").plot(label_angle=45)\n", "\n", "# Time the wide bounds fit\n", "start_2 = time.perf_counter()\n", "spectral_data.fit_peaks(peak_params_2, 1)\n", "stop_2 = time.perf_counter()\n", "spectral_data.get_fit(\"Wide\").plot()\n", "\n", "# Time the narrow bounds fit\n", "start_3 = time.perf_counter()\n", "spectral_data.fit_peaks(peak_params_3, 1)\n", "stop_3 = time.perf_counter()\n", "spectral_data.get_fit(\"Narrow\").plot()\n", "\n", "\n", "print(f\"Good bounds fit took {stop_1 - start_1:05f} Seconds\")\n", "print(f\"Wide bounds fit took {stop_2 - start_2:05f} Seconds\")\n", "print(f\"Narrow bounds fit took {stop_3 - start_3:05f} Seconds\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case the fit has converged well for all three sets of maxima bounds although the good bounds converge two or three times as fast.\n", "\n", "While double the speed might not seem that significant, the difference can be much greater when fitting higher number of maxima (triplets, quartets etc.) or if the maxima are less distinct. If you are fitting many peaks over hundreds or thousands of data sets this can make a big difference.\n", "\n", "Ultimately, if the maxima bounds are too far from optimal then the fitting may not converge at all and the fits will not be useful.\n", "\n", "Also note the `label_angle` parameter to `plot_peak_params` and `FitResult.plot`, this can be provided to any fit plot to rotate the angle of the maxima labels. This can be useful if maxima are close together and this causes the labels to overlap." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.5. Putting it all together\n", "\n", "To fit the single and doublet peaks at the same time we can combine `PeakParams` seamlessly as before. Going back to the above example, we can fit the first 4 peaks in the spectrum like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_params = [PeakParams((2.75, 2.95), '1'),\n", " PeakParams((3.02, 3.15), '2'),\n", " PeakParams((3.14, 3.32), ['3', '4'], [(3.19, 3.22), (3.24, 3.26)]),\n", " PeakParams((4.13, 4.30), '5')]\n", "\n", "spectral_data.fit_peaks(peak_params, 1)\n", "\n", "for fit in spectral_data.fitted_peaks:\n", " fit.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This now fits all of the peaks appropriately.\n", "\n", "However, peaks 2 and 3 are quite close - they don't have many baseline points between them. In this case they have fitted OK, but if the fit was having trouble it would also be valid to combine the peaks as a triplet peak." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "peak_params = [PeakParams((2.75, 2.95), '1'),\n", " PeakParams((3.02, 3.32), ['2', '3', '4'], [(3.09, 3.12), (3.19, 3.22), (3.24, 3.26)]),\n", " PeakParams((4.13, 4.30), '5')]\n", "\n", "spectral_data.fit_peaks(peak_params, 1)\n", "\n", "for fit in spectral_data.fitted_peaks:\n", " fit.plot()" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 2 }