{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "[PyBroMo](http://tritemio.github.io/PyBroMo/) - 3. Generate and export smFRET data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> *This notebook is part of [PyBroMo](http://tritemio.github.io/PyBroMo/) a \n", "> python-based single-molecule Brownian motion diffusion simulator \n", "> that simulates confocal [smFRET](http://en.wikipedia.org/wiki/Single-molecule_FRET)\n", "> experiments. You can find the full list of notebooks in \n", "> [Usage Examples](http://tritemio.github.io/PyBroMo/#usage-examples).*" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Generate a smFRET experiment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> In this notebooks it is assumed that you have already generated a series\n", "> of timestamps for different emission levels and different background rates.\n", "> You can do this by running the notebook: **2. Generate timestamps - Parallel**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we simulated timestamps for different emission rates (and possibly for different\n", "background rates) it is easy to obtain smFRET data. We just need to assign donor timestamps and acceptor timestamps.\n", "The FRET efficiency will be function of the ratio between max acceptor emission and max\n", "donor emission. Calling this rate $k$:\n", "\n", "$$k = \\mathrm{\\frac{acceptor\\;rate}{donor\\; rate}}$$\n", "\n", "We can compute the simulated FRET efficiency $E$ as:\n", "\n", "$$ E = \\frac{k}{k+1} $$\n", "\n", "So if, for example, you want a FRET efficiency of 20% you need a $k$ of 0.25:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "0.2/(1-0.2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "0.25" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore you need two timestamps files with max emission rate of 150 kcps and \n", "50 kcps. \n", "Make sure you already generated these timestamps (if not go back to the previous\n", "notebook). In this example you simply need to use the 150 kcps rate for the donor \n", "timestamps and the 50 kcps rate for the acceptor timestamps.\n", "\n", "> **Note**: You can simulate arbitrarly high emission rates, however as \n", "> a rule of thumb for a good green-red FRET pair (for example ATTO550 and ATTO647N),\n", "> you should not go above 200-300 kcps as total max emission rate (donor + acceptor).\n", "> YMMV." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Load the simulated timestamps" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%run -i load_bromo.py" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "/home/anto/Documents/ucla/src/brownian\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the avalilable timestamps should be in the folder `SIM_PH_DIR` \n", "(defined in `path_def.py` in **PyBroMo** folder):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "SIM_PH_DIR" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "'/home/anto/Documents/ucla/src/data/sim/brownian/ph_times/'" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "#ls $SIM_PH_DIR/*ID1+2+3+4+5+6*" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to specify which simulation to load providing the simulation parameters.\n", "\n", "* **ID** indicates all the simulation `ID` that are fused in the timestamp file\n", "* **t_tot** the strings representing the total simulated times (in seconds)\n", "* **num_p** the number of simulated particles\n", "* **pM** the particles concentration in pico-Mol\n", "* **t_step** the simulation time-step used during simulation\n", "* **D** the diffusion coefficient used during simulation\n", "* **dir_** the folder where the timestamps were saved\n", "\n", "For convenience we save these parameters in a dictionary called `pybromo_ts_params`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pybromo_ts_params = dict(\n", " ID = '1+2+3+4+5+6',\n", " t_tot = '480',\n", " num_p = '30',\n", " pM = '64',\n", " t_step = 0.5e-6,\n", " D = 1.2e-11,\n", " dir_=SIM_PH_DIR,\n", " )" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The previous parameters do not change most of the times. What you probably want to \n", "change often are the emission and background rates, in order to perform some comparison.\n", "Let specify the emission and background rates (in kHz) for both the donor and \n", "the acceptor channel:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pybromo_ts_params.update(\n", " d_em_kHz = 20.,\n", " a_em_kHz = 180.,\n", " d_bg_kHz = 6,\n", " a_bg_kHz = 3,\n", " )" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we use an helper function to obtain the timestamp filenames. As a side effect the \n", "simulated FRET efficiency is printed and returned:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "fname_d, fname_a, name, clk_p, E_sim = lu.get_bromo_fnames_da(**pybromo_ts_params)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Simulated FRET value: 90.0%\n", "D: EM 0020 BG 06.0 \n", "A: EM 0180 BG 03.0 \n", "ph_times_480s_D1.2e-11_30P_64pM_step0.5us_ID1+2+3+4+5+6_EM0020kHz_BG06.0kHz.npy\n", "ph_times_480s_D1.2e-11_30P_64pM_step0.5us_ID1+2+3+4+5+6_EM0180kHz_BG03.0kHz.npy\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We finally try to load the timestamps:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times_d = np.load(fname_d)\n", "ph_times_a = np.load(fname_a)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the previous command fails you either specified wrong parameters (check the **ID**,\n", "**t_tot**, **num_p**, **pM**, etc...) or a wrong folder for timestamps (check **dir_**).\n", "Check if the filenames in `fname_d` and `fname_a` really exist. \n", "\n", "The timestamps are **float64** and represent the absolute time in seconds from \n", "the beginning of the simulated experiment. You can convert the array to integers\n", "(**int64**) and assign a clock period in order to have data that looks more like timestamps generated\n", "by an FPGA hardware:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times, a_em = merge_DA_ph_times(ph_times_d, ph_times_a)\n", "\n", "clk_p = t_step/32. # with t_step=0.5us -> 156.25 ns\n", "ph_times_int = (ph_times/clk_p).astype('int64')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Save or export timestamps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Timestamps are Numpy arrays that can be saved in any conceivable format. \n", "Here I provide some examples." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Export to MATLAB .mat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we save two arrays `ph_times_d` and `ph_times_a` into a MATLAB file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from scipy.io import savemat" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "data_to_save = dict(timestamps_donor=ph_times_d, \n", " timestamps_acceptor=ph_times_a)\n", "\n", "savemat('timestamps_matlab_example', data_to_save, oned_as='row')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Export to ASCII format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even if not very efficient sometimes can be handy to save an array to a txt file. \n", "We use the numpy array method \n", "[`tofile()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.html) \n", "that can save both in binary and ASCII format.\n", "\n", "To save in ASCII just specify a separator (`sep`). Format is the standard python\n", "(or C) \n", "[string format](http://docs.python.org/2/library/stdtypes.html#string-formatting-operations)\n", "for numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times_d.tofile('timestamps_donor.txt', sep='\\n', format=\"%15e\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a more flexible function see [`savetxt()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html)." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Export to a binary format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let see an example on how to save (and load) an array in an arbitrary binary format. \n", "We use the numpy method \n", "[`tofile()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.html)\n", "that saves the array in the same representation\n", "as is currently in memory. So we just need to convert the array in memory and then save it to a file.\n", "\n", "Converting an array memory format and/or its declared type is explained pretty well \n", "in the numpy docs:\n", "\n", "* [Numpy Basics Byte-swapping](http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html)\n", "\n", "Here let say we want to save the array `ph_times_int`. We\n", "can check the type and byte order asking to the array itself:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times_int.dtype" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "dtype('int64')" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times_int.dtype.byteorder" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "'='" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it is an **int64** array with native byte order '=' (the same as the CPU). You can\n", "[safely assume](http://en.wikipedia.org/wiki/Endianness#Endianness_and_hardware) \n", "that a modern CPU is little-endian. But if you are really paranoic you can check:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "sys.byteorder" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "'little'" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok. Now, as an example, we want to convert this array to big-endian int32:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Convert to int32\n", "ph_times_int32 = ph_times_int.astype('int32')\n", "\n", "# Swap the bytes in memory to change endianess (byteswap)\n", "# and change also the \"declared\" byte order (newbyteorder) \n", "ph_times_int32_bigendian = ph_times_int32.byteswap().newbyteorder()\n", "\n", "print ph_times_int32_bigendian.dtype" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">i4\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can finally save the array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ph_times_int32_bigendian.tofile('times_bigend_int32.dat')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "new_ph_times = np.fromfile('times_bigend_int32.dat', dtype='>i4')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "(new_ph_times == ph_times_int32_bigendian).all()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "True" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Form the previous command we see that all the elements of the two arrays are equal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**METHOD 2** You may not trust `fromfile`: it may be able to load the array \n", "just because `fromfile` is the twin method of `tofile` (we are a bit paranoic\n", "here but just check). \n", "\n", "So here we load the binary data and build an array from this binary stream:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('times_bigend_int32.dat', 'rb') as f:\n", " data = f.read()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "new_ph_times = np.ndarray(len(data)/4, dtype='>i4', buffer=data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "(new_ph_times == ph_times_int32_bigendian).all()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "True" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, all the elements of the two arrays are equal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**METHOD 3** Still not convinced? Let take look at the raw data (first two timestamps):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print new_ph_times[0], hex(new_ph_times[0])\n", "print new_ph_times[1], hex(new_ph_times[1])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1856 0x740\n", "4544 0x11c0\n" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `data` we loaded the byte-stream that was saved in the file. Here we convert each byte-string to integer (`ord`) and then to hexadecimal (`hex`):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "[hex(ord(d)) for d in data[:8]]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "['0x0', '0x0', '0x7', '0x40', '0x0', '0x0', '0x11', '0xc0']" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ " You can see that the first two timestamps correspond to what we have in `new_ph_times`\n", " and the order is big-endian (most significant byte first)." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"./styles/custom2.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }