{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 2.2: Exponential conjugate prior (55 pts)\n",
    "\n",
    "[Data set download](http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&sendto=on&log$=seqview&db=nuccore&dopt=fasta&val=CP042865.1)\n",
    "\n",
    "*This problem was motivated by discussion in section 2.10 of [Holmes and Huber's book](http://web.stanford.edu/class/bios221/book/).*\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**a)** Show that the conjugate distribution for the Exponential distribution is the Gamma distribution. How are the parameters of the Gamma updated from the prior to the posterior? (*You are welcome to look at [Wikipedia's table of conjugate priors](https://en.wikipedia.org/wiki/Conjugate_prior) to check your answer, but you should not look up the actual proof that the Gamma distribution is conjugate to the Exponential.*)\n",
    "\n",
    "**b)** Download the sequence of the chromosomal DNA of *E coli* strain ATCC BAA-196 [here](http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?tool=portal&sendto=on&log$=seqview&db=nuccore&dopt=fasta&val=CP042865.1) in FASTA format. This strain is resistant to multiple drugs and is used in studies of antibiotic resistance. The sequence was published in [this paper](https://doi.org/10.1128/MRA.01118-19). Read the sequence in as a single string, using the function below, if you like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_fasta_single_record(filename):\n",
    "    \"\"\"Read a sequence in from a FASTA file containing a single sequence.\n",
    "    \n",
    "    We assume that the first line of the file is the descriptor and all\n",
    "    subsequent lines are sequence.    \n",
    "    \"\"\"\n",
    "    with open(filename, 'r') as f:\n",
    "        # Read in descriptor\n",
    "        descriptor = f.readline().rstrip()\n",
    "\n",
    "        # Read in sequence, stripping the whitespace from each line\n",
    "        seq = ''\n",
    "        line = f.readline().rstrip()\n",
    "        while line != '':\n",
    "            seq += line\n",
    "            line = f.readline().rstrip()\n",
    "            \n",
    "    return descriptor, seq\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**c)** Find the index in the sequence of all [Shine-Delgarno motifs](https://en.wikipedia.org/wiki/Shine-Dalgarno_sequence), which is an important motif in initiation of protein synthesis. The Shine-Delgarno sequence is `AGGAGGT`. You can use the function below to find recognition sequences."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def recognition_sites_with_re(seq, recog_seq):\n",
    "    \"\"\"Find the indices of all recognition sites in a sequence.\"\"\"\n",
    "    sites = []\n",
    "    for site in re.finditer(recog_seq, seq):\n",
    "        sites.append(site.start())\n",
    "        \n",
    "    return sites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Store the number of bases between each occurrence of the motif as an array.\n",
    "\n",
    "**d)** We will model the distance between each motif as Exponentially distributed. Explain why this may be a reasonable model.\n",
    "\n",
    "**e)** Make an exploratory plot of the ECDF of inter-motif distances. Does it look Exponential?\n",
    "\n",
    "**f)** Use an Exponential likelihood and a Gamma prior distribution to compute and plot the posterior distribution of the rate parameter (the inverse of the characteristic distance between motifs) of the Exponential distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br />"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}