{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# [5.2]() Glossary <a class='iab-edit' href='https://github.com/caporaso-lab/An-Introduction-to-Applied-Bioinformatics/edit/master/book/back-matter/glossary.md#L1' target='_blank'>[edit]</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Table of Contents**\n",
    "0. [Pairwise alignment (noun)](#1)\n",
    "0. [kmer (noun)](#2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## [5.2.1](#1) Pairwise alignment (noun)<a name='1'></a> <a class='iab-edit' href='https://github.com/caporaso-lab/An-Introduction-to-Applied-Bioinformatics/edit/master/book/back-matter/glossary.md#L3' target='_blank'>[edit]</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "A hypothesis about which bases or amino acids in two biological sequences are derived from a common ancestral base or amino acid. By definition, the *aligned sequences* will be of equal length with gaps (usually denoted with ``-``, or ``.`` for terminal gaps) indicating hypothesized insertion deletion events. A pairwise alignment may be represented as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "ACC---GTAC\n",
    "CCCATCGTAG\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [5.2.2](#2) kmer (noun)<a name='2'></a> <a class='iab-edit' href='https://github.com/caporaso-lab/An-Introduction-to-Applied-Bioinformatics/edit/master/book/back-matter/glossary.md#L12' target='_blank'>[edit]</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "A kmer is simply a word (or list of adjacent characters) in a sequence of length k. For example, the overlapping kmers in the sequence ``ACCGTGACCAGTTACCAGTTTGACCAA`` are as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import skbio\n",
    "skbio.DNA('ACCGTGACCAGTTACCAGTTTGACCAA').kmer_frequencies(k=5, overlap=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is common for bioinformaticians to substitute the value of `k` for the letter _k_ in the word _kmer_. For example, you might here someone say \"we identified all seven-mers in our sequence\", to mean they identified all kmers of length seven."
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}