{
"cells": [
{
"cell_type": "markdown",
"id": "ba0e9f28-bcf7-460c-a94a-99aefa3790d0",
"metadata": {},
"source": [
"# Introduction to pyalign\n",
"\n",
 Bernhard">
"> Bernhard Liebl<br>\n",
"> Computational Humanities Group<br>\n",
"> Leipzig University"
]
},
{
"cell_type": "markdown",
"id": "3ccc55c2-dc2a-4a2f-b928-7af323135abe",
"metadata": {},
"source": [
"In the following sections, this notebook will introduce you to the core concepts of `pyalign`. This will enable you to\n",
"\n",
"* correctly and efficiently formulate specific alignment problems\n",
"* choose the right solver for solving them\n",
"* take care of runtime performance considerations to obtain fast results"
]
},
{
"cell_type": "markdown",
"id": "7b40f0bd-ecff-48e4-942c-661461a13487",
"metadata": {},
"source": [
"### Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6ce19464-0d96-41f8-b993-e64389df0770",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"| ⚠ | For performance reasons, try to avoid recreating Solver objects, and instead try to reuse them for as many Problem instances as possible. |
| A | A | T | C | G |
| | | | | | | | | |
| A | A | C | G |
| ⚠ | If you can specify a small alphabet, problems created through `alphabetic` are solved faster than those specified through `general`. Therefore, always check first whether your problem is suitable for `alphabetic`. |
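
One plausible reason a small alphabet helps is that all pairwise similarities can be computed once and then looked up by integer code. The sketch below illustrates that idea in plain Python. It is not `pyalign`'s implementation, and the helper names (`build_similarity_table`, `encode`, `equality`) are hypothetical.

```python
def build_similarity_table(alphabet, sim):
    """Precompute sim(a, b) once for every pair of symbols in the alphabet."""
    return [[sim(a, b) for b in alphabet] for a in alphabet]


def encode(seq, alphabet):
    """Map each symbol of seq to its integer position in the alphabet."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    return [index[symbol] for symbol in seq]


def equality(a, b):
    """Hypothetical similarity: +1 for equal symbols, -1 otherwise."""
    return 1.0 if a == b else -1.0


alphabet = "ACGT"
table = build_similarity_table(alphabet, equality)  # 4 x 4, built once

s = encode("AATCG", alphabet)
t = encode("AACG", alphabet)

# Filling the pairwise similarity matrix now needs only table lookups,
# with no call to the similarity function per cell.
pairwise = [[table[a][b] for b in t] for a in s]
```

With a general similarity function, every cell of `pairwise` would need its own function call, which is why checking for a usable alphabet first tends to pay off.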
| ⚠ | `pyalign` chooses algorithms automatically. You just pass the desired gap cost function to the `Solver`, and `pyalign` internally picks the best algorithm it implements for that cost. For example, if you pick affine gap costs, you will get Gotoh's algorithm and O(n^2) runtime. If you configure a logarithmic gap cost, `pyalign` will internally use an O(n^3) solver. |
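
To make the O(n^2) claim for affine gap costs concrete, here is a minimal pure-Python sketch of Gotoh's recurrence for the global alignment score. It assumes a gap of length k costs `gap_open + k * gap_extend`. The function name and parameters are illustrative and are not part of `pyalign`, whose optimized solvers are not shown here.

```python
def gotoh_score(s, t, eq=1.0, ne=-1.0, gap_open=1.0, gap_extend=0.5):
    """Global alignment score under an affine gap cost (illustrative sketch)."""
    neg_inf = float("-inf")
    n, m = len(s), len(t)

    # D: best score overall; P: best score ending in a gap that consumes s;
    # Q: best score ending in a gap that consumes t.
    D = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    P = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Q = [[neg_inf] * (m + 1) for _ in range(n + 1)]

    D[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):
        D[0][j] = -(gap_open + j * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap or extend the existing one: constant
            # work per cell, which is what keeps the total at O(n * m).
            P[i][j] = max(D[i - 1][j] - gap_open - gap_extend,
                          P[i - 1][j] - gap_extend)
            Q[i][j] = max(D[i][j - 1] - gap_open - gap_extend,
                          Q[i][j - 1] - gap_extend)
            sim = eq if s[i - 1] == t[j - 1] else ne
            D[i][j] = max(D[i - 1][j - 1] + sim, P[i][j], Q[i][j])

    return D[n][m]


score = gotoh_score("AATCG", "AACG")
```

For an arbitrary gap cost function, the two gap states no longer suffice: each cell has to consider every possible gap length, which adds an inner loop and leads to the cubic runtime mentioned above.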
| A | A | T | C | G | ||
| | | | | |||||
| A | A | C | G |
| ⚠ | pyalign will internally create an optimized solver that calls optimized C++ code for the specific codomain you request. For example, if you request to compute only scores, pyalign will not compute any traceback information in the first place. |
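
The saving from a score-only codomain can be illustrated with a small sketch: if no alignment has to be reconstructed, the dynamic program only needs the previous row and stores no traceback pointers. The code below is a plain-Python illustration with a linear gap cost, under the assumption of simple equality scoring. The function `global_score_only` is hypothetical and is not `pyalign`'s C++ implementation.

```python
def global_score_only(s, t, eq=1.0, ne=-1.0, gap=0.5):
    """Global alignment score with a linear gap cost, keeping two rows only."""
    # Row 0: aligning the empty prefix of s against prefixes of t.
    prev = [-gap * j for j in range(len(t) + 1)]

    for i, a in enumerate(s, start=1):
        curr = [-gap * i]  # column 0: prefix of s against the empty string
        for j, b in enumerate(t, start=1):
            sim = eq if a == b else ne
            curr.append(max(prev[j - 1] + sim,   # align a with b
                            prev[j] - gap,       # gap in t
                            curr[j - 1] - gap))  # gap in s
        prev = curr  # the older row is discarded; no traceback is kept

    return prev[-1]


score_only = global_score_only("AATCG", "AACG")
```

Memory use here is proportional to the length of `t` rather than to the full matrix, at the price that the alignment itself cannot be recovered afterwards.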
| A | A | T | C | G | ||
| | | | | |||||
| A | A | C | G |
| ⚠ | Batches incur some overhead, and the speed-up is more noticeable with longer sequences and with batches that contain sequences of similar length. |