{ "cells": [ { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": false }, "source": [ "# The curious case of an optimization bug" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TIP: if you're reading this on GitHub, I recommend starting a Binder session to have color output (and an interactive session!). Here is the link: [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/luizirber/nthash_bug/master?filepath=analysis.ipynb)\n", "\n", "Alternatively, use [nbviewer](https://nbviewer.jupyter.org/github/luizirber/nthash_bug/blob/master/analysis.ipynb), which preserves the formatting.\n", "\n", "UPDATE: I opened an [issue](https://github.com/bcgsc/ntHash/issues/7) and seems like the current `master` version is the culprit (more specifically [57af16a](https://github.com/bcgsc/ntHash/tree/57af16a972a553ecccea0cda25af85ac1f96a94b/)). Reverting back to the [`1.0.4` release](https://github.com/bcgsc/ntHash/releases/tag/1.0.4) fixes the problem. There is a new section at the end of this notebook with more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## So, how did you get in this rabbit hole?\n", "\n", "Last week [Will Rowe](https://will-rowe.github.io/) posted the [preprint](https://doi.org/10.1101/408070) for [HULK](https://github.com/will-rowe/hulk), and in a comment in a related discussion [Heng Li](https://twitter.com/lh3lh3/status/1037487551504896001) suggested using `ntHash` to speed up hashing of k-mers, which makes more sense than using murmurhash3 when using sliding windows (since you can avoid recalculating the full hash all the time). He also linked to an answer on [the bioinformatics stack exchange](https://bioinformatics.stackexchange.com/questions/19/are-there-any-rolling-hash-functions-that-can-hash-a-dna-sequence-and-its-revers/293#293) with a very clear explanation by Karel Brinda on how it works.\n", "\n", "So clear was this explanation that I used it to implement a [Rust version](https://github.com/luizirber/nthash/blob/787faba1d9918791076d556dca0ea4bc38c85330/src/lib.rs). And since the default template for libraries (when running `cargo init --lib`) already includes a test example, I wrote a simple one that calculated the ntHash for a 5-mer and compared to the same value generated by the [original implementation (in C++)](https://github.com/bcgsc/ntHash).\n", "\n", "And... the test failed.\n", "\n", "I spent some time figuring out if I did something wrong, if there was something weird with my for loop, up to going to a whiteboard and checking indexing on the string by hand. It seemed to be right, so maybe there was something I was misinterpreting in the Rust syntax?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The implementations\n", "\n", "Well, back to the comfort zone: let's try in Python!\n", "\n", "Again, pretty straighforward to implement, but I had to make a `rol` function to rotate a 64-bit integer. Here is the implementation:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "import sys\n", "\n", "h = {\n", " 'A': 0x3c8bfbb395c60474,\n", " 'C': 0x3193c18562a02b4c,\n", " 'G': 0x20323ed082572324,\n", " 'T': 0x295549f54be24456,\n", " 'N': 0,\n", "}\n", "\n", "rc = {\n", " 'A': 0x295549f54be24456,\n", " 'C': 0x20323ed082572324,\n", " 'G': 0x3193c18562a02b4c,\n", " 'T': 0x3c8bfbb395c60474,\n", " 'N': 0,\n", "}\n", "\n", "def rol(x, k):\n", " return ((x << k % 64) & (2 ** 64 - 1) |\n", " ((x & (2 ** 64 - 1)) >> (64 - (k % 64))))\n", "\n", "def f(s, idx, k):\n", " out = 0\n", " for i, v in enumerate(s[idx: idx+k], 1):\n", " out ^= rol(h[v], k - i)\n", " return out\n", "\n", "def r(s, idx, k):\n", " out = 0\n", " for i, v in enumerate(reversed(s[idx: idx+k]), 1):\n", " out ^= rol(rc[v], k - i)\n", " return out\n", "\n", "def nthash(s, k):\n", " fval = f(s, 0, k)\n", " rval = r(s, 0, k)\n", " return min(fval, rval), fval, rval\n", "\n", "hval, fval, rval = nthash(sys.argv[1], len(sys.argv[1]))\n", "print('NTC64 0x{:0>16x}'.format(hval))\n", "print(\"fhVal 0x{:0>16x}\".format(fval))\n", "print(\"rhVal 0x{:0>16x}\".format(rval))\n" ] } ], "source": [ "!cat nthash.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So I ran it, and... it matches my Rust result, but doesn't match the original implementation. Hmm.\n", "\n", "While re-reading the paper I looked into the [supplementary materials](https://academic.oup.com/bioinformatics/article/32/22/3492/2525588), and they have the `NT64` function defined (which is missing in the implementation on GitHub). So I used that to make the canonical version `NTC64`, which takes the minimum hash from both forward (`NT64F`) and reverse (`NT64R`) strands with the same parameters from the [current implementation](https://github.com/bcgsc/ntHash/blob/57af16a972a553ecccea0cda25af85ac1f96a94b/nthash.hpp) (so we can compare them easily).\n", "\n", "Here is the simple implementation, without worrying about optimizations:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#include \n", "\n", "using namespace std;\n", "\n", "namespace nthash {\n", "\n", "uint64_t h(char i) {\n", " switch (i) {\n", " case 'A': return 0x3c8bfbb395c60474;\n", " case 'C': return 0x3193c18562a02b4c;\n", " case 'G': return 0x20323ed082572324;\n", " case 'T': return 0x295549f54be24456;\n", " case 'N': return 0x0000000000000000;\n", " default: break;\n", " }\n", " return 0;\n", "}\n", "\n", "uint64_t rc(char i) {\n", " switch (i) {\n", " case 'A': return 0x295549f54be24456;\n", " case 'C': return 0x20323ed082572324;\n", " case 'G': return 0x3193c18562a02b4c;\n", " case 'T': return 0x3c8bfbb395c60474;\n", " case 'N': return 0x0000000000000000;\n", " default: break;\n", " }\n", " return 0;\n", "}\n", "\n", "uint64_t rol(uint64_t v, unsigned k) {\n", " return (v << k) | (v >> (64 - k));\n", "}\n", "\n", "inline uint64_t NT64(const char * kmerSeq, const unsigned k) {\n", " uint64_t hVal = 0;\n", " for(unsigned i=0; i < k; i++)\n", " hVal ^= rol(h(kmerSeq[i]), k-1-i);\n", " return hVal;\n", "}\n", "\n", "inline uint64_t NTF64(const char * kmerSeq, const unsigned k) {\n", " uint64_t hVal = 0;\n", " for(unsigned i=0; i < k; i++)\n", " hVal ^= rol(h(kmerSeq[i]), k-1-i);\n", " return hVal;\n", "}\n", "\n", "inline uint64_t NTR64(const char * kmerSeq, const unsigned k) {\n", " uint64_t hVal = 0;\n", " for(unsigned i=0; i < k; i++)\n", " hVal ^= rol(rc(kmerSeq[k-i-1]), k-1-i);\n", " return hVal;\n", "}\n", "\n", "inline uint64_t NTC64(const char * kmerSeq, const unsigned k, uint64_t& fhVal, uint64_t& rhVal) {\n", " fhVal=NTF64(kmerSeq, k);\n", " rhVal=NTR64(kmerSeq, k);\n", " return (rhVal\n", "#include \n", "#include \n", "\n", "using namespace std;\n", "\n", "int main(int argc, char** argv) {\n", " string seq = argv[1];\n", "\n", " uint64_t hVal, fhVal, rhVal;\n", "\n", " hVal = NTC64(seq.c_str(), seq.size(), fhVal, rhVal); // initial hash value\n", " cout << \"NTC64 0x\" << hex << setfill('0') << setw(16) << hVal << endl;\n", " cout << \"fhVal 0x\" << hex << setfill('0') << setw(16) << fhVal << endl;\n", " cout << \"rhVal 0x\" << hex << setfill('0') << setw(16) << rhVal << endl;\n", "\n", " return 0;\n", "}\n" ] } ], "source": [ "!cat nt_main.cpp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Python version is at the end of the `nthash.py` file above. And finally the Rust version (using the `nthash` crate from my repo):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "extern crate nthash;\n", "\n", "use nthash::{f, nthash, r};\n", "use std::env;\n", "\n", "fn main() {\n", " let seq = env::args().nth(1).unwrap();\n", " println!(\n", " \"NTC64 0x{:0>16x}\",\n", " nthash(seq.as_bytes(), seq.len() as u8)[0]\n", " );\n", " println!(\"fhVal 0x{:0>16x}\", f(seq.as_bytes(), 0, seq.len() as u32));\n", " println!(\"rhVal 0x{:0>16x}\", r(seq.as_bytes(), 0, seq.len() as u32));\n", "}\n" ] } ], "source": [ "!cat src/main.rs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quickly run a test: Makefile\n", "\n", "I also put everything in one `Makefile`, so it's easy to compile and run a basic test:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "g++ -O3 -DNTHASH_OPT -I. nt_main.cpp -o nt_opt\n", "g++ -O3 -I. nt_main.cpp -o nt_article\n", "g++ -O3 -DNTHASH_104 -I. nt_main.cpp -o nt_104\n" ] } ], "source": [ "!make -B" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./nt_opt TGCAG\n", "NTC64 0x0bafa6628fc6dab7\n", "fhVal 0x0bafa6628fc6dab7\n", "rhVal 0x8cf2d41f2cca4802\n", "./nt_article TGCAG\n", "NTC64 0x0bafa6728fc6dabf\n", "fhVal 0x0bafa6728fc6dabf\n", "rhVal 0x8cf2d4072cca480e\n", "python nthash.py TGCAG\n", "NTC64 0x0bafa6728fc6dabf\n", "fhVal 0x0bafa6728fc6dabf\n", "rhVal 0x8cf2d4072cca480e\n", "cargo run -q TGCAG\n", "NTC64 0x0bafa6728fc6dabf\n", "fhVal 0x0bafa6728fc6dabf\n", "rhVal 0x8cf2d4072cca480e\n" ] } ], "source": [ "!make test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But for running more tests it is a bit annoying to go in the Makefile and change it, so we can benefit from the IPython `%%bash` magic:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NTC64 0x01542d2a1299ba7e\n", "fhVal 0x9b5384ab1b0279a4\n", "rhVal 0x01542d2a1299ba7e\n", "NTC64 0x01542d341299ba71\n", "fhVal 0x9b5384bf1b0279ae\n", "rhVal 0x01542d341299ba71\n", "NTC64 0x01542d341299ba71\n", "fhVal 0x9b5384bf1b0279ae\n", "rhVal 0x01542d341299ba71\n", "NTC64 0x01542d341299ba71\n", "fhVal 0x9b5384bf1b0279ae\n", "rhVal 0x01542d341299ba71\n" ] } ], "source": [ "%%bash -l\n", "./nt_opt AAAAA\n", "./nt_article AAAAA\n", "python nthash.py AAAAA\n", "cargo run -q AAAAA" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": false }, "source": [ "## Comparing the implementations\n", "\n", "But we still have to set the sequence in 4 different places. Let's make a function to do that for us:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from IPython.display import display_html\n", "\n", "from functools import partial, reduce\n", "\n", "def color_same(unique, s):\n", " # 4 different colors here, because we have at most 4\n", " # (HOPEFULLY) different values\n", " colors = dict(zip(unique, ('#ff7f0e', '#2ca02c', \n", " '#1f77b4', '#d62728')))\n", " return ['color: {}'.format(colors[v]) for v in s]\n", "\n", "def run_for_seq(seq):\n", " ''' Returns a dataframe and a formatted styler (good for display) '''\n", " opt = %sx ./nt_opt {seq}\n", " simple = %sx ./nt_article {seq}\n", " py = %sx python nthash.py {seq}\n", " rust = %sx . ~/.cargo/env && cargo run -q {seq}\n", " \n", " def parse(result):\n", " out = {}\n", " for line in result:\n", " k, v = line.strip().split()\n", " out[k] = int(v, 16)\n", " return out\n", "\n", " df = pd.DataFrame.from_dict({\n", " 'opt': parse(opt),\n", " 'simple': parse(simple),\n", " 'py': parse(py),\n", " 'rust': parse(rust)\n", " }).T\n", " \n", " unique_values = np.unique(df.values.flatten())\n", " formatter = (df.style.format('0x{:0>16x}')\n", " .apply(partial(color_same, unique_values), axis=1))\n", " return df, formatter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With out new function `run_for_seq` it is a bit easier to see patterns (using colors to highlight same values). For example, the output for our previous example (`AAAAA`) is now" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x01542d2a1299ba7e0x9b5384ab1b0279a40x01542d2a1299ba7e
simple0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
py0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
rust0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df, form = run_for_seq('AAAAA')\n", "form" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1-mers\n", "\n", "Let's start with the simple case: what is the hash for the 1-mers?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: A\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
simple0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
py0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
rust0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: C\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
simple0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
py0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
rust0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: G\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
simple0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
py0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
rust0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: T\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
simple0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
py0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
rust0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for nt in \"ACGT\":\n", " df, form = run_for_seq(nt)\n", " print(f\"Sequence: {nt}\")\n", " display(form)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far so good, all values in the columns are the same,\n", "and `fhVal` is the canonical form for `A` and `C` and `rhVal` is the canonical form for `T` and `G` (since it ends up being `h(A)` and `h(C)`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2-mers: palindromes \n", "\n", "Let's start with 2-mers then! One case to analyze is for palindrome 2-mers like `GC`, where both forward and reverse strands are the same. We expect that all values will be the same, and indeed that's what we see for `GC`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
simple0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
py0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
rust0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df, form = run_for_seq('GC')\n", "form" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But it's not what we see for the other cases (but at least `opt` is consistent and is giving the same value for `fhVal` and `rhVal`):" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: AT\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x5042be90606e4cbf0x5042be90606e4cbf0x5042be90606e4cbf
simple0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
py0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
rust0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: TA\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x6e21685b02028cd90x6e21685b02028cd90x6e21685b02028cd9
simple0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
py0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
rust0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: GC\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
simple0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
py0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
rust0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: CG\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x4315bdd8471775bd0x4315bdd8471775bd0x4315bdd8471775bd
simple0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
py0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
rust0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for seq in ('AT', 'TA', 'GC', 'CG'):\n", " df, form = run_for_seq(seq)\n", " print(f\"Sequence: {seq}\")\n", " display(form)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2-mers: same base\n", "\n", "When we check 2-mers with the same base, something interesting: `rhVal(CC)` matches `fhVal(GG)` for all implementations, but for all the other cases we see the same pattern (of `opt` disagreeing with the other implementations)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: AA\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x459c0cd6be4a0c9d0x459c0cd6be4a0c9d0x7bffda1ddc26ccfb
simple0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
py0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
rust0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: TT\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x459c0cd6be4a0c9d0x7bffda1ddc26ccfb0x459c0cd6be4a0c9d
simple0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
py0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
rust0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: CC\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x52b4428da7e07dd50x52b4428da7e07dd50x6056437186f9656c
simple0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
py0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
rust0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: GG\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
opt0x52b4428da7e07dd50x6056437186f9656c0x52b4428da7e07dd5
simple0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
py0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
rust0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for seq in ('AA', 'TT', 'CC', 'GG'):\n", " df, form = run_for_seq(seq)\n", " print(f\"Sequence: {seq}\")\n", " display(form)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems like there is something off with the optimizations implemented in `ntHash`, but I'm not sure how to help to track down more than providing some reduced test cases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to avoid similar problems?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing (unit and property-based)\n", "\n", "From reading the original code we see that there are no tests (the [`nttest.cpp`](https://github.com/bcgsc/ntHash/blob/57af16a972a553ecccea0cda25af85ac1f96a94b/nttest.cpp) code is for comparing `ntHash` with other hash functions). Unit testing would be a good start (because it helps to catch simple bugs), but because the codebase went through an optimization phase there is an option that really shines for this use case: property based testing.\n", "\n", "Unit tests are usually implementing by giving an input to a function and checking if the return value is correct. Property based testing extend this idea further by defining properties that the function must respect, and then generating random inputs to try to falsify the property. [This post](https://fsharpforfunandprofit.com/posts/property-based-testing/) has a great explanation of the process.\n", "\n", "I first heard about property based testing via [Hypothesis](https://hypothesis.works/), but sadly it is a Python library and I wasn't planning to write bindings for ntHash. There is an alternative for C++, [autocheck](https://github.com/thejohnfreeman/autocheck/), which is more similar to [QuickCheck](https://en.wikipedia.org/wiki/QuickCheck) and, for this purpose, is good enough to demonstrate the idea.\n", "\n", "For this specific use case, we can use an oracle to implement a property: the unoptimized implementation is our oracle, and the results for the optimized version must match the results for the unoptimized version. In `autocheck` this property can be written like this:\n", "```c++\n", "struct prop_nt_oracle {\n", " bool operator () (const string& seq) {\n", " uint64_t fhVal, rhVal, hValOpt, hValArticle;\n", "\n", " hValOpt = NTC64(seq.c_str(), seq.size(), fhVal, rhVal);\n", " hValArticle = nthash::NTC64(seq.c_str(), seq.size(), fhVal, rhVal);\n", "\n", " return hValOpt == hValArticle;\n", " }\n", "};\n", "```\n", "\n", "The full code (using `catch` as a test runner) is in `test/test.cpp`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#define CATCH_CONFIG_MAIN // This tells Catch to provide a main() - only do this in one cpp file\n", "#include \"catch.hpp\"\n", "\n", "#include \"autocheck/autocheck.hpp\"\n", "#include \"autocheck/check.hpp\"\n", "\n", "namespace ac = autocheck;\n", "\n", "#include \"nthash.hpp\"\n", "#include \"nthash_simple.hpp\"\n", "\n", "static const char alnts[] = \"ACGTN\";\n", "\n", "class seq_generator {\n", " public:\n", " typedef std::basic_string result_type;\n", "\n", " result_type operator() (size_t size = 1) {\n", " result_type rv;\n", " rv.reserve(size);\n", "\n", " std::uniform_int_distribution dist(0, 5);\n", "\n", " std::generate_n(std::back_insert_iterator(rv), size, \n", " [&]{ return alnts[dist(ac::rng())]; });\n", "\n", " return rv;\n", " }\n", "};\n", "\n", "\n", "struct prop_nt_oracle {\n", " bool operator () (const string& seq) {\n", " uint64_t fhVal, rhVal, hValOpt, hValArticle;\n", "\n", " hValOpt = NTC64(seq.c_str(), seq.size(), fhVal, rhVal);\n", " hValArticle = nthash::NTC64(seq.c_str(), seq.size(), fhVal, rhVal);\n", "\n", " return hValOpt == hValArticle;\n", " }\n", "};\n", "\n", "\n", "TEST_CASE( \"oracle\", \"[nthash]\" ) {\n", " ac::catch_reporter rep;\n", "\n", " SECTION(\"optimized matches simple implementation outputs?\") {\n", " ac::check(prop_nt_oracle(), 100,\n", " ac::make_arbitrary(seq_generator()), rep);\n", " }\n", "}\n" ] } ], "source": [ "!cat test/test.cpp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I defined a new input generator, `seq_generator`, to create relevant genomic sequences for our test cases. It just takes a `size` and randomly generate a string of that size with `ACTGN` characters.\n", "\n", "To run this test case, there is a Makefile in the `test` directory that compiles everything:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "make: 'test' is up to date.\n", "\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n", "\u001b[0;37mtest is a Catch v2.4.0 host application.\n", "Run with -? for options\n", "\n", "\u001b[0m-------------------------------------------------------------------------------\n", "\u001b[0moracle\n", "\u001b[0m\u001b[0m optimized matches simple implementation outputs?\n", "\u001b[0m-------------------------------------------------------------------------------\n", "\u001b[0;37mtest.cpp:47\n", "\u001b[0m...............................................................................\n", "\n", "\u001b[0;37mautocheck/reporter.hpp:111: \u001b[0m\u001b[1;31mFAILED:\n", "\u001b[0mexplicitly with message:\n", " Falsifiable, after 7 tests:\n", " (NTC)\n", "\n", "\u001b[1;31m===============================================================================\u001b[0m\u001b[1;33m\u001b[0m\u001b[0;32m\u001b[0m\n", "test cases: 1\u001b[0;37m | \u001b[0m\u001b[1;31m1 failed\u001b[0m\n", "assertions: 1\u001b[0;37m | \u001b[0m\u001b[1;31m1 failed\u001b[0m\n", "\n" ] } ], "source": [ "! cd test/ && make\n", "! test/test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if the implementations don't match, it will keep finding inputs that falsify our property (in this case, any input where the optimized implementation doesn't match the simple one)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Update: ~~optimization~~ versioning problem\n", "\n", "I opened an [issue](https://github.com/bcgsc/ntHash/issues/7) and seems like the current `master` version is the culprit (more specifically [57af16a](https://github.com/bcgsc/ntHash/tree/57af16a972a553ecccea0cda25af85ac1f96a94b/)). Reverting back to the [`1.0.4` release](https://github.com/bcgsc/ntHash/releases/tag/1.0.4) fixes the problem.\n", "\n", "So, in the end it wasn't a case of an optimization bug, but the case of a versioning bug =]\n", "\n", "Re-running the analysis code with the `1.0.4` matches expectations:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def run_for_seq(seq):\n", " ''' Returns a dataframe and a formatted styler (good for display) '''\n", " opt = %sx ./nt_104 {seq}\n", " simple = %sx ./nt_article {seq}\n", " py = %sx python nthash.py {seq}\n", " rust = %sx . ~/.cargo/env && cargo run -q {seq}\n", " \n", " def parse(result):\n", " out = {}\n", " for line in result:\n", " k, v = line.strip().split()\n", " out[k] = int(v, 16)\n", " return out\n", "\n", " df = pd.DataFrame.from_dict({\n", " '1.0.4': parse(opt),\n", " 'simple': parse(simple),\n", " 'py': parse(py),\n", " 'rust': parse(rust)\n", " }).T\n", " \n", " unique_values = np.unique(df.values.flatten())\n", " formatter = (df.style.format('0x{:0>16x}')\n", " .apply(partial(color_same, unique_values), axis=1))\n", " return df, formatter" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
simple0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
py0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
rust0x01542d341299ba710x9b5384bf1b0279ae0x01542d341299ba71
" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df, form = run_for_seq('AAAAA')\n", "form" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1-mers" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: A\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x295549f54be244560x3c8bfbb395c604740x295549f54be24456
simple0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
py0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
rust0x295549f54be244560x3c8bfbb395c604740x295549f54be24456
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: C\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
simple0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
py0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
rust0x20323ed0825723240x3193c18562a02b4c0x20323ed082572324
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: G\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
simple0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
py0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
rust0x20323ed0825723240x20323ed0825723240x3193c18562a02b4c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: T\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x295549f54be244560x295549f54be244560x3c8bfbb395c60474
simple0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
py0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
rust0x295549f54be244560x295549f54be244560x3c8bfbb395c60474
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for nt in \"ACGT\":\n", " df, form = run_for_seq(nt)\n", " print(f\"Sequence: {nt}\")\n", " display(form)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2-mers: palindromes " ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: AT\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
simple0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
py0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
rust0x5042be92606e4cbe0x5042be92606e4cbe0x5042be92606e4cbe
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: TA\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
simple0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
py0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
rust0x6e21685902028cd80x6e21685902028cd80x6e21685902028cd8
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: GC\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
simple0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
py0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
rust0x71f7bc24660e6d040x71f7bc24660e6d040x71f7bc24660e6d04
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: CG\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
simple0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
py0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
rust0x4315bdda471775bc0x4315bdda471775bc0x4315bdda471775bc
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for seq in ('AT', 'TA', 'GC', 'CG'):\n", " df, form = run_for_seq(seq)\n", " print(f\"Sequence: {seq}\")\n", " display(form)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2-mers: same base" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequence: AA\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
simple0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
py0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
rust0x459c0cd4be4a0c9c0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: TT\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
simple0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
py0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
rust0x459c0cd4be4a0c9c0x7bffda1fdc26ccfa0x459c0cd4be4a0c9c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: CC\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
simple0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
py0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
rust0x52b4428fa7e07dd40x52b4428fa7e07dd40x6056437186f9656c
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Sequence: GG\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NTC64fhValrhVal
1.0.40x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
simple0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
py0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
rust0x52b4428fa7e07dd40x6056437186f9656c0x52b4428fa7e07dd4
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for seq in ('AA', 'TT', 'CC', 'GG'):\n", " df, form = run_for_seq(seq)\n", " print(f\"Sequence: {seq}\")\n", " display(form)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "make: 'test_104' is up to date.\n", "OK, passed 10000 tests.\n", "\u001b[1;31m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;32m===============================================================================\u001b[0m\n", "\u001b[1;32mAll tests passed\u001b[0m (1 assertion in 1 test case)\n", "\n" ] } ], "source": [ "! cd test && make test_104\n", "! test/test_104" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" }, "toc-autonumbering": false, "toc-showcode": false, "toc-showmarkdowntxt": false, "toc-showtags": false }, "nbformat": 4, "nbformat_minor": 2 }