{ "cells": [ { "metadata": { "ExecuteTime": { "start_time": "2019-08-09T08:12:01.710949Z", "end_time": "2019-08-09T08:12:01.740203Z" } }, "cell_type": "markdown", "source": "Document sous [licence CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.fr)" }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.318763Z", "start_time": "2019-08-08T16:33:20.315892Z" }, "trusted": false }, "cell_type": "markdown", "source": "alphabet = ['G','T','C','A']\nmotif = 'GATTACA'" }, { "metadata": {}, "cell_type": "markdown", "source": "On crée un texte aléatoire `texte` contenant le motif et un autre `texte2` qui ne le contient pas, à fin de tester nos fonctions." }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.336002Z", "start_time": "2019-08-08T16:33:20.321593Z" }, "trusted": false }, "cell_type": "code", "source": "import random as rd\nN = 10000\n\n# on tire au hasard un texte sur l'alphabet, jusqu'à trouver un texte contenant le motif\n\nfor _ in range(10000):\n texte = ''\n for i in range(N):\n texte += rd.choice(alphabet)\n if texte.find(motif) != -1: break", "execution_count": 2, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.393862Z", "start_time": "2019-08-08T16:33:20.338006Z" }, "trusted": false }, "cell_type": "code", "source": "for _ in range(1000):\n texte2 = ''\n for i in range(N):\n texte2 += rd.choice(alphabet)\n if texte2.find(motif) == -1: break", "execution_count": 3, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.406144Z", "start_time": "2019-08-08T16:33:20.395549Z" }, "trusted": false }, "cell_type": "code", "source": "texte.find(motif),texte2.find(motif)", "execution_count": 4, "outputs": [ { "data": { "text/plain": "(4264, -1)" }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.414252Z", "start_time": "2019-08-08T16:33:20.408935Z" }, "trusted": false }, "cell_type": "code", "source": "texte[4094:4104] + '...'", "execution_count": 5, "outputs": [ { "data": { "text/plain": "'AATACGGGCT...'" }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.422448Z", "start_time": "2019-08-08T16:33:20.416812Z" }, "trusted": false }, "cell_type": "code", "source": "fichier = open('LeRougeEtLeNoir.txt','r')\nstendhal = fichier.read()\nfichier.close()", "execution_count": 6, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.429353Z", "start_time": "2019-08-08T16:33:20.424813Z" }, "trusted": false }, "cell_type": "code", "source": "stendhal.find('Julien tremblait')", "execution_count": 7, "outputs": [ { "data": { "text/plain": "161411" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.435528Z", "start_time": "2019-08-08T16:33:20.43136Z" }, "trusted": false }, "cell_type": "code", "source": "stendhal[161411:161500] + '...'", "execution_count": 8, "outputs": [ { "data": { "text/plain": "'Julien tremblait que sa demande ne fût accordée son rôle de séducteur lui pesait si horri...'" }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.441841Z", "start_time": "2019-08-08T16:33:20.437308Z" }, "trusted": false }, "cell_type": "code", "source": "def nbOccurences(texte,motif):\n r,i = 0,0\n while True:\n k = texte[i:].find(motif)\n if k == -1: return r\n else: \n r += 1\n i += k + 1", "execution_count": 9, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.529024Z", "start_time": "2019-08-08T16:33:20.444373Z" }, "trusted": false }, "cell_type": "code", "source": "nbOccurences(stendhal,'Julien tremblait'),nbOccurences(stendhal,'amour'),\\\nnbOccurences(stendhal,'haine'),nbOccurences(stendhal,'mort'),\\\nnbOccurences(stendhal,'Julien')", "execution_count": 10, "outputs": [ { "data": { "text/plain": "(1, 225, 36, 178, 1908)" }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "Recherche naïve : à chaque échec on décale d'une unité la fenêtre vers la droite.\nOn teste chaque fenêtre de droite à gauche.\n\nLe décalage de la fenêtre se fait en ligne 12." }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.535254Z", "start_time": "2019-08-08T16:33:20.530728Z" }, "trusted": false }, "cell_type": "code", "source": "def cherche(texte,motif):\n n = len(texte)\n p = len(motif)\n if p > n: return -1\n debut = 0\n while debut <= n-p:\n j = p-1\n while j >= 0:\n if texte[debut+j] != motif[j]: break\n j -= 1\n if j<0: return debut\n debut += 1\n return -1", "execution_count": 11, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.545807Z", "start_time": "2019-08-08T16:33:20.537012Z" }, "trusted": false }, "cell_type": "code", "source": "cherche(texte,motif),cherche(texte2,motif)", "execution_count": 12, "outputs": [ { "data": { "text/plain": "(4264, -1)" }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.596497Z", "start_time": "2019-08-08T16:33:20.547617Z" }, "trusted": false }, "cell_type": "code", "source": "cherche(stendhal,'Julien tremblait')", "execution_count": 13, "outputs": [ { "data": { "text/plain": "161411" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "## Boyer-Moore" }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.601195Z", "start_time": "2019-08-08T16:33:20.598132Z" }, "trusted": false }, "cell_type": "code", "source": "def calculeDecalages(motif):\n d = {}\n p = len(motif)\n for c in motif: d[c] = -1\n for i in range(p):\n d[motif[i]] = i\n return d", "execution_count": 14, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.605357Z", "start_time": "2019-08-08T16:33:20.602943Z" }, "trusted": false }, "cell_type": "code", "source": "def decale(c,decalages):\n try:\n return decalages[c]\n except KeyError:\n return -1", "execution_count": 15, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.611848Z", "start_time": "2019-08-08T16:33:20.607188Z" }, "trusted": false }, "cell_type": "code", "source": "d = calculeDecalages('aababab')\ndecale('a',d),decale('b',d),decale('w',d)", "execution_count": 16, "outputs": [ { "data": { "text/plain": "(5, 6, -1)" }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.619896Z", "start_time": "2019-08-08T16:33:20.615381Z" }, "trusted": false }, "cell_type": "code", "source": "def BM(texte, motif):\n n = len(texte)\n p = len(motif)\n if p > n: return -1\n decalages = calculeDecalages(motif)\n debut = 0\n while debut <= n-p:\n j = p-1\n while j >= 0:\n if texte[debut+j] != motif[j]: break\n j -= 1\n if j < 0: return debut\n k = j - decale(texte[debut+j],decalages)\n if k < 1: k = 1\n debut += k\n return -1 ", "execution_count": 17, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.626275Z", "start_time": "2019-08-08T16:33:20.622731Z" }, "trusted": false }, "cell_type": "code", "source": "BM('aabbbababacaabbabaabababaab','aabba')", "execution_count": 18, "outputs": [ { "data": { "text/plain": "11" }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.634472Z", "start_time": "2019-08-08T16:33:20.628008Z" }, "trusted": false }, "cell_type": "code", "source": "BM(texte,motif),BM(texte2,motif)", "execution_count": 19, "outputs": [ { "data": { "text/plain": "(4264, -1)" }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.650151Z", "start_time": "2019-08-08T16:33:20.636265Z" }, "trusted": false }, "cell_type": "code", "source": "BM(stendhal,'Julien tremblait')", "execution_count": 20, "outputs": [ { "data": { "text/plain": "161411" }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:33:20.65528Z", "start_time": "2019-08-08T16:33:20.651906Z" }, "trusted": false }, "cell_type": "code", "source": "def nbOccurencesBM(texte,motif):\n r,i = 0,0\n while True:\n k = BM(texte[i:],motif)\n if k == -1: return r\n else: \n r += 1\n i += k + 1 ", "execution_count": 21, "outputs": [] }, { "metadata": { "ExecuteTime": { "end_time": "2019-08-08T16:45:12.056117Z", "start_time": "2019-08-08T16:45:11.491983Z" }, "trusted": false }, "cell_type": "code", "source": "nbOccurencesBM(stendhal,'Julien'),nbOccurencesBM(stendhal,'amour'),\\\nnbOccurencesBM(stendhal,'Napoléon'),nbOccurencesBM(stendhal,'Bonaparte'),nbOccurencesBM(stendhal,'Waterloo')", "execution_count": 27, "outputs": [ { "data": { "text/plain": "(1908, 225, 33, 17, 1)" }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ] }, { "metadata": {}, "cell_type": "markdown", "source": "__Références :__\n- Robert Sedgewick, _Algorithms_, 4th edition, Addison-Wesley, 2011, pages 770-773\n- Danièle Beauquier, Jean Berstel, Philippe Chrétienne, _Éléments d'algorithmique_, Masson 1992, pages 358–366\n\nRemarque : le Beauquier-Berstel-Chrétienne est en français, il décrit de plus l'algorithme plus avancé de Boyer \net Moore, non repris par Sedgewick ni présenté dans ce document. Sa lecture est sans doute plus difficile que celle de l'excellent Sedgewick, mais il est en revanche disponible [gratuitement en ligne](http://www-igm.univ-mlv.fr/~berstel/Elements/Elements.pdf) !" }, { "metadata": { "trusted": false }, "cell_type": "code", "source": "", "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3", "language": "python" }, "language_info": { "mimetype": "text/x-python", "nbconvert_exporter": "python", "name": "python", "pygments_lexer": "ipython3", "version": "3.5.4", "file_extension": ".py", "codemirror_mode": { "version": 3, "name": "ipython" } }, "varInspector": { "window_display": false, "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "library": "var_list.py", "delete_cmd_prefix": "del ", "delete_cmd_postfix": "", "varRefreshCmd": "print(var_dic_list())" }, "r": { "library": "var_list.r", "delete_cmd_prefix": "rm(", "delete_cmd_postfix": ") ", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ] } }, "nbformat": 4, "nbformat_minor": 2 }