{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Hash Taulak ( _Associative array_ , _Elkartze Arrayak_ )\n", "\n", " * Elementuak **taula** (zerrenda) batetan gordetzen dira\n", " * $Taularen \\; tamaina \\ge elementu \\; kopurua$\n", " * Elementuaren arabera, bakoitzari posizio bat dagokio\n", " * Ez dago elementu errepikaturik $\\to$ multzo bat\n", " * Hash funtzioa $\\to$ elementu bakoitzari dagokion posizioa\n", "\n", "
\"Hash
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hash Funtzioa\n", "\n", " * Edozein informazioren gain aplikatzean, tamaina finkoko balio bat (zenbaki bat) lortu.\n", " * Informazio baten _sinadura_ (laburpen) digitala.\n", " * Propietateak:\n", " * Hash balioen banaketa uniformea\n", " * Berdinak diren bi objektuen hash balioak berdinak dira\n", " * Kontrakoak ez du zertan egia izan behar. \n", " * Adibide bat:\n", " * [SHA256:](https://andersbrownworth.com/blockchain/hash) _informazioa_ → 256 bit" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hash Taulen inplementazioa\n", "\n", " * $N$ tamainako Zerrenda bat erabili.\n", " * Okupazio faktorea (*load factor*): $LF = {n_{stored}}/{N}$\n", " * $e$ elementu bati taulan dagokion posizioa: $i \\; = \\; hash(e)\\; \\% \\; N$\n", " * *Talkak* edo *kolisioak* gerta daitezke\n", " * **Kolisioa**: Bi $e_1 , e_2$ elementu ezberdinei $i_1 = i_2$ posizio berdina esleitzea\n", " * $hash(e_1) = hash(e_2)$ balio berdina izan dezakete.\n", " * $hash(e_1) \\ne hash(e_2)$ izanda ere, $(hash(e_1)\\, \\% \\, N) = (hash(e_2)\\, \\% \\, N)$ gerta daiteke." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[None, None, None, None, None, None, None, None, None, None]\n", "Ane --> 3 -4629060251877988377\n", "[None, None, None, 'Ane', None, None, None, None, None, None]\n", "Jon --> 0 7236508924607247770\n", "['Jon', None, None, 'Ane', None, None, None, None, None, None]\n", "Miren --> 4 4183904924672749424\n", "['Jon', None, None, 'Ane', 'Miren', None, None, None, None, None]\n", "Asier --> 7 843456204749208567\n", "['Jon', None, None, 'Ane', 'Miren', None, None, 'Asier', None, None]\n", "Nora --> 2 12817827696081252\n", "['Jon', None, 'Nora', 'Ane', 'Miren', None, None, 'Asier', None, None]\n" ] } ], "source": [ "N=10\n", "T = [None]*N\n", "print(T)\n", "for e in (\"Ane\",\"Jon\",\"Miren\",\"Asier\",\"Nora\"):\n", " i = hash(e)%N\n", " print(f'{e} --> {i} {hash(e)}')\n", " T[i] = e\n", " print(T)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ane --> 3\n", "[None, None, None, 'Ane', None, None, None, None, None, None]\n", "Jon --> 0\n", "['Jon', None, None, 'Ane', None, None, None, None, None, None]\n", "Miren --> 4\n", "['Jon', None, None, 'Ane', 'Miren', None, None, None, None, None]\n", "Asier --> 7\n", "['Jon', None, None, 'Ane', 'Miren', None, None, 'Asier', None, None]\n", "Nora --> 2\n", "['Jon', None, 'Nora', 'Ane', 'Miren', None, None, 'Asier', None, None]\n", "Aiala --> 5\n", "['Jon', None, 'Nora', 'Ane', 'Miren', 'Aiala', None, 'Asier', None, None]\n", "Eneritz --> 3\n", "Kolisio bat dago!!!!\n" ] } ], "source": [ "N=10\n", "T = [None]*N\n", "for e in (\"Ane\",\"Jon\",\"Miren\",\"Asier\",\"Nora\",\"Aiala\",\"Eneritz\",\"Igor\"):\n", " i = hash(e)%N\n", " print(f'{e} --> {i}')\n", " if T[i] != None :\n", " print(\"Kolisio bat dago!!!!\")\n", " break\n", " T[i] = e\n", " print(T)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ane -4629060251877988377 3\n", "Eneritz 2050524501275771253 3\n" ] } ], "source": [ "print(\"Ane\", hash(\"Ane\"), hash(\"Ane\")%N)\n", "print(\"Eneritz\", hash(\"Eneritz\"), hash(\"Eneritz\")%N)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena I : Taula berdimentsionatu\n", "\n", "* Kolisio bat ematen den bakoitzean, taularen tamaina handitu dezakegu.\n", "* Tamaina handitzean, talka berri baten probabilitatea txikiagoa da.\n", " * Baina... zenbatekoa da probabilitate hori?\n", " * 2450 gako baditugu 1.000.000 tamaina duen taula batetan (400-etik gelaxka bakarra okupatua), guztiz uniformea den hash funtzio bat erabilita ere, talka bat gertatzeko probabilitatea 95% da\n", " * _Birthday Problem_ edo [Urtebetetzeen Ebazkizuna](https://eu.wikipedia.org/wiki/Urtebetetzeen_ebazkizuna)\n", "* $LF \\; \\ll \\; 1$\n", "* Memoriaren erabilpen **oso kaxkarra**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena II : Helbideratze Irekia (*Open addressing*)\n", "\n", "* $e$ elementu bati taulan dagokion posizioa ez da $i \\; = \\; hash(e)\\; \\% \\; N$ izango\n", "* $i$ posiziotik abiatuko gara\n", " * elementua topatu arte\n", " * gelaxka huts bat topatu arte\n", "\n", "
\"Hash
\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ane --> 3\n", "[None, None, None, 'Ane', None, None, None, None, None, None]\n", "Jon --> 0\n", "['Jon', None, None, 'Ane', None, None, None, None, None, None]\n", "Miren --> 4\n", "['Jon', None, None, 'Ane', 'Miren', None, None, None, None, None]\n", "Asier --> 7\n", "['Jon', None, None, 'Ane', 'Miren', None, None, 'Asier', None, None]\n", "Nora --> 2\n", "['Jon', None, 'Nora', 'Ane', 'Miren', None, None, 'Asier', None, None]\n", "Aiala --> 5\n", "['Jon', None, 'Nora', 'Ane', 'Miren', 'Aiala', None, 'Asier', None, None]\n", "Eneritz --> 3\n", "Eneritz --> 4\n", "Eneritz --> 5\n", "Eneritz --> 6\n", "['Jon', None, 'Nora', 'Ane', 'Miren', 'Aiala', 'Eneritz', 'Asier', None, None]\n", "Igor --> 5\n", "Igor --> 6\n", "Igor --> 7\n", "Igor --> 8\n", "['Jon', None, 'Nora', 'Ane', 'Miren', 'Aiala', 'Eneritz', 'Asier', 'Igor', None]\n" ] } ], "source": [ "N=10\n", "T = [None]*N\n", "for e in (\"Ane\",\"Jon\",\"Miren\",\"Asier\",\"Nora\",\"Aiala\",\"Eneritz\",\"Igor\"):\n", " i = hash(e)%N\n", " print(f'{e} --> {i}')\n", " while T[i] != None :\n", " i = (i+1)%N\n", " print(f'{e} --> {i}')\n", " T[i] = e\n", " print(T)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Helbideratze Irekia**-ren ezaugarriak:\n", "\n", "* Hash balioen banaketa uniformeak garrantzi handia du\n", "* $LF \\; <= \\; 1$\n", "* $LF \\; > 0.7$ → **errendimendu eskasa** → Taula berdimentsionatu\n", "\n", "
\"Hash
\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena III : Kateatze Banatua (*Separate Chaining*)\n", "\n", "* $e$ elementu bati taulako $i \\; = \\; hash(e)\\; \\% \\; N$ posizioko **zerrenda** dagokio \n", "\n", "
\"Hash
\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[], [], []]\n", "Ane --> 0\n", "[['Ane'], [], []]\n", "Jon --> 2\n", "[['Ane'], [], ['Jon']]\n", "Miren --> 2\n", "[['Ane'], [], ['Jon', 'Miren']]\n", "Asier --> 0\n", "[['Ane', 'Asier'], [], ['Jon', 'Miren']]\n", "Nora --> 0\n", "[['Ane', 'Asier', 'Nora'], [], ['Jon', 'Miren']]\n", "Aiala --> 0\n", "[['Ane', 'Asier', 'Nora', 'Aiala'], [], ['Jon', 'Miren']]\n", "Eneritz --> 0\n", "[['Ane', 'Asier', 'Nora', 'Aiala', 'Eneritz'], [], ['Jon', 'Miren']]\n", "Igor --> 0\n", "[['Ane', 'Asier', 'Nora', 'Aiala', 'Eneritz', 'Igor'], [], ['Jon', 'Miren']]\n" ] } ], "source": [ "N=3\n", "T = [[] for _ in range(N)]\n", "print(T)\n", "for e in (\"Ane\",\"Jon\",\"Miren\",\"Asier\",\"Nora\",\"Aiala\",\"Eneritz\",\"Igor\") :\n", " i = hash(e)%N\n", " print(f'{e} --> {i}')\n", " T[i].append(e)\n", " print(T)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Kateatze Banatua**-ren ezaugarriak:\n", "\n", "* $LF \\; > \\; 1$ izan daiteke\n", "* $LF \\; \\gg 1$ → **errendimendu eskasa** → Taula berdimentsionatu" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "rise": { "autolaunch": false, "footer": "

Konputaziorako Sarrera

", "scroll": true } }, "nbformat": 4, "nbformat_minor": 2 }