{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Hash Taulak ( _Associative array_ , _Elkartze Arrayak_ )\n", "\n", " * _gako_ -etatik _balio_ -etarako aplikazioa\n", " * Python-eko `dict` eta `set` objektuak\n", " * _gako_ -aren gain _Hash funtzio_ bat aplikatzean, arrayari dagokion indize bat lortu.\n", "\n", "\"Hash" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hash Funtzioa\n", "\n", " * Edozein informazioren gain aplikatzean, tamaina finkoko balio bat (zenbaki bat) lortu.\n", " * Informazio baten _sinadura_ (laburpen) digitala.\n", " * Propietateak:\n", " * Hash balioen banaketa uniformea\n", " * Berdinak diren bi objektuen hash balioak berdinak dira\n", " * Kontrakoak ez du zertan egia izan behar. \n", " * Adibide bat:\n", " * [SHA256:](https://andersbrownworth.com/blockchain/hash) _informazioa_ → 256 bit" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hash Taulen bi erabilpen\n", "\n", "* **Hash Map** (hiztegiak) : `gako` → `(gako,balio)`\n", "* **Hash Set** (multzoak) : `gako` → `(gako,)`\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Hash Taulen inplementazioa\n", "\n", " * $N$ tamainako Taula/Array/Zerrenda bat erabili.\n", " * Okupazio faktorea (*load factor*): $LF : {n_{stored}}/{N}$\n", " * $g$ gako bati taulan dagokion posizioa: $i \\; = \\; hash(g)\\; \\% \\; N$\n", " * *Talkak* edo *kolisioak* gerta daitezke\n", " * **Kolisioa**: Bi $g_1 , g_2$ gako ezberdinei $i_1 = i_2$ posizio berdina esleitzea\n", " * $hash(g_1) = hash(g_2)$ balio berdina izan dezakete.\n", " * $hash(g_1) \\ne hash(g_2)$ izanda ere, $(hash(g_1)\\, \\% \\, N) = (hash(g_2)\\, \\% \\, N)$ gerta daiteke." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]\n", "[None, None, ('Ane', 656354364), None, None, None, None, None, None, None, None, None, None, None, None, None, None]\n", "[None, None, ('Ane', 656354364), None, None, None, None, None, None, None, None, None, None, None, None, ('Unai', 656455384), None]\n", "[None, None, ('Ane', 656354364), None, None, ('Raul', 656455384), None, None, None, None, None, None, None, None, None, ('Unai', 656455384), None]\n" ] } ], "source": [ "N=17\n", "T = [None]*N\n", "print(T)\n", "g,b = \"Ane\",656354364\n", "i = hash(g)%N\n", "T[i] = (g,b)\n", "print(T)\n", "g,b = \"Unai\",656455384\n", "i = hash(g)%N\n", "T[i] = (g,b)\n", "print(T)\n", "g,b = \"Raul\",656455384\n", "i = hash(g)%N\n", "T[i] = (g,b)\n", "print(T)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5440056905610189769 9\n", "1199183044890590629 9\n" ] } ], "source": [ "print(hash(\"kaixo\"), hash(\"kaixo\") % 10)\n", "print(hash(\"kaixoooooooooooooooo\"), hash(\"kaixoooooooooooooooo\") % 10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena I : Taula berdimentsionatu\n", "\n", "* Kolisio bat ematen den bakoitzean, taularen tamaina handitu dezakegu.\n", "* Tamaina handitzean, talka berri baten probabilitatea txikiagoa da.\n", " * Baina... zenbatekoa da probabilitate hori?\n", " * 2450 gako baditugu 1.000.000 tamaina duen taula batetan (400-etik gelaxka bakarra okupatua), guztiz uniformea den hash funtzio bat erabilita ere, talka bat gertatzeko probabilitatea 95% da\n", " * _Birthday Problem_ edo [Urtebetetzeen Ebazkizuna](https://eu.wikipedia.org/wiki/Urtebetetzeen_ebazkizuna)\n", "* $LF \\; \\ll \\; 1$\n", "* Memoriaren erabilpen **oso kaxkarra**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena II : Helbideratze Irekia (*Open addressing*)\n", "\n", "* $g$ gako bati taulan dagokion posizioa ez da $i \\; = \\; hash(g)\\; \\% \\; N$ izango\n", "* $i$ posiziotik abiatuko gara\n", " * elementua topatu arte\n", " * gelaxka huts bat topatu arte\n", "\n", "\"Hash\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[None, None, None, None, None, None, None, None, None, None]\n", "[None, None, None, ('Ane', 656354364), None, None, None, None, None, None]\n", "[None, None, None, ('Ane', 656354364), ('Unai', 656455384), None, None, None, None, None]\n", "[None, None, None, ('Ane', 656354364), ('Unai', 656455384), ('Raul', 656455384), None, None, None, None]\n" ] } ], "source": [ "N=10\n", "T = [None]*N\n", "print(T)\n", "g,b = \"Ane\",656354364\n", "i = hash(g)%N\n", "T[i] = (g,b)\n", "print(T)\n", "g,b = \"Unai\",656455384\n", "i = hash(g)%N\n", "T[i] = (g,b)\n", "print(T)\n", "g,b = \"Raul\",656455384\n", "i = hash(g)%N\n", "# behar adina aldiz...\n", "i = (i+1)%N\n", "T[i] = (g,b)\n", "print(T)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Helbideratze Irekia**-ren ezaugarriak:\n", "\n", "* Hash balioen banaketa uniformeak garrantzi handia du\n", "* $LF \\; <= \\; 1$\n", "* $LF \\; > 0.7$ → **errendimendu eskasa** → Taula berdimentsionatu\n", "\n", "\"Hash\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Kolisioen ebazpena III : Kateatze Banatua (*Separate Chaining*)\n", "\n", "* $g$ gako bati taulako $i \\; = \\; hash(g)\\; \\% \\; N$ posizioko **zerrenda** dagokio \n", "\n", "\"Hash\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[], [], []]\n", "[[('Ane', 656354364)], [], []]\n", "[[('Ane', 656354364)], [('Unai', 656455384)], []]\n", "[[('Ane', 656354364), ('Raul', 656455384)], [('Unai', 656455384)], []]\n" ] } ], "source": [ "N=3\n", "T = [[] for _ in range(N)]\n", "print(T)\n", "g,b = \"Ane\",656354364\n", "i = hash(g)%N\n", "T[i].append((g,b))\n", "print(T)\n", "g,b = \"Unai\",656455384\n", "i = hash(g)%N\n", "T[i].append((g,b))\n", "print(T)\n", "g,b = \"Raul\",656455384\n", "i = hash(g)%N\n", "T[i].append((g,b))\n", "print(T)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Kateatze Banatua**-ren ezaugarriak:\n", "\n", "* $LF \\; > \\; 1$ izan daiteke\n", "* $LF \\; \\gg 1$ → **errendimendu eskasa** → Taula berdimentsionatu" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "rise": { "autolaunch": false, "footer": "

Konputaziorako Sarrera

", "scroll": true } }, "nbformat": 4, "nbformat_minor": 2 }