{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\"Barry\n", "\n", "#### PYTHON FOR DATA SCIENCE\n", "\n", "# HASHING THE DINOS\n", "\n", "This Notebook is admittedly a little bit weird in terms of the topics it mixes. We bring in a large number of dinosuar names, in the sense of species, as discovered from the fossil record, and perhaps from other records. However this set of strings only serves to fuel the purely mathematical process of performing ```hashlib.sha256``` magic on each one.\n", "\n", "Think of dino names as passwords. We may consider these insecure but lets not assume the game is that serious. For the purposes of today's exercise, they're secure enough.\n", "\n", "However, just because you've picked a password does not mean a DBA needs to keep it in her database, where it might get stolen. Rather, a hash of your password serves as a deterministic fingerprint. Just save the fingerprint. No one with a wrong password will get through the gate." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "dinos = pd.read_json(\"dino_hash.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll notice the hashing algorithm has already been applied by the time we import this JSON data. I'll be showing you the Python source code for scripts that aren't Jupyter Notebook scripts, for that part of the pipeline." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dinos.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember how the .loc attribute uses enhanced slice notation (\"enhanced\" in the sense core Python does not support it)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dinos.loc[\"Mo\":\"N\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dinos.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dinos.index.is_unique" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "code = dinos.loc['Mtapaiasaurus'][0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(code)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dinos.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "int(code, base=16)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "0xafe4c2b017ed3996bf5f4f3b937f0ae22e649df2f620787e136ed6bd3ea32e2d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### LAB\n", "\n", "Add a column to dinos that contains the decimal equivalent of the sha256 hash. [Hint](https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### LAB\n", "\n", "Sort ```dinos``` by the column ```sha256``` -- this will be an alphabetical sort. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How about in descending order?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }