{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "GNJ8DeRtAzaB" }, "source": [ "##### Python for High School (Summer 2022)\n", "\n", "* [Table of Contents](PY4HS.ipynb)\n", "* \n", "* [![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/4dsolutions/elite_school/blob/master/Py4HS_August_26_2022.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "iYlfU09rAzaI" }, "source": [ "# Looking Back (and Ahead)\n", "\n", "Welcome to the last Chapter of Python for High School. Our chapters document a chronological sequence in that my workflow was to prepare Notebooks ahead of each meetup.\n", "\n", "However the Topics need follow no specific order, except some are prerequisite to the others, in a kind of directed graph. You may find yourself \"running up the down escalator\" (a figure of speech) if you tackle some levels before others." ] }, { "cell_type": "markdown", "metadata": { "id": "iYlfU09rAzaI" }, "source": [ "##
Review: Number Theory
\n", "\n", "For example, appreciation for the elegance of the RSA algorithm (public key crypto) deepens with one's appreciation for [Euler's Theorem](https://brilliant.org/wiki/eulers-theorem/), the one that generalizes [Fermat's Little Theorem](https://brilliant.org/wiki/fermats-little-theorem/).\n", "\n", "Fermat's Little Theorem: \n", "\n", "Let $p$ be a prime number, and a be any integer. Then $a^{p}−a$ is always divisible by $p$." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(518301258916440, 0)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = 5 # the base\n", "p = 23 # try any prime here\n", "divmod(b**p - b, p) # no remainder, true if p is prime" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 0)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "divmod(36, 12)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11920928955078120" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5**23 - 5" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "518301258916440" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(5**23 - 5) // p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The converse of Fermat's Little Theorem is not true however. Some numbers $p$ pass the \"Fermat Test\", no matter the base $b$, and yet are not prime." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(236161757013279006596205856527083708100196146653181706890446891437099112647726025559639680169088609951680950265194714570201302725740616911013462999989900903652916737725765458649007151093057800234250721538430156379493080234813016314639586372502409927294685286807480615872096596435637874175022915074729196884816391323389905397446092098430290978560132791452898294579400935163804447799655417920,\n", " 0)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "divmod(b**561 - b, 561) # 561 is a Carmichael Number, any b will do" ] }, { "cell_type": "markdown", "metadata": { "id": "iYlfU09rAzaI" }, "source": [ "##Skill Sets
\n", "\n", "The skill sets we have been most developing in the foreground encompass:\n", "\n", "* Jupyter -- how we keep our Notes and sometimes publish them\n", " - MarkDown\n", " - $LaTeX$\n", " - % magics\n", "* python -- there's always a next level\n", "\n", "In the background we've been looking at:\n", "\n", "* numpy -- the number crunching king of vectorized operations\n", "* matplotlib -- plotly imitates Matlab's way of doing things\n", "* pandas -- sophisticated DataFrames, isomorphic to spreadsheets\n", "* sympy -- computer algebra, high precision numbers" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "application/json": { "cell": { "!": "OSMagics", "HTML": "Other", "SVG": "Other", "bash": "Other", "capture": "ExecutionMagics", "debug": "ExecutionMagics", "file": "Other", "html": "DisplayMagics", "javascript": "DisplayMagics", "js": "DisplayMagics", "latex": "DisplayMagics", "markdown": "DisplayMagics", "perl": "Other", "prun": "ExecutionMagics", "pypy": "Other", "python": "Other", "python2": "Other", "python3": "Other", "ruby": "Other", "script": "ScriptMagics", "sh": "Other", "svg": "DisplayMagics", "sx": "OSMagics", "system": "OSMagics", "time": "ExecutionMagics", "timeit": "ExecutionMagics", "writefile": "OSMagics" }, "line": { "alias": "OSMagics", "alias_magic": "BasicMagics", "autoawait": "AsyncMagics", "autocall": "AutoMagics", "automagic": "AutoMagics", "autosave": "KernelMagics", "bookmark": "OSMagics", "cat": "Other", "cd": "OSMagics", "clear": "KernelMagics", "colors": "BasicMagics", "conda": "PackagingMagics", "config": "ConfigMagics", "connect_info": "KernelMagics", "cp": "Other", "debug": "ExecutionMagics", "dhist": "OSMagics", "dirs": "OSMagics", "doctest_mode": "BasicMagics", "ed": "Other", "edit": "KernelMagics", "env": "OSMagics", "gui": "BasicMagics", "hist": "Other", "history": "HistoryMagics", "killbgscripts": "ScriptMagics", "ldir": "Other", "less": "KernelMagics", "lf": "Other", "lk": 
"Other", "ll": "Other", "load": "CodeMagics", "load_ext": "ExtensionMagics", "loadpy": "CodeMagics", "logoff": "LoggingMagics", "logon": "LoggingMagics", "logstart": "LoggingMagics", "logstate": "LoggingMagics", "logstop": "LoggingMagics", "ls": "Other", "lsmagic": "BasicMagics", "lx": "Other", "macro": "ExecutionMagics", "magic": "BasicMagics", "man": "KernelMagics", "matplotlib": "PylabMagics", "mkdir": "Other", "more": "KernelMagics", "mv": "Other", "notebook": "BasicMagics", "page": "BasicMagics", "pastebin": "CodeMagics", "pdb": "ExecutionMagics", "pdef": "NamespaceMagics", "pdoc": "NamespaceMagics", "pfile": "NamespaceMagics", "pinfo": "NamespaceMagics", "pinfo2": "NamespaceMagics", "pip": "PackagingMagics", "popd": "OSMagics", "pprint": "BasicMagics", "precision": "BasicMagics", "prun": "ExecutionMagics", "psearch": "NamespaceMagics", "psource": "NamespaceMagics", "pushd": "OSMagics", "pwd": "OSMagics", "pycat": "OSMagics", "pylab": "PylabMagics", "qtconsole": "KernelMagics", "quickref": "BasicMagics", "recall": "HistoryMagics", "rehashx": "OSMagics", "reload_ext": "ExtensionMagics", "rep": "Other", "rerun": "HistoryMagics", "reset": "NamespaceMagics", "reset_selective": "NamespaceMagics", "rm": "Other", "rmdir": "Other", "run": "ExecutionMagics", "save": "CodeMagics", "sc": "OSMagics", "set_env": "OSMagics", "store": "StoreMagics", "sx": "OSMagics", "system": "OSMagics", "tb": "ExecutionMagics", "time": "ExecutionMagics", "timeit": "ExecutionMagics", "unalias": "OSMagics", "unload_ext": "ExtensionMagics", "who": "NamespaceMagics", "who_ls": "NamespaceMagics", "whos": "NamespaceMagics", "xdel": "NamespaceMagics", "xmode": "BasicMagics" } }, "text/plain": [ "Available line magics:\n", "%alias %alias_magic %autoawait %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %conda %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon 
%logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pip %popd %pprint %precision %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode\n", "\n", "Available cell magics:\n", "%%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile\n", "\n", "Automagic is ON, % prefix IS NOT needed for line magics." ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%lsmagic" ] }, { "cell_type": "markdown", "metadata": { "id": "iYlfU09rAzaI" }, "source": [ "## Topics
\n", "\n", "\n", "Mostly, though, we have organized our thinking around Topics. \n", "\n", "Such as:\n", "\n", "* Cryptography\n", " - Number Theory\n", " - Primes versus Composities\n", " - Totative and Totient\n", "* Permutations\n", " - Finite Groups\n", " - Group Theory\n", "* Fractals\n", " - ASCII and Unicode Art\n", " - CropCircle Tractors\n", "* Logarithms\n", " - exponential function\n", " - bases\n", "* Graph Theory\n", " - adjacency matrix\n", " - weighted and directional graphs\n", " - polyedrons\n", "* Vectors\n", " - XYZ (\"Earthling Math\"), goes with unit cube\n", " - IVM (\"Martian Math\"), goes with unit tetrahedron\n", "* Ray Tracing\n", " - [POV-Ray](https://www.povray.org)\n", " - [Blender](https://www.blender.org/)\n", "* Machine Learning\n", " - history\n", " - future\n", " - relation to AI\n", "\n", "There's no need to be exhaustive and/or all-inclusive.\n", "\n", "Then come the skillsets we might optionally cultivate in the background. \n", "\n", "We might tackle one or more IDEs (vim, vscode, spyder...), remember our HTML (Requests package), and practice our Regular Expressions. \n", "\n", "Actually, we could add Regexes to the foreground list of skills.\n", "\n", "$LaTeX$ has been a subskill under Jupyter. To maximize our Markdown, we want to master typesetting convertional mathematical expressions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are some other Topics we might reasonably suggest have a footprint in high school? 
I mention Machine Learning above, but we're only getting to it now, in the last chapter.\n", "\n", "Some of these build on Topics already addressed:\n", "\n", "* ml (machine learning)\n", " - sklearn (native Python)\n", " - tensorflow (from Google)\n", " - pytorch (from Facebook)\n", " - other (keep scanning)\n", "* sql (structured query language)\n", "* python (there's always a next level)\n", "* geospatial data (in the cards for us all)\n", "* [jupyter books](https://jupyterbook.org/en/stable/intro.html) (extending notebook skills)\n", "\n", "In Preview mode then, let's talk about: *Machine Learning*." ] }, { "cell_type": "markdown", "metadata": { "id": "E5TMW6XaAzaJ" }, "source": [ "## Preview: What is ML?
\n", "\n", "Does Machine Learning belong in high school?\n", "\n", "Since the dawn of history, a core aim of both logic and superstition has been to predict the future in some way. We have always needed to divine the future, and yet encounter limits on predictability.\n", "\n", "The way a casino is set up, the house will win money in the long run, but given individuals may prove the exception, a fact the individuals count on when risking betting against the house.\n", "\n", "The age old project to distill our intuitions about such concepts of \"likelihood\", \"expectations\", \"confidance\", into a science is part of the heritage of data science. Statistics married Computer Science, and Data Science was their offspring.\n", "\n", "In ordinary, everyday language we may ask with what measure of confidance do I expect X to happen? Does my measure of confidance change with respect to some \"by when?\". \n", "\n", "I might expect X to happen someday with 100% confidance, yet with no confidance at all about X happening tomorrow or next week. \n", "\n", "These may sound like obvious truths, however in getting clear on such topics, we learn to think more logically. We may also come to invent new methods of computation. \n", "\n", "We don't stop with some hazy notion of \"average\"; we break it out into mean, medium and mode, and ways of computing each. Standard deviation follows, and variance. The concept of a Bell Curve begins to emerge.\n", "\n", "We hope to intelligently anticipate (rather than wildly or blindly guess), based on the many models we have developed. \n", "\n", "By model, we could mean a simulation. Does our simulation run as a computer program? Not necessarily. 
People were simulating complex systems long before they had invented silicon chip computers.\n", "\n", "We could also mean by \"model\" some actual Python object developed for us by one of the model makers (KM, SVM...).\n", "\n", "Who are the model makers?\n", "\n", "The model makers are well-known and/or still experimental algorithms. We categorize them in various ways, such as into supervised and unsupervised. A good example of a working model would be a recommendation engine attached to a website. \"Based on your choice of books so far, a next one of interest might be...\". \n", "\n", "A model need not be mysterious and opaque. A simple line through a bunch of dots summarizes many of them well. However, some models attain their high powers of prediction at the expense of being able to give us a set of rules.\n", "\n", "[A Quick List of ML Algorithms](https://howtolearnmachinelearning.com/articles/a-quick-list-of-machine-learning-algorithms/)\n", "\n", "More concretely, in the supervised learning setting, we show the features (X) and the right answers (y) to a \"learner\" or \"recognizer\" known as a neural net, considered deep after N layers. The feedback of getting it wrong or right is used to \"weight\" the \"neurons\" through a process known as \"gradient descent\" (it uses calculus, partial derivatives, to find a downward path for an error function).\n", "\n", "The above paragraph puts into words a lot of number crunchy math, which numpy is good at. What we get back from such a process is a Python object that has powers of prediction. But to what degree? How good is it? Should we turn it loose in the real world?" ] }, { "cell_type": "markdown", "metadata": { "id": "mN0CH7VfAzaK" }, "source": [ "## THE ML PROCESS
\n", "\n", "Think about how you yourself learn from experience? Having a strong sense of a right answer or how you want things to go, is motivational. Machine Learning sets up a similar feedback loop in the algebra, especially in a supervised setting.\n", "\n", "The main question before us is one of reliability. Does our model do the job? Is a newer model an improvement?\n", "\n", "One standard approach, to test reliability, is as follows:\n", "\n", "* give the right answers on a percentage of the total data (training)\n", "* try the model against the balance (the rest), never seen\n", "* do more testing\n", "\n", "As you explore Sci-Kit Learn, you will find this standard approach is baked in, through time-saving functions that automatically divvy the data into training and testing sets.\n", "\n", "The process is akin to shuffling cards as the same total data may be divvyed into testing and training in multiple ways, allowing for more averaging and perhaps fine tuning. We looked at similar bootstrapping ideas in connection with the Confidance Interval (Seaborn barplot etc.).\n", "\n", "What's left out of the above description is the whole matter of selecting the appropriate ML engine (model maker), with fine tuned hyperparameters. \n", "\n", "That process itself suggests a feedback loop: why not try several ML engines on the same data and find by experiment which seems to work best. Indeed, that's a thing.\n", "\n", "You will find sklearn support \"ensembles\" of model makers to work together. Sometimes they each reach a conclusion then hold a vote. Random Forests made of Decision Trees behave in ensemble fashion." ] }, { "cell_type": "markdown", "metadata": { "id": "K9eNE2ugAzaK" }, "source": [ "##Cluster Finding Machines
\n", "\n", "Finding clusters is sometimes an \"unsupervised\" form of ML, meaning we do not have a dataset with \"right answers\" for training purposes. We are looking for patterns in target data without knowing ourselves what they are beforehand. \n", "\n", "Other times, we have our clusters defined for training data, in which case the process is \"supervised\".\n", "\n", "For example, we might want to cluster tweets and short reports into \"topics\" (what the cluster around). We don't know in advance what those topics will be. Here's a tutorial \n", "\n", "When learning about cluster finding, we also learn about cluster making. Here's a data science teacher mixing up some clustered data, in order to discover if SVM (one of the model makers) will find them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Citing the tutorial at [General Abstract Nonsense](https://generalabstractnonsense.com/2017/03/A-quick-look-at-Support-Vector-Machines/):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.31178367, 0.72900392],\n", " [ 0.21782079, -0.8990918 ],\n", " [-2.48678065, 0.91325152],\n", " [ 1.12706373, -1.51409323],\n", " [ 1.63929108, -0.4298936 ],\n", " [ 2.63128056, 0.60182225],\n", " [-0.33588161, 1.23773784],\n", " [ 0.11112817, 0.12915125],\n", " [ 0.07612761, -0.15512816],\n", " [ 0.63422534, 0.810655 ]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.svm import SVC\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "# from mlxtend.plotting import plot_decision_regions\n", "\n", "# create some V shaped data\n", "np.random.seed(6)\n", "# normalized floats, 200 rows, 2 columns\n", "X = np.random.randn(200, 2)\n", "X[:10,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The rule here is: y is True if the right column exceeds the absolute value of the left. 
Throw away negative signs on the left and ask if the resulting number is less than what's to its right. y is False if not." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, False, False, False, False, False, True, True, False,\n", " True])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a \"predict me\" vector\n", "y = X[:, 1] > np.absolute(X[:, 0])\n", "y[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can recast y as 1s and negative 1s using the ever-useful `np.where`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -1, -1, -1, -1, -1, 1, 1, -1, 1])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = np.where(y, 1, -1)\n", "y[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on the provided data, we find the Support Vector Machine is pretty good at teasing apart the y=1 from the y=-1, but will not get it right 100% of the time.\n", "\n", "We're not requiring `mlxtend.plotting` to run this Notebook, but Google colab has it, so here's a screen shot:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SVC(C=10.0, gamma=0.5, random_state=0)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# train a Support Vector Classifier using the rbf kernel\n", "svm = SVC(kernel='rbf', random_state=0, gamma=0.5, C=10.0)\n", "svm.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make up some new X points. Point = any number of features (columns)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.99, 1.1 ],\n", " [ 0.12, 0.33],\n", " [-3. , 1. 
]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = np.array([[-0.99, 1.1],\n", " [.12, .33],\n", " [-3, 1]])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What would the model predict?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 1, -1])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "svm.predict(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What would we consider correct?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, True, False])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_y = data[:, 1] > np.absolute(data[:, 0])\n", "new_y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets do more study into the reliability of svm in this instance." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import accuracy_score" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1,\n", " -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1, -1, 1, -1, -1,\n", " -1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, -1, 1, -1, -1,\n", " -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1,\n", " 1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1,\n", " 1, -1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, 1,\n", " -1, -1, -1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, 1,\n", " 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, -1, 1, 1, 1, -1,\n", " -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, -1, 1, -1, 1,\n", " -1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, 1, -1,\n", " 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "svm.predict(X)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -1, -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, 1, -1, 1,\n", " -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1, -1, 1, -1, -1,\n", " -1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, -1, 1, -1, -1,\n", " -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1,\n", " 1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1, 1,\n", " 1, -1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, 1,\n", " -1, -1, 1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, 1,\n", " 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1,\n", " -1, -1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1, -1, 1, -1, 1,\n", " -1, 1, -1, 1, -1, -1, 
-1, -1, -1, -1, -1, -1, -1, 1, -1, 1, -1,\n", " 1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, -1, -1])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.99" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "accuracy_score(y, svm.predict(X))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, False, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, False, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y == 
svm.predict(X)" ] }, { "cell_type": "markdown", "metadata": { "id": "uSlEdW1A20Io" }, "source": [ "##Categorizing Machines
\n", "**Clickbait Versus Headlines**\n", "\n", "We will need pandas for this next example, with its amazing ability to read csv files over the web, taking a URL as input.\n", "\n", "The file below is actually just a txt file, a csv with no headers, and it's delimited by tab. No problemo:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "id": "GL3nsXNYAzaL" }, "outputs": [], "source": [ "df_clickbait = pd.read_csv(\"https://raw.githubusercontent.com/sixhobbits/sklearn-intro/master/clickbait.txt\", sep=\"\\t\", header=None)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " | 1 | \n", "
---|---|
count | \n", "10000.000000 | \n", "
mean | \n", "0.500000 | \n", "
std | \n", "0.500025 | \n", "
min | \n", "0.000000 | \n", "
25% | \n", "0.000000 | \n", "
50% | \n", "0.500000 | \n", "
75% | \n", "1.000000 | \n", "
max | \n", "1.000000 | \n", "
\n", " | 0 | \n", "1 | \n", "
---|---|---|
0 | \n", "Egypt's top envoy in Iraq confirmed killed | \n", "0 | \n", "
1 | \n", "Carter: Race relations in Palestine are worse ... | \n", "0 | \n", "
2 | \n", "After Years Of Dutiful Service, The Shiba Who ... | \n", "1 | \n", "
3 | \n", "In Books on Two Powerbrokers, Hints of the Future | \n", "0 | \n", "
4 | \n", "These Horrifyingly Satisfying Photos Of \"Baby ... | \n", "1 | \n", "
... | \n", "... | \n", "... | \n", "
9995 | \n", "What Is Your Weirdest Fear | \n", "1 | \n", "
9996 | \n", "Felipe Massa wins 2008 French Grand Prix | \n", "0 | \n", "
9997 | \n", "Bottled water concerns health experts | \n", "0 | \n", "
9998 | \n", "Death of Nancy Benoit rumour posted on Wikiped... | \n", "0 | \n", "
9999 | \n", "US Dept. of Justice IP address blocked after '... | \n", "0 | \n", "
10000 rows × 2 columns
\n", "\n", " | Headline | \n", "Category | \n", "
---|---|---|
0 | \n", "Egypt's top envoy in Iraq confirmed killed | \n", "0 | \n", "
1 | \n", "Carter: Race relations in Palestine are worse ... | \n", "0 | \n", "
2 | \n", "After Years Of Dutiful Service, The Shiba Who ... | \n", "1 | \n", "
3 | \n", "In Books on Two Powerbrokers, Hints of the Future | \n", "0 | \n", "
4 | \n", "These Horrifyingly Satisfying Photos Of \"Baby ... | \n", "1 | \n", "
## Preview: Linear Regression
\n", "\n", "One of the many famous datasets, which you can find on Kaggle, is [tips](https://www.kaggle.com/datasets/jsphyg/tipping).\n", "\n", "Assuming we have seaborn, that's a dataset we can access locally." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 203 }, "id": "BtAf43Va20Iq", "outputId": "5e76994f-847a-4efd-c3db-8c2bb994a7d5" }, "outputs": [ { "data": { "text/html": [ "\n", " | total_bill | \n", "tip | \n", "sex | \n", "smoker | \n", "day | \n", "time | \n", "size | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "16.99 | \n", "1.01 | \n", "Female | \n", "No | \n", "Sun | \n", "Dinner | \n", "2 | \n", "
1 | \n", "10.34 | \n", "1.66 | \n", "Male | \n", "No | \n", "Sun | \n", "Dinner | \n", "3 | \n", "
2 | \n", "21.01 | \n", "3.50 | \n", "Male | \n", "No | \n", "Sun | \n", "Dinner | \n", "3 | \n", "
3 | \n", "23.68 | \n", "3.31 | \n", "Male | \n", "No | \n", "Sun | \n", "Dinner | \n", "2 | \n", "
4 | \n", "24.59 | \n", "3.61 | \n", "Female | \n", "No | \n", "Sun | \n", "Dinner | \n", "4 | \n", "
\n", " | Name | \n", "Park | \n", "State | \n", "Country | \n", "Duration | \n", "Speed | \n", "Height | \n", "VertDrop | \n", "Length | \n", "Yr_Opened | \n", "Inversions | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "Top Thrill Dragster | \n", "Cedar Point | \n", "Ohio | \n", "USA | \n", "60 | \n", "120.0 | \n", "420.0 | \n", "400.00 | \n", "2800.00 | \n", "2003 | \n", "0 | \n", "
1 | \n", "Superman The Escape | \n", "Six Flags Magic Mountain | \n", "California | \n", "USA | \n", "28 | \n", "100.0 | \n", "415.0 | \n", "328.10 | \n", "1235.00 | \n", "1997 | \n", "0 | \n", "
2 | \n", "Millennium Force | \n", "Cedar Point | \n", "Ohio | \n", "USA | \n", "165 | \n", "93.0 | \n", "310.0 | \n", "300.00 | \n", "6595.00 | \n", "2000 | \n", "0 | \n", "
3 | \n", "Goliath | \n", "Six Flags Magic Mountain | \n", "California | \n", "USA | \n", "180 | \n", "85.0 | \n", "235.0 | \n", "255.00 | \n", "4500.00 | \n", "2000 | \n", "0 | \n", "
4 | \n", "Titan | \n", "Space World | \n", "Kitakyushu | \n", "Japan | \n", "180 | \n", "71.5 | \n", "166.0 | \n", "178.00 | \n", "5019.67 | \n", "1994 | \n", "0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
71 | \n", "Oblivion | \n", "Alton Towers | \n", "Alton | \n", "England | \n", "75 | \n", "68.0 | \n", "65.0 | \n", "180.00 | \n", "1222.00 | \n", "1998 | \n", "0 | \n", "
72 | \n", "Stunt Fall | \n", "Warner Bros. Movie World | \n", "San Martin de la Vega | \n", "Spain | \n", "92 | \n", "65.6 | \n", "191.6 | \n", "177.00 | \n", "1204.00 | \n", "2002 | \n", "3 | \n", "
73 | \n", "Hayabusa | \n", "Tokyo SummerLand | \n", "Tokyo | \n", "Japan | \n", "108 | \n", "60.3 | \n", "137.8 | \n", "124.67 | \n", "2559.10 | \n", "1992 | \n", "0 | \n", "
74 | \n", "Top Gun | \n", "Paramount Canada's Wonderland | \n", "Vaughan | \n", "Canada | \n", "125 | \n", "56.0 | \n", "102.0 | \n", "93.00 | \n", "2170.00 | \n", "1995 | \n", "5 | \n", "
75 | \n", "Wild Beast | \n", "Paramount Canada's Wonderland | \n", "Vaughan | \n", "Canada | \n", "150 | \n", "56.0 | \n", "415.0 | \n", "78.00 | \n", "3150.00 | \n", "1981 | \n", "0 | \n", "
76 rows × 11 columns
\n", "\n", " | Name | \n", "Park | \n", "State | \n", "Country | \n", "
---|---|---|---|---|
0 | \n", "Tower of Terror | \n", "Dreamworld | \n", "Coomera | \n", "Australia | \n", "
1 | \n", "Top Gun | \n", "Paramount Canada's Wonderland | \n", "Vaughan | \n", "Canada | \n", "
2 | \n", "Wild Beast | \n", "Paramount Canada's Wonderland | \n", "Vaughan | \n", "Canada | \n", "
3 | \n", "Oblivion | \n", "Alton Towers | \n", "Alton | \n", "England | \n", "
4 | \n", "Fujiyama | \n", "Fuji-Q Highlands | \n", "FujiYoshida-shi | \n", "Japan | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
71 | \n", "Alpengeist | \n", "Busch Gardens Williamsburg | \n", "Virginia | \n", "USA | \n", "
72 | \n", "Apollo's Chariot | \n", "Busch Gardens Williamsburg | \n", "Virginia | \n", "USA | \n", "
73 | \n", "HyperSonic XLC | \n", "Paramount's Kings Dominion | \n", "Virginia | \n", "USA | \n", "
74 | \n", "Volcano | \n", "Paramount's Kings Dominion | \n", "Virginia | \n", "USA | \n", "
75 | \n", "Coaster Thrill Ride | \n", "Puyallup Fair | \n", "Washington | \n", "USA | \n", "
76 rows × 4 columns
\n", "Review: Python's Context Manager Construct
\n", "\n", "We went down this rabbit hole at least once. Or call it \"following a trail\" (they all branch into each other, in a large dark forest).\n", "\n", "The context manager is part of everyday Python. There's a `contextlib` library that leverages their power." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "class CM:\n", " \n", " def __init__(self, myname): # initializer\n", " self.name = myname\n", " \n", " def __enter__(self):\n", " print(\"with me('') as it: do something with it\")\n", " return self\n", " \n", " def __exit__(self, *oops):\n", " print(\"done doing\")" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "with me('') as it: do something with it\n", "I'm inside the context -- __enter__ has run\n", "talker\n", "I will exit now...\n", "done doing\n" ] } ], "source": [ "with CM(\"talker\") as it:\n", " print(\"I'm inside the context -- __enter__ has run\")\n", " print(it.name)\n", " print(\"I will exit now...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets reuse that same basic skeleton to talk to databases." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "import sqlite3 as sql\n", "\n", "class CM:\n", " \n", " def __init__(self, myname):\n", " self.name = myname\n", " \n", " def __enter__(self):\n", " try:\n", " self.conn = sql.connect(self.name)\n", " except:\n", " print(\"No connection\")\n", " raise\n", " return self\n", " \n", " def __exit__(self, *oops): # trap error data if any\n", " self.conn.close()\n", " if oops[0]: # hoping oops is (None, None, None)\n", " return False # something exceptional happened\n", " return True # all OK" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Afterburner\n", "Alpengeist\n", "American Eagle\n", "Apollo's Chariot\n", "Batman Knight Flight\n", "Beast\n", "Blue Streak\n", "Boss\n", "Cannon Ball\n", "Canyon Blaster\n", "Chang\n", "Cheetah\n", "Coaster Thrill Ride\n", "Colossus\n", "Comet\n", "Corkscrew\n", "Deja Vu\n", "Desperado\n", "Fujiyama\n", "Goliath\n", "Great American Scream Machine\n", "Hangman\n", "Hayabusa\n", "Hercules\n", "Hurricane\n", "HyperSonic XLC\n", "Incredible Hulk\n", "Invertigo\n", "Iron Wolf\n", "Kong\n", "Kraken\n", "Magnum XL-200\n", "Mamba\n", "Manhattan Express\n", "Mean Streak\n", "Medusa\n", "Millennium Force\n", "Mind Eraser\n", "New Mexico Rattler\n", "Nitro\n", "Oblivion\n", "Orient Express\n", "Phantom's Revenge\n", "Raging Bull\n", "Rattler\n", "Riddler's Revenge\n", "Scream!\n", "Screamin' Eagle\n", "Silver Bullet\n", "Son Of Beast\n", "Starliner\n", "Steel Dragon 2000\n", "Steel Eel\n", "Steel Force\n", "Stunt Fall\n", "Superman - Ride Of Steel\n", "Superman The Escape\n", "T2\n", "Tennessee Tornado\n", "Texas Giant\n", "Thunder Dolphin\n", "Thunderbolt\n", "Timber Wolf\n", "Titan\n", "Top Gun\n", "Top Thrill Dragster\n", "Tower of Terror\n", "Viper\n", "Volcano\n", "Whizzer\n", "Wild Beast\n", "Wild One\n", "Wild Thing\n", "Wildfire\n", "X\n", "Xcelerator\n", 
"--- GN\n", "--- Connection closed\n" ] }, { "data": { "text/plain": [ "