{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Oregon Curriculum Network](http://www.4dsolutions.net/ocn)
\n", "[Discovering Math with Python](Introduction.ipynb)\n", "\n", "\n", "# Chapter 1: WELCOME TO PYTHON\n", "\n", "Have a seat, relax. Enjoy learning new things. \n", "\n", "Let's quote from the [Python website](https://www.python.org/doc/essays/blurb/), why not, on what Python actually is:\n", "\n", "\n", "### What is Python? Executive Summary\n", "\n", "Python is an interpreted, object-oriented, high-level programming language with dynamic \n", "semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.\n", "\n", "Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.\n", "\n", "Python supports modules and packages, which encourages program modularity and code reuse. \n", "The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.\n", "\n", "### Who uses Python? \n", "\n", "Let's check [some testimonials](https://www.python.org/about/quotes/).\n", "\n", "Is Python already installed on your computer? If so, [where does it live](https://docs.python.org/3/tutorial/interpreter.html)?" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# what do they mean by \"high-level built in data structures\"? 
\n", "# Let's see (notice how the pound sign starts a comment)\n", "\n", "# square brackets for a list (like an array)\n", "zoo = [\"monkey\", \"bear\", \"otter\"]\n", "\n", "# curly braces with key:value pairs, for a dict (lookup table)\n", "ages = {\"monkey\": 3, \"bear\": 2.5, \"otter\": 1.5}\n", "\n", "# curly braces with elements for a set\n", "tools = {\"hammer\", \"screwdriver\", \"drill\", \"wrench\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Think of data structures as tools. You'll use them to store and retrieve values, and to implement more complicated procedures, or algorithms. Data structures hold data.\n", "\n", "As a preview of how we might get that data in the first place, let's skip ahead and start using Python's libraries. \n", "\n", "In fact, we'll soon go outside the Standard Library and use a third-party tool, pandas. [Here's a link](http://shop.oreilly.com/product/0636920023784.do) to a well-known book: *Python for Data Analysis* by Wes McKinney, who started pandas. You can also [watch him on YouTube](https://youtu.be/wdmf1msbtVs).\n", "\n", "First, though: a full-blown Python ships with a Standard Library, a collection of modules (\"namespaces\") that are just one import statement away." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "hypotenuse = math.hypot(3, 4) # try your own numbers here!\n", "hypotenuse" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "b'69206c696b6520656174696e6720707974686f6e206173206d7563682061732069206c696b6520636f64696e6720696e20707974686f6e'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import binascii\n", "in_hex = binascii.hexlify(b\"i like eating python as much as i like coding in python\")\n", "in_hex # show ascii bytes in terms of the underlying hex codes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, much of what makes Python so great comes from the modules and packages available from the Python repository (aka the \"Cheese Shop\"), more formally known as PyPI, the [Python Package Index](https://pypi.python.org/pypi). \n", "\n", "What you don't find in the Standard Library you can, in most cases, add to your Python installation using [pip3 install](https://docs.python.org/3/installing/index.html). \n", "\n", "If you've used Linux, then you might want to think of *pip3 install* as the *apt-get* of the Python universe. \n", "\n", "However, that's not the end of the story. Python distributions such as Anaconda provide their own way of updating and upgrading. You might also find yourself using Git.\n", "\n", "Let's skip ahead to where you've already installed pandas, enhancing your Python ecosystem with this powerful free tool." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "url = \"https://raw.githubusercontent.com/dariusk/corpora/master/data/animals/dinosaurs.json\"\n", "df = pd.read_json(url)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>description</th><th>dinosaurs</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>A list of dinosaurs.</td><td>Kangnasaurus</td></tr>\n",
"    <tr><th>1</th><td>A list of dinosaurs.</td><td>Lophostropheus</td></tr>\n",
"    <tr><th>2</th><td>A list of dinosaurs.</td><td>Spinophorosaurus</td></tr>\n",
"    <tr><th>3</th><td>A list of dinosaurs.</td><td>Epachthosaurus</td></tr>\n",
"    <tr><th>4</th><td>A list of dinosaurs.</td><td>Coelurosauria</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " description dinosaurs\n", "0 A list of dinosaurs. Kangnasaurus\n", "1 A list of dinosaurs. Lophostropheus\n", "2 A list of dinosaurs. Spinophorosaurus\n", "3 A list of dinosaurs. Epachthosaurus\n", "4 A list of dinosaurs. Coelurosauria" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head() # just show the first five rows of a much taller table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened just there? Pandas is all about DataFrame objects, which McKinney is hoping will be the basis of a more generalized object that works across computer languages, such as R, Python, and those using the Java Virtual Machine (JVM). \n", "\n", "We just created a DataFrame object by reading in data over the web, from a public stash of dinosaur names out there in the cloud, on GitHub. The first column isn't really adding any value, though. We know it's a list of dinosaurs; there's no need to say that over and over. A first step after harvesting raw data is usually cleaning and/or massaging it into the shape we need. Let's drop that \"description\" column..." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>dinosaurs</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Kangnasaurus</td></tr>\n",
"    <tr><th>1</th><td>Lophostropheus</td></tr>\n",
"    <tr><th>2</th><td>Spinophorosaurus</td></tr>\n",
"    <tr><th>3</th><td>Epachthosaurus</td></tr>\n",
"    <tr><th>4</th><td>Coelurosauria</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>\n", "
" ], "text/plain": [ " dinosaurs\n", "0 Kangnasaurus\n", "1 Lophostropheus\n", "2 Spinophorosaurus\n", "3 Epachthosaurus\n", "4 Coelurosauria" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop('description', axis=1, inplace=True) # you won't be able to run this twice, why?\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that our dataframe is this simple, might we convert it to a native list, the data structure we started out with, with the square brackets? Sure we might." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dinos = df[\"dinosaurs\"].tolist() # yep, it's that easy" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['Qiaowanlong',\n", " 'Rhynchosaur',\n", " 'Ningyuansaurus',\n", " 'Palaeolimnornis',\n", " 'Anabisetia',\n", " 'Talarurus',\n", " 'Sphenodontia',\n", " 'Tianyulong',\n", " 'Aepisaurus',\n", " 'Neuquenraptor']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dinos[10:20] # this is called \"slicing\", getting items 10 to 19" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1449" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dinos) # how many dinosaurs are we talking about actually?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow. 
\n", "\n", "Yes, you could start using this list to harvest pictures, for example, maybe starting with a [Dinosaur Database](http://dinosaurpictures.org/Spinophorosaurus-pictures).\n", "\n", "![Spinophorosaurus](http://images.dinosaurpictures.org/Spinophorosaurus_NT_5d92.jpg \"Spinophorosaurus\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dinos.sort() # I notice these are not alphabetized. We might sort in place." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['Zhejiangosaurus',\n", " 'Zhongyuansaurus',\n", " 'Zhuchengceratops',\n", " 'Zhuchengosaurus',\n", " 'Zhuchengtyrannus',\n", " 'Zigongosaurus',\n", " 'Zizhongosaurus',\n", " 'Zuniceratops',\n", " 'Zuolong',\n", " 'Zupaysaurus']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dinos[-10:] # now let's look from the 10th from the end, to the end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slice notation is important because it's used with numpy arrays and pandas DataFrames as well as with ordinary Python lists. Numpy arrays are like Python lists on steroids, meaning they have enhanced capabilities and may have multiple dimensions.\n", "\n", "Let's go back to an ordinary list and test this feature some more." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['bear', 'otter']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zoo[1:] # all but 0th element (addressing begins with 0)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'otter'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zoo[-1] # last item in the list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's end this section with a quick look at a numpy array. Python lists may be \"heterogeneous\", meaning their elements may be of many different types. Numpy needs all the elements of an array to be of the same type, whatever type that may be (floats, ints, and complex numbers are all typical)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np # notice how we rename the module as we import it\n", "test_data = np.random.randint(1, 100, size=(5, 5)) # all integers, in a 5x5 matrix" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[41, 40, 75, 45, 18],\n", " [80, 51, 14, 44, 62],\n", " [88, 33, 33, 65, 15],\n", " [61, 60, 78, 9, 64],\n", " [36, 65, 34, 37, 7]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_data" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[1681, 1600, 5625, 2025, 324],\n", " [6400, 2601, 196, 1936, 3844],\n", " [7744, 1089, 1089, 4225, 225],\n", " [3721, 3600, 6084, 81, 4096],\n", " [1296, 4225, 1156, 1369, 49]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_data ** 2 # we can raise 
all these numbers to a 2nd power in one line!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using numpy arrays can save a lot of looping code. As an example of that, you might want to eyeball [this Notebook](https://github.com/4dsolutions/Python5/blob/master/Mandelbrot%20Set.ipynb) on making fractals (the Mandelbrot Set in particular).\n", "\n", "## Versions of Python\n", "\n", "Python is still advancing, and any given Notebook or tutorial tends to use a Python version that was recent when it was written. When I first wrote these Notebooks, Python 3.6 was the default kernel.\n", "\n", "Will there be a Python 4? Let's [listen to Nick Coghlan](http://www.curiousefficiency.org/posts/2014/08/python-4000.html), one of Python's core developers:\n", "\n", "#### What are the current expectations for Python 4.0?\n", "\n", "My current expectation is that Python 4.0 will merely be \"the release that comes after Python 3.9\". That's it. No profound changes to the language, no major backwards compatibility breaks - going from Python 3.9 to 4.0 should be as uneventful as going from Python 3.3 to 3.4 (or from 2.6 to 2.7). I even expect the stable Application Binary Interface (as first defined in PEP 384) to be preserved across the boundary.\n", "\n", "Here's my picture of Guido (middle) at PyCon 2017, held in Portland, my city of residence at that time. -- Kirby\n", "\n", "\"BDFL\"\n", "\n", "Continue to Chapter 2: [Functions At Work](Functions%20At%20Work.ipynb)
\n", "[Introduction](Introduction.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }