{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started\n", "\n", "[//]: # (This is a markdown comment.)\n", "\n", "\n", "You are expected to have strong familiarity with [git](https://git-scm.com/) and know the [basics of python](https://learnxinyminutes.com/docs/python3/). \n", "\n", "We do not expect you to have experience with Jupyter or writing educational resources.\n", "\n", "Note: this guide was not written for Jupyter lab. Jupyter lab is still in beta.\n", "\n", "As you run this notebook I want you to pay close attention to the state issues. Whether the code in the notebook cells runs successfully is dependant on the order in which you run the notebook cells. We want to mitigate this as much as possible so, unlike this notebook, I encourage you to avoid inter-cell dependencies. We are currently working on functions to improve stability." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Index\n", "\n", "+ [Basics](#Basics) \n", " + [Cell Inputs and Outputs](#Cell-Inputs-and-Outputs)\n", " \n", "+ [Importing Data](#Data)\n", "\n", "+ [Tricky Python](#Tricky-Python)\n", " + [Equality vs. Identity](#Equality-vs-Identity)\n", " + [Logical `and` and `or`](#Logical-Operators)\n", " + [`bool` is a subclass of `int`](#bool-is-a-subclass-of-int)\n", " + [Modifying a List in a Loop](#Modifying-a-List-in-a-Loop)\n", " + [Troubleshooting List Slicing](#Troubleshooting-List-Slicing)\n", " + [`*args` vs. `**kwargs`](#*args-vs-**kwargs)\n", "+ [Miscellaneous](#Misc)\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hotkeys are under the menu item: Help, Keyboard Shortcuts" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "?who # Put a ? in front of a word to access the Jupyter Docs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls # Putting ! in front of a command tells Jupyter that it is a bash command" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You will probably get permission errors when installing on the hub, use --user\n", "!pip3 install plotly --user;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cell Inputs and Outputs" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 + 4" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n" ] } ], "source": [ "print(2 + 4) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the print function does not produce a cell output.\n", "As seen below you can use a `;` to suppress output however this only works if there is a cell `Out[]` to suppress. This may seem simple but it can have important ramifications, for example when trying to capture the output of javascript visualizations." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(3 + 7);" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "3 + 7; " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_ + 5 # takes the output of the previous cell and adds 5. \n", "# It is not taking the result from 3 + 7; because that cell does not have output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can access cell outputs through the `_n` and `Out[n]` variables:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name '_4' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0m_4\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mOut\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;31m# If you get a \"name '_4' is not defined\" error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m# go back to the cell labelled In[4] and see if it has an output. How can you fix this issue?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# What happens if you rerun a cell you already ran?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name '_4' is not defined" ] } ], "source": [ "_4 == Out[4] \n", "# If you get a \"name '_4' is not defined\" error\n", "# go back to the cell labelled In[4] and see if it has an output. \n", "# How can you fix this issue?\n", "# What happens if you rerun a cell you already ran?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Be careful when hardcoding inputs/outputs into your notebooks because the indexing depends on the order in which you execute the cells.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are shorthands for the previous cells' output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('previous cells output:', _)\n", "print('the one before the previous cell :', __)\n", "print('the output three cells back :', ___)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly you can access the cell inputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "In[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('previous cells input:', _i)\n", "print('the one before the previous cell :', _ii)\n", "print('the input three cells back :', _iii)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%history # history of inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the %load magic for general imports.\n", "\n", "Pandas is important for data manipulation in Python, check out this [pandas guide](http://nbviewer.jupyter.org/github/rasbt/python_reference/blob/master/tutorials/things_in_pandas.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read remote files from an url" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This code will not actually run because foo is not a valid url. \n", "# You can try replacing foo.xlsx with https://education.alberta.ca/media/3680582/diploma-multiyear-auth-list-annual.xlsx\n", "\n", "excelUrl = 'foo.xlsx'\n", "csvUrl = 'bar.csv'\n", "\n", "pd.read_excel('url') \n", "pd.read_csv('url') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Advanced File Reading Example, csv into a pandas dataframe\n", "This will read through csv files whose urls differ by year, add them all to a pandas dataframe, and then select specific columns of the dataframe.\n", "All data here is open source." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame()\n", "\n", "startYear = 1995\n", "endYear = 1997 # The last year is not included, so if it was 2017 it means we include the 2016 collection but not 2017.\n", "\n", "for year in range(startYear, endYear):\n", " file = 'https://s3.ca-central-1.amazonaws.com/open-data-ro/NSERC/NSERC_GRT_FYR' + str(year) + '_AWARD.csv.gz'\n", " df = df.append(pd.read_csv(file, \n", " compression='gzip', # .gz file extension because it is compressed to speed up the transfer\n", " usecols = [1, 2, 3, 4, 5, 7, 9, 11, 12, 13, 17, 28], # only add these columns to the dataframe\n", " encoding='latin-1'\n", " )\n", " ) \n", " print(year) # Print each year as it reads it so you can see the progress.\n", " \n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head() # Show the contents at the head of the dataframe.\n", "#df # show the entire dataframe, commented out to keep things tidy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are unfamiliar with data science in general, you will have a huge learning curve that cannot be covered in a simple tutorial. Feel free to reach out on the developer slack channel if you need help. Google is your friend." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tricky Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Equality vs Identity\n", "\n", "\"==\" for equality, \"is\" for identity!" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is b, False\n", "a == b, True\n" ] } ], "source": [ "a = 'hello world'\n", "b = 'hello world'\n", "print('a is b,', a is b)\n", "print('a == b,', a == b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Identity does not imply equality." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is a, True\n", "a == a, False\n" ] } ], "source": [ "a = float('nan')\n", "print('a is a,', a is a)\n", "print('a == a,', a == a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Python keeps an array of small integer objects (i.e., integers between -5 and 256, [see the doc](https://docs.python.org/2/c-api/int.html#PyInt_FromLong))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "256 is 257-1 True\n", "257 is 258-1 False\n", "-5 is -6+1 True\n", "-7 is -6-1 False\n" ] } ], "source": [ "print('256 is 257-1', 256 is 257-1)\n", "print('257 is 258-1', 257 is 258 - 1)\n", "print('-5 is -6+1', -5 is -6+1)\n", "print('-7 is -6-1', -7 is -6-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Logical Operators\n", "\n", "`a or b == a if a else b`\n", "\n", "`a and b == b if a else a`" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 * 9 = 27\n" ] } ], "source": [ "result = (3 or 5) * (7 and 9)\n", "print('3 * 9 =', result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `bool` is a subclass of `int`" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "isinstance(True, int): True\n", "True + True: 2\n", "3*True + True: 4\n", "3*True - False: 3\n" ] } ], "source": [ "print('isinstance(True, int):', isinstance(True, int))\n", "print('True + True:', True + True)\n", "print('3*True + True:', 3*True + True)\n", "print('3*True - False:', 3*True - False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modifying a List in a Loop" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 3, 5]\n" ] } ], "source": [ "a = [1, 2, 3, 4, 5]\n", "for i in a:\n", " if not i % 2:\n", " a.remove(i)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4, 5]\n" ] } ], "source": [ "b = [2, 4, 5, 6]\n", "for i in b:\n", " if not i % 2:\n", " b.remove(i)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hopefully this example will make it clear why this happens:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 2\n", "1 5\n", "2 6\n", "[4, 5]\n" ] } ], "source": [ "b = [2, 4, 5, 6]\n", "for index, item in enumerate(b):\n", " print(index, item)\n", " if not item % 2:\n", " b.remove(item)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Troubleshooting List Slicing " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get an index error as expected:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mmy_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmy_list\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [ "my_list = [1, 2, 3, 4, 5]\n", "print(my_list[5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No index error, hard to troubleshoot:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "my_list = [1, 2, 3, 4, 5]\n", "print(my_list[5:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `*args` vs `**kwargs`\n", "Both of these are used to allow function inputs of arbitrary length.\n", "\n", "Here are the differences between the two of them:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "type of args: \n", "args contents: (0, 1, 'a', 'b', 'c')\n", "1st argument: 0\n" ] } ], "source": [ "def a_func(*args):\n", " print('type of args:', type(args))\n", " print('args contents:', args)\n", " print('1st argument:', args[0])\n", "\n", "a_func(0, 1, 'a', 'b', 'c')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "type of kwargs: \n", "kwargs contents: {'a': 1, 'b': 2, 'c': 3, 'd': 4}\n", "value of argument a: 1\n" ] } ], "source": [ "def b_func(**kwargs):\n", " print('type of kwargs:', type(kwargs))\n", " print('kwargs contents: ', kwargs)\n", " print('value of argument a:', kwargs['a'])\n", " \n", "b_func(a=1, b=2, c=3, d=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Misc\n", "You won't necessarily need this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Select Jupyter Cells](https://stackoverflow.com/questions/39613822/how-to-select-current-cell-with-javascript-in-jupyter)\n", "\n", "[What is the difference between window, screen, and document in Javascript?](https://stackoverflow.com/questions/9895202/what-is-the-difference-between-window-screen-and-document-in-javascript)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%javascript\n", "document.getElementById('').contentWindow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read in an HTML file\n", "from IPython.display import HTML\n", "with open('index.html', 'r') as f:\n", " inputForm = f.read()\n", "HTML(inputForm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# JS variable that python can access\n", "IPython.notebook.kernel.execute('query = \"AU='.concat(typed, '\"'));" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Iterate through and execute the jupyter cells in order. Does it by index, not necessarily starting with the top cell.\n", "# textContent call to figure out which cell has the content you want to run\n", "for (i = 1; i < 15; i++) { \n", " IPython.notebook.get_cell(i).execute();\n", " }" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }