{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> All content here is under a Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) and all source code is released under a [BSD-2 clause license](https://en.wikipedia.org/wiki/BSD_licenses).\n", ">\n", ">Please reuse, remix, revise, and [reshare this content](https://github.com/kgdunn/python-basic-notebooks) in any way, keeping this notice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Course overview\n", "\n", "This is the first module of several (11, 12, 13, 14, 15 and 16), which refocuses the course material in the [prior 10 modules](https://github.com/kgdunn/python-basic-notebooks) in a slightly different way. It places more emphasis on\n", "\n", "* dealing with data: importing, merging, filtering;\n", "* calculations from the data;\n", "* visualization of it.\n", "\n", "In short: ***how to extract value from your data***." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 11: Overview\n", "\n", "This is the first of 6 modules. We cover\n", "\n", "* Printing output to the screen\n", "* Creating variables\n", "* Types of variables\n", "* Basic calculations with variables\n", "* Lists\n", "* Tips on commenting your code and choosing variable names\n", "\n", "**Requirements before starting**\n", "\n", "* Have a basic Python installation that works as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Printing to the screen\n", "\n", "In all the cases below, we show an example. Copy these into the empty cell below, edit the code where necessary, then hit the Run button (or Ctrl-Enter)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print('Hi, my name is ______.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "long_string = \"\"\"If you really want to write paragraphs,\n", "and paragraphs of text, you do it with the triple quotes. Try it\"\"\"\n", "\n", "print(long_string)\n", "```\n", "\n", "* Verify how the above variable ``long_string`` will be printed. Does Python put a line break where you expect it?\n", "* Can you use single quotes instead of double quotes?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also create longer strings in Python using the bracket construction. Try this:\n", "\n", "```python\n", "print('Here is my first line.',\n", "      'Then the second.',\n", "      'And finally a third.',\n", "      'But did you expect that?')\n", "```\n", "The reason for this is stylistic. Python, unlike many other languages, has a set of recommended style rules (PEP 8), \n", "which we will introduce throughout these modules. One of these rules is that you don't exceed 79 characters per line (more recently we see source code going to 99 characters per line as a guide).\n", "\n", "It helps to keep your code width narrow: you can then put two or three code files side-by-side on a widescreen monitor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating variables\n", "\n", "\n", "We already saw above how a variable was created: ``long_string = \"\"\"If you really..... Try it.\"\"\"``.\n", "\n", "You've created variables plenty of times in other programming languages, almost always with an \"=\". 
We prefer to refer to \"=\" as the \"assignment\" operator, not as \"equals\".\n", "\n", "What goes on the left hand side of the assignment must be a '*valid variable name*'.\n", "\n", "Which of the following are valid variable names, or valid ways to create variables in Python?\n", "\n", "```python\n", "my_integer = 3.1428571 \n", "_my_float = 3.1428571 # variables like this have a special use in Python\n", "__my_float__ = 3.1428571 # variables like this have a special use in Python\n", "€value = 42.95 \n", "cost_in_€ = 42.95\n", "cost_in_dollars = 42.95 \n", "42.95 = cost_in_dollars \n", "dollar.price = 42.95 \n", "favourite#tag = '#like4like'\n", "favourite_hashtag = '#일상'\n", "x = y = z = 1\n", "a, b, c = 1, 2, 3 # tuple assignment\n", "a, b, c = (1, 2, 3)\n", "i, f, s = 42, 12.94, 'spam'\n", "from = 'my lover'\n", "raise = 'your glass'\n", "pass = 'you shall not'\n", "fail = 'you will not'\n", "True = 'not False'\n", "pay day = 'Thursday'\n", "NA = 'not applicable' # for R users\n", "a = 42; # for MATLAB users: semi-colons are never required in Python\n", "A = 13 # like most languages, Python is also case-sensitive \n", "```\n", "\n", "What's the most interesting idea/concept you learned from the above examples?\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable types\n", "\n", "\n", "Do you know C, C++ or Java? In those languages each variable must have a ``type``, which must match what is on the right hand side of the \"=\" sign. In these languages, you **must** write something like:\n", "\n", "```c\n", "int a, b; // first declare your variables\n", "float result;\n", "a = 5; // then you get to use them\n", "b = 2;\n", "result = a / b; // careful: \"a / b\" is integer division here, so \"result\" is 2.0, not 2.5\n", "```\n", "\n", "**It is different in Python**, where there is _dynamic typing_. 
Python figures it out from the context:\n", "```python\n", "a = 5\n", "b = 25.1\n", "result = a / b\n", "```\n", "\n", "Repeat these lines of Python code below, then add the following:\n", "```python\n", "type(a)\n", "type(result)\n", "```\n", "\n", "What is the output you see?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each variable always has a **type**. Usually you know what the type is, because you created the variable yourself at some point.\n", "\n", "But on occasion you use someone else's code and you get back an answer that you don't know the type of. Then it is useful to check it with the ``type(...)`` function.\n", "\n", "Try these lines in Python:\n", "```python\n", "type(99)\n", "type(99.)\n", "type(9E9)\n", "type('It\\'s raining cats and dogs today!') # How can you rewrite this line better?\n", "type(r'Brexit will cost you £8E8. Thank you.')\n", "type(['this', 'is', 'a', 'vector', 'of', 7, 'values'])\n", "type([])\n", "type(4 > 5)\n", "type(True)\n", "type(False)\n", "type(None)\n", "type({'this': 'is what is called a', 'dictionary': 'variable!'}) # we learn about dictionaries later\n", "type(('this', 'is', 'called', 'a', 'tuple')) # tuples are another data type in Python\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can convert most variables to a string type, as follows: ``str(...)``\n", "\n", "Try these conversions to make sure you get what you expect:\n", "```python\n", "str(45)\n", "type(str(45))\n", "str(92.5)\n", "str(None)\n", "str(print)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculations with variables\n", "\n", "The next step is to perform some calculations with the 
variables. \n", "\n", "The standard arithmetic operators exist in Python:\n", "\n", "| Operation | Symbol |\n", "|----------------|--------|\n", "| Addition | + |\n", "| Subtraction | - |\n", "| Multiplication | * |\n", "| Division | / |\n", "| Power of | ** |\n", "\n", "\n", "Please note: the \"power of\" operator is ``**``, not ``^``. This can mislead you, because in Python ``^`` is the bitwise XOR operator. Try this:\n", "* ``print(2 ^ 4)``\n", "* ``print(2**4)``" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the above, use Python as a calculator to find the values of these expressions:\n", "\n", "If ``a = 5`` and ``b = 9``\n", "\n", "* ``a / b``\n", "* What type is the result of the above expression?\n", "* ``a * b`` \n", "* What type is the result of the above expression?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The distance $d$ travelled by an object falling for time $t$, given in seconds, is $$d=\\frac {1}{2}gt^{2}$$ where $g$ is the acceleration due to gravity, $9.8\\, \\text{m.s}^{-2}$. 
Calculate the distance that you will travel in free fall in 10 seconds:\n", "\n", "```python\n", "t = ____ # seconds\n", "d = ____ # meters\n", "print('The distance fallen is ' + str(d) + ' meters.')\n", "\n", "# The better way to do the above in recent versions of Python is to use an \"f-string\" (format string):\n", "print(f'The distance fallen is {d} meters after {t} seconds.')\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try it now the other way around: the time taken for an object to fall is: $$ t= \\sqrt {\\frac {2d}{g}}$$\n", "\n", "We will introduce the ``sqrt`` function in the next section, but for now you can also calculate the square root using a power of 0.5: as in $\\sqrt{x} = x^{0.5}$.\n", "\n", "Using that knowledge, how long will it take for an object to fall from the top of the building you are currently in?\n", "\n", "```python\n", "# Creates a string value in variable 'd'. Verify that it is a string type.\n", "d = input('The height of the building, in meters, which I am currently in is: ') \n", "d = float(d) # convert the string variable to a floating point value\n", "t = ____ # seconds\n", "\n", "# You might also want to investigate the \"round\" function at this point\n", "# to improve the output for variable t.\n", "\n", "print('The time for an object to fall from a building',\n", "      'that is ' + str(d) + ' meters tall is ' + str(t) +\n", "      ' seconds.')\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python, like other languages, has order-of-operations rules (similar to the PEMDAS rules you might have learned in school):\n", "\n", "1. **P**arentheses (round brackets)\n", "2. **E**xponentiation or powers, right to left (so ``2 ** 3 ** 2`` means ``2 ** (3 ** 2)``)\n", "3. **M**ultiplication and **D**ivision, left to right\n", "4. 
**A**ddition and **S**ubtraction, left to right\n", "\n", "So what are the results of these statements?\n", "```python\n", "a = 1 + 3 ** 2 * 4 / 2\n", "b = 1 + 3 ** (2 * 4) / 2\n", "c = (1 + 3) ** 2 * 4 / 2\n", "```\n", "\n", "While it is good to know these rules, the general advice is to always use brackets to clearly show your actual intention. \n", "> Never leave the reader of your code guessing: someone will have to maintain your code after you, including yourself, a few months or years later 😉 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Test yourself*: Write code for the following: \n", "\n", "> Divide the sum of ``a`` and ``b`` by the product of ``c`` and ``d``, and store the result in ``x``.\n", "\n", "You can start with the code below, and edit it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a, b, c, d = 2, 3, 5, 6\n", "# write your code here\n", "x = _\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above operators return results which are either ``int`` or ``float``. \n", "\n", "There is another set of operators which return ***bool***ean values: ``True`` or ``False``. We will use these frequently when we make decisions in our code. For example:\n", "> if \\_\\_<condition> \\_\\_ then \\_\\_<action\\>\\_\\_\n", "\n", "We cover **if-statements** in a later module.\n", "\n", "But for now, try out these ``condition`` statements:\n", "\n", "```python\n", "3 < 5\n", "5 < 3\n", "4 <= 4\n", "4 <= 9.2\n", "5 == 5\n", "5. == 5 # float on the left, and int on the right. Does it matter?\n", "5. 
!= 5 # does that make sense?\n", "True != False\n", "False < True\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Related to these operators are some others, which you can use to combine conditions: ``and``, ``or`` and ``not``.\n", "\n", "Try these out. What do you get in each case?\n", "\n", "```python\n", "True and not False\n", "True and not(False)\n", "True and True\n", "not(False) or False\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the quadratic equation $$ax^2 + bx + c=0$$ the shortcut solution is given by $$ x= -\\frac{b}{2a}$$\n", "but only if two conditions are met: $b^2 - 4ac=0$ and $a \\neq 0$.\n", "\n", "Verify whether you can use this shortcut solution for these combinations:\n", "\n", "* ``a, b, c = 3, -1, 2 # using tuple-assignment here to create these 3 variables in 1 line of code!``\n", "* ``a, b, c = 0, -1, 2`` \n", "* ``a, b, c = 3, 6, 3`` \n", "\n", "Write a single line of Python code that will return ``True`` if you can use the shortcut, or ``False`` if you cannot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Built-in constants and mathematical functions\n", "\n", "You will certainly need to calculate logs, exponentials, square roots, or require the value of $e$ or $\\pi$ at some point.\n", "\n", "In this section we get a bit ahead, and load a Python library to provide these for us. Libraries, as we will see later, are collections of functions and variables that pre-package useful tools. 
Libraries can be large collections of code, and are for special purposes, so they are not loaded automatically when you launch Python.\n", "\n", "\n", "In MATLAB you can think of *Toolboxes* as being equivalent; in R you have *Packages*; in C++ and Java you also use the word *Library* for the equivalent concept.\n", "\n", "In Python, there are several libraries that come standard, and one is the ``math`` library. Use the ``import`` command to load the library. The ``math`` library can be used as follows:\n", "\n", "```python\n", "import math\n", "radius = 3 # cm\n", "area_of_circle = math.pi * radius**2\n", "print('The area of the circle is ' + str(area_of_circle))\n", "```\n", "\n", "Now that you know how to use the ``math`` library, it is worth searching for what else is in it:\n", "\n", "> https://www.google.com/search?q=python+math+library\n", "\n", "All built-in Python libraries are documented in the same way. Searching this way usually brings up the link near the top. Make sure you look at the documentation for Python version 3.x.\n", "\n", "Now that you have the documentation ready, use functions from the ``math`` library to calculate:\n", "\n", "* the *ceiling* of a number, for example ``a = 3.7``\n", "* the *floor* of a number, for example ``b = 3.7``\n", "* the *absolute* value of ``c = -2.9``\n", "* the log to the base $e$ of ``d = 100``\n", "* the log to the base 10 of ``e = 100``\n", "* the Golden ratio ${\\dfrac {1+{\\sqrt {5}}}{2}}$ \n", "* check that the factorial $9! = 9 \\times 8 \\times 7 \\ldots \\times 1$ is equal to 362880\n", "* and finally, check that Stirling's approximation for a factorial, $n! \\approx \\sqrt{2\\pi n} \\cdot n^n e^{-n}$, closely matches the true value of $9!$. You will use 4 different names from the ``math`` library to calculate this: ``math.sqrt``, ``math.pi``, ``math.exp`` and ``math.pow``.\n", "\n", "```python\n", "\n", "print('The true value of 9! 
is ' + ___ + ', while the Stirling approximation is ' + ___)\n", "```\n", "\n", "* verify that the cosine of ``g`` = $2\\pi$ radians is indeed 1.0\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mini-exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The population of a country can be approximated by the formula $$ p(t) = \\dfrac{197\\,273\\,000}{1 + e^{-0.03134(t - 1913)}}$$\n", "where the time $t$ is in years.\n", "\n", " * What is the population in 1913?\n", " * What is the population in 2013?\n", " * Does the population grow without bound, or does it reach a steady state, eventually stabilizing at some constant value?\n", " \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists\n", "\n", "We will cover creating, adding to, accessing and using lists of objects.\n", "\n", "A list is a basic Python type: it is a collection of objects.\n", "\n", "Create a list with the square bracket characters: ``[`` and ``]``.\n", "\n", "For example: ``words = ['Mary', 'loved', 'chocolate.']``\n", "\n", "Try it: one of the most useful functions in Python is ``len(...)``. Verify that ``len(words)`` returns an integer value of 3. Does it have the **type** you expect?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The entries in the list can be mixed types (contrast this with many other programming languages, where all entries in a list or array must have the same type!)\n", " \n", "```python\n", "group = ['yeast', 'bacillus', 994, 'aspergillus' ]\n", "```\n", "\n", "An important test is to check if the list contains something. 
Try these pieces of code below.\n", "\n", "```python\n", "'aspergillus' in group\n", "499 in group\n", "```\n", "\n", "As with strings, you can use the ``*`` and ``+`` operators:\n", "\n", "```python\n", "group * 3\n", "group + group # might not do what you expect!\n", "group - group # oooops\n", "```\n", "\n", "And as with strings, you access the entries by position, counting from 0:\n", "```python\n", "group[0]\n", "\n", "# but this is also possible:\n", "group[-3]\n", "\n", "# however, is this expected?\n", "group[4]\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists also have some methods that you can use. Lists in fact have far fewer methods than strings. To get a list of methods: \n", "\n", "```python\n", "dir(___) # fill in an example of the object whose methods you want to list\n", "dir('sometext')\n", "dir([]) # even an empty list is OK\n", "```\n", "\n", "How many methods do you see which you can apply to a list? \n", "\n", "Let's try a few of them out:\n", "1. Try to ``append`` a new entry to the ``group`` list you created above: add the entry \"Candida albicans\".\n", "1. Create a new list ``reptiles = ['crocodile', 'turtle']`` and then try: ``group.extend(reptiles)``.\n", "1. Print the list. Remove the ``crocodile`` entry from the list. Print it again to verify it succeeded. \n", "1. Now try to remove the entry again. What happens?\n", "1. Use the following command: ``group.reverse()``, and print the ``group`` variable to the screen.\n", "1. Now try this instead: ``group = group.reverse()`` and print the ``group`` variable to the screen. What happened this time?\n", "1. So you are back to square one: make a new list variable ``group = ['yeast', 'bacillus', 'aspergillus' ]`` and try ``group.sort()``. 
Notice that ``.sort()``, like the ``.reverse()`` method, operates *in-place*: there is no need to assign the output of the action to a new variable. In fact, if you do, that variable will contain ``None``.\n", "1. Here's something to be aware of: create ``group = ['yeast', 'bacillus', 994, 'aspergillus' ]``; and now try ``group.sort()``. What does the error message tell you?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists behave like a stack, or a queue: you can add things to the end of the list using ``.append()`` and you can remove the last one again with ``.pop()``.\n", "\n", "Think of a stack of plates: last appended, first removed.\n", "\n", "Try it:\n", "```python\n", "species = ['chimp', 'bacillus', 'aspergillus']\n", "species.append('hoooman')\n", "first_out = species.pop()\n", "print(first_out)\n", "```\n", "* What is the length of the list after running this code?\n", "* Try adding a new entry ``arachnid`` between ``chimp`` and ``bacillus`` using the ``.insert()`` command. Print the list to verify it. \n", "> If you don't know how to use the ``.insert()`` method, but you know it exists, you can type ``help([].insert)`` at the command prompt to get quick help. Or you can search the web, which gives more comprehensive help, with examples.\n", "* First use the ``.index()`` function to find the index of \"bacillus\". Then use the ``.pop()`` method to remove it. In other words, do not hard-code the integer index that you give to ``.pop()``. Assign the popped entry to a new variable.\n", "* Overwrite the entry that is currently in the second position with a new value: \"neanderthalensis\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Commenting \n", "\n", "Comments are often as important as the code itself. 
But it takes time to write them.\n", "\n", " \n", "Comments should be added in these places and cases:\n", "* At the top of your file: name and date, and a few sentences on the purpose of the code. It is also helpful to note which Python version you use, or expect.\n", "* Refer to any publications or internal company reports for algorithms implemented.\n", "* Refer to a website if you use any interesting/unusual shortcut code or non-obvious code. This is more for yourself, and your future colleagues. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable names\n", "The choice of variable names is related to the topic of comments. In many ways, the syntax of Python makes the code self-documenting, meaning you often need few comments. But this is definitely helped by choosing meaningful variable names:\n", "\n", "```python\n", "for genome in genome_list:\n", "    command_to_do_something_with_genome_goes_here\n", "```\n", "\n", "This quite clearly shows that we are iterating over all the genomes in some iterable container (it could be a list, tuple, or set, for example) of sequenced genomes.\n", "\n", "Now compare it with this code:\n", "\n", "```python\n", "for k in seq:\n", " \n", "```\n", "\n", "It is not clear what ``k`` represents. It is not clear what ``seq`` is either. 
Choosing good variable names helps the reader.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "toc": { "base_numbering": "1", "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "349px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }