{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> All content here is under a Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) and all source code is released under a [BSD-2 clause license](https://en.wikipedia.org/wiki/BSD_licenses).\n", ">\n", ">Please reuse, remix, revise, and [reshare this content](https://github.com/kgdunn/python-basic-notebooks) in any way, keeping this notice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 2: Overview\n", "\n", "We cover a diverse range of topics:\n", "\n", "* strings,\n", "* lists [similar to *vectors* in MATLAB, or *arrays* in C or Java], and\n", "* for-loops.\n", "\n", "They seem unrelated, but they hang together conceptually: they are all about sequences, or collections, of items. A string is a sequence of characters, a list is a sequence of items, and a loop processes a sequence one item at a time. We will formally compare all sequence types later. For now, let us just use them.\n", "\n", "At the end, and in between these sections, we will cover some topics related to commenting.\n", "\n", "## Preparing for this module\n", "\n", "You should work through these resources (it can take quite some time!):\n", "\n", "* https://runestone.academy/runestone/static/fopp/Sequences/toctree.html and go through all of chapter 6. You can code interactively on this website. Please also answer the \"Check your understanding\" questions as you go.\n", "* https://runestone.academy/runestone/static/fopp/Iteration/toctree.html and go through all of chapter 7, skipping section 7.8 unless you are interested in image analysis 😊\n", "* https://runestone.academy/runestone/static/fopp/Files/toctree.html and only complete up to section 10.5 (reading a file). 
We will cover writing to files in a later session.\n", "* https://www.w3schools.com/python/python_lists.asp and work through the examples on lists.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strings\n", "\n", "Strings are some of the simplest objects in Python. In the [prior module](https://yint.org/pybasic01) you created several strings. Now create this string in Python:\n", "\n", "```python\n", "s = \"\"\"Secretly under development for the past three years, Bezos said the \n", "\"Blue Moon\" lander, using a powerful new hydrogen-powered engine generating up\n", "to 10,000 pounds of thrust, will be capable of landing up to 6.5 metric tons \n", "of equipment on the lunar surface.\"\"\"\n", "```\n", "\n", "Now use the above string to perform the following actions. Look up the Python Standard Library documentation for strings (the ``str`` type), like we showed last time, to find the methods required.\n", "\n", "1. Print it to the screen entirely in upper case.\n", "1. Print it to the screen with the lower- and upper-case characters swapped around.\n", "1. Try the following: ``print(s * 8)``.\n", "1. Try the following: ``print(s + s)``. *Do these two mathematical operations make sense for strings?*\n", "1. What is the length of this string?\n", "1. How many times does the word \"the\" appear in the string?\n", "1. At which position in the string does the word ``Secretly`` appear? *How does this differ from MATLAB?*\n", "1. At which position in the string does the word ``Bezos`` appear?\n", "1. Return a boolean (``True`` or ``False``) indicating whether the string ``endswith`` a full stop.\n", "1. Return the string with the instance of 'hydrogen' replaced by 'nuclear'.\n", "1. Replace every space in the above sentence with a newline character, and reprint the sentence to the screen."
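, "\n", "\n", "For reference, here is a minimal sketch of how several of the ``str`` methods needed above behave. It deliberately uses a different, made-up string called ``text`` (an assumption for illustration, not part of the exercise), so the questions about ``s`` are still yours to answer.

```python
# A made-up example string, separate from the exercise string s
text = 'The rover landed near the crater.'

print(text.upper())                     # every letter in upper case
print(text.swapcase())                  # lower and upper case swapped around
print(len(text))                        # 33: spaces and the full stop are counted too
print(text.count('the'))                # 1: matching is case-sensitive, so 'The' is not counted
print(text.find('rover'))               # 4: positions are counted from 0, not from 1
print(text.endswith('.'))               # True
print(text.replace('crater', 'ridge'))  # returns a new string with the word replaced
```

Each of these returns a new value; the original ``text`` is left unchanged."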
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of the above are done using what are called ***methods***.\n", "\n", "> A method is an *attribute* of an *object* that you can call, like a function.\n", "\n", "In the above, the string ``s`` is your *object*, and objects have one or more attributes.\n", "\n", "Some tips:\n", "\n", "* You can get a **list** [we cover lists next!] of all attributes using the ``dir(...)`` function.\n", "\n", "```python\n", "s = \"\"\"Secretly under development for ... the lunar surface.\"\"\"\n", "dir(s)\n", "['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', \n", "'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',\n", "'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', \n", "'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', \n", "'__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', \n", "'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', \n", "'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', \n", "'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', \n", "'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', \n", "'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', \n", "'swapcase', 'title', 'translate', 'upper', 'zfill']\n", "```\n", "\n", "* You can ignore all the attributes that begin and end with a double underscore, for example ``__add__``. The attributes of practical use to you are the ones from ``capitalize`` all the way to the end.\n", "\n", "* You don't need to create a string ``s`` first to get a list of the attributes. 
You can also use either of these shortcuts:\n", "\n", "```python\n", "dir('')\n", "dir(str)\n", "```\n", "\n", "* If you see an attribute that looks interesting, you can request help on it: ``help(''.startswith)`` or ``help(\"\".startswith)``. Notice the ``''`` in the brackets: it creates an empty string, accesses its ``.startswith`` attribute, and then asks for help on that.\n", "\n", "* You will get a piece of help text printed to the screen. This becomes more useful once you are comfortable with Python. In the beginning, a web search is often more helpful, since it gives you pages with examples. The built-in Python help is usually very brief.\n", "\n", "Use this knowledge now to figure out the difference between ``s.find`` and ``s.index``. Does it make sense?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can do what is called *slicing* on a string. Slicing extracts a sub-part of a string:\n", "\n", "```python\n", "word = 'landing'\n", "print(word[1:4])\n", "```\n", "\n", "* How many characters are in the text printed on the screen?\n", "* Again, for MATLAB users: how does that differ from what you are used to?\n", "* What is returned by ``word[3:]``?\n", "* What is returned by ``word[3:99]``?\n", "* What is returned by ``word[2:6:3]``?\n", "* And try this: ``word[6:2:-1]``\n", "* And lastly: ``word[-4:-7:-1]``" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Speaking of sequences, here is one from DNA. Now 
create this sequence in Python:\n", "\n", "```python\n", "seq = \"\"\"TAGGGGCCTCCAATTCATCCAACACTCTACGCCTTCTCCAAGAGCTAGTAGGGCACCCTGCAGTTGGAAAGGGAACTATTTCGTAGGGCGAGCCCATACCGTCTCTCTTGCGGAAGACTTAACACGATAGGAAGCTGGAATAGTTTCGAACGATGGTTATTAATCCTAATAACGGAACGCTGTCTGGAGGATGAGTGTGACGGAGTGTAACTCGATGAGTTACCCGCTAATCGAACTGGGCGAGAGATCCCAGCGCTGATGCACTCGATCCCGAGGCCTGACCCGACATATCAGCTCAGACTAGAGCGGGGCTGTTGACGTTTGGGGTTGAAAAAATCTATTGTACCAATCGGCTTCAACGTGCTCCACGGCTGGCGCCTGAGGAGGGGCCCACACCGAGGAAGTAGACTGTTGCACGTTGGCGATGGCGGTAGCTAACTAAGTCGCCTGCCACAACAACAGTATCAAAGCCGTATAAAGGGAACATCCACACTTTAGTGAATCGAAGCGCGGCATCAGAATTTCCTTTTGGATACCTGATACAAAGCCCATCGTGGTCCTTAGACTTCGTGCACATACAGCTGCACCGCACGCATGTGGAATTAGAGGCGAAGTACGATTCCTAGACCGACGTACGATACAACTATGTGGATGTGACGAGCTTCTTTTATATGCTTCGCCCGCCGGACCGGCCTCGCGATGGCGTAG\"\"\"\n", "```\n", "\n", "* At which position does ``GATTAG`` first occur in the sequence?\n", "* How many times does ``TTTT`` occur?\n", "* Replace every ``A`` with ``T``, and every ``C`` with ``G``.\n", "* Reset the string to its original value, but now try something a bit more advanced: switch all ``T`` entries to ``A`` and all ``A`` entries to ``T``." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists\n", "\n", "We will cover creating, adding to, accessing and using lists of objects.\n", "\n", "You have seen this before: create a list with the square bracket characters ``[`` and ``]``.\n", "\n", "For example: ``words = ['Mary', 'loved', 'chocolate.']``\n", "\n", "One of the most useful functions in Python is ``len(...)``. Verify that ``len(words)`` returns the integer value 3. Does it have the **type** you expect?"
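, "\n", "\n", "As a minimal sketch, here is ``len`` combined with ``type`` on a made-up list called ``planets`` (a hypothetical example, separate from the ``words`` list above):

```python
# A made-up list, separate from the words list above
planets = ['Mercury', 'Venus', 'Earth', 'Mars']

n = len(planets)
print(n)            # 4
print(type(n))      # <class 'int'>

# len() works on strings too, where it counts the characters
print(len('Mars'))  # 4
```

The same ``len`` function works on both lists and strings, since both are sequences."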
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The entries in a list can be of mixed types (contrast this with most other programming languages!):\n", "\n", "```python\n", "group = ['yeast', 'bacillus', 994, 'aspergillus']\n", "```\n", "\n", "An important test is to check whether the list contains a particular item:\n", "\n", "```python\n", "'aspergillus' in group\n", "499 in group\n", "```\n", "\n", "Like we saw with strings, you can use the ``*`` and ``+`` operators:\n", "\n", "```python\n", "group * 3\n", "group + group  # might not do what you expect!\n", "group - group  # oooops\n", "```\n", "\n", "And like strings, you access entries by their position, counting from 0:\n", "\n", "```python\n", "group[0]\n", "\n", "# but this is also possible:\n", "group[-3]\n", "\n", "# however, is this expected?\n", "group[4]\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists also have some methods that you can use; in fact, far fewer than strings. Remember how to get a list of methods from the [prior module](https://yint.org/pybasic01)?\n", "\n", "```python\n", "dir(....)  # what do you fill in here?\n", "```\n", "\n", "How many methods do you see that you can apply to a list?\n", "\n", "Let's try a few of them out:\n", "\n", "1. Use ``append`` to add the entry \"Candida albicans\" to the ``group`` list you created above.\n", "1. Create a new list ``reptiles = ['crocodile', 'turtle']`` and then try: ``group.extend(reptiles)``.\n", "1. Print the list. Remove the ``crocodile`` entry from the list. Print it again to verify it succeeded.\n", "1. Now try to remove the entry again. What happens?\n", "1. Run ``group.reverse()``, and print the ``group`` variable to the screen.\n", "1. 
Now try this instead: ``group = group.reverse()`` and print the ``group`` variable to the screen. What happened this time?\n", "1. You are now back to square one: make a new list variable ``group = ['yeast', 'bacillus', 'aspergillus']`` and try ``group.sort()``. Notice that ``.sort()``, like the ``.reverse()`` method, operates *in-place*: it changes the list itself and returns ``None``. So there is no need to assign the output of the action to a new variable; if you do, you get ``None``, as you saw in the previous step.\n", "1. Here's something to be aware of: create ``group = ['yeast', 'bacillus', 994, 'aspergillus']`` and then try ``group.sort()``. What does the error message tell you?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists behave like a stack: you can add things to the end using ``.append()`` and you can remove them again with ``.pop()``.\n", "\n", "Think of a stack of plates: last appended, first removed.\n", "\n", "Try it:\n", "\n", "```python\n", "species = ['chimp', 'bacillus', 'aspergillus']\n", "species.append('hoooman')\n", "first_out = species.pop()\n", "print(first_out)\n", "```\n", "\n", "* What is the length of the list after running this code?\n", "* Try adding a new entry ``arachnid`` between ``chimp`` and ``bacillus`` using the ``.insert()`` method. Print the list to verify.\n", "> If you don't know how to use the ``.insert()`` method, but you know it exists, you can type ``help([].insert)`` at the command prompt for quick help. Or you can search the web, which gives more comprehensive help, with examples.\n", "* First use the ``.index()`` method to find the index of \"bacillus\". Then use the ``.pop()`` method to remove it. In other words, do not pass the integer index to ``.pop()`` directly. Assign the popped entry to a new variable.\n", "* Overwrite the entry that is currently in the second position with a new value: \"neanderthalensis\"."
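, "\n", "\n", "Here is a minimal sketch of the last-in, first-out idea, using a hypothetical list called ``plates`` so that the ``species`` exercises above stay untouched:

```python
# A hypothetical stack of plates: the last one appended is the first one popped
plates = ['blue', 'green']
plates.append('red')       # plates is now ['blue', 'green', 'red']
top = plates.pop()         # removes and returns the last entry
print(top)                 # red
print(plates)              # ['blue', 'green']

# insert(position, value) puts the value just before the entry at that position
plates.insert(1, 'white')
print(plates)              # ['blue', 'white', 'green']
```

Like ``.sort()`` and ``.reverse()``, the ``.append()`` and ``.insert()`` methods change the list in place and return ``None``, whereas ``.pop()`` both changes the list and returns the removed entry."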
 ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## For loops: iterating\n", "\n", "The ``for`` loop is used to run a piece of code a certain number of times. The basic structure is shown below, with an example that prints the integer values from 3 up to, and including, 8:\n", "\n", "```python\n", "# This is one way to do it:\n", "for i in range(3, 9):\n", "    # You can have many lines of code in the for-loop.\n", "    # As an example, two for-loop statements are shown here.\n", "    print(i)\n", "    print('-----')\n", "```\n", "\n", "Before the command ``print(i)`` there are 4 spaces (a tab character would also work). Please use spaces, and not tabs, especially if you will share code with colleagues. With 4 spaces, the letter ``p`` from ``print`` goes exactly under the ``i`` in ``for i``.\n", "\n", "That ``i`` is the *loop counter*. The ``range(3, 9)`` determines how many times the loop will iterate, and which values ``i`` takes: the start value 3 is included, but the stop value 9 is not.\n", "\n", "Use ``list(range(3, 9))`` to see a list of the values that ``range()`` produces. Try creating these ranges:\n", "\n", "* Every integer from 0, up to and including 12.\n", "* Every integer from 0, up to and including 12, in steps of 2.\n", "* Every integer from 12 down to and including 0, in steps of -3.\n", "* Use a ``range`` command to create the values ``[-10, -40, -70]``.\n", "* Values between 0.5, up to and including 9.5, in steps of 0.5.\n", "\n", "Notice how the start, stop and step values behave exactly like those in the string slices seen above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Inside the for loop you can write one or more statements. In the above there are 2 statements and 2 comment lines. It is usual to indent your comments as well, if any are required. This way it is clear the comment refers to the contents of the for-loop.\n", "\n", "2. 
You can call the *loop counter* anything you like, as long as it is a valid variable name. Remember those from [last time](https://yint.org/pybasic01)?\n", "\n", "You can loop over many types of objects in Python. Try this:\n", "\n", "```python\n", "reptiles = ['crocodile', 'turtle', 12.34, 'lizard', 'snake', False]\n", "for animal in reptiles:\n", "    print('The \"animal\" object is of type ' + str(type(animal)))\n", "```\n", "\n", "Here you can see *dynamic typing* at its finest: the ``animal`` variable changes its type as the loop progresses.\n", "\n", "You can also iterate over the characters of a string!\n", "\n", "```python\n", "sequence = \"TAGGGGCCTCCA\"\n", "number = 1\n", "for base in sequence:\n", "    print('Base number {} is {}'.format(number, base))\n", "    number += 1\n", "```\n", "\n", "In the above we introduced another concept: building the text to print with the string ``.format()`` method, which fills each ``{}`` placeholder with the next argument, in order. We will see more of this later, so it won't be a surprise then.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have seen how you can iterate over the items of a list, let's put this to use:\n", "\n", "1. Print the 3-times table, from 1 up to 12, like you learned in school:\n", "> 3 times 1 is 3\n", ">\n", "> 3 times 2 is 6\n", ">\n", "> 3 times 3 is 9\n", ">\n", "> ...\n", "\n", "2. If you haven't done so already, rewrite your code to use the ``.format()`` method, as demonstrated above.\n", "3. With one line of code, find the position in this list at which the value 42 appears: ``[0, 3, 9, 12, 27, 35, 42, 50, 66]``\n", "4. *Based on a real example that I had to code last week*: find the value in the previous list closest to ``19``. Note: don't worry about short or efficient code; just find the answer. In the real example the list was thousands of entries long, and the task was to find the closest time within $\\pm$ 5 minutes. 
Then you do need to worry about efficiency.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Advanced tip:** sometimes you want to iterate through a list, but also know which entry you are iterating on. You can do both simultaneously with the built-in ``enumerate`` function.\n", "\n", "```python\n", "names = ['Leonardo', 'Carl', 'Amiah', 'Yaretzi', 'Destiny', 'Alan']\n", "for index, name in enumerate(names):\n", "    print('{} is number {} in the list'.format(name, index+1))\n", "```\n", "\n", "On every pass through the loop, ``enumerate`` produces a ``tuple`` with 2 entries, which are unpacked into the two loop variables: the first one is an integer counter, starting at 0, assigned to ``index``; the second one is the list entry, assigned to ``name`` in this example. You are free to choose both variable names.\n", "\n", "Further self-development:\n", "\n", "1. Rewrite the code above for DNA bases using the ``enumerate`` function, eliminating the manual ``number`` tracking.\n", "2. Look up the built-in ``reversed`` function, which can be used inside ``enumerate`` to run your for-loop in reverse." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Commenting and variable names\n", "\n", "Comments are often as important as the code itself, but they take time to write.\n", "\n", "The choice of variable names is related to the topic of comments. In many ways, the syntax of Python makes the code self-documenting, meaning you often need only a few comments. 
But self-documenting code relies heavily on meaningful variable names:\n", "\n", "```python\n", "for genome in genome_list:\n", "    ...\n", "```\n", "\n", "This quite clearly shows that we are iterating over all the genomes in some iterable container of sequenced genomes (it could be a list, tuple, or set, for example).\n", "\n", "Here, however, the code structure is identical:\n", "\n", "```python\n", "for k in seq:\n", "    ...\n", "```\n", "\n", "Later on in the code it might not be clear what ``k`` represents. It is also not clear what ``seq`` is, or contains.\n", "\n", "Comments should be added in these places and cases:\n", "\n", "* At the top of your file: name and date, and a few sentences on the purpose of the code. It is also helpful to note which Python version you use, or expect.\n", "* Refer to any publications or internal company reports for the algorithms you implement.\n", "* Refer to a website if you use any unusual shortcut or otherwise non-obvious code. This is mostly for yourself, and your future colleagues." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To cover during the interactive session:\n", "\n", "* Creating code cells in Spyder: ``# %% text here (in Spyder)``\n", "* The differences between lists and strings: *mutable* and *immutable*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# IGNORE this. 
# Execute this cell to load the notebook's style sheet.\n", "from IPython.display import HTML\n", "css_file = './images/style.css'\n", "HTML(open(css_file, \"r\").read())" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }