{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Lists\n", "\n", "**At the end of this section, you will be able to:**\n", "* create a list\n", "* add items to a list\n", "* extract/inspect items in a list\n", "* perform basic list operations\n", "* use built-in functions on lists " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists resemble strings: both are a **sequence** of values. But whereas a string is a sequence of characters, a list can contain values of any type. We call these values **elements** or **items**." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "this_is_a_string = 'Hello Newman'\n", "this_is_a_list = ['Hello','Jerry',42,3.1415]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 1.1 Introduction\n", "\n", "Consider the first sentence (represented as a string) from Franz Kafka's book 'The Trial'. Imagine for a moment that we had assigned the whole book to the `trial` variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "trial = \"Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested. \"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A string is a sequence of characters.**\n", "\n", "How can we select specific words from this book? For the sentence above, it might seem more natural for humans to describe it as a series of words, rather than as a series of characters. Say we want to access the first word in our sentence. 
If we enter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_word = trial[0]\n", "print(first_word)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**`split()` converts this string to a list of words.**\n", "\n", "Python only prints the first character of our sentence. (Think about this if you do not understand why.) We can, however, transform our sentence into a list of words (represented by strings) using the `split()` function as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "words = trial.split()\n", "print(words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable `words` now holds the first sentence of Kafka's 'The Trial' as a list. Each element in this list is now (approximately) a word. Run the code below to see the difference." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_word = words[0]\n", "print(first_word)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We apply the `split()` function to the variable `trial` and we assign the result of the function (we call this the 'return value' of the function) to the new variable `words`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Exercise: split the first line of Genesis and assign the list to the variable `bible`\n", "# In the beginning God created the heaven and the earth." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**`split()` takes other delimiters as arguments.**\n", "\n", "By default, the `split()` function in Python splits strings on the **whitespace** between consecutive words and returns a list of words. However, we can pass an **argument** to `split()` that specifies explicitly the string we would like to split on. 
\n", "\n", "This is often useful for **parsing** information from a CSV file (or other structured data). For example, the lines below have the structure of the [Google Ngram](https://books.google.com/ngrams) data. The Ngram Viewer allows the researcher to explore long-term cultural [trends](https://books.google.com/ngrams/graph?content=king%2C+queen&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t4%3B%2Cking%3B%2Cc0%3B%2Cs0%3B%3Bking%3B%2Cc0%3B%3BKing%3B%2Cc0%3B%3BKING%3B%2Cc0%3B.t4%3B%2Cqueen%3B%2Cc0%3B%2Cs0%3B%3BQueen%3B%2Cc0%3B%3Bqueen%3B%2Cc0%3B%3BQUEEN%3B%2Cc0). The source data of this corpus comprises the yearly word and document frequencies for ~ 5 million books printed between 1500 and 2008. The lines are separated by hard returns (\"\\n\"). Each line holds four elements: word, year, word frequency, document frequency.\n", "\n", "Using the `split()` function, we can easily **parse** this file, i.e. recognize and read its content. First, we split the string on \"\\n\" and then split each line on \"\\t\", which stands for a \"tab\".\n", "\n", "In the code block below, we will split a string on tabs, instead of spaces. Do you get the syntax?" 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "queen\t1900\t20394\t3435\n", "queen\t1901\t23340\t2935\n", "queen\t1902\t23120\t3035\n", "['queen\\t1900\\t20394\\t3435', 'queen\\t1901\\t23340\\t2935', 'queen\\t1902\\t23120\\t3035']\n", "['queen', '1900', '20394', '3435']\n" ] } ], "source": [ "# note the `\\n` symbol\n", "google_ngram = \"queen\\t1900\\t20394\\t3435\\nqueen\\t1901\\t23340\\t2935\\nqueen\\t1902\\t23120\\t3035\"\n", "print(google_ngram)\n", "google_ngram = google_ngram.split(\"\\n\")\n", "print(google_ngram)\n", "first_line = google_ngram[0]\n", "print(first_line.split('\\t'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**`join()` is the reverse of `split()`**\n", "\n", "The reverse of the `split()` function can be accomplished with `join()`: it turns a list into a **string**, using a specific 'delimiter', i.e. the string you want to place between the items." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "observation = ['queen', '1900', '20394', '3435']\n", "delimiter = ', '\n", "csv_string = delimiter.join(observation)\n", "print(csv_string)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous chapter, we argued that variables operate as \"boxes\"--you put a value in there, to save it for later. Until now the box could only contain **one item**, a string or a number. Lists expand the possibilities: they serve as \"containers\". **With lists, you can stuff your box with as many different elements as you'd like**. Let's have a look at how this works." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 Creating a list--the basic rules " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To store an empty list in variable `x`, simply assign ``[]`` (square brackets) to `x`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# create an empty list\n", "x = []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create lists with some content: enclose the individual items within square brackets, separated by commas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_grades = [8,9,6,7]\n", "print(my_grades)\n", "my_garbage = ['Potato',[1,2,3],9.03434,'frogs']\n", "print(my_garbage)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### General rules:\n", "* Lists are surrounded by square brackets and the elements in the list are separated by commas\n", "* A list element can be **any Python object**, even another list\n", "* A list can be a collection of numbers, strings, floats (or a combination thereof)\n", "* A list can store values with different types\n", "* A list can be empty" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.3 List operations\n", "Python allows at least the following very useful list operations:\n", "\n", "Arithmetic operators:\n", "* **concatenation**\n", "* **repetition**\n", "\n", "Lists also support comparison and membership operators (more about this in the next Notebook)!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arithmetic operators" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "As with strings, Python comes with specific operators (``*`` and ``+``) that you can apply to a list." 
] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "The ``+`` operator **concatenates** lists:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "a = [1, 2, 3]\n", "b = [4, 5, 6]\n", "c = a + b\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Similarly, the ``*`` operator repeats a list a given number of times:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# First example of the * operator\n", "print([0]*4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Second example of the * operator\n", "a = ['spam','Spam','SPAMMM']\n", "b = a * 5\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first example repeats the single-item list four times. The second repeats the list with typographic variations on the word 'spam' five times." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "You can use lists in **membership boolean expressions** (see Notebook 2.2). The `in` operator checks whether the item 'God' appears in the variable `bible`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "bible = ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth.']\n", "print('God' in bible)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is often useful for checking if a text contains a specific word. Of course, the line below returns the same result." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "'God' in 'In the beginning God created the heaven and the earth.'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But applied to a string, `in` matches any **substring**, **not** only a whole \"word\"!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print('God' in 'My Goddess created the heaven and the earth.')\n", "print('God' in ['My', 'Goddess', 'created', 'the', 'heaven', 'and', 'the', 'earth.'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.4 Indexing, slicing and replacing\n", "\n", "**Indexing** and **slicing** work the same way as with strings. Every item in the list hence has its own index number. We start counting at 0! The indices for our list `['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']` are as follows:\n", "\n", "\n", "J.S. Bach|W.A. Mozart|F. Mendelssohn\n", "---|---|---\n", "0|1|2\n", "-3|-2|-1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can hence use this index number to extract items from a list (just as with strings):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']\n", "print(composer_list[0])\n", "print(composer_list[1])\n", "print(composer_list[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously, we can also use **negative indices**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. 
Mendelssohn']\n", "print(composer_list[-1])\n", "print(composer_list[-2])\n", "print(composer_list[-3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can extract one part of a list using **slicing**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']\n", "list_with_less_composers = composer_list[:2]\n", "print(list_with_less_composers)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "A common error is to retrieve elements by an index greater than the length of the list minus 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print(composer_list[5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `IndexError` tells you that Python could not find an item at position five, as the positions only range from 0 to 2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Index notation can be used to replace elements in the list. Let's say we want to get rid of Mozart (at position 1) and replace him with Elvis. As lists are **mutable**, you can replace their items." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']\n", "composer_list[1] = 'Elvis P.'\n", "print(composer_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, a **slice operator** on the left side of an assignment can update multiple elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']\n", "composer_list[1:] = ['L. van Beethoven','A. 
Webern']\n", "print(composer_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Intermezzo: Mutability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fact that you were able to replace an element by index (as in the above cell) relates to the mutability of lists. In contrast, performing a similar manipulation on a string object will cause a `TypeError`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "misspelled = 'Pythvn'\n", "misspelled[4] = 'o'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we convert the string to a list, we can get rid of this naughty typo." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "misspelled = 'Pythvn'\n", "print(misspelled)\n", "misspelled = list(misspelled)\n", "print(misspelled)\n", "misspelled[4] = 'o'\n", "print(misspelled)\n", "misspelled = ''.join(misspelled)\n", "print(misspelled)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In short: lists are **mutable**--you can manipulate the content of list variables--whereas strings are not. Question: can you predict whether the following code raises an error?\n", "\n", "### End of Intermezzo, back to indexing" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "The more general slicing notation has the form `list[start:stop:step]`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "till_twenty = list(range(0,21))\n", "print(till_twenty)\n", "\n", "# Exercise: how can we print all even numbers?\n", "# Replace [:] (a copy of the whole list) with a slice that keeps only the even numbers\n", "evens = till_twenty[:]\n", "print(evens)\n", "\n", "# Exercise: how can we print all odd numbers?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.5 List methods\n", "\n", "As lists are mutable, they provide a much more **flexible** data type. 
Lists come with specific **methods**, a set of powerful tools that Python has already pre-cooked for you. These tools help you with building and manipulating lists.\n", "\n", "### Adding items to a list\n", "Most of the crucial list functionality is provided by the built-in list **methods**: functions attached to the list object. For an overview of the available methods, run the code below (scroll down; for this course you can ignore the methods starting and ending with double underscores)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "writers_list = []\n", "print(type(writers_list))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We learn, unsurprisingly, that the variable `writers_list` is of type `list`. Let's inspect the functionality Python provides for working with lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "help(list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5.1 append() and extend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**``append()`` and ``extend()`` extend the list with other values**\n", "\n", "The first method we encounter is ``append``. To see what this method does, use the same `help` function as before." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "help(list.append)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`append` is a method that **adds a new item** to the end of a list. It has one positional argument and **returns `None`** (we come back to this a few blocks below)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart', 'F. Mendelssohn']\n", "print(composer_list)\n", "composer_list.append('L. 
van Beethoven')\n", "print(composer_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Exercise: add some other composers to the composer_list here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python's help functionality helps you explore the methods attached to an object. \n", "\n", "**Exercise**: find out what the method ``extend`` does, and how to apply it to the `writers_list`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method_descriptor:\n", "\n", "extend(...)\n", " L.extend(iterable) -> None -- extend list by appending elements from the iterable\n", "\n" ] } ], "source": [ "help(list.extend)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of `extend()` is given below:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['J.S. Bach', 'W.A. Mozart', 'B Bartok']\n", "['J.S. Bach', 'W.A. Mozart', 'B Bartok', 'L. van Beethoven', 'F. Mendelssohn']\n" ] } ], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart','B Bartok']\n", "print(composer_list)\n", "composer_list.extend(['L. van Beethoven','F. Mendelssohn'])\n", "print(composer_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To understand the difference between `extend()` and `append()`, compare the output above with the output printed by this cell:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['J.S. Bach', 'W.A. Mozart', 'B Bartok']\n", "['J.S. Bach', 'W.A. Mozart', 'B Bartok', ['L. van Beethoven', 'F. Mendelssohn']]\n" ] } ], "source": [ "composer_list = ['J.S. Bach', 'W.A. Mozart','B Bartok']\n", "print(composer_list)\n", "composer_list.append(['L. van Beethoven','F. 
Mendelssohn'])\n", "print(composer_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**dot notation in Python**\n", "\n", "Do you get the syntax that goes with the `append()` function? The list we wish to append the item to goes first and we join the `append()` function to this list using a dot (`.`). In between the round brackets that go with the function name, we place the actual string that we wish to add to the list. \n", "\n", "We call such an input **value** an **'argument'** that we **'pass'** to the `append()` function. \n", "\n", "Please reread the previous sentence to get used to the terminology.\n", "\n", "Make sure that you are familiar with this terminology because you will often come across such terms when you look for help online!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**fruitful and void functions**\n", "\n", "Functions in Python are generally divided into **fruitful** and **void** functions. `append` is a **void** function: similar to `print`, it performs an operation (adds one element to the list) but **returns nothing** (strictly speaking, it returns `None`). Understanding this distinction may help you trace bugs in future code." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "a = composer_list.append('J. des Prez')\n", "print(composer_list)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It might be a bit confusing at first that a list method returns `None`. Please look carefully at the difference between the two following examples. 
To repeat: please predict what will be printed in each code snippet below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "a_list = [1, 3, 4]\n", "a_list.append(5)\n", "print(a_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "a_list = [1, 3, 4]\n", "a_list = a_list.append(5)\n", "print(a_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to distinguish between operations that **modify** lists and operations that **create new lists**. For example, the `append()` method modifies a list, but the `+` operator creates a new list!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The `append()` method is especially powerful in a `for` loop.**\n", "\n", "We will have a closer look at loops later, but the code below shows a context in which the `append()` method is often applied. For example, we have a .tsv table which lists composers by their country of origin. Imagine we want to **study composers by nationality**. The code below demonstrates how to extract the relevant information from this table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = 'Justus Johann Friedrich Dotzauer\\tGermany\\nSaid Rustamov\\tAzerbaijan\\nFlor Alpaerts\\tBelgium\\nPetko Staynov\\tBulgaria\\nTheodor Ludwig Wiesengrund Adorno\\tGermany\\nAnna Amalia, Duchess of Brunswick-Wolfenbüttel\\tGermany'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As previously shown, we parse this table with the `split()` function. 
First we identify the rows (separated by hard returns or \"\\n\") and later the cells within each row (separated by tabs or \"\\t\")." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rows = data.split('\\n')\n", "print(rows)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rows = data.split('\\n')\n", "# The rows are of type list\n", "print(type(rows))\n", "# But the first element in this list is still a string\n", "print(type(rows[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To process the separate cells, we first create an empty list called `table`. Then we iterate over each row created by `split('\\n')` and split each row on the tab character. This last step converts each row (which is still a string) to a list. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At every iteration, the `for` loop takes one row, splits it into its cells, and appends the resulting list to `table`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5.2 count()\n", "\n", "Once we have collected the information, we can start counting: how many of these composers come from Germany?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "help(list.count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `count()` method has one positional argument **value**. As the name suggests, the method returns an integer that represents how often the value occurs in the list." 
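] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table-building loop described in 1.5.1 can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the course's original cell: `sample_data` is a shortened copy of the `data` string (so the cell runs on its own), and the names `table`, `cells` and `sample_countries` are chosen for illustration. The last line applies `count()` to the extracted country column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch only: `sample_data` is a shortened copy of the `data` string from 1.5.1\n", "sample_data = 'Justus Johann Friedrich Dotzauer\\tGermany\\nSaid Rustamov\\tAzerbaijan\\nFlor Alpaerts\\tBelgium\\nTheodor Ludwig Wiesengrund Adorno\\tGermany'\n", "\n", "table = []             # will hold one list per row\n", "sample_countries = []  # will hold the second cell of each row\n", "for row in sample_data.split('\\n'):\n", "    cells = row.split('\\t')            # `row` is a string, `cells` is a list\n", "    table.append(cells)                # append the whole row as one nested element\n", "    sample_countries.append(cells[1])  # keep only the country\n", "\n", "print(table)\n", "print(sample_countries)\n", "print(sample_countries.count('Germany'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `append()` adds each `cells` list as a single element, so `table` becomes a nested list, whereas `sample_countries` stays a flat list of strings that `count()` can work on directly."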
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "print(countries.count('Germany'))\n", "print(countries.count('Belgium'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, try to print the songs which mention a search term twice or more:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5.3 `sort()` and `sorted()`\n", "\n", "The `sort()` function is a void function that sorts strings in alphabetical order and numbers in ascending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "help(list.sort)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "countries.sort()\n", "print(countries)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_grades = [9,8,10,7,9,9]\n", "my_grades.sort()\n", "print(my_grades)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `reverse` argument allows you to sort in ascending (`reverse=False`) or descending (`reverse=True`) order." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_grades.sort(reverse=True)\n", "print(my_grades)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before evaluating the cell block below, can you guess what the resulting order will look like?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_grades = [9,8,10,7,9,9]\n", "countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "\n", "grades_and_countries = my_grades + countries\n", "print(grades_and_countries)\n", "\n", "grades_and_countries.sort(reverse=False)\n", "print(grades_and_countries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, the standard Python `sort()` method cannot compare integers with strings, so sorting a list of mixed types raises a `TypeError`. As an aside: a convenient solution would be to cast the integers as strings. Note that the grades are then sorted as text, so '10' comes before '7'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_grades = ['9','8','10','7','9','9']\n", "countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "\n", "grades_and_countries = my_grades + countries\n", "grades_and_countries.sort(reverse=False)\n", "\n", "print(grades_and_countries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **`key`** argument allows you to further refine your sorting. As we have not yet covered enough Python concepts for you to properly understand how this works, we leave it for the moment at two examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basically, you pass a function for the argument `key`. Take, for example, `len`, which counts how many items a value contains (the length of a value). If you pass the function `len` as an argument, `sort()` will count how many characters each string contains, and order the list by the length of each item." 
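] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make this concrete, the following sketch prints the value that `len` returns for each item; these are the values the sort compares instead of the strings themselves. The variable names `country_names` and `by_length` are chosen for illustration, and `sorted()` (introduced further below) is used here only so that the original list stays unchanged." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch only: show the key that `len` computes for every item\n", "country_names = ['Germany', 'Azerbaijan', 'Belgium', 'Bulgaria']\n", "for country in country_names:\n", "    print(country, len(country))\n", "\n", "# Order the items by these keys (shortest first) without modifying `country_names`\n", "by_length = sorted(country_names, key=len)\n", "print(by_length)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Items with equal keys (here 'Germany' and 'Belgium', both of length 7) keep their original relative order."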
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sorting by the length of string\n", "countries = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "countries.sort(key=len,reverse=True)\n", "print(countries)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# get the longest song title about love" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Extra**: Or sort the items by their frequency of occurrence." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sorting by frequency of occurrence\n", "countries_ref = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "# Actually this is more elegant:\n", "# from collections import Counter\n", "# countries.sort(key=Counter(countries).get,reverse=True)\n", "countries.sort(key=countries_ref.count,reverse=True)\n", "print(countries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `sorted()`\n", "\n", "`sorted()` is a fruitful function that returns a new sorted list and leaves the original list unchanged:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Azerbaijan', 'Belgium', 'Bulgaria', 'Germany', 'Germany', 'Germany']\n" ] } ], "source": [ "countries_ref = ['Germany', 'Azerbaijan', 'Belgium','Germany', 'Bulgaria', 'Germany']\n", "countries_sorted = sorted(countries_ref)\n", "print(countries_sorted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5.4 remove()\n", "\n", "Let's assume a collection has grown a lot and we would like to remove some of the items from the list. Python provides the method `remove()`, which you call on a list and which takes as its argument the item we would like to remove. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "good_reads = [\"The Hunger games\", \"A Clockwork Orange\", \n", " \"Pride and Prejudice\", \"Water for Elephants\", \"Iliad\"]\n", "print(good_reads)\n", "good_reads.remove(\"Water for Elephants\")\n", "print(good_reads)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we try to remove a book that is not in our collection, Python raises a `ValueError` to signal that something is wrong." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "good_reads.remove(\"White Oleander\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, however, that `remove()` will only delete the *first* item in the list that is identical to the argument which you passed to the function. Execute the code in the block below and you will see that only the first instance of \"Pride and Prejudice\" gets deleted." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "good_reads = [\"The Hunger games\", \"A Clockwork Orange\", \n", " \"Pride and Prejudice\", \"Water for Elephants\", \"Pride and Prejudice\"]\n", "good_reads.remove(\"Pride and Prejudice\")\n", "print(good_reads)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.6 Nested Lists\n", "\n", "### Tables\n", "\n", "Lists can even contain lists. Below we cover a few useful examples of nested lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrices\n", "A nested list can also represent a **matrix**, a notion we will often encounter further in this course.\n", "\n", "From Wikipedia: In mathematics, a matrix (plural: matrices) is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. 
For example, the dimensions of the matrix below are 2 × 3 (read \"two by three\"), because there are two rows and three columns:\n", "![An example of a matrix](https://wikimedia.org/api/rest_v1/media/math/render/svg/d16330f5f99566fa754114ff04cd176d6185c796)\n", "\n", "We can represent this matrix as a nested list and assign it to the variable `nested_list`." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1, 9, -13], [20, 5, -6]]\n" ] } ], "source": [ "nested_list = [[1,9,-13],[20,5,-6]]\n", "print(nested_list)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "To retrieve elements of the matrix, we use the indexing and slicing techniques introduced in section 1.4: the first index selects a row, the second an element within that row." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print(nested_list[0])\n", "print(nested_list[0][0])\n", "print(nested_list[1][2])\n", "print(nested_list[0][:-2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To finish this section, here is an overview of the new concepts and functions you have learnt. Go through them and make sure you understand them all.\n", "\n", "- list\n", "- `.split()`\n", "- `.join()`\n", "- `.append()`\n", "- `.extend()`\n", "- `.count()`\n", "- `.remove()`\n", "- `.sort()`\n", "- `sorted()`\n", "- nested lists\n", "- *mutable* versus *immutable*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }