{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1 - Find the pattern" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In this exercise you will look for patterns in text strings. You should express the pattern as a regular expression.\n", "\n", "You will be given a set of strings to match, and another set of strings that your pattern must not match.\n", "\n", "Your pattern should match the whole string, from the beginning to the end.\n", "\n", "**Example:**\n", "\n", "The string \"example\" is matched by the patterns:\n", "\n", "- `example` (the exact pattern)\n", "- `.*` (matches everything)\n", "- `e.*` (matches all words starting with 'e')\n", "- `.*mp.*` (matches all words containing 'mp')\n", "\n", "It is not matched by the patterns:\n", "\n", "- `examples` (fails on the 's' in this pattern)\n", "- `exampl` (does not match the last 'e' in the string)\n", "- `o.*` (only matches words starting with 'o')\n", "- `mp` (only matches the word 'mp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The script `retester.py` is available in the `downloads` folder. This is an interactive program, which will ask you to come up with patterns as in the example above.\n", "- Run it like so: `python3 retester.py`.\n", "- The script will tell you what to do next.\n", "- To stop it: `ctrl+c`\n", "\n", "\n", "\n", "- Already done? Open `retester.py` in your editor and try to understand how it works.\n", " - add more exercises for the patterns you already know\n", " - find more [patterns](https://docs.python.org/3.6/howto/regex.html#regex-howto) and add exercises for them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Want more?\n", "\n", "There are plenty of interactive online tools for learning to use regular expressions. 
Here are some:\n", "\n", "- https://regexone.com/\n", "- https://regexcrossword.com\n", "- https://regexr.com/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------\n", "\n", "### Solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NB: For all exercises, there might be other solutions that work just as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 1**\n", "\n", "*Positive*: `['abc', 'abcd', 'abcde']`\n", "\n", "*Negative*: `['xyz', 'def']`\n", "\n", "\n", "All positive examples start with 'a'. No negative example starts with 'a'. After the first 'a' we must allow more characters.\n", "\n", "Possible solutions:\n", "\n", "`a.*`\n", "\n", "`abc.*`\n", "\n", "`abcd?e?`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 2**\n", "\n", "*Positive*: `['abc', 'abbbc', 'abbbbc']`\n", "\n", "*Negative*: `['ac']`\n", "\n", "\n", "All positive examples start with 'a' and end with 'c'. In between there are one or more 'b's. The negative example also starts with 'a' and ends with 'c', but contains no 'b's.\n", "\n", "Possible solutions:\n", "\n", "`ab+c`\n", "\n", "`a.+c`\n", "\n", "`abb*c`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 3**\n", "\n", "*Positive*: `['b', 'abbbc', 'abbbbc']`\n", "\n", "*Negative*: `['ac']`\n", "\n", "\n", "All positive examples contain 'b's. There might be something ('a') before the 'b's, and there might be something ('c') after. The negative example contains no 'b's.\n", "\n", "Possible solutions:\n", "\n", "`a?b+c?`\n", "\n", "`.?b+.?`\n", "\n", "`.*b+.*`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 4**\n", "\n", "*Positive*: `['abc', 'adc', 'axc']`\n", "\n", "*Negative*: `['abe', 'ay']`\n", "\n", "\n", "All positive examples start with 'a' and end with 'c'. In between there is another letter ('b', 'd' or 'x'). 
No negative example ends with 'c'.\n", "\n", "Possible solutions:\n", "\n", "`a.c`\n", "\n", "`.*c`\n", "\n", "`a[bdx]c`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 5**\n", "\n", "*Positive*: `['abc', 'ac']`\n", "\n", "*Negative*: `['abbc']`\n", "\n", "\n", "All positive examples start with 'a' and end with 'c'. In between there might be zero or one letter ('b'). The negative example contains more than one 'b'.\n", "\n", "Possible solutions:\n", "\n", "`ab?c`\n", "\n", "`a.?c`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 6**\n", "\n", "*Positive*: `['a bc', 'a c d']`\n", "\n", "*Negative*: `['abbc']`\n", "\n", "\n", "All positive examples contain a space. The negative example contains no space.\n", "\n", "Possible solution:\n", "\n", "`.*\\s.*`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 7**\n", "\n", "*Positive*: `['cat', 'hat']`\n", "\n", "*Negative*: `['sat', 'rat', 'mat', 'at', 'gat']`\n", "\n", "\n", "All positive examples end with 'at'. Before that, either 'c' or 'h' is allowed. The negative examples also end with 'at', but do not start with 'c' or 'h'.\n", "\n", "Possible solution:\n", "\n", "`[ch]at`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 8**\n", "\n", "*Positive*: `['sat', 'rat', 'mat', 'gat', 'hat']`\n", "\n", "*Negative*: `['cat']`\n", "\n", "\n", "All positive examples end with 'at'. They may not start with 'c'. The negative example also ends with 'at', but starts with 'c'.\n", "\n", "Possible solution:\n", "\n", "`[^c]at`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 9**\n", "\n", "*Positive*: `['barn', 'grain', 'brat', 'sorry']`\n", "\n", "*Negative*: `['ban', 'gain', 'bat', 'soy']`\n", "\n", "\n", "The positive and negative examples are very similar, but all positive ones contain an 'r'. 
There are no 'r's in the negative examples.\n", "\n", "Possible solution:\n", "\n", "`.*r.*`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 10**\n", "\n", "*Positive*: `['dogs', 'cats', 'horses']`\n", "\n", "*Negative*: `['dog', 'cat', 'mice', 'cow']`\n", "\n", "\n", "Again, the positive and negative examples are very similar, but all positive ones end with an 's'. No negative example ends with 's'.\n", "\n", "Possible solution:\n", "\n", "`.*s`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 11**\n", "\n", "*Positive*: `['karlsson', 'carlson', 'carlzon', 'karlson']`\n", "\n", "*Negative*: `['larsson', 'karl', 'carlo']`\n", "\n", "\n", "The positive examples all start with 'k' or 'c'. They end in 'son' or 'zon'. The negative examples may start with 'k' or 'c' or may end with 'son', but none of them does both.\n", "\n", "Possible solutions:\n", "\n", "`[kc].*on`\n", "\n", "`[kc].*[sz]on`\n", "\n", "`[kc]arl[sz]+on`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 12**\n", "\n", "\n", "*Positive*: `['vision', 'explosion', 'fusion']`\n", "\n", "*Negative*: `['station', 'motion', 'region']`\n", "\n", "\n", "The positive examples all end with 'sion', while the negative ones end with 'tion' or 'gion'. Alternatively, we could separate the positives from the negatives by looking at their beginnings.\n", "\n", "Possible solutions:\n", "\n", "`.*sion`\n", "\n", "`.*[^tg]ion`\n", "\n", "`[vef].*`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 13**\n", "\n", "*Positive*: `['TAG', 'TAA', 'TGA']`\n", "\n", "*Negative*: `['TCG', 'AGA', 'ACT']`\n", "\n", "\n", "The positive examples (stop codons!) all start with 'T'. 
After that, only 'A' and 'G' are allowed.\n", "\n", "Possible solution:\n", "\n", "`T[AG]+`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 14**\n", "\n", "*Positive*: `['words', 'letters', 'text']`\n", "\n", "*Negative*: `['not word', 'åå', 'work-shop', '88']`\n", "\n", "\n", "The positive examples contain only lowercase letters from the English alphabet. The negative examples all contain other characters (spaces, Swedish letters, dashes or digits).\n", "\n", "Possible solution:\n", "\n", "`[a-z]+`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise 15**\n", "\n", "*Positive*: `['88', '337', '0']`\n", "\n", "*Negative*: `['elephant', 'two', '.99', '-22']`\n", "\n", "\n", "The positive examples contain only digits. The negative examples all contain other characters (letters, punctuation...).\n", "\n", "Possible solution:\n", "\n", "`\\d+`" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }