{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unit Tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview and Principles\n", "Testing is the process by which you exercise your code to determine if it performs as expected. The code you are testing is referred to as the **code under test**. \n", "\n", "There are two parts to writing tests.\n", "1. invoking the code under test so that it is exercised in a particular way;\n", "1. evaluating the results of executing code under test to determine if it behaved as expected.\n", "\n", "The collection of tests performed are referred to as the **test cases**. The fraction of the code under test that is executed as a result of running the test cases is referred to as **test coverage**.\n", "\n", "For dynamical languages such as Python, it's extremely important to have a high test coverage. In fact, you should try to get 100% coverage. This is because little checking is done when the source code is read by the Python interpreter. For example, the code under test might contain a line that has a function that is undefined. This would not be detected until that line of code is executed.\n", "\n", "Test cases can be of several types. Below are listed some common classifications of test cases.\n", "- *Smoke test*. This is an invocation of the code under test to see if there is an unexpected exception. It's useful as a starting point, but this doesn't tell you anything about the correctness of the results of a computation.\n", "- *One-shot test*. In this case, you call the code under test with arguments for which you know the expected result.\n", "- *Edge test*. The code under test is invoked with arguments that should cause an exception, and you evaluate if the expected exception occurrs.\n", "- *Pattern test* - Based on your knowledge of the *calculation* (not implementation) of the code under test, you construct a suite of test cases for which the results are known or there are known patterns in these results that are used to evaluate the results returned.\n", "\n", "Another principle of testing is to limit what is done in a single test case. Generally, a test case should focus on one use of one function. Sometimes, this is a challenge since the function being tested may call other functions that you are testing. This means that bugs in the called functions may cause failures in the tests of the calling functions. Often, you sort this out by knowing the structure of the code and focusing first on failures in lower level tests. In other situations, you may use more advanced techniques called *mocking*. A discussion of mocking is beyond the scope of this lecture.\n", "\n", "## Test-driven development\n", "\n", "A best practice is to develop your tests while you are developing your code. Indeed, one school of thought in software engineering, called **test-driven development**, advocates that you write the tests *before* you implement the code under test so that the test cases become a kind of specification for what the code under test should do.\n", "\n", "**This is how you should approach development going forward in this course.** Write your tests first. They all fail. Write the code for the functions to make the tests pass." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples of Test Cases\n", "This section presents examples of test cases. 
The code under test is the calculation of entropy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Entropy of a set of probabilities\n", "$$\n", "H = -\sum_i p_i \log_2(p_i)\n", "$$\n", "where $\sum_i p_i = 1$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Code Under Test\n", "def entropy(ps):\n", "    if any([(p < 0.0) or (p > 1.0) for p in ps]):\n", "        raise ValueError(\"At least one input is out of range [0...1]\")\n", "    else:\n", "        pass\n", "    if not np.isclose(1, np.sum(ps), atol=1e-08):\n", "        raise ValueError(\"The list of input probabilities does not sum to 1\")\n", "    else:\n", "        pass\n", "    items = ps * np.log2(ps)\n", "    new_items = []\n", "    for item in items:\n", "        if np.isnan(item):\n", "            new_items.append(0)\n", "        else:\n", "            new_items.append(item)\n", "    return np.abs(-np.sum(new_items))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ps = [.8, .2]\n", "#ps = [ 1.00000001, 0]\n", "[(p < 0.0) or (p > 1.0) for p in ps]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Smoke test\n", "\n", "Does the function run when we call it, or does it explode in flames and release the magic smoke?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Smoke test\n", "entropy([0.5, 0.5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### One-shot test\n", "\n", "We know from previous discussions that when we have 4 states and they are all equally likely, the number of bits required should be 2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy([.25, .25, .25, .25])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another example: the entropy of a random variable with only one possible outcome is 0, therefore:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy([1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose that all of the probability of a distribution is at one point. An example of this is a coin with two heads. Whenever you flip it, you always get heads. That is, the probability of a head is 1.\n", "\n", "What is the entropy of such a distribution? From the calculation above, we see that the entropy should be $-\log_2(1)$, which is 0. This means that we have a test case where we know the result!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entries = [\n", "    [0, [1]],\n", "]\n", "\n", "for entry in entries:\n", "    ans = entry[0]\n", "    prob = entry[1]\n", "    if not np.isclose(entropy(prob), ans):\n", "        print(\"Test failed!\")\n", "print(\"Test completed!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NEAT!** We can use this structure to run a bunch of these tests all at once, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entries = [\n", "    [0, [1]],\n", "    [2, [.25, .25, .25, .25]]\n", "]\n", "\n", "for idx, entry in enumerate(entries):\n", "    ans = entry[0]\n", "    prob = entry[1]\n", "    if not np.isclose(entropy(prob), ans):\n", "        print(f\"Test {idx+1} failed\")\n", "    else:\n", "        print(f\"Test {idx+1} passed\")\n", "print(\"Test completed!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question**: What is an example of another one-shot test? 
(Hint: You need to know the expected result.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edge tests\n", "\n", "One edge test of interest is to provide an input that is *not* a distribution, in that the probabilities don't sum to 1. This should generate an exception of type `ValueError`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy([.9, .9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another edge test is when we pass a probability that is out of range." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entropy([-0.5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Important note for edge tests that raise exceptions!\n", "\n", "You often have to write your tests using `try` and `except` blocks, being sure to catch the correct exception type, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def test_entropy_parameter_checking():\n", "    # First, try probabilities that do not sum to 1\n", "    try:\n", "        entropy([.9, .9])\n", "    except ValueError as err:\n", "        print(\"Test of probability inputs not summing to 1 passed\")\n", "    # Second, try a probability that is out of range\n", "    try:\n", "        entropy([-0.5])\n", "    except ValueError as err:\n", "        print(\"Test of probability input ranges passed\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_entropy_parameter_checking()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pattern test\n", "\n", "Now let's consider a pattern test. Examining the structure of the calculation of $H$, we consider a situation in which there are $n$ equal probabilities. That is, $p_i = \frac{1}{n}$.\n", "$$\n", "H = -\sum_{i=1}^{n} p_i \log_2(p_i)\n", "= -\sum_{i=1}^{n} \frac{1}{n} \log_2(\frac{1}{n})\n", "= n (-\frac{1}{n} \log_2(\frac{1}{n}) )\n", "= -\log_2(\frac{1}{n})\n", "= \log_2(n)\n", "$$\n", "For example, `entropy([0.5, 0.5])` should be $-\log_2(0.5) = 1$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pattern test\n", "def test_equal_probabilities(n):\n", "    prob = 1.0/n\n", "    ps = np.repeat(prob, n)\n", "    if np.isclose(entropy(ps), -np.log2(prob)):\n", "        print(f\"Test passed for n = {n}\")\n", "    else:\n", "        print(f\"Test failed for n = {n}\")\n", "\n", "\n", "# Run a test\n", "test_equal_probabilities(100000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You see that there are many, many cases to test. So far, we've been writing special code for each test case. We can do better." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing Data-Producing Code\n", "Much of your Python (or R) code will create and/or transform dataframes. A dataframe is structured like a table with:\n", "\n", "- Columns that have values of the same type\n", "- Rows that have a value for each column\n", "- An index that uniquely identifies a row."
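, "\n", "\n", "As a quick illustration (a minimal sketch added for this discussion, not part of the code under test), the cell below builds a tiny dataframe and inspects the three structural pieces listed above: its columns, its column types, and its index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal sketch: a tiny dataframe and its structural pieces\n", "df_example = pd.DataFrame({'a': [0.2, 0.8], 'b': [0.5, 0.5]})\n", "print(df_example.columns.tolist())   # column names\n", "print(df_example.dtypes)             # one type per column\n", "print(df_example.index.tolist())     # the index uniquely identifies each row"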
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def makeProbabilityMatrix(column_names, nrows):\n", " \"\"\"\n", " Makes a dataframe with the specified column names such that each\n", " cell is a value in [0, 1] and columns sum to 1.\n", " :param list-str column_names: names of the columns\n", " :param int nrows: number of rows\n", " \"\"\"\n", " df = pd.DataFrame(np.random.uniform(0, 1, (nrows, len(column_names))))\n", " df.columns = column_names\n", " for column in df.columns:\n", " df[column] = df[column]/df[column].sum()\n", " return df\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Smoke test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "makeProbabilityMatrix(['a', 'b'], 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Test 2: Check columns\n", "COLUMNS = ['a', 'b']\n", "df = makeProbabilityMatrix(COLUMNS, 3)\n", "set(COLUMNS) == set(df.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "Write a function that tests the following:\n", "- The returned dataframe has the expected columns\n", "- The returned dataframe has the expected rows\n", "- Values in columns are of the correct type and range\n", "- Values in column sum to 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unittest Infrastructure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several reasons to use a test infrastructure:\n", "- If you have many test cases (which you should!), the test infrastructure will save you from writing a lot of code.\n", "- The infrastructure provides a uniform way to report test results, and to handle test failures.\n", "- A test infrastructure can tell you about coverage so you know what tests to add.\n", "\n", "We'll be using the `unittest` framework. This is a separate Python package. Using this infrastructure, requires the following:\n", "1. import the unittest module\n", "1. define a class that inherits from unittest.TestCase\n", "1. write methods that run the code to be tested and check the outcomes.\n", "\n", "The last item has two subparts. First, we must identify which methods in the class inheriting from unittest.TestCase are tests. You indicate that a method is to be run as a test by having the method name begin with \"test\".\n", "\n", "Second, the \"test methods\" should communicate with the infrastructure the results of evaluating output from the code under test. This is done by using `assert` statements. For example, `self.assertEqual` takes two arguments. If these are objects for which `==` returns `True`, then the test passes. Otherwise, the test fails." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class UnitTests(unittest.TestCase):\n", "\n", " # Each method in the class to execute a test\n", " def test_success(self):\n", " self.assertEqual(1, 1)\n", " \n", " def test_success1(self):\n", " self.assertTrue(1 == 1)\n", "\n", " def test_failure(self):\n", " self.assertLess(1, 2)\n", " \n", "suite = unittest.TestLoader().loadTestsFromTestCase(UnitTests)\n", "_ = unittest.TextTestRunner().run(suite)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function the handles test loading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Code for homework or your work should use test files.** In this lesson, we'll show how to write test codes in a Jupyter notebook. This is done for pedidogical reasons. It is **NOT** not something you should do in practice, except as an intermediate exploratory approach. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the first test passes, but the second test fails." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "- Rewrite the above one-shot test for entropy using the unittest infrastructure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Implementating a pattern test. Use functions in the test.\n", "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class TestEntropy(unittest.TestCase):\n", " \n", " def test_equal_probability(self):\n", " def test(count):\n", " \"\"\"\n", " Invokes the entropy function for a number of values equal to count\n", " that have the same probability.\n", " :param int count:\n", " \"\"\"\n", " raise RuntimeError (\"Not implemented.\")\n", " #\n", " test(2)\n", " test(20)\n", " test(200)\n", "\n", "#test_setup(TestEntropy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class TestEntropy(unittest.TestCase):\n", " \"\"\"Write the full set of tests.\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing For Exceptions\n", "\n", "Edge test cases often involves handling exceptions. One approach is to code this directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class TestEntropy(unittest.TestCase):\n", " \n", " def test_invalid_probability(self):\n", " try:\n", " entropy([0.1, 0.5])\n", " self.assertTrue(False)\n", " except ValueError:\n", " self.assertTrue(True)\n", " \n", "#test_setup(TestEntropy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`unittest` provides help with testing exceptions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class TestEntropy(unittest.TestCase):\n", " \n", " def test_invalid_probability(self):\n", " with self.assertRaises(ValueError):\n", " entropy([0.1, 0.5])\n", " \n", "suite = unittest.TestLoader().loadTestsFromTestCase(TestEntropy)\n", "_ = unittest.TextTestRunner().run(suite)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test Files\n", "Although I presented the elements of `unittest` in a notebook. **your tests should be in a file**. If the name of module with the code under test is `foo.py`, then the name of the test file should be `test_foo.py`.\n", "\n", "The structure of the test file will be very similar to cells above. You will import `unittest`. You must also import the module with the code under test. Take a look at `test_prime.py` in this directory to see an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discussion\n", "**Question**: What tests would you write for a plotting function?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test Driven Development\n", "Start by writing the tests. Then write the code.\n", "\n", "We illustrate this by considering a function geomean that takes a list of numbers as input and produces the geometric mean on output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "\n", "# Define a class in which the tests will run\n", "class TestEntropy(unittest.TestCase):\n", " \n", " def test_oneshot(self):\n", " self.assertEqual(geomean([1,1]), 1)\n", " \n", " def test_oneshot2(self):\n", " self.assertEqual(geomean([3, 3, 3]), 3)\n", " \n", "#test_setup(TestGeomean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#def geomean(argument?):\n", "# return ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other infrastructures\n", "- pytest\n", "- nose\n", "- Use binary functions that being with \"test\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "https://www.youtube.com/watch?v=GEqM9uJi64Q (Pydata 2015)\n", "https://www.youtube.com/watch?v=yACtdj1_IxE (Pycon 2017)\n", "\n", "The first talk mentions some packages:\n", "engarde - https://github.com/TomAugspurger/engarde\n", "Hypothesis - https://hypothesis.readthedocs.io/en/latest/\n", "Feature Forge - https://github.com/machinalis/featureforge\n", "\n", "\n", "Detlef Nauck talk: \n", "http://ukkdd.org.uk/2017/info/talks/nauck.pdf\n", "He also had a list of R tools but I could not find the slides form the talk I saw.\n", "\n", "Test Driven Data Analysis:\n", "https://www.youtube.com/watch?v=TGwZnZYg0jw\n", "\n", "Profiling for Pandas:\n", "https://github.com/pandas-profiling/pandas-profiling" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 2 }