{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|hide\n", "from nbdev.showdoc import *\n", "from fastcore.all import *\n", "import numpy as np,numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A tour of fastcore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a (somewhat) quick tour of a few higlights from fastcore." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Documentation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All fast.ai projects, including this one, are built with [nbdev](https://nbdev.fast.ai), which is a full literate programming environment built on Jupyter Notebooks. That means that every piece of documentation, including the page you're reading now, can be accessed as interactive Jupyter notebooks. In fact, you can even grab a link directly to a notebook running interactively on Google Colab - if you want to follow along with this tour, click the link below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "[Open `index` in Colab](https://colab.research.google.com/github/fastai/fastcore/blob/master/nbs/index.ipynb)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colab_link('index')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full docs are available at [fastcore.fast.ai](https://fastcore.fast.ai). The code in the examples and in all fast.ai libraries follow the [fast.ai style guide](https://docs.fast.ai/dev/style.html). In order to support interactive programming, all fast.ai libraries are designed to allow for `import *` to be used safely, particular by ensuring that [`__all__`](https://riptutorial.com/python/example/2894/the---all---special-variable) is defined in all packages. In order to see where a function is from, just type it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coll_repr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details, including a link to the full documentation and source code, use `doc`, which pops up a window with this information:\n", "\n", "```python\n", "doc(coll_repr)\n", "```\n", "\n", "![](images/att_00000.png){width=\"499\"}\n", "\n", "The documentation also contains links to any related functions or classes, which appear like this: `coll_repr` (in the notebook itself you will just see a word with back-ticks around it; the links are auto-generated in the documentation site). The documentation will generally show one or more examples of use, along with any background context necessary to understand them. As you'll see, the examples for each function and method are shown as tests, rather than example outputs, so let's start by explaining that. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "fastcore's testing module is designed to work well with [nbdev](https://nbdev.fast.ai), which is a full literate programming environment built on Jupyter Notebooks. That means that your tests, docs, and code all live together in the same notebook. fastcore and nbdev's approach to testing starts with the premise that all your tests should pass. If one fails, no more tests in a notebook are run.\n", "\n", "Tests look like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_eq(coll_repr(range(1000), 5), '(#1000) [0,1,2,3,4...]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's an example from the docs for `coll_repr`. As you see, it's not showing you the output directly. Here's what that would look like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(#1000) [0,1,2,3,4...]'" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coll_repr(range(1000), 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, the test is actually showing you what the output looks like, because if the function call didn't return `'(#1000) [0,1,2,3,4...]'`, then the test would have failed.\n", "\n", "So every test shown in the docs is also showing you the behavior of the library --- and vice versa!\n", "\n", "Test functions always start with `test_`, and then follow with the operation being tested. So `test_eq` tests for equality (as you saw in the example above). This includes tests for equality of arrays and tensors, lists and generators, and many more:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_eq([0,1,2,3], np.arange(4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When a test fails, it prints out information about what was expected:\n", "\n", "```python\n", "test_eq([0,1,2,3], np.arange(3))\n", "```\n", "\n", "```\n", "----\n", " AssertionError: ==:\n", " [0, 1, 2, 3]\n", " [0 1 2]\n", "```\n", "\n", "If you want to check that objects are the same type, rather than the just contain the same collection, use `test_eq_type`.\n", "\n", "You can test with any comparison function using `test`, e.g test whether an object is less than:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test(2, 3, operator.lt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even test that exceptions are raised:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def divide_zero(): return 1/0\n", "test_fail(divide_zero)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and test that things are printed to stdout:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_stdout(lambda: print('hi'), 'hi')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Foundations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "fast.ai is unusual in that we often use [mixins](https://en.wikipedia.org/wiki/Mixin) in our code. Mixins are widely used in many programming languages, such as Ruby, but not so much in Python. We use mixins to attach new behavior to existing libraries, or to allow modules to add new behavior to our own classes, such as in extension modules. One useful example of a mixin we define is `Path.ls`, which lists a directory and returns an `L` (an extended list class which we'll discuss shortly):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#6) [Path('images/mnist3.png'),Path('images/att_00000.png'),Path('images/puppy.jpg'),Path('images/att_00005.png'),Path('images/att_00007.png'),Path('images/att_00006.png')]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = Path('images')\n", "p.ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can easily add you own mixins with the `patch` [decorator](https://realpython.com/primer-on-python-decorators/), which takes advantage of Python 3 [function annotations](https://www.python.org/dev/peps/pep-3107/#parameters) to say what class to patch:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@patch\n", "def num_items(self:Path): return len(self.ls())\n", "\n", "p.num_items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also use `**kwargs` frequently. In python `**kwargs` in a parameter like means \"*put any additional keyword arguments into a dict called `kwargs`*\". Normally, using `kwargs` makes an API quite difficult to work with, because it breaks things like tab-completion and popup lists of signatures. `utils` provides `use_kwargs` and `delegates` to avoid this problem. See our [detailed article on delegation](https://www.fast.ai/2019/08/06/delegation/) on this topic.\n", "\n", "`GetAttr` solves a similar problem (and is also discussed in the article linked above): it's allows you to use Python's exceptionally useful `__getattr__` magic method, but avoids the problem that normally in Python tab-completion and docs break when using this. For instance, you can see here that Python's `dir` function, which is used to find the attributes of a python object, finds everything inside the `self.default` attribute here:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['author', 'cost', 'name', 'price']" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Author:\n", " def __init__(self, name): self.name = name\n", "\n", "class ProductPage(GetAttr):\n", " _default = 'author'\n", " def __init__(self,author,price,cost): self.author,self.price,self.cost = author,price,cost\n", "\n", "p = ProductPage(Author(\"Jeremy\"), 1.50, 0.50)\n", "[o for o in dir(p) if not o.startswith('_')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at that `ProductPage` example, it's rather verbose and duplicates a lot of attribute names, which can lead to bugs later if you change them only in one place. `fastcore` provides `store_attr` to simplify this common pattern. It also provides `basic_repr` to give simple objects a useful `repr`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "__main__.ProductPage(author='Jeremy', price=1.5, cost=0.5)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class ProductPage:\n", " def __init__(self,author,price,cost): store_attr()\n", " __repr__ = basic_repr('author,price,cost')\n", "\n", "ProductPage(\"Jeremy\", 1.50, 0.50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most interesting `fastcore` functions is the `funcs_kwargs` decorator. This allows class behavior to be modified without sub-classing. This can allow folks that aren't familiar with object-oriented programming to customize your class more easily. Here's an example of a class that uses `funcs_kwargs`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello\n" ] } ], "source": [ "@funcs_kwargs\n", "class T:\n", " _methods=['some_method']\n", " def __init__(self, **kwargs): assert not kwargs, f'Passed unknown args: {kwargs}'\n", "\n", "p = T(some_method = print)\n", "p.some_method(\"hello\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `assert not kwargs` above is used to ensure that the user doesn't pass an unknown parameter (i.e one that's not in `_methods`). `fastai` uses `funcs_kwargs` in many places, for instance, you can customize any part of a `DataLoader` by passing your own methods.\n", "\n", "`fastcore` also provides many utility functions that make a Python programmer's life easier, in `fastcore.utils`. We won't look at many here, since you can easily look at the docs yourself. To get you started, have a look at the docs for `chunked` (remember, if you're in a notebook, type `doc(chunked)`), which is a handy function for creating lazily generated batches from a collection.\n", "\n", "Python's `ProcessPoolExecutor` is extended to allow `max_workers` to be set to `0`, to easily turn off parallel processing. This makes it easy to debug your code in serial, then run it in parallel. It also allows you to pass arguments to your parallel function, and to ensure there's a pause between calls, in case the process you are running has race conditions. `parallel` makes parallel processing even easier to use, and even adds an optional progress bar." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### L" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like most languages, Python allows for very concise syntax for some very common types, such as `list`, which can be constructed with `[1,2,3]`. Perl's designer Larry Wall explained the reasoning for this kind of syntax:\n", "\n", "> In metaphorical honor of Huffman’s compression code that assigns smaller numbers of bits to more common bytes. In terms of syntax, it simply means that commonly used things should be shorter, but you shouldn’t waste short sequences on less common constructs.\n", "\n", "On this basis, `fastcore` has just one type that has a single letter name: `L`. The reason for this is that it is designed to be a replacement for `list`, so we want it to be just as easy to use as `[1,2,3]`. Here's how to create that as an `L`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#3) [1,2,3]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L(1,2,3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first thing to notice is that an `L` object includes in its representation its number of elements; that's the `(#3)` in the output above. If there's more than 10 elements, it will automatically truncate the list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#20) [5,1,9,10,18,13,6,17,3,16...]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = L.range(20).shuffle()\n", "p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`L` contains many of the same indexing ideas that NumPy's `array` does, including indexing with a list of indexes, or a boolean mask list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#3) [9,18,6]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p[2,4,6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It also contains other methods used in `array`, such as `L.argwhere`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#5) [4,7,9,18,19]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p.argwhere(ge(15))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see from this example, `fastcore` also includes a number of features that make a functional style of programming easier, such as a full range of boolean functions (e.g `ge`, `gt`, etc) which give the same answer as the functions from Python's `operator` module if given two parameters, but return a [curried function](https://en.wikipedia.org/wiki/Currying) if given one parameter.\n", "\n", "There's too much functionality to show it all here, so be sure to check the docs. Many little things are added that we thought should have been in `list` in the first place, such as making this do what you'd expect (which is an error with `list`, but works fine with `L`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#4) [1,2,3,4]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + L(2,3,4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transforms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Transform` is the main building block of the fastai data pipelines. In the most general terms a transform can be any function you want to apply to your data, however the `Transform` class provides several mechanisms that make the process of building them easy and flexible (see the docs for information about each of these):\n", "\n", "- Type dispatch\n", "- Dispatch over tuples\n", "- Reversability\n", "- Type propagation\n", "- Preprocessing\n", "- Filtering based on the dataset type\n", "- Ordering\n", "- Appending new behavior with decorators\n", "\n", "`Transform` looks for three special methods, encodes, decodes, and setups, which provide the implementation for [`__call__`](https://www.python-course.eu/python3_magic_methods.php), `decode`, and `setup` respectively. For instance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class A(Transform):\n", " def encodes(self, x): return x+1\n", "\n", "A()(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For simple transforms like this, you can also use `Transform` as a decorator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@Transform\n", "def f(x): return x+1\n", "\n", "f(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transforms can be composed into a `Pipeline`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@Transform\n", "def g(x): return x/2\n", "\n", "pipe = Pipeline([f,g])\n", "pipe(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The power of `Transform` and `Pipeline` is best understood by seeing how they're used to create a complete data processing pipeline. This is explained in [chapter 11](https://github.com/fastai/fastbook/blob/master/11_midlevel_data.ipynb) of the [fastai book](https://www.amazon.com/Deep-Learning-Coders-fastai-PyTorch/dp/1492045527), which is [available for free](https://github.com/fastai/fastbook) in Jupyter Notebook format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "split_at_heading": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }